Labelling rows in one dataset by comparing the measurement date with dates in another dataset



library(data.table)
testset <- data.table(date=as.Date(c("2013-07-02","2013-08-03","2013-09-04",
"2013-10-05","2013-11-06")),
yr = c(2013,2013,2013,2013,2013),
mo = c(07,08,09,10,11),
da = c(02,03,04,05,06),
plant = LETTERS[1:5],
product = as.factor(letters[26:22]),
rating = runif(25))



For each row of this dataset, I want to create a label based on the date column, by comparing that date with the dates in another dataset:


library(lubridate)
splitDates <- ymd(c("2013-06-10", "2013-08-15", "2013-10-06"))



For each measurement, I want to find the latest value in splitDates that precedes the measurement date. (Imagine that a new experiment starts on each split date, so the first one runs from 2013-06-10 up to but not including 2013-08-15; I want to determine which experiment each measurement belongs to.)



As I see it, the first five rows of this new column should look like this:


NewColumn <- c("2013-06-10", "2013-06-10", "2013-08-15", "2013-08-15", "2013-10-06")

         date   yr mo da plant product     rating  NewColumn
1: 2013-07-02 2013  7  2     A       z 0.02522850 2013-06-10
2: 2013-08-03 2013  8  3     B       y 0.28274066 2013-06-10
3: 2013-09-04 2013  9  4     C       x 0.86314441 2013-08-15
4: 2013-10-05 2013 10  5     D       w 0.01670862 2013-08-15
5: 2013-11-06 2013 11  6     E       v 0.16034175 2013-10-06
...



I can't figure out how to do this.





If I understand correctly, the values always come from splitDates.
– Salman Lashkarara
Sep 4 '18 at 17:37







testset[, v := splitDates[findInterval(date, splitDates)]] seems to work? Related: stackoverflow.com/q/15712826
– Frank
Sep 4 '18 at 18:06







3 Answers



Here's my take


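library(dplyr)
# Lookup table: interval number -> start date of that interval
dta <- data.frame(NewColumn = splitDates, newvar = 1:3)
# ifelse() is vectorized, so it can be applied to the whole date column at once
testset$newvar <- ifelse(testset$date < splitDates[2], 1L, ifelse(testset$date < splitDates[3], 2L, 3L))
# left_join (rather than semi_join) keeps every row and adds NewColumn
final_data <- left_join(testset, dta, by = "newvar")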



Data:


testset <- data.table(date=as.Date(c("2013-07-02","2013-08-03","2013-09-04",
"2013-10-05","2013-11-06")),
yr = c(2013,2013,2013,2013,2013),
mo = c(07,08,09,10,11),
da = c(02,03,04,05,06),
plant = LETTERS[1:5],
product = as.factor(letters[26:22]),
rating = runif(25))

splitDates <- ymd(c("2013-06-10", "2013-08-15", "2013-10-06"))



For me, understanding your question was more difficult than solving it. Please review the answer and give me feedback. It has three steps:



Make a function that returns the index of the date in splitDates closest to a given date:


findLatest <- function(date) which.min(abs(splitDates - date))



Then call the function over all the dates in testset:




names <- splitDates[sapply(testset[, 1], findLatest)]



Add the result to the dataset


testset$names <- names



So, the first 10 rows are:


         date   yr mo da plant product     rating         V8
1  2013-07-02 2013  7  2     A       z 0.75801493 2013-06-10
2  2013-08-03 2013  8  3     B       y 0.06370597 2013-08-15
3  2013-09-04 2013  9  4     C       x 0.25375231 2013-08-15
4  2013-10-05 2013 10  5     D       w 0.42900236 2013-10-06
5  2013-11-06 2013 11  6     E       v 0.97613291 2013-10-06
6  2013-07-02 2013  7  2     A       z 0.78094927 2013-06-10
7  2013-08-03 2013  8  3     B       y 0.91312684 2013-08-15
8  2013-09-04 2013  9  4     C       x 0.29345599 2013-08-15
9  2013-10-05 2013 10  5     D       w 0.80870134 2013-10-06
10 2013-11-06 2013 11  6     E       v 0.18735280 2013-10-06
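Note that which.min(abs(...)) picks the nearest split date, which can fall after the measurement (as in row 2 of the output above). A minimal sketch of a variant that only considers split dates on or before the measurement could look like the following. It assumes splitDates is sorted and that every measurement falls on or after the first split date; findLastBefore is a hypothetical name, not part of any package.

```r
splitDates <- as.Date(c("2013-06-10", "2013-08-15", "2013-10-06"))
dates <- as.Date(c("2013-07-02", "2013-08-03", "2013-09-04",
                   "2013-10-05", "2013-11-06"))

# Hypothetical helper: index of the last split date on or before `date`
# (assumes splitDates is sorted and date >= splitDates[1])
findLastBefore <- function(date) max(which(splitDates <= date))

# Iterate by position so each element passed to the helper is still a Date
idx <- vapply(seq_along(dates), function(i) findLastBefore(dates[i]), integer(1))

# Start date of the experiment each measurement belongs to
splitDates[idx]
```

With the sample data, this should label the 2013-08-03 measurement with 2013-06-10 rather than 2013-08-15.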





When I run names<-splitDates[ sapply(testset[,1], findLatest ) ] I get: "Advarselsbesked (warning message): In unclass(time1) - unclass(time2) : longer object length is not a multiple of shorter object length". I like the logic behind the solution. However, already in the second row of your output the V8 value is after the date, which shouldn't be possible, considering that V8 is the start of the experiment and the date is when the result of that experiment was measured.
– Jakn09ab
Sep 5 '18 at 6:28





Sorry, what is the error on names<-splitDates[ sapply(testset[,1], findLatest ) ]? Did you define findLatest beforehand?
– Salman Lashkarara
Sep 5 '18 at 6:58







Yes, I did. "Advarselsbesked" just means "warning message" in Danish.
– Jakn09ab
Sep 6 '18 at 6:23





OK, please upvote my answer if it was helpful. @Jakn09ab
– Salman Lashkarara
Sep 6 '18 at 6:31



I have to give the credit to Frank, who commented on my question.


testset[, v := splitDates[findInterval(date, splitDates)]]



does the trick.
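For readers wondering how this one-liner works, here is a small sketch using base R only (the vector name dates is introduced for illustration; splitDates is built with as.Date instead of lubridate to keep the example self-contained). findInterval(x, vec) returns, for each element of x, the index of the last element of the sorted vector vec that is less than or equal to it; indexing splitDates with that result gives the start date of the matching interval.

```r
# Interval boundaries; findInterval requires them sorted in increasing order
splitDates <- as.Date(c("2013-06-10", "2013-08-15", "2013-10-06"))
dates <- as.Date(c("2013-07-02", "2013-08-03", "2013-09-04",
                   "2013-10-05", "2013-11-06"))

# For each date, the index of the last boundary that is <= that date
idx <- findInterval(dates, splitDates)
idx
# [1] 1 1 2 2 3

# Index back into splitDates to get the start date of each experiment
splitDates[idx]
```

One caveat: a date earlier than the first split date gets index 0, and indexing with 0 silently drops that element, so such dates may need separate handling (for example, setting those indices to NA first).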





I admire this solution as well. But it is not good to post @Frank's comment as your own solution. Maybe you can make it a community wiki answer.
– Salman Lashkarara
Sep 5 '18 at 7:07





Also, although it is short, I cannot fully understand how it works.
– Salman Lashkarara
Sep 5 '18 at 7:09





Me neither. I have not claimed it as my own solution, and I hope Frank will come and claim it. I created a new entry to make it easier for others to find the answer.
– Jakn09ab
Sep 6 '18 at 10:07


