Store first non-missing value in a new column

Store first non-missing value in a new column



Ciao, I have several columns that represents scores. For each STUDENT I want to take the first non-NA score and store it in a new column called TEST.



Here is my replicating example. This is the data I have now:


df <- data.frame(STUDENT=c(1,2,3,4,5),
CLASS=c(90,91,92,93,95),
SCORE1=c(10,NA,NA,NA,NA),
SCORE2=c(2,NA,8,NA,NA),
SCORE3=c(9,6,6,NA,NA),
SCORE4=c(NA,7,5,1,9),
ROOM=c(01,02, 03, 04, 05))



This is the column I am aiming to add:


df$FIRST <- c(10,6,8,1,9)



This is my attempt:


df$FIRSTGUESS <- max.col(!is.na(df[3:6]), "first")




4 Answers
4



This is exactly what coalesce from package dplyr does. As described in its documentation:


coalesce


dplyr



Given a set of vectors, coalesce() finds the first non-missing value
at each position.



Therefore, you can simplify do:


library(dplyr)
df$FIRST <- do.call(coalesce, df[grepl('SCORE', names(df))])



This is the result:


> df
STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM FIRST
1 1 90 10 2 9 NA 1 10
2 2 91 NA NA 6 7 2 6
3 3 92 NA 8 6 5 3 8
4 4 93 NA NA NA 1 4 1
5 5 95 NA NA NA 9 5 9



You can do this with apply and which.min(is.na(...))


apply


which.min(is.na(...))


df$FIRSTGUESS <- apply(df[, grep("^SCORE", names(df))], 1, function(x)
x[which.min(is.na(x))])
df
# STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM FIRSTGUESS
#1 1 90 10 2 9 NA 1 10
#2 2 91 NA NA 6 7 2 6
#3 3 92 NA 8 6 5 3 8
#4 4 93 NA NA NA 1 4 1
#5 5 95 NA NA NA 9 5 9



Note that we need is.na instead of !is.na because FALSE corresponds to 0 and we want to return the first (which.min) FALSE value.


is.na


!is.na


FALSE


0


which.min


FALSE



Unfortunately, max.col gives indices of max values and not the values itself. However, we can subset the values from the original dataframe using the mapply call.


max.col


mapply


#Select only columns which has "SCORE" in it
sub_df <- df[grepl("SCORE", names(df))]

#Get the first non-NA value by row
inds <- max.col(!is.na(sub_df), ties.method = "first")

#Get the inds value by row
df$FIRSTGUESS <- mapply(function(x, y) sub_df[x,y], 1:nrow(sub_df), inds)

df

# STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM FIRST FIRSTGUESS
#1 1 90 10 2 9 NA 1 10 10
#2 2 91 NA NA 6 7 2 6 6
#3 3 92 NA 8 6 5 3 8 8
#4 4 93 NA NA NA 1 4 1 1
#5 5 95 NA NA NA 9 5 9 9






Just borrowing your setting up :-)

– Wen-Ben
Sep 13 '18 at 2:16




Using zoo,na.locf, borrowing the setting up of sub_df from Ronak


zoo


na.locf


sub_df


df['New']=zoo::na.locf(t(sub_df),fromLast=T)[1,]
df
STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM New
1 1 90 10 2 9 NA 1 10
2 2 91 NA NA 6 7 2 6
3 3 92 NA 8 6 5 3 8
4 4 93 NA NA NA 1 4 1
5 5 95 NA NA NA 9 5 9



Thanks for contributing an answer to Stack Overflow!



But avoid



To learn more, see our tips on writing great answers.



Required, but never shown



Required, but never shown




By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Edmonton

Crossroads (UK TV series)