Store first non-missing value in a new column
Store first non-missing value in a new column
Ciao, I have several columns that represents scores. For each STUDENT I want to take the first non-NA score and store it in a new column called TEST.
Here is my replicating example. This is the data I have now:
df <- data.frame(STUDENT=c(1,2,3,4,5),
CLASS=c(90,91,92,93,95),
SCORE1=c(10,NA,NA,NA,NA),
SCORE2=c(2,NA,8,NA,NA),
SCORE3=c(9,6,6,NA,NA),
SCORE4=c(NA,7,5,1,9),
ROOM=c(01,02, 03, 04, 05))
This is the column I am aiming to add:
df$FIRST <- c(10,6,8,1,9)
This is my attempt:
df$FIRSTGUESS <- max.col(!is.na(df[3:6]), "first")
4 Answers
4
This is exactly what coalesce
from package dplyr
does. As described in its documentation:
coalesce
dplyr
Given a set of vectors, coalesce() finds the first non-missing value
at each position.
Therefore, you can simplify do:
library(dplyr)
df$FIRST <- do.call(coalesce, df[grepl('SCORE', names(df))])
This is the result:
> df
STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM FIRST
1 1 90 10 2 9 NA 1 10
2 2 91 NA NA 6 7 2 6
3 3 92 NA 8 6 5 3 8
4 4 93 NA NA NA 1 4 1
5 5 95 NA NA NA 9 5 9
You can do this with apply
and which.min(is.na(...))
apply
which.min(is.na(...))
df$FIRSTGUESS <- apply(df[, grep("^SCORE", names(df))], 1, function(x)
x[which.min(is.na(x))])
df
# STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM FIRSTGUESS
#1 1 90 10 2 9 NA 1 10
#2 2 91 NA NA 6 7 2 6
#3 3 92 NA 8 6 5 3 8
#4 4 93 NA NA NA 1 4 1
#5 5 95 NA NA NA 9 5 9
Note that we need is.na
instead of !is.na
because FALSE
corresponds to 0
and we want to return the first (which.min
) FALSE
value.
is.na
!is.na
FALSE
0
which.min
FALSE
Unfortunately, max.col
gives indices of max values and not the values itself. However, we can subset the values from the original dataframe using the mapply
call.
max.col
mapply
#Select only columns which has "SCORE" in it
sub_df <- df[grepl("SCORE", names(df))]
#Get the first non-NA value by row
inds <- max.col(!is.na(sub_df), ties.method = "first")
#Get the inds value by row
df$FIRSTGUESS <- mapply(function(x, y) sub_df[x,y], 1:nrow(sub_df), inds)
df
# STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM FIRST FIRSTGUESS
#1 1 90 10 2 9 NA 1 10 10
#2 2 91 NA NA 6 7 2 6 6
#3 3 92 NA 8 6 5 3 8 8
#4 4 93 NA NA NA 1 4 1 1
#5 5 95 NA NA NA 9 5 9 9
Using zoo
,na.locf
, borrowing the setting up of sub_df
from Ronak
zoo
na.locf
sub_df
df['New']=zoo::na.locf(t(sub_df),fromLast=T)[1,]
df
STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM New
1 1 90 10 2 9 NA 1 10
2 2 91 NA NA 6 7 2 6
3 3 92 NA 8 6 5 3 8
4 4 93 NA NA NA 1 4 1
5 5 95 NA NA NA 9 5 9
Thanks for contributing an answer to Stack Overflow!
But avoid …
To learn more, see our tips on writing great answers.
Required, but never shown
Required, but never shown
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Just borrowing your setting up :-)
– Wen-Ben
Sep 13 '18 at 2:16