Store first non-missing value in a new column

Ciao, I have several columns that represents scores. For each STUDENT I want to take the first non-NA score and store it in a new column called TEST.

Here is my replicating example. This is the data I have now:

df <- data.frame(STUDENT=c(1,2,3,4,5), CLASS=c(90,91,92,93,95), SCORE1=c(10,NA,NA,NA,NA), SCORE2=c(2,NA,8,NA,NA), SCORE3=c(9,6,6,NA,NA), SCORE4=c(NA,7,5,1,9), ROOM=c(01,02, 03, 04, 05))

This is the column I am aiming to add:

df$FIRST <- c(10,6,8,1,9)

This is my attempt:

df$FIRSTGUESS <- max.col(!is.na(df[3:6]), "first")

4 Answers
4

This is exactly what coalesce from package dplyr does. As described in its documentation:

coalesce

dplyr

Given a set of vectors, coalesce() finds the first non-missing value
at each position.

Therefore, you can simplify do:

library(dplyr) df$FIRST <- do.call(coalesce, df[grepl('SCORE', names(df))])

This is the result:

> df STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM FIRST 1 1 90 10 2 9 NA 1 10 2 2 91 NA NA 6 7 2 6 3 3 92 NA 8 6 5 3 8 4 4 93 NA NA NA 1 4 1 5 5 95 NA NA NA 9 5 9

You can do this with apply and which.min(is.na(...))

apply

which.min(is.na(...))

df$FIRSTGUESS <- apply(df[, grep("^SCORE", names(df))], 1, function(x) x[which.min(is.na(x))]) df # STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM FIRSTGUESS #1 1 90 10 2 9 NA 1 10 #2 2 91 NA NA 6 7 2 6 #3 3 92 NA 8 6 5 3 8 #4 4 93 NA NA NA 1 4 1 #5 5 95 NA NA NA 9 5 9

Note that we need is.na instead of !is.na because FALSE corresponds to 0 and we want to return the first (which.min) FALSE value.

is.na

!is.na

FALSE

0

which.min

FALSE

Unfortunately, max.col gives indices of max values and not the values itself. However, we can subset the values from the original dataframe using the mapply call.

max.col

mapply

#Select only columns which has "SCORE" in it sub_df <- df[grepl("SCORE", names(df))] #Get the first non-NA value by row inds <- max.col(!is.na(sub_df), ties.method = "first") #Get the inds value by row df$FIRSTGUESS <- mapply(function(x, y) sub_df[x,y], 1:nrow(sub_df), inds) df # STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM FIRST FIRSTGUESS #1 1 90 10 2 9 NA 1 10 10 #2 2 91 NA NA 6 7 2 6 6 #3 3 92 NA 8 6 5 3 8 8 #4 4 93 NA NA NA 1 4 1 1 #5 5 95 NA NA NA 9 5 9 9

Just borrowing your setting up :-)

– Wen-Ben
Sep 13 '18 at 2:16

Using zoo,na.locf, borrowing the setting up of sub_df from Ronak

zoo

na.locf

sub_df

df['New']=zoo::na.locf(t(sub_df),fromLast=T)[1,] df STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM New 1 1 90 10 2 9 NA 1 10 2 2 91 NA NA 6 7 2 6 3 3 92 NA 8 6 5 3 8 4 4 93 NA NA NA 1 4 1 5 5 95 NA NA NA 9 5 9

Thanks for contributing an answer to Stack Overflow!

But avoid …

To learn more, see our tips on writing great answers.

Required, but never shown

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Dfyjkt