Filter from a dataframe and create another dataframe using r

Filter from a dataframe and create another dataframe using r



I have two data frames as following:


df <- data.frame(month = c("mazda", "yamaha"),
april = c(11,12),
may = c(14,15))

df_whole <- data.frame(month = c("car", "bikes"),
april = c(.1,.2),
may = c(.5,.2))



What I want to do is create a row called total_car which would be a product of car and mazda. However if df_whole doesnt have row car I would still like to create a row total_car which would mazda*0.The output I want to get to is as below. How could I calculate something like this in R


total_car


car


mazda


df_whole


car


total_car


mazda*0


df_car <- data.frame(month = c("mazda", "yamaha","total"),
april = c(11,12,1.1),
may = c(14,15,7))






How do I know that mazda is multiplied by car and not by bikes? What happens to the bikes row of df_whole? What do you mean by "if df_whole doesnt have row car"? What's the general format of df_whole?

– Maurits Evers
Sep 17 '18 at 21:21


mazda


car


bikes


bikes


df_whole


df_whole


car


df_whole






I want to show three tables one is df ,second being df_whole and third being df_car .The dataframedf_whole is being derived from some other table which always might not have the row car. So if car doesnt exist I would still like to create df_whole and multiply mazda with 0 . I would like filter df_whole %>% filter(df_whole$month =="car")

– SNT
Sep 17 '18 at 21:47



df


df_whole


df_car


df_whole


car


car


mazda


0


df_whole %>% filter(df_whole$month =="car")




1 Answer
1



It sounds like a sql query would work better to join your tables and get the summary data.



Here's a tidyverse solution (note the inner_join(), which is sql):


tidyverse


inner_join()


library(dplyr)
library(tidyr)

df <- data_frame(month = c("mazda", "yamaha"),
april = c(11,12),
may = c(14,15))

df_whole <- data_frame(month = c("car", "bikes"),
april = c(.1,.2),
may = c(.5,.2))

# adds car row if missing - there is probably a better way to handle the missing data
if (!"car" %in% df_whole$month) df_whole <- bind_rows(df_whole, data_frame(month = "car"))

# converts NAs to 0
df_whole[is.na(df_whole)] <- 0

# convert to long format
df_long <- df %>%
rename("brand" = "month") %>%
gather(key = "month", value = "val", april:may)

df_whole_long <- df_whole %>%
rename("type" = "month") %>%
gather(key = "month", value = "val", april:may)

# calcualte the multiplication
dat <- inner_join(df_long, df_whole_long, by = "month") %>% # combine dfs
filter(brand == "mazda", type == "car") %>% # filter out key rows
mutate(total = val.x * val.y) # do multiplication and add into new column

# reformat and append
df_car <- bind_rows(df,
bind_cols(data_frame(month = "total"),
dat %>%
select(month, total) %>%
spread(month, total)))



df_car is:


month april may
1 mazda 11.0 14
2 yamaha 12.0 15
3 total 1.1 7



I used tidyverse data frames (data_frame() rather than data.frame()) so strings weren't converted to factors (the same can be achieved by adding stringsAsFactors = F to data.frame()) and I converted your data to long format, which would probably be a better structure for your database anyway.


tidyverse


data_frame()


data.frame()


stringsAsFactors = F


data.frame()



Thanks for contributing an answer to Stack Overflow!



But avoid



To learn more, see our tips on writing great answers.



Required, but never shown



Required, but never shown




By clicking "Post Your Answer", you agree to our terms of service, privacy policy and cookie policy

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

Edmonton

Crossroads (UK TV series)