Filter from a dataframe and create another dataframe using r
Filter from a dataframe and create another dataframe using r
I have two data frames as following:
df <- data.frame(month = c("mazda", "yamaha"),
april = c(11,12),
may = c(14,15))
df_whole <- data.frame(month = c("car", "bikes"),
april = c(.1,.2),
may = c(.5,.2))
What I want to do is create a row called total_car which would be a product of car and mazda. However if df_whole doesnt have row car I would still like to create a row total_car which would mazda*0.The output I want to get to is as below. How could I calculate something like this in R
total_car
car
mazda
df_whole
car
total_car
mazda*0
df_car <- data.frame(month = c("mazda", "yamaha","total"),
april = c(11,12,1.1),
may = c(14,15,7))
mazda
car
bikes
bikes
df_whole
df_whole
car
df_whole
I want to show three tables one is
df ,second being df_whole and third being df_car .The dataframedf_whole is being derived from some other table which always might not have the row car. So if car doesnt exist I would still like to create df_whole and multiply mazda with 0 . I would like filter df_whole %>% filter(df_whole$month =="car")– SNT
Sep 17 '18 at 21:47
df
df_whole
df_car
df_whole
car
car
mazda
0
df_whole %>% filter(df_whole$month =="car")
1 Answer
1
It sounds like a sql query would work better to join your tables and get the summary data.
Here's a tidyverse solution (note the inner_join(), which is sql):
tidyverse
inner_join()
library(dplyr)
library(tidyr)
df <- data_frame(month = c("mazda", "yamaha"),
april = c(11,12),
may = c(14,15))
df_whole <- data_frame(month = c("car", "bikes"),
april = c(.1,.2),
may = c(.5,.2))
# adds car row if missing - there is probably a better way to handle the missing data
if (!"car" %in% df_whole$month) df_whole <- bind_rows(df_whole, data_frame(month = "car"))
# converts NAs to 0
df_whole[is.na(df_whole)] <- 0
# convert to long format
df_long <- df %>%
rename("brand" = "month") %>%
gather(key = "month", value = "val", april:may)
df_whole_long <- df_whole %>%
rename("type" = "month") %>%
gather(key = "month", value = "val", april:may)
# calcualte the multiplication
dat <- inner_join(df_long, df_whole_long, by = "month") %>% # combine dfs
filter(brand == "mazda", type == "car") %>% # filter out key rows
mutate(total = val.x * val.y) # do multiplication and add into new column
# reformat and append
df_car <- bind_rows(df,
bind_cols(data_frame(month = "total"),
dat %>%
select(month, total) %>%
spread(month, total)))
df_car is:
month april may
1 mazda 11.0 14
2 yamaha 12.0 15
3 total 1.1 7
I used tidyverse data frames (data_frame() rather than data.frame()) so strings weren't converted to factors (the same can be achieved by adding stringsAsFactors = F to data.frame()) and I converted your data to long format, which would probably be a better structure for your database anyway.
tidyverse
data_frame()
data.frame()
stringsAsFactors = F
data.frame()
Thanks for contributing an answer to Stack Overflow!
But avoid …
To learn more, see our tips on writing great answers.
Required, but never shown
Required, but never shown
By clicking "Post Your Answer", you agree to our terms of service, privacy policy and cookie policy
How do I know that
mazdais multiplied bycarand not bybikes? What happens to thebikesrow ofdf_whole? What do you mean by "ifdf_wholedoesnt have rowcar"? What's the general format ofdf_whole?– Maurits Evers
Sep 17 '18 at 21:21