Filter from a dataframe and create another dataframe using r
Filter from a dataframe and create another dataframe using r
I have two data frames as following:
df <- data.frame(month = c("mazda", "yamaha"),
april = c(11,12),
may = c(14,15))
df_whole <- data.frame(month = c("car", "bikes"),
april = c(.1,.2),
may = c(.5,.2))
What I want to do is create a row called total_car
which would be a product of car
and mazda
. However if df_whole
doesnt have row car
I would still like to create a row total_car
which would mazda*0
.The output I want to get to is as below. How could I calculate something like this in R
total_car
car
mazda
df_whole
car
total_car
mazda*0
df_car <- data.frame(month = c("mazda", "yamaha","total"),
april = c(11,12,1.1),
may = c(14,15,7))
mazda
car
bikes
bikes
df_whole
df_whole
car
df_whole
I want to show three tables one is
df
,second being df_whole
and third being df_car
.The dataframedf_whole
is being derived from some other table which always might not have the row car
. So if car
doesnt exist I would still like to create df_whole and multiply mazda
with 0
. I would like filter df_whole %>% filter(df_whole$month =="car")
– SNT
Sep 17 '18 at 21:47
df
df_whole
df_car
df_whole
car
car
mazda
0
df_whole %>% filter(df_whole$month =="car")
1 Answer
1
It sounds like a sql query would work better to join your tables and get the summary data.
Here's a tidyverse
solution (note the inner_join()
, which is sql):
tidyverse
inner_join()
library(dplyr)
library(tidyr)
df <- data_frame(month = c("mazda", "yamaha"),
april = c(11,12),
may = c(14,15))
df_whole <- data_frame(month = c("car", "bikes"),
april = c(.1,.2),
may = c(.5,.2))
# adds car row if missing - there is probably a better way to handle the missing data
if (!"car" %in% df_whole$month) df_whole <- bind_rows(df_whole, data_frame(month = "car"))
# converts NAs to 0
df_whole[is.na(df_whole)] <- 0
# convert to long format
df_long <- df %>%
rename("brand" = "month") %>%
gather(key = "month", value = "val", april:may)
df_whole_long <- df_whole %>%
rename("type" = "month") %>%
gather(key = "month", value = "val", april:may)
# calcualte the multiplication
dat <- inner_join(df_long, df_whole_long, by = "month") %>% # combine dfs
filter(brand == "mazda", type == "car") %>% # filter out key rows
mutate(total = val.x * val.y) # do multiplication and add into new column
# reformat and append
df_car <- bind_rows(df,
bind_cols(data_frame(month = "total"),
dat %>%
select(month, total) %>%
spread(month, total)))
df_car is:
month april may
1 mazda 11.0 14
2 yamaha 12.0 15
3 total 1.1 7
I used tidyverse
data frames (data_frame()
rather than data.frame()
) so strings weren't converted to factors (the same can be achieved by adding stringsAsFactors = F
to data.frame()
) and I converted your data to long format, which would probably be a better structure for your database anyway.
tidyverse
data_frame()
data.frame()
stringsAsFactors = F
data.frame()
Thanks for contributing an answer to Stack Overflow!
But avoid …
To learn more, see our tips on writing great answers.
Required, but never shown
Required, but never shown
By clicking "Post Your Answer", you agree to our terms of service, privacy policy and cookie policy
How do I know that
mazda
is multiplied bycar
and not bybikes
? What happens to thebikes
row ofdf_whole
? What do you mean by "ifdf_whole
doesnt have rowcar
"? What's the general format ofdf_whole
?– Maurits Evers
Sep 17 '18 at 21:21