remove rows in R data frame based on group and another data frame

remove rows in R data frame based on group and another data frame



Have two data frames


dat1 <- data.frame(group= c(11,11,12,12,13,13,14,14,15,15,16,16,17,17,17,18,18,18),name= c("A","B","C","D","E","F","G","H","I","J","A","B","E","F","W","A","B","V"))

dat2 <- data.frame(ID=c(1,1,2,2,3,3),name =c("A","B","E","F","X","Y"))



Second data frame have combinations of two values grouped by ID Column. And based on second data frame (dat2) ,need to remove rows in first data fame (dat1) if that particular group combination exists in dat2.



For example: If "A" and "B" both exists in dat1 and then it should get removed.



Thus, the desired output is


desiredat <- data.frame(group= c(12,12,13,13,15,15),name= c("C","D","G","H","I","J"))



Looking for ways in R to achieve the same.





check your desiredat I think there are some values missed
– Jilber Urbina
Aug 29 at 15:58


desiredat




2 Answers
2



This can be solved by an anti-join. However, we need to identify which group ids group have to be removed from dat1.


group


dat1


library(data.table)
# count names per ID
setDT(dat2)[, n.id := .N, by = ID]
# identify groups to remove by joining and ...
groups_to_remove <- dat2[setDT(dat1), on = "name", nomatch = 0L][
# ... check which groups have a match with the complete set of names
, which(n.id == .N), by = .(ID, group)]
# anti join
dat1[!groups_to_remove, on = "group"]


group name
1: 12 C
2: 12 D
3: 14 G
4: 14 H
5: 15 I
6: 15 J
7: 19 A
8: 19 X



Group 19 is not removed because names "A" and "X" belong to different IDs in dat2.


dat2



A more streamlined approach uses all() instead of counting the unique names:


all()


library(data.table)
setDT(dat1)
setDT(dat2)
groups_to_remove <- dat1[dat2, on = "name"][, which(all(ID == ID[1])), by = group]
dat1[!groups_to_remove, on = "group"]


group name
1: 12 C
2: 12 D
3: 14 G
4: 14 H
5: 15 I
6: 15 J
7: 19 A
8: 19 X



The same as above in dplyr syntax:


dplyr


library(dplyr)
dat2 %>%
left_join(dat1, by = "name") %>%
group_by(group) %>%
summarise(all_have_same_id = all(ID == ID[1L])) %>%
filter(all_have_same_id) %>%
anti_join(dat1, ., by = "group")


group name
1 12 C
2 12 D
3 14 G
4 14 H
5 15 I
6 15 J
7 19 A
8 19 X
Warning message:
Column `name` joining factors with different levels, coercing to character vector



The sample dataset dat1 provided by the OP consists of groups where either no name is included in dat2 or all names are in one ID of dat2 (perhaps plus an additional name) but it lacks a use case where only one name is contained in an ID of dat2. Therefore, I have added this use case (as group 19):


dat1


dat2


dat2


dat2


dat1 <- data.frame(
group= c(11,11,12,12,13,13,14,14,15,15,16,16,17,17,17,18,18,18,19,19),
name= c("A","B","C","D","E","F","G","H","I","J","A","B","E","F","W","A","B","V","A","X"))

dat2 <- data.frame(ID=c(1,1,2,2,3,3),name =c("A","B","E","F","X","Y"))





perfect .... Thank you!
– string
Aug 29 at 17:29



Something like this...?


dat1[dat1$name %in% setdiff(dat1$name, dat2$name), ]
3 12 C
4 12 D
7 14 G
8 14 H
9 15 I
10 15 J
15 17 W
18 18 V



Required, but never shown



Required, but never shown






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

𛂒𛀶,𛀽𛀑𛂀𛃧𛂓𛀙𛃆𛃑𛃷𛂟𛁡𛀢𛀟𛁤𛂽𛁕𛁪𛂟𛂯,𛁞𛂧𛀴𛁄𛁠𛁼𛂿𛀤 𛂘,𛁺𛂾𛃭𛃭𛃵𛀺,𛂣𛃍𛂖𛃶 𛀸𛃀𛂖𛁶𛁏𛁚 𛂢𛂞 𛁰𛂆𛀔,𛁸𛀽𛁓𛃋𛂇𛃧𛀧𛃣𛂐𛃇,𛂂𛃻𛃲𛁬𛃞𛀧𛃃𛀅 𛂭𛁠𛁡𛃇𛀷𛃓𛁥,𛁙𛁘𛁞𛃸𛁸𛃣𛁜,𛂛,𛃿,𛁯𛂘𛂌𛃛𛁱𛃌𛂈𛂇 𛁊𛃲,𛀕𛃴𛀜 𛀶𛂆𛀶𛃟𛂉𛀣,𛂐𛁞𛁾 𛁷𛂑𛁳𛂯𛀬𛃅,𛃶𛁼

How do I collapse sections of code in Visual Studio Code for Windows?

ャフサォクコ ケウ,コ,ワ メ,ロスョノ゙,クネ,フムカヤヲニ,エコ゚ツ ウイオン゙ケワサネォキモュキォウイノンコチ゚メヌナイゥフュ,カヒウネェ ネ,ホノケ,ムュキ ッボーミュハ,チ ツス ィ メウイマヤ,゙ウチ ヅ ロ,ォジヌェ ャヌット ェ,マャ,チナエヒネソキツテ トホヲヲミーァ