how to match values in columns by group or within category (groupwise) with other column(having multiple values) in same dataframe in R
So, I have data table in R which looks like follows:
Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X
So the idea here is for every group (claim, failure and part) if 'any' of the code in 'code' matches with list of the codes in 'matchcode' column then than group should have Match column as True.
So expected output should be as following:
Claim failure Part Match
23 F1 P1 TRUE
23 F2 P2 FALSE
45 F1 P4 TRUE
45 F1 P1 FALSE
The size of this data.table is huge so will be needing a optimized solution.
please help :)
r data.table
add a comment |
So, I have data table in R which looks like follows:
Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X
So the idea here is for every group (claim, failure and part) if 'any' of the code in 'code' matches with list of the codes in 'matchcode' column then than group should have Match column as True.
So expected output should be as following:
Claim failure Part Match
23 F1 P1 TRUE
23 F2 P2 FALSE
45 F1 P4 TRUE
45 F1 P1 FALSE
The size of this data.table is huge so will be needing a optimized solution.
please help :)
r data.table
If you want to vectorize/optimize regex operation, always use the stringi package. That will be by far faster than any alternative (at Ieast I know of). For your purposes I would just dolibrary(stringi) ; DT[, .(Match = any(stri_detect_fixed(matchcode, code))), by = .(Claim, failure, Part)]
. And I would also do some searching as vectorizinggrepl
is something that was discussed many times already
– David Arenburg
Nov 11 '18 at 11:20
add a comment |
So, I have data table in R which looks like follows:
Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X
So the idea here is for every group (claim, failure and part) if 'any' of the code in 'code' matches with list of the codes in 'matchcode' column then than group should have Match column as True.
So expected output should be as following:
Claim failure Part Match
23 F1 P1 TRUE
23 F2 P2 FALSE
45 F1 P4 TRUE
45 F1 P1 FALSE
The size of this data.table is huge so will be needing a optimized solution.
please help :)
r data.table
So, I have data table in R which looks like follows:
Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X
So the idea here is for every group (claim, failure and part) if 'any' of the code in 'code' matches with list of the codes in 'matchcode' column then than group should have Match column as True.
So expected output should be as following:
Claim failure Part Match
23 F1 P1 TRUE
23 F2 P2 FALSE
45 F1 P4 TRUE
45 F1 P1 FALSE
The size of this data.table is huge so will be needing a optimized solution.
please help :)
r data.table
r data.table
asked Nov 10 '18 at 22:08
Rahul RajaramRahul Rajaram
225
225
If you want to vectorize/optimize regex operation, always use the stringi package. That will be by far faster than any alternative (at Ieast I know of). For your purposes I would just dolibrary(stringi) ; DT[, .(Match = any(stri_detect_fixed(matchcode, code))), by = .(Claim, failure, Part)]
. And I would also do some searching as vectorizinggrepl
is something that was discussed many times already
– David Arenburg
Nov 11 '18 at 11:20
add a comment |
If you want to vectorize/optimize regex operation, always use the stringi package. That will be by far faster than any alternative (at Ieast I know of). For your purposes I would just dolibrary(stringi) ; DT[, .(Match = any(stri_detect_fixed(matchcode, code))), by = .(Claim, failure, Part)]
. And I would also do some searching as vectorizinggrepl
is something that was discussed many times already
– David Arenburg
Nov 11 '18 at 11:20
If you want to vectorize/optimize regex operation, always use the stringi package. That will be by far faster than any alternative (at Ieast I know of). For your purposes I would just do
library(stringi) ; DT[, .(Match = any(stri_detect_fixed(matchcode, code))), by = .(Claim, failure, Part)]
. And I would also do some searching as vectorizing grepl
is something that was discussed many times already– David Arenburg
Nov 11 '18 at 11:20
If you want to vectorize/optimize regex operation, always use the stringi package. That will be by far faster than any alternative (at Ieast I know of). For your purposes I would just do
library(stringi) ; DT[, .(Match = any(stri_detect_fixed(matchcode, code))), by = .(Claim, failure, Part)]
. And I would also do some searching as vectorizing grepl
is something that was discussed many times already– David Arenburg
Nov 11 '18 at 11:20
add a comment |
3 Answers
3
active
oldest
votes
... and a solution using data.table
.
library(data.table)
grepl_v <- Vectorize(grepl)
DT[, .(Match = any(grepl_v(code, matchcode))), by = .(Claim, failure, Part)]
# Claim failure Part Match
#1: 23 F1 P1 TRUE
#2: 23 F2 P2 FALSE
#3: 45 F1 P4 TRUE
#4: 45 F1 P1 FALSE
data
DT <- fread("Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X")
add a comment |
Here is a dplyr
solution.
library(dplyr)
dat %>%
rowwise() %>%
mutate(Match = grepl(code, matchcode)) %>%
group_by(Claim, failure, Part) %>%
mutate(Match = any(Match)) %>%
select(-code, -matchcode) %>%
unique
## A tibble: 4 x 4
## Groups: Claim, failure, Part [4]
# Claim failure Part Match
# <int> <fct> <fct> <lgl>
#1 23 F1 P1 TRUE
#2 23 F2 P2 FALSE
#3 45 F1 P4 TRUE
#4 45 F1 P1 FALSE
#Warning message:
#Grouping rowwise data frame strips rowwise nature
Don't worry about the warning, it simply tells you that after grep
'ing rowwise, the pipe groups by certain variables and therefore the processing is no longer row by row.
Data.
dat <- read.table(text = "
Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X
", header = TRUE)
add a comment |
base
solution:
d$match <- apply(d, 1, function(x) return(grepl(x[['code']], x['matchcode'])))
# Claim failure Part code matchcode match
# 1 23 F1 P1 A B,A,C 1
# 2 23 F1 P1 D B,A,C 0
# 3 23 F2 P2 D B,A,C 0
# 4 23 F2 P2 E B,A,C 0
# 5 45 F1 P4 X Y,Z,X 1
# 6 45 F1 P4 Y Y,Z,X 1
# 7 45 F1 P4 A Y,Z,X 0
# 8 45 F1 P1 F Y,Z,X 0
# 9 45 F1 P1 H Y,Z,X 0
Original version of this answer used grep()
; thanks to markus for suggesting grepl()
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53243908%2fhow-to-match-values-in-columns-by-group-or-within-category-groupwise-with-othe%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
... and a solution using data.table
.
library(data.table)
grepl_v <- Vectorize(grepl)
DT[, .(Match = any(grepl_v(code, matchcode))), by = .(Claim, failure, Part)]
# Claim failure Part Match
#1: 23 F1 P1 TRUE
#2: 23 F2 P2 FALSE
#3: 45 F1 P4 TRUE
#4: 45 F1 P1 FALSE
data
DT <- fread("Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X")
add a comment |
... and a solution using data.table
.
library(data.table)
grepl_v <- Vectorize(grepl)
DT[, .(Match = any(grepl_v(code, matchcode))), by = .(Claim, failure, Part)]
# Claim failure Part Match
#1: 23 F1 P1 TRUE
#2: 23 F2 P2 FALSE
#3: 45 F1 P4 TRUE
#4: 45 F1 P1 FALSE
data
DT <- fread("Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X")
add a comment |
... and a solution using data.table
.
library(data.table)
grepl_v <- Vectorize(grepl)
DT[, .(Match = any(grepl_v(code, matchcode))), by = .(Claim, failure, Part)]
# Claim failure Part Match
#1: 23 F1 P1 TRUE
#2: 23 F2 P2 FALSE
#3: 45 F1 P4 TRUE
#4: 45 F1 P1 FALSE
data
DT <- fread("Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X")
... and a solution using data.table
.
library(data.table)
grepl_v <- Vectorize(grepl)
DT[, .(Match = any(grepl_v(code, matchcode))), by = .(Claim, failure, Part)]
# Claim failure Part Match
#1: 23 F1 P1 TRUE
#2: 23 F2 P2 FALSE
#3: 45 F1 P4 TRUE
#4: 45 F1 P1 FALSE
data
DT <- fread("Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X")
edited Nov 10 '18 at 22:43
answered Nov 10 '18 at 22:30
markusmarkus
10.8k1030
10.8k1030
add a comment |
add a comment |
Here is a dplyr
solution.
library(dplyr)
dat %>%
rowwise() %>%
mutate(Match = grepl(code, matchcode)) %>%
group_by(Claim, failure, Part) %>%
mutate(Match = any(Match)) %>%
select(-code, -matchcode) %>%
unique
## A tibble: 4 x 4
## Groups: Claim, failure, Part [4]
# Claim failure Part Match
# <int> <fct> <fct> <lgl>
#1 23 F1 P1 TRUE
#2 23 F2 P2 FALSE
#3 45 F1 P4 TRUE
#4 45 F1 P1 FALSE
#Warning message:
#Grouping rowwise data frame strips rowwise nature
Don't worry about the warning, it simply tells you that after grep
'ing rowwise, the pipe groups by certain variables and therefore the processing is no longer row by row.
Data.
dat <- read.table(text = "
Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X
", header = TRUE)
add a comment |
Here is a dplyr
solution.
library(dplyr)
dat %>%
rowwise() %>%
mutate(Match = grepl(code, matchcode)) %>%
group_by(Claim, failure, Part) %>%
mutate(Match = any(Match)) %>%
select(-code, -matchcode) %>%
unique
## A tibble: 4 x 4
## Groups: Claim, failure, Part [4]
# Claim failure Part Match
# <int> <fct> <fct> <lgl>
#1 23 F1 P1 TRUE
#2 23 F2 P2 FALSE
#3 45 F1 P4 TRUE
#4 45 F1 P1 FALSE
#Warning message:
#Grouping rowwise data frame strips rowwise nature
Don't worry about the warning, it simply tells you that after grep
'ing rowwise, the pipe groups by certain variables and therefore the processing is no longer row by row.
Data.
dat <- read.table(text = "
Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X
", header = TRUE)
add a comment |
Here is a dplyr
solution.
library(dplyr)
dat %>%
rowwise() %>%
mutate(Match = grepl(code, matchcode)) %>%
group_by(Claim, failure, Part) %>%
mutate(Match = any(Match)) %>%
select(-code, -matchcode) %>%
unique
## A tibble: 4 x 4
## Groups: Claim, failure, Part [4]
# Claim failure Part Match
# <int> <fct> <fct> <lgl>
#1 23 F1 P1 TRUE
#2 23 F2 P2 FALSE
#3 45 F1 P4 TRUE
#4 45 F1 P1 FALSE
#Warning message:
#Grouping rowwise data frame strips rowwise nature
Don't worry about the warning, it simply tells you that after grep
'ing rowwise, the pipe groups by certain variables and therefore the processing is no longer row by row.
Data.
dat <- read.table(text = "
Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X
", header = TRUE)
Here is a dplyr
solution.
library(dplyr)
dat %>%
rowwise() %>%
mutate(Match = grepl(code, matchcode)) %>%
group_by(Claim, failure, Part) %>%
mutate(Match = any(Match)) %>%
select(-code, -matchcode) %>%
unique
## A tibble: 4 x 4
## Groups: Claim, failure, Part [4]
# Claim failure Part Match
# <int> <fct> <fct> <lgl>
#1 23 F1 P1 TRUE
#2 23 F2 P2 FALSE
#3 45 F1 P4 TRUE
#4 45 F1 P1 FALSE
#Warning message:
#Grouping rowwise data frame strips rowwise nature
Don't worry about the warning, it simply tells you that after grep
'ing rowwise, the pipe groups by certain variables and therefore the processing is no longer row by row.
Data.
dat <- read.table(text = "
Claim failure Part code matchcode
23 F1 P1 A B,A,C
23 F1 P1 D B,A,C
23 F2 P2 D B,A,C
23 F2 P2 E B,A,C
45 F1 P4 X Y,Z,X
45 F1 P4 Y Y,Z,X
45 F1 P4 A Y,Z,X
45 F1 P1 F Y,Z,X
45 F1 P1 H Y,Z,X
", header = TRUE)
answered Nov 10 '18 at 22:26
Rui BarradasRui Barradas
16.3k51730
16.3k51730
add a comment |
add a comment |
base
solution:
d$match <- apply(d, 1, function(x) return(grepl(x[['code']], x['matchcode'])))
# Claim failure Part code matchcode match
# 1 23 F1 P1 A B,A,C 1
# 2 23 F1 P1 D B,A,C 0
# 3 23 F2 P2 D B,A,C 0
# 4 23 F2 P2 E B,A,C 0
# 5 45 F1 P4 X Y,Z,X 1
# 6 45 F1 P4 Y Y,Z,X 1
# 7 45 F1 P4 A Y,Z,X 0
# 8 45 F1 P1 F Y,Z,X 0
# 9 45 F1 P1 H Y,Z,X 0
Original version of this answer used grep()
; thanks to markus for suggesting grepl()
add a comment |
base
solution:
d$match <- apply(d, 1, function(x) return(grepl(x[['code']], x['matchcode'])))
# Claim failure Part code matchcode match
# 1 23 F1 P1 A B,A,C 1
# 2 23 F1 P1 D B,A,C 0
# 3 23 F2 P2 D B,A,C 0
# 4 23 F2 P2 E B,A,C 0
# 5 45 F1 P4 X Y,Z,X 1
# 6 45 F1 P4 Y Y,Z,X 1
# 7 45 F1 P4 A Y,Z,X 0
# 8 45 F1 P1 F Y,Z,X 0
# 9 45 F1 P1 H Y,Z,X 0
Original version of this answer used grep()
; thanks to markus for suggesting grepl()
add a comment |
base
solution:
d$match <- apply(d, 1, function(x) return(grepl(x[['code']], x['matchcode'])))
# Claim failure Part code matchcode match
# 1 23 F1 P1 A B,A,C 1
# 2 23 F1 P1 D B,A,C 0
# 3 23 F2 P2 D B,A,C 0
# 4 23 F2 P2 E B,A,C 0
# 5 45 F1 P4 X Y,Z,X 1
# 6 45 F1 P4 Y Y,Z,X 1
# 7 45 F1 P4 A Y,Z,X 0
# 8 45 F1 P1 F Y,Z,X 0
# 9 45 F1 P1 H Y,Z,X 0
Original version of this answer used grep()
; thanks to markus for suggesting grepl()
base
solution:
d$match <- apply(d, 1, function(x) return(grepl(x[['code']], x['matchcode'])))
# Claim failure Part code matchcode match
# 1 23 F1 P1 A B,A,C 1
# 2 23 F1 P1 D B,A,C 0
# 3 23 F2 P2 D B,A,C 0
# 4 23 F2 P2 E B,A,C 0
# 5 45 F1 P4 X Y,Z,X 1
# 6 45 F1 P4 Y Y,Z,X 1
# 7 45 F1 P4 A Y,Z,X 0
# 8 45 F1 P1 F Y,Z,X 0
# 9 45 F1 P1 H Y,Z,X 0
Original version of this answer used grep()
; thanks to markus for suggesting grepl()
edited Nov 10 '18 at 22:32
answered Nov 10 '18 at 22:27
12b345b6b7812b345b6b78
782115
782115
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53243908%2fhow-to-match-values-in-columns-by-group-or-within-category-groupwise-with-othe%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
If you want to vectorize/optimize regex operation, always use the stringi package. That will be by far faster than any alternative (at Ieast I know of). For your purposes I would just do
library(stringi) ; DT[, .(Match = any(stri_detect_fixed(matchcode, code))), by = .(Claim, failure, Part)]
. And I would also do some searching as vectorizinggrepl
is something that was discussed many times already– David Arenburg
Nov 11 '18 at 11:20