Create all combinations of letter substitution in string
I have a string "ECET" and I would like to create all the possible strings where I substitute one or more letters (all but the first) with "X".
So in this case my result would be:
> result
[1] "EXET" "ECXT" "ECEX" "EXXT" "EXEX" "ECXX" "EXXX"
Any ideas as to how to approach the issue?
The problem is not just creating the possible combinations/permutations of "X", but also combining them with the existing string.
7 Answers
Using the FUN argument of combn:
a <- "ECET"

# For each subset of n positions, replace those characters with "X"
fun <- function(n, string) {
  combn(nchar(string), n, function(x) {
    s <- strsplit(string, '')[[1]]
    s[x] <- 'X'
    paste(s, collapse = '')
  })
}

lapply(seq_len(nchar(a)), fun, string = a)
[[1]]
[1] "XCET" "EXET" "ECXT" "ECEX"
[[2]]
[1] "XXET" "XCXT" "XCEX" "EXXT" "EXEX" "ECXX"
[[3]]
[1] "XXXT" "XXEX" "XCXX" "EXXX"
[[4]]
[1] "XXXX"
Use unlist to get a single vector. Faster solutions are probably available.
To leave your first character unchanged:
paste0(
substring(a, 1, 1),
unlist(lapply(seq_len(nchar(a) - 1), fun, string = substring(a, 2)))
)
[1] "EXET" "ECXT" "ECEX" "EXXT" "EXEX" "ECXX" "EXXX"
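The two steps above (substitute over subsets of positions, then keep the first character) can be folded into one function. The following is only a sketch: the name sub_keep_first and the repl argument are hypothetical and not part of the original answer. It assumes a single input string and returns an empty vector when there is nothing to substitute.

```r
# Hypothetical helper: substitute `repl` at every non-empty subset of
# positions 2..n, leaving position 1 untouched.
sub_keep_first <- function(string, repl = "X") {
  chars <- strsplit(string, "")[[1]]
  n <- length(chars)
  if (n < 2) return(character(0))  # nothing to substitute
  pos <- seq_len(n)[-1]            # replaceable positions 2..n
  out <- lapply(seq_along(pos), function(m) {
    # combn() enumerates index subsets of size m and applies the function
    combn(length(pos), m, function(idx) {
      s <- chars
      s[pos[idx]] <- repl
      paste(s, collapse = "")
    })
  })
  unlist(out)
}

sub_keep_first("ECET")
# [1] "EXET" "ECXT" "ECEX" "EXXT" "EXEX" "ECXX" "EXXX"
```

A string of n characters yields 2^(n-1) - 1 results, so the output grows quickly with the length of the input.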
Here's a recursive solution:
x <- "ECET"  # input string, as in the question

f <- function(x, pos = 2) {
  if (pos <= nchar(x))
    c(f(x, pos + 1), f(`substr<-`(x, pos, pos, "X"), pos + 1))
  else x
}
f(x)[-1]
# [1] "ECEX" "ECXT" "ECXX" "EXET" "EXEX" "EXXT" "EXXX"
Or using expand.grid:
do.call(paste0, expand.grid(c(substr(x,1,1),lapply(strsplit(x,"")[[1]][-1], c, "X"))))[-1]
# [1] "EXET" "ECXT" "EXXT" "ECEX" "EXEX" "ECXX" "EXXX"
Or using combn / Reduce / substr<-:
combs <- unlist(lapply(seq(nchar(x)-1),combn, x =seq(nchar(x))[-1],simplify = F),F)
sapply(combs, Reduce, f= function(x,y) `substr<-`(x,y,y,"X"), init = x)
# [1] "EXET" "ECXT" "ECEX" "EXXT" "EXEX" "ECXX" "EXXX"
Second solution explained
pairs0 <- lapply(strsplit(x,"")[[1]][-1], c, "X") # pairs of original letter + "X"
pairs1 <- c(substr(x,1,1), pairs0) # including 1st letter (without "X")
do.call(paste0, expand.grid(pairs1))[-1] # expand into data.frame and paste
For the sake of adding another option, here is one using binary logic.
Assuming your string is always 4 characters long:
input <- "ECET"
invec <- strsplit(input, '')[[1]]
sapply(1:7, function(x) {
  z <- invec
  z[rev(as.logical(intToBits(x))[1:4])] <- "X"
  paste0(z, collapse = '')
})
[1] "ECEX" "ECXT" "ECXX" "EXET" "EXEX" "EXXT" "EXXX"
If the string may be longer, compute the upper bound as a power of 2. Something like this should do:
input <- "ECETC"
pow <- nchar(input)
invec <- strsplit(input, '')[[1]]
sapply(1:(2^(pow-1) - 1), function(x) {
  z <- invec
  z[rev(as.logical(intToBits(x))[1:(pow)])] <- "X"
  paste0(z, collapse = '')
})
[1] "ECETX" "ECEXC" "ECEXX" "ECXTC" "ECXTX" "ECXXC" "ECXXX" "EXETC" "EXETX" "EXEXC" "EXEXX" "EXXTC" "EXXTX" "EXXXC"
[15] "EXXXX"
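The idea is to count the possible alterations. For the 4-character string, the 3 replaceable positions act as a 3-bit binary number, so there are 2^3 - 1 = 7 alterations; we subtract 1 because we don't want the string with no replacement.
intToBits returns the bits of the integer, least significant bit first. For 5: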
> intToBits(5)
[1] 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
intToBits always returns 32 bits, but we only need a logical vector as long as our string, so we keep just the first nchar(input) bits.
Then we convert to logical and reverse these 4 boolean values. Since we never reach the last bit (value 8 for 4 characters), it is never TRUE, so the first character is never replaced:
> intToBits(5)
[1] 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> tmp<-as.logical(intToBits(5)[1:4])
> tmp
[1] TRUE FALSE TRUE FALSE
> rev(tmp)
[1] FALSE TRUE FALSE TRUE
To avoid overwriting our original vector, we copy it into z and then replace the positions in z selected by this logical vector.
For a tidy output, we call paste0 with collapse = '' to rebuild a single string, so that sapply returns a character vector.
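The bit-mask idea can also be written without hard-coding the string length. This is a minimal sketch under stated assumptions: the name sub_bits and the repl argument are hypothetical, and because intToBits yields 32 bits it only makes sense for short strings (the number of results doubles with every extra character anyway).

```r
# Hypothetical helper: each integer 1 .. 2^(n-1) - 1 is read as a bit mask
# over the n - 1 replaceable positions.
sub_bits <- function(string, repl = "X") {
  chars <- strsplit(string, "")[[1]]
  n <- length(chars)
  k <- n - 1                       # number of replaceable positions
  if (k < 1) return(character(0))
  vapply(seq_len(2^k - 1), function(mask) {
    # keep only the k lowest bits, least significant first
    hit <- as.logical(intToBits(mask))[seq_len(k)]
    z <- chars
    # bit j controls position n - j + 1, so bit 1 is the last character
    z[n - which(hit) + 1] <- repl
    paste(z, collapse = "")
  }, character(1))
}

sub_bits("ECET")
# [1] "ECEX" "ECXT" "ECXX" "EXET" "EXEX" "EXXT" "EXXX"
```

Using only n - 1 bits makes it explicit that the first character is never selected, instead of relying on the highest bit never being set.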
Another version with combn, using purrr:
s <- "ECET"
f <- function(x,y) {substr(x,y,y) <- "X"; x}
g <- function(x) purrr::reduce(x,f,.init=s)
unlist(purrr::map(1:(nchar(s)-1), function(x) combn(2:nchar(s),x,g)))
#[1] "EXET" "ECXT" "ECEX" "EXXT" "EXEX" "ECXX" "EXXX"
or without purrr:
s <- "ECET"
f <- function(x,y) {substr(x,y,y) <- "X"; x}
g <- function(x) Reduce(f,x,s)
unlist(lapply(1:(nchar(s)-1),function(x) combn(2:nchar(s),x,g)))
Here is a base R solution, but I find it complicated, with 3 nested loops.
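replaceChar <- function(x, char = "X") {
  n <- nchar(x)
  res <- NULL
  # loop over the number of characters to replace
  for(i in seq_len(n)) {
    cmb <- combn(n, i)
    # each column of cmb is one set of positions to replace
    r <- apply(cmb, 2, function(cc) {
      y <- x
      for(k in cc) {
        substr(y, k, k) <- char
      }
      y
    })
    res <- c(res, r)
  }
  res
}
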
x <- "ECET"
replaceChar(x)
replaceChar(x, "Y")
replaceChar(paste0(x, x))
A vectorized method with boolean indexing:
permX <- function(text, replChar='X') {
  library(gtools)
  library(stringr)
  # get TRUE/FALSE permutations for nchar(text)
  idx <- permutations(2, nchar(text), c(T,F), repeats.allowed = T)
  # we don't want the first character to be replaced
  idx <- idx[1:(nrow(idx)/2),]
  # split string into single chars
  chars <- str_split(text, '')
  # build data.frame with nrows(df) == nrows(idx)
  df = t(data.frame(rep(chars, nrow(idx))))
  # do replacing
  df[idx] <- replChar
  row.names(df) <- c()
  return(df)
}

permX('ECET')
[,1] [,2] [,3] [,4]
[1,] "E" "C" "E" "T"
[2,] "E" "C" "E" "X"
[3,] "E" "C" "X" "T"
[4,] "E" "C" "X" "X"
[5,] "E" "X" "E" "T"
[6,] "E" "X" "E" "X"
[7,] "E" "X" "X" "T"
[8,] "E" "X" "X" "X"
One more simple solution
# expand.grid gives all combinations of the input vectors; the result is a data frame
m <- expand.grid( c('E'),
c('C','X'),
c('E','X'),
c('T','X') )
# then, optionally, apply to paste the columns together
apply(m, 1, paste0, collapse='')[-1]
[1] "EXET" "ECXT" "EXXT" "ECEX" "EXEX" "ECXX" "EXXX"
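The grid above is typed by hand for "ECET". As a sketch of how it could be built from an arbitrary string instead, the helper below derives the per-position choices with strsplit and lapply; the name sub_grid and the repl argument are hypothetical, and it assumes a single non-empty string.

```r
# Hypothetical helper: build the expand.grid() input from the string itself.
sub_grid <- function(string, repl = "X") {
  chars <- strsplit(string, "")[[1]]
  # the first letter stays fixed; every other letter is either itself or `repl`
  choices <- c(chars[1], lapply(chars[-1], c, repl))
  m <- expand.grid(choices, stringsAsFactors = FALSE)
  # paste the columns together row by row; row 1 is the untouched string
  do.call(paste0, m)[-1]
}

sub_grid("ECET")
# [1] "EXET" "ECXT" "EXXT" "ECEX" "EXEX" "ECXX" "EXXX"
```

The result order follows expand.grid, where the leftmost varying column changes fastest.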
Moody's second option as a one line solution is truly excellent. But it's very terse with a lot packed in. I think it's worth also showing this way as it's clearer what's happening at each step. The problem was simple enough that it didn't require coding to put the input into expand.grid()
– krads
Sep 7 '18 at 14:48
I assume the question just takes one example of 4 letters (maybe some kind of biological sequence) and wishes to apply that to a large number afterwards, so showing how to build the various vectors in m would be better, in my opinion.
– Tensibai
Sep 7 '18 at 15:05
I think it's useful to show an intuitive solution even if it's not general. I've updated my answer to make my 2nd solution more understandable :)
– Moody_Mudskipper
Sep 7 '18 at 22:54
Would be a complete answer if the building of m was done from a string and not from manual input (but mostly that would be Moody's second option).
– Tensibai
Sep 7 '18 at 13:24