Create all combinations of letter substitution in string



I have a string "ECET" and I would like to create all the possible strings where I substitute one or more letters (all but the first) with "X".



So in this case my result would be:


> result
[1] "EXET" "ECXT" "ECEX" "EXXT" "EXEX" "ECXX" "EXXX"



Any ideas as to how to approach the issue?



This is not just about creating the possible combinations/permutations of "X", but also about how to combine them with the existing string.




7 Answers



Using the FUN argument of combn:



a <- "ECET"

# replace every combination of n positions in string with "X"
fun <- function(n, string) {
  combn(nchar(string), n, function(x) {
    s <- strsplit(string, '')[[1]]
    s[x] <- 'X'
    paste(s, collapse = '')
  })
}

lapply(seq_len(nchar(a)), fun, string = a)


[[1]]
[1] "XCET" "EXET" "ECXT" "ECEX"

[[2]]
[1] "XXET" "XCXT" "XCEX" "EXXT" "EXEX" "ECXX"

[[3]]
[1] "XXXT" "XXEX" "XCXX" "EXXX"

[[4]]
[1] "XXXX"



Use unlist to get a single vector. Faster solutions are probably available.



To leave your first character unchanged:


paste0(
substring(a, 1, 1),
unlist(lapply(seq_len(nchar(a) - 1), fun, string = substring(a, 2)))
)


[1] "EXET" "ECXT" "ECEX" "EXXT" "EXEX" "ECXX" "EXXX"



Here's a recursive solution:


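x <- "ECET"

# at each position from pos onwards, branch into "keep the letter" and "replace with X"
f <- function(x, pos = 2) {
  if (pos <= nchar(x))
    c(f(x, pos + 1), f(`substr<-`(x, pos, pos, "X"), pos + 1))
  else x
}

f(x)[-1]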
# [1] "ECEX" "ECXT" "ECXX" "EXET" "EXEX" "EXXT" "EXXX"



Or using expand.grid:


do.call(paste0, expand.grid(c(substr(x,1,1),lapply(strsplit(x,"")[[1]][-1], c, "X"))))[-1]
# [1] "EXET" "ECXT" "EXXT" "ECEX" "EXEX" "ECXX" "EXXX"



Or using combn / Reduce / substr<-:



combs <- unlist(lapply(seq(nchar(x)-1),combn, x =seq(nchar(x))[-1],simplify = F),F)
sapply(combs, Reduce, f= function(x,y) `substr<-`(x,y,y,"X"), init = x)
# [1] "EXET" "ECXT" "ECEX" "EXXT" "EXEX" "ECXX" "EXXX"



Second solution explained


pairs0 <- lapply(strsplit(x,"")[[1]][-1], c, "X") # pairs of original letter + "X"
pairs1 <- c(substr(x,1,1), pairs0) # including 1st letter (without "X")
do.call(paste0, expand.grid(pairs1))[-1] # expand into data.frame and paste
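The same steps can be wrapped into a small function. This is only a sketch: the name substitute_grid and the repl parameter are made up for illustration, and stringsAsFactors = FALSE is added so that the columns stay character vectors rather than factors.

```r
# Hypothetical helper (not part of the original answer): keep the first
# letter fixed and let every later letter either stay or become `repl`.
substitute_grid <- function(string, repl = "X") {
  chars <- strsplit(string, "")[[1]]
  # one choice vector per position: the first has a single option, the rest two
  choices <- c(list(chars[1]), lapply(chars[-1], c, repl))
  grid <- expand.grid(choices, stringsAsFactors = FALSE)
  # the first row of the grid is the unmodified string, so drop it
  do.call(paste0, grid)[-1]
}

substitute_grid("ECET")
# [1] "EXET" "ECXT" "EXXT" "ECEX" "EXEX" "ECXX" "EXXX"
```

Because expand.grid varies the first column fastest, the results come out in the same order as the one-liner above.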



For the sake of adding another option, here is one using binary logic:



Assuming your string is always 4 characters long:


input <- "ECET"
invec <- strsplit(input, '')[[1]]
sapply(1:7, function(x) {
  z <- invec
  # the lowest 4 bits of x, reversed, mark the positions to replace
  z[rev(as.logical(intToBits(x))[1:4])] <- "X"
  paste0(z, collapse = '')
})

[1] "ECEX" "ECXT" "ECXX" "EXET" "EXEX" "EXXT" "EXXX"



If the string is longer, you can compute the upper bound as a power of 2 from its length; something like this should do:


input <- "ECETC"
pow <- nchar(input)
invec <- strsplit(input, '')[[1]]
sapply(1:(2^(pow - 1) - 1), function(x) {
  z <- invec
  # the lowest pow bits of x, reversed, mark the positions to replace
  z[rev(as.logical(intToBits(x))[1:pow])] <- "X"
  paste0(z, collapse = '')
})

[1] "ECETX" "ECEXC" "ECEXX" "ECXTC" "ECXTX" "ECXXC" "ECXXX" "EXETC" "EXETX" "EXEXC" "EXEXX" "EXXTC" "EXXTX" "EXXXC"
[15] "EXXXX"



The idea is to count the possible alterations: the 3 replaceable positions behave like a 3-bit binary number, which gives 2^3 combinations, minus 1 because we don't want to keep the string with no replacement: 7.
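As a quick cross-check of that count (a sketch only; the variable names are made up here), choosing 1, 2 or 3 of the 3 replaceable positions, which is how the combn-based answers enumerate them, gives the same total:

```r
n_free  <- nchar("ECET") - 1                 # positions that may be replaced: 3
by_size <- choose(n_free, seq_len(n_free))   # strings with 1, 2 and 3 replacements
total   <- sum(by_size)

by_size                 # 3 3 1
total == 2^n_free - 1   # TRUE: both ways of counting give 7
```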



intToBits returns the bits of the integer, least significant bit first; for 5:


> intToBits(5)
[1] 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00



intToBits always returns 32 bits for an integer, but we only want a logical vector matching our string length, so we keep just the first nchar(input) bits.
We then convert them to logical and reverse these 4 boolean values. Since we never reach the last bit (8 for 4 chars), it is never TRUE, which is what leaves the first character untouched:


> intToBits(5)
[1] 01 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> tmp<-as.logical(intToBits(5)[1:4])
> tmp
[1] TRUE FALSE TRUE FALSE
> rev(tmp)
[1] FALSE TRUE FALSE TRUE



To avoid overwriting our original vector, we copy it into z and then replace the positions in z selected by this logical vector.



For a clean output, we call paste0 with collapse = '' to rebuild a single string from each z, so that sapply returns a character vector.
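Putting the explanation together, here is a minimal sketch of a reusable version. The name substitute_bits and the repl parameter are assumptions made for illustration, and unlike the code above it masks only the replaceable positions instead of relying on the top bit never being set.

```r
# Hypothetical helper (not from the original answer): enumerate the integers
# 1 .. 2^n - 1 and use their bits as a replacement mask for the last n letters.
substitute_bits <- function(string, repl = "X") {
  chars <- strsplit(string, "")[[1]]
  n <- length(chars) - 1L              # the first character is never replaced
  if (n < 1L) return(character(0))
  vapply(seq_len(2^n - 1), function(i) {
    # intToBits() lists the least significant bit first, hence the rev()
    mask <- rev(as.logical(intToBits(i))[seq_len(n)])
    z <- chars
    z[c(FALSE, mask)] <- repl          # leading FALSE protects the first letter
    paste(z, collapse = "")
  }, character(1))
}

substitute_bits("ECET")
# [1] "ECEX" "ECXT" "ECXX" "EXET" "EXEX" "EXXT" "EXXX"
```

Since intToBits works on 32-bit integers, this only applies to strings of at most about 30 characters, and the number of results doubles with every extra character anyway.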



Another version with combn, using purrr:


s <- "ECET"
f <- function(x,y) {substr(x,y,y) <- "X"; x}
g <- function(x) purrr::reduce(x,f,.init=s)
unlist(purrr::map(1:(nchar(s)-1), function(x) combn(2:nchar(s),x,g)))

#[1] "EXET" "ECXT" "ECEX" "EXXT" "EXEX" "ECXX" "EXXX"



or without purrr:


s <- "ECET"
f <- function(x,y) {substr(x,y,y) <- "X"; x}
g <- function(x) Reduce(f,x,s)
unlist(lapply(1:(nchar(s)-1),function(x) combn(2:nchar(s),x,g)))



Here is a base R solution, but I find it complicated, with 3 nested loops.


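replaceChar <- function(x, char = "X") {
  n <- nchar(x)
  res <- NULL
  # i = how many characters to replace
  for(i in seq_len(n)) {
    cmb <- combn(n, i)
    # each column of cmb is one set of positions to replace
    r <- apply(cmb, 2, function(cc) {
      y <- x
      for(k in cc)
        substr(y, k, k) <- char
      y
    })
    res <- c(res, r)
  }
  res
}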


x <- "ECET"

replaceChar(x)
replaceChar(x, "Y")
replaceChar(paste0(x, x))
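Note that replaceChar also replaces the first character, which the question wants left alone. One possible way to restrict it, as a sketch only (replaceCharFrom and its from parameter are hypothetical names, not part of the answer), is to draw the combinations from positions from..n:

```r
# Hypothetical variant: only positions >= `from` are eligible for replacement.
replaceCharFrom <- function(x, char = "X", from = 2) {
  pos <- seq_len(nchar(x))
  pos <- pos[pos >= from]
  res <- character(0)
  for (i in seq_along(pos)) {
    # combine indices into pos rather than pos itself: combn() would treat
    # a single number such as 4 as the set 1:4
    cmb <- combn(length(pos), i)
    r <- apply(cmb, 2, function(cc) {
      y <- x
      for (k in pos[cc]) substr(y, k, k) <- char
      y
    })
    res <- c(res, r)
  }
  res
}

replaceCharFrom("ECET")
# [1] "EXET" "ECXT" "ECEX" "EXXT" "EXEX" "ECXX" "EXXX"
```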



A vectorized method with boolean indexing:


permX <- function(text, replChar='X') {
  library(gtools)
  library(stringr)
  # get TRUE/FALSE permutations for nchar(text)
  idx <- permutations(2, nchar(text), c(TRUE, FALSE), repeats.allowed = TRUE)

  # we don't want the first character to be replaced
  idx <- idx[1:(nrow(idx)/2),]

  # split string into single chars
  chars <- str_split(text,'')

  # build a character matrix with nrow(df) == nrow(idx)
  df = t(data.frame(rep(chars, nrow(idx))))

  # do replacing
  df[idx] <- replChar

  row.names(df) <- c()
  return(df)
}

permX('ECET')

[,1] [,2] [,3] [,4]
[1,] "E" "C" "E" "T"
[2,] "E" "C" "E" "X"
[3,] "E" "C" "X" "T"
[4,] "E" "C" "X" "X"
[5,] "E" "X" "E" "T"
[6,] "E" "X" "E" "X"
[7,] "E" "X" "X" "T"
[8,] "E" "X" "X" "X"



One more simple solution


# expand.grid builds every combination of the input vectors; the result is a data frame
m <- expand.grid( c('E'),
                  c('C','X'),
                  c('E','X'),
                  c('T','X') )

# then, optionally, apply to paste the columns together
apply(m, 1, paste0, collapse='')[-1]

[1] "EXET" "ECXT" "EXXT" "ECEX" "EXEX" "ECXX" "EXXX"






Would be a complete answer if the building of m was done from a string and not from manual input. (but mostly that would be Moody's second option)

– Tensibai
Sep 7 '18 at 13:24









Moody's second option as a one line solution is truly excellent. But it's very terse with a lot packed in. I think it's worth also showing this way as it's clearer what's happening at each step. The problem was simple enough that it didn't require coding to put the input into expand.grid()

– krads
Sep 7 '18 at 14:48






I assume the question just takes one example of 4 letters (maybe some kind of biological sequence) and the asker wishes to apply it to a large number of strings afterwards, so showing how to build the various vectors in m would be better in my opinion

– Tensibai
Sep 7 '18 at 15:05






I think it's useful to show an intuitive solution even if it's not general. I've updated my answer to make my 2nd solution more understandable :)

– Moody_Mudskipper
Sep 7 '18 at 22:54


