Split a Matrix file into multiple files based on column name [closed]
I have a large tab-delimited file with thousands of columns and many thousands of lines. It looks like this:
ID A_1 A_2 A_3 B_1 B_3 B_4 C_2 C_3 C_5
xx 01 02 03 04 05 06 07 08 09
xy 03 05 33 44 15 26 27 08 09
I want to split this table into multiple files:
# A.txt
ID A_1 A_2 A_3
xx 01 02 03
xy 03 05 33
# B.txt
ID B_1 B_3 B_4
xx 04 05 06
xy 44 15 26
# C.txt
ID C_2 C_3 C_5
xx 07 08 09
xy 27 08 09
The file name is the column-header prefix. The ID column is kept in every file, and the remaining columns in each file are those that share the same prefix before the underscore.
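To make the grouping rule concrete, here is a minimal Python sketch that works on the header row only. The function name group_columns and its parameters are hypothetical, and it assumes the ID column comes first and that the prefix is everything before the first underscore.

```python
from collections import defaultdict

def group_columns(header, id_col=0, sep="_"):
    """Map each prefix (text before the first `sep`) to its column indices."""
    groups = defaultdict(list)
    for idx, name in enumerate(header):
        if idx == id_col:
            continue  # the ID column is shared by every output file
        prefix = name.split(sep, 1)[0]
        groups[prefix].append(idx)
    return dict(groups)

header = "ID A_1 A_2 A_3 B_1 B_3 B_4 C_2 C_3 C_5".split()
print(group_columns(header))
# {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
```

Each output file would then hold column 0 plus the indices listed for its prefix.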
How can I do this with a Linux/Bash/Perl/Python command?
@simbabque When I posted the answer, there was an R tag.
– akrun
Aug 23 at 15:21
You replaced that tag with the awk tag. The likely cause of the downvotes here is that you are asking other people to do your work for you without showing any efforts of your own.
– simbabque
Aug 23 at 15:26
1 Answer
We can use split.default on the prefix of the column names in R to split the data into a list of data.frames:
nm1 <- sub("_\\d+", "", names(df1)[-1])
lst <- lapply(split.default(df1[-1], nm1), transform, ID = df1$ID)
Or with Map:
setNames(Map(cbind, ID = df1['ID'],
split.default(df1[-1], nm1)), unique(nm1))
#$A
# ID A_1 A_2 A_3
#1 xx 1 2 3
#2 xy 3 5 33
#$B
# ID B_1 B_3 B_4
#1 xx 4 5 6
#2 xy 44 15 26
#$C
# ID C_2 C_3 C_5
#1 xx 7 8 9
#2 xy 27 8 9
df1 <- structure(list(ID = c("xx", "xy"), A_1 = c(1L, 3L), A_2 = c(2L,
5L), A_3 = c(3L, 33L), B_1 = c(4L, 44L), B_3 = c(5L, 15L), B_4 = c(6L,
26L), C_2 = c(7L, 27L), C_3 = c(8L, 8L), C_5 = c(9L, 9L)),
class = "data.frame", row.names = c(NA, -2L))
Thanks. How can I read the table from a file (myfile.txt) and, after splitting, automatically save multiple files named by column prefix in R? Please share the full code; I am learning R. Otherwise, code in something other than R would also be helpful.
– Jishan
Aug 23 at 13:46
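Since code other than R was also requested, here is a rough Python sketch of the whole pipeline: read the file once, then write one file per prefix. It is only an illustration under stated assumptions: the function name split_by_prefix and its parameters are hypothetical, the input is assumed to be tab-delimited with the ID in the first column, and every row is assumed to have as many fields as the header.

```python
import csv
import os
from contextlib import ExitStack

def split_by_prefix(path, out_dir=".", sep="_"):
    """Write one <prefix>.txt per column prefix, each keeping the ID column."""
    with ExitStack() as stack:
        src = stack.enter_context(open(path, newline=""))
        reader = csv.reader(src, delimiter="\t")
        header = next(reader)

        # prefix -> column indices; index 0 is assumed to be the ID column
        groups = {}
        for idx, name in enumerate(header[1:], start=1):
            groups.setdefault(name.split(sep, 1)[0], []).append(idx)

        # one open writer per prefix, so the input is read only once
        writers = {}
        for prefix, cols in groups.items():
            out = stack.enter_context(
                open(os.path.join(out_dir, prefix + ".txt"), "w", newline=""))
            writers[prefix] = csv.writer(out, delimiter="\t", lineterminator="\n")
            writers[prefix].writerow([header[0]] + [header[i] for i in cols])

        # stream the data rows; values stay strings, so "01" is not turned into 1
        for row in reader:
            for prefix, cols in groups.items():
                writers[prefix].writerow([row[0]] + [row[i] for i in cols])
    return sorted(groups)
```

Calling split_by_prefix("myfile.txt") on the sample table would create A.txt, B.txt and C.txt in the current directory. All output files are held open at once, so a file with a very large number of distinct prefixes could hit the operating system's open-file limit; in that case the prefixes would need to be processed in batches.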
@Jishan The dataset is split into a list. You can use lapply (or a for loop) to loop through the list, or to read multiple files into a list, i.e. lapply(filenames, read.table)
– akrun
Aug 23 at 13:47
Thanks. I am new to R. How can I automatically save multiple files named by column prefix after splitting in R? Please share the full code.
– Jishan
Aug 23 at 14:03
@Jishan Try this: lapply(names(lst), function(x) write.csv(lst[[x]], paste0(x, ".csv"), row.names = FALSE, quote = FALSE))
– akrun
Aug 23 at 14:05
Have you already tried solving this yourself? Did you get stuck somewhere?
– simbabque
Aug 23 at 15:10