Split a Matrix file into multiple files based on column name [closed]




I have a large tab-delimited file with thousands of columns and many thousands of lines. It looks like:


ID A_1 A_2 A_3 B_1 B_3 B_4 C_2 C_3 C_5
xx 01 02 03 04 05 06 07 08 09
xy 03 05 33 44 15 26 27 08 09



I want to split this table into multiple files:


# A.txt
ID A_1 A_2 A_3
xx 01 02 03
xy 03 05 33

# B.txt
ID B_1 B_3 B_4
xx 04 05 06
xy 44 15 26

# C.txt
ID C_2 C_3 C_5
xx 07 08 09
xy 27 08 09



So the file name would be the column header prefix. The ID column is repeated in every file. The remaining columns in each file are the ones that share the same prefix before the underscore.



How can I do this with a Linux/Bash/Perl/Python command?








Have you already tried solving this yourself? Did you get stuck somewhere?
– simbabque
Aug 23 at 15:10





@simbabque. When I posted the answer, there was an R tag.
– akrun
Aug 23 at 15:21








You replaced that tag with the awk tag. The likely cause of the downvotes here is that you are asking other people to do your work for you without showing any efforts of your own.
– simbabque
Aug 23 at 15:26




1 Answer



In R, we can use split.default on the prefixes of the column names to split the data into a list of data.frames:




nm1 <- sub("_\\d+", "", names(df1)[-1])
lst <- lapply(split.default(df1[-1], nm1), transform, ID = df1$ID)
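To make the intermediate steps visible, here is a minimal sketch on a reduced, hypothetical table. The names df_demo, prefix and groups are illustrative and not part of the answer above, and the pattern is anchored with $ as an extra precaution.

```r
# Hypothetical miniature of the question's table (df_demo is an illustrative name)
df_demo <- data.frame(
  ID  = c("xx", "xy"),
  A_1 = c(1L, 3L),
  A_2 = c(2L, 5L),
  B_1 = c(4L, 44L),
  stringsAsFactors = FALSE
)

# Strip the trailing "_<digits>" from every non-ID column name
prefix <- sub("_\\d+$", "", names(df_demo)[-1])
print(prefix)

# split.default() treats the data frame as a list of columns,
# so each group is a data frame holding the columns that share a prefix
groups <- split.default(df_demo[-1], prefix)
print(names(groups))
print(names(groups$A))
```

Note that transform(..., ID = df1$ID) appends ID as the last column of each group; if ID should come first, cbind(df1["ID"], grp) is one alternative.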



Or with Map




setNames(Map(cbind, ID = df1['ID'],
split.default(df1[-1], nm1)), unique(nm1))
#$A
# ID A_1 A_2 A_3
#1 xx 1 2 3
#2 xy 3 5 33

#$B
# ID B_1 B_3 B_4
#1 xx 4 5 6
#2 xy 44 15 26

#$C
# ID C_2 C_3 C_5
#1 xx 7 8 9
#2 xy 27 8 9


# Example data used above
df1 <- structure(list(ID = c("xx", "xy"), A_1 = c(1L, 3L), A_2 = c(2L,
5L), A_3 = c(3L, 33L), B_1 = c(4L, 44L), B_3 = c(5L, 15L), B_4 = c(6L,
26L), C_2 = c(7L, 27L), C_3 = c(8L, 8L), C_5 = c(9L, 9L)),
class = "data.frame", row.names = c(NA, -2L))





Thanks. How can I read the table from a file (myfile.txt) and, after splitting, automatically save multiple files named by column prefix in R? Please share the full code; I am learning R. Otherwise, a non-R solution would also be helpful.
– Jishan
Aug 23 at 13:46






@Jishan The dataset is split into a list. You can loop over a list with lapply (or a for loop); for example, to read multiple files into a list, use lapply(filenames, read.table).
– akrun
Aug 23 at 13:47
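As a rough illustration of the lapply(filenames, read.table) idea, the sketch below reads two small files into a named list. The temporary directory, file contents and the names tmp_dir and tables are hypothetical, and header = TRUE and sep = "\t" are assumptions about the file layout.

```r
# Two small tab-delimited files in a temporary directory (hypothetical stand-ins)
tmp_dir <- tempfile("read_demo_")
dir.create(tmp_dir)
writeLines(c("ID\tA_1", "xx\t1", "xy\t3"), file.path(tmp_dir, "A.txt"))
writeLines(c("ID\tB_1", "xx\t4", "xy\t44"), file.path(tmp_dir, "B.txt"))

# Collect the full paths of all .txt files
filenames <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)

# Extra arguments after the function name are passed on to read.table()
tables <- lapply(filenames, read.table, header = TRUE, sep = "\t")

# Name each list element after its file, without the extension
names(tables) <- sub("\\.txt$", "", basename(filenames))
```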







Thanks. I am new to R. How can I automatically save multiple files, named by column prefix, after splitting in R? Please share the full code.
– Jishan
Aug 23 at 14:03





@Jishan Try this: lapply(names(lst), function(x) write.csv(lst[[x]], paste0(x, ".csv"), row.names = FALSE, quote = FALSE))
– akrun
Aug 23 at 14:05
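Putting the pieces of this thread together, a possible end-to-end sketch is shown below: read the tab-delimited file, split by prefix, and write one tab-delimited file per prefix. It is not the answerer's code. The temporary directory stands in for wherever myfile.txt lives, the names out_dir, infile and prefix are illustrative, and reading every column as character is an assumption made so that values such as 01 keep their leading zero.

```r
# Build a small tab-delimited input in a temporary directory (stand-in for myfile.txt)
out_dir <- tempfile("split_")
dir.create(out_dir)
infile <- file.path(out_dir, "myfile.txt")
writeLines(c(
  "ID\tA_1\tA_2\tA_3\tB_1\tB_3\tB_4\tC_2\tC_3\tC_5",
  "xx\t01\t02\t03\t04\t05\t06\t07\t08\t09",
  "xy\t03\t05\t33\t44\t15\t26\t27\t08\t09"
), infile)

# Read everything as character so values such as "01" keep their leading zero
df1 <- read.delim(infile, colClasses = "character", check.names = FALSE)

# Column prefix: the name with its trailing "_<digits>" removed
prefix <- sub("_\\d+$", "", names(df1)[-1])

# One data frame per prefix, with ID as the first column
lst <- lapply(split.default(df1[-1], prefix), function(grp) cbind(df1["ID"], grp))

# Write <prefix>.txt for every group, tab-delimited and unquoted
for (p in names(lst)) {
  write.table(lst[[p]], file.path(out_dir, paste0(p, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}
```

For a file with thousands of columns the same steps should apply unchanged; only the input path and output directory would need to point at real locations.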

