Backward Concatenation of Column values(Strings) in Python pandas
Backward Concatenation of Column values(Strings) in Python pandas
I trying to concatenate strings by decremented columns and assuming the table is as below. I'm creating fixed number (5) of NEW columns & want every column to have string concatenated by non-empty strings. Let say for the row-1, first concatenation will be STR_1;STR_2;STR_3 and the second concatenation should be STR_1;STR_2 and third concatenation should be STR_1 & the rest of the cells should remain empty.
row-1
STR_1;STR_2;STR_3
STR_1;STR_2
STR_1
MAJOR_BRAND | SUB_SEGMENT | PACKAGE_TYPE | PACKAGE | SIZE | FLAVOR | VOLUME
-------------------------------------------------------------------------------------
STR_1 | STR_2 | STR_3 | | | |
STR_1 | STR_2 | STR_3 | | | | STR_7
STR_1 | | | STR_4 | STR_5 | STR_6 | STR_7
STR_1 | | STR_3 | | | STR_6 | STR_7
STR_1 | | STR_3 | | | STR_6 |
Output should looks like
STR_JOIN_1 | STR_JOIN_2 | STR_JOIN_3 | STR_JOIN_4 | STR_JOIN_5
----------------------------------------------------------------------------------------------------
STR_1;STR_2;STR_3 |STR_1;STR_2 |STR_1 | |
STR_1;STR_2;STR_3;STR_7 |STR_1;STR_2;STR_3 |STR_1;STR_2 |STR_1 |
STR_1;STR_4;STR_5;STR_6;STR_7 |STR_1;STR_4;STR_5;STR_6 |STR_1;STR_4;STR_5|STR_1;STR_4 | STR_1
STR_1;STR_3;STR_6;STR_7 |STR_1;STR_3;STR_6 |STR_1;STR_3 |STR_1 |
STR_1;STR_3;STR_6 |STR_1;STR_3 |STR_1 | |
Kind help will be greatly appreciated !!!
the second concatenation should be STR_1;STR_2
2 Answers
2
Quick and dirty. Note f-strings require Python 3.6+.
res = pd.DataFrame(columns=[f'STR_JOIN_i' for i in range(1, 6)])
res['STR_JOIN_1'] = df.apply(lambda x: ';'.join(filter(None, x)), axis=1)
for i in range(2, 6):
res[f'STR_JOIN_i'] = res[f'STR_JOIN_i-1'].str.split(';').str[:-1].str.join(';')
Result:
print(res)
STR_JOIN_1 STR_JOIN_2 STR_JOIN_3
0 STR_1;STR_2;STR_3 STR_1;STR_2 STR_1
1 STR_1;STR_2;STR_3;STR_7 STR_1;STR_2;STR_3 STR_1;STR_2
2 STR_1;STR_4;STR_5;STR_6;STR_7 STR_1;STR_4;STR_5;STR_6 STR_1;STR_4;STR_5
3 STR_1;STR_3;STR_6;STR_7 STR_1;STR_3;STR_6 STR_1;STR_3
4 STR_1;STR_3;STR_6 STR_1;STR_3 STR_1
STR_JOIN_4 STR_JOIN_5
0
1 STR_1
2 STR_1;STR_4 STR_1
3 STR_1
4
Another Approach..i guess
new_data = pd.DataFrame(columns=['STR_JOIN_'.format(i) for i in range(1, len(initial_data.columns))])
loc = 0
for i in range(len(initial_data)):#initial_data being the input
temp_list = [ x for x in list(initial_data.loc[i].values) if x ]
new_data.loc[loc] = ['']*(len(initial_data.columns)-1)
for j in range(len(temp_list)):
new_data['STR_JOIN_'.format(j+1)][loc] = ';'.join(data[:len(data)-j])
loc+=1
Thanks for contributing an answer to Stack Overflow!
But avoid …
To learn more, see our tips on writing great answers.
Required, but never shown
Required, but never shown
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
So the concatenation operation should go from left to right over all the columns of the dataframe, yielding each additional concatenation of one more string, and stopping the process at the first empty string found in a column? Also, regarding the second row, when you say that
the second concatenation should be STR_1;STR_2, why is it not STR_1;STR_2;STR_3? EDIT: I got it, by "first concatenation" you mean the last concatenation, you got the operations in reverse order. You will need to recursively .apply() a function on the dataframe, and this function should check for empty strings first.– Daniel R.
Sep 11 '18 at 8:32