How to change column names based on the first three characters of the column name
I would like to change the column names based on the first three characters of the column name using a dictionary.
This is the code I have currently:
new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
             "pol":"cou_pol","spec":"coc_spec","dark":"daw_dark"}
for x,y in new_names.items():
    if df.columns.str.startswith(x):
        df.columns = df.columns.str.replace(x,y)
I get the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
1 Answer
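The error arises because df.columns.str.startswith(x) returns a boolean array with one entry per column, while if needs a single True or False. A minimal sketch of that behaviour, assuming a small made-up frame (the column names here are illustrative only):

```python
import pandas as pd

# Hypothetical two-column frame, used only to reproduce the behaviour
df = pd.DataFrame({'aud1': [1, 2], 'C': [3, 4]})

# startswith is evaluated per column, so the result is an array of booleans
mask = df.columns.str.startswith('aud')
print(mask)

# `if mask:` would raise the ValueError from the question, because the
# array holds more than one truth value. Reducing it to a single value works:
has_match = mask.any()
print(has_match)
```

Reducing with any() only silences the error; the renaming itself still needs a per-column decision, which the approaches in this answer provide.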
Use:
import pandas as pd

df = pd.DataFrame({'aud1':list('abcdef'),
                   'spe2':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'F':list('aaabbb')})
print (df)
aud1 spe2 C F
0 a 4 7 a
1 b 5 8 a
2 c 4 9 a
3 d 5 4 b
4 e 5 2 b
5 f 4 3 b
new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
             "pol":"cou_pol","spec":"coc_spec","dark":"daw_dark"}
First truncate each key of the dictionary to its first 3 characters:
new_names = {k[:3]: v for k, v in new_names.items()}
print (new_names)
{'aud': 'alc_aud', 'whe': 'clu_whe', 'per': 'pre_per',
 'pol': 'cou_pol', 'spe': 'coc_spec', 'dar': 'daw_dark'}
Then select the first 3 letters of each column name by indexing with str[:3], and replace them using the dict:
df.columns = df.columns.to_series().str[:3].replace(new_names)
print (df)
alc_aud coc_spec C F
0 a 4 7 a
1 b 5 8 a
2 c 4 9 a
3 d 5 4 b
4 e 5 2 b
5 f 4 3 b
Another solution uses get in a list comprehension; if a value is not matched, the original column name is returned:
df.columns = [new_names.get(x[:3], x) for x in df.columns]
print (df)
alc_aud coc_spec C F
0 a 4 7 a
1 b 5 8 a
2 c 4 9 a
3 d 5 4 b
4 e 5 2 b
5 f 4 3 b
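The two approaches differ for unmatched names longer than 3 characters: slicing followed by replace leaves such a name truncated, while get falls back to the full original name. A small sketch of the difference, assuming a made-up column called 'other' that has no matching key:

```python
import pandas as pd

new_names = {'aud': 'alc_aud', 'spe': 'coc_spec'}
# 'other' is a hypothetical unmatched column name, added for illustration
cols = pd.Index(['aud1', 'spe2', 'other'])

# Approach 1: slice to 3 characters, then replace; unmatched names stay sliced
via_replace = list(cols.to_series().str[:3].replace(new_names))

# Approach 2: dict.get with the full original name as the fallback
via_get = [new_names.get(x[:3], x) for x in cols]

print(via_replace)  # ['alc_aud', 'coc_spec', 'oth']
print(via_get)      # ['alc_aud', 'coc_spec', 'other']
```

In the sample data above the unmatched columns C and F are shorter than 3 characters, so both approaches give the same result there.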
EDIT: Solution working with keys of any length:
df = pd.DataFrame({'aud1':list('abcdef'),
                   'specd2':[4,5,4,5,5,4],
                   'podfds':[7,8,9,4,2,3],
                   'aaper':list('aaabbb')})
print (df)
aud1 specd2 podfds aaper
0 a 4 7 a
1 b 5 8 a
2 c 4 9 a
3 d 5 4 b
4 e 5 2 b
5 f 4 3 b
new_names = {"aud":"alc_aud","whe":"clu_whe", "per":"pre_per",
             "po":"cou_pol","spec":"coc_spec","dark":"daw_dark"}
First extract the leading part of each column name that matches a key of the dict, then map it to the new name, and finally fill the non-matched values with the original names using fillna:
pat = '|'.join([r'^{}'.format(x) for x in new_names])
s = df.columns.to_series()
df.columns = s.str.extract('('+ pat + ')', expand=False).map(new_names).fillna(s)
print (df)
alc_aud coc_spec cou_pol aaper
0 a 4 7 a
1 b 5 8 a
2 c 4 9 a
3 d 5 4 b
4 e 5 2 b
5 f 4 3 b
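The same idea can also be written with plain str.startswith, which avoids regular expressions, so keys containing characters such as a dot need no escaping. This is a rough sketch rather than part of the original solution; the helper name rename_by_prefix is hypothetical, and checking longer keys first is an assumption about how overlapping keys should be resolved:

```python
import pandas as pd

def rename_by_prefix(columns, mapping):
    """Hypothetical helper: replace each name that starts with a key of mapping."""
    # Assumption: longer keys are checked first, so 'spec' wins over a shorter 'sp'
    keys = sorted(mapping, key=len, reverse=True)
    renamed = []
    for col in columns:
        # First key that the column name starts with, or None if nothing matches
        match = next((k for k in keys if col.startswith(k)), None)
        renamed.append(mapping[match] if match is not None else col)
    return renamed

new_names = {"aud": "alc_aud", "whe": "clu_whe", "per": "pre_per",
             "po": "cou_pol", "spec": "coc_spec", "dark": "daw_dark"}
df = pd.DataFrame({'aud1': list('abcdef'),
                   'specd2': [4, 5, 4, 5, 5, 4],
                   'podfds': [7, 8, 9, 4, 2, 3],
                   'aaper': list('aaabbb')})

df.columns = rename_by_prefix(df.columns, new_names)
print(list(df.columns))  # ['alc_aud', 'coc_spec', 'cou_pol', 'aaper']
```

As with the extract solution, 'aaper' is left alone because 'per' does not appear at the start of the name.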
I just realized that some of my dictionary keys have 2 characters rather than 3. Is it still possible to do the same thing, but just say 'startswith', so that there is no defined character limit.
– sos.cott
Sep 2 at 19:11
@a.parris I am offline, on my phone only, so the solution should be to add this code:
new_names2 = {k: v for k, v in new_names.items() if len(k) == 2}
and then df.columns = df.columns.to_series().str[:2].replace(new_names2)
or df.columns = [new_names2.get(x[:2], x) for x in df.columns]
– jezrael
Sep 2 at 20:04
@a.parris - added new more general solution.
– jezrael
Sep 3 at 7:18
@jezreal. Great! This works perfectly as well. Thanks for your help.
– sos.cott
Sep 3 at 19:02
This works perfectly! (I did not have to use the first step: filtering out the first three values of the dictionary)
– sos.cott
Sep 2 at 18:33