How to read csv formatted numeric data into Pandas
I have a CSV file with two formatted columns that currently read in as objects:
One contains percentage values, which read in as strings like '0.01%'. The % is always at the end.
The other contains currency values, which read in as strings like '$1234.5'.
I have tried using the split function to remove the % or $ inside the dataframe, then calling float on the result of the split. This prints the correct result but does not assign the value back to the column. It also raises a type error saying that float does not have a split function, even though I do the split before the float.
2 Answers
Try this:
import pandas as pd
df = pd.read_csv('data.csv')
"""
The example df looks like this:
col1 col2
0 3.04% $100.25
1 0.15% $1250
2 0.22% $322
3 1.30% $956
4 0.49% $621
"""
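# Keep the text before '%' and the text after '$'.
# Recent pandas versions require n to be passed by keyword.
df['col1'] = df['col1'].str.split('%', expand=True)[0]
df['col2'] = df['col2'].str.split('$', n=1, expand=True)[1]
# Convert the cleaned strings to numbers.
df[['col1', 'col2']] = df[['col1', 'col2']].apply(pd.to_numeric)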
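Since the question is about reading the data in, another option is to clean the values while the file is being parsed, using the converters argument of read_csv. This is only a sketch: the inline csv_text stands in for data.csv, and the helper names parse_percent and parse_currency are made up for illustration. It assumes every cell is non-empty, because float('') would raise a ValueError.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for data.csv.
csv_text = "col1,col2\n3.04%,$100.25\n0.15%,$1250\n"

def parse_percent(text):
    # '3.04%' -> 3.04 (the sign is dropped; the value is not divided by 100).
    return float(text.rstrip('%'))

def parse_currency(text):
    # '$100.25' -> 100.25
    return float(text.lstrip('$'))

# Each converter is called on the raw string of every cell in its column.
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={'col1': parse_percent, 'col2': parse_currency},
)
```

With this approach the columns arrive as floats, so no cleanup step is needed after loading.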
This worked! Thanks for help!
– frogfanitw
Aug 26 at 20:24
You are probably looking for the apply method. For example:
df['first_col'] = df['first_col'].apply(lambda x: float(x.strip('%')))
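One possible cause of the "float does not have split" error in the question is a missing value in the column: pandas stores missing cells as NaN, which is a float, so calling a string method on it inside a lambda fails. This is a guess, since the question does not show the data. The sketch below uses a made-up three-row frame to show that the vectorized .str methods pass NaN through instead of raising.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({'first_col': ['0.01%', np.nan, '1.30%']})

# .str.rstrip leaves NaN untouched; to_numeric then yields a float column.
df['first_col'] = pd.to_numeric(df['first_col'].str.rstrip('%'))
```

The missing entry stays NaN in the result, and the assignment back to df['first_col'] is what makes the change stick.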
Thanks to all who helped.
– frogfanitw
Aug 26 at 20:26