Convert commas to dots within a Dataframe

I am importing a CSV file that looks like the one below, using pandas.read_csv:

df = pd.read_csv(Input, delimiter=";")

.
.
.

10;01.02.2015 16:58;01.02.2015 16:58;-0.59;0.1;-4.39;NotApplicable;0.79;0.2

11;01.02.2015 16:58;01.02.2015 16:58;-0.57;0.2;-2.87;NotApplicable;0.79;0.21

.
.
.

The problem is that when I later try to use these values in my code, I get this error: TypeError: can't multiply sequence by non-int of type 'float'.

I get this error because the number I try to use is written with a comma (,) as the decimal separator instead of a dot (.). After manually changing the commas to dots, my program works.
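
A minimal sketch of where this error comes from, assuming the affected column was read as text such as "0,79" (a made-up value, not taken from the real file): multiplying a Python string by a float raises this TypeError, while converting the text to a float first works.

```python
# Hypothetical cell value: a decimal-comma number that was read as text.
value = "0,79"

try:
    value * 2.0          # str * float is not defined in Python
    message = None
except TypeError as exc:
    message = str(exc)
print(message)

# Swapping the comma for a dot and converting to float makes the arithmetic work.
fixed = float(value.replace(",", ".")) * 2.0
print(fixed)
```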

I can't change the format of my input, so I have to replace the commas in my DataFrame for my code to work, and I want Python to do this rather than doing it manually. Do you have any suggestions?

Did you try df = pd.read_csv("data.csv",decimal=",",delimiter=";")
– Padraic Cunningham
Jul 29 '15 at 12:43

No, I haven't tried that, I'm fairly new to Python. I'll give it a try, ty :)
– Nautilius
Jul 29 '15 at 12:45

Sorry, I don't understand: your CSV is formatted using decimal points, so the values should come in as floats. Can you show the code that doesn't like the float dtype? You can change the dtype using astype(int) on the column.
– EdChum
Jul 29 '15 at 12:46

Ty Padraic Cunningham, that did the trick :D
– Nautilius
Jul 29 '15 at 12:48

3 Answers

pandas.read_csv has a decimal parameter for this (see the pandas documentation).

I.e. try with:

df = pd.read_csv(Input, delimiter=";", decimal=",")
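
A minimal, self-contained sketch of this call. The sample rows and column names below are made up for illustration (they follow the layout in the question, but with decimal commas), and io.StringIO stands in for the real input file.

```python
import io
import pandas as pd

# Made-up sample in the question's layout, with decimal commas.
raw = (
    "id;start;end;a;b\n"
    "10;01.02.2015 16:58;01.02.2015 16:58;-0,59;0,1\n"
    "11;01.02.2015 16:58;01.02.2015 16:58;-0,57;0,2\n"
)

# decimal="," tells read_csv to treat the comma as the decimal separator.
df = pd.read_csv(io.StringIO(raw), delimiter=";", decimal=",")

print(df.dtypes)       # columns a and b should be float64
print(df["a"] * 2.0)   # arithmetic now works on the parsed floats
```

The date-like columns stay as plain strings here, since only numeric-looking fields are affected by decimal=.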

Ty, did the trick.
– Nautilius
Jul 29 '15 at 12:50

I tried it, and it fails on negative numbers.
– PlasmaBinturong
Oct 8 at 13:27

I think the earlier mentioned answer of including decimal="," in pandas read_csv is the preferred option.

However, I found it to be incompatible with the Python parsing engine. For example, when using skiprow=, read_csv will fall back to this engine, so as far as I know you can't use skiprow= and decimal= in the same read_csv call. Also, I haven't been able to actually get decimal= to work (though that is probably down to me).
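
For what it's worth, this limitation may depend on the pandas version. Below is a minimal sketch, assuming a pandas release in which the Python engine honors decimal=. It uses skipfooter= (an option the C engine does not support) to force the Python engine explicitly. The sample data is made up.

```python
import io
import pandas as pd

# Made-up data with decimal commas and a footer line to skip.
raw = "a;b\n120,00;51,23\n42,00;-18,45\ntotal;n/a\n"

df = pd.read_csv(
    io.StringIO(raw),
    delimiter=";",
    decimal=",",
    skipfooter=1,      # only available with the Python engine
    engine="python",   # request the Python engine explicitly
)
print(df.dtypes)
```

If the columns still come back as object dtype on your installation, the column-by-column conversion is a reasonable fallback.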

The long way round I used to achieve the same result is with list comprehensions, .replace and .astype. The major downside of this method is that it needs to be done one column at a time:

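import pandas as pd

df = pd.DataFrame({'a': ['120,00', '42,00', '18,00', '23,00'],
                   'b': ['51,23', '18,45', '28,90', '133,00']})
# swap the decimal comma for a dot in each cell of column a
df['a'] = [x.replace(',', '.') for x in df['a']]
# convert the cleaned strings to floats
df['a'] = df['a'].astype(float)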

Now, column a will have float type cells. Column b still contains strings.

Note that the .replace used here is not pandas' but rather Python's built-in version. Pandas' version requires the string to be an exact match or a regex.
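
As a possible shortcut, pandas also has the string accessor Series.str.replace, which does substring replacement. A minimal sketch, assuming every column holds decimal-comma strings, that converts all columns in one pass instead of one at a time:

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["120,00", "42,00", "18,00", "23,00"],
    "b": ["51,23", "18,45", "28,90", "133,00"],
})

# regex=False makes str.replace do a plain substring replacement;
# apply runs the conversion on every column of the frame.
df = df.apply(lambda col: col.str.replace(",", ".", regex=False).astype(float))
print(df.dtypes)
```

This only works if all columns are strings. For a mixed frame, restrict it to the relevant columns, e.g. df[["a", "b"]].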

This answers the question of how to change the decimal comma to a decimal dot with Python pandas.

$ cat test.py
import pandas as pd
df = pd.read_csv("test.csv", quotechar='"', decimal=",")
df.to_csv("test2.csv", sep=',', encoding='utf-8', quotechar='"', decimal='.')

Here the decimal separator for reading is specified as a comma, while the decimal separator for the output is specified as a dot. So:

$ cat test.csv
header,header2
1,"2,1"
3,"4,0"
$ cat test2.csv
,header,header2
0,1,2.1
1,3,4.0

where you see that the separator has changed to dot.
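
The same round trip can be sketched without files, using io.StringIO in place of test.csv (an assumption made only to keep the example self-contained). Passing index=False to to_csv leaves out the extra unnamed index column that appears in test2.csv.

```python
import io
import pandas as pd

# Same content as test.csv: quoted fields containing decimal commas.
raw = 'header,header2\n1,"2,1"\n3,"4,0"\n'

df = pd.read_csv(io.StringIO(raw), quotechar='"', decimal=",")

# With no path given, to_csv returns the CSV text as a string.
out = df.to_csv(index=False, decimal=".")
print(out)
```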
