How to convert dates in “DD/MM/YYYY” format in a pyspark dataframe?
I have a dataframe:
+----------+-------------------+------------+-----+---------------+
|     sDate|              Items|purchasedQTY| rate|purchasedVolume|
+----------+-------------------+------------+-----+---------------+
|2018/12/04|   mop-MD00094(Kg.)|      2500.0|147.0|       367500.0|
|2018/12/04|   RSE-RM00014(Kg.)|     29000.0| 18.8|       545200.0|
|13/04/2018|BETAIN-MD00032(Kg.)|       200.0|177.0|        35400.0|
|13/04/2018|   MGL-RM00005(Kg.)|     25000.0| 42.5|      1115625.0|
|13/04/2018|   MZE-RM00007(Kg.)|     24750.0| 14.1|       348975.0|
+----------+-------------------+------------+-----+---------------+
Here you can see that the dates in the first two rows are in YYYY/DD/MM format, while the rest are in DD/MM/YYYY format. I want to convert all of them to DD/MM/YYYY. How can I do this?
The schema is:
root
|-- sDate: string (nullable = true)
|-- Items: string (nullable = true)
|-- purchasedQTY: double (nullable = true)
|-- rate: double (nullable = true)
|-- purchasedVolume: double (nullable = true)
1 Answer
Use regexp_replace:

from pyspark.sql.functions import regexp_replace

df = df.withColumn("sDate", regexp_replace("sDate", r"([0-9]{4})/([0-9]{2})/([0-9]{2})", "$2/$3/$1"))
The regular expression captures the year, day and month of a YYYY/DD/MM date in three groups, and the replacement $2/$3/$1 rearranges them into DD/MM/YYYY. Since the pattern only matches strings that start with a four-digit year, the rows already in DD/MM/YYYY format are left untouched.
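For reference, here is a minimal end-to-end sketch of the idea as a standalone script (the sample dates come from the question; only the sDate column is included for brevity):

from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_replace

spark = SparkSession.builder.getOrCreate()

# One date in YYYY/DD/MM format and one already in DD/MM/YYYY
df = spark.createDataFrame([("2018/12/04",), ("13/04/2018",)], ["sDate"])

# Rows starting with a four-digit year are rewritten as DD/MM/YYYY;
# rows already in DD/MM/YYYY do not match the pattern and pass through unchanged
df = df.withColumn(
    "sDate",
    regexp_replace("sDate", r"([0-9]{4})/([0-9]{2})/([0-9]{2})", "$2/$3/$1"),
)

df.show()
# +----------+
# |     sDate|
# +----------+
# |12/04/2018|
# |13/04/2018|
# +----------+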