Add custom functions with optimisations (hence not as UDF)
I am struggling to optimise my custom functions, which are currently registered as UDFs. Transformations are supplied through configuration in a format like the one below, so we cannot hand-code the transformation logic for each setting.
transforms: [
col: "id", expr: """ cast(someCustomFunction(aColumn) as string) """
col: "date", expr: """ date_format(cast(unix_timestamp(someColumn, "yyyyMMddHHmmss") as Timestamp), "yyyyMMdd") """
],
I have registered someCustomFunction as a UDF, but Spark treats UDFs as black boxes, so I want to optimise this by somehow not creating it as a UDF. What is the best approach for achieving this (and then sleeping peacefully)?
I have been grappling with this for 3 days, so any help (preferably with a code sample) would be a giant Karmic brownie.
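For reference, here is a minimal sketch of how the setup can be reproduced and the plan inspected. It makes several assumptions that are not part of the real code: the body of someCustomFunction (trim and upper-case) is a hypothetical stand-in, the input DataFrame is toy data, the session runs in local mode, and the config is represented as a plain Seq of (column, expression) pairs rather than being parsed from a file. The second plan shows the same hypothetical logic written with built-in functions only, which is the kind of result I am hoping to get for the real function.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

object TransformPlanDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session purely for inspecting plans.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("transform-plan-demo")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in body; the real someCustomFunction is not shown in the question.
    spark.udf.register("someCustomFunction",
      (s: String) => if (s == null) null else s.trim.toUpperCase)

    // Toy input with the two source columns used by the transforms.
    val df = Seq((" a1 ", "20180830041400")).toDF("aColumn", "someColumn")

    // The configured transforms as (target column, SQL expression) pairs.
    val transforms = Seq(
      "id"   -> "cast(someCustomFunction(aColumn) as string)",
      "date" -> """date_format(cast(unix_timestamp(someColumn, "yyyyMMddHHmmss") as timestamp), "yyyyMMdd")"""
    )

    // Apply each expression string in turn via expr.
    val viaUdf = transforms.foldLeft(df) { case (acc, (name, e)) =>
      acc.withColumn(name, expr(e))
    }
    viaUdf.explain() // the UDF call stays an opaque call in the physical plan

    // The same hypothetical logic using only built-in functions, for comparison.
    val viaBuiltins = df.withColumn("id", expr("upper(trim(aColumn))"))
    viaBuiltins.explain()

    spark.stop()
  }
}
```

Comparing the two explain() outputs shows the difference I am describing: the built-in version is made of expressions Catalyst understands, while the UDF version is not.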
Because when I look at the physical plan, all UDFs just show up as UDFs instead of generated code. This is also mentioned in all the docs ... jaceklaskowski.gitbooks.io/mastering-spark-sql/…
– aasthetic
Aug 30 at 4:14
And why is that bad? I don't quite understand what it is that you are trying to achieve.
– shay__
Aug 30 at 6:54
They are not optimised by Spark; the link above would give you more insight.
– aasthetic
Aug 30 at 7:07
Are you using your udfs for filtering? If not, there is no problem.
– shay__
Aug 30 at 7:25
Can you elaborate on why a UDF is bad for you?
– shay__
Aug 29 at 12:00