Add custom functions with optimisations (hence not as UDFs)




I am struggling to optimise my custom functions, which are currently passed as UDFs. Our transformations are supplied as configuration in a format like the one below, so we cannot hard-code the transformation logic for each setting.


transforms: [
  { col: "id", expr: """ cast(someCustomFunction(aColumn) as string) """ }
  { col: "date", expr: """ date_format(cast(unix_timestamp(someColumn, "yyyyMMddHHmmss") as timestamp), "yyyyMMdd") """ }
],
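For context, a config like this is usually consumed by looping over the entries and turning each SQL string into a column. This is a minimal sketch, assuming PySpark and a list of dicts with `col`/`expr` keys; the function name `apply_transforms` is hypothetical. The column builder is passed in so the sketch does not import Spark; in real code it would be `pyspark.sql.functions.expr`.

```python
from typing import Any, Callable, Iterable, Mapping


def apply_transforms(df: Any,
                     transforms: Iterable[Mapping[str, str]],
                     to_column: Callable[[str], Any]) -> Any:
    """Add or replace one column per configured transform.

    `df` is expected to behave like a Spark DataFrame (it needs `withColumn`),
    and `to_column` turns a SQL expression string into a Column; in PySpark
    that would be `pyspark.sql.functions.expr`.
    """
    for t in transforms:
        # Strip the padding spaces that the triple-quoted config values carry.
        df = df.withColumn(t["col"], to_column(t["expr"].strip()))
    return df
```

Called as `apply_transforms(df, transforms, F.expr)`, the strings are parsed by Spark into Catalyst expressions, so built-ins such as `date_format` and `unix_timestamp` are optimised as usual; only the registered UDF stays opaque to the optimiser.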



I have registered someCustomFunction as a UDF, but I want to optimise it by not creating it as a UDF at all, since Spark treats UDFs as black boxes. What is the best approach for achieving this (so I can sleep peacefully)?



I have been grappling with this for three days, so any help (preferably with a code sample) would earn a giant karmic brownie.





Can you elaborate on why a UDF is bad for you?
– shay__
Aug 29 at 12:00





Because when I look at the physical plan, all UDFs just show up as UDF calls instead of generated code. This is also mentioned throughout the docs, e.g. jaceklaskowski.gitbooks.io/mastering-spark-sql/…
– aasthetic
Aug 30 at 4:14





And why is that bad? I don't quite understand what you are trying to achieve.
– shay__
Aug 30 at 6:54





They are not optimised by Spark; the link above gives more insight.
– aasthetic
Aug 30 at 7:07





Are you using your UDFs for filtering? If not, there is no problem.
– shay__
Aug 30 at 7:25








