error when passing broadcast variable into UDF, Pyspark
I have a function that tries to pass a broadcast variable into a UDF.
The function looks like this:
    def generate_lookup_code(self, lookup_map):
        lookup_map_broadcast = spark_session.sparkContext.broadcast(lookup_map)
        print("lookup_map has been broadcasted")

        #### UDF function only return a constant string###
        def _generate_code(bc_reasoncode_lookup_map):
            reasoncode_lookup_map = bc_reasoncode_lookup_map.value
            return "hello"

        udfGenerateCode = F.udf(_generate_code, StringType())
        input_df = input_df.withColumn('code', udfGenerateCode(lookup_map_broadcast))
        input_df.show()
All I want to do is pass the broadcast variable to the UDF, but I get this error:
    'Broadcast' object has no attribute '_get_object_id'
I have no idea what is wrong.
1 Answer
You don't need to pass a broadcasted variable as a UDF argument, just reference it from the function:
    lookup_map_broadcast = spark_session.sparkContext.broadcast(lookup_map)

    def _generate_code():
        reasoncode_lookup_map = lookup_map_broadcast.value
        return "hello"

    udfGenerateCode = F.udf(_generate_code, StringType())
    input_df = input_df.withColumn('code', udfGenerateCode())
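A UDF is called for each row, and its arguments must be columns or literals. A Broadcast object is neither, which is why passing it as an argument fails. When the function references the broadcast variable from its enclosing scope instead, the reference is shipped to the executors along with the function, and `.value` is read there.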
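If the UDF also needs a real column, the same idea can be sketched roughly like this. This is only an illustration under some assumptions: `make_code_lookup`, `run_example`, the `reason` column, and the sample lookup values are hypothetical names that are not in the question, and the Spark imports are kept local so the helper can be tried without a cluster.

```python
def make_code_lookup(bc_lookup_map, default="unknown"):
    """Return a plain function that closes over a broadcast variable.

    bc_lookup_map only needs a `.value` attribute holding a dict,
    which is what a PySpark Broadcast object exposes.
    """
    def _generate_code(reason):
        # .value is read inside the function, i.e. when the UDF runs per row
        return bc_lookup_map.value.get(reason, default)
    return _generate_code


def run_example():
    # Local imports: only needed when actually running against Spark.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.master("local[1]").appName("bc-udf").getOrCreate()
    lookup_map = {"late": "R01", "damaged": "R02"}  # hypothetical sample data
    bc = spark.sparkContext.broadcast(lookup_map)

    df = spark.createDataFrame([("late",), ("damaged",), ("other",)], ["reason"])
    generate_code = F.udf(make_code_lookup(bc), StringType())
    # Only the column is passed as an argument; the broadcast travels in the closure.
    df.withColumn("code", generate_code(F.col("reason"))).show()
    spark.stop()
```

Calling `run_example()` on a machine with PySpark installed should add a `code` column looked up from the broadcast dict, with the default value for reasons that are not in the map.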