Spark Kafka Structured Streaming: Issue - Concurrent update to the log. Multiple streaming jobs detected
I am experimenting with running Structured Streaming from a Kafka source and sinking the results back to Kafka topics.
In my current setup, I am scheduling two Spark jobs via spark-submit. Each job reads from its own unique Kafka topic, but both write to a shared topic.
My current spark-defaults.conf includes:
spark.streaming.concurrentJobs 5
spark.scheduler.mode FAIR
When either job is scheduled on its own, it works as expected. However, when I try to schedule them together by submitting one after the other, the job submitted first stops responding with:
java.lang.AssertionError: assertion failed: Concurrent update to the log. Multiple streaming jobs detected for 10
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply$mcV$sp(MicroBatchExecution.scala:339)
Are there some confs that I am missing? How do we schedule concurrent jobs writing to the same Kafka topic in Spark? Appreciate your thoughts.
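For context, the two submissions look roughly like the sketch below (jar paths, class names, and the Kafka connector version are assumptions, not taken from the question; each job is an independent driver with its own SparkSession):

```shell
# Hypothetical submission of the two independent streaming jobs.
# Class names, jar names, and the connector coordinates are assumptions.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 \
  --class com.example.JobA \
  job-a.jar    # reads topic-a, writes to the shared topic

spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 \
  --class com.example.JobB \
  job-b.jar    # reads topic-b, writes to the shared topic
```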
Each of the jobs has its own SparkSession. The main object builds the SparkSession (using SparkSession.builder) and does its Kafka read, compute, and Kafka write. – irrelevantUser
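The per-job structure described in that comment can be sketched as follows (topic names, bootstrap servers, and the checkpoint path are assumptions; the key detail is that each query's checkpointLocation must be unique, since two queries resuming from the same checkpoint directory is a known cause of the "Concurrent update to the log. Multiple streaming jobs detected" assertion):

```scala
import org.apache.spark.sql.SparkSession

object JobA {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("job-a") // app name is an assumption
      .getOrCreate()

    // Read from this job's own input topic (topic/broker names are assumptions).
    val input = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "topic-a")
      .load()

    // ... compute ...

    // Write to the shared output topic. The checkpointLocation must be
    // unique per streaming query; if both jobs resolve to the same
    // checkpoint directory, Spark detects a concurrent update to the
    // offset log and fails with the assertion shown above.
    val query = input
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
      .writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("topic", "shared-topic")
      .option("checkpointLocation", "/checkpoints/job-a") // unique per job
      .start()

    query.awaitTermination()
  }
}
```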
How do you build a SparkSession? – Jacek Laskowski
Sep 14 '18 at 5:02