How to spark-submit a job to YARN on another cluster?




I have a Docker container with Spark installed, and I am trying to submit a job to YARN on another cluster (EMR) using Marathon. The container exports the YARN and Hadoop conf dir variables, and the YARN file contains the correct address of the EMR master IP, but the client still connects to a local address and I am not sure where it is getting localhost from.


ENV YARN_CONF_DIR="/opt/yarn-site.xml"
ENV HADOOP_CONF_DIR="/opt/spark-2.2.0-bin-hadoop2.6"



yarn-site.xml:


<property>
<name>yarn.resourcemanager.hostname</name>
<value>xx.xxx.x.xx</value>
</property>



Command:


"cmd": "/opt/spark-2.2.0-bin-hadoop2.6/bin/spark-submit --verbose \n --name emr_external_mpv_streaming \n --deploy-mode client \n --master yarn\n --conf spark.executor.instances=4 \n --conf spark.executor.cores=1 \n --conf spark.executor.memory=1g \n --conf spark.driver.memory=1g \n --conf spark.cores.max=4 \n --conf spark.executorEnv.EXT_WH_HOST=$EXT_WH_HOST \n --conf spark.executorEnv.EXT_WH_PASSWORD=$EXT_WH_PASSWORD \n --conf spark.executorEnv.KAFKA_BROKER_LIST=$_KAFKA_BROKER_LIST \n --conf spark.executorEnv.SCHEMA_REGISTRY_URL=$SCHEMA_REGISTRY_URL \n --conf spark.executorEnv.AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \n --conf spark.executorEnv.AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \n --conf spark.executorEnv.STAGING_S3_BUCKET=$STAGING_S3_BUCKET \n --conf spark.executorEnv.KAFKA_GROUP_ID=$KAFKA_GROUP_ID \n --conf spark.executorEnv.MAX_RATE=$MAX_RATE \n --conf spark.executorEnv.KAFKA_MAX_POLL_MS=$KAFKA_MAX_POLL_MS \n --conf spark.executorEnv.KAFKA_MAX_POLL_RECORDS=$KAFKA_MAX_POLL_RECORDS \n --class com.ticketnetwork.edwstream.external.MapPageView \n /opt/edw-stream-external-mpv_2.11-2-SNAPSHOT.jar",



I also tried specifying --deploy-mode cluster with --master yarn, and got the same error.



Error:


Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/09/10 20:41:24 INFO SparkContext: Running Spark version 2.2.0
18/09/10 20:41:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/09/10 20:41:25 INFO SparkContext: Submitted application: edw-stream-ext-mpv-emr-prod
18/09/10 20:41:25 INFO SecurityManager: Changing view acls to: root
18/09/10 20:41:25 INFO SecurityManager: Changing modify acls to: root
18/09/10 20:41:25 INFO SecurityManager: Changing view acls groups to:
18/09/10 20:41:25 INFO SecurityManager: Changing modify acls groups to:
18/09/10 20:41:25 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
18/09/10 20:41:25 INFO Utils: Successfully started service 'sparkDriver' on port 35868.
18/09/10 20:41:25 INFO SparkEnv: Registering MapOutputTracker
18/09/10 20:41:25 INFO SparkEnv: Registering BlockManagerMaster
18/09/10 20:41:25 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/09/10 20:41:25 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/09/10 20:41:25 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-5526b967-2be9-44bf-a86f-79ef72f2ac0f
18/09/10 20:41:25 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
18/09/10 20:41:26 INFO SparkEnv: Registering OutputCommitCoordinator
18/09/10 20:41:26 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/09/10 20:41:26 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.150.4.45:4040
18/09/10 20:41:26 INFO SparkContext: Added JAR file:/opt/edw-stream-external-mpv_2.11-2-SNAPSHOT.jar at spark://10.150.4.45:35868/jars/edw-stream-external-mpv_2.11-2-SNAPSHOT.jar with timestamp 1536612086416
18/09/10 20:41:26 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/09/10 20:41:27 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/09/10 20:41:28 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
18/09/10 20:41:29 INFO Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)






If the output says at /0.0.0.0:8032, then something in your XML also says that, so it needs to point at the correct IP/DNS address. Also, your HADOOP_CONF_DIR needs to be Spark's conf folder, not the base folder.

– cricket_007
Sep 10 '18 at 23:28







1 Answer



0.0.0.0 is the default value of the yarn.resourcemanager.hostname property, and 8032 is the default ResourceManager port.
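To illustrate how those defaults end up in the log, here is a rough Python sketch. It is not Hadoop's actual code: it only approximates the lookup for a non-HA setup, ignores ${...} variable substitution, and the helper names (read_properties, resolve_rm_address) are made up for this example. The point is that when no readable yarn-site.xml is found in the conf directory, the address falls back to 0.0.0.0:8032.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Defaults as documented in Hadoop's yarn-default.xml.
DEFAULT_RM_HOSTNAME = "0.0.0.0"
DEFAULT_RM_PORT = 8032


def read_properties(site_file):
    """Return {name: value} from a Hadoop *-site.xml file, or {} if it is not a file."""
    if not os.path.isfile(site_file):
        return {}
    root = ET.parse(site_file).getroot()
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def resolve_rm_address(conf_dir):
    """Approximate the ResourceManager address a client would try (hypothetical helper)."""
    props = read_properties(os.path.join(conf_dir, "yarn-site.xml"))
    explicit = props.get("yarn.resourcemanager.address")
    if explicit:
        return explicit
    host = props.get("yarn.resourcemanager.hostname", DEFAULT_RM_HOSTNAME)
    return f"{host}:{DEFAULT_RM_PORT}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as conf_dir:
        # No yarn-site.xml in the directory yet: defaults apply.
        print(resolve_rm_address(conf_dir))

        # Write a minimal yarn-site.xml; the hostname below is a placeholder.
        site_file = os.path.join(conf_dir, "yarn-site.xml")
        with open(site_file, "w") as handle:
            handle.write(
                "<configuration><property>"
                "<name>yarn.resourcemanager.hostname</name>"
                "<value>rm.example.internal</value>"
                "</property></configuration>"
            )
        print(resolve_rm_address(conf_dir))

        # Pointing the "directory" at the file itself finds nothing,
        # so the defaults come back.
        print(resolve_rm_address(site_file))
```

Passing a file path where a directory is expected behaves like the empty-directory case in this sketch, which matches the symptom in the question.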





One reason you're getting the defaults would be that neither of the Hadoop environment variables is set correctly. Your HADOOP_CONF_DIR needs to be Spark's (or Hadoop's) conf folder, not the base folder of the Spark extraction. This directory must contain core-site.xml, yarn-site.xml, hdfs-site.xml, and, if you use HiveContext, hive-site.xml.





If yarn-site.xml is in that directory, you don't need YARN_CONF_DIR. If you do set it, it must point to an actual directory, not directly to the file.
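A quick sanity check along these lines can be run inside the container before calling spark-submit. This is only a sketch: check_conf_dir is a hypothetical helper, and the list of required files simply mirrors the ones named in this answer (hive-site.xml is left out because it is only needed with HiveContext).

```python
import os

# Files this answer says the conf directory should contain.
REQUIRED_FILES = ("core-site.xml", "yarn-site.xml", "hdfs-site.xml")


def check_conf_dir(var_name, required, env=None):
    """Return a list of problems with the directory named by env[var_name]."""
    env = os.environ if env is None else env
    path = env.get(var_name)
    if not path:
        return [f"{var_name} is not set"]
    if not os.path.isdir(path):
        return [f"{var_name}={path} is not a directory"]
    return [
        f"{var_name}={path} is missing {name}"
        for name in required
        if not os.path.isfile(os.path.join(path, name))
    ]


if __name__ == "__main__":
    problems = check_conf_dir("HADOOP_CONF_DIR", REQUIRED_FILES)
    # YARN_CONF_DIR is optional; only check it when it is set.
    if os.environ.get("YARN_CONF_DIR"):
        problems += check_conf_dir("YARN_CONF_DIR", ("yarn-site.xml",))
    for problem in problems:
        print(problem)
```

With the values from the question, a check like this would flag YARN_CONF_DIR as not being a directory, and would most likely report the site files as missing from HADOOP_CONF_DIR, since that variable points at the Spark base folder.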





Additionally, you'll probably need to set more than just one hostname. For example, a production-grade YARN cluster would have two ResourceManagers for fault tolerance. If Kerberos is enabled, you would also need to set keytabs and principals.
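As a rough illustration of the "more than one hostname" point, the sketch below lists the ResourceManager hosts declared in a yarn-site.xml, using the standard HA property names (yarn.resourcemanager.ha.enabled, yarn.resourcemanager.ha.rm-ids, yarn.resourcemanager.hostname.<rm-id>). The function name rm_hosts, the rm1/rm2 ids, and the hostnames are placeholders for this example, and the parsing is simplified compared to what Hadoop does.

```python
import xml.etree.ElementTree as ET


def rm_hosts(yarn_site_xml):
    """Return the ResourceManager hostnames declared in yarn-site.xml content."""
    root = ET.fromstring(yarn_site_xml)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()

    if props.get("yarn.resourcemanager.ha.enabled", "false").lower() == "true":
        # With HA, each id in rm-ids has its own hostname property.
        ids = [
            rm_id.strip()
            for rm_id in props.get("yarn.resourcemanager.ha.rm-ids", "").split(",")
            if rm_id.strip()
        ]
        keys = [f"yarn.resourcemanager.hostname.{rm_id}" for rm_id in ids]
        return [props[key] for key in keys if key in props]

    host = props.get("yarn.resourcemanager.hostname")
    return [host] if host else []


if __name__ == "__main__":
    # Placeholder HA configuration with two ResourceManagers.
    example = """
    <configuration>
      <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
      <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
      <property><name>yarn.resourcemanager.hostname.rm1</name><value>rm1.example.internal</value></property>
      <property><name>yarn.resourcemanager.hostname.rm2</name><value>rm2.example.internal</value></property>
    </configuration>
    """
    print(rm_hosts(example))
```

An empty result from a check like this means the file declares no ResourceManager host at all, in which case the client would fall back to the defaults.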



If you already have Mesos/Marathon, though, I'm not sure why you'd want to use YARN.






Also core-site.xml, which defines the default FS and security settings.

– Samson Scharfrichter
Sep 12 '18 at 10:46








Oops, forgot about that one

– cricket_007
Sep 12 '18 at 13:14


