這篇文章主要講解了“Spark提交Yarn的詳細(xì)過程”,文中的講解內(nèi)容簡單清晰,易于學(xué)習(xí)與理解,下面請(qǐng)大家跟著小編的思路慢慢深入,一起來研究和學(xué)習(xí)“Spark提交Yarn的詳細(xì)過程”吧!
創(chuàng)新互聯(lián)專注為客戶提供全方位的互聯(lián)網(wǎng)綜合服務(wù),包含不限于做網(wǎng)站、成都網(wǎng)站設(shè)計(jì)、邳州網(wǎng)絡(luò)推廣、微信小程序、邳州網(wǎng)絡(luò)營銷、邳州企業(yè)策劃、邳州品牌公關(guān)、搜索引擎seo、人物專訪、企業(yè)宣傳片、企業(yè)代運(yùn)營等,從售前售中售后,我們都將竭誠為您服務(wù),您的肯定,是我們最大的嘉獎(jiǎng);創(chuàng)新互聯(lián)為所有大學(xué)生創(chuàng)業(yè)者提供邳州建站搭建服務(wù),24小時(shí)服務(wù)熱線:18980820575,官方網(wǎng)址:www.cdcxhl.com
spark-submit.sh-> spark-class.sh,然后調(diào)用SparkSubmit.scala。
根據(jù)client或者cluster模式處理方式不一樣。
client:直接在spark-class.sh運(yùn)行的地方包裝要給進(jìn)程來執(zhí)行driver。
cluster:將driver提交到集群去執(zhí)行。
核心在SparkSubmit.scala的prepareSubmitEnvironment方法中,截取一段處理Yarn集群環(huán)境的看一下。
// In client mode, launch the application main class directly // In addition, add the main application jar and any added jars (if any) to the classpath if (deployMode == CLIENT) { childMainClass = args.mainClass if (localPrimaryResource != null && isUserJar(localPrimaryResource)) { childClasspath += localPrimaryResource } if (localJars != null) { childClasspath ++= localJars.split(",") } }
client模式,childMainClass就是driver的main方法。
接下來看看Yarn cluster模式:
// In yarn-cluster mode, use yarn.Client as a wrapper around the user class if (isYarnCluster) { childMainClass = YARN_CLUSTER_SUBMIT_CLASS if (args.isPython) { childArgs += ("--primary-py-file", args.primaryResource) childArgs += ("--class", "org.apache.spark.deploy.PythonRunner") } else if (args.isR) { val mainFile = new Path(args.primaryResource).getName childArgs += ("--primary-r-file", mainFile) childArgs += ("--class", "org.apache.spark.deploy.RRunner") } else { if (args.primaryResource != SparkLauncher.NO_RESOURCE) { childArgs += ("--jar", args.primaryResource) } childArgs += ("--class", args.mainClass) } if (args.childArgs != null) { args.childArgs.foreach { arg => childArgs += ("--arg", arg) } } }
這時(shí)候childMainClass變成了
YARN_CLUSTER_SUBMIT_CLASS = "org.apache.spark.deploy.yarn.YarnClusterApplication"
private[spark] class YarnClusterApplication extends SparkApplication { override def start(args: Array[String], conf: SparkConf): Unit = { // SparkSubmit would use yarn cache to distribute files & jars in yarn mode, // so remove them from sparkConf here for yarn mode. conf.remove(JARS) conf.remove(FILES) new Client(new ClientArguments(args), conf, null).run() } }
看源碼可以看到,YarnClusterApplication最終是用到了deploy/yarn/Client.scala
client.run調(diào)用client.submitApplication方法提交到Y(jié)arn集群。
def submitApplication(): ApplicationId = { // Set up the appropriate contexts to launch our AM val containerContext = createContainerLaunchContext(newAppResponse) val appContext = createApplicationSubmissionContext(newApp, containerContext) }
主要是createContainerLaunchContext方法:
/** * Set up a ContainerLaunchContext to launch our ApplicationMaster container. * This sets up the launch environment, java options, and the command for launching the AM. */ private def createContainerLaunchContext(newAppResponse: GetNewApplicationResponse){ val userClass = if (isClusterMode) { Seq("--class", YarnSparkHadoopUtil.escapeForShell(args.userClass)) } else { Nil } val amClass = if (isClusterMode) { Utils.classForName("org.apache.spark.deploy.yarn.ApplicationMaster").getName } else { Utils.classForName("org.apache.spark.deploy.yarn.ExecutorLauncher").getName } val amArgs = Seq(amClass) ++ userClass ++ userJar ++ primaryPyFile ++ primaryRFile ++ userArgs ++ Seq("--properties-file", buildPath(Environment.PWD.$$(), LOCALIZED_CONF_DIR, SPARK_CONF_FILE)) ++ Seq("--dist-cache-conf", buildPath(Environment.PWD.$$(), LOCALIZED_CONF_DIR, DIST_CACHE_CONF_FILE)) // Command for the ApplicationMaster val commands = prefixEnv ++ Seq(Environment.JAVA_HOME.$$() + "/bin/java", "-server") ++ javaOpts ++ amArgs ++ Seq( "1>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout", "2>", ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr") }
這樣就生成要執(zhí)行的命令了,就是Command。上面這句話啥意思呢:
(1)cluster模式
用ApplicationMaster啟動(dòng)userClass。
(2)client模式
啟動(dòng)Executor
這里我們要看的是cluster模式,至此就清楚了,在cluster模式下,在Yarn集群中用ApplicationMaster包裝了userClass并啟動(dòng)。userClass就是driver的意思。
感謝各位的閱讀,以上就是“Spark提交Yarn的詳細(xì)過程”的內(nèi)容了,經(jīng)過本文的學(xué)習(xí)后,相信大家對(duì)Spark提交Yarn的詳細(xì)過程這一問題有了更深刻的體會(huì),具體使用情況還需要大家實(shí)踐驗(yàn)證。這里是創(chuàng)新互聯(lián),小編將為大家推送更多相關(guān)知識(shí)點(diǎn)的文章,歡迎關(guān)注!