This article answers the question “What is the difference between Spark shuffle and Hadoop shuffle?” along with a few related Spark questions.
Q1: What is the relationship between AppClient, the Workers, and the Master?
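In Standalone mode, AppClient represents the application on the client machine. It is started when the SparkContext is created and handles tasks such as registering the application with the Master (registerApplication);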
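Once registration succeeds, the Master replies to the AppClient over Akka and directs the Workers to launch Executors for the application (the Driver is the process that hosts the SparkContext);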
The Driver manages Tasks and coordinates the Executors running on the Workers.
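To make the Q1 flow concrete, here is a minimal in-memory sketch in Python. It is a toy model, not Spark code: the classes Worker, Master and AppClient and the methods register_application and launch_executor are hypothetical stand-ins for the roles described above, and plain method calls replace the Akka messages.

```python
# Toy, in-memory model of the Standalone registration flow.
# All names are hypothetical; they mirror the roles of Spark's AppClient,
# Master and Worker but are NOT Spark's actual API.
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    cores: int
    executors: list = field(default_factory=list)

    def launch_executor(self, app_id: str, cores: int) -> str:
        # A real Worker would start an executor process here.
        exec_id = f"{self.name}/{app_id}/exec-{len(self.executors)}"
        self.executors.append(exec_id)
        self.cores -= cores
        return exec_id


@dataclass
class Master:
    workers: list
    apps: dict = field(default_factory=dict)

    def register_application(self, app_name: str, cores_per_executor: int) -> tuple:
        # Step 1: record the application and assign it an id (the "registered" reply).
        app_id = f"app-{len(self.apps):04d}"
        self.apps[app_id] = app_name
        # Step 2: ask every Worker with enough free cores to launch an executor.
        executors = [
            w.launch_executor(app_id, cores_per_executor)
            for w in self.workers
            if w.cores >= cores_per_executor
        ]
        return app_id, executors


class AppClient:
    """Represents the application (driver side) to the Master."""

    def __init__(self, master: Master, app_name: str):
        self.master = master
        self.app_name = app_name
        self.app_id = None
        self.executors = []

    def start(self, cores_per_executor: int = 2) -> None:
        self.app_id, self.executors = self.master.register_application(
            self.app_name, cores_per_executor
        )


master = Master(workers=[Worker("w1", cores=4), Worker("w2", cores=1)])
client = AppClient(master, "wordcount")
client.start(cores_per_executor=2)
print(client.app_id, client.executors)  # app-0000 ['w1/app-0000/exec-0']
```

In this run the Master skips w2 because it has fewer free cores than one executor needs, so the application gets a single executor on w1.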
Q2: How different are Spark's shuffle and Hadoop's shuffle?
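Spark's shuffle is a shuffle in the stricter sense: in the lineage of RDD dependencies, a shuffle occurs when the elements of each partition of the parent RDD are distributed to multiple partitions of the child RDD (a wide dependency);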
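The difference between a wide dependency (shuffle) and a narrow one can be illustrated with a small Python sketch. It routes records by hash(key) % num_partitions, which mirrors the idea behind Spark's HashPartitioner; the functions narrow_map_values and shuffle are hypothetical and this is plain Python, not the Spark API.

```python
# Toy model of a shuffle (wide dependency) versus a narrow dependency.
# Function names are hypothetical; this is not Spark code.
from collections import defaultdict


def narrow_map_values(parent_partitions, fn):
    # Narrow dependency: child partition i is built only from parent partition i.
    return [[(k, fn(v)) for k, v in part] for part in parent_partitions]


def shuffle(parent_partitions, num_child_partitions):
    # Wide dependency: every record is routed to a child partition by key hash,
    # so one parent partition can feed many child partitions.
    children = [[] for _ in range(num_child_partitions)]
    fan_out = defaultdict(set)  # parent index -> child indices it sent data to
    for parent_idx, part in enumerate(parent_partitions):
        for key, value in part:
            child_idx = hash(key) % num_child_partitions
            children[child_idx].append((key, value))
            fan_out[parent_idx].add(child_idx)
    return children, dict(fan_out)


# Integer keys keep hash() deterministic across Python runs.
parents = [[(0, "a"), (1, "b"), (2, "c")], [(3, "d"), (4, "e"), (5, "f")]]
children, fan_out = shuffle(parents, 3)
print(children)  # [[(0, 'a'), (3, 'd')], [(1, 'b'), (4, 'e')], [(2, 'c'), (5, 'f')]]
print(fan_out)   # {0: {0, 1, 2}, 1: {0, 1, 2}}
```

Each parent partition ends up feeding all three child partitions, which is exactly the "one parent partition, many child partitions" pattern that defines a shuffle in Spark, whereas narrow_map_values keeps a one-to-one mapping.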
In Hadoop, shuffle is a looser concept: it happens whenever the Mapper phase finishes and its output is handed to the Reducers, and it is the first of the Reducer's three phases (shuffle, sort, reduce);
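The three Reducer phases can be sketched in Python as follows. This is a simplified illustration, not the Hadoop MapReduce API: run_reducer and count_values are hypothetical names, and key % num_reducers stands in for Hadoop's partitioner.

```python
# Toy model of the three Reducer phases in Hadoop MapReduce:
# shuffle, sort, reduce. Names are hypothetical; this is not the Hadoop API.
from itertools import groupby
from operator import itemgetter


def run_reducer(reducer_id, map_outputs, num_reducers, reduce_fn):
    # Shuffle: pull this reducer's share of every mapper's output.
    # (Real Hadoop partitions and sorts on the map side; this filter stands in for that.)
    fetched = [
        (key, value)
        for output in map_outputs
        for key, value in output
        if key % num_reducers == reducer_id
    ]
    # Sort: merge the fetched records so equal keys become adjacent.
    fetched.sort(key=itemgetter(0))
    # Reduce: call reduce_fn once per key with all values for that key.
    return [
        (key, reduce_fn(key, [v for _, v in group]))
        for key, group in groupby(fetched, key=itemgetter(0))
    ]


def count_values(key, values):
    return sum(values)


# Two mappers emitting (word_id, 1) pairs; integer keys keep partitioning simple.
map_outputs = [[(1, 1), (2, 1), (1, 1)], [(2, 1), (3, 1), (4, 1)]]
print(run_reducer(0, map_outputs, 2, count_values))  # [(2, 2), (4, 1)]
print(run_reducer(1, map_outputs, 2, count_values))  # [(1, 2), (3, 1)]
```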
Q3: How does Spark handle HA (high availability)?
In Standalone mode, Worker nodes are already fault tolerant: if one fails, its work can be moved to other Workers. For the Master, HA is usually provided through ZooKeeper:
Utilizing ZooKeeper to provide leader election and some state storage, you can launch multiple Masters in your cluster connected to the same ZooKeeper instance. One will be elected “leader” and the others will remain in standby mode. If the current leader dies, another Master will be elected, recover the old Master’s state, and then resume scheduling. The entire recovery process (from the time the first leader goes down) should take between 1 and 2 minutes. Note that this delay only affects scheduling new applications – applications that were already running during Master failover are unaffected.
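The election described above follows the standard ZooKeeper leader-election recipe: each candidate creates an ephemeral, sequential znode, the candidate holding the lowest sequence number leads, and when the leader's session ends its znode disappears so the next candidate takes over. The Python sketch below simulates that recipe in memory. FakeZooKeeper and the znode path are hypothetical, no real ZooKeeper client is used, and the state recovery that a newly elected Master performs is not modeled.

```python
# In-memory simulation of the ZooKeeper leader-election recipe.
# No real ZooKeeper is involved: FakeZooKeeper (hypothetical) only mimics
# ephemeral sequential znodes, which vanish when the owning session ends.
import itertools


class FakeZooKeeper:
    def __init__(self):
        self._seq = itertools.count()
        self.znodes = {}  # znode name -> owning session (Master name)

    def create_ephemeral_sequential(self, prefix, owner):
        # Zero-padded sequence numbers so names sort in creation order.
        name = f"{prefix}{next(self._seq):010d}"
        self.znodes[name] = owner
        return name

    def close_session(self, owner):
        # Ephemeral znodes are deleted when their session goes away.
        self.znodes = {n: o for n, o in self.znodes.items() if o != owner}

    def leader(self):
        # The candidate holding the lowest sequence number is the leader.
        return self.znodes[min(self.znodes)] if self.znodes else None


zk = FakeZooKeeper()
for name in ["master-a", "master-b", "master-c"]:
    zk.create_ephemeral_sequential("/election/candidate-", name)  # hypothetical path

print(zk.leader())            # master-a (the others stay in standby)
zk.close_session("master-a")  # the leader dies
print(zk.leader())            # master-b takes over
```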
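In YARN and Mesos modes, the cluster manager (the YARN ResourceManager, or the Mesos master) is likewise usually made highly available with ZooKeeper.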