This article explains how Flink submits a job, tracing the relevant source code step by step.
Three components matter during job submission: Dispatcher, JobMaster, and JobManagerRunnerImpl. The following call path leads to MiniDispatcher:

    YarnJobClusterEntrypoint.main()
    -> ClusterEntrypoint.runCluster()
    -> DefaultDispatcherResourceManagerComponentFactory.create()
    -> DefaultDispatcherRunnerFactory.createDispatcherRunner()
    -> DefaultDispatcherRunner.grantLeadership()
    -> JobDispatcherLeaderProcess.onStart()
    -> DefaultDispatcherGatewayServiceFactory.create()
    -> JobDispatcherFactory.createDispatcher()
    -> MiniDispatcher.start()
(1) Dispatcher
Receives job submission requests and hands them to a JobManager for execution.
When the Dispatcher starts, it runs startRecoveredJobs() to start the jobs that need to be recovered. In Flink on YARN mode, MiniDispatcher passes the current job in as one of the jobs to recover, and this is how the job gets submitted and started.
(2) JobManagerRunner
Runs the JobMaster.
(3) JobMaster
Runs the job; it corresponds to the JobManager of older versions.
One job corresponds to one JobMaster.
Inside the JobMaster, the Scheduler and Execution components execute the job. Each operator node in the job DAG is assigned to a TaskExecutor in a TaskManager to run.
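The relationship between the three components can be sketched as a toy model. This is only an illustration under simplifying assumptions: every class name below (ToyJob, ToyJobMaster, ToyJobManagerRunner, ToyDispatcher, ToyMiniDispatcher) is hypothetical and none of it is Flink code. It only shows the idea that a per-job dispatcher hands its single job in as a "recovered" job, and that each job gets its own runner and master.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a job graph; it only carries a name. */
final class ToyJob {
    final String name;
    ToyJob(String name) { this.name = name; }
}

/** Toy JobMaster: one instance per job; records that it was started. */
final class ToyJobMaster {
    final ToyJob job;
    boolean started;
    ToyJobMaster(ToyJob job) { this.job = job; }
    void start() { started = true; }
}

/** Toy JobManagerRunner: its only duty here is to own and start a JobMaster. */
final class ToyJobManagerRunner {
    final ToyJobMaster jobMaster;
    ToyJobManagerRunner(ToyJob job) { this.jobMaster = new ToyJobMaster(job); }
    void start() { jobMaster.start(); }
}

/** Toy Dispatcher: on startup it starts every job in its "recovered" list. */
class ToyDispatcher {
    private final Collection<ToyJob> recoveredJobs;
    final Map<String, ToyJobManagerRunner> runners = new LinkedHashMap<>();

    ToyDispatcher(Collection<ToyJob> recoveredJobs) { this.recoveredJobs = recoveredJobs; }

    void start() { startRecoveredJobs(); }

    private void startRecoveredJobs() {
        for (ToyJob job : recoveredJobs) {
            ToyJobManagerRunner runner = new ToyJobManagerRunner(job);
            runners.put(job.name, runner);
            runner.start();
        }
    }
}

/** Toy MiniDispatcher: the single submitted job is passed in as the recovered job. */
final class ToyMiniDispatcher extends ToyDispatcher {
    ToyMiniDispatcher(ToyJob theOnlyJob) { super(Collections.singletonList(theOnlyJob)); }
}
```

Creating a ToyMiniDispatcher for one ToyJob and calling start() leaves exactly one runner in the map, and that runner's jobMaster is marked as started.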
Execution.deploy() calls TaskExecutor.submitTask() remotely over RPC:
    public void deploy() throws JobException {
        ......
        try {
            ......
            final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();
            final ComponentMainThreadExecutor jobMasterMainThreadExecutor =
                vertex.getExecutionGraph().getJobMasterMainThreadExecutor();

            // Issue the RPC asynchronously on `executor`; submitTask() itself returns a future.
            CompletableFuture.supplyAsync(() -> taskManagerGateway.submitTask(deployment, rpcTimeout), executor)
                // Flatten the nested future into a single one.
                .thenCompose(Function.identity())
                // Handle the result back on the JobMaster main thread.
                .whenCompleteAsync(
                    .....,
                    jobMasterMainThreadExecutor);
        } catch (Throwable t) {
            ......
        }
    }
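The future chaining used in deploy() can be tried in isolation. The sketch below is a stand-alone illustration, not Flink code: DeploySketch, its submitTask() stub, and the thread names "io" and "job-master-main" are all assumptions made for the example. It shows why thenCompose(Function.identity()) is needed (the supplier itself returns a future, so the result is nested) and that whenCompleteAsync runs its callback on the executor it is given.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

final class DeploySketch {

    /** Hypothetical stand-in for the RPC: like submitTask(), it returns a future. */
    static CompletableFuture<String> submitTask(String deployment) {
        return CompletableFuture.completedFuture("ack:" + deployment);
    }

    /** Returns {acknowledgement, name of the thread that ran the completion callback}. */
    static String[] deploy(String deployment) {
        ExecutorService ioExecutor =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "io"));
        ExecutorService mainThread =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "job-master-main"));
        AtomicReference<String> handledOn = new AtomicReference<>();
        try {
            String ack = CompletableFuture
                // The supplier returns a future, so this is a future of a future.
                .supplyAsync(() -> submitTask(deployment), ioExecutor)
                // Flatten it into a single CompletableFuture<String>.
                .thenCompose(Function.identity())
                // Run the callback on the designated "main thread" executor.
                .whenCompleteAsync(
                    (result, failure) -> handledOn.set(Thread.currentThread().getName()),
                    mainThread)
                // The returned stage completes only after the callback has run.
                .join();
            return new String[] {ack, handledOn.get()};
        } finally {
            ioExecutor.shutdown();
            mainThread.shutdown();
        }
    }
}
```

Routing the callback through a single-threaded executor is what lets the result be processed without extra locking on the state owned by that thread.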
TaskExecutor.submitTask() creates an org.apache.flink.runtime.taskmanager.Task to run the operator's task. Task.doRun() runs the operator's processing logic through the AbstractInvokable class that corresponds to the operator node. The AbstractInvokable class for each operator is fixed on the client side when the job is submitted, in StreamGraph.addOperator():
    public <IN, OUT> void addOperator(
            Integer vertexID,
            @Nullable String slotSharingGroup,
            @Nullable String coLocationGroup,
            StreamOperatorFactory<OUT> operatorFactory,
            TypeInformation<IN> inTypeInfo,
            TypeInformation<OUT> outTypeInfo,
            String operatorName) {
        // Sources run in a SourceStreamTask; other operators handled here run in a OneInputStreamTask.
        Class<? extends AbstractInvokable> invokableClass =
            operatorFactory.isStreamSource() ? SourceStreamTask.class : OneInputStreamTask.class;
        addOperator(vertexID, slotSharingGroup, coLocationGroup, operatorFactory,
            inTypeInfo, outTypeInfo, operatorName, invokableClass);
    }
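To make the split between "class chosen at submission time" and "instance created where the task runs" concrete, here is a small sketch. All names in it (ToyInvokable, ToySourceTask, ToyOneInputTask, InvokableSelection) are hypothetical, and creating the instance reflectively on the worker side is an assumption of this toy, not a statement about how Flink does it.

```java
/** Hypothetical base class playing the role of AbstractInvokable. */
abstract class ToyInvokable {
    abstract String invoke();
}

/** Hypothetical counterpart of a source task. */
final class ToySourceTask extends ToyInvokable {
    @Override
    String invoke() { return "source loop"; }
}

/** Hypothetical counterpart of a one-input task. */
final class ToyOneInputTask extends ToyInvokable {
    @Override
    String invoke() { return "one-input loop"; }
}

final class InvokableSelection {

    /** Client side: only the class is chosen and recorded in the graph. */
    static Class<? extends ToyInvokable> choose(boolean isStreamSource) {
        return isStreamSource ? ToySourceTask.class : ToyOneInputTask.class;
    }

    /** Worker side (assumed): instantiate the recorded class and invoke it. */
    static String runOnWorker(Class<? extends ToyInvokable> invokableClass) {
        try {
            ToyInvokable invokable = invokableClass.getDeclaredConstructor().newInstance();
            return invokable.invoke();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not instantiate " + invokableClass, e);
        }
    }
}
```

Because only a class reference travels with the graph, the code that builds the graph never has to create or run a task object itself.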
For a streaming job, StreamTask.invoke() is called. For a source node, the call chain is StreamTask.invoke() -> StreamTask.runMailboxLoop() -> MailboxProcessor.runMailboxLoop() -> SourceStreamTask.processInput():
    protected void processInput(MailboxDefaultAction.Controller controller) throws Exception {

        controller.suspendDefaultAction();

        // Against the usual contract of this method, this implementation is not step-wise but blocking instead for
        // compatibility reasons with the current source interface (source functions run as a loop, not in steps).
        sourceThread.setTaskDescription(getName());
        sourceThread.start();
        sourceThread.getCompletionFuture().whenComplete((Void ignore, Throwable sourceThreadThrowable) -> {
            if (isCanceled() && ExceptionUtils.findThrowable(sourceThreadThrowable, InterruptedException.class).isPresent()) {
                mailboxProcessor.reportThrowable(new CancelTaskException(sourceThreadThrowable));
            } else if (!isFinished && sourceThreadThrowable != null) {
                mailboxProcessor.reportThrowable(sourceThreadThrowable);
            } else {
                mailboxProcessor.allActionsCompleted();
            }
        });
    }
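The interplay between the mailbox loop and the source thread can be modelled in a few lines. The following is a simplified sketch and not the real mailbox implementation: ToySourceTaskLoop and all of its fields are hypothetical, and it treats any completion of the source thread as the end of the task, whereas the real method also reports failures. It shows the default action running once, suspending itself, and the loop then waiting for mail until the source thread signals completion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

final class ToySourceTaskLoop {
    private final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    private boolean defaultActionSuspended;
    private boolean allActionsCompleted;

    final List<Integer> emitted = new ArrayList<>();
    int defaultActionCalls;

    /** Default action: suspends itself, then starts the blocking source on its own thread. */
    private void processInput() {
        defaultActionCalls++;
        defaultActionSuspended = true; // plays the role of controller.suspendDefaultAction()

        CompletableFuture<Void> completion = new CompletableFuture<>();
        Thread sourceThread = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    emitted.add(i); // the "source function" producing records
                }
                completion.complete(null);
            } catch (Throwable t) {
                completion.completeExceptionally(t);
            }
        }, "legacy-source");
        sourceThread.start();

        // When the source thread ends, send a mail that stops the loop.
        completion.whenComplete((ignore, failure) -> mailbox.add(() -> allActionsCompleted = true));
    }

    /** Runs the default action while it is not suspended; otherwise blocks waiting for mail. */
    void runMailboxLoop() {
        try {
            while (!allActionsCompleted) {
                if (defaultActionSuspended) {
                    mailbox.take().run();
                } else {
                    processInput();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("mailbox loop interrupted", e);
        }
    }
}
```

Calling runMailboxLoop() on a fresh instance invokes the default action exactly once and returns after the source thread has emitted its three records.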
The source thread is a LegacySourceFunctionThread instance, a dedicated thread that produces the data. LegacySourceFunctionThread.run() calls StreamSource.run():
    public void run(final Object lockingObject,
            final StreamStatusMaintainer streamStatusMaintainer,
            final Output<StreamRecord<OUT>> collector,
            final OperatorChain<?, ?> operatorChain) throws Exception {

        final TimeCharacteristic timeCharacteristic = getOperatorConfig().getTimeCharacteristic();

        final Configuration configuration =
            this.getContainingTask().getEnvironment().getTaskManagerInfo().getConfiguration();
        final long latencyTrackingInterval = getExecutionConfig().isLatencyTrackingConfigured()
            ? getExecutionConfig().getLatencyTrackingInterval()
            : configuration.getLong(MetricOptions.LATENCY_INTERVAL);

        LatencyMarksEmitter<OUT> latencyEmitter = null;
        if (latencyTrackingInterval > 0) {
            latencyEmitter = new LatencyMarksEmitter<>(
                getProcessingTimeService(),
                collector,
                latencyTrackingInterval,
                this.getOperatorID(),
                getRuntimeContext().getIndexOfThisSubtask());
        }

        final long watermarkInterval = getRuntimeContext().getExecutionConfig().getAutoWatermarkInterval();

        this.ctx = StreamSourceContexts.getSourceContext(
            timeCharacteristic,
            getProcessingTimeService(),
            lockingObject,
            streamStatusMaintainer,
            collector,
            watermarkInterval,
            -1);

        try {
            userFunction.run(ctx);

            // if we get here, then the user function either exited after being done (finite source)
            // or the function was canceled or stopped. For the finite source case, we should emit
            // a final watermark that indicates that we reached the end of event-time, and end inputs
            // of the operator chain
            if (!isCanceledOrStopped()) {
                // in theory, the subclasses of StreamSource may implement the BoundedOneInput interface,
                // so we still need the following call to end the input
                synchronized (lockingObject) {
                    operatorChain.endHeadOperatorInput(1);
                }
            }
        } finally {
            if (latencyEmitter != null) {
                latencyEmitter.close();
            }
        }
    }
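StreamSource.run() calls userFunction.run(ctx). When the data source is Kafka, userFunction is a FlinkKafkaConsumerBase (the base class of FlinkKafkaConsumer).
The headOperator whose run() is eventually executed, and the user function userFunction, are both fixed when the operator is added, for example when adding a Kafka source:

    environment.addSource(new FlinkKafkaConsumer(......));

This ends up in the following addSource() method: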
    public <OUT> DataStreamSource<OUT> addSource(
            SourceFunction<OUT> function, String sourceName, TypeInformation<OUT> typeInfo) {

        TypeInformation<OUT> resolvedTypeInfo = getTypeInfo(function, sourceName, SourceFunction.class, typeInfo);

        boolean isParallel = function instanceof ParallelSourceFunction;

        clean(function);

        // Wrap the user function in a StreamSource operator.
        final StreamSource<OUT, ?> sourceOperator = new StreamSource<>(function);
        return new DataStreamSource<>(this, resolvedTypeInfo, sourceOperator, isParallel, sourceName);
    }
So the headOperator is the StreamSource, and the userFunction inside that StreamSource is the FlinkKafkaConsumer.
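The wrapping relationship can be sketched with toy types. Everything below (ToySourceFunction, ToySourceContext, ToyStreamSource, ToyEnvironment) is hypothetical and greatly simplified compared with the Flink classes of similar names. It only illustrates that addSource() wraps the user function in an operator, and that the operator's run() builds a context and hands it to the user function, which emits records through it under a lock.

```java
import java.util.List;

/** Hypothetical, simplified counterpart of a source function. */
interface ToySourceFunction<T> {
    void run(ToySourceContext<T> ctx);
}

/** Hypothetical, simplified counterpart of a source context. */
interface ToySourceContext<T> {
    void collect(T element);
}

/** Toy head operator: owns the user function and drives it. */
final class ToyStreamSource<T> {
    final ToySourceFunction<T> userFunction;

    ToyStreamSource(ToySourceFunction<T> userFunction) {
        this.userFunction = userFunction;
    }

    /** Builds a context that emits under the given lock, then runs the user function. */
    void run(Object lockingObject, List<T> output) {
        ToySourceContext<T> ctx = element -> {
            synchronized (lockingObject) {
                output.add(element);
            }
        };
        userFunction.run(ctx);
    }
}

/** Toy environment: addSource() wraps the function in a ToyStreamSource. */
final class ToyEnvironment {
    static <T> ToyStreamSource<T> addSource(ToySourceFunction<T> function) {
        return new ToyStreamSource<>(function);
    }
}
```

A lambda such as ctx -> { ctx.collect("a"); ctx.collect("b"); } can stand in for the Kafka consumer here: after addSource() and run(), the output list holds the two records in order, and the operator's userFunction field is the very function that was passed in.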