YARN是Hadoop2.x才有的,所以在介紹YARN之前,我們先看一下MapReduce1.x時所存在的問題:
網站建設哪家好,找創(chuàng)新互聯(lián)!專注于網頁設計、網站建設、微信開發(fā)、小程序設計、集團企業(yè)網站建設等服務項目。為回饋新老客戶創(chuàng)新互聯(lián)還提供了邵武免費建站歡迎大家使用!
MapReduce1.x時的架構如下:
可以看到,1.x時也是Master/Slave這種主從結構,在集群上的表現(xiàn)就是一個JobTracker帶多個TaskTracker。
JobTracker:負責資源管理和作業(yè)調度
TaskTracker:定期向JobTracker匯報本節(jié)點的健康狀況、資源使用情況以及作業(yè)執(zhí)行情況。還可以接收來自JobTracker的命令,例如啟動任務或結束任務等。
那么這種架構存在哪些問題呢:
由于1.x版本不支持其他框架的作業(yè),所以導致我們需要根據不同的框架去搭建多個集群。這樣就會導致資源利用率比較低以及運維成本過高,因為多個集群會導致服務環(huán)境比較復雜。如下圖:
在上圖中我們可以看到,不同的框架我不僅需要搭建不同的集群。而且這些集群很多時候并不是總是在工作,如上圖可以看到,Hadoop集群在忙的時候Spark就比較閑,Spark集群比較忙的時候Hadoop集群就比較閑,而MPI集群則是整體并不是很忙。這樣就無法高效的利用資源,因為這些不同的集群無法互相使用資源。除此之外,我們還得運維這些個不同的集群,而且文件系統(tǒng)是無法共享的。如果當需要將Hadoop集群上的HDFS里存儲的數(shù)據傳輸?shù)絊park集群上進行計算時,還會耗費相當大的網絡IO流量。
所以我們就想著要把這些集群都合并在一起,讓這些不同的框架能夠運行在同一個集群上,這樣就能解決這各種各樣的問題了。如下圖:
正是因為在1.x中,有各種各樣的問題,才使得YARN得以誕生,而YARN就可以令這些不同的框架運行在同一個集群上,并為它們調度資源。我們來看看Hadoop2.x的架構圖:
在上圖中,我們可以看到,集群最底層的是HDFS,在其之上的就是YARN層,而在YARN層上則是各種不同的計算框架。所以不同計算框架可以共享同一個HDFS集群上的數(shù)據,享受整體的資源調度,進而提高集群資源的利用率,這也就是所謂的 xxx on YARN。
YARN概述:
YARN架構圖,也是Master/Slave結構的:
從上圖中,我們可以看到YARN主要由以下幾個核心組件構成:
1. ResourceManager,簡稱RM,整個集群同一時間提供服務的RM只有一個,它負責集群資源的統(tǒng)一管理和調度。以及還需要處理客戶端的請求,例如:提交作業(yè)或結束作業(yè)等。并且監(jiān)控集群中的NM,一旦某個NM掛了,那么就需要將該NM上運行的任務告訴AM來如何進行處理。
2. NodeManager,簡稱NM,整個集群中會有多個NM,它主要負責自己本身節(jié)點的資源管理和使用,以及定時向RM匯報本節(jié)點的資源使用情況。接收并處理來自RM的各種命令,例如:啟動Container。NM還需要處理來自AM的命令,例如:AM會告訴NM需要啟動多少個Container來跑task。
3. ApplicationMaster,簡稱AM,每個應用程序都對應著一個AM。例如:MapReduce會對應一個、Spark會對應一個。它主要負責應用程序的管理,為應用程序向RM申請資源(Core、Memory),將資源分配給內部的task。AM需要與NM通信,以此來啟動或停止task。task是運行在Container里面的,所以AM也是運行在Container里面。
4. Container,封裝了CPU、Memory等資源的一個容器,相當于是一個任務運行環(huán)境的抽象。
5. Client,客戶端,它可以提交作業(yè)、查詢作業(yè)的運行進度以及結束作業(yè)。
YARN官方文檔地址如下:
https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YARN.html
假設客戶端向ResourceManager提交一個作業(yè),ResourceManager則會為這個作業(yè)分配一個Container。所以ResourceManager會與NodeManager進行通信,要求這個NodeManager啟動一個Container。而這個Container是用來啟動ApplicationMaster的,ApplicationMaster啟動完之后會與ResourceManager進行一個注冊。這時候客戶端就可以通過ResourceManager查詢作業(yè)的運行情況了。然后ApplicationMaster還會到ResourceManager上申請作業(yè)所需要的資源,申請到以后就會到對應的NodeManager之上運行客戶端所提交的作業(yè),然后NodeManager就會把task運行在啟動的Container里。
如下圖:
另外找到兩篇關于YARN執(zhí)行流程不錯的文章:
介紹完基本的理論部分之后,我們來搭建一個偽分布式的單節(jié)點YARN環(huán)境,使用的hadoop版本如下:
官方的安裝文檔地址如下:
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
1.下載并解壓好hadoop-2.6.0-cdh6.7.0,這一步可以參考我之前寫的一篇關于HDFS偽分布式環(huán)境搭建的文章,我這里就不再贅述了。
確保HDFS是正常啟動狀態(tài):
[root@localhost ~]# jps
3827 Jps
3383 NameNode
3500 DataNode
3709 SecondaryNameNode
[root@localhost ~]#
2.編輯mapred-site.xml配置文件,在文件中增加如下內容:
[root@localhost ~]# cd /usr/local/hadoop-2.6.0-cdh6.7.0/etc/hadoop
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/etc/hadoop]# cp mapred-site.xml.template mapred-site.xml # 拷貝模板文件
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/etc/hadoop]# vim mapred-site.xml # 增加如下內容
mapreduce.framework.name
yarn
3.編輯yarn-site.xml配置文件,在文件中增加如下內容:
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/etc/hadoop]# vim yarn-site.xml # 增加如下內容
yarn.nodemanager.aux-services
mapreduce_shuffle
4.啟動ResourceManager進程以及NodeManager進程:
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/etc/hadoop]# cd ../../sbin/
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/sbin]# ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.6.0-cdh6.7.0/logs/yarn-root-resourcemanager-localhost.out
localhost: starting nodemanager, logging to /usr/local/hadoop-2.6.0-cdh6.7.0/logs/yarn-root-nodemanager-localhost.out
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/sbin]# jps
3984 NodeManager # 啟動成功后可以看到多出了NodeManager
4947 DataNode
5252 Jps
5126 SecondaryNameNode
3884 ResourceManager # 和ResourceManager進程,這樣才是正常的。
4813 NameNode
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/sbin]# netstat -lntp |grep java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 5126/java
tcp 0 0 127.0.0.1:42602 0.0.0.0:* LISTEN 4947/java
tcp 0 0 192.168.77.130:8020 0.0.0.0:* LISTEN 4813/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 4813/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 4947/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 4947/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 4947/java
tcp6 0 0 :::8040 :::* LISTEN 5566/java
tcp6 0 0 :::8042 :::* LISTEN 5566/java
tcp6 0 0 :::8088 :::* LISTEN 5457/java
tcp6 0 0 :::13562 :::* LISTEN 5566/java
tcp6 0 0 :::8030 :::* LISTEN 5457/java
tcp6 0 0 :::8031 :::* LISTEN 5457/java
tcp6 0 0 :::8032 :::* LISTEN 5457/java
tcp6 0 0 :::48929 :::* LISTEN 5566/java
tcp6 0 0 :::8033 :::* LISTEN 5457/java
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/sbin]#
5.通過瀏覽器來訪問ResourceManager,默認端口是8088,例如192.168.77.130:8088
,就會訪問到這樣的一個頁面上:
錯誤解決:
從上圖中,可以看到有一個不健康的節(jié)點,也就是說我們的單節(jié)點環(huán)境有問題,點擊紅色框框中標記的數(shù)字可以進入到詳細的信息頁面,在該頁面中看到了如下信息:
于是查看yarn的日志文件:yarn-root-nodemanager-localhost.log,發(fā)現(xiàn)如下警告與異常:
很明顯是因為磁盤的使用空間達到了90%,所以我們需要刪除一些沒有的數(shù)據,或者擴容磁盤空間才行。于是刪除了一堆安裝包,讓磁盤空間降低到90%以下了:
[root@localhost /usr/local]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 19G 14G 4.5G 76% /
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 8.7M 3.9G 1% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/sdb 50G 14G 34G 29% /kvm_data
/dev/sda1 497M 127M 371M 26% /boot
tmpfs 781M 0 781M 0% /run/user/0
[root@localhost /usr/local]#
這時再次刷新頁面,可以發(fā)現(xiàn)這個節(jié)點就正常了:
到此為止,我們的yarn環(huán)境就搭建完成了。
如果需要關閉進程則使用以下命令:
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/sbin]# stop-yarn.sh
雖然我們沒有搭建MapReduce的環(huán)境,但是我們可以使用Hadoop自帶的一些測試例子來演示一下如何提交作業(yè)到YARN上執(zhí)行。Hadoop把example的包放在了如下路徑,可以看到有好幾個jar包:
[root@localhost ~]# cd /usr/local/hadoop-2.6.0-cdh6.7.0/share/hadoop/mapreduce/
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/share/hadoop/mapreduce]# ls
hadoop-mapreduce-client-app-2.6.0-cdh6.7.0.jar
hadoop-mapreduce-client-common-2.6.0-cdh6.7.0.jar
hadoop-mapreduce-client-core-2.6.0-cdh6.7.0.jar
hadoop-mapreduce-client-hs-2.6.0-cdh6.7.0.jar
hadoop-mapreduce-client-hs-plugins-2.6.0-cdh6.7.0.jar
hadoop-mapreduce-client-jobclient-2.6.0-cdh6.7.0.jar
hadoop-mapreduce-client-jobclient-2.6.0-cdh6.7.0-tests.jar
hadoop-mapreduce-client-nativetask-2.6.0-cdh6.7.0.jar
hadoop-mapreduce-client-shuffle-2.6.0-cdh6.7.0.jar
hadoop-mapreduce-examples-2.6.0-cdh6.7.0.jar
lib
lib-examples
sources
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/share/hadoop/mapreduce]#
在這里我們使用hadoop-mapreduce-examples-2.6.0-cdh6.7.0.jar
這個jar包來進行演示:
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/share/hadoop/mapreduce]# hadoop jar hadoop-mapreduce-examples-2.6.0-cdh6.7.0.jar pi 2 3
命令說明:
運行以上命令后,到瀏覽器頁面上進行查看,會有以下三個階段:
1.接收資源,這個階段就是ApplicationMaster到ResourceManager上申請作業(yè)所需要的資源:
2.運行作業(yè),這時候NodeManager就會把task運行在啟動的Container里:
3.作業(yè)完成:
終端輸出信息如下:
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/share/hadoop/mapreduce]# hadoop jar hadoop-mapreduce-examples-2.6.0-cdh6.7.0.jar pi 2 3
Number of Maps = 2
Samples per Map = 3
Wrote input for Map #0
Wrote input for Map #1
Starting Job
18/03/27 23:00:01 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/03/27 23:00:01 INFO input.FileInputFormat: Total input paths to process : 2
18/03/27 23:00:01 INFO mapreduce.JobSubmitter: number of splits:2
18/03/27 23:00:02 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1522162696272_0001
18/03/27 23:00:02 INFO impl.YarnClientImpl: Submitted application application_1522162696272_0001
18/03/27 23:00:02 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1522162696272_0001/
18/03/27 23:00:02 INFO mapreduce.Job: Running job: job_1522162696272_0001
18/03/27 23:00:10 INFO mapreduce.Job: Job job_1522162696272_0001 running in uber mode : false
18/03/27 23:00:10 INFO mapreduce.Job: map 0% reduce 0%
18/03/27 23:00:15 INFO mapreduce.Job: map 50% reduce 0%
18/03/27 23:00:16 INFO mapreduce.Job: map 100% reduce 0%
18/03/27 23:00:19 INFO mapreduce.Job: map 100% reduce 100%
18/03/27 23:00:20 INFO mapreduce.Job: Job job_1522162696272_0001 completed successfully
18/03/27 23:00:20 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=50
FILE: Number of bytes written=335298
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=536
HDFS: Number of bytes written=215
HDFS: Number of read operations=11
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=7108
Total time spent by all reduces in occupied slots (ms)=2066
Total time spent by all map tasks (ms)=7108
Total time spent by all reduce tasks (ms)=2066
Total vcore-seconds taken by all map tasks=7108
Total vcore-seconds taken by all reduce tasks=2066
Total megabyte-seconds taken by all map tasks=7278592
Total megabyte-seconds taken by all reduce tasks=2115584
Map-Reduce Framework
Map input records=2
Map output records=4
Map output bytes=36
Map output materialized bytes=56
Input split bytes=300
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=56
Reduce input records=4
Reduce output records=0
Spilled Records=8
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=172
CPU time spent (ms)=2990
Physical memory (bytes) snapshot=803618816
Virtual memory (bytes) snapshot=8354324480
Total committed heap usage (bytes)=760217600
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=236
File Output Format Counters
Bytes Written=97
Job Finished in 19.96 seconds
Estimated value of Pi is 4.00000000000000000000
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/share/hadoop/mapreduce]#
以上這個例子計算了一個PI值,下面我們再來演示一個hadoop中比較經典的例子:wordcount ,這是一個經典的詞頻統(tǒng)計的例子。首先創(chuàng)建好用于測試的文件:
[root@localhost ~]# mkdir /tmp/input
[root@localhost ~]# cd /tmp/input/
[root@localhost /tmp/input]# echo "hello word" > file1.txt
[root@localhost /tmp/input]# echo "hello hadoop" > file2.txt
[root@localhost /tmp/input]# echo "hello mapreduce" >> file2.txt
[root@localhost /tmp/input]# hdfs dfs -mkdir /wc_input
[root@localhost /tmp/input]# hdfs dfs -put ./file* /wc_input
[root@localhost /tmp/input]# hdfs dfs -ls /wc_input
Found 2 items
-rw-r--r-- 1 root supergroup 11 2018-03-27 23:11 /wc_input/file1.txt
-rw-r--r-- 1 root supergroup 29 2018-03-27 23:11 /wc_input/file2.txt
[root@localhost /tmp/input]#
然后執(zhí)行以下命令:
[root@localhost /tmp/input]# cd /usr/local/hadoop-2.6.0-cdh6.7.0/share/hadoop/mapreduce
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/share/hadoop/mapreduce]# hadoop jar ./hadoop-mapreduce-examples-2.6.0-cdh6.7.0.jar wordcount /wc_input /wc_output
在yarn頁面上顯示的階段信息:
終端輸出信息如下:
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/share/hadoop/mapreduce]# hadoop jar ./hadoop-mapreduce-examples-2.6.0-cdh6.7.0.jar wordcount /wc_input /wc_output
18/03/27 23:12:54 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/03/27 23:12:55 INFO input.FileInputFormat: Total input paths to process : 2
18/03/27 23:12:55 INFO mapreduce.JobSubmitter: number of splits:2
18/03/27 23:12:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1522162696272_0002
18/03/27 23:12:56 INFO impl.YarnClientImpl: Submitted application application_1522162696272_0002
18/03/27 23:12:56 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1522162696272_0002/
18/03/27 23:12:56 INFO mapreduce.Job: Running job: job_1522162696272_0002
18/03/27 23:13:02 INFO mapreduce.Job: Job job_1522162696272_0002 running in uber mode : false
18/03/27 23:13:02 INFO mapreduce.Job: map 0% reduce 0%
18/03/27 23:13:06 INFO mapreduce.Job: map 50% reduce 0%
18/03/27 23:13:07 INFO mapreduce.Job: map 100% reduce 0%
18/03/27 23:13:11 INFO mapreduce.Job: map 100% reduce 100%
18/03/27 23:13:12 INFO mapreduce.Job: Job job_1522162696272_0002 completed successfully
18/03/27 23:13:12 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=70
FILE: Number of bytes written=334375
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=260
HDFS: Number of bytes written=36
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=5822
Total time spent by all reduces in occupied slots (ms)=1992
Total time spent by all map tasks (ms)=5822
Total time spent by all reduce tasks (ms)=1992
Total vcore-seconds taken by all map tasks=5822
Total vcore-seconds taken by all reduce tasks=1992
Total megabyte-seconds taken by all map tasks=5961728
Total megabyte-seconds taken by all reduce tasks=2039808
Map-Reduce Framework
Map input records=3
Map output records=6
Map output bytes=64
Map output materialized bytes=76
Input split bytes=220
Combine input records=6
Combine output records=5
Reduce input groups=4
Reduce shuffle bytes=76
Reduce input records=5
Reduce output records=4
Spilled Records=10
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=157
CPU time spent (ms)=2290
Physical memory (bytes) snapshot=800239616
Virtual memory (bytes) snapshot=8352272384
Total committed heap usage (bytes)=762314752
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=40
File Output Format Counters
Bytes Written=36
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/share/hadoop/mapreduce]#
查看輸出的結果文件:
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/share/hadoop/mapreduce]# hdfs dfs -ls /wc_output
Found 2 items
-rw-r--r-- 1 root supergroup 0 2018-03-27 23:13 /wc_output/_SUCCESS
-rw-r--r-- 1 root supergroup 36 2018-03-27 23:13 /wc_output/part-r-00000
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/share/hadoop/mapreduce]# hdfs dfs -cat /wc_output/part-r-00000 # 實際輸出結果在part-r-00000中
hadoop 1
hello 3
mapreduce 1
word 1
[root@localhost /usr/local/hadoop-2.6.0-cdh6.7.0/share/hadoop/mapreduce]#