First, a recap of the role each of the four VMs plays in the cluster:

    IP            Hostname           Role in the Hadoop cluster
    10.0.1.100    hadoop-test-nn     NameNode, ResourceManager
    10.0.1.101    hadoop-test-snn    SecondaryNameNode
    10.0.1.102    hadoop-test-dn1    DataNode, NodeManager
    10.0.1.103    hadoop-test-dn2    DataNode, NodeManager
1. Extract hadoop-2.6.5.tar.gz into /usr/local/ and create the symbolic link /usr/local/hadoop:
    mv hadoop-2.6.5.tar.gz /usr/local/
    cd /usr/local
    tar -xvf hadoop-2.6.5.tar.gz
    ln -s /usr/local/hadoop-2.6.5 /usr/local/hadoop
2. Change the owner and group of /usr/local/hadoop and /usr/local/hadoop-2.6.5 to hadoop so that the hadoop user can use them:
    chown -R hadoop:hadoop /usr/local/hadoop-2.6.5
    chown -R hadoop:hadoop /usr/local/hadoop
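3. For convenience, set the HADOOP_HOME variable and extend PATH by adding the following lines to /etc/profile (the existing $PATH is kept at the end so that system commands remain available):
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH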
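Because /etc/profile is read again by every new login shell, the same directories can end up in PATH more than once. The following is a minimal sketch of one way to avoid that; the helper name path_append is hypothetical (it is not part of Hadoop), and unlike the lines above it appends to the end of PATH rather than the front.

```shell
# Append a directory to PATH only if it is not already listed.
# path_append is a hypothetical helper, not something Hadoop provides.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present: leave PATH untouched
        *) PATH="$PATH:$1" ;;     # otherwise add it at the end
    esac
}

# /usr/local/hadoop is the symlink created in step 1.
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}
path_append "$HADOOP_HOME/bin"
path_append "$HADOOP_HOME/sbin"
export HADOOP_HOME PATH
```

Sourcing the snippet a second time leaves PATH unchanged, which keeps nested shells from accumulating duplicate entries.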
4. Hadoop's configuration files live in $HADOOP_HOME/etc/hadoop/. The environment is set up by editing the properties in the files in that directory:
1) Edit the hadoop-env.sh script and set its JAVA_HOME variable:
    # In hadoop-env.sh, comment out the original line and add the new one
    #export JAVA_HOME=${JAVA_HOME}
    export JAVA_HOME=/usr/local/java/jdk1.7.0_45
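2) Create the masters file, which lists the hosts that act as SecondaryNameNode, and add the SecondaryNameNode hostname to it. (The Hadoop 2.x start scripts locate the SecondaryNameNode through dfs.namenode.secondary.http-address, set in step 5; the masters file is a convention carried over from Hadoop 1.x.)
    # Add the following line to masters
    hadoop-test-snn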
3) Create the slaves file, which lists the hosts that act as DataNodes, and add the DataNode hostnames to it:
    # Add the following lines to slaves
    hadoop-test-dn1
    hadoop-test-dn2
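4) Edit core-site.xml to set the HDFS URL and the base directory for temporary files (the properties go inside the <configuration> element):
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop-test-nn:8020</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/hadoop/dfs/tmp</value>
    </property>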
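When the same settings have to be produced for several clusters, the XML can be generated instead of typed. The sketch below prints a core-site.xml containing the two properties used in this article. The function names emit_property and emit_core_site are hypothetical helpers, and the values are written as-is without XML escaping, which is assumed to be acceptable for plain hostnames and paths.

```shell
# Print one Hadoop <property> element; $1 is the name, $2 the value.
# Values are not XML-escaped (assumed safe for hostnames and paths).
emit_property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Print a complete core-site.xml with the two properties from this article.
emit_core_site() {
    printf '<?xml version="1.0"?>\n<configuration>\n'
    emit_property fs.defaultFS "hdfs://hadoop-test-nn:8020"
    emit_property hadoop.tmp.dir /hadoop/dfs/tmp
    printf '</configuration>\n'
}

emit_core_site
```

To write the file rather than print it, redirect the output, for example emit_core_site > "$HADOOP_HOME/etc/hadoop/core-site.xml".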
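5) Edit hdfs-site.xml to configure the HDFS, NameNode, and DataNode properties:
    <property>
      <name>dfs.http.address</name>
      <value>hadoop-test-nn:50070</value>
    </property>
    <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>hadoop-test-snn:50090</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/hadoop/dfs/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/hadoop/dfs/data</value>
    </property>
    <property>
      <name>dfs.datanode.ipc.address</name>
      <value>0.0.0.0:50020</value>
    </property>
    <property>
      <name>dfs.datanode.http.address</name>
      <value>0.0.0.0:50075</value>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
Property notes:
dfs.http.address: address of the NameNode web UI, port 50070 by default (in Hadoop 2.x this is the deprecated name of dfs.namenode.http-address)
dfs.namenode.secondary.http-address: address of the SecondaryNameNode web UI, port 50090 by default
dfs.namenode.name.dir: directory on the local file system where the NameNode stores its metadata
dfs.datanode.data.dir: directory on the local file system where the DataNode stores its blocks
dfs.datanode.ipc.address: address of the DataNode IPC server, port 50020 by default
dfs.datanode.http.address: address of the DataNode web UI, port 50075 by default
dfs.replication: number of replicas kept for each block of data on HDFS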
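6) Edit mapred-site.xml (in Hadoop 2.6.5, copy it from mapred-site.xml.template if it does not exist yet) so that MapReduce runs on YARN:
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>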
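7) Since YARN is used, its properties need to be configured as well. Make the following changes in yarn-site.xml:
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>hadoop-test-nn</value>
    </property>
    <property>
      <description>The address of the applications manager interface</description>
      <name>yarn.resourcemanager.address</name>
      <value>${yarn.resourcemanager.hostname}:8040</value>
    </property>
    <property>
      <description>The address of the scheduler interface</description>
      <name>yarn.resourcemanager.scheduler.address</name>
      <value>${yarn.resourcemanager.hostname}:8030</value>
    </property>
    <property>
      <description>The http address of the RM web application.</description>
      <name>yarn.resourcemanager.webapp.address</name>
      <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    <property>
      <name>yarn.resourcemanager.resource-tracker.address</name>
      <value>${yarn.resourcemanager.hostname}:8025</value>
    </property>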
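Property notes:
yarn.resourcemanager.hostname: hostname of the node that runs the ResourceManager
yarn.nodemanager.aux-services: auxiliary services to run on each NodeManager; when set to mapreduce_shuffle, the output of map tasks can be shuffled to reduce tasks, which MapReduce programs need in order to run
yarn.resourcemanager.address: address of the applications manager interface, which clients use to submit applications to the ResourceManager; the default port is 8032 (this article changes it to 8040)
yarn.resourcemanager.scheduler.address: address of the scheduling interface provided by the ResourceManager; it is also the address entered in the Map/Reduce Master field when configuring a MapReduce location in Eclipse. The default port is 8030
yarn.resourcemanager.webapp.address: address of the ResourceManager web UI, port 8088 by default
yarn.resourcemanager.resource-tracker.address: address through which NodeManagers report their status to the ResourceManager so that it can track running tasks; the default port is 8031 (this article changes it to 8025)
Other properties are available, such as yarn.resourcemanager.admin.address (the address that accepts administrative commands) and yarn.resourcemanager.resource-tracker.client.thread-count (the number of handler threads that serve resource-tracker RPC requests). Add them to this file if needed.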
8) Copy the modified configuration files to the other nodes:
    scp core-site.xml hdfs-site.xml mapred-site.xml masters slaves yarn-site.xml hadoop-test-snn:/usr/local/hadoop/etc/hadoop/
    scp core-site.xml hdfs-site.xml mapred-site.xml masters slaves yarn-site.xml hadoop-test-dn1:/usr/local/hadoop/etc/hadoop/
    scp core-site.xml hdfs-site.xml mapred-site.xml masters slaves yarn-site.xml hadoop-test-dn2:/usr/local/hadoop/etc/hadoop/
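The three scp commands differ only in the target host, so they can be written as a loop. This is a sketch under the assumption that the host names and file list are exactly those used above; the RUN variable is a hypothetical switch introduced here so that the default is a dry run that only prints the commands.

```shell
# Hosts, files, and destination are the ones used in this article.
HOSTS="hadoop-test-snn hadoop-test-dn1 hadoop-test-dn2"
FILES="core-site.xml hdfs-site.xml mapred-site.xml masters slaves yarn-site.xml"
DEST=/usr/local/hadoop/etc/hadoop/

# Copy the files to every host. With RUN unset or 0 (hypothetical switch),
# the scp commands are only printed; with RUN=1 they are executed.
push_conf() {
    for host in $HOSTS; do
        if [ "${RUN:-0}" = 1 ]; then
            scp $FILES "$host:$DEST"
        else
            echo "scp $FILES $host:$DEST"
        fi
    done
}

push_conf
```

Run it from $HADOOP_HOME/etc/hadoop/ on the NameNode host, first as shown to review the commands and then with RUN=1 to perform the copy.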
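9) Format the NameNode. Before HDFS is used for the first time, the NameNode must be formatted. The directory to prepare is the parent of the paths specified by the hdfs-site.xml properties whose names end in dir; all of these are absolute paths on the local file system. As long as the user has full permissions on that parent directory, the directories named by these properties are created automatically when HDFS starts.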
So first create the /hadoop directory and change its owner and group to hadoop:
    mkdir /hadoop
    chown -R hadoop:hadoop /hadoop
Then format the NameNode as the hadoop user:
    su - hadoop
    $HADOOP_HOME/bin/hdfs namenode -format
Note: Watch the log output of this command. If it reports an error or exception, check the permissions of the configured directories first, since that is a likely cause.
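The permission problem mentioned in the note can be caught before formatting. The following sketch checks that a directory exists and that the current user can write to and enter it; dir_is_usable is a hypothetical helper name, and /hadoop is the parent directory created above.

```shell
# Succeed when directory $1 exists and the current user can write to it
# and enter it. dir_is_usable is a hypothetical helper name.
dir_is_usable() {
    [ -d "$1" ] && [ -w "$1" ] && [ -x "$1" ]
}

# Report on the parent directory used in this article.
if dir_is_usable /hadoop; then
    echo "/hadoop is usable by $(id -un)"
else
    echo "/hadoop is missing or not writable by $(id -un)" >&2
fi
```

Running it as the hadoop user, right before hdfs namenode -format, shows whether the chown step took effect.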
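10) Start the Hadoop cluster services. Once the NameNode has been formatted successfully, the scripts in $HADOOP_HOME/sbin/ can be used to start and stop the services. On the NameNode host, start-yarn.sh/stop-yarn.sh and start-dfs.sh/stop-dfs.sh start and stop YARN and HDFS, start-all.sh/stop-all.sh start and stop the services on all nodes, and hadoop-daemon.sh starts or stops a specific service on a given node. Here start-all.sh is used to start the services on all nodes:
    start-all.sh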
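Note: During startup the output shows each service as it is started, and the logs are saved as *.out files in the log directory ($HADOOP_HOME/logs by default). If a service fails to start, check these logs to troubleshoot.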
11) Check that everything is running. After startup, the jps command lists the running Hadoop processes. Because each node provides different services, the processes differ from node to node:

    NameNode (10.0.1.100):
    [hadoop@hadoop-test-nn ~]$ jps
    4226 NameNode
    4487 ResourceManager
    9796 Jps

    SecondaryNameNode (10.0.1.101):
    [hadoop@hadoop-test-snn ~]$ jps
    4890 Jps
    31518 SecondaryNameNode

    DataNode (10.0.1.102):
    [hadoop@hadoop-test-dn1 ~]$ jps
    31421 DataNode
    2888 Jps
    31532 NodeManager

    DataNode (10.0.1.103):
    [hadoop@hadoop-test-dn2 ~]$ jps
    29786 DataNode
    29896 NodeManager
    1164 Jps
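Comparing jps listings by eye gets tedious on larger clusters. The sketch below reads jps output on standard input and reports any expected daemon that is absent; has_daemons is a hypothetical helper, and the sample input is the NameNode listing shown above. Names are matched exactly, so NameNode is not satisfied by a SecondaryNameNode line.

```shell
# Read `jps` output on stdin and check that every daemon named in the
# arguments appears in it. Prints each missing daemon and returns
# non-zero if any is missing. has_daemons is a hypothetical helper.
has_daemons() {
    listing=$(cat)
    missing=0
    for daemon in "$@"; do
        # Compare the second field exactly against the daemon name.
        if ! printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return $missing
}

# Sample input: the jps listing from the NameNode host in this article.
printf '4226 NameNode\n4487 ResourceManager\n9796 Jps\n' |
    has_daemons NameNode ResourceManager && echo "all expected daemons found"
```

On a live node the same check would be jps | has_daemons DataNode NodeManager, with the daemon names chosen according to the role table at the top.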
At this point the fully distributed Hadoop environment is set up.
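12) Run a test program
The bundled MapReduce example program wordcount can be used to verify that the Hadoop environment works. It is included in hadoop-mapreduce-examples-2.6.5.jar under $HADOOP_HOME/share/hadoop/mapreduce/ and is invoked as follows:
    hadoop jar hadoop-mapreduce-examples-2.6.5.jar wordcount <input file> [<input file>...] <output directory>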
First upload a file to the /test_wordcount directory on HDFS; /etc/profile is used for this test:
    # Create the /test_wordcount directory on HDFS
    [hadoop@hadoop-test-nn mapreduce]$ hdfs dfs -mkdir /test_wordcount
    # Upload /etc/profile to /test_wordcount
    [hadoop@hadoop-test-nn mapreduce]$ hdfs dfs -put /etc/profile /test_wordcount
    [hadoop@hadoop-test-nn mapreduce]$ hdfs dfs -ls /test_wordcount
    Found 1 items
    -rw-r--r--   2 hadoop supergroup       2064 2017-08-06 21:28 /test_wordcount/profile
    # Run the wordcount program
    [hadoop@hadoop-test-nn mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.6.5.jar wordcount /test_wordcount/profile /test_wordcount_out
    17/08/06 21:30:11 INFO client.RMProxy: Connecting to ResourceManager at hadoop-test-nn/10.0.1.100:8040
    17/08/06 21:30:13 INFO input.FileInputFormat: Total input paths to process : 1
    17/08/06 21:30:13 INFO mapreduce.JobSubmitter: number of splits:1
    17/08/06 21:30:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1501950606475_0001
    17/08/06 21:30:14 INFO impl.YarnClientImpl: Submitted application application_1501950606475_0001
    17/08/06 21:30:14 INFO mapreduce.Job: The url to track the job: http://hadoop-test-nn:8088/proxy/application_1501950606475_0001/
    17/08/06 21:30:14 INFO mapreduce.Job: Running job: job_1501950606475_0001
    17/08/06 21:30:29 INFO mapreduce.Job: Job job_1501950606475_0001 running in uber mode : false
    17/08/06 21:30:29 INFO mapreduce.Job:  map 0% reduce 0%
    17/08/06 21:30:39 INFO mapreduce.Job:  map 100% reduce 0%
    17/08/06 21:30:49 INFO mapreduce.Job:  map 100% reduce 100%
    17/08/06 21:30:50 INFO mapreduce.Job: Job job_1501950606475_0001 completed successfully
    17/08/06 21:30:51 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=2320
            FILE: Number of bytes written=219547
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=2178
            HDFS: Number of bytes written=1671
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=7536
            Total time spent by all reduces in occupied slots (ms)=8136
            Total time spent by all map tasks (ms)=7536
            Total time spent by all reduce tasks (ms)=8136
            Total vcore-milliseconds taken by all map tasks=7536
            Total vcore-milliseconds taken by all reduce tasks=8136
            Total megabyte-milliseconds taken by all map tasks=7716864
            Total megabyte-milliseconds taken by all reduce tasks=8331264
        Map-Reduce Framework
            Map input records=84
            Map output records=268
            Map output bytes=2880
            Map output materialized bytes=2320
            Input split bytes=114
            Combine input records=268
            Combine output records=161
            Reduce input groups=161
            Reduce shuffle bytes=2320
            Reduce input records=161
            Reduce output records=161
            Spilled Records=322
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=186
            CPU time spent (ms)=1850
            Physical memory (bytes) snapshot=310579200
            Virtual memory (bytes) snapshot=1682685952
            Total committed heap usage (bytes)=164630528
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters
            Bytes Read=2064
        File Output Format Counters
            Bytes Written=1671
The log shows no errors. The result can be inspected in the /test_wordcount_out directory:
    [hadoop@hadoop-test-nn mapreduce]$ hdfs dfs -ls /test_wordcount_out
    Found 2 items
    -rw-r--r--   2 hadoop supergroup          0 2017-08-06 21:30 /test_wordcount_out/_SUCCESS
    -rw-r--r--   2 hadoop supergroup       1671 2017-08-06 21:30 /test_wordcount_out/part-r-00000
    [hadoop@hadoop-test-nn mapreduce]$ hdfs dfs -cat /test_wordcount_out/part-r-00000
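A quick plausibility check on the result: wordcount emits one record per token in the map phase, so the counts in part-r-00000 should add up to the "Map output records" counter (268 in the run above), spread over as many lines as "Reduce output records" (161). The sketch below sums the count column. sum_counts is a hypothetical helper, the sample data is made up, and the word<TAB>count layout is assumed from wordcount's default text output.

```shell
# Sum the last field (the count) of every line read from stdin.
# sum_counts is a hypothetical helper name.
sum_counts() {
    awk '{ total += $NF } END { print total + 0 }'
}

# Made-up sample in the assumed word<TAB>count layout.
printf 'export\t3\nif\t5\nthen\t2\n' | sum_counts
```

Against the real output the check would be hdfs dfs -cat /test_wordcount_out/part-r-00000 | sum_counts, with the printed total compared to the counter in the job log.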