Study log: HDFS configuration and principles, plus YARN configuration

Filtering algorithm:


Attention weight formula:

W = TF * log(N / DF)

TF: the total number of times the current keyword appears in this record;

N: the total number of records;

DF: the number of records in which the current keyword appears.
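The formula above can be sketched in a few lines of Python. This is only an illustration: the sample records are made up, each record is assumed to be a list of tokens, and the natural logarithm is assumed because the notes do not state the base.

```python
import math

def attention_weight(keyword, record, records):
    """Return W = TF * log(N / DF) for one keyword in one record.

    record  -- list of tokens (assumed representation)
    records -- list of all records, each a list of tokens
    """
    tf = record.count(keyword)                         # occurrences in this record
    n = len(records)                                   # total number of records
    df = sum(1 for r in records if keyword in r)       # records containing the keyword
    if tf == 0 or df == 0:
        return 0.0                                     # keyword absent: no weight
    return tf * math.log(n / df)                       # natural log is an assumption

# Hypothetical sample data, not taken from the original notes.
records = [
    ["hdfs", "namenode", "hdfs"],
    ["yarn", "resourcemanager"],
    ["hdfs", "yarn"],
    ["zookeeper"],
]
w_hdfs = attention_weight("hdfs", records[0], records)
w_yarn = attention_weight("yarn", records[0], records)
print(w_hdfs, w_yarn)
```

A keyword that is frequent in one record but rare across all records gets a high weight; a keyword that appears in every record gets log(1) = 0.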

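HDFS NameNode HA and NameNode Federation

(1) Removing the single point of failure:

Use HDFS HA: an active and a standby NameNode; if the active one fails, the cluster switches to the standby.

(2) Removing the memory limit:

Use HDFS Federation: scale horizontally with multiple NameNodes that are independent of each other and share all DataNodes.

Details follow:

NameNode HA: every metadata change the active NameNode makes is written as an edit-log entry to the JournalNodes, the cluster behind the QJM (Quorum Journal Manager), so the edit log on the JournalNodes matches the active NameNode's metadata. The standby NameNode keeps reading those edits to stay in sync, and when the active NameNode dies it takes over from the state recorded on the JournalNodes and continues serving. When HA is combined with NameNode Federation, each nameservice's shared edits are also kept on the JournalNode cluster. In effect, every NameNode holds a mirror of what the JournalNodes have recorded for it: the active NameNode writes its edits there and the standby reads them back.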


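A client does not need to ask ZooKeeper which NameNode is alive: the failover proxy provider configured on the client (dfs.client.failover.proxy.provider in hdfs-site.xml) tries the configured NameNodes until it reaches the active one. ZooKeeper is used on the server side. Each NameNode has a companion FailoverController, the ZKFC, which monitors that NameNode's health and competes for a lock in ZooKeeper. When the active NameNode dies, its lock is released and the ZKFC that wins the lock makes its own NameNode active. ZooKeeper itself decides by majority vote, so the ZooKeeper ensemble should have an odd number of nodes.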
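Why an odd number is preferred follows from simple majority arithmetic. The sketch below is a plain calculation, not ZooKeeper code; the function names are made up for illustration.

```python
def majority(n):
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1

def tolerated_failures(n):
    """How many nodes may fail while a majority can still be formed."""
    return n - majority(n)

for n in (3, 4, 5):
    print(n, "nodes: majority", majority(n), "tolerates", tolerated_failures(n), "failures")
```

Going from 3 to 4 nodes does not let the ensemble survive any more failures, which is why an even-sized ensemble adds cost without adding fault tolerance.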

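NameNode Federation: one cluster contains several independent NameNodes, which is like having several independent clusters that share the same DataNodes. A client has to choose which NameNode (namespace) to use before it can read or write.

To add HA on top of Federation, give each NameNode its own HA pair; the pairs are independent of each other.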
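One way to picture how a client chooses a NameNode under Federation is a client-side mount table that maps path prefixes to namespaces, similar in spirit to what Hadoop's ViewFs offers. The following is a toy sketch only: the table contents and the names ns1 and ns2 are hypothetical, and this is not the ViewFs implementation.

```python
# Hypothetical mount table: path prefix -> nameservice URI.
MOUNT_TABLE = {
    "/user": "hdfs://ns1",
    "/logs": "hdfs://ns2",
}

def resolve(path, table=MOUNT_TABLE):
    """Return the full URI for path using the longest matching prefix."""
    best = None
    for prefix in table:
        # Match the prefix itself or anything below it, not "/username".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise KeyError("no namespace is mounted for " + path)
    return table[best] + path

print(resolve("/user/alice/a.txt"))
print(resolve("/logs/2015/01.log"))
```

Each namespace behind the table can then be an independent HA pair, as described above.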


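YARN:

YARN is the cluster's resource management system. It manages compute resources such as memory and CPU across the nodes and tracks how much of each is in use. Computing frameworks request resources from YARN before they run, so resources are not wasted, several frameworks can run concurrently, and third-party parallel computing frameworks are supported as well.

On the resource management side:

ResourceManager: responsible for resource management and scheduling across the whole cluster.

ApplicationMaster: responsible for everything specific to one application, such as task scheduling, task monitoring and fault tolerance. Each node runs a NodeManager, and the ApplicationMaster itself runs inside a container started by one of those NodeManagers.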


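A NodeManager is best placed on a machine that also runs a DataNode, so that computation can run close to the data.

Configuring and starting a Hadoop cluster with NameNode HA

hdfs-site.xml and its settings:

Everything here configures HDFS, for example which node plays which role.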



    
    
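<configuration>

    <!-- dfs.name.dir and dfs.data.dir are the older names of dfs.namenode.name.dir and dfs.datanode.data.dir -->
    <property>
        <name>dfs.name.dir</name>
        <value>/root/data/namenode</value>
    </property>

    <property>
        <name>dfs.data.dir</name>
        <value>/root/data/datanode</value>
    </property>

    <property>
        <name>dfs.tmp.dir</name>
        <value>/root/data/tmp</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <!-- nameservices is the unique logical name of this cluster, which ZooKeeper-based failover also uses to identify it; mycluster is just a name and can be changed -->
    <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
    </property>

    <!-- The NameNodes in this cluster and their IDs; the key ends with the cluster name defined above -->
    <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
    </property>

    <!-- RPC address of each NameNode; clients use it for requests such as uploads and downloads -->
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop11:4001</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop22:4001</value>
    </property>

    <!-- Service RPC address of each NameNode: a separate port for DataNodes and other services, so they do not share the client port -->
    <property>
        <name>dfs.namenode.servicerpc-address.mycluster.nn1</name>
        <value>hadoop11:4011</value>
    </property>
    <property>
        <name>dfs.namenode.servicerpc-address.mycluster.nn2</name>
        <value>hadoop22:4011</value>
    </property>

    <!-- HTTP address, for viewing HDFS over the network, e.g. from a browser -->
    <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop11:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop22:50070</value>
    </property>

    <!-- The JournalNode hosts, i.e. the machines in the cluster that run a JournalNode; use an odd number. -->
    <!-- While serving requests, the active NameNode writes its edits to this address and the standby reads them from here, so the JournalNodes always hold an up-to-date copy of the edit log. -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop11:8485;hadoop22:8485;hadoop33:8485/mycluster</value>
    </property>

    <!-- Working directory of the JournalNode on each machine -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/root/data/journaldata/</value>
    </property>

    <!-- Class that clients use to find and connect to the active NameNode -->
    <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- Enable automatic NameNode failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <!-- Fencing method: sshfence logs in to the previously active NameNode over SSH and kills its process -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>

    <!-- Location of the SSH private key used by sshfence -->
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_dsa</value>
    </property>

</configuration>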
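The naming scheme in these keys (nameservice, then NameNode ID) can be checked mechanically. The sketch below uses only the Python standard library to read a cut-down copy of the configuration and list the RPC address of every NameNode; the helper names are made up for illustration and this is not a Hadoop tool.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the hdfs-site.xml values from these notes.
HDFS_SITE = """
<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>hadoop11:4001</value></property>
  <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>hadoop22:4001</value></property>
</configuration>
"""

def load_properties(xml_text):
    """Parse Hadoop-style XML into a {name: value} dictionary."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        props[prop.findtext("name").strip()] = prop.findtext("value").strip()
    return props

def namenode_rpc_addresses(props):
    """Map each nameservice to {namenode id: rpc address}.

    Raises KeyError if a key required by the naming scheme is missing.
    """
    result = {}
    for ns in props["dfs.nameservices"].split(","):
        ns = ns.strip()
        ids = [nn.strip() for nn in props["dfs.ha.namenodes." + ns].split(",")]
        result[ns] = {
            nn: props["dfs.namenode.rpc-address.%s.%s" % (ns, nn)] for nn in ids
        }
    return result

addresses = namenode_rpc_addresses(load_properties(HDFS_SITE))
print(addresses)
```

A KeyError from this script points at a key whose nameservice or NameNode ID does not match the values declared in dfs.nameservices and dfs.ha.namenodes.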

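Configuring core-site.xml

<configuration>

    <!-- The single entry point to HDFS; mycluster is the nameservice ID we configured for the cluster. -->
    <!-- Clients address this logical cluster rather than a single NameNode. -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
    </property>

    <!-- Addresses and ports of the ZooKeeper ensemble used for HA. Use an odd number of nodes, at least three. -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop11:2181,hadoop22:2181,hadoop33:2181</value>
    </property>

</configuration>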

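To convert a non-HA NameNode to HA, run hdfs namenode -initializeSharedEdits on that NameNode's host; this initializes the JournalNodes with the edits from the NameNode's local edits directories.

The .bashrc file in root's home directory holds environment variable settings that apply only to the root user.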

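ZooKeeper configuration:

First configure the dataDir path where ZooKeeper stores its files, so that its state is not lost when it shuts down. Then list the servers:

server.1=hadoop11:2888:3888
server.2=hadoop22:2888:3888
server.3=hadoop33:2888:3888

The number in server.1 is that ZooKeeper server's ID within the ensemble.

The ZooKeeper configuration file also contains dataDir=/root/data/zookeeper, and that directory holds a myid file:

[root@hadoop11 data]# cd zookeeper/
[root@hadoop11 zookeeper]# ls
myid version-2

The myid file states which ID the local ZooKeeper server has in the ensemble.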
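The relationship between the server.N lines and each host's myid file can be sketched as follows. The helper names are hypothetical; the ports 2888 and 3888 and the host names are the ones used in these notes.

```python
def zoo_cfg_servers(hosts, peer_port=2888, election_port=3888):
    """Build the server.N lines for zoo.cfg, numbering hosts from 1."""
    return [
        "server.%d=%s:%d:%d" % (i, host, peer_port, election_port)
        for i, host in enumerate(hosts, start=1)
    ]

def myid_for(host, hosts):
    """Return the content of the myid file for one host: its N in server.N."""
    return str(hosts.index(host) + 1)

hosts = ["hadoop11", "hadoop22", "hadoop33"]
lines = zoo_cfg_servers(hosts)
print("\n".join(lines))
print("myid on hadoop22:", myid_for("hadoop22", hosts))
```

The same server list goes into zoo.cfg on every machine, while the myid file differs per machine.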

Configuration steps in brief:

First start ZooKeeper on every machine (zk... start) and leave it running.

Next, start a JournalNode on each JournalNode host with hadoop-daemon.sh start journalnode. On one of the NameNode machines, run hdfs namenode -format to create the NameNode's metadata files, then start the NameNode on that machine. On the other NameNode machine, run hdfs namenode -bootstrapStandby to make it the standby; the two NameNodes then hold identical metadata.

If you are setting up a fresh HDFS cluster, you should first run the format command (hdfs namenode -format) on one of NameNodes.
If you have already formatted the NameNode, or are converting a non-HA-enabled cluster to be HA-enabled, you should now copy over the contents of your NameNode metadata directories to the other, unformatted NameNode by running the command "hdfs namenode -bootstrapStandby" on the unformatted NameNode. Running this command will also ensure that the JournalNodes (as configured by dfs.namenode.shared.edits.dir) contain sufficient edits transactions to be able to start both NameNodes.
If you are converting a non-HA NameNode to be HA, you should run the command "hdfs namenode -initializeSharedEdits", which will initialize the JournalNodes with the edits data from the local NameNode edits directories.

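Every NameNode has a ZKFC alongside it; it is the failover controller and is the part that talks to ZooKeeper.

Run the following command on one of the NameNodes to register the HA state in ZooKeeper, so that the ZKFC can start normally: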

Initializing HA state in ZooKeeper

After the configuration keys have been added, the next step is to initialize required state in ZooKeeper. You can do so by running the following command from one of the NameNode hosts.

$ hdfs zkfc -formatZK

This will create a znode in ZooKeeper inside of which the automatic failover system stores its data.
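Putting the steps above in order, one possible first-time bootstrap sequence looks like the plan below. It is a sketch under assumptions: it only builds a list of (host, command) pairs and runs nothing, the function name is made up, and the exact order on a real cluster may differ by Hadoop version.

```python
def ha_bootstrap_plan(zk_hosts, jn_hosts, nn_first, nn_second):
    """Return an ordered list of (host, command) pairs for a fresh HA cluster."""
    plan = []
    for host in zk_hosts:
        plan.append((host, "zkServer.sh start"))                   # ZooKeeper first
    for host in jn_hosts:
        plan.append((host, "hadoop-daemon.sh start journalnode"))  # then JournalNodes
    plan.append((nn_first, "hdfs namenode -format"))               # format one NameNode
    plan.append((nn_first, "hadoop-daemon.sh start namenode"))     # and start it
    plan.append((nn_second, "hdfs namenode -bootstrapStandby"))    # copy metadata to the other
    plan.append((nn_first, "hdfs zkfc -formatZK"))                 # create the HA znode
    plan.append((nn_first, "start-dfs.sh"))                        # start the remaining daemons
    return plan

hosts = ["hadoop11", "hadoop22", "hadoop33"]
plan = ha_bootstrap_plan(hosts, hosts, "hadoop11", "hadoop22")
for host, command in plan:
    print("%-10s %s" % (host, command))
```

The important constraints are that the JournalNodes are running before the NameNode is formatted, and that the first NameNode is running before the second one bootstraps from it.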

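Some HDFS usage notes:

hadoop-daemon.sh in the sbin directory starts a single daemon on the local machine: hadoop-daemon.sh start [daemon].

kill -9 [pid] kills the process of a given daemon.

start-dfs.sh starts HDFS for the whole cluster.

Once Hadoop's bin and sbin directories are on the PATH, the hdfs command provides many operations, as follows: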

[root@hadoop11 ~]# hdfs
Usage: hdfs [--config confdir] COMMAND
       where COMMAND is one of:
dfs                  run a filesystem command on the file systems supported in Hadoop.
namenode -format     format the DFS filesystem
secondarynamenode    run the DFS secondary namenode
namenode             run the DFS namenode
journalnode          run the DFS journalnode
zkfc                 run the ZK Failover Controller daemon
datanode             run a DFS datanode
dfsadmin             run a DFS admin client
haadmin              run a DFS HA admin client
fsck                 run a DFS filesystem checking utility
balancer             run a cluster balancing utility
jmxget               get JMX exported values from NameNode or DataNode.
oiv                  apply the offline fsimage viewer to an fsimage
oev                  apply the offline edits viewer to an edits file
fetchdt              fetch a delegation token from the NameNode
getconf              get config values from configuration
groups               get the groups which users belong to
snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
lsSnapshottableDir   list all snapshottable dirs owned by the current user
                                                Use -help to see options
portmap              run a portmap service
nfs3                 run an NFS version 3 gateway
cacheadmin           configure the HDFS cache

YARN configuration

In mapred-site.xml:

<configuration>

    <!-- Which framework MapReduce runs on -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

</configuration>

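In yarn-site.xml:

<configuration>

    <!-- The next few properties are identical on every node, because they name the machine that acts as the ResourceManager. -->
    <!-- ResourceManager address that clients connect to -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop1:9080</value>
    </property>

    <!-- Address ApplicationMasters use to talk to the ResourceManager's scheduler -->
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop1:9081</value>
    </property>

    <!-- Address NodeManagers use to talk to the ResourceManager; for example, with this set on hadoop2, the NodeManager on hadoop2 can find the ResourceManager on hadoop1 -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop1:9082</value>
    </property>

    <!-- Auxiliary services run by the NodeManager -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

</configuration>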

Each machine can start its own NodeManager with yarn-daemon.sh start nodemanager; a NodeManager started this way finds its ResourceManager through the settings in yarn-site.xml. In a cluster, though, NodeManagers normally run on the DataNode machines so that computation runs next to the data. So if the slaves file lists the DataNode machines, running start-yarn.sh on the master makes that host the ResourceManager and starts a NodeManager on every node listed in slaves.

YARN is installed on every node; the nodes form an ordered cluster according to their YARN configuration, with the ResourceManager as the master.

The ResourceManager only starts if YARN is started on the host that the configuration names as the ResourceManager address.

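To run MapReduce on Hadoop:

Package the MapReduce program as a jar and copy it to the Hadoop cluster;
Run: hadoop jar [jar file] [class containing the main function] [input path] [output path]
For example: hadoop jar web.jar org.shizhen.wordcount /test /output
The results can then be viewed under /output.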
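The notes do not show the source of the word count job, so the following is only a conceptual sketch in plain Python of what such a job computes: a map step that emits (word, 1) pairs and a reduce step that sums them per word. It does not use the Hadoop API and the sample lines are made up.

```python
from collections import defaultdict

def map_phase(line):
    """Emit one (word, 1) pair for every word in a line of input."""
    return [(word, 1) for word in line.split()]

def reduce_phase(pairs):
    """Sum the counts of all pairs that share the same word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Hypothetical input, standing in for the files under /test.
lines = ["hdfs yarn hdfs", "yarn zookeeper"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(pairs)
print(counts)
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the framework groups the pairs by word before the reduce step.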

The hadoop and yarn commands that ship with the cluster each provide many subcommands:

They can be used to operate on individual processes and nodes.

[root@hadoop11 ~]# hadoop
Usage: hadoop [--config confdir] COMMAND
       where COMMAND is one of:
fs                   run a generic filesystem user client
version              print the version
jar <jar>            run a jar file
checknative [-a|-h]  check native hadoop and compression libraries availability
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
daemonlog            get/set the log level for each daemon
or
CLASSNAME            run the class named CLASSNAME

Most commands print help when invoked w/o parameters.

The yarn subcommands operate on the MapReduce-related nodes and monitor running programs, for example through application:

[root@hadoop11 ~]# yarn
Usage: yarn [--config confdir] COMMAND
where COMMAND is one of:
resourcemanager      run the ResourceManager
nodemanager          run a nodemanager on each slave
historyserver        run the application history server
rmadmin              admin tools
version              print the version
jar <jar>            run a jar file
application          prints application(s) report/kill application
applicationattempt   prints applicationattempt(s) report
container            prints container(s) report
node                 prints node report(s)
logs                 dump container logs
classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
daemonlog            get/set the log level for each daemon
or
CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.

After configuration is complete:

To start up, first run zkServer.sh start to start ZooKeeper, then run start-all.sh to start Hadoop.

Article title: Study log: HDFS configuration and principles, plus YARN configuration
Source: http://weahome.cn/article/gdidsh.html
