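This article covers which methods you override in Hadoop. Many people are unsure about this in day-to-day work, so I went through various references and put together these simple, practical notes. I hope they clear things up. Let's get started.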
1. Download (omitted)
2. Build (omitted)
3. Configuration (pseudo-distributed and cluster modes; omitted)
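4. HDFS
1. Web interface: http://namenode-name:50070/ (shows the DataNode list and cluster statistics)
2. Shell commands & dfsadmin command
3. Checkpoint node & backup node
1. How the fsimage and edits files are merged
2. (Presumably a feature of earlier versions) Manually recovering a crashed cluster: import checkpoint.
3. Backup node: the Backup Node keeps an in-memory copy of the fsimage, synchronized from the NameNode. It also receives the stream of edits from the NameNode and persists it to disk, then merges those edits with the in-memory fsimage to create a backup of the metadata. This is what makes the Backup Node efficient: it does not need to download fsimage and edits from the NameNode; it only has to persist its in-memory metadata to disk and merge.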
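The merge of an fsimage with its edits can be pictured with a toy model. The sketch below is only an illustration under loud assumptions: the namespace is a plain map from path to a metadata string, and the edit log is a list of hypothetical CREATE/DELETE records. None of these types are Hadoop classes, and the real fsimage and edits formats are far richer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a checkpoint: replay an edit log onto a namespace snapshot.
// All names here are hypothetical; this is not Hadoop code.
class CheckpointSketch {

    // One logged namespace change: op is "CREATE" or "DELETE".
    static final class Edit {
        final String op;
        final String path;

        Edit(String op, String path) {
            this.op = op;
            this.path = path;
        }
    }

    // Returns a new image: the old image with every edit applied in log order.
    // The input image is left untouched.
    static Map<String, String> merge(Map<String, String> fsimage, List<Edit> edits) {
        Map<String, String> merged = new TreeMap<>(fsimage);
        for (Edit e : edits) {
            if (e.op.equals("CREATE")) {
                merged.put(e.path, "file");
            } else if (e.op.equals("DELETE")) {
                merged.remove(e.path);
            } else {
                throw new IllegalArgumentException("unknown op: " + e.op);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> fsimage = new TreeMap<>();
        fsimage.put("/data/a.txt", "file");

        List<Edit> edits = new ArrayList<>();
        edits.add(new Edit("CREATE", "/data/b.txt"));
        edits.add(new Edit("DELETE", "/data/a.txt"));

        // After the merge only /data/b.txt remains in the new image.
        System.out.println(merge(fsimage, edits).keySet());
    }
}
```

In this picture, a Backup Node already holds the map in memory and receives the edits as a stream, so it can apply them without first downloading anything.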
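4. Balancer: evens out uneven data distribution across racks and DataNodes
5. Rack awareness
6. Safemode: when data files are incomplete, or when safe mode is entered manually, HDFS is read-only; once the cluster check reaches the threshold, or safe mode is left manually, the cluster accepts reads and writes again.
7. Fsck: command for checking files and blocks
8. Fetchdt: fetches a delegation token (security)
9. Recovery mode
10. Upgrade and Rollback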
11. File Permissions and Security
12. Scalability
5. MapReduce
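1. WordCount example (overriding map and reduce); each public class goes in its own .java file
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text word = new Text();
    private IntWritable one = new IntWritable(1);

    // Override the map method
    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer stringTokenizer = new StringTokenizer(value.toString());
        while (stringTokenizer.hasMoreTokens()) {
            word.set(stringTokenizer.nextToken());
            // Emit (word, 1)
            context.write(word, one);
        }
    }
}

public class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable(0);

    // Override the reduce method
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
            Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable i : values) {
            sum += i.get();
        }
        result.set(sum);
        // Emit the reduce output value
        context.write(key, result);
    }
}

public class WordCountDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDemo.class);
        // Set the map, reduce, and combiner classes
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setCombinerClass(MyReducer.class);
        // Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Set the input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}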
2. Job.setGroupingComparatorClass(Class)
3. Job.setCombinerClass(Class)
4. CompressionCodec
5. Number of maps: Configuration.setInt(MRJobConfig.NUM_MAPS, int) is only a hint to the framework; the count is normally driven by the input size, roughly dataSize/blockSize.
6. Number of reducers: Job.setNumReduceTasks(int).
The tutorial suggests 0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>). With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing.
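The 0.95 and 1.75 factors come from the MapReduce tutorial's rule of thumb, which multiplies them by (number of nodes * maximum containers per node). A minimal sketch of that arithmetic follows; the class name, the helper, and the cluster numbers in main are made up for illustration and are not part of Hadoop.

```java
// Rule-of-thumb reducer count: factor * (nodes * max containers per node).
// The helper name and the sample cluster size are hypothetical.
class ReducerCountSketch {

    static int suggestedReducers(double factor, int nodes, int maxContainersPerNode) {
        int slots = nodes * maxContainersPerNode;
        // Round to the nearest whole task so tiny floating-point errors do not lose one.
        return (int) Math.round(factor * slots);
    }

    public static void main(String[] args) {
        int nodes = 10;
        int containersPerNode = 8;
        // One wave: every reduce can start at once.
        System.out.println(suggestedReducers(0.95, nodes, containersPerNode));
        // Roughly two waves: faster nodes pick up a second round.
        System.out.println(suggestedReducers(1.75, nodes, containersPerNode));
    }
}
```

The resulting number would then be passed to Job.setNumReduceTasks(int).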
7. Reduce -> shuffle: Input to the Reducer is the sorted output of the mappers. In this phase the framework fetches the relevant partition of the output of all the mappers, via HTTP.
8. Reduce -> sort: The framework groups Reducer inputs by keys (since different mappers may have output the same key) in this stage. The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.
9. Reduce -> reduce: the reduce method is called once for each key, together with the grouped list of values for that key.
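The three reduce-side phases can be imitated in memory to see what each one contributes. This is a toy model only, assuming string keys and integer values held in ordinary Java collections; the real framework fetches partitions over HTTP and merges sorted spill files, none of which is modelled here, and all names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory imitation of shuffle, sort, and reduce for a word count.
class ReducePhasesSketch {

    // Convenience factory for one (key, value) pair emitted by a mapper.
    static Map.Entry<String, Integer> kv(String key, int value) {
        return new SimpleEntry<>(key, value);
    }

    // Shuffle + sort: collect pairs from every mapper and group them by key,
    // with keys kept in sorted order by the TreeMap.
    static SortedMap<String, List<Integer>> shuffleAndSort(
            List<List<Map.Entry<String, Integer>>> mapOutputs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> oneMapper : mapOutputs) {
            for (Map.Entry<String, Integer> pair : oneMapper) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce: one pass per key over its grouped values; here the values are summed.
    static Map<String, Integer> reduceAll(SortedMap<String, List<Integer>> grouped) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int v : group.getValue()) {
                sum += v;
            }
            result.put(group.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapper1 = Arrays.asList(kv("hdfs", 1), kv("hadoop", 1));
        List<Map.Entry<String, Integer>> mapper2 = Arrays.asList(kv("hadoop", 1));
        System.out.println(reduceAll(shuffleAndSort(Arrays.asList(mapper1, mapper2))));
    }
}
```

Two mappers emitting the same key end up in a single group, which is why the reducer sees one call per key rather than one per mapper.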
10. Secondary sort
11. Partitioner
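A Partitioner decides which reduce task receives each key; in a real job you would extend Partitioner and override getPartition. The sketch below reproduces only the arithmetic used by Hadoop's default hash partitioning, on plain Java objects so it runs without Hadoop on the classpath. The class and method names are hypothetical.

```java
// Hash partitioning arithmetic: clear the sign bit of the hash code,
// then take the remainder by the number of reduce tasks.
class HashPartitionSketch {

    static int partitionFor(Object key, int numReduceTasks) {
        // The bit mask keeps the result non-negative even for negative hash codes.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String word : new String[] {"hadoop", "hdfs", "mapreduce"}) {
            System.out.println(word + " -> reducer " + partitionFor(word, reducers));
        }
    }
}
```

Because the result depends only on the key's hash code, every occurrence of a key lands on the same reducer, whichever mapper emitted it.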
12. Counter: Mapper and Reducer implementations can use the Counter to report statistics.
13. Job conf: configuration -> speculative execution (setMapSpeculativeExecution(boolean)/setReduceSpeculativeExecution(boolean)), maximum number of attempts per task (setMaxMapAttempts(int)/setMaxReduceAttempts(int)), etc.
Alternatively, set and read arbitrary parameters with
Configuration.set(String, String)/Configuration.get(String)
14. Task executor & environment -> The user can specify additional options to the child JVM via the mapreduce.{map|reduce}.java.opts configuration parameters of the Job, such as non-standard paths for the run-time linker to search shared libraries via -Djava.library.path=<> etc. If a mapreduce.{map|reduce}.java.opts parameter contains the symbol @taskid@, it is interpolated with the value of the taskid of the MapReduce task.
15. Memory management -> Users/admins can also specify the maximum virtual memory of the launched child task, and any sub-process it launches recursively, using mapreduce.{map|reduce}.memory.mb. Note that the value set here is a per-process limit. The value for mapreduce.{map|reduce}.memory.mb should be specified in megabytes (MB). The value must also be greater than or equal to the -Xmx passed to the Java VM, else the VM might not start.
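The constraint between -Xmx and mapreduce.{map|reduce}.memory.mb is easy to get wrong, so a pre-flight check can help. What follows is a hypothetical helper, not a Hadoop API: it assumes the heap size appears in the options string as -Xmx followed by a number and an optional k/m/g suffix, and simply compares it with the container limit in megabytes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-flight check: does the -Xmx in java.opts fit inside memory.mb?
class HeapLimitSketch {

    private static final Pattern XMX = Pattern.compile("-Xmx(\\d+)([kKmMgG]?)");

    // Heap size in bytes from an option string such as "-Xmx1024m -verbose:gc".
    static long heapBytes(String javaOpts) {
        Matcher m = XMX.matcher(javaOpts);
        if (!m.find()) {
            throw new IllegalArgumentException("no -Xmx option in: " + javaOpts);
        }
        long value = Long.parseLong(m.group(1));
        switch (m.group(2).toLowerCase()) {
            case "k": return value << 10;
            case "m": return value << 20;
            case "g": return value << 30;
            default:  return value; // no suffix means plain bytes
        }
    }

    // True when the heap is no larger than the per-process limit given in MB.
    static boolean heapFits(String javaOpts, int memoryMb) {
        return heapBytes(javaOpts) <= ((long) memoryMb << 20);
    }

    public static void main(String[] args) {
        System.out.println(heapFits("-Xmx1024m", 1536));
        System.out.println(heapFits("-Xmx2g", 1536));
    }
}
```

In practice the limit is usually set somewhat above -Xmx, since the JVM needs memory beyond the heap; how much headroom to leave depends on the workload.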
16. Map Parameters ...... (http://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#MapReduce_Tutorial)
17. Parameters ()
18. Job submission and monitoring:
1. Job provides facilities to submit jobs, track their progress, access component-tasks' reports and logs, get the MapReduce cluster's status information and so on.
2. The job submission process involves:
1. Checking the input and output specifications of the job.
2. Computing the InputSplit values for the job.
3. Setting up the requisite accounting information for the DistributedCache of the job, if necessary.
4. Copying the job's jar and configuration to the MapReduce system directory on the FileSystem.
5. Submitting the job to the ResourceManager and optionally monitoring its status.
3. Job history
19. Job controller
1. Job.submit() || Job.waitForCompletion(boolean)
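2. Multiple MapReduce jobs
1. Iterative MapReduce (the output of one MR job is the input of the next; drawbacks: the overhead of creating job objects, plus heavy local disk I/O and network cost)
2. MapReduce-JobControl: job objects encapsulate the dependencies between jobs, and a JobControl thread manages the state of each job.
3. MapReduce-ChainMapper/ChainReducer: (ChainMapper.addMapper() links multiple mapper tasks within a single job; it cannot be used for jobs with multiple reduces).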
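The dependency handling behind a JobControl-style workflow boils down to running a job only once everything it depends on has finished. The sketch below models just that rule with job names in a map; it is an assumption-laden toy, not the JobControl API, and it runs jobs one at a time whereas a real controller can run independent jobs concurrently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy dependency scheduler: a job becomes runnable when all of its
// prerequisites are done. Job names and structure are hypothetical.
class JobDependencySketch {

    // dependsOn maps each job name to the names of the jobs it needs first.
    static List<String> runOrder(Map<String, List<String>> dependsOn) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        while (done.size() < dependsOn.size()) {
            boolean progressed = false;
            for (Map.Entry<String, List<String>> job : dependsOn.entrySet()) {
                boolean ready = !done.contains(job.getKey())
                        && done.containsAll(job.getValue());
                if (ready) {
                    order.add(job.getKey()); // stand-in for actually running the job
                    done.add(job.getKey());
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("cyclic or missing dependency");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> jobs = new LinkedHashMap<>();
        jobs.put("join", Arrays.asList("clean", "count"));
        jobs.put("clean", Collections.<String>emptyList());
        jobs.put("count", Arrays.asList("clean"));
        System.out.println(runOrder(jobs));
    }
}
```

Compared with chaining jobs by hand, declaring dependencies this way keeps the ordering logic out of the driver code.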
20. Job input & output
1. InputFormat TextInputFormat FileInputFormat
2. InputSplit FileSplit
3. RecordReader
4. OutputFormat OutputCommitter
That wraps up these notes on which Hadoop methods to override; I hope they clear up the confusion. Theory sticks best when paired with practice, so go and try it out.