This article walks through a worked example of using MultipleOutputs in Hadoop MapReduce: a mapper, a reducer that writes to several named outputs, the job driver, and two errors encountered while running the job.
Original data:
Expected result after processing:
MyMapper.java
package com.xr.text;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Splits each "key;value" input line and emits the two fields as (key, value).
public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] split = value.toString().split(";");
        context.write(new Text(split[0]), new Text(split[1]));
    }
}
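The mapper assumes that every input line contains a ';'. A line without one would make split[1] throw ArrayIndexOutOfBoundsException and fail the task. The following is a minimal plain-Java sketch of a more defensive parsing step, with no Hadoop types so it can be tried on its own. The class name LineSplitSketch and the sample line are hypothetical, and unlike the mapper above it keeps any further ';' characters inside the value.

```java
import java.util.Optional;

/** Plain-Java sketch of the parsing step in MyMapper.map (hypothetical helper). */
final class LineSplitSketch {

    /**
     * Splits "key;value" into a two-element array.
     * Returns empty when the line has no ';' or an empty key, so the caller
     * can skip the record instead of failing on split[1].
     */
    static Optional<String[]> parse(String line) {
        // A limit of 2 keeps any further ';' characters inside the value.
        String[] parts = line.split(";", 2);
        if (parts.length < 2 || parts[0].isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(parts);
    }

    public static void main(String[] args) {
        // "country;city" is a made-up sample line, not the article's data.
        parse("country;city").ifPresent(p -> System.out.println(p[0] + " -> " + p[1]));
        System.out.println(parse("no-delimiter").isPresent());
    }
}
```

Inside map(), the same check would wrap the context.write call, so malformed lines are skipped rather than emitted.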
MyReducer.java
package com.xr.text;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class MyReducer extends Reducer<Text, Text, Text, Text> {

    private MultipleOutputs<Text, Text> mos;

    /** Create the MultipleOutputs instance before any reduce() call. */
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        mos = new MultipleOutputs<Text, Text>(context);
    }

    /** Route each value to the named output that matches its key; other keys are dropped. */
    @Override
    protected void reduce(Text k1, Iterable<Text> value, Context context)
            throws IOException, InterruptedException {
        String key = k1.toString();
        for (Text t : value) {
            if ("中國".equals(key)) {
                mos.write("china", new Text("中國"), t);
            } else if ("美國".equals(key)) {
                mos.write("usa", new Text("美國"), t);
            } else if ("中國人".equals(key)) {
                mos.write("cpeople", new Text("中國人"), t);
            }
        }
    }

    /** Close MultipleOutputs so that all named output files are flushed. */
    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}
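The if/else chain in reduce() is in effect a lookup from record key to named output. As a sketch only, the same routing can be expressed as a table, which is easier to extend when more outputs are added. The class NamedOutputRouter is hypothetical and uses no Hadoop types. The three keys are written as Unicode escapes so the snippet compiles regardless of source-file encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup table for the key-to-named-output routing in MyReducer. */
final class NamedOutputRouter {

    private static final Map<String, String> ROUTES = new LinkedHashMap<>();

    static {
        ROUTES.put("\u4e2d\u570b", "china");          // first key in the reducer
        ROUTES.put("\u7f8e\u570b", "usa");            // second key in the reducer
        ROUTES.put("\u4e2d\u570b\u4eba", "cpeople");  // third key in the reducer
    }

    /** Returns the named output for a key, or empty when the key has no route. */
    static Optional<String> namedOutputFor(String key) {
        return Optional.ofNullable(ROUTES.get(key));
    }

    public static void main(String[] args) {
        System.out.println(namedOutputFor("\u7f8e\u570b").orElse("(dropped)"));
        System.out.println(namedOutputFor("unknown").orElse("(dropped)"));
    }
}
```

In the reducer, the result of namedOutputFor(key) would be passed as the first argument to mos.write. Keys without a route are silently dropped, which matches the behavior of the original chain.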
JobTest.java
package com.xr.text;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class JobTest {

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        String inputPath = "hdfs://192.168.75.100:9000/1.txt";
        String outputPath = "hdfs://192.168.75.100:9000/ceshi";

        Job job = new Job();
        job.setJarByClass(JobTest.class);
        job.setMapperClass(MyMapper.class);

        // Register the named outputs; each name becomes the prefix of an output file.
        MultipleOutputs.addNamedOutput(job, "china", TextOutputFormat.class, Text.class, Text.class);
        MultipleOutputs.addNamedOutput(job, "usa", TextOutputFormat.class, Text.class, Text.class);
        MultipleOutputs.addNamedOutput(job, "cpeople", TextOutputFormat.class, Text.class, Text.class);

        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job, new Path(inputPath));

        // Uncomment to delete an existing output directory before the run.
        // Configuration conf = new Configuration();
        // FileSystem fs = FileSystem.get(conf);
        //
        // if (fs.exists(new Path(outputPath))) {
        //     fs.delete(new Path(outputPath), true);
        // }

        FileOutputFormat.setOutputPath(job, new Path(outputPath));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
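The names passed to MultipleOutputs.addNamedOutput are restricted: according to the MultipleOutputs Javadoc, a named output must consist of letters and digits only and cannot be the word "part", which is reserved for the default output. The following plain-Java sketch mirrors that rule so a name can be checked before the job is configured. NamedOutputNameCheck is a hypothetical helper, not Hadoop's own validation code.

```java
/** Hypothetical pre-check that mirrors the documented naming rule for named outputs. */
final class NamedOutputNameCheck {

    /** Returns true when the name is non-empty, ASCII alphanumeric, and not "part". */
    static boolean isValid(String name) {
        if (name == null || name.isEmpty() || name.equals("part")) {
            return false;
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            boolean digit = c >= '0' && c <= '9';
            if (!letter && !digit) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"china", "usa", "cpeople", "c_people", "part"}) {
            System.out.println(name + " valid: " + isValid(name));
        }
    }
}
```

This is why the example uses "cpeople" rather than a name with an underscore or hyphen.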
Running the job fails with the following error:
14/08/12 12:44:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/12 12:44:02 ERROR security.UserGroupInformation: PriviledgedActionException as:Xr cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Xr\mapred\staging\Xr-1514460710\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Xr\mapred\staging\Xr-1514460710\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at com.xr.text.JobTest.main(JobTest.java:37)
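Fix for this error:
1. Delete FileUtil.class from hadoop-core-1.1.2.jar.
2. Copy /org/apache/hadoop/fs/FileUtil.java out of the Hadoop source into the project, keeping the same package, so that it takes the place of the deleted class.
3. Comment out the body of the checkReturnValue() method, so that a failed permission change no longer throws.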
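To make step 3 concrete, here is a simplified stand-in for the method before and after the change. It is a sketch under assumptions: the class FileUtilPatchSketch and both method names are hypothetical, and the permission is passed as a plain short, whereas the method in Hadoop's FileUtil takes Hadoop's own permission type and may differ in detail. Only the error message format is taken from the log above.

```java
import java.io.File;
import java.io.IOException;

/** Simplified stand-in for the check in FileUtil; not the Hadoop source. */
final class FileUtilPatchSketch {

    /** Unpatched shape: a failed permission change (rv == false) aborts with an IOException. */
    static void checkReturnValueStrict(boolean rv, File p, short permission) throws IOException {
        if (!rv) {
            throw new IOException("Failed to set permissions of path: " + p
                    + " to " + String.format("%04o", permission));
        }
    }

    /** Patched shape: the body is commented out, so the failure is ignored. */
    static void checkReturnValuePatched(boolean rv, File p, short permission) {
        // if (!rv) {
        //     throw new IOException(...);
        // }
    }

    public static void main(String[] args) {
        File staging = new File(".staging");  // placeholder path for illustration
        try {
            checkReturnValueStrict(false, staging, (short) 0700);
        } catch (IOException e) {
            System.out.println("strict: " + e.getMessage());
        }
        checkReturnValuePatched(false, staging, (short) 0700);
        System.out.println("patched: no exception");
    }
}
```

The trade-off is that genuine permission failures on the local file system are also hidden, so this workaround suits a development machine rather than a cluster node.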
Running the job again produces another error:
java.lang.OutOfMemoryError: Java heap space
Solution:
OK, the job now runs successfully.
The following files are generated:
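Files written through a named output from a reducer follow Hadoop's naming convention of the named output, then "-r-", then the reducer's partition number padded to five digits. As a sketch of that convention for the three outputs registered in JobTest, assuming a single reducer (partition 0); OutputFileNameSketch is a hypothetical helper, not a Hadoop class:

```java
/** Hypothetical helper that reproduces the file naming convention for named outputs. */
final class OutputFileNameSketch {

    /** Builds the file name for a named output written by the given reducer partition. */
    static String reducerFileName(String namedOutput, int partition) {
        return String.format("%s-r-%05d", namedOutput, partition);
    }

    public static void main(String[] args) {
        // Named outputs registered in JobTest; partition 0 assumes one reducer.
        for (String name : new String[] {"china", "usa", "cpeople"}) {
            System.out.println(reducerFileName(name, 0));
        }
    }
}
```

Because MyReducer never calls context.write, the default part-r-00000 file, if present, would be expected to be empty.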