How to use the map function in Hadoop

This article explains how to use the map function in Hadoop. Many people run into questions about it in day-to-day work, so the example below walks through a simple, practical approach.

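When joining a large table with small tables, you can use Hadoop's DistributedCache to ship the small tables along with the job. Hadoop copies the cached files to every node that runs a map task, where each mapper loads them into memory and uses them to clean and join the data.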

For example, take this user login data, login:

1,0,20121213
2,0,20121213
3,1,20121213
4,1,20121213
1,0,20121114

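The first column is the user id, the second is the sex code, and the third is the login date (yyyyMMdd).

The goal is to replace the user id with the user's name and the sex code with its label (男 = male, 女 = female), then count how many times each user logged in.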

The users table is:

1,張三,hubei
3,王五,tianjin
4,趙六,guangzhou
2,李四,beijing

The sex table is:

0,男
1,女
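
Because both tables are tiny, each one fits comfortably in a hash map keyed by id. The following is a minimal plain-Java sketch of that idea, with no Hadoop dependency; the class name DimTableSketch and the method load are hypothetical and used only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original job: builds an in-memory
// lookup from the lines of a small "id,value[,...]" table.
class DimTableSketch {
    static Map<String, String> load(List<String> lines) {
        Map<String, String> table = new HashMap<>();
        for (String line : lines) {
            // A limit of -1 keeps trailing empty fields instead of dropping them
            String[] fields = line.split(",", -1);
            if (fields.length >= 2) {   // skip blank or malformed lines
                table.put(fields[0], fields[1]);
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> users = load(Arrays.asList(
                "1,張三,hubei", "3,王五,tianjin", "4,趙六,guangzhou", "2,李四,beijing"));
        Map<String, String> sexes = load(Arrays.asList("0,男", "1,女"));
        // Look up user 3 and sex code 1
        System.out.println(users.get("3") + " " + sexes.get("1"));
    }
}
```

Only the first two columns are kept, so the city column of the users table is ignored.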

The map function joins each record against the two dimension tables and outputs name,sex as the key and 1 (one login) as the value.

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Mapclass extends Mapper<LongWritable, Text, Text, Text> {
    private final Map<String, String> userMap = new HashMap<>();
    private final Map<String, String> sexMap = new HashMap<>();
    private final Text oKey = new Text();
    private final Text oValue = new Text();

    @Override
    protected void setup(Context context) throws IOException {
        // Get the local copies of the files cached for the current job
        Path[] paths = DistributedCache.getLocalCacheFiles(context.getConfiguration());
        if (paths == null) {
            return;
        }
        for (Path path : paths) {
            // Match on the file name only, so parent directories cannot interfere
            String name = path.getName();
            if (name.contains("users")) {
                loadTable(path, userMap);
            } else if (name.contains("sex")) {
                loadTable(path, sexMap);
            }
        }
    }

    // Reads an "id,value[,...]" file into the map: column 0 -> column 1
    private static void loadTable(Path path, Map<String, String> table) throws IOException {
        // Read as UTF-8 explicitly; try-with-resources closes every reader
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(path.toString()), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(",", -1);
                if (fields.length >= 2) {
                    table.put(fields[0], fields[1]);
                }
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] kv = value.toString().split(",");
        // Map join: drop records with no match in either dimension table
        if (kv.length >= 2 && userMap.containsKey(kv[0]) && sexMap.containsKey(kv[1])) {
            oKey.set(userMap.get(kv[0]) + "," + sexMap.get(kv[1]));
            oValue.set("1");
            context.write(oKey, oValue);
        }
    }
}
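
The per-record join in map can be tried without a cluster. The sketch below is a plain-Java approximation of that step, under the assumption that the two lookup tables are already in memory; MapJoinSketch and joinKey are hypothetical names, not part of the job.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the body of map(): one login line in,
// one "name,sex" key out, or null when the record should be dropped.
class MapJoinSketch {
    static String joinKey(String loginLine,
                          Map<String, String> users,
                          Map<String, String> sexes) {
        String[] kv = loginLine.split(",");
        if (kv.length < 2) {
            return null;                     // malformed record
        }
        String name = users.get(kv[0]);
        String sex = sexes.get(kv[1]);
        // Inner-join semantics: both lookups must succeed
        return (name == null || sex == null) ? null : name + "," + sex;
    }

    public static void main(String[] args) {
        Map<String, String> users = new HashMap<>();
        users.put("1", "張三");
        Map<String, String> sexes = new HashMap<>();
        sexes.put("0", "男");
        System.out.println(joinKey("1,0,20121213", users, sexes)); // matched record
        System.out.println(joinKey("9,0,20121213", users, sexes)); // unknown user id
    }
}
```

Returning null here plays the role of simply not calling context.write in the real mapper, so unmatched records never reach the reducer.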

The reduce function:

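import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, Text, Text, Text> {
    private final Text oValue = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Sum the "1" values the mapper emitted for this name,sex key
        int sumCount = 0;
        for (Text val : values) {
            sumCount += Integer.parseInt(val.toString());
        }
        oValue.set(String.valueOf(sumCount));
        context.write(key, oValue);
    }
}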
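
To see what the shuffle and reduce phases do with the mapper output, the grouping and summing can be approximated in plain Java. This is only a sketch: ReduceSketch and countByKey are hypothetical names, and a sorted map stands in for Hadoop's sort-by-key between map and reduce.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local stand-in for shuffle + reduce: every key in the
// input list represents one (key, "1") pair emitted by the mapper.
class ReduceSketch {
    static Map<String, Integer> countByKey(List<String> mapperKeys) {
        Map<String, Integer> counts = new TreeMap<>();   // keeps keys sorted
        for (String key : mapperKeys) {
            counts.merge(key, 1, Integer::sum);          // add 1 per occurrence
        }
        return counts;
    }

    public static void main(String[] args) {
        // Keys the mapper would emit for the five sample login records
        List<String> emitted = Arrays.asList(
                "張三,男", "李四,男", "王五,女", "趙六,女", "張三,男");
        for (Map.Entry<String, Integer> e : countByKey(emitted).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```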

The main function:

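import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MultiTableJoin extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // ToolRunner has already stripped the generic Hadoop options from args
        Job job = new Job(getConf(), "MultiTableJoin");
        job.setJarByClass(MultiTableJoin.class);
        job.setMapperClass(Mapclass.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // The first and second arguments are the small tables to cache.
        // (DistributedCache is deprecated in Hadoop 2.x in favor of Job#addCacheFile.)
        DistributedCache.addCacheFile(new Path(args[0]).toUri(), job.getConfiguration());
        DistributedCache.addCacheFile(new Path(args[1]).toUri(), job.getConfiguration());
        // The third argument is the large input table, the fourth the output directory
        FileInputFormat.addInputPath(job, new Path(args[2]));
        FileOutputFormat.setOutputPath(job, new Path(args[3]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] arg0) throws Exception {
        // Paths are hard-coded for this example; command-line arguments are ignored
        String[] args = new String[4];
        args[0] = "hdfs://172.16.0.87:9000/user/jeff/decli/sex";
        args[1] = "hdfs://172.16.0.87:9000/user/jeff/decli/users";
        args[2] = "hdfs://172.16.0.87:9000/user/jeff/decli/login";
        args[3] = "hdfs://172.16.0.87:9000/user/jeff/decli/out";
        int res = ToolRunner.run(new Configuration(), new MultiTableJoin(), args);
        System.exit(res);
    }
}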

The job output is:

張三,男   2
李四,男   1
王五,女   1
趙六,女   1

Source URL: http://weahome.cn/article/jhggsj.html
