How to develop UDFs and UDAFs

This article explains how to develop Hive user-defined functions (UDFs) and user-defined aggregate functions (UDAFs).

UDF: user-defined functions

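Hive supports three kinds of user-defined functions: UDF, UDAF, and UDTF.

UDF (User-Defined Function): one row in, one row out.

UDAF (User-Defined Aggregation Function): an aggregate function, many rows in, one row out, e.g. count/max/min.

UDTF (User-Defined Table-Generating Function): one row in, many rows out, e.g. lateral view explode().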
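To make the three shapes concrete, here is a Hive-free Java sketch that uses ordinary methods as analogies. The class and method names are hypothetical and are not Hive APIs. The sketch shows a one-in/one-out mapping, a many-in/one-out reduction, and a one-in/many-out expansion similar to what explode() does to an array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical analogies for the three function shapes; these are not Hive APIs.
class FunctionKinds {

    // UDF-like: one input value -> one output value (e.g. upper())
    static String upper(String s) {
        return s == null ? null : s.toUpperCase();
    }

    // UDAF-like: many input rows -> one output value (e.g. max())
    static Integer max(List<Integer> column) {
        Integer best = null;
        for (Integer v : column) {
            if (v != null && (best == null || v > best)) {
                best = v; // NULLs are skipped, as SQL aggregates do
            }
        }
        return best;
    }

    // UDTF-like: one input row -> many output rows (e.g. explode() on an array)
    static List<String> explode(String[] array) {
        List<String> rows = new ArrayList<>();
        if (array != null) {
            for (String element : array) {
                rows.add(element);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(upper("hive"));                          // HIVE
        System.out.println(max(Arrays.asList(3, null, 9, 4)));      // 9
        System.out.println(explode(new String[] {"a", "b"}));       // [a, b]
    }
}
```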

Usage: in a Hive session, add the jar file that contains the custom function, create a function from it, and then call that function.

UDF development

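1. A UDF can be used directly in a SELECT statement to format query results before they are output.

2. When writing a UDF, note the following:

a) A custom UDF must extend org.apache.hadoop.hive.ql.exec.UDF.

b) It must implement an evaluate method, and evaluate can be overloaded.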
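The article registers a class named hive.udf.Add as add_example but does not show its source. The following is a hypothetical sketch of what it might contain, written as plain Java so it compiles without Hive on the classpath. A real UDF would also declare `extends org.apache.hadoop.hive.ql.exec.UDF`. Hive chooses among the evaluate() overloads based on the argument types in the query. The four-argument overload assumes that Hive widens the INT literals to DOUBLE when it resolves the call.

```java
// Hypothetical stand-in for hive.udf.Add; the real class is not shown in the article.
// A real Hive UDF would also declare "extends org.apache.hadoop.hive.ql.exec.UDF";
// that is omitted here so the sketch compiles without Hive on the classpath.
class Add {

    // Matches add_example(8, 9): two INT arguments, NULL if either one is NULL
    public Integer evaluate(Integer a, Integer b) {
        if (a == null || b == null) {
            return null;
        }
        return a + b;
    }

    // Matches add_example(6, 7, 8, 6.8), assuming Hive widens the INT literals to DOUBLE
    public Double evaluate(Double a, Double b, Double c, Double d) {
        if (a == null || b == null || c == null || d == null) {
            return null;
        }
        return a + b + c + d;
    }

    public static void main(String[] args) {
        Add add = new Add();
        System.out.println(add.evaluate(8, 9));                 // 17
        System.out.println(add.evaluate(6.0, 7.0, 8.0, 6.8));
    }
}
```

The boxed types (Integer, Double) let the method receive SQL NULLs as Java null instead of failing on a primitive argument.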

3. Steps

a) Package the program as a jar and copy it to the target machine;

b) Start the Hive client and add the jar: hive> add jar /run/jar/udf_test.jar;

c) Create a temporary function: hive> CREATE TEMPORARY FUNCTION add_example AS 'hive.udf.Add';

d) Run HQL queries:

SELECT add_example(8, 9) FROM scores;

SELECT add_example(scores.math, scores.art) FROM scores;

SELECT add_example(6, 7, 8, 6.8) FROM scores;

e) Drop the temporary function: hive> DROP TEMPORARY FUNCTION add_example;

Note: a UDF can only map one input row to one output value. For many-in, one-out operations, implement a UDAF instead.

Example: a UDF that extracts the numeric topicId from a URL string

package hive;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.hive.ql.exec.UDF;

/**
 * Extracts the digits that follow a prefix (by default "topicId=") in a URL.
 * Hive picks the evaluate() overload that matches the number of arguments.
 */
public class GetCmsID extends UDF {

    // Compiled once and reused; group(1) captures the digits after "topicId="
    private static final Pattern TOPIC_ID = Pattern.compile("topicId=([0-9]+)");

    // One-argument form: returns the topicId value, or NULL if there is none
    public String evaluate(String url) {
        if (url == null || url.isEmpty()) {
            return null;
        }
        Matcher matcher = TOPIC_ID.matcher(url);
        return matcher.find() ? matcher.group(1) : null;
    }

    // Two-argument form: pattern is a regular expression for the text before the digits
    public String evaluate(String pattern, String url) {
        if (pattern == null || url == null || url.isEmpty()) {
            return null;
        }
        // A capture group avoids re-splitting the match on the prefix
        Pattern pat = Pattern.compile("(?:" + pattern + ")([0-9]+)");
        Matcher matcher = pat.matcher(url);
        return matcher.find() ? matcher.group(1) : null;
    }

    // Local check without Hive: both calls print 123456
    public static void main(String[] args) {
        String url = "http://www.baidu.com/cms/view.do?topicId=123456";
        GetCmsID getCmsID = new GetCmsID();
        System.out.println(getCmsID.evaluate(url));
        System.out.println(getCmsID.evaluate("topicId=", url));
    }
}

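UDAF: user-defined aggregate functions

Many rows in, one row out, like sum() and min(); typically used with GROUP BY.

1. You must use the following classes:

- org.apache.hadoop.hive.ql.exec.UDAF (the function class extends it)

- org.apache.hadoop.hive.ql.exec.UDAFEvaluator (an inner Evaluator class implements this interface)

2. The Evaluator must implement the methods init, iterate, terminatePartial, merge, and terminate:

- init(): similar to a constructor; initializes (resets) the UDAF's state

- iterate(): receives the input arguments and accumulates them into the internal state; returns boolean

- terminatePartial(): takes no arguments; called after iteration ends to return the partial aggregation state, similar to a Hadoop Combiner

- merge(): receives a result returned by terminatePartial and merges it into the current state; returns boolean

- terminate(): returns the final result of the aggregate function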
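The order in which these methods are called can be simulated outside Hive. The sketch below is a simplified, hypothetical model, not Hive's actual execution code. Two "map-side" evaluators each call init() and iterate() over their share of one group's rows, then pass terminatePartial() to a "reduce-side" evaluator. That evaluator calls merge() on each partial and finally terminate(). A running sum stands in for the aggregation, and all class names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, Hive-free model of the UDAFEvaluator call sequence.
class SumEvaluator {
    // Partial state handed from the map side to the reduce side
    static class PartialResult {
        long sum;
        boolean empty = true;
    }

    private PartialResult partial = new PartialResult();

    void init() {
        partial = new PartialResult(); // reset state
    }

    boolean iterate(Integer value) {
        if (value != null) { // ignore NULLs
            partial.sum += value;
            partial.empty = false;
        }
        return true;
    }

    PartialResult terminatePartial() {
        return partial;
    }

    boolean merge(PartialResult other) {
        if (other != null && !other.empty) {
            partial.sum += other.sum;
            partial.empty = false;
        }
        return true;
    }

    Long terminate() {
        return partial.empty ? null : partial.sum; // NULL for a group with no values
    }
}

class UdafLifecycleDemo {
    // One "map task": init, iterate over its rows, then terminatePartial
    static SumEvaluator.PartialResult mapSide(List<Integer> rows) {
        SumEvaluator eval = new SumEvaluator();
        eval.init();
        for (Integer row : rows) {
            eval.iterate(row);
        }
        return eval.terminatePartial();
    }

    // The "reduce task": init, merge every partial, then terminate
    static Long reduceSide(List<SumEvaluator.PartialResult> partials) {
        SumEvaluator eval = new SumEvaluator();
        eval.init();
        for (SumEvaluator.PartialResult p : partials) {
            eval.merge(p);
        }
        return eval.terminate();
    }

    public static void main(String[] args) {
        SumEvaluator.PartialResult p1 = mapSide(Arrays.asList(1, 2, null));
        SumEvaluator.PartialResult p2 = mapSide(Arrays.asList(3, 4));
        System.out.println(reduceSide(Arrays.asList(p1, p2))); // 10
    }
}
```

Splitting the work this way is why terminatePartial and merge must agree on the partial type. Each map task only sees part of a group, and merge has to combine those pieces without re-reading the original rows.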

Example: develop a function that behaves like:

- Oracle's wm_concat()

- MySQL's group_concat()

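package hive;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

/**
 * wm_concat(value, delimiter): joins the values of each group with the
 * delimiter, like Oracle's wm_concat() or MySQL's group_concat().
 */
public class Wm_concat extends UDAF {

    public static class myUDAFEval implements UDAFEvaluator {

        // Intermediate state returned by terminatePartial() and consumed by merge()
        public static class PartialResult {
            String result = "";
            String delimiter = null;
        }

        private PartialResult partial = new PartialResult();

        // Reset the state before a new aggregation
        @Override
        public void init() {
            partial.result = "";
            partial.delimiter = null;
        }

        // Called once per input row
        public boolean iterate(String value, String deli) {
            if (value == null || "null".equalsIgnoreCase(value)) {
                return true; // skip NULL values
            }
            if (partial.delimiter == null) {
                partial.delimiter = deli;
            }
            if (partial.result.length() > 0) {
                partial.result = partial.result.concat(partial.delimiter); // append delimiter
            }
            partial.result = partial.result.concat(value); // append value
            return true;
        }

        // Returns this task's partial result (similar to a Hadoop Combiner)
        public PartialResult terminatePartial() {
            return partial;
        }

        // Merges a partial result produced by another task
        public boolean merge(PartialResult other) {
            if (other == null || other.result == null || other.result.length() == 0) {
                return true; // nothing to merge
            }
            if (partial.delimiter == null) {
                partial.delimiter = other.delimiter; // take the delimiter from the other partial
            }
            if (partial.result.length() > 0) {
                partial.result = partial.result.concat(partial.delimiter);
            }
            partial.result = partial.result.concat(other.result);
            return true;
        }

        // Returns the final value, or NULL for a group with no values
        public String terminate() {
            if (partial.result.length() == 0) {
                return null;
            }
            return partial.result;
        }
    }
}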

Test:

create table test(id string , name string ) row format delimited fields terminated by '\t';

Insert the following data (id and name separated by a tab):

1	a
1	b
2	b
3	c
1	c
2	a
4	b
2	d
1	d
4	c
3	b

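After adding the jar as described earlier, register the class as a temporary function and call it in Hive:

CREATE TEMPORARY FUNCTION wm_concat AS 'hive.Wm_concat';

select id, wm_concat(name, ',') from test where id is not null group by id;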
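For comparison, the result that the query should produce for the sample rows can be computed in plain Java. This is a hypothetical reference check and is not part of the UDAF. It groups names by id and joins them with "," in input order. Hive does not guarantee the order of values within a group, so the actual output may list the names in a different order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical reference check for the sample data; not part of the UDAF.
class ExpectedWmConcat {
    // The sample rows from the test table: {id, name}
    static final String[][] ROWS = {
        {"1", "a"}, {"1", "b"}, {"2", "b"}, {"3", "c"}, {"1", "c"}, {"2", "a"},
        {"4", "b"}, {"2", "d"}, {"1", "d"}, {"4", "c"}, {"3", "b"}
    };

    // Groups names by id (ids sorted) and joins each group with the delimiter
    static Map<String, String> groupConcat(String[][] rows, String delimiter) {
        Map<String, StringJoiner> groups = new TreeMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[0], k -> new StringJoiner(delimiter)).add(row[1]);
        }
        Map<String, String> result = new LinkedHashMap<>();
        groups.forEach((id, joiner) -> result.put(id, joiner.toString()));
        return result;
    }

    public static void main(String[] args) {
        groupConcat(ROWS, ",").forEach((id, names) -> System.out.println(id + "\t" + names));
    }
}
```

Running it prints one line per id: 1 a,b,c,d; 2 b,a,d; 3 c,b; 4 b,c.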

Article URL: http://weahome.cn/article/pegses.html