What are the common Hive user-defined functions?

This article introduces the common kinds of Hive user-defined functions, shows how to implement each kind, and explains how to load them into Hive.


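1.1 Why user-defined functions are needed

Hive's built-in functions cannot meet every business requirement. Hive provides several extension points for custom functionality, for example user-defined functions, SerDes, and input/output formats.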

1.2 Common kinds of user-defined functions

1. UDF: user-defined function. One input row produces one output value. This is the most commonly used kind.

2. UDTF: user-defined table-generating function. One input row produces multiple output rows, as with explode used together with lateral view.

3. UDAF: user-defined aggregate function. Multiple input rows produce one output value, as with count, sum, and max.
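The three input/output shapes can be pictured with ordinary Java streams. The sketch below is only an analogy: the class FunctionShapes and its method names are made up for illustration and none of it is Hive API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java analogy only; none of this is Hive API.
class FunctionShapes {
    // UDF-like: one value in, one value out.
    static List<String> oneToOne(List<String> rows) {
        return rows.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // UDTF-like: one value in, zero or more rows out.
    static List<String> oneToMany(List<String> rows) {
        return rows.stream()
                .flatMap(r -> Arrays.stream(r.split(",")))
                .collect(Collectors.toList());
    }

    // UDAF-like: many values in, one value out (input must be non-empty).
    static int manyToOne(List<Integer> rows) {
        return rows.stream().mapToInt(Integer::intValue).max().getAsInt();
    }

    public static void main(String[] args) {
        System.out.println(oneToOne(Arrays.asList("a", "b")));
        System.out.println(oneToMany(Arrays.asList("a,b", "c")));
        System.out.println(manyToOne(Arrays.asList(3, 9, 4)));
    }
}
```

The map, flatMap, and reduction steps correspond loosely to UDF, UDTF, and UDAF respectively.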

2 Implementing user-defined functions

2.1 UDF format

First create a pom.xml in the project and add the required Maven dependencies; see code/pom.xml.

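Keep the following points in mind when defining a UDF:

1. Extend org.apache.hadoop.hive.ql.exec.UDF.

2. Implement evaluate(). This method is not defined by an interface, because the number and types of the arguments it accepts are not fixed. Hive inspects the UDF class to see whether it has an evaluate() method that matches the function call.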
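The idea of looking up a matching evaluate() can be sketched with Java reflection. This is a simplified illustration, not Hive's actual resolver: ConcatSketch and EvaluateLookup are hypothetical names, the class does not extend Hive's UDF so that the sketch runs without Hive on the classpath, and only exact parameter-type matches are considered.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a UDF class with two evaluate() overloads.
class ConcatSketch {
    public String evaluate(String a) {
        return a;
    }

    public String evaluate(String a, String b) {
        return a + b;
    }
}

class EvaluateLookup {
    // Finds an evaluate() overload whose parameter types exactly match the
    // runtime types of the arguments, then invokes it.
    static Object call(Object udf, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        try {
            Method m = udf.getClass().getMethod("evaluate", types);
            return m.invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no matching evaluate()", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(new ConcatSketch(), "a"));
        System.out.println(call(new ConcatSketch(), "a", "b"));
    }
}
```

Because the lookup happens by name and argument types, a single UDF class can offer several evaluate() overloads, and a call with unsupported argument types fails at lookup time.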

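2.1.1 A first UDF example

package com.qf.hive;

// StringUtils is assumed to come from commons-lang; use the lang3 package if that is what your project depends on
import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;

public class FirstUDF extends UDF {

    // Returns the upper-case form of str, or null for null or empty input
    public String evaluate(String str) {
        String upper = null;
        // 1. Check the input argument
        if (StringUtils.isEmpty(str)) {
            // Leave upper as null
        } else {
            upper = str.toUpperCase();
        }
        return upper;
    }

    // Debug the UDF locally
    public static void main(String[] args) {
        System.out.println(new FirstUDF().evaluate("jiajingwen"));
    }
}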

2.2 Ways to load a function

2.2.1 Loading by command

A function loaded this way is only valid for the current session.

1. Upload the jar containing the UDF to the server and add it to Hive's classpath. In the Hive client, run:

add jar /hivedata/udf.jar;

2. Create a temporary function name. This must happen in the same session as the add jar command above:

create temporary function toup as 'com.qf.hive.FirstUDF';

3. Check that the function was created:

show functions;

4. Test the function:

select toup('abcdef');

5. Drop the function:

drop temporary function if exists toup;

2.2.2 Loading via a startup parameter

(This is also only valid for the current session; the function is temporary.)

1. Upload the jar containing the UDF to the server.

2. Create an initialization file:

vi ./hive-init

add jar /hivedata/udf.jar;

create temporary function toup as 'com.qf.hive.FirstUDF';

3. Start Hive with the initialization file:

hive -i ./hive-init

select toup('abcdef');

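2.2.3 Loading via a configuration file

With this approach the function is loaded every time Hive is started from the hive command line.

1. Upload the jar containing the UDF to the server.

2. In the bin directory under the Hive installation directory, create a configuration file named .hiverc:

vi ./bin/.hiverc

add jar /hivedata/udf.jar;

create temporary function toup as 'com.qf.hive.FirstUDF';

3. Start Hive:

hive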

2.3 UDTF format

A UDTF maps one input row to multiple output rows. Implementing one takes the following steps:

1. Extend org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.

2. Override initialize(), process(), and close().

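The execution flow is as follows:

Hive first calls initialize(), which returns information about the rows the UDTF produces (the number of columns and their types).

After initialization, Hive calls process(), where the actual work is done. Inside process(), each call to forward() produces one row; to produce several columns, put the column values in an array and pass that array to forward().

Finally, close() is called so that anything that needs cleaning up can be released.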
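The call order described above can be mimicked in plain Java. The sketch below is a stand-in, not Hive API: UdtfLifecycleSketch is a hypothetical class, its forward() simply collects rows into a list, and the column names key and value are chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for a UDTF; the method names mirror the lifecycle only.
class UdtfLifecycleSketch {
    private final List<String[]> emitted = new ArrayList<>();

    // initialize: declare the output columns (names only in this sketch).
    String[] initialize() {
        return new String[] {"key", "value"};
    }

    // process: called once per input row; may forward any number of rows.
    void process(String input) {
        for (String pair : input.split(";")) {
            String[] kv = pair.split(":");
            if (kv.length == 2) {
                forward(kv); // one forward() call produces one output row
            }
        }
    }

    // forward: Hive passes the row downstream; here it is just collected.
    void forward(String[] row) {
        emitted.add(row);
    }

    // close: release resources; nothing to clean up in this sketch.
    void close() {
    }

    List<String[]> rows() {
        return emitted;
    }

    public static void main(String[] args) {
        UdtfLifecycleSketch udtf = new UdtfLifecycleSketch();
        udtf.initialize();
        udtf.process("k1:v1;k2:v2;k3:v3");
        udtf.close();
        for (String[] row : udtf.rows()) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

Pairs that do not split into exactly two parts are skipped here, so every forwarded row matches the two declared columns.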

2.3.1 Requirement

Parse a string such as "k1:v1;k2:v2;k3:v3" into multiple rows, one row per key:value pair.

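2.3.2 Source code

The user-defined function is as follows:

package com.qf.hive;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class ParseMapUDTF extends GenericUDTF {

    @Override
    public void close() throws HiveException {
        // Nothing to clean up
    }

    // Declares the output schema: two string columns named "map" and "key"
    @Override
    public StructObjectInspector initialize(ObjectInspector[] args)
            throws UDFArgumentException {
        if (args.length != 1) {
            throw new UDFArgumentLengthException("Exactly one argument is accepted");
        }
        ArrayList<String> fieldNameList = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNameList.add("map");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNameList.add("key");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNameList, fieldOIs);
    }

    // Splits the input on ";" and forwards one row per "key:value" pair
    @Override
    public void process(Object[] args) throws HiveException {
        String input = args[0].toString();
        String[] paramString = input.split(";");
        for (int i = 0; i < paramString.length; i++) {
            try {
                String[] result = paramString[i].split(":");
                forward(result);
            } catch (Exception e) {
                // Skip pairs that cannot be forwarded
                continue;
            }
        }
    }
}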

2.3.3 Packaging and loading

Package the source file above as udf.jar and copy it to the /hivedata/ directory on the server.

In the Hive client, add udf.jar to Hive as follows:

add jar /hivedata/udf.jar;

2.3.4 Creating the temporary function

In the Hive client, create a temporary function named parseMap:

create temporary function parseMap as 'com.qf.hive.ParseMapUDTF';

Check that the function has been added:

show functions;

2.3.5 Testing the temporary function

select parseMap("name:zhang;age:30;address:shenzhen");

The result is as follows:

map      key

name     zhang

age      30

address  shenzhen

2.4 UDAF format

A user-defined aggregate function (UDAF) maps multiple input rows to one output value, as count, sum, and max do. Defining a UDAF takes the following steps:

1. The function must be a subclass of org.apache.hadoop.hive.ql.exec.UDAF and must contain one or more nested static classes that implement org.apache.hadoop.hive.ql.exec.UDAFEvaluator.

2. In other words, the function class extends UDAF, and its inner Evaluator class implements the UDAFEvaluator interface.

3. The Evaluator must implement the init, iterate, terminatePartial, merge, and terminate functions.

These functions do the following:

Function: Description

init: Implements the init function of the UDAFEvaluator interface.

iterate: Called each time a new value is aggregated. The function updates its internal state according to the result of the computation.

terminatePartial: Takes no arguments. Called once a round of iterate calls has finished; returns the partial aggregation data.

merge: Receives the value returned by terminatePartial and merges it into the current state. Its return type is boolean.

terminate: Returns the final result of the aggregation.
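How these five functions cooperate can be seen by driving them by hand over two partial aggregations. The sketch below is a plain-Java stand-in under stated assumptions: MaxEvaluatorSketch and maxOf are hypothetical names, Integer replaces IntWritable so it runs without Hadoop, and the two int arrays stand for the rows seen by two separate tasks.

```java
// Plain-Java stand-in for a UDAF evaluator; not Hive API.
class MaxEvaluatorSketch {
    private Integer result;

    // init: reset the internal state.
    void init() {
        result = null;
    }

    // iterate: fold one new value into the state; null values are ignored.
    boolean iterate(Integer value) {
        if (value != null) {
            result = (result == null) ? value : Math.max(result, value);
        }
        return true;
    }

    // terminatePartial: hand out the partial result of this evaluator.
    Integer terminatePartial() {
        return result;
    }

    // merge: fold another evaluator's partial result into this one.
    boolean merge(Integer other) {
        return iterate(other);
    }

    // terminate: return the final aggregate.
    Integer terminate() {
        return result;
    }

    // Runs two partial aggregations, then merges them in a final evaluator.
    static Integer maxOf(int[] partA, int[] partB) {
        MaxEvaluatorSketch a = new MaxEvaluatorSketch();
        a.init();
        for (int v : partA) {
            a.iterate(v);
        }
        MaxEvaluatorSketch b = new MaxEvaluatorSketch();
        b.init();
        for (int v : partB) {
            b.iterate(v);
        }
        MaxEvaluatorSketch fin = new MaxEvaluatorSketch();
        fin.init();
        fin.merge(a.terminatePartial());
        fin.merge(b.terminatePartial());
        return fin.terminate();
    }

    public static void main(String[] args) {
        System.out.println(maxOf(new int[] {7, 3}, new int[] {12, 5}));
    }
}
```

For a maximum, merge can reuse iterate because the partial result has the same type as an input value; aggregates such as an average would need a richer partial state.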

2.4.1 Requirement

Compute the maximum of a set of integers.

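2.4.2 Code

package com.qf.hive;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.io.IntWritable;

public class MaxValueUDAF extends UDAF {

    public static class MaximumIntUDAFEvaluator implements UDAFEvaluator {

        // Largest value seen so far; null until the first non-null value arrives
        private IntWritable result;

        public void init() {
            result = null;
        }

        // Folds one input value into the running maximum; null values are ignored
        public boolean iterate(IntWritable value) {
            if (value == null) {
                return true;
            }
            if (result == null) {
                result = new IntWritable(value.get());
            } else {
                result.set(Math.max(result.get(), value.get()));
            }
            return true;
        }

        // Partial result handed to merge()
        public IntWritable terminatePartial() {
            return result;
        }

        // A partial maximum is merged the same way as a single value
        public boolean merge(IntWritable other) {
            return iterate(other);
        }

        // Final result of the aggregation
        public IntWritable terminate() {
            return result;
        }
    }
}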

2.4.3 Packaging and loading

Package the source file above as udf.jar and copy it to the /hivedata/ directory on the server.

In the Hive client, add udf.jar to Hive as follows:

add jar /hivedata/udf.jar;

2.4.4 Creating the temporary function

In the Hive client, create the function:

create temporary function maxInt as 'com.qf.hive.MaxValueUDAF';

Check that the function has been added:

show functions;

2.4.5 Testing the temporary function

select maxInt(mgr) from emp;

The result is as follows:

7902


Article title: What are the common Hive user-defined functions?
Article path: http://weahome.cn/article/jgscih.html
