This article walks through how to compile Hive on Spark and run a first query on it. It is shared here as a practical reference.
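Hive on Spark means that Hive runs on Spark: queries use the Spark execution engine instead of MapReduce, the same idea as Hive on Tez. Since Hive 1.1, Hive on Spark has been part of the Hive code base. It is developed on the spark branch (https://github.com/apache/hive/tree/spark), which is periodically merged into master.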
cd hive_on_spark/
git branch -r
  origin/HEAD -> origin/master
  origin/HIVE-4115
  origin/HIVE-8065
  origin/beeline-cli
  origin/branch-0.10
  origin/branch-0.11
  origin/branch-0.12
  origin/branch-0.13
  origin/branch-0.14
  origin/branch-0.2
  origin/branch-0.3
  origin/branch-0.4
  origin/branch-0.5
  origin/branch-0.6
  origin/branch-0.7
  origin/branch-0.8
  origin/branch-0.8-r2
  origin/branch-0.9
  origin/branch-1
  origin/branch-1.0
  origin/branch-1.0.1
  origin/branch-1.1
  origin/branch-1.1.1
  origin/branch-1.2
  origin/cbo
  origin/hbase-metastore
  origin/llap
  origin/master
  origin/maven
  origin/next
  origin/parquet
  origin/ptf-windowing
  origin/release-1.1
  origin/spark
  origin/spark-new
  origin/spark2
  origin/tez
  origin/vectorization
git checkout origin/spark
git branch
* (detached from origin/spark)
  master
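The listing above assumes a local clone named hive_on_spark already exists. A minimal sketch of getting to that point, assuming the clone comes from the Apache Hive GitHub repository (https://github.com/apache/hive.git, inferred from the branch URL earlier in the article) and that git 1.8.5 or later is installed for the -C option. The function name fetch_spark_branch is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: clone a repository and check out one of its remote
# branches in detached-HEAD state, as the article does with origin/spark.
fetch_spark_branch() {
    repo_url=$1
    dest=$2
    branch=${3:-spark}   # defaults to the "spark" branch
    git clone "$repo_url" "$dest" &&
        git -C "$dest" checkout "origin/$branch"
}

# Example call (assumed repository URL, needs network access):
#   fetch_spark_branch https://github.com/apache/hive.git hive_on_spark
```

Checking out origin/spark directly leaves HEAD detached, which is why git branch reports "(detached from origin/spark)".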
Edit $HIVE_ON_SPARK/pom.xml. Change the Spark version to 1.4.1:
1.4.1
Change the Hadoop version to 2.3.0-cdh6.1.0:
2.3.0-cdh6.1.0
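The two version changes can also be scripted. The sketch below is one possible way to do it, not the article's own method: it rewrites the text of a single-line property element in a pom.xml. The property names spark.version and hadoop-23.version in the example calls are assumptions about the Hive pom.xml, so check them against the file before relying on this. The helper name set_pom_property is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace the value of <name>...</name> in a pom.xml.
# Only handles properties written on a single line.
set_pom_property() {
    pom=$1
    name=$2
    value=$3
    # Escape dots so that "spark.version" is matched literally by sed.
    pattern=$(printf '%s' "$name" | sed 's/\./\\./g')
    sed "s|<${pattern}>[^<]*</${pattern}>|<${name}>${value}</${name}>|" "$pom" > "$pom.tmp" &&
        mv "$pom.tmp" "$pom"
}

# Example calls (property names are assumptions, values are from the article):
#   set_pom_property "$HIVE_ON_SPARK/pom.xml" spark.version 1.4.1
#   set_pom_property "$HIVE_ON_SPARK/pom.xml" hadoop-23.version 2.3.0-cdh6.1.0
```

Writing to a temporary file and moving it into place avoids depending on sed -i, whose syntax differs between GNU and BSD sed.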
Build command:
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn clean package -Phadoop-2 -DskipTests
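The MAVEN_OPTS above suit Java 7, which the stack trace later in this article suggests was used. On Java 8 the JVM prints a warning that MaxPermSize is ignored because support was removed in 8.0. A small sketch that picks the options by Java major version follows; the function name maven_opts_for_java is hypothetical, and the caller is assumed to know the major version.

```shell
#!/bin/sh
# Hypothetical helper: print MAVEN_OPTS for a given Java major version.
# PermGen (and -XX:MaxPermSize) only applies to Java 7 and earlier.
maven_opts_for_java() {
    major=$1
    if [ "$major" -lt 8 ]; then
        printf '%s\n' "-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
    else
        printf '%s\n' "-Xmx2g -XX:ReservedCodeCacheSize=512m"
    fi
}

# Example call:
#   export MAVEN_OPTS="$(maven_opts_for_java 7)"
#   mvn clean package -Phadoop-2 -DskipTests
```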
Spark home: /home/cluster/apps/spark/spark-1.4.1
Hive home: /home/cluster/apps/hive_on_spark
1. Set the property 'spark.home' to point to the Spark installation:
   hive> set spark.home=/home/cluster/apps/spark/spark-1.4.1;
2. Define the SPARK_HOME environment variable before starting Hive CLI/HiveServer2:
   export SPARK_HOME=/home/cluster/apps/spark/spark-1.4.1
3. Set the spark-assembly jar on the Hive auxpath:
   hive --auxpath /home/cluster/apps/spark/spark-1.4.1/lib/spark-assembly-*.jar
4. Add the spark-assembly jar for the current user session:
   hive> add jar /home/cluster/apps/spark/spark-1.4.1/lib/spark-assembly-*.jar;
5. Link the spark-assembly jar to $HIVE_HOME/lib.
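The last option, linking the jar, is the only one above without a command. A minimal sketch of it, assuming the assembly jar sits under lib/ in the Spark home as in the paths above; the function name link_spark_assembly is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: symlink the spark-assembly jar(s) into Hive's lib dir.
# Pass absolute paths so the symlinks stay valid from any working directory.
link_spark_assembly() {
    spark_home=$1
    hive_home=$2
    for jar in "$spark_home"/lib/spark-assembly-*.jar; do
        # If the glob matched nothing, the pattern itself is left in $jar.
        if [ ! -e "$jar" ]; then
            echo "no spark-assembly jar found under $spark_home/lib" >&2
            return 1
        fi
        ln -sf "$jar" "$hive_home/lib/"
    done
}

# Example call with the paths used in this article:
#   link_spark_assembly /home/cluster/apps/spark/spark-1.4.1 /home/cluster/apps/hive_on_spark
```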
[ERROR] Terminal initialization failed; falling back to unsupported
java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
    at jline.TerminalFactory.create(TerminalFactory.java:101)
    at jline.TerminalFactory.get(TerminalFactory.java:158)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:229)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:221)
    at jline.console.ConsoleReader.<init>(ConsoleReader.java:209)
    at org.apache.hadoop.hive.cli.CliDriver.getConsoleReader(CliDriver.java:773)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:715)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:615)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
Fix: export HADOOP_USER_CLASSPATH_FIRST=true
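The article does not explain the cause. A likely one, stated here as an assumption, is that the Hadoop classpath carries an older jline jar than the one Hive ships, and HADOOP_USER_CLASSPATH_FIRST=true makes Hive's copy win. The sketch below only lists the jline jars under the directories it is given so the versions can be compared; find_jline_jars is a hypothetical name and the example directories are guesses about a typical layout.

```shell
#!/bin/sh
# Hypothetical helper: list every jline-*.jar below the given directories.
# Directories that do not exist are skipped silently.
find_jline_jars() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            find "$dir" -name 'jline-*.jar'
        fi
    done
}

# Example call (directory layout is an assumption):
#   find_jline_jars "$HADOOP_HOME" /home/cluster/apps/hive_on_spark
```

If two different jline versions show up, that would be consistent with the IncompatibleClassChangeError in the trace.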
For fixes to errors in other scenarios, see: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
Set spark.eventLog.dir as well:
set spark.eventLog.dir=hdfs://master:8020/directory
Otherwise queries keep failing with an error saying that a directory such as /tmp/spark-event does not exist.
hive> set hive.execution.engine=spark;
hive> set spark.master=spark://master:7077;
Or, to run on YARN: spark.master=yarn
These settings can be placed in spark-defaults.conf or hive-site.xml.
spark.master=...
spark.eventLog.enabled=true;
spark.executor.memory=512m;
spark.serializer=org.apache.spark.serializer.KryoSerializer;
spark.executor.memory=...               # Amount of memory to use per executor process.
spark.executor.cores=...                # Number of cores per executor.
spark.yarn.executor.memoryOverhead=...
spark.executor.instances=...            # The number of executors assigned to each application.
spark.driver.memory=...                 # The amount of memory assigned to the Remote Spark Context (RSC). We recommend 4GB.
spark.yarn.driver.memoryOverhead=...    # We recommend 400 (MB).
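As one way to persist these settings, the sketch below writes a spark-defaults.conf using only values that appear in this article (the standalone master URL, the event log directory, 512m executor memory, and the Kryo serializer). The function name write_spark_defaults is hypothetical, and placing the file in Hive's conf directory is an assumption about where Hive will pick it up from its classpath.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal spark-defaults.conf to the given path.
# The file format is one "key value" pair per line, separated by whitespace.
write_spark_defaults() {
    conf_file=$1
    cat > "$conf_file" <<'EOF'
# Values taken from the examples in this article; adjust for your cluster.
spark.master             spark://master:7077
spark.eventLog.enabled   true
spark.eventLog.dir       hdfs://master:8020/directory
spark.executor.memory    512m
spark.serializer         org.apache.spark.serializer.KryoSerializer
EOF
}

# Example call (target location is an assumption):
#   write_spark_defaults /home/cluster/apps/hive_on_spark/conf/spark-defaults.conf
```

The quoted heredoc delimiter keeps the shell from expanding anything inside the file body.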
For details on these parameters, see the documentation: https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started
hive (default)> select city_id, count(*) c from city_info group by city_id order by c desc limit 5;
Query ID = spark_20150309173838_444cb5b1-b72e-4fc3-87db-4162e364cb1e
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
state = SENT
state = STARTED
state = STARTED
state = STARTED
state = STARTED
Query Hive on Spark job[0] stages:
1
Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2015-03-09 17:38:11,822 Stage-0_0: 0(+1)/1 Stage-1_0: 0/1 Stage-2_0: 0/1
state = STARTED
state = STARTED
state = STARTED
2015-03-09 17:38:14,845 Stage-0_0: 0(+1)/1 Stage-1_0: 0/1 Stage-2_0: 0/1
state = STARTED
state = STARTED
2015-03-09 17:38:16,861 Stage-0_0: 1/1 Finished Stage-1_0: 0(+1)/1 Stage-2_0: 0/1
state = SUCCEEDED
2015-03-09 17:38:17,867 Stage-0_0: 1/1 Finished Stage-1_0: 1/1 Finished Stage-2_0: 1/1 Finished
Status: Finished successfully in 10.07 seconds
OK
city_id  c
-1000    22826
-10      17294
-20      10608
-1       6186
         4158
Time taken: 18.417 seconds, Fetched: 5 row(s)