This article covers how to set up a Spark development environment in Eclipse. Plenty of people hit snags when working through this in practice, so the walkthrough below also shows how to handle the common problems. Read carefully and you should come away with a working setup!
First, download the pre-built Spark release that matches your cluster's Hadoop version and extract it to the desired location. Pay attention to file ownership and user permissions.
Change into the extracted directory, which will serve as SPARK_HOME.
Set SPARK_HOME in /etc/profile or ~/.bashrc, for example: export SPARK_HOME=/home/hadoop/cluster/spark-1.4.0-bin-hadoop2.6
cd $SPARK_HOME/conf
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export SCALA_HOME=/home/hadoop/cluster/scala-2.10.5
export JAVA_HOME=/home/hadoop/cluster/jdk1.7.0_79
export HADOOP_HOME=/home/hadoop/cluster/hadoop-2.6.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Note: this MUST be an IP address; otherwise, when Eclipse connects later
# it fails with: "All masters are unresponsive! Giving up."
SPARK_MASTER_IP=10.16.112.121
SPARK_LOCAL_DIRS=/home/hadoop/cluster/spark-1.4.0-bin-hadoop2.6
SPARK_DRIVER_MEMORY=1G
sbin/start-master.sh
sbin/start-slave.sh
(Depending on your Spark version, start-slave.sh may need the master URL as an argument, e.g. sbin/start-slave.sh spark://10.16.112.121:7077.)
You can now browse to http://yourip:8080 to check the state of the Spark cluster.
The default Spark master at this point is: spark://10.16.112.121:7077
First, download the Scala-Eclipse IDE; it is available from the official Scala website.
Open the IDE and create a new Maven project, filling in pom.xml as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>spark.test</groupId>
  <artifactId>FirstTrySpark</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <properties>
    <hadoop.version>2.6.0</hadoop.version>
    <spark.version>1.4.0</spark.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
      <scope>provided</scope>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
      <version>2.6.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>${spark.version}</version>
    </dependency>
  </dependencies>

  <build>
    <sourceDirectory>src/main/java</sourceDirectory>
    <plugins>
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <scalaCompatVersion>2.10</scalaCompatVersion>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>2.5.5</version>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
    <resources>
      <resource>
        <directory>src/main/resources</directory>
      </resource>
    </resources>
  </build>
</project>
Create the following source folders:
src/main/java        # Java source code
src/main/scala       # Scala source code
src/main/resources   # resource files
src/test/java        # Java test code
src/test/scala       # Scala test code
src/test/resources   # test resource files
At this point the whole environment is in place!
The test code is as follows:
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/**
 * @author clebeg
 */
object FirstTry {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf
    conf.setMaster("spark://yourip:7077")
    conf.set("spark.app.name", "first-tryspark")

    val sc = new SparkContext(conf)
    val rawblocks = sc.textFile("hdfs://yourip:9000/user/hadoop/linkage")
    println(rawblocks.first)
  }
}
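A side note before running it: when the driver is launched from inside Eclipse, the executors on the cluster do not automatically see your compiled classes, and tasks can fail with ClassNotFoundException. One way to handle this is to ship the assembled jar with the job via setJars. A minimal sketch, assuming the jar-with-dependencies artifact produced by the pom above (the path is hypothetical; point it at your actual build output), replacing the conf construction in FirstTry:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setMaster("spark://yourip:7077")
  .setAppName("first-tryspark")
  // Ship the application jar to the executors so tasks can load our classes.
  // Hypothetical path: adjust to where your mvn package actually writes it.
  .setJars(Seq("target/FirstTrySpark-0.0.1-SNAPSHOT-jar-with-dependencies.jar"))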
Running the program, however, produced the following error:
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Analyzing the problem: opening the log for the corresponding run ID turned up the error below:
15/10/10 08:49:01 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/10/10 08:49:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/10 08:49:02 INFO spark.SecurityManager: Changing view acls to: hadoop,Administrator
15/10/10 08:49:02 INFO spark.SecurityManager: Changing modify acls to: hadoop,Administrator
15/10/10 08:49:02 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop, Administrator); users with modify permissions: Set(hadoop, Administrator)
15/10/10 08:49:02 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/10/10 08:49:02 INFO Remoting: Starting remoting
15/10/10 08:49:02 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@10.16.112.121:58708]
15/10/10 08:49:02 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 58708.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:97)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:159)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    ... 4 more
15/10/10 08:51:02 INFO util.Utils: Shutdown hook called
A close look shows this is a permissions problem. Shut down Hadoop right away and add the following to etc/hadoop/core-site.xml:
<property>
  <name>hadoop.security.authorization</name>
  <value>false</value>
</property>
With read access open to everyone, the problem is solved immediately.
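Worth adding: the "Initial job has not accepted any resources" message can also appear when the driver simply asks for more cores or memory than the workers have free. As a sketch, not part of the original fix, the application's demands can be capped on the same conf object used in FirstTry (the values here are illustrative, tune them to your cluster):

// Cap resource demands so the job can be scheduled even on small workers.
// Both settings are standard Spark 1.x standalone-mode properties.
conf.set("spark.cores.max", "2")          // total cores the app may claim
conf.set("spark.executor.memory", "512m") // memory per executor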
Another error, seen when the driver runs on Windows:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Go to http://www.barik.net/archive/2015/01/19/172716/ and download the hadoop2.6 build recompiled to include winutils.exe. Be sure to pick the build that matches your own Hadoop version.
Extract it to a location of your choice and set the HADOOP_HOME environment variable. Be sure to restart Eclipse afterwards. Done!
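Alternatively, as a sketch rather than the fix used above, the same winutils lookup can be satisfied from code by setting hadoop.home.dir before the SparkContext is created (the path below is hypothetical; use wherever you unpacked the rebuilt distribution):

// Hypothetical Windows path to the rebuilt Hadoop distribution that
// contains bin\winutils.exe; must run before Hadoop/Spark classes load.
System.setProperty("hadoop.home.dir", "C:\\hadoop-2.6.0")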
Where can the data used in this article be obtained? http://bit.ly/1Aoywaq The commands to fetch and load it are as follows:
mkdir linkage
cd linkage/
curl -o donation.zip http://bit.ly/1Aoywaq
unzip donation.zip
unzip "block_*.zip"
hdfs dfs -mkdir /user/hadoop/linkage
hdfs dfs -put block_*.csv /user/hadoop/linkage
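To confirm the upload landed where the test program expects it, a quick check can be run from spark-shell (where sc is predefined), assuming the same HDFS path used earlier:

// Count the records just uploaded; the full donation data set holds
// several million lines, so a large nonzero count means the put worked.
val rawblocks = sc.textFile("hdfs://yourip:9000/user/hadoop/linkage")
println(rawblocks.count())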
That wraps up "How to set up a Spark Eclipse development environment". Thanks for reading, and good luck with your own setup!