This article explains how to set up an IntelliJ IDEA development environment for Flink, then walks through a streaming and a batch WordCount example in both Scala and Java.
I. IDEA development environment
1. pom file settings
Properties
Java source and target level: 1.8
Source encoding: UTF-8
scala.version: 2.11.12
scala.binary.version: 2.11
hadoop.version: 2.7.6
flink.version: 1.6.1

Dependencies (groupId:artifactId:version)
org.scala-lang:scala-library:${scala.version}
org.apache.flink:flink-java:${flink.version}
org.apache.flink:flink-streaming-java_${scala.binary.version}:${flink.version}
org.apache.flink:flink-scala_${scala.binary.version}:${flink.version}
org.apache.flink:flink-streaming-scala_${scala.binary.version}:${flink.version}
org.apache.flink:flink-table_${scala.binary.version}:${flink.version}
org.apache.flink:flink-clients_${scala.binary.version}:${flink.version}
org.apache.flink:flink-connector-kafka-0.10_${scala.binary.version}:${flink.version}
org.apache.hadoop:hadoop-client:${hadoop.version}
mysql:mysql-connector-java:5.1.38
com.alibaba:fastjson:1.2.22

Build
sourceDirectory: src/main/scala
testSourceDirectory: src/test/scala
net.alchim31.maven:scala-maven-plugin:3.2.0 (goals compile and testCompile; args -dependencyfile ${project.build.directory}/.scala_dependencies)
org.apache.maven.plugins:maven-surefire-plugin:2.18.1 (useFile false; disableXmlReport true; includes **/*Test.* and **/*Suite.*)
org.apache.maven.plugins:maven-shade-plugin:3.0.0 (phase package, goal shade; filter on artifact *:* excluding META-INF/*.SF, META-INF/*.DSA, META-INF/*.RSA; main class org.apache.spark.WordCount)
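2. Flink development workflow
Flink has two special classes, DataSet and DataStream, to represent data in a program. You can think of them as immutable collections of data that may contain duplicates. A DataSet holds a finite amount of data, while the number of elements in a DataStream can be unbounded.
These collections differ from regular Java collections in some key ways. First, they are immutable: once created, elements cannot be added or removed. You also cannot simply inspect the elements inside.
A collection is initially created by adding a source to the Flink program. New collections are then derived from it by applying API methods such as map, filter, and so on.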
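The idea of deriving new collections instead of modifying existing ones can be illustrated without Flink at all. The following is a minimal sketch in plain Java using java.util.stream as an analogy; it does not use the DataSet or DataStream API, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Analogy only: plain Java streams, not Flink's DataSet/DataStream API.
class ImmutableCollectionSketch {

    // Derive a new list of parsed integers; the input list is left untouched.
    static List<Integer> parse(List<String> input) {
        return input.stream().map(Integer::parseInt).collect(Collectors.toList());
    }

    // Derive a new list that keeps only the even values.
    static List<Integer> evens(List<Integer> input) {
        return input.stream().filter(x -> x % 2 == 0).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The "source": an unmodifiable list, so add/remove would throw.
        List<String> source = Collections.unmodifiableList(Arrays.asList("1", "2", "3", "4"));
        List<Integer> mapped = parse(source);      // like map
        List<Integer> filtered = evens(mapped);    // like filter
        System.out.println(source + " -> " + mapped + " -> " + filtered);
    }
}
```

Each step returns a new collection and leaves its input unchanged, which is the same shape a chain of Flink transformations has.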
Flink programs look like regular programs that transform collections of data. Each program consists of the same basic parts:
1. Obtain an execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
2. Load or create the initial data
DataStream<String> text = env.readTextFile("file:///path/to/file");
3. Specify transformations on this data (Scala syntax shown)
val mapped = input.map { x => x.toInt }
4. Specify where to put the results of the computation
writeAsText(String path)
print()
5. Trigger program execution
To run the program in local mode:
execute()
To package the program as a jar and run it on the cluster:
./bin/flink run \
  -m node21:8081 \
  ./examples/batch/WordCount.jar \
  --input hdfs:///user/admin/input/wc.txt \
  --output hdfs:///user/admin/output2
II. WordCount example
1. Scala code
package com.xyg.streaming

import org.apache.flink.api.java.utils.ParameterTool
import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
import org.apache.flink.streaming.api.windowing.time.Time

/**
  * Author: Mr.Deng
  * Date: 2018/10/15
  * Desc:
  */
object SocketWindowWordCountScala {

  def main(args: Array[String]): Unit = {

    // Data type that holds a word and the number of times it occurred
    case class WordWithCount(word: String, count: Long)

    // port is the port to connect to; the program exits if it is missing
    val port: Int = try {
      ParameterTool.fromArgs(args).getInt("port")
    } catch {
      case e: Exception => {
        System.err.println("No port specified. Please run 'SocketWindowWordCount --port <port>'")
        return
      }
    }

    // Get the execution environment
    val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment

    // Connect to this socket to read the input data
    val text = env.socketTextStream("node21", port, '\n')

    // This implicit-conversion import is required; without it the flatMap call fails to compile
    import org.apache.flink.api.scala._

    // Parse the data, group it, window it, and aggregate with sum
    val windowCounts = text
      .flatMap { w => w.split("\\s") }
      .map { w => WordWithCount(w, 1) }
      .keyBy("word")
      .timeWindow(Time.seconds(5), Time.seconds(1))
      .sum("count")

    // Print the result with a parallelism of 1
    windowCounts.print().setParallelism(1)

    env.execute("Socket Window WordCount")
  }
}
2. Java code
package com.xyg.streaming;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

/**
 * Author: Mr.Deng
 * Date: 2018/10/15
 * Desc: Uses Flink to count the data inside a given window in real time and prints the result.
 *       First run nc -l 9000 on the node21 machine.
 */
public class StreamingWindowWordCountJava {

    public static void main(String[] args) throws Exception {
        // Port number of the socket
        int port;
        try {
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            port = parameterTool.getInt("port");
        } catch (Exception e) {
            System.err.println("No port argument specified, using default value 9000");
            port = 9000;
        }

        // Get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Connect to the socket to read the input data
        DataStreamSource<String> text = env.socketTextStream("node21", port, "\n");

        // Compute the counts
        DataStream<WordWithCount> windowCount = text
            // Flatten: turn the words of each line into WordWithCount records
            .flatMap(new FlatMapFunction<String, WordWithCount>() {
                @Override
                public void flatMap(String value, Collector<WordWithCount> out) throws Exception {
                    String[] splits = value.split("\\s");
                    for (String word : splits) {
                        out.collect(new WordWithCount(word, 1L));
                    }
                }
            })
            // Group records that share the same word
            .keyBy("word")
            // Window size and slide interval
            .timeWindow(Time.seconds(2), Time.seconds(1))
            .sum("count");

        // Print to the console with a parallelism of 1
        windowCount.print().setParallelism(1);

        // Note: Flink evaluates lazily, so nothing above runs until execute() is called
        env.execute("streaming word count");
    }

    /**
     * Holds a word and the number of times it occurred.
     */
    public static class WordWithCount {
        public String word;
        public long count;

        public WordWithCount() {}

        public WordWithCount(String word, long count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public String toString() {
            return "WordWithCount{" + "word='" + word + '\'' + ", count=" + count + '}';
        }
    }
}
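Both streaming programs use a sliding window (size 5 s with a 1 s slide in Scala, size 2 s with a 1 s slide in Java), so a word typed once is counted in several overlapping windows and shows up in several consecutive outputs. The sketch below illustrates that assignment in plain Java. It is a simplified illustration of the idea, assuming non-negative millisecond timestamps and windows aligned to multiples of the slide; it is not Flink's implementation, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: which sliding windows contain a given timestamp.
class SlidingWindowSketch {

    // Returns the start times (ms) of every window [start, start + sizeMs)
    // that contains timestampMs, latest window first.
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // Latest window start that is not after the timestamp.
        long lastStart = timestampMs - (timestampMs % slideMs);
        // Walk backwards one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // Size 2 s, slide 1 s, as in timeWindow(Time.seconds(2), Time.seconds(1)).
        System.out.println(windowStarts(3500L, 2000L, 1000L)); // prints [3000, 2000]
        // Size 5 s, slide 1 s: the element falls into 5 windows.
        System.out.println(windowStarts(3500L, 5000L, 1000L).size()); // prints 5
    }
}
```

In general each element lands in size / slide windows, which is why the Scala program reports a word up to five times and the Java program up to twice.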
3. Run and test
First, start a local listener with the nc command:
[admin@node21 ~]$ nc -l 9000
Use netstat to confirm that port 9000 is listening: netstat -anlp | grep 9000
If you get the error "-bash: nc: command not found", install nc first: yum -y install nc
Then run the official Flink example program in IDEA.
Type some text into the nc session on node21.
The IDEA console then prints the windowed word counts.
4. Cluster test
Here the official example is tested on a single machine.
[admin@node21 flink-1.6.1]$ pwd
/opt/flink-1.6.1
[admin@node21 flink-1.6.1]$ ./bin/start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host node21.
Starting taskexecutor daemon on host node21.
[admin@node21 flink-1.6.1]$ jps
StandaloneSessionClusterEntrypoint
TaskManagerRunner
Jps
[admin@node21 flink-1.6.1]$ ./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000
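The --port 9000 argument in the command above is what the WordCount programs read through ParameterTool. The two versions in this article treat a missing port differently: the Scala program prints an error and returns, while the Java program falls back to 9000. The sketch below shows the fallback behaviour in plain Java. It is a simplified stand-in written for illustration, not Flink's ParameterTool, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a tiny "--key value" parser with a default port.
class PortArgSketch {

    // Collect "--key value" pairs into a map; anything else is ignored.
    static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].startsWith("--")) {
                params.put(args[i].substring(2), args[i + 1]);
                i++; // skip the value that was just consumed
            }
        }
        return params;
    }

    // Return the port, or the fallback when it is missing or not a number.
    static int port(String[] args, int fallback) {
        String raw = parse(args).get("port");
        if (raw == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(port(new String[] {"--port", "9000"}, 1234)); // prints 9000
        System.out.println(port(new String[] {}, 9000));                 // prints 9000
    }
}
```

Falling back to a default is convenient when running from IDEA without program arguments; failing fast, as the Scala version does, makes a forgotten argument more visible on a cluster.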
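The program connects to the socket and waits for input. You can check the web interface to verify that the job is running as expected.
Words are counted in 5-second time windows (processing time, tumbling windows) and printed to stdout. Monitor the TaskManager's output file and type some text into nc (input is sent to Flink line by line each time you press Enter).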
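III. Developing batch programs with IDEA
DataSet programs are a common kind of Flink program. A data set is initialized from a source, for example by reading a file or a serialized collection. It is then transformed (filtering, mapping, joining, grouping) and finally stored through a sink, which can write to a distributed file system such as HDFS or print to the console. Flink can run in many modes, such as local, Flink cluster, and YARN.
1. Scala program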
package com.xyg.batch

import org.apache.flink.api.scala.ExecutionEnvironment
import org.apache.flink.api.scala._

/**
  * Author: Mr.Deng
  * Date: 2018/10/19
  * Desc:
  */
object WordCountScala {

  def main(args: Array[String]): Unit = {
    // Initialize the environment
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Load the data from strings
    val text = env.fromElements(
      "Who's there?",
      "I think I hear them. Stand, ho! Who's there?")

    // Split the strings, build (word, 1) tuples, group by key, and sum the counts per word
    val counts = text
      .flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
      .map { (_, 1) }
      .groupBy(0)
      .sum(1)

    // Print
    counts.print()
  }
}
2. Java program
package com.xyg.batch;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

/**
 * Author: Mr.Deng
 * Date: 2018/10/19
 * Desc:
 */
public class WordCountJava {

    public static void main(String[] args) throws Exception {
        // Build the environment
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Build the data set from strings
        DataSet<String> text = env.fromElements(
            "Who's there?",
            "I think I hear them. Stand, ho! Who's there?");

        // Split the strings, group by key, and count the occurrences of each key
        DataSet<Tuple2<String, Integer>> wordCounts = text
            .flatMap(new LineSplitter())
            .groupBy(0)
            .sum(1);

        // Print
        wordCounts.print();
    }

    // Splits a line into (word, 1) tuples
    public static class LineSplitter implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String word : line.split(" ")) {
                out.collect(new Tuple2<String, Integer>(word, 1));
            }
        }
    }
}
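One detail is worth checking when comparing the two batch programs: the Scala version lower-cases each line and splits on \W+, while LineSplitter in the Java version splits on a single space. The two therefore do not produce the same keys for the same input. The sketch below reproduces both tokenizations in plain Java, without Flink, so the difference can be seen directly; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: the two tokenization rules used by the batch examples.
class TokenizeSketch {

    // Count tokens produced by splitting on a single space (the Java program's rule).
    static Map<String, Integer> countBySpace(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Count tokens after lower-casing and splitting on runs of non-word
    // characters, dropping empty tokens (the Scala program's rule).
    static Map<String, Integer> countByNonWord(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"Who's there?", "I think I hear them. Stand, ho! Who's there?"};
        System.out.println(countBySpace(lines));   // keys keep case and punctuation, e.g. "there?"
        System.out.println(countByNonWord(lines)); // keys are lower-case words, e.g. "there"
    }
}
```

With the space-based rule, punctuation stays attached to the word and case is preserved; with the \W+ rule, "Who's" becomes the two tokens "who" and "s". If the Java program should match the Scala output, LineSplitter would need the same lower-casing and split pattern.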
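3. Run
Run either program directly in IDEA; print() triggers execution and writes the word counts to the console.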