This article looks at what is new in Flume 1.7, focusing on the Taildir source.
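Before Flume 1.7, the usual way to pick up new content appended to a file was an exec source running tail. The drawback is that after the server goes down and restarts, reading starts again from the beginning, which is clearly not what we want. The common workaround before 1.7 was to write the line number of each record to a separate file as it was read; after a restart, the last recorded line number is read back from that file and tailing resumes from there, so that data is neither lost nor duplicated.
The source's command is changed to: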
a1.sources.r3.command = tail -n +$(tail -n1 /root/nnn) -F /root/data/web.log | awk 'ARGIND==1{i=$0;next}{i++;if($0~/^tail/){i=0};print $0;print i >> "/root/nnn";fflush("")}' /root/nnn -
Here /root/data/web.log is the monitored file and /root/nnn is the file that stores the read position.
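The checkpoint idea behind that command can be sketched in a few lines of Python. This is only an illustration of the technique, not what Flume or the awk pipeline actually runs; the function name read_from_checkpoint and its arguments are hypothetical, and it processes the file once rather than following it continuously.

```python
import os


def read_from_checkpoint(log_path, checkpoint_path):
    """Return the lines not yet consumed, saving progress after each one."""
    # Last consumed line number; 0 when no checkpoint has been written yet.
    done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            text = f.read().strip()
            done = int(text) if text else 0

    new_lines = []
    with open(log_path) as log:
        for lineno, line in enumerate(log, start=1):
            if lineno <= done:
                continue  # consumed before the restart, skip it
            new_lines.append(line.rstrip("\n"))
            # Persist the line number after every record, like the awk step.
            with open(checkpoint_path, "w") as f:
                f.write(str(lineno))
    return new_lines
```

Calling the function a second time after more lines have been appended returns only the appended lines, which is the behaviour the workaround is after.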
Flume 1.7 adds a new source type, taildir. It can monitor multiple files in a directory and persists the read position of each file as it goes, which makes it considerably more capable. The official documentation describes it as follows:
-Taildir Source
Note
This source is provided as a preview feature. It does not work on Windows.
Watch the specified files, and tail them in nearly real-time once detected new lines appended to the each files. If the new lines are being written, this source will retry reading them in wait for the completion of the write.
This source is reliable and will not miss data even when the tailing files rotate. It periodically writes the last read position of each files on the given position file in JSON format. If Flume is stopped or down for some reason, it can restart tailing from the position written on the existing position file.
In other use case, this source can also start tailing from the arbitrary position for each files using the given position file. When there is no position file on the specified path, it will start tailing from the first line of each files by default.
Files will be consumed in order of their modification time. File with the oldest modification time will be consumed first.
This source does not rename or delete or do any modifications to the file being tailed. Currently this source does not support tailing binary files. It reads text files line by line.
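To make the position file less abstract, here is a small Python sketch that parses one. It assumes the file holds a JSON array with one object per tailed file carrying inode, pos and file keys; that layout is an assumption about what Flume writes, and the sample values below are made up for illustration.

```python
import json


def load_positions(position_json):
    """Map each tailed file path to the byte offset recorded for it."""
    entries = json.loads(position_json)
    return {entry["file"]: entry["pos"] for entry in entries}


# Hypothetical position-file content; the inode and pos values are invented.
sample = '[{"inode": 2496272, "pos": 1024, "file": "/root/data/web.log"}]'
positions = load_positions(sample)
print(positions)
```

Inspecting the file this way is a quick check that offsets advance while Flume runs and survive a restart.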
Requirement: have Flume monitor several files in one directory and collect their contents into a Hadoop cluster in real time.
Configuration example:
a1.channels = ch2
a1.sources = s1
a1.sinks = hdfs-sink1
#channel
a1.channels.ch2.type = memory
a1.channels.ch2.capacity=100000
a1.channels.ch2.transactionCapacity=50000
#source
a1.sources.s1.channels = ch2
# monitor new content appended to multiple files in a directory
a1.sources.s1.type = taildir
# store each file's consumed offset as JSON so files are not re-read from the start
a1.sources.s1.positionFile = /var/local/apache-flume-1.7.0-bin/taildir_position.json
a1.sources.s1.filegroups = f1 f2 f3
a1.sources.s1.filegroups.f1 = /root/data/access.log
a1.sources.s1.filegroups.f2 = /root/data/nginx.log
a1.sources.s1.filegroups.f3 = /root/data/web.log
a1.sources.s1.headers.f1.headerKey = access
a1.sources.s1.headers.f2.headerKey = nginx
a1.sources.s1.headers.f3.headerKey = web
a1.sources.s1.fileHeader = true
##sink
a1.sinks.hdfs-sink1.channel = ch2
a1.sinks.hdfs-sink1.type = hdfs
a1.sinks.hdfs-sink1.hdfs.path =hdfs://master:9000/demo/data
a1.sinks.hdfs-sink1.hdfs.filePrefix = event_data
a1.sinks.hdfs-sink1.hdfs.fileSuffix = .log
a1.sinks.hdfs-sink1.hdfs.rollSize = 10485760
a1.sinks.hdfs-sink1.hdfs.rollInterval =20
a1.sinks.hdfs-sink1.hdfs.rollCount = 0
a1.sinks.hdfs-sink1.hdfs.batchSize = 1500
a1.sinks.hdfs-sink1.hdfs.round = true
a1.sinks.hdfs-sink1.hdfs.roundUnit = minute
a1.sinks.hdfs-sink1.hdfs.threadsPoolSize = 25
a1.sinks.hdfs-sink1.hdfs.useLocalTimeStamp = true
a1.sinks.hdfs-sink1.hdfs.minBlockReplicas = 1
a1.sinks.hdfs-sink1.hdfs.fileType =DataStream
a1.sinks.hdfs-sink1.hdfs.writeFormat = Text
a1.sinks.hdfs-sink1.hdfs.callTimeout = 60000
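The sizes in this configuration depend on each other: the memory channel's transactionCapacity should not exceed its capacity, and the sink's batchSize should not exceed the channel's transactionCapacity. The Python sketch below checks those two relations on a properties fragment. The helper names parse_properties and check_batch_sizes are hypothetical, and the parser handles only simple key = value lines, not the full Java properties syntax.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_batch_sizes(props, agent, channel, sink):
    """Return a list of problems; an empty list means the sizes agree."""
    capacity = int(props[f"{agent}.channels.{channel}.capacity"])
    tx = int(props[f"{agent}.channels.{channel}.transactionCapacity"])
    batch = int(props[f"{agent}.sinks.{sink}.hdfs.batchSize"])
    problems = []
    if tx > capacity:
        problems.append("transactionCapacity exceeds capacity")
    if batch > tx:
        problems.append("batchSize exceeds transactionCapacity")
    return problems


# The three size settings taken from the configuration above.
config = """
a1.channels.ch2.capacity=100000
a1.channels.ch2.transactionCapacity=50000
a1.sinks.hdfs-sink1.hdfs.batchSize = 1500
"""
print(check_batch_sizes(parse_properties(config), "a1", "ch2", "hdfs-sink1"))
```

With the values used in this article the returned list is empty, so the channel and sink sizes are consistent with each other.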