1. Logstash filter plugins
1.1 grok regex capture
grok is a very powerful Logstash filter plugin. It parses arbitrary text with regular expressions and turns unstructured log data into a structured, easily queryable form. It is currently the best way to parse unstructured log data in Logstash.
The grok syntax is:
%{SYNTAX:SEMANTIC}
SYNTAX is the name of the pattern to match, and SEMANTIC is the field name the matched text is stored under. For example, the NUMBER pattern matches numbers, and the IP pattern matches IP addresses such as 127.0.0.1.
For example:
Our test data is:
172.16.213.132 [07/Feb/2018:16:24:19 +0800] "GET /HTTP/1.1" 403 5039
1) An example of extracting the IP address
input {
  stdin { }
}
filter {
  grok {
    match => { "message" => "%{IPV4:ip}" }
  }
}
output {
  stdout { }
}
Now start it up:
[root@:172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l2.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 [07/Feb/2018:16:24:19 +0800] "GET /HTTP/1.1" 403 5039        # enter this line manually
{
       "message" => "172.16.213.132 [07/Feb/2018:16:24:19 +0800]\"GET /HTTP/1.1\" 403 5039",
            "ip" => "172.16.213.132",
      "@version" => "1",
          "host" => "ip-172-31-22-29.ec2.internal",
    "@timestamp" => 2019-01-22T09:48:15.354Z
}
2) An example of extracting the timestamp
The input and output sections are omitted here.
filter {
  grok {
    match => { "message" => "%{IPV4:ip}\ \[%{HTTPDATE:timestamp}\]" }
  }
}
Now let's run the filter:
[root@:172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l2.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 [07/Feb/2018:16:24:19 +0800] "GET /HTTP/1.1" 403 5039        # enter this line manually
{
      "@version" => "1",
     "timestamp" => "07/Feb/2018:16:24:19 +0800",
    "@timestamp" => 2019-01-22T10:16:14.205Z,
       "message" => "172.16.213.132 [07/Feb/2018:16:24:19 +0800]\"GET /HTTP/1.1\" 403 5039",
            "ip" => "172.16.213.132",
          "host" => "ip-172-31-22-29.ec2.internal"
}
You can see the filtering succeeded. In the configuration file, grok really is matching with regular expressions. Let's do a small experiment: add two "-" characters after the IP in the example data, as shown below:
172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] "GET /HTTP/1.1" 403 5039
In that case the configuration file needs to be written like this:
filter {
  grok {
    match => { "message" => "%{IPV4:ip}\ -\ -\ \[%{HTTPDATE:timestamp}\]" }
  }
}
The match line now has to account for the two "-" characters; otherwise grok cannot match the data correctly and parsing fails.
Start it and check the result:
[root@:172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l2.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] "GET /HTTP/1.1" 403 5039        # enter this line manually, then press Enter
{
    "@timestamp" => 2019-01-22T10:25:46.687Z,
            "ip" => "172.16.213.132",
       "message" => "172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] \"GET /HTTP/1.1\" 403 5039",
     "timestamp" => "07/Feb/2018:16:24:19 +0800",
      "@version" => "1",
          "host" => "ip-172-31-22-29.ec2.internal"
}
Now we get the fields we wanted. Here I matched both the IP and the time; you can of course match only the time instead:
filter {
  grok {
    match => { "message" => "\ -\ -\ \[%{HTTPDATE:timestamp}\]" }
  }
}
At this point it should be clearer how grok uses regular expressions to match data.
Note: in the regular expression, spaces and square brackets must be escaped.
3) Extracting the quoted request string
First, write the matching pattern:
filter {
  grok {
    match => { "message" => "\ %{QS:referrer}\ " }
  }
}
Start it and check the result:
[root@:172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l2.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] "GET /HTTP/1.1" 403 5039
{
    "@timestamp" => 2019-01-22T10:47:37.127Z,
       "message" => "172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] \"GET /HTTP/1.1\" 403 5039",
      "@version" => "1",
          "host" => "ip-172-31-22-29.ec2.internal",
      "referrer" => "\"GET /HTTP/1.1\""
}
4) Applying the same idea, let's try to extract the time information from a /var/log/messages line.
Example data:
Jan 20 11:33:03 ip-172-31-22-29 systemd: Removed slice User Slice of root.
Our goal is to output only the time, i.e. the first three columns.
At this point we need to find a suitable predefined pattern. Look in the grok-patterns file under the /usr/share/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-patterns-core-4.1.2/patterns directory; searching it, we find the SYSLOGTIMESTAMP pattern.
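For reference, its definition in that file is roughly the following (quoted from the logstash-patterns-core pattern set; the exact text may differ slightly between versions):

SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}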
It fits the format of the timestamp in the data above exactly.
First, write the configuration file:
filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:time}" }
    remove_field => ["message"]
  }
}
Start it up and see:
[root@:172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l4.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
Jan 20 11:33:03 ip-172-31-22-29 systemd: Removed slice User Slice of root.        # enter this line manually
{
    "@timestamp" => 2019-01-22T11:54:26.646Z,
          "host" => "ip-172-31-22-29.ec2.internal",
      "@version" => "1",
          "time" => "Jan 20 11:33:03"
}
The result shows the extraction succeeded; grok is a very handy tool.
1.2 The date plugin
In one of the examples above we extracted a timestamp field, holding the time taken from the log line. But besides the timestamp field you specified, the output also contains an @timestamp field, and the two are different: @timestamp holds the current system time. In an ELK pipeline the @timestamp field is what Elasticsearch uses to mark when the log was produced, so if it holds the ingest time instead, the log timeline becomes confused. To solve this we need another plugin, the date plugin. It converts the time string in a log record into a Logstash::Timestamp object and stores it in the @timestamp field.
Next, configure it in the configuration file:
filter {
  grok {
    match => { "message" => "\ -\ -\ \[%{HTTPDATE:timestamp}\]" }
  }
  date {
    match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
  }
}
Note: the timezone offset is handled by the letter Z. Also notice that "dd/MMM/yyyy" uses three uppercase M's; that is intentional. When I tried only two M's, the conversion failed.
Start it and see the effect:
[root@:172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l2.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] "GET /HTTP/1.1" 403 5039        # enter this line manually
{
          "host" => "ip-172-31-22-29.ec2.internal",
     "timestamp" => "07/Feb/2018:16:24:19 +0800",
    "@timestamp" => 2018-02-07T08:24:19.000Z,
       "message" => "172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] \"GET /HTTP/1.1\" 403 5039",
      "@version" => "1"
}
You can see the @timestamp conversion succeeded: it now shows the log's date (7 Feb 2018) rather than the date this post was written (22 Jan 2019). One more thing: the converted time is 8 hours behind the time in the log. Did you notice? Keep reading to see why.
1.3 Using remove_field
remove_field is also very commonly used. Its job is to remove duplicated data: as you saw in the earlier examples, whatever we extract appears twice, once in the message field and again in the field produced by HTTPDATE, IP, and so on. Since the point of filtering is to keep only the useful information, let's see how to drop the duplicate.
1) Again taking the IP output as an example:
filter {
  grok {
    match => { "message" => "%{IP:ip_address}" }
    remove_field => ["message"]
  }
}
Start the service and check:
[root@:172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l5.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] "GET /HTTP/1.1" 403 5039        # enter this line manually and press Enter
{
    "ip_address" => "172.16.213.132",
          "host" => "ip-172-31-22-29.ec2.internal",
      "@version" => "1",
    "@timestamp" => 2019-01-22T12:16:58.918Z
}
This time the message line that used to be printed is gone, because remove_field removed it. The benefit is obvious: we keep only the specific pieces of the log that we need.
2) In the examples above we extracted the pieces of the message line one at a time; now let's extract them all in a single Logstash configuration.
First configure it in the configuration file:
filter {
  grok {
    match => { "message" => "%{IP:ip_address}\ -\ -\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:status}\ %{NUMBER:bytes}" }
  }
  date {
    match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
  }
}
Start it up and see:
[root@. /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l5.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
{
"status" => "403",
"bytes" => "5039",
"message" => "172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] \"GET /HTTP/1.1\" 403 5039",
"ip_address" => "172.16.213.132",
"timestamp" => "07/Feb/2018:16:24:19 +0800",
"@timestamp" => 2018-02-07T08:24:19.000Z,
"referrer" => "\"GET /HTTP/1.1\"",
"@version" => "1",
"host" => "ip-172-31-22-29.ec2.internal"
}
In this example you can see how bloated the output is: the content is effectively printed twice, so it makes sense to drop the original message line.
3) Use remove_field to drop the message line.
First modify the configuration file:
filter {
  grok {
    match => { "message" => "%{IP:ip_address}\ -\ -\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:status}\ %{NUMBER:bytes}" }
  }
  date {
    match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
  }
  mutate {
    remove_field => ["message","timestamp"]
  }
}
Start it and take a look:
[root@:. /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l5.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
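Reconstructed from the previous run with message and timestamp removed (a sketch rather than a captured log), the output for the same input line should look roughly like this:
{
        "status" => "403",
         "bytes" => "5039",
    "ip_address" => "172.16.213.132",
    "@timestamp" => 2018-02-07T08:24:19.000Z,
      "referrer" => "\"GET /HTTP/1.1\"",
      "@version" => "1",
          "host" => "ip-172-31-22-29.ec2.internal"
}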
That is exactly the final result we wanted: only the parsed fields remain, with no duplicated message content.
1.4 Time handling (date)
Several examples above have already used date. The date plugin is especially important for sorting events and backfilling old data: it converts the time field in a log record into a Logstash::Timestamp object and stores it in the @timestamp field.
Why do we need this plugin?
1. On the one hand, Logstash automatically stamps every collected log line with a timestamp (@timestamp), but that timestamp records when the input received the data, not when the log was generated (and the two are certainly different), which can make searching the data confusing.
2. On the other hand, in the rubydebug output above, although @timestamp has taken its value from the timestamp field, it is still 8 hours behind Beijing time. That is because Elasticsearch stores all time fields internally in UTC, and storing logs uniformly in UTC is a common convention in security and operations. This does not really matter in practice, because ELK already handles it: Kibana reads the browser's current timezone and automatically converts the UTC time to local time in the web UI.
To parse your own time format you describe it with letters: the syntax for parsing date and time text uses a letter for each component (year, month, day, hour, minute, and so on) and repeats the letter to indicate the form the value takes. The "dd/MMM/yyyy:HH:mm:ss Z" seen above uses exactly this form. The commonly used letters are: y (year), M (month), d (day of month), H (hour), m (minute), s (second), and Z (timezone offset).
So how did we arrive at a format like "dd/MMM/yyyy:HH:mm:ss Z"?
This is easy to get confused about, so let's spell it out. Take the test data from above:
172.16.213.132 - - [07/Feb/2018:16:24:19 +0800] "GET /HTTP/1.1" 403 5039
To convert the time we need to write "dd/MMM/yyyy:HH:mm:ss Z". Notice the three M's in the middle: two will not work, because two uppercase M's mean a two-digit numeric month, while in the text we are parsing the month is an abbreviated English name, so three M's are required. And why the uppercase Z at the end? Because the text to be parsed contains the "+0800" timezone offset; without Z the filter cannot parse the text correctly and the timestamp conversion fails.
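Mapped piece by piece onto the sample time "07/Feb/2018:16:24:19 +0800", the format breaks down like this:

dd       -> 07         (two-digit day of month)
MMM      -> Feb        (abbreviated month name)
yyyy     -> 2018       (four-digit year)
HH:mm:ss -> 16:24:19   (hour, minute, second)
Z        -> +0800      (timezone offset)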
1.5 Modifying data with the mutate plugin
mutate is another very important Logstash plugin. It provides rich handling of basic data types, including renaming, deleting, replacing, and modifying fields in log events. Here are a few commonly used mutate options: field type conversion (convert), regex replacement of field values (gsub), splitting a string field into an array by a separator (split), renaming a field (rename), and deleting a field (remove_field).
1) Field type conversion with convert
First modify the configuration file:
filter {
  grok {
    match => { "message" => "%{IPV4:ip}" }
    remove_field => ["message"]
  }
  mutate {
    convert => ["ip","string"]
  }
}
The following form also works; the two spellings differ only slightly:
filter {
  grok {
    match => { "message" => "%{IPV4:ip}" }
    remove_field => ["message"]
  }
  mutate {
    convert => { "ip" => "string" }
  }
}
Now start the service and check the effect:
[root@:172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l6.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
172.16.213.132 - - [07/Feb/2018:16:24:9 +0800] "GET /HTTP/1.1" 403 5039
{
    "@timestamp" => 2019-01-23T04:13:55.261Z,
            "ip" => "172.16.213.132",
          "host" => "ip-172-31-22-29.ec2.internal",
      "@version" => "1"
}
The effect is not very visible on the ip field here, but it really has been converted to a string.
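The conversion is easier to see with a numeric target type. Here is a sketch that reuses the full match pattern from the earlier example and converts the status and bytes fields to integers, so they are printed without quotes and indexed as numbers rather than text:

filter {
  grok {
    match => { "message" => "%{IP:ip_address}\ -\ -\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:status}\ %{NUMBER:bytes}" }
    remove_field => ["message"]
  }
  mutate {
    # convert the two numeric fields captured by grok (grok always emits strings)
    convert => { "status" => "integer", "bytes" => "integer" }
  }
}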
2) Regex replacement of matched field values
gsub replaces values matched by a regular expression within a field, but it only works on string fields.
First modify the configuration file:
filter {
  grok {
    match => { "message" => "%{QS:referrer}" }
    remove_field => ["message"]
  }
  mutate {
    gsub => ["referrer","/","-"]
  }
}
Start it and see the effect:
172.16.213.132 - - [07/Feb/2018:16:24:9 +0800] "GET /HTTP/1.1" 403 5039
{
          "host" => "ip-172-31-22-29.ec2.internal",
    "@timestamp" => 2019-01-23T05:51:30.786Z,
      "@version" => "1",
      "referrer" => "\"GET -HTTP-1.1\""
}
As expected, the "/" separators in the QS part have indeed been replaced with dashes.
3) Splitting a string into an array by a separator
split splits the string in a field into an array using the specified separator.
First the configuration file:
filter {
  mutate {
    split => ["message","-"]
    add_field => ["A is lower case :","%{[message][0]}"]
  }
}
This splits the field into an array on "-".
Start it up:
a-b-c-d-e-f-g        # enter this line manually and press Enter
{
    "A is lower case :" => "a",
              "message" => [
        [0] "a",
        [1] "b",
        [2] "c",
        [3] "d",
        [4] "e",
        [5] "f",
        [6] "g"
    ],
                 "host" => "ip-172-31-22-29.ec2.internal",
             "@version" => "1",
           "@timestamp" => 2019-01-23T06:07:18.062Z
}
4) Renaming a field
rename renames a field.
filter {
  grok {
    match => { "message" => "%{IPV4:ip}" }
    remove_field => ["message"]
  }
  mutate {
    convert => { "ip" => "string" }
    rename => { "ip" => "IP" }
  }
}
Here rename is written with curly braces {}; square brackets achieve the same thing:
mutate {
  convert => { "ip" => "string" }
  rename => ["ip","IP"]
}
Start it and check:
172.16.213.132 - - [07/Feb/2018:16:24:9 +0800] "GET /HTTP/1.1" 403 5039        # enter this manually
{
      "@version" => "1",
    "@timestamp" => 2019-01-23T06:20:21.423Z,
          "host" => "ip-172-31-22-29.ec2.internal",
            "IP" => "172.16.213.132"
}
5) Deleting fields: not much to add, there are examples above.
6) Adding fields with add_field.
add_field is mostly used together with split, to output a chosen element of the field after splitting, as in the sketch below.
filter {
  mutate {
    split => ["message","-"]
    # same field name as in the split example above
    add_field => { "A is lower case :" => "%{[message][0]}" }
  }
}
After it is added, the new field appears in the output at the top level, just like @timestamp.
1.6 GeoIP address lookup with geoip
GeoIP is a widely used free IP geolocation database. Given an IP address, the geoip plugin returns the corresponding location information, including country, region/city, latitude and longitude. This is very useful for map visualizations and per-region statistics.
First modify the configuration file:
filter {
  grok {
    match => { "message" => "%{IP:ip}" }
    remove_field => ["message"]
  }
  geoip {
    source => "ip"
  }
}
The match part can also be written as in the following example:
grok {
  match => ["message","%{IP:ip}"]
  remove_field => ["message"]
}
Start it and see the effect:
[root@:172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l7.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
114.55.68.111 - - [07/Feb/2018:16:24:9 +0800] "GET /HTTP/1.1" 403 5039        # enter this line manually
{
            "ip" => "114.55.68.111",
         "geoip" => {
             "city_name" => "Hangzhou",
           "region_code" => "33",
              "location" => {
            "lat" => 30.2936,
            "lon" => 120.1614
        },
             "longitude" => 120.1614,
              "latitude" => 30.2936,
         "country_code2" => "CN",
              "timezone" => "Asia/Shanghai",
                    "ip" => "114.55.68.111",
         "country_code3" => "CN",
        "continent_code" => "AS",
          "country_name" => "China",
           "region_name" => "Zhejiang"
    },
          "host" => "ip-172-31-22-29.ec2.internal",
      "@version" => "1",
    "@timestamp" => 2019-01-23T06:47:51.200Z
}
It worked.
But not everything in that output is wanted, so we can output selectively.
Modify the configuration as follows:
filter {
  grok {
    match => ["message","%{IP:ip}"]
    remove_field => ["message"]
  }
  geoip {
    source => ["ip"]
    target => ["geoip"]
    fields => ["city_name","region_name","country_name","ip"]
  }
}
Start it and take a look:
114.55.68.111 - - [07/Feb/2018:16:24:9 +0800] "GET /HTTP/1.1" 403 5039        # enter this line manually
{
    "@timestamp" => 2019-01-23T06:57:29.955Z,
            "ip" => "114.55.68.111",
         "geoip" => {
           "city_name" => "Hangzhou",
                  "ip" => "114.55.68.111",
        "country_name" => "China",
         "region_name" => "Zhejiang"
    },
      "@version" => "1",
          "host" => "ip-172-31-22-29.ec2.internal"
}
The output is indeed smaller now: it prints exactly the fields we asked for.
1.7 Putting the filter plugins together
Our real-world example log line is:
112.195.209.90 - - [20/Feb/2018:12:12:14 +0800] "GET / HTTP/1.1" 200 190 "-" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36" "-"
Double quotes, single quotes, square brackets and other characters in the log that the regex cannot take literally must be escaped; see https://www.cnblogs.com/ysk123/p/9858387.html for details.
Now modify the configuration file to match it:
filter {
  grok {
    match => ["message","%{IPORHOST:client_ip}\ -\ -\ \[%{HTTPDATE:timestamp}\]\ %{QS:referrer}\ %{NUMBER:status}\ %{NUMBER:bytes}\ \"-\"\ \"%{DATA:browser_info}\ %{GREEDYDATA:extra_info}\"\ \"-\""]
  }
  geoip {
    source => ["client_ip"]
    target => ["geoip"]
    fields => ["city_name","region_name","country_name","ip"]
  }
  date {
    match => ["timestamp","dd/MMM/yyyy:HH:mm:ss Z"]
  }
  mutate {
    remove_field => ["message","timestamp"]
  }
}
Then start it up and see the result:
[root@:vg_adn_tidbCkhsTest:23.22.172.65:172.31.22.29 /etc/logstash/conf.d]# /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/l9.conf
Sending Logstash logs to /var/log/logstash which is now configured via log4j2.properties
112.195.209.90 - - [20/Feb/2018:12:12:14 +0800] "GET / HTTP/1.1" 200 190 "-" "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36" "-"
{
        "referrer" => "\"GET / HTTP/1.1\"",
           "bytes" => "190",
       "client_ip" => "112.195.209.90",
      "@timestamp" => 2018-02-20T04:12:14.000Z,
    "browser_info" => "Mozilla/5.0",
      "extra_info" => "(Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Mobile Safari/537.36",
          "status" => "200",
            "host" => "ip-172-31-22-29.ec2.internal",
        "@version" => "1",
           "geoip" => {
           "city_name" => "Chengdu",
         "region_name" => "Sichuan",
        "country_name" => "China",
                  "ip" => "112.195.209.90"
    }
}
The log line after the startup message is what we typed in manually; the JSON that follows is what Logstash returned.
The output shows that the log line was filtered successfully.
Note: GREEDYDATA and DATA match differently. GREEDYDATA is greedy, while DATA matches as little as it possibly can. Look at the example above again with that in mind; a small sketch follows.
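A minimal sketch of the difference (the input line and the field names a and b are made up for illustration): with the filter below, feeding in the line "one two three four" yields a => "one", because DATA stops at the first space it can, while b => "two three four", because GREEDYDATA consumes everything to the end of the line. That is exactly why browser_info above captured only "Mozilla/5.0" and extra_info captured the rest.

filter {
  grok {
    # DATA compiles to a lazy .*? while GREEDYDATA compiles to a greedy .*
    match => { "message" => "%{DATA:a}\ %{GREEDYDATA:b}" }
  }
}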