Monitoring host: 192.168.10.133
Monitored host: 192.168.10.107
OS: CentOS 6.5 x86_64
1. Install and test SNMP on the monitoring host
2. Get the monitored host's network interface list (the index is used for the script's -I option)
[root@wqk1 mnt]# ./check_traffic.sh -V 2c -C public -H 192.168.10.107 -L
List Interface for host 192.168.10.107.
Interface index 1 orresponding to lo
Interface index 2 orresponding to eth0 // use this interface
Interface index 3 orresponding to sit0
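The same index can be cross-checked with the net-snmp tools, assuming they are installed on the monitoring host. `snmpwalk` on `IF-MIB::ifDescr` prints lines such as `IF-MIB::ifDescr.2 = STRING: eth0`, where the number after the last dot is the interface index. The helper name `ifindex_for` below is hypothetical and only a sketch for pulling that index out of the output.

```shell
#!/bin/sh
# Hypothetical helper (not part of check_traffic.sh): read the output of
# `snmpwalk ... IF-MIB::ifDescr` on stdin and print the index of the
# interface whose name equals $1.
ifindex_for() {
    awk -v want="$1" '
        # Example line: IF-MIB::ifDescr.2 = STRING: eth0
        $NF == want {
            n = split($1, parts, /[.]/)   # index is the last dotted component
            print parts[n]
        }'
}

# Usage (needs net-snmp and a reachable SNMP agent):
#   snmpwalk -v 2c -c public 192.168.10.107 IF-MIB::ifDescr | ifindex_for eth0
```

The printed value is what goes after -I in the later commands.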
3. Place the script on both the monitoring host and the monitored host
cd /usr/local/nagios/libexec
chmod +x /usr/local/nagios/libexec/check_traffic.sh
./check_traffic.sh -h
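./check_traffic.sh -V 2c -C public -H 192.168.10.107 -I 2 -w 200,300 -c 400,500 -K -B
This raises a warning when inbound or outbound traffic exceeds 200K or 300K respectively, and a critical alert above 400K or 500K. The first run returns no traffic data; a second run at least 30 s later does. The 2 after -I is the interface index of the monitored host obtained above.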
OK - It's the first time for this plugins run. We'll get the data from the next time.
This message appears on the first run because the history data file (/var/tmp/check_traffic_${Host}_${Interface}.hist_dat) does not exist yet. It can be ignored; run the command again to get normal output.
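Two runs are needed because interface counters are cumulative, so a rate can only be derived from the difference between two samples. The sketch below illustrates that arithmetic. It is not the plugin's actual code: the function name `rate_kb_per_s` is hypothetical, it assumes byte counters, and it ignores counter wrap-around.

```shell
# Hypothetical helper, not taken from check_traffic.sh.
# Args: previous counter (bytes), current counter (bytes),
#       seconds elapsed between the two samples.
# Prints the average rate in KB/s with one decimal place.
rate_kb_per_s() {
    awk -v prev="$1" -v cur="$2" -v secs="$3" 'BEGIN {
        if (secs <= 0) {
            print "interval must be positive" > "/dev/stderr"
            exit 1
        }
        # Counter wrap-around is ignored in this sketch.
        printf "%.1f\n", (cur - prev) / secs / 1024
    }'
}

# Example: 614400 bytes transferred over 60 seconds
#   rate_kb_per_s 0 614400 60
```

With only one sample there is no difference to divide, which is why the first run can only store its reading for the next one.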
4. Define the check_traffic command in the commands file on the monitoring host
# 'check_traffic' command definition
define command{
command_name check_traffic
# The unit options (-K -B) can be adjusted as needed
command_line $USER1$/check_traffic.sh -V 2c -C public -H $HOSTADDRESS$ -I $ARG1$ -w $ARG2$ -c $ARG3$ -K -B
}
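With a definition like this, a service passes the interface index and thresholds as `!`-separated arguments, for example `check_traffic!2!200,300!400,500`. The sketch below mimics how those arguments map onto `$ARG1$` to `$ARG3$`. It is an illustration only: the function name `expand_check_traffic` is hypothetical, the `$USER1$/` path prefix is left out, and real Nagios performs this substitution internally.

```shell
# Illustration only: show how "name!arg1!arg2!arg3" maps onto the
# command_line of the check_traffic definition.
expand_check_traffic() {
    check_command=$1        # e.g. check_traffic!2!200,300!400,500
    host_address=$2         # value substituted for $HOSTADDRESS$
    old_ifs=$IFS
    IFS='!'
    set -- $check_command   # split on "!": $1 is the name, $2.. are the args
    IFS=$old_ifs
    printf '%s\n' "check_traffic.sh -V 2c -C public -H $host_address -I $2 -w $3 -c $4 -K -B"
}

# Example:
#   expand_check_traffic 'check_traffic!2!200,300!400,500' 192.168.10.107
```

Keeping the index and thresholds as arguments lets one command definition serve hosts with different interfaces or limits.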
5. Define the host in localhost.cfg on the monitoring host
vim /usr/local/nagios/etc/objects/localhost.cfg
define host {
host_name wqk_centos-107
alias centos-107
address 192.168.10.107
check_command check-host-alive
notification_options d,u,r
check_interval 1
max_check_attempts 2
contact_groups admins
notification_interval 10
notification_period 24x7
}
6. Define the service in localhost.cfg on the monitoring host
vim /usr/local/nagios/etc/objects/localhost.cfg
define service {
host_name wqk_centos-107
service_description check_traffic
check_period 24x7
normal_check_interval 2
retry_check_interval 1
max_check_attempts 5
notification_period 24x7
notification_options w,u,c,r
check_command check_nrpe!check_traffic ; same name as in the command definition
}
7. Verify the configuration
cd /usr/local/nagios/libexec
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
8. Restart the service
service nagios restart
******************
1. Install and test SNMP on the monitored host
Reference: http://151wqooo.blog.51cto.com/2610898/1176730
2. Plugin
cd /usr/local/nagios/libexec
chmod +x /usr/local/nagios/libexec/check_traffic.sh
3. Define the command
Edit the NRPE configuration file:
vim /usr/local/nagios/etc/nrpe.cfg
Add:
command[check_traffic]=/usr/local/nagios/libexec/check_traffic.sh -V 2c -C public -H 127.0.0.1 -I 2 -w 200,300 -c 400,500 -K -B
4. Restart the service
ps aux | grep nrpe
Kill the nrpe process, then start it again:
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d # start the nrpe service
5. Remote test
On the monitoring host:
cd /usr/local/nagios/libexec
./check_nrpe -H 192.168.10.107 -c check_traffic
The check interval must greater than 30 Seconds. But now it's 1. Please retry it later.
[root@wqk1 libexec]# ./check_nrpe -H 192.168.10.107 -c check_traffic
OK - The Traffic In is 0.0KB, Out is 0.0KB, Total is 0.0KB. The Check Interval is 60s |In=0.0KB;200;400;0;0 Out=0.0KB;300;500;0;0 Total=0.0KB;500;900;0;0 Interval=60s;1200;1800;0;0
The first run returns no traffic data; a second run at least 30 s later does.
[root@wqk1 mnt]# ./check_traffic.sh -V 2c -C public -H 192.168.10.107 -I 2 -w 200,300 -c 400,500 -K –B
OK - The Traffic In is 1Kbps, Out is 0.0Kbps, Total is 1Kbps. The Check Interval is 1165s |In=1Kbps;200;400;0;0 Out=0.0Kbps;300;500;0;0 Total=1Kbps;500;900;0;0 Interval=1165s;1200;1800;0;0
[root@wqk1 mnt]# ./check_traffic.sh -V 2c -C public -H 192.168.10.107 -I 2 -w 0,0 -c 400,500 -K –B
Warning - The Traffic In is 0.0Kbps, Out is 0.0Kbps, Total is 0.0Kbps. The Check Interval is 54s |In=0.0Kbps;0;400;0;0 Out=0.0Kbps;0;500;0;0 Total=0.0Kbps;0;900;0;0 Interval=54s;1200;1800;0;0
[root@wqk1 mnt]# ./check_traffic.sh -V 2c -C public -H 192.168.10.107 -I 2 -w 300,400 -c 400,500 -K –B
OK - The Traffic In is 0.0Kbps, Out is 0.0Kbps, Total is 0.0Kbps. The Check Interval is 68s |In=0.0Kbps;300;400;0;0 Out=0.0Kbps;400;500;0;0 Total=0.0Kbps;700;900;0;0 Interval=68s;1200;1800;0;0
[root@wqk1 mnt]# ./check_traffic.sh -V 2c -C public -H 192.168.10.107 -I 2 -w 500,600 -c 800,900 -K –B
OK - The Traffic In is 0.0Kbps, Out is 0.0Kbps, Total is 0.0Kbps. The Check Interval is 68s |In=0.0Kbps;500;800;0;0 Out=0.0Kbps;600;900;0;0 Total=0.0Kbps;1100;1700;0;0 Interval=68s;1200;1800;0;0
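Compare the numbers that were highlighted in red in the original post: they are the thresholds configured for the check, shown alongside the actual traffic values. These are the values being monitored.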
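In the outputs above, the text after `|` is performance data in the usual Nagios plugin layout `label=value;warn;crit;min;max`. The sketch below extracts the value and thresholds for one label so they can be compared across runs. The function name `perf_fields` is hypothetical.

```shell
# Hypothetical helper: read one line of plugin output on stdin and print
# "value warn crit" for the performance-data label given as $1.
perf_fields() {
    awk -v label="$1" '{
        sub(/^[^|]*\|/, "")              # keep only the text after the "|"
        for (i = 1; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == label) {
                split(kv[2], f, ";")     # value;warn;crit;min;max
                print f[1], f[2], f[3]
            }
        }
    }'
}

# Example:
#   ./check_traffic.sh ... | perf_fields Total
```

In the sample outputs, the Total thresholds equal the sum of the In and Out thresholds (200 + 300 = 500 for warning, 400 + 500 = 900 for critical).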
==================
Error when running the command:
./check_nrpe -H 192.168.10.249 -c check_traffic -t 30
Unknown - Read or Write File /var/tmp/check_traffic_127.0.0.1_2.hist_dat_root__64 Error with user uid=500(nagios) gid=500(nagios) groups=500(nagios) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023.
Solution:
This happens because the script was previously run by hand as a user other than nagios (that is, the command from command[check_traffic]=/usr/local/nagios/libexec/check_traffic.sh -V 2c -C public -H ***.***.***.*** -I 2 -w 200,300 -c 400,500 -K -B was executed manually). Before putting the script into service, delete the /var/tmp/check_traffic_${Host}_${Interface}.hist_dat file that the test created under /var/tmp on the monitored host. Otherwise the nagios user cannot read or write that file.
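Before deleting anything, it can help to list which history files are owned by someone other than the expected user. The sketch below does that. The function name `foreign_hist_files` is hypothetical, the file name pattern is inferred from the error message above, and parsing `ls -ld` assumes owner names without spaces.

```shell
# Hypothetical helper: list check_traffic history files in directory $1
# that are NOT owned by user $2 (for example "nagios").
foreign_hist_files() {
    dir=$1
    expected=$2
    for f in "$dir"/check_traffic_*.hist_dat*; do
        [ -e "$f" ] || continue                      # glob matched nothing
        owner=$(ls -ld "$f" | awk '{print $3}')      # third column is the owner
        [ "$owner" = "$expected" ] || printf '%s\n' "$f"
    done
}

# Review the list first, then remove the files, for example:
#   foreign_hist_files /var/tmp nagios
#   foreign_hist_files /var/tmp nagios | xargs -r rm -f --
```

The plugin then recreates the file on its next run under the account that NRPE runs as.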
===================
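check_traffic.sh can only monitor the total traffic on a single network interface. When traffic gets too high and Nagios raises a warning, how do you find out which process is using the bandwidth?
The nethogs tool can help: it shows the traffic of each process on a given interface. You can write your own script around nethogs to watch the processes you care about and use it alongside the check_traffic.sh plugin.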
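As a starting point for such a script, the sketch below filters nethogs trace-mode output for one program. It rests on assumptions: that the installed nethogs version supports trace mode (`-t`) and an update limit (`-c`), and that each data line has the layout `<program>/<pid>/<uid>`, a tab, the sent rate, a tab, the received rate. The function name `proc_traffic` is hypothetical.

```shell
# Hypothetical helper: read `nethogs -t` style lines on stdin and print
# "sent received" (in whatever unit nethogs reports) for entries whose
# first field contains the string given as $1.
# Assumed line layout: <program>/<pid>/<uid> TAB <sent> TAB <received>
proc_traffic() {
    awk -F '\t' -v pat="$1" 'NF == 3 && index($1, pat) { print $2, $3 }'
}

# Usage (needs root and nethogs; adjust the interface and program name):
#   nethogs -t -c 3 eth0 | proc_traffic nginx
```

The printed figures could then be compared against thresholds of your own in a wrapper that returns Nagios exit codes.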