This article explains how to use the check_openmanage plugin with Nagios. It is offered as a practical reference for anyone setting up Dell hardware monitoring.
check_openmanage is now packaged in EPEL, so once epel-release is installed the plugin can be installed with yum.
This assumes the monitored host already has Dell OMSA (OpenManage Server Administrator) installed.
#yum -y install nagios-plugins-openmanage.x86_64
The plugin is installed at:
#/usr/lib64/nagios/plugins/check_openmanage
#cp /usr/lib64/nagios/plugins/check_openmanage /usr/local/nagios/libexec/
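For how to install EPEL and OMSA, search online.
Whether the plugin goes on the monitored host or on the Nagios server depends on the environment.
If SNMP is available, installing the plugin on the Nagios server is enough.
If only NRPE can be used, install the plugin on each monitored machine.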
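The choice between the two modes comes down to whether -H is passed to the plugin. Below is a minimal Python sketch of how a wrapper might assemble the command for either mode. The function name build_argv and the /path/to/ location are illustrative assumptions; only the -H, -C and -t options come from the plugin's help output.

```python
from typing import List, Optional


def build_argv(plugin: str, snmp_host: Optional[str] = None,
               community: str = "public", timeout: int = 30) -> List[str]:
    """Return the argument list for one check_openmanage run.

    build_argv is a hypothetical helper, not part of the plugin.
    """
    argv = [plugin, "-t", str(timeout)]
    if snmp_host is not None:
        # -H switches the plugin to SNMP mode; -C is the community string.
        argv += ["-H", snmp_host, "-C", community]
    # Without -H the plugin checks the local machine (the NRPE case).
    return argv


# SNMP mode, run from the Nagios server:
print(build_argv("/path/to/check_openmanage", snmp_host="1.2.3.4"))
# Local mode, run on the monitored host through NRPE:
print(build_argv("/path/to/check_openmanage"))
```

Keeping the arguments as a list avoids shell quoting problems if the list is later handed to a process launcher.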
Components that can be checked:
Storage components checked:
· Controllers
· Physical drives
· Logical drives
· Cache batteries
· Connectors (channels)
· Enclosures
· Enclosure fans
· Enclosure power supplies
· Enclosure temperature probes
· Enclosure management modules (EMMs)
Chassis components checked:
· Processors
· Memory modules
· Cooling fans
· Temperature probes
· Power supplies
· Batteries
· Voltage probes
· Power usage
· Chassis intrusion
· Removable flash media (SD cards)
Other:
· ESM Log health
· ESM Log content (default disabled)
· Alert Log content (default disabled, not SNMP)
Nagios can check host status over SNMP or through NRPE. With NRPE, the corresponding command has to be defined first (as with checks for other services).
command.cfg configuration on the Nagios server when using SNMP:
# Openmanage check via SNMP
define command {
command_name check_openmanage
command_line /path/to/check_openmanage -H $HOSTADDRESS$
}
Add the OMSA check to the configuration of the monitored hosts:
# Dell OMSA status
define service {
use generic-service
hostgroup_name dell-servers
service_description Dell OMSA
check_command check_openmanage
}
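In comparison, fetching the data over SNMP is noticeably faster than running the check locally, so when going through NRPE pass -t 30 (a 30-second timeout) to allow for the slower local check.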
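To make the timeout behavior concrete, here is a minimal Python sketch of a wrapper that runs a plugin command, gives up after a timeout, and maps the exit code to the standard Nagios plugin states (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function run_check is a hypothetical helper, and the demo runs a stand-in Python command rather than check_openmanage itself.

```python
import subprocess
import sys
from typing import List, Tuple

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv: List[str], timeout: float) -> Tuple[str, str]:
    """Run a plugin command and return (state, first line of its output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return "UNKNOWN", "check timed out after %s seconds" % timeout
    except OSError as exc:
        return "UNKNOWN", "could not run check: %s" % exc
    lines = proc.stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    # Any exit code outside 0-3 is treated as UNKNOWN.
    return STATES.get(proc.returncode, "UNKNOWN"), first_line


# Demo with a stand-in command; replace it with the real plugin path.
print(run_check([sys.executable, "-c", "print('OK - demo')"], timeout=30))
```

A timeout is reported as UNKNOWN here because the hardware state could not be determined, which is different from a confirmed failure.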
The built-in help text:
$ check_openmanage -h
Usage: check_openmanage [OPTION]...
GENERAL OPTIONS: (common options, usable with both SNMP and local checks)
-f, --config Specify configuration file
-p, --perfdata Output performance data [default=no]
-t, --timeout Plugin timeout in seconds [default=30]
-c, --critical Custom temperature critical limits
-w, --warning Custom temperature warning limits
-F, --fahrenheit Use Fahrenheit as temperature unit
-d, --debug Debug output, reports everything
-h, --help Display this help text
-V, --version Display version info
SNMP OPTIONS: (SNMP mode)
-H, --hostname Hostname or IP (required for SNMP) (check_openmanage -H 1.2.3.4 )
-C, --community SNMP community string [default=public]
-P, --protocol SNMP protocol version [default=2]
--port SNMP port number [default=161]
-6, --ipv6 Use IPv6 instead of IPv4 [default=no]
--tcp Use TCP instead of UDP [default=no]
OUTPUT OPTIONS:
-i, --info Prefix any alerts with the service tag
-e, --extinfo Append system info to alerts
-s, --state Prefix alerts with alert state
-S, --short-state Prefix alerts with alert state abbreviated
-o, --okinfo Verbosity when check result is OK
-B, --show-blacklist Show blacklistings in OK output
-I, --htmlinfo HTML output with clickable links
CHECK CONTROL AND BLACKLISTING:
-a, --all Check everything, even log content
-b, --blacklist Blacklist missing and/or failed components (blacklisting)
--only Only check a certain component or alert type (single item)
--check Fine-tune which components are checked (combination of items)
--no-storage Don't check storage
For more information and advanced options, see the manual page or URL:
http://folk.uio.no/trondham/software/check_openmanage.html
Results when run over SNMP:
[root@op omsa]# check_openmanage -H localhost
Controller 0 [PERC 6/i Integrated]: Firmware '6.1.1-0047' is out of date
# Output prefixed with the alert state
[root@op omsa]# check_openmanage -H localhost -s
WARNING: Controller 0 [PERC 6/i Integrated]: Firmware '6.1.1-0047' is out of date
# This command uses the blacklist to suppress the out-of-date firmware warning.
[root@localhost etc]# /usr/lib/nagios/plugins/check_openmanage -H 1.2.3.4 -s -b ctrl_fw=0
OK - System: 'PowerEdge R710', SN: 'XXXXXX', 16 GB ram (8 dimms), 1 logical drives, 6 physical drives
# Check the power supplies only
[root@localhost etc]# /usr/lib/nagios/plugins/check_openmanage -H 1.2.3.4 -s --only power
POWER OK - 2 power supplies checked
Keywords accepted by --only
Keyword | Effect |
critical | Only output critical alerts. It is possible to use the --check option together with this option to adjust checks. |
warning | Only output warning alerts. It is possible to use the --check option together with this option to adjust checks. |
chassis | Only check chassis components, i.e. everything but storage and log content. |
storage | Only check storage components |
memory | Only check memory modules |
fans | Only check fans |
power | Only check power supplies |
temp | Only check temperatures |
cpu | Only check processors |
voltage | Only check voltage probes |
batteries | Only check batteries |
amperage | Only check power usage |
intrusion | Only check chassis intrusion |
sdcard | Only check removable flash media |
servicetag | Only check for sane service tag |
esmhealth | Only check ESM log health |
esmlog | Only check ESM log content |
alertlog | Only check alertlog content |
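Since --only takes a single keyword from the table above, a wrapper script can validate the keyword before calling the plugin. The sketch below is one way to do that in Python. The function only_args is a hypothetical helper, and the keyword set is copied from the table, so it may not match every plugin version.

```python
from typing import List

# Keywords copied from the --only table above.
ONLY_KEYWORDS = {
    "critical", "warning", "chassis", "storage", "memory", "fans",
    "power", "temp", "cpu", "voltage", "batteries", "amperage",
    "intrusion", "sdcard", "servicetag", "esmhealth", "esmlog", "alertlog",
}


def only_args(keyword: str) -> List[str]:
    """Return the --only arguments, rejecting keywords not in the table."""
    if keyword not in ONLY_KEYWORDS:
        raise ValueError("unknown --only keyword: %r" % keyword)
    return ["--only", keyword]


print(only_args("power"))
```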
# Check storage only, without the firmware warning
[root@localhost etc]# /usr/lib/nagios/plugins/check_openmanage -H 1.2.3.4 -s --only storage -b ctrl_fw=0
STORAGE OK - 6 physical drives, 1 logical drives
# To see in the output which items have been blacklisted, add -B at the end of the command
[root@localhost etc]# /usr/lib/nagios/plugins/check_openmanage -H 1.2.3.4 -s -b ctrl_fw=0 -B
OK - System: 'PowerEdge R710', SN: 'XXXXXX', 16 GB ram (8 dimms), 1 logical drives, 6 physical drives
----- BLACKLISTED: ctrl_fw=0
Component keywords accepted by the blacklist option
Component | Comment |
ctrl | Controller |
ctrl_fw | Suppress the "special" warning message about old controller firmware. Use this if you can't or won't upgrade the firmware. |
ctrl_driver | Suppress the "special" warning message about old controller driver. Particularly useful on systems where you can't upgrade the driver. |
ctrl_stdr | Suppress the "special" warning message about old Windows storport driver. |
pdisk | Physical disk. |
pdisk_cert | Ignore warnings for non-certified physical drives |
pdisk_foreign | Ignore warnings for foreign physical drives, e.g. pdisk_foreign=1:0:5 |
vdisk | Logical drive (virtual disk) |
bat | Controller cache battery |
bat_charge | Ignore warnings related to the controller cache battery charging cycle, which happens approximately every 40 days on Dell servers. Note that using this blacklist keyword makes check_openmanage ignore non-critical cache battery errors. |
conn | Connector (channel) |
encl | Enclosure |
encl_fan | Enclosure fan |
encl_ps | Enclosure power supply |
encl_temp | Enclosure temperature probe |
encl_emm | Enclosure management module (EMM) |
dimm | Memory module |
fan | Fan (Cooling device) |
ps | Power supply |
temp | Temperature sensor |
cpu | Processor (CPU) |
volt | Voltage probe |
bp | System battery |
amp | Amperage probe (power consumption monitoring) |
intr | Intrusion sensor |
sd | Removable flash media (SD card) |
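The blacklist value used in this article has the form component=id,id joined by slashes, for example ctrl_fw=0/pdisk=0:0:0:0,0:0:0:1. As a rough illustration, the Python sketch below builds such a value from a mapping. The helper name blacklist_value is an assumption for this example, and it does not check component names against the table.

```python
from typing import Dict, List


def blacklist_value(items: Dict[str, List[str]]) -> str:
    """Join components and IDs into a single value for the -b option."""
    parts = []
    for component, ids in items.items():
        # IDs of one component are separated by commas.
        parts.append(component + "=" + ",".join(ids))
    # Components are separated by slashes.
    return "/".join(parts)


value = blacklist_value({"ctrl_fw": ["0"], "pdisk": ["0:0:0:0", "0:0:0:1"]})
print(["-b", value])
```

The special ID all, as in ctrl_driver=all, can be passed the same way as a one-element list.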
# Customizing the output
The --postmsg option
$ check_openmanage --postmsg 'NOTE: Service tag: %s - Dell support: 555-1234-5678'
Power Supply 0 [AC]: Presence Detected, Failure Detected, AC Lost
Controller 0 [PERC 6/i Integrated]: Driver '00.00.03.15-RH1' is out of date
NOTE: Service tag: JV8KH0J - Dell support: 555-1234-5678
Format codes:
Code | Replaced with |
%m | System model |
%s | Service tag |
%b | BIOS version |
%d | BIOS release date |
%o | Operating system name |
%r | Operating system release |
%p | Number of physical drives |
%l | Number of logical drives |
%n | Line break |
%% | A literal % |
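To show how the format codes in the table behave, here is a small Python sketch that expands a --postmsg style template. It only imitates the substitution for illustration and is not the plugin's own code. The function expand_postmsg and the choice to leave unknown codes untouched are assumptions.

```python
import re
from typing import Dict


def expand_postmsg(template: str, info: Dict[str, str]) -> str:
    """Expand %-codes in template using values from info.

    info maps a code letter (m, s, b, d, o, r, p, l) to its value.
    """
    def replace(match):
        code = match.group(1)
        if code == "%":
            return "%"       # %% is a literal percent sign
        if code == "n":
            return "\n"      # %n is a line break
        # Assumption: codes without a value are left as they are.
        return info.get(code, match.group(0))

    return re.sub(r"%(.)", replace, template)


print(expand_postmsg(
    "NOTE: Service tag: %s - Dell support: 555-1234-5678",
    {"s": "JV8KH0J"},
))
```

Handling all codes in a single regular-expression pass keeps %% from being misread as the start of another code.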
Use -d or --debug to list every item that is checked:
[root@localhost etc]# /usr/lib/nagios/plugins/check_openmanage -H 1.2.3.4 -d
System: PowerEdge R710 OMSA version: 7.2.0
ServiceTag: XXXXXX Plugin version: 3.7.9
BIOS/date: 1.0.4 03/09/2009 Checking mode: SNMPv2c UDP/IPv4
-----------------------------------------------------------------------------
Storage Components
=============================================================================
STATE | ID | MESSAGE TEXT
---------+----------+--------------------------------------------------------
WARNING | 0 | Controller 0 [PERC 6/i Integrated]: Firmware '6.1.1-0047' is out of date
OK | 0 | Controller 0 [PERC 6/i Integrated] is Degraded
OK | 0:0:0:0 | Physical Disk 0:0:0 [SAS-HDD 146GB] on ctrl 0 is Online
OK | 0:0:0:1 | Physical Disk 0:0:1 [SAS-HDD 146GB] on ctrl 0 is Online
OK | 0:0:0:2 | Physical Disk 0:0:2 [SAS-HDD 146GB] on ctrl 0 is Online
OK | 0:0:0:3 | Physical Disk 0:0:3 [SAS-HDD 146GB] on ctrl 0 is Online
OK | 0:1:0:4 | Physical Disk 1:0:4 [SAS-HDD 146GB] on ctrl 0 is Online
OK | 0:1:0:5 | Physical Disk 1:0:5 [SAS-HDD 146GB] on ctrl 0 is Ready (Dedicated HS)
OK | 0:0 | Logical Drive '/dev/sda' [RAID-5, 544.50 GB] is Ready
OK | 0:0 | Cache Battery 0 in controller 0 is Ready
OK | 0:0 | Connector 0 [SAS] on controller 0 is Ready
OK | 0:1 | Connector 1 [SAS] on controller 0 is Ready
OK | 0:0:0 | Enclosure 0:0:0 [Backplane] on controller 0 is Ready
OK | 0:1:0 | Enclosure 0:1:0 [Backplane] on controller 0 is Ready
-----------------------------------------------------------------------------
Chassis Components
=============================================================================
STATE | ID | MESSAGE TEXT
---------+------+------------------------------------------------------------
OK | 0 | Memory module 0 [DIMM_A2, 2048 MB] is Ok
OK | 1 | Memory module 1 [DIMM_A3, 2048 MB] is Ok
OK | 2 | Memory module 2 [DIMM_A5, 2048 MB] is Ok
OK | 3 | Memory module 3 [DIMM_A6, 2048 MB] is Ok
OK | 4 | Memory module 4 [DIMM_B2, 2048 MB] is Ok
OK | 5 | Memory module 5 [DIMM_B3, 2048 MB] is Ok
OK | 6 | Memory module 6 [DIMM_B5, 2048 MB] is Ok
OK | 7 | Memory module 7 [DIMM_B6, 2048 MB] is Ok
OK | 0 | Chassis fan 0 [System Board FAN 1 RPM] reading: 3960 RPM
OK | 1 | Chassis fan 1 [System Board FAN 2 RPM] reading: 3960 RPM
OK | 2 | Chassis fan 2 [System Board FAN 3 RPM] reading: 3960 RPM
OK | 3 | Chassis fan 3 [System Board FAN 4 RPM] reading: 3960 RPM
OK | 4 | Chassis fan 4 [System Board FAN 5 RPM] reading: 3840 RPM
OK | 0 | Power Supply 0 [AC]: Presence detected
OK | 1 | Power Supply 1 [AC]: Presence detected
OK | 0 | Temperature Probe 0 [System Board Ambient Temp] reads 27 C (min=8/3, max=42/47)
OK | 0 | Processor 0 [Intel Xeon E5506 2.13GHz] is Present
OK | 1 | Processor 1 [Intel Xeon E5506 2.13GHz] is Present
OK | 0 | Voltage sensor 0 [CPU1 VCORE] is Good
OK | 1 | Voltage sensor 1 [CPU2 VCORE] is Good
OK | 2 | Voltage sensor 2 [CPU2 0.75 VTT CPU2 PG] is Good
OK | 3 | Voltage sensor 3 [CPU1 0.75 VTT CPU1 PG] is Good
OK | 4 | Voltage sensor 4 [System Board 1.5V PG] is Good
OK | 5 | Voltage sensor 5 [System Board 1.8V PG] is Good
OK | 6 | Voltage sensor 6 [System Board 3.3V PG] is Good
OK | 7 | Voltage sensor 7 [System Board 5V PG] is Good
OK | 8 | Voltage sensor 8 [CPU2 MEM PG] is Good
OK | 9 | Voltage sensor 9 [CPU1 MEM PG] is Good
OK | 10 | Voltage sensor 10 [CPU2 VTT ] is Good
OK | 11 | Voltage sensor 11 [CPU1 VTT ] is Good
OK | 12 | Voltage sensor 12 [System Board 0.9V PG] is Good
OK | 13 | Voltage sensor 13 [CPU2 1.8 PLL PG] is Good
OK | 14 | Voltage sensor 14 [CPU1 1.8 PLL PG] is Good
OK | 15 | Voltage sensor 15 [System Board 8.0 V PG] is Good
OK | 16 | Voltage sensor 16 [System Board 1.1 V PG] is Good
OK | 17 | Voltage sensor 17 [System Board 1.0 LOM PG] is Good
OK | 18 | Voltage sensor 18 [System Board 1.0 AUX PG] is Good
OK | 19 | Voltage sensor 19 [System Board 1.05 V PG] is Good
OK | 0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
OK | 0 | Chassis intrusion 0 detection: Ok (Not Breached)
-----------------------------------------------------------------------------
Other messages
=============================================================================
STATE | MESSAGE TEXT
---------+-------------------------------------------------------------------
OK | ESM log health is Ok (less than 80% full)
OK | Chassis Service Tag is sane
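The debug listing above is pipe-separated, so it is easy to post-process when troubleshooting by hand. The Python sketch below splits such text into (state, id, message) rows. It assumes the state column only contains OK, WARNING, CRITICAL or UNKNOWN and that two-column rows (the "Other messages" section) contain no extra pipe characters. parse_debug_rows is a hypothetical helper, and the sample text is taken from the listing above.

```python
from typing import List, Optional, Tuple

STATES = ("OK", "WARNING", "CRITICAL", "UNKNOWN")

SAMPLE = """STATE | ID | MESSAGE TEXT
---------+----------+-----
WARNING | 0 | Controller 0 [PERC 6/i Integrated]: Firmware '6.1.1-0047' is out of date
OK | 0:0:0:0 | Physical Disk 0:0:0 [SAS-HDD 146GB] on ctrl 0 is Online
STATE | MESSAGE TEXT
OK | ESM log health is Ok (less than 80% full)
"""


def parse_debug_rows(text: str) -> List[Tuple[str, Optional[str], str]]:
    """Return (state, id, message) for each result row in debug output."""
    rows = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|", 2)]
        if fields[0] not in STATES:
            continue  # header, separator or section title
        if len(fields) == 3:
            rows.append((fields[0], fields[1], fields[2]))
        elif len(fields) == 2:
            # Rows in the "Other messages" section have no ID column.
            rows.append((fields[0], None, fields[1]))
    return rows


rows = parse_debug_rows(SAMPLE)
for row in rows:
    print(row)
```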
# Use a configuration file for customized checks, via the -f option
check_openmanage -f /etc/check_openmanage.conf
check_openmanage installation and usage
Client side: 1. Download the OpenManage software:
cd /opt/
wget http://support.dell.com (substitute the actual download address here)
On mon02-001, /opt/DELL/dell holds OM_6.1.0_ManNode_A00.tar; download that file.
tar zxvf omsa...*.tgz
sh ./setup.sh
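There are three prompts:
enter y to accept the license agreement,
enter 6 to select all components,
enter i to install the selection.
The installer then asks for an install path. The default (/opt/dell/srvadmin/) works, although the author suggests choosing your own location such as /usr/local/openmanage.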
These are the errors I ran into during installation (for reference only):
1. libstdc++.so.5 not found
Install the matching compat-libstdc package.
2. libcurl.so.3 not found
Install curl.
In this setup the client and the server are on the same machine.
wget http://folk.uio.no/trondham/software/files/check_openmanage-3.6.5.tar.gz (the package is also on mon02-001 under /opt/DELL)
tar zxvf check_openmanage-3.6.5.tar.gz
Copy the check_openmanage Perl script from the unpacked tarball into /usr/local/nagios/libexec.
Client side: edit nrpe.cfg
vi /usr/local/nagios/etc/nrpe.cfg
Add this line:
command[check_dell_hardware]=/usr/local/nagios/libexec/check_openmanage -e --only critical
Save the file.
Run /usr/local/nagios/libexec/check_openmanage -e --only critical and check that it returns a result. If everything comes back OK, the client side is done.
Next, configure the server side.
Define the service on the server:
define service {
use saa-service
host_name localhost
service_description check_hardware
check_command check_nrpe!check_dell_hardware
}
Replace localhost with the host name of the machine being monitored.
Verify that monitoring works, on the server:
/usr/local/nagios/libexec/check_nrpe -H hostIP -c check_dell_hardware
If this fails, check that NRPE itself is working.
Installing the server side for SNMP
Server side
Install: 1. Install the required Perl SNMP packages:
perl-Crypt-DES-2.05-3.2.el5.rf.i386.rpm
perl-Digest-HMAC-1.01-2.2.el5.rf.noarch.rpm
perl-Digest-SHA1-2.12-2.el5.rf.i386.rpm
perl-Net-SNMP-5.2.0-1.2.el5.rf.noarch.rpm
perl-Socket6-0.23-1.el5.rf.i386.rpm
Install the other packages first, in order, and install perl-Net-SNMP-5.2.0-1.2.el5.rf.noarch.rpm last.
Download the check_openmanage plugin (http://folk.uio.no/trondham/software/check_openmanage.html#download), choosing the package that matches your system.
wget http://folk.uio.no/trondham/software/files/check_openmanage-3.6.5.tar.gz
wget http://folk.uio.no/trondham/software/files/nagios-plugins-check-openmanage-3.6.5-1.el5.x86_64.rpm
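That covers a basic installation; parts of it are copied directly from other people's write-ups. There is nothing tricky about the install, so reading through it should be enough.
What follows is usage, with notes on some of the options.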
check_openmanage -s prefixes each alert with its full alert state.
check_openmanage -S prefixes each alert with the abbreviated state (for example, CRITICAL becomes C).
check_openmanage -i prefixes each alert with the service tag.
Example: [JV8KH0J] Controller 0 [PERC 6/i Integrated]: Driver '00.00.03.15-RH1' is out of date
check_openmanage -e appends system information to the alerts (a separator line followed by the system model and service tag).
Example:
Power Supply 0 [AC]: Presence Detected, Failure Detected, AC Lost
Controller 0 [PERC 6/i Integrated]: Driver '00.00.03.15-RH1' is out of date
------ SYSTEM: PowerEdge 1950, SN: JV8KH0J
check_openmanage --postmsg 'NOTE: Service tag: %s - Dell support: 800-8888-8888'
The --postmsg option appends a custom message to the output:
Power Supply 0 [AC]: Presence Detected, Failure Detected, AC Lost
Controller 0 [PERC 6/i Integrated]: Driver '00.00.03.15-RH1' is out of date
NOTE: Service tag: JV8KH0J - Dell support: 800-8888-8888
Here %s is one of the plugin's built-in format codes. The full list is:
Code | Replaced with |
%m | System model |
%s | Service tag |
%b | BIOS version |
%d | BIOS release date |
%o | Operating system name |
%r | Operating system release |
%p | Number of physical drives |
%l | Number of logical drives |
%n | Line break |
%% | A literal % |
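The alert options above can be combined, for example: check_openmanage -i -s
check_openmanage -o
By default the OK message is a single line. The -o option takes a verbosity level for the OK output; for example, check_openmanage -o 3 prints a longer, multi-line message with more low-level hardware details.
check_openmanage -H localhost -b ctrl_driver=all -b pdisk=1:0:0:1 -B
check_openmanage supports blacklisting, that is, skipping checks that do not matter to you. Use -b to add items that should not be monitored. Once the blacklist grows it becomes hard to remember what has been excluded, so add -B (--show-blacklist) at the end to show the blacklisted items.
check_openmanage -d
Shows debug information for the run. This is meant for manual troubleshooting; do not use this option inside Nagios.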
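Custom temperature thresholds
omreport is OpenManage's own reporting tool, available once OMSA is installed.
omreport chassis temps
Shows the machine's temperatures.
check_openmanage -H myhost --only temp -d
Runs check_openmanage in debug mode and shows the machine's temperatures, for which alert thresholds can be defined.
check_openmanage -w 0=30 -c 0=40
Changes the temperature alert thresholds for probe 0.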
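check_openmanage -w 0=30/15 -c 0=40/10
The value after the slash is the lower limit, not a time period: probe 0 raises a warning above 30 or below 15 degrees, and goes critical above 40 or below 10. Adjust the limits to suit your environment.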
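Adding to the blacklist
When there are unimportant messages we do not want to see, they can be suppressed with -b.
For example:
check_openmanage -s -b ctrl_driver=0,1
Skips the driver check for controllers 0 and 1. If no controller driver needs monitoring, use ctrl_driver=all.
The component keywords (abbreviations) are those listed in the blacklist table earlier in this article.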
Use --check to switch individual checks on or off (0 means off, 1 means on)
check_openmanage --check storage=0,esmlog=1
Turns off the storage check and turns on checking of the ESM log content.
The settings can also be kept in a file that is then passed to --check, which saves retyping them each time:
vi /tmp/check_openmanage.check
storage=0,esmlog=1
check_openmanage --check /tmp/check_openmanage.check
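The name=0 or name=1 pairs for --check can be generated rather than typed by hand. Below is a minimal Python sketch under that assumption. check_value is a hypothetical helper; it only formats the string and does not verify that the names are valid check keywords.

```python
from typing import Dict


def check_value(toggles: Dict[str, bool]) -> str:
    """Format on/off settings as a value for the --check option."""
    return ",".join(
        "%s=%d" % (name, 1 if enabled else 0)
        for name, enabled in toggles.items()
    )


value = check_value({"storage": False, "esmlog": True})
print(["--check", value])
```

The same string can be written to a file such as /tmp/check_openmanage.check if the file form is preferred.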
Use --only to monitor one specific item
check_openmanage --only storage
Checks storage only; nothing else is monitored.
The keywords accepted by --only are those listed in the --only table earlier in this article.
To check everything, run check_openmanage -a.
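Finally, the plugin can be combined with PNP4Nagios to graph the data.
Note:
I ran into a fairly serious problem while installing:
Do not repeatedly uninstall and reinstall OpenManage on a server. Doing so leaves extra processes behind. Each install creates a group of three processes like these: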
root 30672 0.0 0.0 21688 1056 ? S Jun08 0:00 \_ hald-runner
68 30680 0.0 0.0 12320 848 ? S Jun08 0:00 \_ hald-addon-acpi: listening on acpid socket /var/run/acpid.socket
68 30693 0.0 0.0 12320 844 ? S Jun08 0:00 \_ hald-addon-keyboard: listening on /dev/input/event0
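After the software is uninstalled these processes are not killed; they have to be killed by hand. As they pile up they use a lot of CPU, and the load fluctuates every ten minutes. On our company's app servers the load swung from 1 to 20 every ten minutes and then dropped straight back because of this. So take care not to install and uninstall repeatedly in production.
Apart from that there were no problems. The software works well and can be combined with Nagios and Zabbix for hardware monitoring.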
DELL openmanage and nagios on ubuntu 10.04
Written by Michael
Thursday, 28 July 2011 19:51
The dell openmanage tools are quite good for monitoring Dell servers. Although it slows down boot time (which shouldn't happen often with servers anyway), it provides some great ways to monitor your server.
Install Dell OMSA
Add the repository to a new file /etc/apt/sources.list.d/linux.dell.com.sources.list with the following content:
deb http://linux.dell.com/repo/community/deb/latest /
apt-get update
gpg --keyserver pgpkeys.mit.edu --recv-key E74433E25E3D7775
gpg -a --export E74433E25E3D7775 | apt-key add -
apt-get install srvadmin-all
Install the nagios check_openmanage plugin
Download the latest check_openmanage package from http://folk.uio.no/trondham/software/check_openmanage.html#download
wget http://folk.uio.no/trondham/software/files/check-openmanage_3.6.8-1_all.deb
Install the check_openmanage package:
dpkg -i check-openmanage_3.6.8-1_all.deb
Check the output of:
/usr/lib/nagios/plugins/check_openmanage
If you get the following output:
Storage Error! No controllers found
Problem running 'omreport chassis memory': Error: Memory object not found
Problem running 'omreport chassis fans': Error! No fan probes found on this system.
Problem running 'omreport chassis temps': Error! No temperature probes found on this system.
Problem running 'omreport chassis volts': Error! No voltage probes found on this system.
Do:
/etc/init.d/dataeng restart
Rerun it, and blacklist warnings such as non-certified drives and out-of-date controller firmware (or resolve them by swapping to certified disks and upgrading the RAID controller firmware):
/usr/lib/nagios/plugins/check_openmanage -b ctrl_fw=0/pdisk=0:0:0:0,0:0:0:1
If you run it, it should show something like:
/usr/lib/nagios/plugins/check_openmanage -b ctrl_fw=0/pdisk=0:0:0:0,0:0:0:1
OK - System: 'PowerEdge R310', SN: 'somenumber', 4 GB ram (2 dimms), 1 logical drives, 2 physical drives
Only uncertified hard drives should be blacklisted, certified disks do not have to be blacklisted.
Make sure that dataeng starts at boot
update-rc.d dataeng defaults
Edit:
/etc/nagios/nrpe_local.cfg
And add the command without warnings to it, like:
command[check_openmanage]=/usr/lib/nagios/plugins/check_openmanage -b ctrl_fw=0/pdisk=0:0:0:0,0:0:0:1
Restart the service:
/etc/init.d/nagios-nrpe-server restart
Add the host to the nagios configuration on your Nagios server.
Optionally, you can start the openmanage built in webserver with
omconfig system webserver action=start
The webserver is running on port 1311 https by default. You can login with the root account or other local accounts of the linux system.
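A quick way to confirm that the web server came up is to test whether the port accepts TCP connections. The Python sketch below does only that and does not speak HTTPS or log in. port_open is a hypothetical helper, and 1311 is the default port mentioned above.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


# Example call for the OpenManage web server on the local machine:
# port_open("localhost", 1311)
```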