During a Nagios monitoring implementation, we often need to depend on NRPE plugins and custom commands to run monitoring tasks such as load and disk usage checks on remote servers. While the majority of disk checks can be handled by tweaking the existing commands, evaluating RAID disk health takes more work, because the architecture and the RAID controller differ from one RAID setup to the next.
This article highlights the steps to add a RAID check for servers using the `MegaCli` utility. I assume that you have already configured a Nagios server for monitoring through the NRPE plugin and are familiar with how it works. The discussion here focuses only on configuring the RAID check.
Before delving into how to add the check, let's first look at what MegaCli is. MegaCLI is a command-line interface (CLI) binary used to communicate with the full LSI family of RAID controllers.
For a complete reference, either run MegaCli -h or refer to the manual at: http://www.cisco.com/c/dam/en/us/td/docs/unified_computing/ucs/3rd-party/lsi/mrsas/userguide/LSI_MR_SAS_SW_UG.pdf
Now let us move to the step by step instructions to enable RAID Monitoring on Nagios.
Step 1:
Log in to the client server (the server to be monitored) as the root user.
Step 2:
Before moving forward, verify the path to MegaCli. You can do that by issuing the command:
root@server:~# which MegaCli
/sbin/MegaCli
As you may know, binary paths vary between installations. If the path to the binary is different on your system, for example /usr/sbin/MegaCli, modify the script and commands below by replacing every instance of /sbin/MegaCli with the correct path.
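Since the location differs between systems, a small helper can probe the likely paths in one go. The following is only a sketch: the function name find_megacli is hypothetical, and the candidate list simply reuses the paths mentioned in this article, so extend it if your installation differs.

```shell
#!/bin/bash
# Print the first executable path among the arguments, or return 1 if none exists.
# find_megacli is a hypothetical helper name, not part of MegaCli or Nagios.
find_megacli() {
    local candidate
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Probe the locations mentioned in this article, in order.
find_megacli /sbin/MegaCli /usr/sbin/MegaCli /opt/MegaRAID/MegaCli/MegaCli64 \
    || echo "MegaCli not found in the usual locations" >&2
```

Whatever path it prints is the one to substitute for /sbin/MegaCli in the later steps.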
Read the following instructions only if MegaCli is not found; otherwise, skip to Step 3.
On CentOS machines, you may get an error like the one below.
[root@server ~]# MegaCli
MegaCli: command not found
[root@server ~]# which MegaCli
/usr/bin/which: no MegaCli in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/cpanel/composer/bin:/root/bin)
This doesn’t necessarily mean that MegaCli is absent; the path and the name of the utility might be different. On CentOS machines, the binary is installed at /opt/MegaRAID/MegaCli/MegaCli64.
Try executing the command below:
# /opt/MegaRAID/MegaCli/MegaCli64 -v
MegaCLI SAS RAID Management Tool Ver 8.07.14 Dec 16, 2013
(c)Copyright 2013, LSI Corporation, All Rights Reserved.
Exit Code: 0x00
If you see output like the above, the binary is present. The command is not found without its full path because the directory containing the binary is not included in the user's PATH variable.
PATH is an environment variable in Linux and other Unix-like operating systems that tells the shell which directories to search for executable files (i.e., ready-to-run programs) in response to commands issued by a user.
If this is the case, follow the step below.
For easy access, let's create an alias named MegaCli for the command and add it to .bashrc to make the change permanent.
Execute the commands below.
echo alias MegaCli=\"/opt/MegaRAID/MegaCli/MegaCli64\" >> /root/.bashrc
source /root/.bashrc
The bash built-in command “source” executes the contents of /root/.bashrc in the current shell, so the alias is available immediately and you can continue with your current session.
Now verify the binary:
# MegaCli -v
If you see the version details, proceed to the next step.
Step 3:
Create a new file named check_raid in /usr/local/nagios/libexec and add the following code to it:
#!/bin/bash
if /sbin/MegaCli -PDList -aAll | grep -i failed &> /dev/null
then
    EXIT=2
    STATUS="CRITICAL: RAID failure detected!"
elif ! /sbin/MegaCli -PDList -aAll | grep "Count: " | grep -v ": 0" &> /dev/null
then
    EXIT=0
    STATUS="OK: RAID looks running fine"
else
    EXIT=1
    STATUS="WARNING: RAID errors detected!"
fi
echo "$STATUS"
exit $EXIT
Change the binary location to match your installation and OS. For example, on a CentOS server, replace /sbin/MegaCli with /opt/MegaRAID/MegaCli/MegaCli64 in the above script, as that is the correct path to the binary on CentOS distributions.
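The script above calls MegaCli twice on every check. One possible variant, sketched here under the assumption that the same three rules should apply, reads the output once from standard input and classifies it in a single pass. The function name classify_pdlist is hypothetical, and the rules are copied from the script rather than taken from any MegaCli documentation.

```shell
#!/bin/bash
# classify_pdlist (hypothetical name) reads "MegaCli -PDList -aAll" output on
# stdin and applies the same rules as check_raid:
#   any line containing "failed" (case-insensitive)       -> CRITICAL, code 2
#   any "Count: " line that does not contain ": 0"        -> WARNING,  code 1
#   otherwise                                             -> OK,       code 0
classify_pdlist() {
    local line failed=0 errors=0
    while IFS= read -r line; do
        case "$line" in
            *[Ff][Aa][Ii][Ll][Ee][Dd]*) failed=1 ;;
        esac
        case "$line" in
            *"Count: "*)
                case "$line" in
                    *": 0"*) ;;        # counter is zero: ignored, as in the original
                    *) errors=1 ;;     # non-zero counter
                esac
                ;;
        esac
    done
    if [ "$failed" -eq 1 ]; then
        echo "CRITICAL: RAID failure detected!"
        return 2
    elif [ "$errors" -eq 1 ]; then
        echo "WARNING: RAID errors detected!"
        return 1
    fi
    echo "OK: RAID looks running fine"
    return 0
}

# Usage inside check_raid (adjust the binary path to your system):
#   /sbin/MegaCli -PDList -aAll | classify_pdlist
#   exit $?
```

Keeping the classification in a function also makes it possible to try the rules against saved MegaCli output without touching the controller.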
Give the script execute permission by issuing
chmod +x /usr/local/nagios/libexec/check_raid
Step 4:
Now we have to define a command for this check in /usr/local/nagios/etc/nrpe.cfg.
To do this, add the following line to the end of /usr/local/nagios/etc/nrpe.cfg:
command[check_raid]=/usr/local/nagios/libexec/check_raid
If you are not comfortable editing configuration files directly, you can append the line with the following command:
echo 'command[check_raid]=/usr/local/nagios/libexec/check_raid' >> /usr/local/nagios/etc/nrpe.cfg
This is needed because the Nagios server calls this command by name on the server being monitored. The client server then executes the associated script and returns its output.
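Appending with echo adds the line again every time the command is repeated. A minimal sketch of a guard against that follows; the helper name add_line_once is hypothetical, and it only appends when the exact line is not already in the file.

```shell
#!/bin/bash
# add_line_once LINE FILE (hypothetical helper): append LINE to FILE only if
# FILE does not already contain it as a complete line.
add_line_once() {
    local line="$1" file="$2"
    # -x: match whole lines, -F: treat the pattern as a fixed string, -q: quiet
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root on the monitored server):
#   add_line_once 'command[check_raid]=/usr/local/nagios/libexec/check_raid' \
#       /usr/local/nagios/etc/nrpe.cfg
```

Fixed-string matching matters here because the square brackets in the command definition would otherwise be read as a regular expression.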
Step 5:
Now test whether the script runs correctly with the following command.
root@server:~# /usr/local/nagios/libexec/check_raid
OK: RAID looks running fine
Step 6:
Now open the file /etc/sudoers and add the following lines to the bottom of the file:
a) If the system is running Debian
nagios ALL=NOPASSWD:/sbin/MegaCli
nagios ALL=NOPASSWD:/bin/bash
Editing configuration files directly is always risky, so the best way to perform this operation is with the visudo editor.
Alternatively, you can execute the command below to get the same result:
echo -e 'nagios ALL=NOPASSWD:/sbin/MegaCli\nnagios ALL=NOPASSWD:/bin/bash' >> /etc/sudoers
b) If it is a CentOS server, add the following lines
nagios ALL=NOPASSWD:/opt/MegaRAID/MegaCli/MegaCli64
nagios ALL=NOPASSWD:/bin/bash
or use the following command:
echo -e 'nagios ALL=NOPASSWD:/opt/MegaRAID/MegaCli/MegaCli64\nnagios ALL=NOPASSWD:/bin/bash' >> /etc/sudoers
Also, if the line ‘Defaults requiretty’ is present in /etc/sudoers, you must comment it out as follows:
# Defaults requiretty
Easy way: execute the command below.
sed -ri 's/^Defaults\s+requiretty/# Defaults requiretty/' /etc/sudoers
As you know, NRPE runs commands as the user nagios. The check we ran above returned OK because it was executed as the root user.
When run on the client server as root, the check returns output, but when queried from the monitoring server it will return an error like ‘NRPE: Unable to read output’ if the nagios user lacks the required privileges. This happens when we overlook which user executes the command and whether that user is allowed to issue it. The lines above give the user nagios access to the commands /bin/bash and MegaCli. This is required because the nagios user is created with the shell /sbin/nologin, and MegaCli is by default a command that only the root user can run.
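Appending to /etc/sudoers with echo skips the syntax check that visudo performs. As a hedged alternative, the rules can be written to a separate file and validated with visudo -cf before they are installed. The sketch below assumes that your sudoers configuration includes the /etc/sudoers.d directory, which is not covered in this article; the function name sudoers_rules and the file name nagios_raid are hypothetical.

```shell
#!/bin/bash
# sudoers_rules MEGACLI_PATH (hypothetical helper): print the two rules from
# this step for the given MegaCli binary path.
sudoers_rules() {
    local megacli="$1"
    printf 'nagios ALL=NOPASSWD:%s\n' "$megacli"
    printf 'nagios ALL=NOPASSWD:%s\n' /bin/bash
}

# Usage as root (commented out because it changes the system).
# Assumes /etc/sudoers.d is included by /etc/sudoers; nagios_raid is a made-up name.
#   sudoers_rules /sbin/MegaCli > /tmp/nagios_raid
#   visudo -cf /tmp/nagios_raid && install -m 0440 /tmp/nagios_raid /etc/sudoers.d/nagios_raid
```

On a CentOS server, pass /opt/MegaRAID/MegaCli/MegaCli64 instead of /sbin/MegaCli.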
Step 7:
At this point, we have created a script to check the RAID status, configured an NRPE command that references it, and granted the user nagios the permissions required to execute it. Now restart NRPE by issuing the following command.
root@server:~# /etc/init.d/nrpe restart
Restarting nagios remote plugin daemon: nrpe.
Step 8:
Now log in to the monitoring server and issue the following command to check that it's working:
[root@monitor ~]# /usr/local/nagios/libexec/check_nrpe -H aaa.bbb.ccc.ddd -c check_raid
OK: RAID looks running fine
[root@monitor ~]#
Be sure to replace the IP aaa.bbb.ccc.ddd with the client IP.
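Nagios decides the state of a service from the exit code of the plugin, not from the text it prints, which is why check_raid exits with 0, 1 or 2. The following sketch maps an exit code to the state name used by the Nagios plugin convention; the function name nagios_state is hypothetical.

```shell
#!/bin/bash
# nagios_state CODE (hypothetical helper): translate a plugin exit code into
# the state name defined by the Nagios plugin convention.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3 is UNKNOWN; other values are outside the convention
    esac
}

# Usage on the monitoring server (replace the IP with the client IP):
#   /usr/local/nagios/libexec/check_nrpe -H aaa.bbb.ccc.ddd -c check_raid
#   rc=$?
#   echo "state: $(nagios_state "$rc")"
```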
Step 9:
If the results are fine, go ahead and add the check to the Nagios configuration on the monitoring server. On our servers, the cfg file for each monitored server is located under /usr/local/nagios/etc/objects/clients/; open it and add the following entries.
define service{
        use                     fiveminutes
        host_name               *enter server hostname here*
        service_description     Raid_Check
        contact_groups          *enter contact group here*
        check_command           check_nrpe!check_raid
}
Be sure to replace the hostname and contact group if you are pasting the above snippet. You can also open the .cfg file of the client server, copy one of the existing service checks, and modify only the service_description and check_command as above. The rest of the fields will be the same for all service checks within a cfg file.
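If the same check has to be added for several servers, the definition can be generated instead of pasted. This is only a sketch: the function name service_definition is hypothetical, the host and contact group in the usage line are made-up examples, and the fiveminutes template is the one used in the snippet above.

```shell
#!/bin/bash
# service_definition HOST CONTACT_GROUP (hypothetical helper): print the
# Raid_Check service block for one monitored server.
service_definition() {
    local host="$1" contacts="$2"
    cat <<EOF
define service{
        use                     fiveminutes
        host_name               $host
        service_description     Raid_Check
        contact_groups          $contacts
        check_command           check_nrpe!check_raid
}
EOF
}

# Usage (web01, admins and the file name are made-up examples):
#   service_definition web01 admins >> /usr/local/nagios/etc/objects/clients/web01.cfg
```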
Step 10:
Now restart the Nagios server for the changes to take effect.
[root@monitor ~]# /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
[root@monitor ~]#
Now log on to the Nagios web interface and verify that the check shows up correctly there.