iostat (input/output statistics) is a utility that reports Central Processing Unit (CPU) statistics and input/output statistics for devices and partitions. It is quite handy for a Linux disk health check: it helps identify which partition is being heavily used and whether any hardware issues exist. Explaining every parameter in detail would make this article confusing, so I am focusing on the values that need to be monitored during a suspected server issue.
A sample iostat output looks like this:
[~]# iostat -x
Linux 2.6.18-408.el5.lve0.8.58 server1.ssages.com Monday 04 June 2012
avg-cpu: %user %nice %system %iowait %steal %idle
28.01 0.33 4.18 7.94 0.00 59.54
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 234.36 463.39 420.55 217.85 8883.26 5451.91 22.45 1.70 2.66 1.12 71.63
sda1 0.02 0.00 0.00 0.00 0.03 0.00 41.66 0.00 6.57 5.06 0.00
sda2 19.75 65.49 223.08 79.80 3372.78 1163.38 14.98 1.82 6.02 1.70 51.63
sda3 6.77 18.37 5.94 13.44 251.64 254.52 26.11 0.58 29.76 2.30 4.45
sda8 206.75 170.56 190.00 78.20 5232.34 1990.57 26.93 1.98 7.39 2.12 56.99
An explanation of every parameter here would only cause confusion, so let us focus on the core values.
CPU Statistics
As you can see from the result, the avg-cpu section summarizes CPU utilization. You can get the CPU statistics alone using the command iostat -c.
The most crucial values in the avg-cpu output are %iowait and %idle.
%iowait is the percentage of time the CPU was idle while waiting for outstanding disk I/O requests to complete. A high value here means requests are arriving faster than the disks can serve them and are being held in the queue, indicating degraded I/O performance.
The %idle parameter indicates the CPU idle time; a high value means the CPU is not busy. In the example above, the CPU spent 7.94% of its time waiting for I/O requests to complete, while 59.54% of its cycles were idle.
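The %iowait check described above can be scripted. The following is a minimal sketch: the avg-cpu sample from this article is embedded so the script runs standalone, and the 20% threshold is an illustrative assumption, not a fixed rule. In practice you would pipe live `iostat -c` output into the awk filter instead.

```shell
#!/bin/sh
# Parse the avg-cpu line of an iostat report and warn when %iowait
# exceeds a threshold. The sample report is embedded here so the
# script is self-contained; in practice pipe `iostat -c` into it.
IOWAIT_LIMIT=20   # illustrative threshold, tune for your workload

iostat_sample() {
cat <<'EOF'
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          28.01    0.33    4.18    7.94    0.00   59.54
EOF
}

# The data row follows the avg-cpu header; field 4 is %iowait.
iowait=$(iostat_sample | awk '/avg-cpu/ {getline; print $4}')
echo "current %iowait: $iowait"

# awk handles the floating-point comparison portably.
if awk -v v="$iowait" -v lim="$IOWAIT_LIMIT" 'BEGIN{exit !(v > lim)}'; then
    echo "WARNING: high iowait - the disk subsystem may be the bottleneck"
else
    echo "iowait within normal range"
fi
```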
Disk Statistics
The second portion of the output shows the I/O activity for each disk attached to the server; this is the part used for a Linux disk health check. The disk details alone can be obtained with the "-d" switch:
# iostat -xd
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 22.72 138.23 24.66 12.88 2227.01 1208.87 91.53 0.19 5.03 3.08 11.55
sda1 0.01 0.00 0.00 0.00 0.01 0.00 10.58 0.00 7.10 6.49 0.00
sda2 0.00 0.00 0.00 0.00 0.01 0.00 47.20 0.00 3.98 3.68 0.00
sda3 0.01 0.13 0.00 0.17 0.06 2.36 14.17 15.02 0.77 5857.96 99.98
sda4 0.00 0.00 0.00 0.00 0.00 0.00 2.00 1.00 0.00 89353908.00 99.99
sda5 22.71 138.10 24.66 12.71 2226.93 1206.51 91.88 0.19 5.05 3.09 11.55
The most crucial parameters in this result are svctm and %util. Let us see what they mean and why they matter.
svctm
The average number of milliseconds the device spent servicing a request, excluding the time the request waited in the queue. (The related await column reports the full round trip, queue time included.)
%util
This shows the device utilization, as the name implies: the percentage of time the device was busy servicing requests. As the value approaches 100%, the device is saturated.
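Picking out the saturated devices from an extended report can also be scripted. This is a minimal sketch with a few rows of the `iostat -xd` output from this article embedded for a self-contained demo; the 90% cutoff is an illustrative assumption. In practice, pipe live `iostat -xd` output into the awk filter.

```shell
#!/bin/sh
# Flag devices whose %util (last column of `iostat -xd`) exceeds a
# threshold. Sample rows from the report above are embedded so the
# script runs standalone.
sample() {
cat <<'EOF'
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 22.72 138.23 24.66 12.88 2227.01 1208.87 91.53 0.19 5.03 3.08 11.55
sda3 0.01 0.13 0.00 0.17 0.06 2.36 14.17 15.02 0.77 5857.96 99.98
sda4 0.00 0.00 0.00 0.00 0.00 0.00 2.00 1.00 0.00 89353908.00 99.99
EOF
}

# %util is the last field ($NF); skip the header row (NR > 1).
saturated=$(sample | awk 'NR > 1 && $NF+0 > 90 {print $1}')
echo "saturated devices: $saturated"
```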
As you can see from the above example, svctm and %util for sda3 and sda4 stay at very high levels, indicating heavy activity on those partitions. Now let us see which filesystems are mounted on these particular partitions.
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda5 3.9G 1008M 2.7G 27% /
/dev/sda8 948G 576G 323G 65% /home
/dev/sda6 2.0G 546M 1.4G 30% /tmp
/dev/sda3 30G 18G 10G 64% /usr
/dev/sda4 97G 57G 36G 62% /var
/dev/sda1 198M 25M 164M 13% /boot
tmpfs 12G 12K 12G 1% /dev/shm
Here the active partitions are /var and /usr. Now check which processes actively use these partitions. In this case it was MySQL abuse, and the MySQL data directory was configured under /var. Stopping the attack restored normal I/O activity.
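The step of mapping a busy device back to its mount point can be scripted as well. The sketch below embeds a few rows of the df output above so it runs standalone; in practice, parse live `df` output and then use tools such as `fuser -vm <mountpoint>` or `iotop` to see which processes generate the I/O.

```shell
#!/bin/sh
# Given a device name flagged by iostat's %util column, find its
# mount point from `df` output. Sample df rows from this article
# are embedded for a self-contained demo.
df_sample() {
cat <<'EOF'
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 30G 18G 10G 64% /usr
/dev/sda4 97G 57G 36G 62% /var
/dev/sda8 948G 576G 323G 65% /home
EOF
}

busy_dev=sda4   # device flagged as saturated by iostat
mount_point=$(df_sample | awk -v d="/dev/$busy_dev" '$1 == d {print $NF}')
echo "device $busy_dev is mounted on $mount_point"
# From here, `fuser -vm "$mount_point"` lists the processes holding
# files on that filesystem.
```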
[~]# iostat -x 2
avg-cpu: %user %nice %system %iowait %steal %idle
48.56 0.28 6.06 1.20 0.00 43.91
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 57.46 281.08 61.40 173.13 1898.67 3689.77 23.83 0.36 4.28 0.88 20.74
sda1 0.01 0.00 0.00 0.00 0.03 0.00 24.49 0.00 3.98 3.89 0.00
sda2 6.33 62.28 28.13 80.88 425.01 1145.95 14.41 0.19 7.64 1.06 11.59
sda3 3.79 10.71 2.54 9.77 122.00 163.85 23.21 0.13 10.45 0.87 1.07
sda4 0.00 0.00 0.00 0.00 0.00 0.00 2.00 0.00 9.19 9.19 0.00
sda5 0.32 1.29 0.11 0.58 4.12 15.01 27.51 0.00 4.65 2.41 0.17
sda6 0.36 94.54 0.16 30.63 4.92 1001.49 32.68 0.37 11.93 0.36 1.12
sda7 0.00 0.00 0.00 0.00 0.02 0.02 48.97 0.00 4.94 4.28 0.00
sda8 46.66 112.26 30.44 51.27 1342.56 1363.44 33.12 0.31 3.81 1.52 12.41
For a continuous Linux disk health check, specify the interval in seconds as an argument. The following example gives you output at two-second intervals:
iostat -xdm 2
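For unattended monitoring, the interval mode can be wrapped in a small logging script. This is a rough sketch under stated assumptions: the log path and report count are arbitrary choices for illustration, and iostat comes from the sysstat package, which may not be installed everywhere.

```shell
#!/bin/sh
# Append timestamped iostat extended-device reports to a log file
# for later review. Degrades gracefully when iostat (from the
# sysstat package) is not installed.
LOG=/tmp/iostat_health.log   # illustrative path
INTERVAL=2                   # seconds between reports
COUNT=2                      # number of reports; omit for an endless run

if command -v iostat >/dev/null 2>&1; then
    { date; iostat -xd "$INTERVAL" "$COUNT"; } >> "$LOG"
    status="logged"
else
    status="iostat missing"
fi
echo "$status"
```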