
ZFS Monitoring and Cacti

Now that we have been using ZFS (using OI + Napp-It) on a regular basis, we need to be able to produce at least some usage
history. Having used Cacti in the past, it seemed like a quick and easy option.
NET-SNMP was installed by default on Open Indiana; a quick edit of snmpd.conf and an enable did the trick:
vi /etc/net-snmp/snmp/snmpd.conf
svcadm enable net-snmp

If you make any changes, use the restart command:


svcadm restart net-snmp
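
To confirm the daemon is actually up and answering before pointing Cacti at it, a quick check doesn't hurt (the "public" community string below is just the default assumption, substitute whatever you set in snmpd.conf):

svcs net-snmp
snmpget -v 2c -c public localhost .1.3.6.1.2.1.1.1.0

The snmpget should come back with the sysDescr string for the box; if it times out, go back and double-check the community and agent address settings.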

Last week we set up a Cacti monitoring server and basic SNMP on our Open Indiana test machine. Some basic Linux statistics
are provided out of the box, such as interface traffic monitoring, which is nice but does not offer a lot of helpful information about
the ZFS storage pools we want to monitor.
I began my search looking for a decent monitoring solution for ZFS via SNMP. Sadly, I found that there was
very little out there. The best resource I found was actually on a French site, http://www.hypervisor.fr/?p=3828
The post outlines a few simple steps for getting started with a very basic storage space usage graph of any zpools in the
system using the zpool list command. I modified it slightly, putting the commands in scripts so I wouldn't have to
restart snmpd every time I made a change. From here on out, I store my scripts in /opt/utils/
/opt/utils/zpools_name.sh:
zpool list -H -o name
/opt/utils/zpools_capacity.sh:
zpool list -H -o capacity | sed -e 's/%//g'
The first script simply returns the name of each individual zpool (one per line). The second script returns the amount of
space used as a percentage for each individual zpool (one per line) and then strips the % from the string result.
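Running the two scripts by hand is a good sanity check before wiring them into SNMP. The pools and percentages below are just illustrative output from an idle test box:

# /usr/gnu/bin/sh /opt/utils/zpools_name.sh
rpool
storage
# /usr/gnu/bin/sh /opt/utils/zpools_capacity.sh
2
0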
So far pretty easy. Now to add the two commands to our /etc/net-snmp/snmp/snmpd.conf file:

########################################################################################
# SNMP : zpool capacity
########################################################################################

extend .1.3.6.1.4.1.2021.87 zpool_name /usr/gnu/bin/sh /opt/utils/zpools_name.sh


extend .1.3.6.1.4.1.2021.87 zpool_capacity /usr/gnu/bin/sh /opt/utils/zpools_capacity.sh
The base OID .1.3.6.1.4.1.2021 is basically an arbitrary OID that's registered to UC Davis; as far as I know, only the sub-ID .9 is used
for some disk checks, so using .87 and letting NET-SNMP append the extend name converted to ASCII guarantees us a fairly unique OID.
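You can see the encoding in action by walking the extend output table directly. Assuming the default "public" community (the pools and values below are only illustrative), the extend name zpool_name shows up as the length-prefixed ASCII index 10.122.112.111.111.108.95.110.97.109.101:

~# snmpwalk -v 2c -c public -On localhost .1.3.6.1.4.1.2021.87.4.1.2
.1.3.6.1.4.1.2021.87.4.1.2.10.122.112.111.111.108.95.110.97.109.101.1 = STRING: rpool
.1.3.6.1.4.1.2021.87.4.1.2.10.122.112.111.111.108.95.110.97.109.101.2 = STRING: storage
.1.3.6.1.4.1.2021.87.4.1.2.14.122.112.111.111.108.95.99.97.112.97.99.105.116.121.1 = STRING: 2
.1.3.6.1.4.1.2021.87.4.1.2.14.122.112.111.111.108.95.99.97.112.97.99.105.116.121.2 = STRING: 0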
At this point we had to copy/create a zpools_capacity.xml file in the resource/snmp_queries/ folder within Cacti:
<interface>
    <name>Get ZFS zpool capacity</name>
    <index_order_type>numeric</index_order_type>
    <oid_index>.1.3.6.1.4.1.2021.87.3</oid_index>
    <oid_num_indexes>.1.3.6.1.4.1.2021.87.3</oid_num_indexes>

    <fields>
        <ZpoolName>
            <name>Name</name>
            <method>walk</method>
            <source>value</source>
            <direction>input</direction>
            <oid>.1.3.6.1.4.1.2021.87.4.1.2.10.122.112.111.111.108.95.110.97.109.101</oid>
        </ZpoolName>
        <ZpoolCapacity>
            <name>Capacity</name>
            <method>walk</method>
            <source>value</source>
            <direction>output</direction>
            <oid>.1.3.6.1.4.1.2021.87.4.1.2.14.122.112.111.111.108.95.99.97.112.97.99.105.116.121</oid>
        </ZpoolCapacity>
    </fields>
</interface>
At this point you can build your own Data Query and Graph Template, or use the ones the post also provides. We went
ahead and used the provided templates to give it a shot.
This worked fairly well as a base test and gives us easy monitoring of capacity usage (though on its own it is still not all that useful).

After seeing a basic example of creating a custom OID for the zpool capacity, I really wanted to extend some additional
statistics that are available using the zpool iostat command.
In its simplest form, you can use the following command to see two sets of statistics on a zpool:
# zpool iostat zpool_name 5 2
              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
zpool_name  30.7G   247G      9     11  1.14M  1.32M
zpool_name  30.7G   247G      0      0      0      0
The command is very basic: run iostat on a storage pool named zpool_name, sample at a 5 second interval, and show
results for two iterations.
As a quick rundown (ignoring the first three lines, which are simply headings), we have two lines of actual information. The
first line is an average since the system was powered on, and the second shows the statistics for the 5 second interval
after the command was run.
Since we are polling for SNMP information on approximately a 5 minute interval, we are really only interested in the very last line.
We can use sed, or a combination of grep and sed, to isolate a single line, something like:
zpool iostat zpool_name 5 2 | grep zpool_name | sed -n 2p
Grep will return all lines containing zpool_name, and then sed will only show the second of those results.
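If you would rather do it in one tool, an awk one-liner along these lines should isolate the same line (just a sketch, not what the original post used):

zpool iostat zpool_name 5 2 | awk '$1 == "zpool_name" { last = $0 } END { print last }'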
Since this will take a minimum of 5 seconds to run to get the correct stats, the best option I can think of is to store the
results in a cache file that SNMP can read from, and then have the script run as a cron job every few minutes. The data will be
at most a few minutes stale at every poll. It's not perfect, but it's close enough.
At this point we can combine a little Linux magic into a script that we can add to the crontab to do most of the work for us.
The script below will produce a /tmp/iostat.cache that is used for returning the SNMP data via NET-SNMP.
/opt/utils/iostats-create.sh:
#!/bin/bash
#
###############################################################
#
# Add the line below to crontab to create the iostat.cache
#
# * * * * * cd /tmp && /opt/utils/iostats-create.sh pool_name > io.tmp && mv io.tmp iostat.cache
#
###############################################################

let x=1
zpool iostat $1 5 2 | grep $1 | sed -n 2p |
while read line; do
    for i in $line; do
        if [ $x -eq 1 ]; then   # the first item is the storage pool name
            echo $i             # print it as-is
        else                    # otherwise convert the value to bytes
            echo $i | sed -e "s/K/*1024/g;s/M/*1024*1024/;s/G/*1024*1024*1024/;s/T/*1024*1024*1024*1024/" | bc | sed "s/[.].*//"
        fi
        ((x++))
    done
done
Since the data we're given is already converted to human-readable values, such as 1.2K, we're fairly limited on accuracy,
but it's better than nothing, and we can do our best to make it work.
Cacti needs all of the information in bytes to do its scaling (especially for some of the data like IOPS that will range from 0 to
~10K in our case).
Most of the script reads pretty well other than the sed command, which converts any number from KB/MB/GB/etc. to bytes by
substituting each suffix letter with the appropriate math (ex: substitute K with *1024) and then piping that into bc to get the
result. I threw an extra sed on at the end to remove anything after the decimal point to keep it clean for Cacti.
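As a quick illustration of that conversion, feeding one of the bandwidth values from the sample output above through the same pipeline gives:

# echo "1.14M" | sed -e "s/K/*1024/g;s/M/*1024*1024/;s/G/*1024*1024*1024/;s/T/*1024*1024*1024*1024/" | bc | sed "s/[.].*//"
1195376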
The script contains the information required to add to the crontab. Once the system runs for a bit, you should eventually see
something like this in your /tmp/iostat.cache file:
# cat /tmp/iostat.cache
zpool_name
32963873996
265214230528
0
0
0
0
Since this is a test system, there's really nothing going on for me right now, so ignore the zeros.
For fun, I also started a C++ program that does the same thing as the script. While I was able to knock it out faster than the
time it took to learn how to do loops and ifs in bash (plus some nasty sed), the program is way uglier. However, it was
fun to do:
/**********************************************
* iostat.cpp
* --------------------------------------------
*
* Calls the zpool iostat command using:
*
* zpool iostat $1 5 2 | grep $1 | sed -n 2p
*
* --------------------------------------------
*
* Converts the output of zpool iostat from
* K/M/G/T to Bytes for use by net-snmp
*
*********************************************/

#include <iostream>
#include <iomanip>
#include <fstream>
#include <sstream>
#include <string>
#include <algorithm>   // std::remove
#include <cstdio>      // popen, pclose, fgets, FILE
#include <cstdlib>     // atof
using namespace std;

void print_iostats(string filename, string name, double iostats[]);


string exec(char* cmd);

int main(int argc, const char* argv[])
{
    string pool_name, line, result, sub, cmd, filename;
    double iostats[8];
    int i = 0;

    if ( argc < 2 )
    {
        cout << endl;
        cout << "ERROR: " << argv[0] << " requires an input parameter." << endl;
        cout << endl;
        cout << "usage: " << argv[0] << " tank" << endl;
        cout << "       where 'tank' is the name of a zfs pool." << endl;
        cout << endl;
    }
    else
    {
        if ( argc > 2 )
        {
            filename = argv[2];
        }
        else
        {
            filename = "/tmp/io.cache";
            cout << "No output filename provided." << endl;
            cout << "Using: " << filename << endl;
        }

        // set the pool name based on the given parameter
        pool_name = argv[1];

        cmd = "zpool iostat " + pool_name + " 5 2 | grep " + pool_name + " | sed -n 2p";
        result = exec( (char*)cmd.c_str() );

        // strip the trailing newline from the command output
        result.erase(remove(result.begin(), result.end(), '\n'), result.end());

        istringstream iss(result);

        while (iss >> sub)
        {
            if (i > 0)  // skip the storage pool name (first column)
            {
                // store columns 2 - 7 into iostats[0-5]:
                // alloc space, free space, read ops, write ops, read bandwidth, write bandwidth
                iostats[i-1] = atof( sub.c_str() );

                // convert to bytes (rough approximation based on available information)
                if (sub[sub.length()-1] == 'K') { iostats[i-1] = iostats[i-1]*1024; }
                if (sub[sub.length()-1] == 'M') { iostats[i-1] = iostats[i-1]*1024*1024; }
                if (sub[sub.length()-1] == 'G') { iostats[i-1] = iostats[i-1]*1024*1024*1024; }
                if (sub[sub.length()-1] == 'T') { iostats[i-1] = iostats[i-1]*1024*1024*1024*1024; }
            }
            i++;
        }
        print_iostats(filename, pool_name, iostats);
    }

    return 0;
}

void print_iostats(string filename, string name, double iostats[])
{
    ofstream iocache;
    iocache.open( filename.c_str() );
    iocache << name << endl;
    iocache << fixed << setprecision(0);
    for (int i = 0; i < 6; i++)   // alloc, free, read ops, write ops, read bw, write bw
    {
        iocache << iostats[i] << endl;
    }

    iocache.close();
}

string exec(char* cmd)
{
    FILE* pipe = popen(cmd, "r");
    if (!pipe) return "ERROR";

    char buffer[128];
    string result = "";
    while (!feof(pipe)) {
        if (fgets(buffer, 128, pipe) != NULL)
            result += buffer;
    }
    pclose(pipe);
    return result;
}
Like I said, it was a quick and ugly hack-job of a solution. I take no pride in it, other than it makes that simple bash script
seem pretty nice.
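If you do want to try it anyway, it only uses the standard library, so building and running it should be as simple as this (assuming g++ is installed; the second argument is optional and falls back to /tmp/io.cache):

# g++ -o iostat iostat.cpp
# ./iostat zpool_name /tmp/io.cache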
Next, to take the data stored in iostat.cache and do something useful with it.

After fumbling around a bit with bash in Part 3, I got to a point where I could graph a single storage pool using the data that
was being updated in our /tmp/iostat.cache file. While this is great for small ZFS setups, what happens if you have more
than one pool?
Since I was creating this custom solution from scratch anyway, I figured I might as well make it a bit more flexible. I
scrapped my previous script since it was very focused on a single pool, and set out to get stats for all pools. I used the zpool
capacity work done by NiTRo at hypervisor.fr as an example of how to collect stats for multiple pools and expanded my solution
from the other day.
I started by creating a basic script to create a new iostat.cache file called iostats_make_cache.sh:
#!/bin/sh
#
###############################################################
#
# Add the line below to crontab to create the iostat.cache
#
# * * * * * cd /opt/utils/iostats && /opt/utils/iostats/iostats_make_cache.sh iostat.cache
#
###############################################################
#
# This script will produce the file iostat.cache that will be
# used by the various /opt/utils/iostats/* scripts for returning
# SNMP data via net-snmp.
#
###############################################################

FILENAME=$1                                           # use the file name from the parameter
#
zpool iostat 5 2 | sed '/-----/d' | sed '1,2d' > $1   # dump zpool stats of all pools (headings
                                                      # removed) from a 5 second interval to $1
#
N=`wc -l < $1`                                        # get number of lines
((N=$N/2));                                           # divide by 2
#
sed -i '1,'$N'd' $1                                   # delete first half of file (the stats since power on)
This very basic script takes a single parameter for the output filename. It runs the zpool iostat command for a 5
second interval and strips the output down, removing the formatting, the line separators, and the useless since-power-on
I/O stats, which are not helpful for real-time information.
The resulting iostat.cache file looks like:
rpool     11.5G   533G    0   15      0  66.6K
storage    258K   278G    0    0      0      0
Very basic, but we have an easy-to-access file that has up-to-date I/O stats.
Next, I created a very simple script called iostats_get_item.sh to retrieve an individual item for all storage pools (such as
Read IOPS) from the iostat.cache file:
#!/bin/bash
#
###############################################################
#
# Usage: ./iostats_get_item.sh <#> <io_cache.file>
#
# The parameter # has the following results
#
# 1 - Storage Pool Name
# 2 - Allocated Space
# 3 - Free Space
# 4 - Read IOPS
# 5 - Write IOPS
# 6 - Read Bandwidth
# 7 - Write Bandwidth
#
# Ex: ./iostats_get_item.sh 1 /tmp/iostat.cache
#
###############################################################

FILENAME=$2
let x=1
cat $2 |
while read line; do
    for i in $line; do
        if [ $x -eq $1 ]; then          # if this is the requested column
            if [ $1 -eq 1 ]; then
                echo $i                 # print storage pool name
            else                        # otherwise convert the value to bytes
                echo $i | sed -e "s/K/*1024/g;s/M/*1024*1024/;s/G/*1024*1024*1024/;s/T/*1024*1024*1024*1024/" | bc | sed 's/[.].*//'
            fi
        fi
        ((x++))
    done
    ((x=1))
done
Using this script, I can now easily return the information for each individual stat via SNMP for all zpools, ex:
~#: ./iostats_get_item.sh 1 iostat.cache
rpool
storage
~#: ./iostats_get_item.sh 2 iostat.cache
12348030976
264192

At this point you can use the NET-SNMP-EXTEND-MIB to extend the data by calling the script and passing the appropriate
parameters. Unfortunately, NET-SNMP extend was a little flaky with passing parameters; depending on how I used the
extend/pass features, I had mixed results.
To guarantee things work, and to keep things more readable, we created individual script files to run the actual commands:
~#: cat getName.sh
/opt/utils/iostats/iostats_get_item.sh 1 /opt/utils/iostats/iostat.cache

~# cat getAllocSpace.sh
/opt/utils/iostats/iostats_get_item.sh 2 /opt/utils/iostats/iostat.cache

~# cat getFreeSpace.sh
/opt/utils/iostats/iostats_get_item.sh 3 /opt/utils/iostats/iostat.cache

~# cat getReadIOP.sh
/opt/utils/iostats/iostats_get_item.sh 4 /opt/utils/iostats/iostat.cache

~# cat getWriteIOP.sh
/opt/utils/iostats/iostats_get_item.sh 5 /opt/utils/iostats/iostat.cache

~# cat getReadBand.sh
/opt/utils/iostats/iostats_get_item.sh 6 /opt/utils/iostats/iostat.cache

~# cat getWriteBand.sh
/opt/utils/iostats/iostats_get_item.sh 7 /opt/utils/iostats/iostat.cache

It seemed a little self-defeating to wrap each item in its own file after putting the work into a single script that can fetch
any individual item, but at least this way we know it will always work as intended, and I still got to use the script.
At this point, simply add the following lines to your SNMP configuration file:
########################################################################################
# Capacity
########################################################################################

extend .1.3.6.1.4.1.2021.87 zpool_name /usr/gnu/bin/sh /opt/utils/capacity/zpools_name.sh
extend .1.3.6.1.4.1.2021.87 zpool_capacity /usr/gnu/bin/sh /opt/utils/capacity/zpools_capacity.sh

extend .1.3.6.1.4.1.2021.88 zio_name /usr/gnu/bin/sh /opt/utils/iostats/getName.sh
extend .1.3.6.1.4.1.2021.88 zio_allo /usr/gnu/bin/sh /opt/utils/iostats/getAllocSpace.sh
extend .1.3.6.1.4.1.2021.88 zio_free /usr/gnu/bin/sh /opt/utils/iostats/getFreeSpace.sh

########################################################################################
# IO STATS
########################################################################################

############################# IOPS ############################################

extend .1.3.6.1.4.1.2021.89 zio_name /usr/gnu/bin/sh /opt/utils/iostats/getName.sh
extend .1.3.6.1.4.1.2021.89 zio_readIOPs /usr/gnu/bin/sh /opt/utils/iostats/getReadIOP.sh
extend .1.3.6.1.4.1.2021.89 zio_writeIOPs /usr/gnu/bin/sh /opt/utils/iostats/getWriteIOP.sh

############################ Bandwidth ########################################

extend .1.3.6.1.4.1.2021.90 zio_name /usr/gnu/bin/sh /opt/utils/iostats/getName.sh
extend .1.3.6.1.4.1.2021.90 zio_readBand /usr/gnu/bin/sh /opt/utils/iostats/getReadBand.sh
extend .1.3.6.1.4.1.2021.90 zio_writeBand /usr/gnu/bin/sh /opt/utils/iostats/getWriteBand.sh
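Before heading back into Cacti, it is worth walking one of the new sub-trees (after restarting net-snmp) to confirm that each extend returns one value per pool. The host, community string and values below are just placeholders from my test box:

~# snmpwalk -v 2c -c public -On localhost .1.3.6.1.4.1.2021.89.4.1.2
.1.3.6.1.4.1.2021.89.4.1.2.8.122.105.111.95.110.97.109.101.1 = STRING: rpool
.1.3.6.1.4.1.2021.89.4.1.2.8.122.105.111.95.110.97.109.101.2 = STRING: storage
(the zio_readIOPs and zio_writeIOPs lines follow in the same way)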
Now we're ready to make some graphs!

I haven't had as much time as I would have liked to post more information on gathering I/O stats with zpool iostat and
presenting them via SNMP. However, I have a couple of test graphs up and running to see how the current stat gathering is
working, and I figured I would show some charts.
Mmmm charts.
