SWick

Sysadmin-by-Nature

Monitoring stale NFS and CIFS shares with monit
17th July 2014

A good way to check whether a network share such as NFS or CIFS is still available is to monitor an existing file on the share itself.

Doing this with SMB/CIFS is a little easier than with NFS when the NFS share is hard mounted.

With a hard mount, the check waits forever if the NFS server is not available and never returns an error. Blocked checks can also pile up in the process list.
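To see whether checks are already piling up, one option is to look for processes in uninterruptible sleep (state D), which is where a process blocked on a hard-mounted NFS share usually ends up. This is only a sketch: list_blocked is a name I made up for illustration, and it assumes the procps version of ps found on most Linux systems.

```shell
#!/bin/bash

# Hypothetical helper: print the ps header plus every process whose
# state starts with D (uninterruptible sleep), including elapsed time.
list_blocked() {
    ps -eo pid,stat,etime,args | awk 'NR == 1 || $2 ~ /^D/'
}

list_blocked
```

If several stat commands against the same mount point show up here with growing elapsed times, the share is most likely stale.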

Here is an example that makes it work by using /usr/bin/timeout.

timeout will run stat to check for the file but kill the stat command if it does not return within a specified time period.
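The behaviour of timeout can be tried out without a broken share. In this small sketch, sleep stands in for a stat call that never returns, and /nonexistent/test.txt is just a placeholder path. The variable names are mine, and GNU coreutils timeout is assumed.

```shell
#!/bin/bash

# Simulate a hanging check: sleep 5 plays the role of a blocked stat.
# GNU timeout exits with status 124 when it had to stop the command.
rc_hang=0
timeout 1 sleep 5 || rc_hang=$?
echo "hanging command: exit status $rc_hang"

# A missing file fails immediately with stat's own non-zero status,
# so it can be told apart from a timeout.
rc_missing=0
timeout 1 stat -t /nonexistent/test.txt > /dev/null 2>&1 || rc_missing=$?
echo "missing file: exit status $rc_missing"
```

This difference is what allows a check to report "timed out" and "file not found" separately.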

For demonstration purposes I use monit, but the same can be done with any other monitoring solution such as OpenNMS, for example by executing the check through net-snmp's extend feature.

apt-get install monit

Monit Version >= 5.7

Since monit 5.7, "check program" supports arguments.

/etc/monit/conf.d/cifs_nfs

check program CIFS with path "/usr/bin/timeout 1 /usr/bin/stat -t /media/cifs/test.txt"
  if status != 0 then alert

check program NFS with path "/usr/bin/timeout 1 /usr/bin/stat -t /media/nfs/test.txt"
  if status != 0 then alert

Monit Version < 5.7

With older versions of monit, you have to use a wrapper script for the check. The scripts must be executable (chmod +x), otherwise monit cannot run them.

mkdir /etc/monit/check_scripts/

/etc/monit/conf.d/cifs_nfs

check program CIFS with path "/etc/monit/check_scripts/check_stale_cifs.sh"
  if status != 0 then alert

check program NFS with path "/etc/monit/check_scripts/check_stale_nfs.sh"
  if status != 0 then alert

/etc/monit/check_scripts/check_stale_cifs.sh

#!/bin/bash

CHECK_FILE="/media/cifs/test.txt"
TIMEOUT=1

BIN_TIMEOUT=/usr/bin/timeout
BIN_STAT=/usr/bin/stat

"$BIN_TIMEOUT" "$TIMEOUT" "$BIN_STAT" -t "$CHECK_FILE" > /dev/null 2> /dev/null

RETVAL=$?

[ $RETVAL -eq 0   ] && echo "Ok. Found $CHECK_FILE" && exit $RETVAL
[ $RETVAL -eq 124 ] && echo "Timed out checking for $CHECK_FILE" >&2 && exit $RETVAL
[ $RETVAL -ne 0   ] && echo "Could not find $CHECK_FILE" >&2 && exit $RETVAL

/etc/monit/check_scripts/check_stale_nfs.sh

#!/bin/bash

CHECK_FILE="/media/nfs/test.txt"
TIMEOUT=1

BIN_TIMEOUT=/usr/bin/timeout
BIN_STAT=/usr/bin/stat

"$BIN_TIMEOUT" "$TIMEOUT" "$BIN_STAT" -t "$CHECK_FILE" > /dev/null 2> /dev/null

RETVAL=$?

[ $RETVAL -eq 0   ] && echo "Ok. Found $CHECK_FILE" && exit $RETVAL
[ $RETVAL -eq 124 ] && echo "Timed out checking for $CHECK_FILE" >&2 && exit $RETVAL
[ $RETVAL -ne 0   ] && echo "Could not find $CHECK_FILE" >&2 && exit $RETVAL
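
The two scripts differ only in CHECK_FILE, so the shared logic could live in one function. What follows is a minimal sketch, not a tested drop-in replacement: check_share and its optional second argument are names I picked for illustration.

```shell
#!/bin/bash

# Hypothetical helper: check_share FILE [TIMEOUT_SECONDS]
# Runs stat under timeout and returns the resulting exit status.
check_share() {
    local check_file="$1"
    local limit="${2:-1}"   # default timeout of 1 second, as in the scripts above
    local retval=0

    timeout "$limit" stat -t "$check_file" > /dev/null 2>&1 || retval=$?

    if [ "$retval" -eq 0 ]; then
        echo "Ok. Found $check_file"
    elif [ "$retval" -eq 124 ]; then
        # 124 is the status GNU timeout uses when the command timed out
        echo "Timed out checking for $check_file" >&2
    else
        echo "Could not find $check_file" >&2
    fi

    return "$retval"
}
```

Because monit older than 5.7 cannot pass arguments, one small wrapper per share is still needed. Each wrapper would then only source the file containing the function, call for example check_share /media/nfs/test.txt, and finish with exit $?.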