Environment
- Red Hat Enterprise Linux (RHEL) 6 Update 2 or later with the High Availability Add On
- fence_scsi configured as the fence agent
- System configured to use the fence_scsi_check.pl watchdog
Resolution
1) The fence_scsi_check watchdog script should trigger a reboot when a clusternode has been successfully fenced via the fence_scsi agent. To test this, use the fence_node utility. The cluster node that was fenced should reboot itself.
# fence_node <nodename>
2) Verify that a reboot will not occur if a clusternode is still registered with any device listed in the fence_scsi.dev file. A reboot is triggered only when a clusternode's key is no longer registered with any of the devices listed in the fence_scsi.dev file.
This test is similar to test #1, but instead of using fence_node to remove a clusternode's key from all the devices at once, remove the clusternode's key from the devices in multiple steps.
Assume that a node is registered with three devices: /dev/sda, /dev/sdb, and /dev/sdc.
# fence_scsi -o off -n <nodename> -d /dev/sda
# fence_scsi -o off -n <nodename> -d /dev/sdb
After removing a node's key from two of the three devices, no reboot should be triggered.
# fence_scsi -o off -n <nodename> -d /dev/sdc
After removing a node's key from the last device, a reboot should be triggered.
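The rule exercised by test #2 can be sketched as a small shell function. This is an illustration only, not the source of fence_scsi_check. The function names read_keys and key_registered_anywhere are hypothetical, and the sketch assumes that sg_persist (from sg3_utils) is installed and prints each registered key on its own line in 0x-prefixed hexadecimal.

```shell
# Illustrative sketch only; not the actual fence_scsi_check script.

# read_keys DEVICE: print the keys registered on DEVICE, one per line.
# Assumption: sg_persist prints each key alone on a line as 0x<hex>.
# Lines with more than one field (such as the header) are ignored.
read_keys() {
    sg_persist -n -i -k -d "$1" 2>/dev/null |
        awk 'NF == 1 && $1 ~ /^0x[0-9a-fA-F]+$/ { print $1 }'
}

# key_registered_anywhere KEY DEVICE...
# Returns 0 if KEY is still registered on at least one device,
# and 1 if it is missing from every device (the reboot case).
key_registered_anywhere() {
    key=$1
    shift
    for dev in "$@"; do
        # Compare case-insensitively, with or without a 0x prefix on KEY.
        if read_keys "$dev" | grep -qix "0x${key#0x}"; then
            return 0
        fi
    done
    return 1
}
```

With three devices, key_registered_anywhere <key> /dev/sda /dev/sdb /dev/sdc should keep returning 0 after the key is removed from /dev/sda and /dev/sdb, and return 1 only after it is also removed from /dev/sdc.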
3) The fence_scsi_check watchdog script should not trigger a reboot if registration fails.
If registration (unfencing) fails, neither the fence_scsi.key nor the fence_scsi.dev file should exist, and the fence_scsi_check watchdog script should exit without error.
This can be tested in at least two ways. One option is to manually define the devices you wish to register with in the /etc/cluster/cluster.conf file. In this list of devices, specify either non-existent devices or devices that do not support SCSI-3 persistent reservations. Another option is to manually remove the fence_scsi.key file and/or the fence_scsi.dev file.
4) The fence_scsi_check watchdog script should not trigger a reboot if the watchdog service is running but the cman service has not been started.
There must be a distinction between a clusternode that has been fenced (i.e. it was registered with the devices, but now is not) and a clusternode that has never been registered with the devices. The watchdog script should not trigger a reboot unless the clusternode lost its registrations. This test also depends on the existence of the fence_scsi.key and fence_scsi.dev files.
To test this, start the watchdog service without starting the cman service. Neither the fence_scsi.dev file nor the fence_scsi.key file should exist, so there are no registrations to check and no reboot should be triggered.
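The precondition behind tests #3 and #4 can be sketched as a simple file check. This is a minimal illustration, not the actual watchdog script. The function name have_registration_state is hypothetical, and the default paths under /var/lib/cluster are an assumption, since the article does not state where the files live. Adjust them for your system.

```shell
# Illustrative sketch only; not the actual fence_scsi_check script.

# Assumed default locations (not stated in this article).
KEY_FILE=${KEY_FILE:-/var/lib/cluster/fence_scsi.key}
DEV_FILE=${DEV_FILE:-/var/lib/cluster/fence_scsi.dev}

# have_registration_state [KEY_FILE] [DEV_FILE]
# Returns 0 only if both files exist and are readable, meaning the node
# registered at some point and its keys can be checked. Returns non-zero
# if either file is missing: the node never registered, so there is
# nothing to check and no reason to reboot.
have_registration_state() {
    [ -r "${1:-$KEY_FILE}" ] && [ -r "${2:-$DEV_FILE}" ]
}

if have_registration_state; then
    echo "registration files present: registrations should be checked"
else
    echo "registration files missing: nothing to check, no reboot"
fi
```

Running this on a node where only the watchdog service has been started, and cman has not, should print the second message if the test conditions hold.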