Package Configuration Planning
Planning for packages involves assembling information about each group of highly available services.
You may need to use logical volumes in volume groups as part of the infrastructure for package operations on a cluster. When the package moves from one node to another, it must be able to access data residing on the same disk as on the previous node. This is accomplished by activating the volume group and mounting the file system that resides on it. In Serviceguard, high availability applications, services, and data are located in volume groups that are on a shared bus. When a node fails, the volume groups containing the applications, services, and data of the failed node are deactivated on the failed node and activated on the adoptive node. In order for this to happen, you must configure the volume groups so that they can be transferred from the failed node to the adoptive node. As part of planning, you need to decide the following:
Create a list by package of volume groups, logical volumes, and file systems. Indicate which nodes need to have access to common file systems at different times. HP recommends that you use customized logical volume names that are different from the default logical volume names (lvol1, lvol2, etc.). Choosing logical volume names that represent the high availability applications that they are associated with (for example, lvoldatabase) will simplify cluster administration. To further document your package-related volume groups, logical volumes, and file systems on each node, you can add commented lines to the /etc/fstab file. The following is an example for a database application:
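The following sketch shows what such commented entries might look like; the volume group vg01, the logical volume names, the mount points, and the package name pkg_db are examples only, and because the lines are commented out they are purely informational:

# /dev/vg01/lvoldatabase /applic1 vxfs defaults 0 1  # database file system for pkg_db
# /dev/vg01/lvoldbindex  /applic2 vxfs defaults 0 1  # index file system for pkg_db
# /dev/vg01/lvoldbraw    raw      ignore ignore 0 0  # raw logical volume used by pkg_db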
Create an entry for each logical volume, indicating its use for a file system or for a raw device. Don’t forget to comment out the lines (using the # character as shown).
For a failover package that uses CVM (Cluster Volume Manager) or CFS (Cluster File System), you configure system multi-node packages to handle the volume groups and file systems.
Veritas Cluster Volume Manager 3.5 uses the system multi-node package VxVM-CVM-pkg to manage the cluster’s volumes. For CVM 3.5 and VxVM-CVM-pkg you configure a single heartbeat network; multiple heartbeats are not supported, and neither is using APA, Infiniband, or VLAN interfaces as the heartbeat network.

Veritas Cluster Volume Manager 4.1 and later uses the system multi-node package SG-CFS-pkg to manage the cluster’s volumes. CVM 4.1 and later and the SG-CFS-pkg require you to configure either multiple heartbeat networks or a single heartbeat with a standby; using APA, Infiniband, or VLAN interfaces as the heartbeat network is not supported.

CFS (Veritas Cluster File System) is supported for use with Veritas Cluster Volume Manager Version 4.1 and later. The system multi-node package SG-CFS-pkg manages the cluster’s volumes, and two sets of multi-node packages are also used: the CFS mount packages, SG-CFS-MP-id#, and the CFS disk group packages, SG-CFS-DG-id#. Create these multi-node packages with the cfs family of commands; do not edit their configuration files. You then create a chain of package dependencies linking each application failover package to these non-failover packages: the failover package depends on the CFS mount packages, which depend on the disk group packages, which in turn depend on SG-CFS-pkg.
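For example, the commands that create the disk group and mount point packages might look like the following sketch; the disk group name logdata, the volume name log_files, and the mount point /tmp/logdata/log_files are hypothetical, and you should check the cfsdgadm, cfsmntadm, and cfsmount manpages on your system for the exact syntax:

cfsdgadm add logdata all=sw
# creates the SG-CFS-DG-id# package for the disk group, activated shared-write on all nodes
cfsmntadm add logdata log_files /tmp/logdata/log_files all=rw
# creates the SG-CFS-MP-id# package for the mount point
cfsmount /tmp/logdata/log_files
# mounts the cluster file system on the configured nodes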
You can add packages to a running cluster; this process is described in Chapter 7 “Cluster and Package Maintenance”. When adding packages, be sure not to exceed the value of max_configured_packages as defined in the cluster configuration file (see “Cluster Configuration Parameters”). You can modify this parameter while the cluster is running if you need to.

To determine the failover behavior of a failover package (see “Package Types”), you define the policy that governs where Serviceguard will automatically start up a package that is not running. In addition, you define a failback policy that determines whether a package will be automatically returned to its primary node when that is possible. Table 4-2, Package Failover Behavior, describes the different types of failover behavior and the settings in the package configuration file that determine each behavior. See “Package Parameter Explanations” for more information.
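As a sketch of one common combination, the following package configuration file entries cause the package to start on the first available node in its node_name list and to stay where it is rather than failing back automatically:

failover_policy configured_node
failback_policy manual
auto_run yes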
Failover packages can also be configured so that IP addresses switch from a failed LAN card to a standby LAN card on the same node and the same physical subnet. To manage this behavior, use the parameter local_lan_failover_allowed in the package configuration file. (yes, meaning enabled, is the default.)
Serviceguard provides a set of parameters for configuring EMS (Event Monitoring Service) resources. These are resource_name, resource_polling_interval, resource_start, and resource_up_value. Configure each of these parameters in the package configuration file for each resource the package will be dependent on. The resource_start parameter determines when Serviceguard starts up resource monitoring for EMS resources. resource_start can be set to either automatic or deferred. Serviceguard will start up resource monitoring for automatic resources automatically when the Serviceguard cluster daemon starts up on the node. Serviceguard will not attempt to start deferred resource monitoring during node startup, but will start monitoring these resources when the package runs. The following is an example of how to configure deferred and automatic resources.
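A sketch of such entries in the package configuration file follows; the EMS resource paths shown are placeholders, and you should use the resource names that actually exist on your system:

resource_name /net/interfaces/lan/status/lan0
resource_polling_interval 60
resource_start deferred
resource_up_value = up

resource_name /net/interfaces/lan/status/lan1
resource_polling_interval 60
resource_start automatic
resource_up_value = up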
Starting in Serviceguard A.11.17, a package can have dependencies on other packages, meaning the package will not start on a node unless the packages it depends on are running on that node. In Serviceguard A.11.17, package dependencies are supported only for use with certain applications specified by HP, such as the multi-node and system multi-node packages that HP supplies for use with Veritas Cluster File System (CFS) on systems that support it. As of Serviceguard A.11.18, package dependency is no longer restricted; you can make a package dependent on any other package or packages running on the same cluster node, subject to the restrictions spelled out in Chapter 6, under “dependency_condition”. Make a package dependent on another package if the first package cannot (or should not) function without the services provided by the second. For example, pkg1 might run a real-time web interface to a database managed by pkg2. In this case it might make sense to make pkg1 dependent on pkg2. In considering whether or not to create a dependency between packages, consider the “Rules” and “Guidelines” that follow. Assume that we want to make pkg1 depend on pkg2.
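For a modular package, such a dependency is declared in the dependent package’s configuration file with entries like the following sketch; pkg2_dep is an arbitrary label, and the full rules are given under “dependency_condition” in Chapter 6:

dependency_name pkg2_dep
dependency_condition pkg2 = up
dependency_location same_node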
The priority parameter gives you a way to influence the startup, failover, and failback behavior of a set of failover packages that have a configured_node failover_policy, when one or more of those packages depend on another or others. The broad rule is that a higher-priority package can drag a lower-priority package, forcing it to start on, or move to, a node that suits the higher-priority package.
Keep in mind that you do not have to set priority, even when one or more packages depend on another. The default value, no_priority, may often result in the behavior you want. For example, if pkg1 depends on pkg2, and priority is set to no_priority for both packages, and other parameters such as node_name and auto_run are set as recommended in this section, then pkg1 will normally follow pkg2 to wherever both can run, and this is the common-sense (and may be the most desirable) outcome. The following examples express the rules as they apply to two failover packages whose “failover_policy” is configured_node. Assume pkg1 depends on pkg2, that node1, node2 and node3 are all specified (in some order) under “node_name” in the configuration file for each package, and that “failback_policy” is set to automatic for each package.
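For instance, the configuration entries these examples assume might look like the following sketch. Both packages default to no_priority; the comments show how you might instead give pkg1 the higher priority (in Serviceguard a smaller number indicates a higher priority):

# pkg2 (the package depended on)
node_name node1
node_name node2
node_name node3
failover_policy configured_node
failback_policy automatic
# to set an explicit priority instead, you could use, for example: priority 20
priority no_priority

# pkg1 (depends on pkg2; also contains dependency entries like those shown earlier)
node_name node1
node_name node2
node_name node3
failover_policy configured_node
failback_policy automatic
# for example: priority 10 (a higher priority than 20)
priority no_priority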
If pkg1 depends on pkg2, and pkg1’s priority is lower than or equal to pkg2’s, pkg2’s node order dominates. Assuming pkg2’s node order is node1, node2, node3, then:
If pkg1 depends on pkg2, and pkg1’s priority is higher than pkg2’s, pkg1’s node order dominates. Assuming pkg1’s node order is node1, node2, node3, then:
As you can see from the “Dragging Rules”, if pkg1 depends on pkg2, it can sometimes be a good idea to assign a higher priority to pkg1, because that provides the best chance for a successful failover (and failback) if pkg1 fails. But you also need to weigh the relative importance of the packages. If pkg2 runs a database that is central to your business, you probably want it to run undisturbed, no matter what happens to application packages that depend on it. In this case, the database package should have the highest priority. Note that, if no priorities are set, the dragging rules favor a package that is depended on over a package that depends on it. Consider assigning a higher priority to a dependent package if it is about equal in real-world importance to the package it depends on; otherwise assign the higher priority to the more important package, or let the priorities of both packages default.

You also need to think about what happens when a package fails. If other packages depend on it, Serviceguard will halt those packages (and any packages that depend on them, and so on). This happens regardless of the priority of the failed package. By default the packages are halted in the reverse of the order in which they were started; and if the halt script for any of the dependent packages hangs, the failed package will wait indefinitely to complete its own halt process. This provides the best chance for all the dependent packages to halt cleanly, but it may not be the behavior you want. You can change it by means of the successor_halt_timeout parameter (see “successor_halt_timeout”). If you set successor_halt_timeout to zero, Serviceguard will halt the dependent packages in parallel with the failed package; if you set it to a positive number, Serviceguard will halt the packages in the reverse of the start order, but will allow the failed package to halt after successor_halt_timeout seconds, whether or not the dependent packages have completed their halt scripts.

The package configuration template for modular packages explicitly provides for external scripts. These replace the CUSTOMER DEFINED FUNCTIONS in legacy scripts, and can be run either on package startup and shutdown, as essentially the first and last operations the package performs (these scripts are specified by the external_pre_script parameter), or during package startup after volume groups and file systems are activated and IP addresses are assigned, before the services are started, and again in the reverse order on package shutdown (these scripts are specified by the external_script parameter).
The scripts are also run when the package is validated by cmcheckconf and cmapplyconf, and must have an entry point for validation; see below. A package can make use of both kinds of script, and can launch more than one of each kind; in that case the scripts will be executed in the order they are listed in the package configuration file (and in the reverse order when the package shuts down). Each external script must have three entry points: start, stop, and validate, and should exit with zero on success and a non-zero value on failure.
The script can make use of a standard set of environment variables (including the package name, SG_PACKAGE, and the name of the local node, SG_NODE) exported by the package manager or the master control script that runs the package; and can also call a function to source in a logging function and other utility functions. One of these functions, sg_source_pkg_env(), provides access to all the parameters configured for this package, including package-specific environment variables configured via the pev_ parameter (see “pev_”). For more information, see the template in $SGCONF/examples/external_script.template.

A sample script follows. It assumes there is another script called monitor.sh, which will be configured as a Serviceguard service to monitor some application. The monitor.sh script (not included here) uses a parameter PEV_MONITORING_INTERVAL, defined in the package configuration file, to periodically poll the application it wants to monitor; for example:

PEV_MONITORING_INTERVAL 60

At validation time, the sample script makes sure the PEV_MONITORING_INTERVAL and the monitoring service are configured properly; at start and stop time it prints out the interval to the log file.
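A minimal sketch of such an external script is shown below, for illustration only. It uses plain echo rather than the logging utilities the template provides, and it checks only PEV_MONITORING_INTERVAL at validation time, omitting the check of the monitoring service:

#!/bin/sh
# Minimal external-script sketch with the three required entry points.

validate_command()
{
    # Make sure the monitoring interval has been configured via a pev_ parameter.
    if [ -z "$PEV_MONITORING_INTERVAL" ]; then
        echo "ERROR: PEV_MONITORING_INTERVAL is not configured for package $SG_PACKAGE"
        return 1
    fi
    return 0
}

start_command()
{
    echo "Starting $SG_PACKAGE; monitoring interval is $PEV_MONITORING_INTERVAL seconds"
    return 0
}

stop_command()
{
    echo "Stopping $SG_PACKAGE; monitoring interval was $PEV_MONITORING_INTERVAL seconds"
    return 0
}

case "$1" in
    start)    start_command ;;
    stop)     stop_command ;;
    validate) validate_command ;;
    *)        echo "Usage: $0 {start|stop|validate}"; exit 1 ;;
esac
exit $?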
For more information about integrating an application with Serviceguard, see the white paper Framework for HP Serviceguard Toolkits, which includes a suite of customizable scripts. The white paper is included in the Serviceguard Developer’s Toolkit, which you can download free of charge from http://www.hp.com/go/softwaredepot.

You can use Serviceguard commands (such as cmmodpkg) in an external script run from a package. These commands must not interact with that package itself (that is, the package that runs the external script) but can interact with other packages. Be careful how you code these interactions: if a Serviceguard command interacts with another package, avoid command loops. For instance, a command loop might occur under the following circumstances. Suppose a script run by pkg1 does a cmmodpkg -d of pkg2, and a script run by pkg2 does a cmmodpkg -d of pkg1. If both pkg1 and pkg2 start at the same time, the pkg1 script tries to cmmodpkg pkg2, but that cmmodpkg command has to wait for pkg2 startup to complete. Meanwhile the pkg2 script tries to cmmodpkg pkg1, which has to wait for pkg1 startup to complete, thereby causing a command loop. To avoid this situation, it is a good idea to specify a run_script_timeout and halt_script_timeout for all packages, especially packages that use Serviceguard commands in their external scripts. If a timeout is not specified and your configuration has a command loop as described above, inconsistent results can occur, including a hung cluster.

You can use an external script (or the CUSTOMER DEFINED FUNCTIONS area of a legacy package control script) to find out why a package has shut down. When the package halts, Serviceguard sets the environment variable SG_HALT_REASON in the package control script to a value that indicates the reason for the halt, such as user_halt (the package was halted by an administrator) or failure (the package is halting abnormally, for example because of the failure of a service it depends on).
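For illustration, custom halt code (placed as described in the next paragraph) might branch on this variable as in the following sketch; the db_shutdown commands are placeholders for whatever orderly and forced shutdown procedures your application provides:

case "$SG_HALT_REASON" in
    user_halt)
        # Halted deliberately by an administrator: shut the database down cleanly.
        echo "Administrator halt: performing orderly database shutdown"
        # db_shutdown --orderly    (placeholder command)
        ;;
    failure)
        # The package is halting abnormally: force the shutdown.
        echo "Failure halt: forcing database shutdown"
        # db_shutdown --force      (placeholder command)
        ;;
    *)
        echo "Package halting; reason: $SG_HALT_REASON"
        ;;
esac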
You can add custom code to the package to interrogate this variable, determine why the package halted, and take appropriate action. For legacy packages, put the code in the customer_defined_halt_cmds() function in the CUSTOMER DEFINED FUNCTIONS area of the package control script (see “Adding Customer Defined Functions to the Package Control Script”); for modular packages, put the code in the package’s external script (see “About External Scripts”). For example, if a database package is being halted by an administrator (SG_HALT_REASON set to user_halt) you would probably want the custom code to perform an orderly shutdown of the database; on the other hand, a forced shutdown might be needed if SG_HALT_REASON is set to failure, indicating that the package is halting abnormally (for example because of the failure of a service it depends on).

cmviewcl -v -f line displays a last_halt_failed flag.
The value of last_halt_failed is no if the halt script ran successfully, or was not run since the node joined the cluster, or was not run since the package was configured to run on the node; otherwise it is yes.

It is possible to configure a cluster that spans subnets joined by a router, with some nodes using one subnet and some another. This is known as a cross-subnet configuration (see “Cross-Subnet Configurations”). In this context, you can configure packages to fail over from a node on one subnet to a node on another. The implications for configuring a package for cross-subnet failover are as follows:
Because the relocatable IP address will change when a package fails over to a node on another subnet, you need to make sure of the following:
For more information, see the white paper Technical Considerations for Creating a Serviceguard Cluster that Spans Multiple IP Subnets, at http://docs.hp.com -> High Availability. To configure a package to fail over across subnets, you need to make some additional edits to the package configuration file.
Suppose that you want to configure a package, pkg1, so that it can fail over among all the nodes in a cluster comprising NodeA, NodeB, NodeC, and NodeD. NodeA and NodeB use subnet 15.244.65.0, which is not used by NodeC and NodeD; and NodeC and NodeD use subnet 15.244.56.0, which is not used by NodeA and NodeB. (See “Obtaining Cross-Subnet Information” for sample cmquerycl output.)

First you need to make sure that pkg1 will fail over to a node on another subnet only if it has to. For example, if it is running on NodeA and needs to fail over, you want it to try NodeB, on the same subnet, before incurring the cross-subnet overhead of failing over to NodeC or NodeD. Assuming nodeA is pkg1’s primary node (where it normally starts), create node_name entries in the package configuration file as follows:

node_name nodeA
node_name nodeB
node_name nodeC
node_name nodeD

In order to monitor subnet 15.244.65.0 or 15.244.56.0, depending on where pkg1 is running, you would configure monitored_subnet and monitored_subnet_access in pkg1’s package configuration file as follows:

monitored_subnet 15.244.65.0
monitored_subnet_access PARTIAL
monitored_subnet 15.244.56.0
monitored_subnet_access PARTIAL
Now you need to specify which subnet is configured on which nodes. In our example, you would do this by means of entries such as the following in the package configuration file:

ip_subnet 15.244.65.0
ip_subnet_node nodeA
ip_subnet_node nodeB
ip_address 15.244.65.82
ip_address 15.244.65.83
ip_subnet 15.244.56.0
ip_subnet_node nodeC
ip_subnet_node nodeD
ip_address 15.244.56.100
ip_address 15.244.56.101

When you are ready to start configuring a package, proceed to Chapter 6 “Configuring Packages and Their Services”; start with “Choosing Package Modules”. (If you find it helpful, you can assemble your package configuration data ahead of time on a separate worksheet for each package; blank worksheets are in Appendix F “Blank Planning Worksheets”.)