How the Package Manager Works

Packages are the means by which Serviceguard starts and halts configured applications. A package is a collection of services, disk volumes and IP addresses that are managed by Serviceguard to ensure they are available.

Each node in the cluster runs an instance of the package manager; the package manager residing on the cluster coordinator is known as the package coordinator.

The package coordinator does the following:

Decides when and where to run, halt, or move packages.

The package manager on all nodes does the following:

Executes the control scripts that run and halt packages and their services.
Reacts to changes in the status of monitored resources.

Package Types

Three different types of packages can run in the cluster; the most common is the failover package. There are also special-purpose packages that run on more than one node at a time, and so do not failover. They are typically used to manage resources of certain failover packages.

Non-failover Packages

There are two types of special-purpose packages that do not fail over and that can run on more than one node at the same time: the system multi-node package, which runs on all nodes in the cluster, and the multi-node package, which can be configured to run on all or some of the nodes in the cluster. System multi-node packages are reserved for use by HP-supplied applications, such as Veritas Cluster Volume Manager (CVM) and Cluster File System (CFS).

The rest of this section describes failover packages.

Failover Packages

A failover package starts up on an appropriate node (see node_name on “node_name”) when the cluster starts. A package failover takes place when the package coordinator initiates the start of a package on a new node. A package failover involves both halting the existing package (in the case of a service, network, or resource failure), and starting the new instance of the package.

Failover is shown in the following figure:

Figure 3-4 Package Moving During Failover

Configuring Failover Packages

You configure each package separately. You create a failover package by generating and editing a package configuration file template, then adding the package to the cluster configuration database; see Chapter 6 “Configuring Packages and Their Services”.

For legacy packages (packages created by the method used on versions of Serviceguard earlier than A.11.18), you must also create a package control script for each package, to manage the execution of the package’s services. See “Configuring a Legacy Package” for detailed information.

Customized package control scripts are not needed for modular packages (packages created by the method introduced in Serviceguard A.11.18). These packages are managed by a master control script that is installed with Serviceguard; see Chapter 6 “Configuring Packages and Their Services”, for instructions for creating modular packages.

Deciding When and Where to Run and Halt Failover Packages

The package configuration file assigns a name to the package and includes a list of the nodes on which the package can run.

Failover packages list the nodes in order of priority (i.e., the first node in the list is the highest priority node). In addition, failover packages’ files contain three parameters that determine failover behavior. These are the auto_run parameter, the failover_policy parameter, and the failback_policy parameter.

Failover Packages’ Switching Behavior

The auto_run parameter (known in earlier versions of Serviceguard as the PKG_SWITCHING_ENABLED parameter) defines the default global switching attribute for a failover package at cluster startup: that is, whether Serviceguard can automatically start the package when the cluster is started, and whether Serviceguard should automatically restart the package on a new node in response to a failure. Once the cluster is running, the package switching attribute of each package can be temporarily set with the cmmodpkg command; at reboot, the configured value will be restored.

The auto_run parameter is set in the package configuration file.

A package switch normally involves moving a failover package and its associated IP addresses to a new system on the same subnet. In this case, the new system must have the same subnet configured and working properly; otherwise the package will not be started.




	NOTE: It is possible to configure a cluster that spans subnets joined by a router, with some nodes using one subnet and some another. This is known as a cross-subnet configuration. In this context, you can configure packages to fail over from a node on one subnet to a node on another, and you will need to configure a relocatable IP address for each subnet the package is configured to start on; see “About Cross-Subnet Failover”, and in particular the subsection “Implications for Application Deployment”.

When a package fails over, TCP connections are lost. TCP applications must reconnect to regain connectivity; this is not handled automatically. Note that if the package is dependent on multiple subnets, normally all of them must be available on the target node before the package will be started. (In a cross-subnet configuration, all the monitored subnets that are specified for this package, and configured on the target node, must be up.)

If the package has a dependency on a resource or another package, the dependency must be met on the target node before the package can start.

The switching of relocatable IP addresses on a single subnet is shown in Figure 3-5 “Before Package Switching” and Figure 3-6 “After Package Switching”. Figure 3-5 “Before Package Switching” shows a two node cluster in its original state with Package 1 running on Node 1 and Package 2 running on Node 2. Users connect to the node with the IP address of the package they wish to use. Each node has a stationary IP address associated with it, and each package has an IP address associated with it.

Figure 3-5 Before Package Switching

Figure 3-6 “After Package Switching” shows the condition where Node 1 has failed and Package 1 has been transferred to Node 2 on the same subnet. Package 1’s IP address was transferred to Node 2 along with the package. Package 1 continues to be available and is now running on Node 2. Also note that Node 2 can now access both Package1’s disk and Package2’s disk.




	NOTE: For design and configuration information about site-aware disaster-tolerant clusters (which span subnets), see the documents listed under “Cross-Subnet Configurations”.

Figure 3-6 After Package Switching

Failover Policy

The Package Manager selects a node for a failover package to run on based on the priority list included in the package configuration file together with the failover_policy parameter, also in the configuration file. The failover policy governs how the package manager selects which node to run a package on when a specific node has not been identified and the package needs to be started. This applies not only to failovers but also to startup for the package, including the initial startup. The two failover policies are configured_node (the default) and min_package_node. The parameter is set in the package configuration file.

If you use configured_node as the value for the failover policy, the package will start up on the highest priority node available in the node list. When a failover occurs, the package will move to the next highest priority node in the list that is available.

If you use min_package_node as the value for the failover policy, the package will start up on the node that is currently running the fewest other packages. (Note that this does not mean the lightest load; the only thing that is checked is the number of packages currently running on the node.)

Automatic Rotating Standby

Using the min_package_node failover policy, it is possible to configure a cluster that lets you use one node as an automatic rotating standby node for the cluster. Consider the following package configuration for a four node cluster. Note that all packages can run on all nodes and have the same node_name lists. Although the example shows the node names in a different order for each package, this is not required.

Table 3-1 Package Configuration Data

Package Name	`NODE_NAME` List	`FAILOVER_POLICY`
pkgA	node1, node2, node3, node4	`MIN_PACKAGE_NODE`
pkgB	node2, node3, node4, node1	`MIN_PACKAGE_NODE`
pkgC	node3, node4, node1, node2	`MIN_PACKAGE_NODE`

When the cluster starts, each package starts as shown in Figure 3-7 “Rotating Standby Configuration before Failover”.

Figure 3-7 Rotating Standby Configuration before Failover

If a failure occurs, any package would fail over to the node containing fewest running packages, as in Figure 3-8 “Rotating Standby Configuration after Failover”, which shows a failure on node 2:

Figure 3-8 Rotating Standby Configuration after Failover




	NOTE: Using the `min_package_node` policy, when node 2 is repaired and brought back into the cluster, it will then be running the fewest packages, and thus will become the new standby node.

If these packages had been set up using the configured_node failover policy, they would start initially as in Figure 3-7 “Rotating Standby Configuration before Failover”, but the failure of node 2 would cause the package to start on node 3, as in Figure 3-9 “CONFIGURED_NODE Policy Packages after Failover”:

Figure 3-9 CONFIGURED_NODE Policy Packages after Failover

If you use configured_node as the failover policy, the package will start up on the highest priority node in the node list, assuming that the node is running as a member of the cluster. When a failover occurs, the package will move to the next highest priority node in the list that is available.

Failback Policy

The use of the failback_policy parameter allows you to decide whether a package will return to its primary node if the primary node becomes available and the package is not currently running on the primary node. The configured primary node is the first node listed in the package’s node list.

The two possible values for this policy are automatic and manual. The parameter is set in the package configuration file:

As an example, consider the following four-node configuration, in which failover_policy is set to configured_node and failback_policy is automatic:

Figure 3-10 Automatic Failback Configuration before Failover

Table 3-2 Node Lists in Sample Cluster

Package Name	`NODE_NAME` List	`FAILOVERPOLICY`	`FAILBACK POLICY`
pkgA	node1, node4	`CONFIGURED_NODE`	`AUTOMATIC`
pkgB	node2, node4	`CONFIGURED_NODE`	`AUTOMATIC`
pkgC	node3, node4	`CONFIGURED_NODE`	`AUTOMATIC`

Node1 panics, and after the cluster reforms, pkgA starts running on node4:

Figure 3-11 Automatic Failback Configuration After Failover

After rebooting, node 1 rejoins the cluster. At that point, pkgA will be automatically stopped on node 4 and restarted on node 1.

Figure 3-12 Automatic Failback Configuration After Restart of Node 1




	NOTE: Setting the `failback_policy` to `automatic` can result in a package failback and application outage during a critical production period. If you are using automatic failback, you may want to wait to add the package’s primary node back into the cluster until you can allow the package to be taken out of service temporarily while it switches back to the primary node.

Using Older Package Configuration Files

If you are using package configuration files that were generated using a previous version of Serviceguard, HP recommends you use the cmmakepkg command to open a new template, and then copy the parameter values into it. In the new template, read the descriptions and defaults of the choices that did not exist when the original configuration was made. For example, the default for failover_policy is now configured_node and the default for failback_policy is now manual.

For full details of the current parameters and their default values, see Chapter 6 “Configuring Packages and Their Services”, and the package configuration file template itself.

Using the Event Monitoring Service

Basic package resources include cluster nodes, LAN interfaces, and services, which are the individual processes within an application. All of these are monitored by Serviceguard directly. In addition, you can use the Event Monitoring Service registry through which add-on monitors can be configured. This registry allows other software components to supply monitoring of their resources for Serviceguard. Monitors currently supplied with other software products include EMS (Event Monitoring Service) High Availability Monitors, and an ATM monitor.

If a monitored resource is configured in a package, the package manager calls the resource registrar to launch an external monitor for the resource. Resources can be configured to start up either at the time the node enters the cluster or at the end of package startup. The monitor then sends messages back to Serviceguard, which checks to see whether the resource is available before starting the package. In addition, the package manager can fail the package to another node or take other action if the resource becomes unavailable after the package starts.

You can specify a monitored resource for a package in Serviceguard Manager, or on the HP-UX command line by using the command /opt/resmon/bin/resls. For additional information, refer to the man page for resls(1m).

Using the EMS HA Monitors

The EMS (Event Monitoring Service) HA Monitors, available as a separate product (B5736DA), can be used to set up monitoring of disks and other resources as package resource dependencies. Examples of resource attributes that can be monitored using EMS include the following:

Logical volume status
Physical volume status
System load
Number of users
File system utilization
LAN health

Once a monitor is configured as a package resource dependency, the monitor will notify the package manager if an event occurs showing that a resource is down. The package may then be failed over to an adoptive node.

The EMS HA Monitors can also be used to report monitored events to a target application such as OpenView IT/Operations for graphical display or for operator notification. Refer to the manual Using High Availability Monitors (http://www.docs.hp.com -> High Availability -> Event Monitoring Service and HA Monitors -> Installation and User’s Guide) for additional information.