Package Configuration Planning
Planning for packages involves assembling information about each group of highly available services.
You may need to use logical volumes in volume groups as part of the infrastructure for package operations on a cluster. When the package moves from one node to another, it must be able to access data residing on the same disk as on the previous node. This is accomplished by activating the volume group and mounting the file system that resides on it. In Serviceguard, high availability applications, services, and data are located in volume groups that are on a shared bus. When a node fails, the volume groups containing the applications, services, and data of the failed node are deactivated on the failed node and activated on the adoptive node. In order for this to happen, you must configure the volume groups so that they can be transferred from the failed node to the adoptive node. As part of planning, you need to decide the following:
Create a list by package of volume groups, logical volumes, and file systems. Indicate which nodes need to have access to common file systems at different times. HP recommends that you use customized logical volume names that are different from the default logical volume names (lvol1, lvol2, etc.). Choosing logical volume names that represent the high availability applications that they are associated with (for example, lvoldatabase) will simplify cluster administration. To further document your package-related volume groups, logical volumes, and file systems on each node, you can add commented lines to the /etc/fstab file. The following is an example for a database application:
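The following sketch shows what such commented entries might look like; the volume group vg01, the logical volume names, the mount points, and the package name pkg_db are examples only, and because the lines are commented out they are purely informational:

# /dev/vg01/lvoldatabase /applic1 vxfs defaults 0 1  # database file system for pkg_db
# /dev/vg01/lvoldbindex  /applic2 vxfs defaults 0 1  # index file system for pkg_db
# /dev/vg01/lvoldbraw    raw      ignore ignore 0 0  # raw logical volume used by pkg_db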
Create an entry for each logical volume, indicating its use for a file system or for a raw device. Don’t forget to comment out the lines (using the # character as shown).
For a failover package that uses CVM (Cluster Volume Manager) or CFS (Cluster File System), you configure system multi-node packages to handle the volume groups and file systems.
Veritas Cluster Volume Manager 3.5 uses the system multi-node package VxVM-CVM-pkg to manage the cluster’s volumes. For CVM 3.5 and VxVM-CVM-pkg you configure a single heartbeat network; multiple heartbeats are not supported, and neither is using APA, Infiniband, or VLAN interfaces as the heartbeat network.

Veritas Cluster Volume Manager 4.1 and later uses the system multi-node package SG-CFS-pkg to manage the cluster’s volumes. CVM 4.1 and later and the SG-CFS-pkg require you to configure either multiple heartbeat networks or a single heartbeat with a standby; using APA, Infiniband, or VLAN interfaces as the heartbeat network is not supported.

CFS (Veritas Cluster File System) is supported for use with Veritas Cluster Volume Manager Version 4.1 and later. The system multi-node package SG-CFS-pkg manages the cluster’s volumes, and two sets of multi-node packages are also used: the CFS mount packages, SG-CFS-MP-id#, and the CFS disk group packages, SG-CFS-DG-id#. Create these multi-node packages with the cfs family of commands; do not edit their configuration files. You then create a chain of package dependencies linking each application failover package to these non-failover packages: the failover package depends on the CFS mount packages, which depend on the disk group packages, which in turn depend on SG-CFS-pkg.
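For example, the commands that create the disk group and mount point packages might look like the following sketch; the disk group name logdata, the volume name log_files, and the mount point /tmp/logdata/log_files are hypothetical, and you should check the cfsdgadm, cfsmntadm, and cfsmount manpages on your system for the exact syntax:

cfsdgadm add logdata all=sw
# creates the SG-CFS-DG-id# package for the disk group, activated shared-write on all nodes
cfsmntadm add logdata log_files /tmp/logdata/log_files all=rw
# creates the SG-CFS-MP-id# package for the mount point
cfsmount /tmp/logdata/log_files
# mounts the cluster file system on the configured nodes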
You can add packages to a running cluster; this process is described in Chapter 7 “Cluster and Package Maintenance”. When adding packages, be sure not to exceed the value of max_configured_packages as defined in the cluster configuration file (see “Cluster Configuration Parameters”). You can modify this parameter while the cluster is running if you need to.

To determine the failover behavior of a failover package (see “Package Types”), you define the policy that governs where Serviceguard will automatically start up a package that is not running. In addition, you define a failback policy that determines whether a package will be automatically returned to its primary node when that is possible. Table 4-2, Package Failover Behavior, describes the different types of failover behavior and the settings in the package configuration file that determine each behavior. See “Package Parameter Explanations” for more information.
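As a sketch of one common combination, the following package configuration file entries cause the package to start on the first available node in its node_name list and to stay where it is rather than failing back automatically:

failover_policy configured_node
failback_policy manual
auto_run yes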
Failover packages can also be configured so that IP addresses switch from a failed LAN card to a standby LAN card on the same node and the same physical subnet. To manage this behavior, use the parameter local_lan_failover_allowed in the package configuration file. (yes, meaning enabled, is the default.)
Serviceguard provides a set of parameters for configuring EMS (Event Monitoring Service) resources. These are resource_name, resource_polling_interval, resource_start, and resource_up_value. Configure each of these parameters in the package configuration file for each resource the package will be dependent on. The resource_start parameter determines when Serviceguard starts up resource monitoring for EMS resources. resource_start can be set to either automatic or deferred. Serviceguard will start up resource monitoring for automatic resources automatically when the Serviceguard cluster daemon starts up on the node. Serviceguard will not attempt to start deferred resource monitoring during node startup, but will start monitoring these resources when the package runs. The following is an example of how to configure deferred and automatic resources.
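A sketch of such entries in the package configuration file follows; the EMS resource paths shown are placeholders, and you should use the resource names that actually exist on your system:

resource_name /net/interfaces/lan/status/lan0
resource_polling_interval 60
resource_start deferred
resource_up_value = up

resource_name /net/interfaces/lan/status/lan1
resource_polling_interval 60
resource_start automatic
resource_up_value = up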
Starting in Serviceguard A.11.17, a package can have dependencies on other packages, meaning the package will not start on a node unless the packages it depends on are running on that node. In Serviceguard A.11.17, package dependencies are supported only for use with certain applications specified by HP, such as the multi-node and system multi-node packages that HP supplies for use with Veritas Cluster File System (CFS) on systems that support it. As of Serviceguard A.11.18, package dependency is no longer restricted; you can make a package dependent on any other package or packages running on the same cluster node, subject to the restrictions spelled out in Chapter 6, under “dependency_condition”. Make a package dependent on another package if the first package cannot (or should not) function without the services provided by the second. For example, pkg1 might run a real-time web interface to a database managed by pkg2. In this case it might make sense to make pkg1 dependent on pkg2. In considering whether or not to create a dependency between packages, consider the “Rules” and “Guidelines” that follow. Assume that we want to make pkg1 depend on pkg2.
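For a modular package, such a dependency is declared in the dependent package’s configuration file with entries like the following sketch; pkg2_dep is an arbitrary label, and the full rules are given under “dependency_condition” in Chapter 6:

dependency_name pkg2_dep
dependency_condition pkg2 = up
dependency_location same_node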
The priority parameter gives you a way to influence the startup, failover, and failback behavior of a set of failover packages that have a configured_node failover_policy, when one or more of those packages depend on another or others. The broad rule is that a higher-priority package can drag a lower-priority package, forcing it to start on, or move to, a node that suits the higher-priority package.
Keep in mind that you do not have to set priority, even when one or more packages depend on another. The default value, no_priority, may often result in the behavior you want. For example, if pkg1 depends on pkg2, and priority is set to no_priority for both packages, and other parameters such as node_name and auto_run are set as recommended in this section, then pkg1 will normally follow pkg2 to wherever both can run, and this is the common-sense (and may be the most desirable) outcome. The following examples express the rules as they apply to two failover packages whose “failover_policy” is configured_node. Assume pkg1 depends on pkg2, that node1, node2 and node3 are all specified (in some order) under “node_name” in the configuration file for each package, and that “failback_policy” is set to automatic for each package.
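For instance, the configuration entries these examples assume might look like the following sketch. Both packages default to no_priority; the comments show how you might instead give pkg1 the higher priority (in Serviceguard a smaller number indicates a higher priority):

# pkg2 (the package depended on)
node_name node1
node_name node2
node_name node3
failover_policy configured_node
failback_policy automatic
# to set an explicit priority instead, you could use, for example: priority 20
priority no_priority

# pkg1 (depends on pkg2; also contains dependency entries like those shown earlier)
node_name node1
node_name node2
node_name node3
failover_policy configured_node
failback_policy automatic
# for example: priority 10 (a higher priority than 20)
priority no_priority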
If pkg1 depends on pkg2, and pkg1’s priority is lower than or equal to pkg2’s, pkg2’s node order dominates. Assuming pkg2’s node order is node1, node2, node3, then:
If pkg1 depends on pkg2, and pkg1’s priority is higher than pkg2’s, pkg1’s node order dominates. Assuming pkg1’s node order is node1, node2, node3, then:
As you can see from the “Dragging Rules”, if pkg1 depends on pkg2, it can sometimes be a good idea to assign a higher priority to pkg1, because that provides the best chance for a successful failover (and failback) if pkg1 fails. But you also need to weigh the relative importance of the packages. If pkg2 runs a database that is central to your business, you probably want it to run undisturbed, no matter what happens to application packages that depend on it. In this case, the database package should have the highest priority. Note that, if no priorities are set, the dragging rules favor a package that is depended on over a package that depends on it. Consider assigning a higher priority to a dependent package if it is about equal in real-world importance to the package it depends on; otherwise assign the higher priority to the more important package, or let the priorities of both packages default.

You also need to think about what happens when a package fails. If other packages depend on it, Serviceguard will halt those packages (and any packages that depend on them, and so on). This happens regardless of the priority of the failed package. By default the packages are halted in the reverse of the order in which they were started; and if the halt script for any of the dependent packages hangs, the failed package will wait indefinitely to complete its own halt process. This provides the best chance for all the dependent packages to halt cleanly, but it may not be the behavior you want. You can change it by means of the successor_halt_timeout parameter (see “successor_halt_timeout”). If you set successor_halt_timeout to zero, Serviceguard will halt the dependent packages in parallel with the failed package; if you set it to a positive number, Serviceguard will halt the packages in the reverse of the start order, but will allow the failed package to halt after successor_halt_timeout seconds, whether or not the dependent packages have completed their halt scripts.

The package configuration template for modular packages explicitly provides for external scripts. These replace the CUSTOMER DEFINED FUNCTIONS in legacy scripts, and can be run either on package startup and shutdown, as essentially the first and last operations the package performs (these scripts are specified by the external_pre_script parameter), or during package startup after volume groups and file systems are activated and IP addresses are assigned, before the services are started, and again in the reverse order on package shutdown (these scripts are specified by the external_script parameter).
The scripts are also run when the package is validated by cmcheckconf and cmapplyconf, and must have an entry point for validation; see below. A package can make use of both kinds of script, and can launch more than one of each kind; in that case the scripts will be executed in the order they are listed in the package configuration file (and in the reverse order when the package shuts down). Each external script must have three entry points: start, stop, and validate, and should exit with zero on success and a non-zero value on failure.
The script can make use of a standard set of environment variables (including the package name, SG_PACKAGE, and the name of the local node, SG_NODE) exported by the package manager or the master control script that runs the package; and can also call a function to source in a logging function and other utility functions. One of these functions, sg_source_pkg_env(), provides access to all the parameters configured for this package, including package-specific environment variables configured via the pev_ parameter (see “pev_”). For more information, see the template in $SGCONF/examples/external_script.template.

A sample script follows. It assumes there is another script called monitor.sh, which will be configured as a Serviceguard service to monitor some application. The monitor.sh script (not included here) uses a parameter PEV_MONITORING_INTERVAL, defined in the package configuration file, to periodically poll the application it wants to monitor; for example:

PEV_MONITORING_INTERVAL 60

At validation time, the sample script makes sure the PEV_MONITORING_INTERVAL and the monitoring service are configured properly; at start and stop time it prints out the interval to the log file.
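A minimal sketch of such an external script is shown below, for illustration only. It uses plain echo rather than the logging utilities the template provides, and it checks only PEV_MONITORING_INTERVAL at validation time, omitting the check of the monitoring service:

#!/bin/sh
# Minimal external-script sketch with the three required entry points.

validate_command()
{
    # Make sure the monitoring interval has been configured via a pev_ parameter.
    if [ -z "$PEV_MONITORING_INTERVAL" ]; then
        echo "ERROR: PEV_MONITORING_INTERVAL is not configured for package $SG_PACKAGE"
        return 1
    fi
    return 0
}

start_command()
{
    echo "Starting $SG_PACKAGE; monitoring interval is $PEV_MONITORING_INTERVAL seconds"
    return 0
}

stop_command()
{
    echo "Stopping $SG_PACKAGE; monitoring interval was $PEV_MONITORING_INTERVAL seconds"
    return 0
}

case "$1" in
    start)    start_command ;;
    stop)     stop_command ;;
    validate) validate_command ;;
    *)        echo "Usage: $0 {start|stop|validate}"; exit 1 ;;
esac
exit $?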
For more information about integrating an application with Serviceguard, see the white paper Framework for HP Serviceguard Toolkits, which includes a suite of customizable scripts. The white paper is included in the Serviceguard Developer’s Toolkit, which you can download free of charge from http://www.hp.com/go/softwaredepot.

You can use Serviceguard commands (such as cmmodpkg) in an external script run from a package. These commands must not interact with that package itself (that is, the package that runs the external script) but can interact with other packages. Be careful how you code these interactions: if a Serviceguard command interacts with another package, avoid command loops. For instance, a command loop might occur under the following circumstances. Suppose a script run by pkg1 does a cmmodpkg -d of pkg2, and a script run by pkg2 does a cmmodpkg -d of pkg1. If both pkg1 and pkg2 start at the same time, the pkg1 script tries to cmmodpkg pkg2, but that cmmodpkg command has to wait for pkg2 startup to complete. Meanwhile the pkg2 script tries to cmmodpkg pkg1, which has to wait for pkg1 startup to complete, thereby causing a command loop. To avoid this situation, it is a good idea to specify a run_script_timeout and halt_script_timeout for all packages, especially packages that use Serviceguard commands in their external scripts. If a timeout is not specified and your configuration has a command loop as described above, inconsistent results can occur, including a hung cluster.

You can use an external script (or the CUSTOMER DEFINED FUNCTIONS area of a legacy package control script) to find out why a package has shut down. When the package halts, Serviceguard sets the environment variable SG_HALT_REASON in the package control script to a value that indicates the reason for the halt, such as user_halt (the package was halted by an administrator) or failure (the package is halting abnormally, for example because of the failure of a service it depends on).
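For illustration, custom halt code (placed as described in the next paragraph) might branch on this variable as in the following sketch; the db_shutdown commands are placeholders for whatever orderly and forced shutdown procedures your application provides:

case "$SG_HALT_REASON" in
    user_halt)
        # Halted deliberately by an administrator: shut the database down cleanly.
        echo "Administrator halt: performing orderly database shutdown"
        # db_shutdown --orderly    (placeholder command)
        ;;
    failure)
        # The package is halting abnormally: force the shutdown.
        echo "Failure halt: forcing database shutdown"
        # db_shutdown --force      (placeholder command)
        ;;
    *)
        echo "Package halting; reason: $SG_HALT_REASON"
        ;;
esac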
You can add custom code to the package to interrogate this variable, determine why the package halted, and take appropriate action. For legacy packages, put the code in the customer_defined_halt_cmds() function in the CUSTOMER DEFINED FUNCTIONS area of the package control script (see “Adding Customer Defined Functions to the Package Control Script”); for modular packages, put the code in the package’s external script (see “About External Scripts”). For example, if a database package is being halted by an administrator (SG_HALT_REASON set to user_halt) you would probably want the custom code to perform an orderly shutdown of the database; on the other hand, a forced shutdown might be needed if SG_HALT_REASON is set to failure, indicating that the package is halting abnormally (for example because of the failure of a service it depends on).

cmviewcl -v -f line displays a last_halt_failed flag.
The value of last_halt_failed is no if the halt script ran successfully, or was not run since the node joined the cluster, or was not run since the package was configured to run on the node; otherwise it is yes.

It is possible to configure a cluster that spans subnets joined by a router, with some nodes using one subnet and some another. This is known as a cross-subnet configuration (see “Cross-Subnet Configurations”). In this context, you can configure packages to fail over from a node on one subnet to a node on another. The implications for configuring a package for cross-subnet failover are as follows:
Because the relocatable IP address will change when a package fails over to a node on another subnet, you need to make sure of the following:
For more information, see the white paper Technical Considerations for Creating a Serviceguard Cluster that Spans Multiple IP Subnets, at http://docs.hp.com -> High Availability. To configure a package to fail over across subnets, you need to make some additional edits to the package configuration file.
Suppose that you want to configure a package, pkg1, so that it can fail over among all the nodes in a cluster comprising NodeA, NodeB, NodeC, and NodeD. NodeA and NodeB use subnet 15.244.65.0, which is not used by NodeC and NodeD; and NodeC and NodeD use subnet 15.244.56.0, which is not used by NodeA and NodeB. (See “Obtaining Cross-Subnet Information” for sample cmquerycl output.)

First you need to make sure that pkg1 will fail over to a node on another subnet only if it has to. For example, if it is running on NodeA and needs to fail over, you want it to try NodeB, on the same subnet, before incurring the cross-subnet overhead of failing over to NodeC or NodeD. Assuming nodeA is pkg1’s primary node (where it normally starts), create node_name entries in the package configuration file as follows:

node_name nodeA
node_name nodeB
node_name nodeC
node_name nodeD

In order to monitor subnet 15.244.65.0 or 15.244.56.0, depending on where pkg1 is running, you would configure monitored_subnet and monitored_subnet_access in pkg1’s package configuration file as follows:

monitored_subnet 15.244.65.0
monitored_subnet_access PARTIAL
monitored_subnet 15.244.56.0
monitored_subnet_access PARTIAL
Now you need to specify which subnet is configured on which nodes. In our example, you would do this by means of entries such as the following in the package configuration file:

ip_subnet 15.244.65.0
ip_subnet_node nodeA
ip_subnet_node nodeB
ip_address 15.244.65.82
ip_address 15.244.65.83
ip_subnet 15.244.56.0
ip_subnet_node nodeC
ip_subnet_node nodeD
ip_address 15.244.56.100
ip_address 15.244.56.101

When you are ready to start configuring a package, proceed to Chapter 6 “Configuring Packages and Their Services”; start with “Choosing Package Modules”. (If you find it helpful, you can assemble your package configuration data ahead of time on a separate worksheet for each package; blank worksheets are in Appendix F “Blank Planning Worksheets”.)