To provide a high level of availability, a typical
cluster uses redundant system components, for example, two or more
SPUs and two or more independent disks. This redundancy eliminates
single points of failure. In general, the more redundancy, the greater
your access to applications, data, and supporting services in the
event of a failure.
In addition to hardware redundancy, you must have software
support that enables and controls the transfer of your applications
to another SPU or network after a failure. Serviceguard provides
this support as follows:
- In the case of LAN failure, Serviceguard switches to a standby
  LAN or moves affected packages to a standby node.
- In the case of SPU failure, your application is transferred
  from a failed SPU to a functioning SPU automatically and in a
  minimal amount of time.
- For failure of other monitored resources, such as disk
  interfaces, a package can be moved to another node.
- For software failures, an application can be restarted on the
  same node or another node with minimum disruption.
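What Serviceguard monitors for a given package is determined largely by
that package's configuration. As a minimal sketch, assuming a legacy-style
ASCII package configuration file and hypothetical names (pkg1, node1,
node2, pkg1_service; the exact parameter names vary between legacy and
modular packages and by Serviceguard release), a package might declare a
monitored subnet, a monitored service, and another monitored (EMS)
resource like this:

    PACKAGE_NAME                pkg1
    NODE_NAME                   node1
    NODE_NAME                   node2
    AUTO_RUN                    YES
    LOCAL_LAN_FAILOVER_ALLOWED  YES

    # Subnet monitored for the package; loss of this subnet on the
    # current node can trigger a local LAN switch or a package failover.
    SUBNET                      192.10.25.0

    # Application process monitored as a service; if it fails, the
    # package can be restarted locally or moved to an adoptive node.
    SERVICE_NAME                pkg1_service
    SERVICE_FAIL_FAST_ENABLED   NO
    SERVICE_HALT_TIMEOUT        300

    # Other monitored resource (an EMS resource), for example a disk
    # interface; the resource name here is purely illustrative.
    RESOURCE_NAME               /example/monitored/resource
    RESOURCE_POLLING_INTERVAL   60
    RESOURCE_UP_VALUE           = UP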
Serviceguard also makes it easy to transfer control of your
application to another SPU so that you can bring the original SPU
down for system administration, maintenance, or version upgrades.
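For example, assuming a package named pkg1 and an adoptive node named
node2 (both hypothetical), a package can typically be moved by hand with
a command sequence along the following lines; see the cmhaltpkg,
cmrunpkg, and cmmodpkg man pages for the exact options:

    # Halt the package on its current node.
    cmhaltpkg pkg1

    # Start the package on the adoptive node.
    cmrunpkg -n node2 pkg1

    # Re-enable package switching, which cmrunpkg leaves disabled.
    cmmodpkg -e pkg1

The original node can then be halted with cmhaltnode for maintenance and
rejoined to the cluster with cmrunnode afterwards.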
The
current maximum number of nodes supported in a Serviceguard cluster
is 16. SCSI disks or disk arrays can be connected to a maximum of
4 nodes at a time on a shared (multi-initiator) bus. Disk arrays
using Fibre Channel, and those that do not use a shared bus, such
as the HP StorageWorks XP Series and the EMC Symmetrix, can
be simultaneously connected to all 16 nodes.
The guidelines for package failover depend on the type of
disk technology in the cluster. For example, a package that accesses
data on a SCSI disk or disk array can fail over to a maximum of 4
nodes. A package that accesses data from a disk in a cluster using
Fibre Channel, HP StorageWorks XP, or EMC Symmetrix disk technology
can be configured for failover among 16 nodes.
Note that a package that does not access
data from a disk on a shared bus can be configured to fail over
to as many nodes as you have configured in the cluster (regardless
of disk technology). For instance, if a package only runs local
executables, it can be configured to fail over to all nodes in the
cluster that have local copies of those executables, regardless
of the type of disk connectivity.
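As an illustration, assuming legacy-style package configuration files and
hypothetical package and node names (pkg_scsi, pkg_local, node1 through
node4), the failover list is expressed through the NODE_NAME entries of
each package (each package normally has its own configuration file):

    # Package whose data is on a shared SCSI bus: at most four nodes.
    PACKAGE_NAME   pkg_scsi
    NODE_NAME      node1
    NODE_NAME      node2
    NODE_NAME      node3
    NODE_NAME      node4

    # Package with no data on a shared bus, for example one that runs
    # only local executables: any node configured in the cluster.
    PACKAGE_NAME   pkg_local
    NODE_NAME      *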