The CPU Monitor, a part of the diagnostic tool Event Monitor Services
(EMS) and not a part of the vPars Monitor, monitors cache parity errors
within the CPUs on the system. Through its Dynamic Processor Resilience
(DPR) feature, when the CPU Monitor detects a pre-determined number of
errors on a CPU, it deactivates that CPU for the current boot session.
If the problems are severe enough, the CPU Monitor deconfigures the
socket for the next boot of the system.
Deactivation of a CPU means
that the OS attempts to stop using the CPU by migrating all
threads off it. Deactivation of a CPU is not persistent across an OS or system reboot.
Deconfiguration of a socket
means that the EMS issues a firmware call, marking the socket for
deconfiguration on the next system boot. On the next system boot,
none of the cores in the target socket are visible to either the OS
in standalone mode or the OS instances of the virtual partitions.
The deconfiguration is persistent across system boots.
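The difference in persistence between the two actions can be illustrated with a small model. This is only a sketch for reasoning about the behavior described above, not EMS or firmware code: the class BoxModel and all of its method names are hypothetical, and the model tracks a single OS instance for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class BoxModel:
    """Toy model of one system (box or nPartition). All names are hypothetical."""
    sockets: dict                                    # socket id -> set of CPU ids
    deactivated: set = field(default_factory=set)    # CPUs deactivated in this boot session
    marked: set = field(default_factory=set)         # sockets marked by the firmware call
    deconfigured: set = field(default_factory=set)   # sockets hidden since a system boot

    def deactivate_cpu(self, cpu):
        # Lasts only for the current boot session.
        self.deactivated.add(cpu)

    def mark_socket(self, socket):
        # Takes effect only at the next system boot.
        self.marked.add(socket)

    def os_reboot(self):
        # Reboot of the OS instance (for example, a virtual partition):
        # deactivations are lost, and marked sockets are not yet applied.
        self.deactivated.clear()

    def system_boot(self):
        # Boot of the entire box or nPartition: deactivations are lost,
        # and marked sockets become deconfigured and stay that way.
        self.deactivated.clear()
        self.deconfigured |= self.marked

    def visible_cpus(self):
        return {cpu
                for sock, cpus in self.sockets.items()
                if sock not in self.deconfigured
                for cpu in cpus}

    def usable_cpus(self):
        return self.visible_cpus() - self.deactivated


box = BoxModel(sockets={0: {0, 1}, 1: {2, 3}})
box.deactivate_cpu(2)
box.mark_socket(1)
print(sorted(box.usable_cpus()))    # [0, 1, 3]
box.os_reboot()
print(sorted(box.usable_cpus()))    # [0, 1, 2, 3]
box.system_boot()
print(sorted(box.visible_cpus()))   # [0, 1]
```

In this model, a reboot of the OS instance restores the deactivated CPU, while the marked socket disappears only after a boot of the whole system and remains hidden on later boots.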
Note the following two items:
- A deactivation of a CPU does not mean a deconfiguration
  of its socket. The CPU Monitor is able to determine whether the CPU
  needs to be deactivated or whether it needs to take further action
  and deconfigure the socket.
- A reboot of a virtual partition is not the same as a
  reboot of the system (the entire box or nPartition).
The exceptions to the deactivation of CPUs are
the boot processor of each OS instance (the boot processor has a logical
instance of zero) and the last CPU in a cell or nPartition. The exception
to the deconfiguration of sockets is that the last remaining socket
will not be deconfigured (otherwise, the system could not boot).
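The exceptions above can be written as two eligibility checks. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and "last CPU" is read here as the last CPU still active in the cell or nPartition, which the text does not spell out.

```python
def may_deactivate(logical_instance: int,
                   active_cpus_in_cell: int,
                   active_cpus_in_npartition: int) -> bool:
    """Return True if a CPU is eligible for deactivation (hypothetical helper)."""
    if logical_instance == 0:
        # The boot processor of an OS instance is never deactivated.
        return False
    if active_cpus_in_cell <= 1 or active_cpus_in_npartition <= 1:
        # Assumption: "last CPU" means the last active CPU remaining.
        return False
    return True


def may_deconfigure(configured_sockets: int) -> bool:
    """Return True if a socket may be deconfigured (hypothetical helper)."""
    # The last remaining socket is kept so that the system can still boot.
    return configured_sockets > 1


print(may_deactivate(0, 4, 8))   # False
print(may_deactivate(3, 4, 8))   # True
print(may_deconfigure(1))        # False
```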
If any spare iCAP (formerly known as iCOD) or
PPU CPUs are available, the necessary number of CPUs will be activated
to replace the deactivated CPUs.
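As a rough illustration of the replacement arithmetic, the sketch below computes how many spare CPUs would be activated. The function name is hypothetical, and the behavior when fewer spares exist than deactivated CPUs (activate all that are available) is an assumption, since the text does not cover that case.

```python
def spares_to_activate(deactivated_count: int, spare_count: int) -> int:
    """Number of spare iCAP or PPU CPUs to activate (hypothetical helper)."""
    # Assumption: if there are not enough spares, all available ones are used.
    return max(0, min(deactivated_count, spare_count))


print(spares_to_activate(2, 5))   # 2
print(spares_to_activate(3, 1))   # 1
print(spares_to_activate(1, 0))   # 0
```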