Planned downtime (as opposed to unplanned downtime) is scheduled; examples
include backups, system upgrades to new operating system revisions,
and hardware replacements. For planned downtime, application designers
should consider:

- Reducing the time needed for application upgrades/patches. Can an administrator install a new version of the application without scheduling downtime? Can different revisions of an application operate within a system? Can different revisions of a client and server operate within a system?
- Providing for online application reconfiguration. Can the configuration information used by the application be changed without bringing down the application?
- Documenting maintenance operations. Does an operator know how to handle maintenance operations?

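
The question of whether different revisions of a client and server can operate within a system can be made concrete with a compatibility check performed when a client connects. The following is only a minimal sketch under stated assumptions: the "major.minor" revision format, the `SUPPORTED_CLIENT_MAJORS` set, and the function names are hypothetical and are not part of Serviceguard or any particular application.

```python
# Hypothetical sketch: a server that accepts clients from more than one revision.
# The "major.minor" revision format and all names here are assumptions.

SERVER_REVISION = "5.0"
# Major client revisions that this server revision still understands.
SUPPORTED_CLIENT_MAJORS = {4, 5}


def parse_revision(revision: str) -> tuple[int, int]:
    """Split a 'major.minor' string into a pair of integers."""
    major, minor = revision.split(".")
    return int(major), int(minor)


def can_interoperate(client_revision: str) -> bool:
    """Return True if a client at this revision may talk to the server."""
    client_major, _ = parse_revision(client_revision)
    return client_major in SUPPORTED_CLIENT_MAJORS


if __name__ == "__main__":
    for client in ("3.2", "4.0", "5.1"):
        status = "accepted" if can_interoperate(client) else "rejected"
        print(f"server {SERVER_REVISION}: client {client} -> {status}")
```

Keeping the set of supported revisions explicit makes it easy to see, and to document, which mixes of client and server revisions an administrator can run side by side.
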
When discussing highly available systems, unplanned failures
are often the main point of discussion. However, if it takes 2 weeks
to upgrade a system to a new revision of software, there are bound
to be a large number of complaints.

The following sections discuss ways of handling the different
types of planned downtime.

Reducing Time Needed for Application Upgrades and Patches

Once a year or so, a new revision of an application is released.
How long does it take for the end-user to upgrade to this new revision?
The answer is the amount of planned downtime a user must take to
upgrade their application. The following guidelines reduce this
time.

Provide for Rolling Upgrades

Provide for a “rolling upgrade” in a client/server
environment. For a system with many components, the typical scenario
is to bring down the entire system, upgrade every node to the new
version of the software, and then restart the application on all
the affected nodes. For large systems, this could result in a long
downtime. An alternative is to provide for a rolling upgrade. A rolling
upgrade rolls out the new software in a phased approach by upgrading
only one component at a time. For example, the database server is
upgraded on Monday, causing a 15 minute downtime. Then on Tuesday,
the application server on two of the nodes is upgraded, which leaves
the application servers on the remaining nodes online and causes
no downtime. On Wednesday, two more application servers are upgraded, and
so on. With this approach, you avoid the problem where everything changes
at once, and you minimize long outages.

The trade-off is that the application software must operate
while its components are at different revisions. In the above example,
the database server might be at revision 5.0 while some of the
application servers are at revision 4.0. The application must be
designed to handle this type of situation.

For more information about rolling upgrades, see Appendix E “Software
Upgrades”, and the Release Notes
for your version of Serviceguard at http://docs.hp.com -> High Availability.

Do Not Change the Data Layout Between Releases

Migration of the data to a new format can be very time intensive.
It also almost guarantees that a rolling upgrade will not be possible.
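
One common way to avoid a layout change is for a new revision to add only optional fields, so that data written by the older revision stays readable without migration. The sketch below illustrates that idea under stated assumptions: the JSON record format, the field names, the default value, and the `load_account` helper are all hypothetical and do not describe any particular database.

```python
import json

# Hypothetical record written by revision 4.0 of an application.
OLD_RECORD = '{"id": 17, "name": "alice"}'
# Hypothetical record written by revision 5.0: same layout, one added field.
NEW_RECORD = '{"id": 17, "name": "alice", "quota_mb": 512}'

DEFAULT_QUOTA_MB = 100  # assumed default for records that predate the field


def load_account(raw: str) -> dict:
    """Read a record written by either revision without migrating it."""
    record = json.loads(raw)
    # The new field is optional, so old records remain valid as they are.
    record.setdefault("quota_mb", DEFAULT_QUOTA_MB)
    return record


if __name__ == "__main__":
    print(load_account(OLD_RECORD))
    print(load_account(NEW_RECORD))
```

In this sketch the existing fields keep their names and meaning, so a node still running the older revision could ignore the added field and continue to read the same data.
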
To see why, suppose a database is running on the first node. Ideally,
the second node could be upgraded to the new revision of the database.
When that upgrade is completed, a brief downtime could be scheduled
to move the database server from the first node to the newly upgraded
second node. The database server would then be restarted, while
the first node is idle and ready to be upgraded itself. However,
if the new database revision requires a different database layout,
the old data will not be readable by the newly
updated database. The downtime will be longer as the data is migrated
to the new layout.

Providing Online Application Reconfiguration

Most applications have some sort of configuration information
that is read when the application is started. If changing the
configuration requires halting the application and reading a new
configuration file, downtime is incurred.

To avoid this downtime, use configuration tools that interact
with the running application and make changes dynamically. Ideally,
changes are made online with little or no interruption to the end-user.
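
As an illustration of making configuration changes online, the following minimal sketch keeps settings in an object that can re-read its file while the application continues to run. It is an assumption-laden example rather than a prescribed design: the `ReloadableConfig` class, the JSON file format, and the `max_users` setting are hypothetical.

```python
import json
import os
import tempfile
import threading


class ReloadableConfig:
    """Holds settings that can be replaced while the application keeps running."""

    def __init__(self, path: str):
        self._path = path
        self._lock = threading.Lock()
        self._settings = {}
        self.reload()

    def reload(self) -> None:
        """Re-read the file and swap in the new settings in one step.

        If reading or parsing fails, the exception propagates and the
        previously loaded settings stay in effect.
        """
        with open(self._path, encoding="utf-8") as handle:
            new_settings = json.load(handle)
        with self._lock:
            self._settings = new_settings

    def get(self, key: str, default=None):
        """Return the current value for key, or default if it is not set."""
        with self._lock:
            return self._settings.get(key, default)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.json")
        with open(path, "w", encoding="utf-8") as handle:
            json.dump({"max_users": 10}, handle)
        config = ReloadableConfig(path)
        print(config.get("max_users"))

        # An administrator changes the file while the application is running.
        with open(path, "w", encoding="utf-8") as handle:
            json.dump({"max_users": 50}, handle)
        config.reload()
        print(config.get("max_users"))
```

A configuration tool could trigger `reload()` through whatever channel the application exposes, such as an administrative command or, on UNIX systems, a signal handler.
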
Such a tool must be able to do everything online, such as expanding
the size of the data, adding new users to the system, and adding new
users to the application. Every task that an administrator
needs to perform on the application system can be made available online.

Documenting Maintenance Operations

Standard procedures are important. An application designer
should make every effort to make tasks common for both the highly
available environment and the normal environment. If an administrator
is accustomed to bringing down the entire system after a failure,
he or she will continue to do so even if the application has been
redesigned to handle a single failure. It is important that application
documentation discuss the high availability alternatives
for typical maintenance operations.