HP-UX System Administrator's Guide: Logical Volume Management: HP-UX 11i Version 3 > Chapter 4 Troubleshooting LVM
I/O Errors
When a device driver returns an error to LVM on an I/O request, LVM classifies the error as either recoverable or nonrecoverable. When LVM encounters a recoverable (correctable) error, it internally retries the failed operation assuming that the error will correct itself or that you can take steps to correct it. Examples of recoverable errors are the following:
In these cases, LVM logs an error message to the console, but it does not return an error to the application accessing the logical volume.
If you have a current copy of the data on a separate, functioning mirror, then LVM directs the I/O to a mirror copy, the same as for a nonrecoverable error. Applications accessing the logical volume do not detect any error. (To preserve data synchronization between its mirrors, LVM retries recoverable write requests to a problematic disk, even if a current copy exists elsewhere. However, this process is managed by a daemon internal to LVM and has no impact on user access to the logical volume.)
However, if the device in question holds the only copy of the data, LVM retries the I/O request until it succeeds; that is, until the device responds or the system is rebooted. Any application performing I/O to the logical volume might block, waiting for the device to recover. In this case, your application or file system might appear to be stalled and might be unresponsive.
By default, LVM retries I/O requests with recoverable errors until they succeed or the system is rebooted. Therefore, if an application or file system stalls, your troubleshooting must include checking the console log for problems with your disk drives and taking action to restore the failing devices to service. If retrying the I/O request never succeeds (for example, the disk was physically removed), your application or file system might block indefinitely. If your application is not responding, you might need to reboot your system.
As an alternative to rebooting, you can control how long LVM retries a recoverable error before treating it as nonrecoverable by setting a timeout on the logical volume. If the device fails to respond within that time, LVM returns an I/O error to the caller. This timeout value is subject to any underlying physical volume timeout and driver timeout, so LVM can return the I/O error seconds after the logical volume timeout expired.
The timeout value is normally zero, which is interpreted as an infinite timeout. Thus, no I/O request returns to the caller until it completes successfully. View the timeout value for a logical volume using the lvdisplay command, as follows:
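
The command itself did not survive in this copy of the page. A minimal sketch of the query, assuming the example volume /dev/vg01/lvol1 used elsewhere in this section, could look like the following. The grep filter and the uname guard are additions for illustration, and the matching line is expected to be the IO Timeout (Seconds) field, although the exact label is an assumption.

```shell
LV=/dev/vg01/lvol1   # example logical volume used in this section

# Run the query only on HP-UX; on any other system just print the command.
if [ "$(uname -s)" = "HP-UX" ]; then
    # Show only the timeout line of the lvdisplay report (filter is an assumption).
    lvdisplay "$LV" | grep -i "timeout"
else
    echo "lvdisplay $LV | grep -i timeout"
fi
```

Running lvdisplay on the volume without the filter shows the full report, which is the safer choice if the field label differs on your release.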
Set the timeout value using the -t option of the lvchange command. This sets the timeout value in seconds for a logical volume. For example, to set the timeout for /dev/vg01/lvol1 to one minute, enter the following command:
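
The command is missing from this copy of the page. Based on the description above (the -t option, a value in seconds, and the volume /dev/vg01/lvol1), it can be sketched as follows. The variable names and the uname guard are illustrative additions; the guard is there so the HP-UX option syntax is not run by accident on another system.

```shell
LV=/dev/vg01/lvol1   # logical volume named in the text
TIMEOUT=60           # one minute, expressed in seconds
CMD="lvchange -t $TIMEOUT $LV"

# Apply the setting only on HP-UX (requires root); elsewhere just print it.
if [ "$(uname -s)" = "HP-UX" ]; then
    $CMD
else
    echo "$CMD"
fi
```

Setting the value back to zero should restore the default behavior described above, in which LVM retries indefinitely.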
Nonrecoverable errors are considered fatal; there is no expectation that retrying the operation will work. If you have a current copy of the data on a separate, functioning mirror, then LVM directs reads and writes to that mirror copy. The I/O operation for the application accessing the logical volume completes successfully.
However, if you have no other copies of the data, then LVM returns an error to the subsystem accessing the logical volume. Thus, any application directly accessing a logical volume must be prepared for I/O requests to fail. File systems such as VxFS and most database applications are designed to recover from error situations; for example, if VxFS encounters an I/O error, it might disable access to a file system or a subset of the files in it.
LVM considers the following two situations nonrecoverable.
If an I/O request fails because of a media error, LVM typically prints a message to the console log file (/var/adm/syslog/syslog.log) when the error occurs. In the event of a media error, you must replace the disk (see “Replacing a Bad Disk”). If your disk hardware supports automatic bad block relocation (usually known as hardware sparing), enable it, because it minimizes media errors seen by LVM.
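
Because both recoverable and media errors are reported in the console log, checking that file is a natural first troubleshooting step. The sketch below lists the most recent matching entries. The helper name recent_lvm_messages is hypothetical, and the search string "LVM" is an assumption about how the messages are tagged, not something stated in this section.

```shell
# Hypothetical helper: print the last 20 log lines that mention LVM.
# The default path is the console log file named in the text.
recent_lvm_messages() {
    log=${1:-/var/adm/syslog/syslog.log}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # "LVM" as the search string is an assumption; adjust it to your messages.
    grep "LVM" "$log" | tail -n 20
}

recent_lvm_messages || echo "no readable syslog at the default path"
```

Pass a different file as the first argument to search an archived or copied log.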