I/O Errors

When a device driver returns an error to LVM on an I/O request, LVM classifies the error as either recoverable or nonrecoverable.

Recoverable Errors

When LVM encounters a recoverable (correctable) error, it internally retries the failed operation, assuming that the error will correct itself or that you can take steps to correct it. Examples of recoverable errors include the following:

Device power failure
A disk that goes missing after the volume group is activated
A loose disk cable (which looks like a missing disk)

In these cases, LVM logs an error message to the console, but it does not return an error to the application accessing the logical volume.

If you have a current copy of the data on a separate, functioning mirror, then LVM directs the I/O to a mirror copy, the same as for a nonrecoverable error. Applications accessing the logical volume do not detect any error. (To preserve data synchronization between its mirrors, LVM retries recoverable write requests to a problematic disk, even if a current copy exists elsewhere. However, this process is managed by a daemon internal to LVM and has no impact on user access to the logical volume.)

However, if the device in question holds the only copy of the data, LVM retries the I/O request until it succeeds, that is, until the device responds or the system is rebooted. Any application performing I/O to the logical volume might block while waiting for the device to recover. In this case, your application or file system might appear stalled or unresponsive.

Temporarily Unavailable Device

By default, LVM retries I/O requests with recoverable errors until they succeed or the system is rebooted. Therefore, if an application or file system stalls, your troubleshooting must include checking the console log for problems with your disk drives and taking action to restore the failing devices to service.
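As a starting point for that check, the following sketch lists the most recent log lines that mention LVM. It assumes that the console messages are also written to /var/adm/syslog/syslog.log and that they contain the string "LVM" in some letter case. The function name recent_lvm_messages is hypothetical, and neither assumption is guaranteed to hold on every system.

```shell
# recent_lvm_messages [LOGFILE] [COUNT]
# Hypothetical helper: print the last COUNT lines of LOGFILE that mention
# LVM (case-insensitive match). Both defaults are assumptions.
recent_lvm_messages() {
    log=${1:-/var/adm/syslog/syslog.log}
    count=${2:-20}
    grep -i 'lvm' "$log" | tail -n "$count"
}
```

Called with no arguments, the function reads the default log file and prints up to 20 matching lines. The messages only indicate which device to examine; they do not restore it to service.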

Permanently Unavailable Device

If retrying the I/O request never succeeds (for example, the disk was physically removed), your application or file system might block indefinitely. If your application is not responding, you might need to reboot your system.

As an alternative to rebooting, you can set a timeout on the logical volume to control how long LVM retries a recoverable error before treating it as nonrecoverable. If the device fails to respond within that time, LVM returns an I/O error to the caller. The timeout is subject to any underlying physical volume timeout and driver timeout, so LVM might return the I/O error some seconds after the logical volume timeout has expired.

By default, the timeout value is zero, which is interpreted as an infinite timeout. With this setting, no I/O request returns to the caller until it completes successfully.

View the timeout value for a logical volume using the lvdisplay command, as follows:

# lvdisplay /dev/vg00/lvol1 | grep Timeout
IO Timeout (Seconds)        default
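In a script, it can be convenient to reduce that output to a single number. The following sketch reads lvdisplay output on standard input and prints the timeout in seconds. It assumes that the value is the last field on the "IO Timeout (Seconds)" line and that the word "default" corresponds to the zero (infinite) timeout. The function name lv_timeout_seconds is hypothetical.

```shell
# lv_timeout_seconds
# Hypothetical helper: read lvdisplay output on stdin and print the timeout
# in seconds. Assumption: "default" means the zero (infinite) timeout.
lv_timeout_seconds() {
    awk 'index($0, "IO Timeout (Seconds)") {
        value = $NF                      # last field holds the setting
        if (value == "default") value = 0
        print value
        exit
    }'
}
```

For example, "lvdisplay /dev/vg00/lvol1 | lv_timeout_seconds" prints 0 for the logical volume shown above. The function prints nothing if the line is absent.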

Set the timeout value using the -t option of the lvchange command. This sets the timeout value in seconds for a logical volume. For example, to set the timeout for /dev/vg01/lvol1 to one minute, enter the following command:

# lvchange -t 60 /dev/vg01/lvol1

CAUTION: Setting a timeout on a logical volume increases the likelihood of transient errors being treated as nonrecoverable errors, so any application that reads or writes to the logical volume can experience I/O errors. If your application is not prepared to handle such errors, keep the default infinite logical volume timeout.

TIP: Set the logical volume timeout to an integral multiple of any timeout assigned to the underlying physical volumes. Otherwise, the actual duration of the I/O request can exceed the logical volume timeout. For details on how to change the I/O timeout value on a physical volume, see pvchange(1M).
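The arithmetic behind this tip can be sketched as a small shell function that rounds a desired logical volume timeout up to the next multiple of the physical volume timeout. The function name lv_timeout_multiple is hypothetical, and treating a physical volume timeout of zero as "no constraint" is an assumption made for illustration.

```shell
# lv_timeout_multiple DESIRED PV_TIMEOUT
# Hypothetical helper: print the smallest multiple of PV_TIMEOUT that is
# greater than or equal to DESIRED (both in seconds).
# Assumption: a PV_TIMEOUT of zero or less imposes no constraint.
lv_timeout_multiple() {
    desired=$1
    pv_timeout=$2
    if [ "$pv_timeout" -le 0 ]; then
        echo "$desired"
        return 0
    fi
    # Integer division truncates, so adding (pv_timeout - 1) rounds up.
    echo $(( (desired + pv_timeout - 1) / pv_timeout * pv_timeout ))
}
```

With a physical volume timeout of 30 seconds, a desired value of 50 rounds up to 60, which is the value used in the lvchange example above.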

Nonrecoverable Errors

Nonrecoverable errors are considered fatal; there is no expectation that retrying the operation will work.

If you have a current copy of the data on a separate, functioning mirror, then LVM directs reads and writes to that mirror copy. The I/O operation for the application accessing the logical volume completes successfully.

However, if you have no other copies of the data, then LVM returns an error to the subsystem accessing the logical volume. Thus, any application directly accessing a logical volume must be prepared for I/O requests to fail. File systems such as VxFS and most database applications are designed to recover from error situations; for example, if VxFS encounters an I/O error, it might disable access to a file system or a subset of the files in it.
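For a script that reads a logical volume directly, the simplest form of that preparation is to check the exit status of every read. The following sketch uses dd to read one block and reports failure instead of continuing. The function name probe_read is hypothetical, and the 1 KB block size is an arbitrary choice.

```shell
# probe_read TARGET
# Hypothetical helper: read one 1 KB block from TARGET and report whether
# the read succeeded. Returns nonzero if dd reports any error.
probe_read() {
    target=$1
    if dd if="$target" of=/dev/null bs=1024 count=1 2>/dev/null; then
        echo "ok: $target"
    else
        echo "read failed: $target" >&2
        return 1
    fi
}
```

Note that this check only detects errors that LVM actually returns. With the default infinite timeout, a read that hits a recoverable error blocks rather than fails, so the function does not return until the device recovers.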

LVM considers the following two situations nonrecoverable.

Media Errors

If an I/O request fails because of a media error, LVM typically prints a message to the console log file (/var/adm/syslog/syslog.log) when the error occurs. In the event of a media error, you must replace the disk (see “Replacing a Bad Disk”).

If your disk hardware supports automatic bad block relocation (usually known as hardware sparing), enable it, because it minimizes media errors seen by LVM.

NOTE: LVM does not perform software relocation of bad blocks. It recognizes and honors software relocation entries created by previous releases of LVM but does not create new ones. Enabling or disabling bad block relocation using lvchange has no effect.

Missing Device When the Volume Group Was Activated

If the device associated with the I/O was not present when the volume group was activated, LVM prints an error message to the user's terminal at activation time. You must either locate the disk and restore it to service, or replace it, then activate the volume group again.
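If the activation message is no longer on screen, one way to check for a missing disk afterward is to compare the number of configured and active physical volumes. The following sketch reads vgdisplay output on standard input. It assumes that the output contains "Cur PV" and "Act PV" lines with the count in the third field. The function name missing_pv_count is hypothetical.

```shell
# missing_pv_count
# Hypothetical helper: read vgdisplay output on stdin and print the number
# of configured physical volumes that are not active.
# Assumption: the output has "Cur PV <n>" and "Act PV <n>" lines.
missing_pv_count() {
    awk '
        $1 == "Cur" && $2 == "PV" { cur = $3 }
        $1 == "Act" && $2 == "PV" { act = $3 }
        END { print (cur - act) }
    '
}
```

For example, "vgdisplay /dev/vg01 | missing_pv_count" prints 0 when every configured disk is active. A nonzero result suggests that a disk was missing at activation; after restoring or replacing it, activate the volume group again (typically with vgchange -a y).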
