As the technology sector becomes increasingly focused on redundancy, many systems managers achieve redundancy by multipathing: using separate Fibre Channel (FC) adapters connected to different switches. If you're already multipathing in your shop, you can tweak some tuning parameters to improve I/O performance and redundancy.
Multipathing lets you present a single disk or SAN Logical Unit Number (LUN) to a partition through more than one path, which provides redundancy through multiple dedicated FC adapter ports. The adapters might be either dedicated to a logical partition or assigned to a pair of Virtual I/O Servers (VIOSs), allowing an administrator to reboot or upgrade one VIOS without the client partitions losing connectivity to the SAN.
Several options exist for multipathing software, including the AIX-native MPIO, IBM's SDDPCM, and third-party products such as Hitachi Dynamic Link Manager (HDLM) and EMC PowerPath.
Fibre Channel adapter firmware
The first step in improving your multipath performance is to make sure your FC adapter firmware is current. I've seen adapters that weren't visible to an FC switch until their firmware was updated. The latest firmware might also include some bug fixes and valuable performance enhancements. You can check for updates for supported adapters at IBM Fix Central. In most cases, firmware updates can run concurrently without downtime or user impact, but check the firmware description files for the adapters in your particular environment.
Ordinarily, adapter firmware is updated using the diag command on AIX; on the VIOS, log in to the restricted shell as padmin and use the diagmenu command. Remember, the firmware is set for each adapter port, so, for example, a dual-port adapter requires firmware updates on both fcs0 and fcs1. You might also need to change some attributes from the default settings on your FC adapters. For each FC port, there are two devices with tunable attributes: an adapter driver device (e.g., fcs0) and an FC SCSI I/O protocol device (e.g., fscsi0).
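To see which of these device pairs exist on a partition and what their current settings are, you can use the standard AIX listing commands (fcs0 and fscsi0 below are example device names; substitute your own):

```shell
# List the FC adapter driver devices and their protocol devices
lsdev -C -c adapter | grep fcs
lsdev -C | grep fscsi

# Display the current firmware level of one adapter port
lsmcode -d fcs0

# Show the tunable attributes on each device type
lsattr -El fcs0
lsattr -El fscsi0
```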
Fast I/O Failure
If your FC adapters are cabled to a switch and you have more than one path to the storage, you can enable fast I/O failure. If you’ve never seen this before, you might wonder why anyone would ever want their I/O to fail (and to fail quickly). But this parameter is actually a quick response to an I/O failure: If a path goes down -- perhaps due to a switch or cable failure -- the adapter sends a message to quickly fail over to alternate paths.
Fast I/O failure is enabled by setting the fc_err_recov attribute to fast_fail on the fscsi device:
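On AIX, the attribute is fc_err_recov on the FC SCSI protocol device (fscsi0 here is an example device name):

```shell
# Enable fast I/O failure on the FC protocol device
chdev -l fscsi0 -a fc_err_recov=fast_fail
```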
If your adapter already has active devices connected to it, you need to add the -P flag to the command, which writes the change to the ODM rather than applying it immediately. It will take effect the next time you reboot the partition.
If you're configuring fast_fail on the VIOS, and you're logged in to the restricted shell as padmin, you'll need to use the VIOS command-line syntax:
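From the padmin restricted shell, the equivalent command is (again, fscsi0 is an example device name):

```shell
# Enable fast I/O failure on the VIOS
chdev -dev fscsi0 -attr fc_err_recov=fast_fail
```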
Again, if the device already has some child devices (such as hdisks), you'll need to make it a permanent change. On the VIOS command line, that means adding a -perm flag to the command and rebooting.
In earlier versions of AIX, if SAN settings changed -- for example, if a cable got moved to a different switch port or a switch was rebooted -- you had to unconfigure the storage device and adapter device first. So, if you worked with hundreds of LUNs, a switch reboot became a major exercise. Fortunately, AIX now supports dynamic tracking of FC devices, so if an FC device changes its N_Port ID (think of a SCSI ID for a disk), then the fscsi device picks up the change without an administrator having to intervene.
The command to enable dynamic tracking is similar to the fast_fail command:
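Dynamic tracking is controlled by the dyntrk attribute on the same fscsi device (fscsi0 below is an example):

```shell
# Enable dynamic tracking of FC devices on AIX
chdev -l fscsi0 -a dyntrk=yes
```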
Remember to add the -P flag if your fscsi device is in use, then wait for the next reboot. On the VIOS, the command appears as:
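On the VIOS, as padmin, the equivalent is:

```shell
# Enable dynamic tracking on the VIOS
chdev -dev fscsi0 -attr dyntrk=yes
```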
Don't forget the -perm flag if you need to postpone dynamic tracking until the next VIOS reboot.
Adapter Queue Depth
The number of simultaneous I/Os that can be sent to a disk is set using the queue_depth parameter. (See my article on I/O bottlenecks, "Open the I/O Turnstiles with Queue Depth," for more details on this process.) FC adapters have a similar queue depth parameter, called num_cmd_elems. The default value is 200, but you can ramp it up if you have the memory and both your storage subsystem and adapter can support the higher number of I/Os.
Here's the command to increase the maximum number of commands to queue to adapter fcs0:
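Assuming a target value of 1024 (an example figure; confirm the range your adapter supports before choosing a value), the AIX command would be:

```shell
# Check the allowed range for num_cmd_elems on this adapter
lsattr -Rl fcs0 -a num_cmd_elems

# Raise the adapter queue depth (add -P if the adapter is in use,
# and reboot for the change to take effect)
chdev -l fcs0 -a num_cmd_elems=1024
```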
On the VIOS, it reads:
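The VIOS command-line form, using the same example value of 1024:

```shell
# Raise the adapter queue depth on the VIOS
chdev -dev fcs0 -attr num_cmd_elems=1024
```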
Don't forget to set the -P (or -perm) flag if the adapter is in use. Once again, this will take effect after the next reboot.
Another attribute worth looking at is max_xfer_size. This controls the maximum I/O size the adapter device can handle, and it also determines the size of a memory area the adapter uses for data transfers. At the default setting (shown by the command lsattr -El fcs0 as max_xfer_size=0x100000, or 1MB), that memory area is 16MB; at any larger value, such as 0x200000 (2MB), the area increases to 128MB. The fcstat command shows whether changing num_cmd_elems or max_xfer_size can help with performance bottlenecks.
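A quick way to check for both kinds of bottleneck is to look at the resource-shortage counters in the fcstat output (fcs0 is an example device name):

```shell
# "No Command Resource Count" climbing suggests num_cmd_elems is too low;
# "No DMA Resource Count" climbing points at max_xfer_size
fcstat fcs0 | grep -i 'resource'
```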
Dive Into Multipathing
If you're keen on finding out more about the tuning parameters covered here, I recommend you read Dan Braden's IBM white paper on performance-tuning AIX disk queue depth. The white paper discusses how to design appropriate paths for storage when you're using multipath I/O, and it takes into account SAN zoning and LUN masking. It's a key document for AIX, storage, and SAN administrators, and it also covers multipathing from a VIO perspective.