In this section we explore the drdmgr command in more detail and show some examples of the DAIO and served device I/O. We've discussed how I/O is supposed to work in a cluster for a DAIO device versus a served device, so let's prove it with a few examples. Another interesting thing to do here is demonstrate what happens if we create a path failure within the cluster. Sound like fun? A little bit scary? Okay, a lot scary? Well, at least our cluster is not in a production environment, so we can play around without the risk of getting fired – or so we hope.
Let's look at how the DRD responds to I/O requests to character (/devices/rdisk/*) and block (/devices/disk/*) device special files. We discussed direct I/O and file I/O in chapter 13, so we will not repeat those examples here. In the following examples, we will use a two-member cluster, although they could just as easily be run on a larger cluster.
We will use the dd(1) command to read and write to a spare device on the shared bus. We are not currently using dsk5, and there is no data on the disk that we cannot afford to lose. If you choose to try this exercise, we recommend that you make very, very sure that the device or partition is not in use. If you do not have a disk or partition available to write to, just read from it instead.
We have created two files, MyFile.molari and MyFile.sheridan, each containing its member's host name repeated throughout. Each file is between 5 and 8 MB.
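The chapter does not show how these files were created; one way to build such a file (a sketch — the exact command used is our assumption) is to repeat the host name until the file reaches the desired size:

```shell
# Build a ~6 MB file consisting of the host name repeated on every line.
# "molari" is hard-coded here for illustration; on a real member you
# would substitute `hostname`.
yes "molari" | head -c 6000000 > MyFile.molari
ls -l MyFile.molari
```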
Let's verify that the disk has not been recently accessed. We will get the statistics for the device using the drdmgr command.
[molari] # drdmgr dsk5
View of Data from member molari as of 2001-07-19:16:39:53

    Device Name: dsk5
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 2
        Server Name: molari
        Server State: Server
        Server Name: sheridan
        Server State: Server
    Access Member Name: molari
    Open Partition Mask: 0
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
[molari] # drdmgr -h sheridan dsk5
View of Data from member sheridan as of 2001-07-19:16:39:47

    Device Name: dsk5
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 2
        Server Name: molari
        Server State: Server
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan
    Open Partition Mask: 0
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 15822
        Number of Bytes Read: 0
        Number of Bytes Written: 8912310
Notice that the second drdmgr command uses the "-h" switch to get the statistics for member sheridan from member molari. The device is a DAIO device and each member is a server. Also note that each member's access member name is itself.
Since the statistics for sheridan are not zero, we will set them to zero before we begin.
[molari] # drdmgr -a statistics=0 -h sheridan dsk5
[molari] # drdmgr -a statistics -h sheridan dsk5
View of Data from member sheridan as of 2001-07-19:16:46:55

    Device Name: dsk5
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
Let's write each member's "MyFile" to the "h" partition of dsk5 using the character device special file. We will write from molari and wait until the write is complete before starting to write from sheridan.
[molari] # dd if=MyFile.molari of=/dev/rdisk/dsk5h
11718+1 records in
11718+1 records out

[molari] # drdmgr -a statistics dsk5
View of Data from member molari as of 2001-07-19:16:55:19

    Device Name: dsk5
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 11719
        Number of Bytes Read: 0
        Number of Bytes Written: 6000000

[molari] # drdmgr -a statistics -h sheridan dsk5
View of Data from member sheridan as of 2001-07-19:16:55:25

    Device Name: dsk5
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
Notice that the statistics on sheridan are still zero.
[sheridan] # dd if=MyFile.sheridan of=/dev/rdisk/dsk5h
15625+0 records in
15625+0 records out

[sheridan] # drdmgr -a statistics dsk5
View of Data from member sheridan as of 2001-07-19:17:00:01

    Device Name: dsk5
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 15625
        Number of Bytes Read: 0
        Number of Bytes Written: 8000000

[sheridan] # drdmgr -a statistics -h molari dsk5
View of Data from member molari as of 2001-07-19:17:00:09

    Device Name: dsk5
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 11719
        Number of Bytes Read: 0
        Number of Bytes Written: 6000000
Since the device supports DAIO and each member has physical access to the device, each member is able to write to the device independently without passing the data across the cluster interconnect.
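The dd and drdmgr numbers above are consistent, and the cross-check is simple shell arithmetic: dd reported 11718 full records plus one partial record, which with the default 512-byte block size accounts for the 6,000,000 bytes and 11719 write operations the DRD counted on molari.

```shell
# Cross-check dd's "11718+1 records" against the DRD's byte count.
bs=512
full_records=11718
total_bytes=6000000
partial=$((total_bytes - full_records * bs))   # size of the final short record
writes=$((full_records + 1))                   # one write operation per record
echo "partial record: $partial bytes, total writes: $writes"
```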
Performing block I/O (i.e., reading from and writing to the block device special file) is handled at the Cluster File System (CFS) layer before being sent to the DRD. Because CFS is implemented in a client/server fashion, the DRD will not get involved on members that are CFS clients, even if the device supports DAIO and is on a shared bus directly accessible by the members performing the I/O (unless Direct I/O or Direct Access Cached Reads are used [see chapter 13]). The block device special files are located in the /devices directory hierarchy and are therefore served by the CFS server for the cluster_root domain.
So, let's demonstrate how block I/O is performed in a cluster. We will once again use the dd command, but this time we will use the block device special file for dsk5.
The first thing that we need to do is find out which member is the CFS server for the cluster_root domain. For this we will use the cfsmgr(8) command.
# cfsmgr -d cluster_root
Domain or filesystem name = cluster_root
Server Name = sheridan
Server Status : OK
The node sheridan is the CFS server.
Let's reset the statistics on both members before we do any I/O.
[molari] # drdmgr -a statistics=0 dsk5
[molari] # drdmgr -a statistics=0 -h sheridan dsk5
We will issue the dd command on molari to illustrate that the DRD on sheridan will do the actual work. The request for I/O will go to the CFS server for the root (/) file system. The CFS server will dispatch the I/O request to the physical file system layer. The I/O will then be dispatched to the I/O mapper subsystem and then down to the DRD on the same member. The I/O architecture was illustrated in Figure 15-2.
[molari] # dd if=MyFile.molari of=/dev/disk/dsk5h
11718+1 records in
11718+1 records out
Notice that we are using the block device special file (/dev/disk/dsk5h) and not the character device special file (/dev/rdisk/dsk5h). Let's see what the DRD statistics for dsk5 tell us from molari.
[molari] # drdmgr -a statistics dsk5
View of Data from member molari as of 2001-07-19:22:30:12

    Device Name: dsk5
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
As you can see, the DRD on molari never saw any I/O. The DRD on sheridan, however, was rather busy.
[molari] # drdmgr -a statistics -h sheridan dsk5
View of Data from member sheridan as of 2001-07-19:22:34:01

    Device Name: dsk5
    Statistics for Client Member: sheridan
        Number of Read Operations: 2930
        Number of Write Operations: 11719
        Number of Bytes Read: 6000640
        Number of Bytes Written: 24000512
When any disk is on a private bus, all I/O requests will be funneled to the member with direct physical access to the disk. This also means that if the only member with access to a device is down, then the device will not be accessible.
For example, dsk8 is on a bus that is local only to sheridan. How do we know? The most straightforward way to determine this is to use the hwmgr(8) command as follows:
[molari] # hwmgr -view devices -dsf dsk8
hwmgr: No such hardware ID or category.

[molari] # hwmgr -view devices -dsf dsk8 -m sheridan
HWID: Device Name       Mfg     Model       Hostname  Location
------------------------------------------------------------------------
 103: /dev/disk/dsk8c   COMPAQ  BB009235B6  sheridan  bus-2-targ-1-lun-0
The device does not exist on molari, but it does exist on sheridan. To take this one step further, we can check the bus.
[molari] # hwmgr -show scsi -bus 2
        SCSI      DEVICE   DEVICE   DRIVER  NUM   DEVICE  FIRST
 HWID:  DEVICEID  HOSTNAME TYPE     SUBTYPE OWNER PATH    FILE   VALID PATH
-------------------------------------------------------------------------
   47:  1         molari   disk     none    0     1       dsk0   [2/0/0]

[molari] # hwmgr -show scsi -bus 2 -m sheridan
        SCSI      DEVICE   DEVICE   DRIVER  NUM   DEVICE  FIRST
 HWID:  DEVICEID  HOSTNAME TYPE     SUBTYPE OWNER PATH    FILE   VALID PATH
-------------------------------------------------------------------------
  102:  1         sheridan disk     none    2     1       dsk7   [2/0/0]
  103:  2         sheridan disk     none    2     1       dsk8   [2/1/0]
The fact that each member has a bus-2 is not as interesting as the fact that each member sees different devices on bus-2. Because storage device names are unique across a cluster, this proves that each member's bus-2 is a private bus. For more information on the hwmgr command, see chapter 7 as well as the hwmgr(8) reference page.
You can use the drdmgr command to check the path to a device, but this will only tell you if there is an active path to a device and not if the device is on a private bus.
Looking at the drdmgr output on molari, you will see that it does not even include statistics, because the DRD on molari is only a client for dsk8 (i.e., molari does not have direct physical access to the disk). Notice that the output as seen from sheridan includes statistics for both members, but molari's statistics are zero.
[molari] # drdmgr dsk8
View of Data from member molari as of 2001-07-19:23:00:56

    Device Name: dsk8
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 1
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan
    Open Partition Mask: 0x80 < h >

[molari] # drdmgr -h sheridan dsk8
View of Data from member sheridan as of 2001-07-19:23:33:52

    Device Name: dsk8
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 1
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan
    Open Partition Mask: 0x80 < h >
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
    Statistics for Client Member: sheridan
        Number of Read Operations: 20319
        Number of Write Operations: 12819
        Number of Bytes Read: 153012224
        Number of Bytes Written: 56055808
You can see that there has already been quite a bit of I/O on dsk8. This is primarily because of the "h" partition (as indicated in the "Open Partition Mask"). This partition contains the /kits file system (extra#kits).
# cfsmgr -a devices -d extra
**************************************************************
List of Devices Used for extra
Number of Devices = 1
    1 - dsk8h
**************************************************************
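The "Open Partition Mask: 0x80 < h >" shown in the drdmgr output is simply a bitmask over the eight partitions, a through h. A quick sketch of the decoding (the bit-to-partition mapping follows from the "0x80 = h" output above):

```shell
# Decode a DRD Open Partition Mask: bit 0 is partition "a", bit 7 is "h",
# so a mask of 0x80 means only the "h" partition is open.
mask=$((0x80))
i=0
for p in a b c d e f g h; do
    if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
        echo "partition $p is open"
    fi
    i=$((i + 1))
done
```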
All I/O to dsk8 goes through the DRD on sheridan.
If we write to the "a" partition on dsk8 (which we know is unused) from molari and then query the DRD statistics, molari's statistics will remain zero while sheridan's will increase.
[molari] # drdmgr -a statistics=0 -h sheridan dsk8
[molari] # dd if=MyFile.molari of=/dev/disk/dsk8a
11718+1 records in
11718+1 records out

[molari] # drdmgr -a statistics -h sheridan dsk8
View of Data from member sheridan as of 2001-07-19:23:50:25

    Device Name: dsk8
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
    Statistics for Client Member: sheridan
        Number of Read Operations: 2933
        Number of Write Operations: 11719
        Number of Bytes Read: 6006784
        Number of Bytes Written: 24000512
When a device is a served device, only one member serves it at a time. If the served device is on a shared bus, any member connected to the bus can be the server for the device, but only one member will be the server at any point in time. We illustrated this back in Figure 15-9. In this section, we'll show how the I/O flows when accessing a served tape device.
On our two-member cluster, we have a tape device on a shared bus. As we stated in section 15.1.2.2, a tape device is not a DAIO device. Before showing how the I/O flows through the DRD for our served tape device, let's see which member is currently serving it.
[sheridan] # drdmgr tape0
View of Data from member sheridan as of 2001-07-24:14:34:56

    Device Name: tape0
    Device Type: Served Tape
    Device Status: OK
    Number of Servers: 2
        Server Name: sheridan
        Server State: Not Server
        Server Name: molari
        Server State: Server
    Access Member Name: molari
    Open Partition Mask: 0
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0

[sheridan] # drdmgr -h molari tape0
View of Data from member molari as of 2001-07-24:14:35:24

    Device Name: tape0
    Device Type: Served Tape
    Device Status: OK
    Number of Servers: 2
        Server Name: sheridan
        Server State: Not Server
        Server Name: molari
        Server State: Server
    Access Member Name: molari
    Open Partition Mask: 0
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
As we can see from the output above, molari is the server. Since the statistics on the device are currently zero for both members, we will not need to reset the statistics before starting our example. If you're playing along with the home version (i.e., you are doing these examples on your cluster), however, your device might have counters greater than zero, so you may want to reset them before continuing. Okay, let's do some I/O.
[sheridan] # tar -cvf /dev/tape/tape0 ./someCool.txtFile
a ./someCool.txtFile 50 Blocks

[sheridan] # tar -tvf /dev/tape/tape0
blocksize = 20
-rw------- 0/0    25157 Jul 24 14:13:41 2001 ./someCool.txtFile
Since molari is the server and sheridan is where we ran the tar(1) command, if everything works the way we would expect, then the I/O counters should increase on molari, but not on sheridan.
[sheridan] # drdmgr -a statistics tape0
View of Data from member sheridan as of 2001-07-24:14:38:20

    Device Name: tape0
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0

[sheridan] # drdmgr -h molari -a statistics tape0
View of Data from member molari as of 2001-07-24:14:38:50

    Device Name: tape0
    Statistics for Client Member: sheridan
        Number of Read Operations: 3
        Number of Write Operations: 3
        Number of Bytes Read: 151552
        Number of Bytes Written: 30720
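The write counters are consistent with tar's default blocking: "blocksize = 20" means each tar record is 20 512-byte blocks, so three write operations of one record each account for the 30,720 bytes written. A quick check:

```shell
# tar's default blocking factor is 20 512-byte blocks per record.
record=$((20 * 512))
echo "record size: $record bytes; 3 writes = $((3 * record)) bytes"
```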
Let's change servers and verify that the change succeeded.
[sheridan] # drdmgr -a server=sheridan tape0
[sheridan] # drdmgr -a server tape0
View of Data from member sheridan as of 2001-07-24:14:39:48

    Device Name: tape0
    Device Type: Served Tape
    Device Status: OK
    Number of Servers: 2
        Server Name: sheridan
        Server State: Server
        Server Name: molari
        Server State: Not Server
Now that sheridan is acting as the server, let's reset the statistics and do some I/O from molari and sheridan this time.
[molari] # drdmgr -a statistics=0 tape0

[molari] # drdmgr -h sheridan -a statistics=0 tape0
[molari] # tar -cvf /dev/tape/tape0 ./someCool.txtFile
a ./someCool.txtFile 50 Blocks

[molari] # tar -tvf /dev/tape/tape0
blocksize = 20
-rw------- 0/0    25157 Jul 24 14:13:41 2001 ./someCool.txtFile

[sheridan] # tar -cvf /dev/tape/tape0 ./someCool.txtFile
a ./someCool.txtFile 50 Blocks

[sheridan] # tar -tvf /dev/tape/tape0
blocksize = 20
-rw------- 0/0    25157 Jul 24 14:13:41 2001 ./someCool.txtFile
This time, the I/O should be tallied on sheridan.
[sheridan] # drdmgr -a statistics tape0
View of Data from member sheridan as of 2001-07-24:14:44:23

    Device Name: tape0
    Statistics for Client Member: sheridan
        Number of Read Operations: 3
        Number of Write Operations: 3
        Number of Bytes Read: 151552
        Number of Bytes Written: 30720
    Statistics for Client Member: molari
        Number of Read Operations: 3
        Number of Write Operations: 3
        Number of Bytes Read: 151552
        Number of Bytes Written: 30720

[sheridan] # drdmgr -h molari -a statistics tape0
View of Data from member molari as of 2001-07-24:14:44:47

    Device Name: tape0
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
What happens if sheridan loses its direct physical access to a device? Let's find out.
In this example, we are going to simulate a path failure by pulling the SCSI cable out from one of sheridan's SCSI adapters. Specifically, we will pull the cable connected to the shared bus. "The shared bus? No, not the shared bus! You can't pull the cable on the shared bus because sheridan's boot_partition is on the shared bus! The cluster-common partitions are on the shared bus!" you exclaim. As we stated earlier in the chapter, our cluster is a test cluster, so we have nothing to lose. Will sheridan crash? Will sheridan hang? Let's find out but first let's see which disks are on the shared bus.
[sheridan] # hwmgr -show scsi -bus 3
        SCSI      DEVICE   DEVICE   DRIVER  NUM   DEVICE  FIRST
 HWID:  DEVICEID  HOSTNAME TYPE     SUBTYPE OWNER PATH    FILE   VALID PATH
-------------------------------------------------------------------------
   50:  3         sheridan disk     none    2     1       dsk1   [3/0/0]
   51:  4         sheridan disk     none    2     1       dsk2   [3/1/0]
   52:  5         sheridan disk     none    2     1       dsk3   [3/2/0]
   53:  6         sheridan disk     none    2     1       dsk4   [3/3/0]
   54:  7         sheridan disk     none    2     1       dsk5   [3/4/0]
   55:  8         sheridan disk     none    2     1       dsk6   [3/5/0]
The shared bus for both sheridan and molari is bus 3, and there are six disks (dsk1-dsk6) on the bus. Using the cfs script that we introduced in chapter 13, we'll see which file systems are using which disks so that we can watch what happens when we pull the cable. You can also obtain this information by using the "cfsmgr -a devices" command, which we previously demonstrated in section 15.3.2.
[sheridan] # cfs -s | grep dsk
/ [cluster_root#root] (dsk1a):
/usr [cluster_usr#usr] (dsk1g):
/var [cluster_var#var] (dsk1h):
/kits [extra#kits] (dsk8h):
/u1 [home#u1] (dsk7h):
/cluster/members/member1/boot_partition [root1_domain#root] (dsk2a):
/cluster/members/member2/boot_partition [root2_domain#root] (dsk3a):
/fafrak [tcrhb#fafrak] (dsk6c):
Next, let's see which file systems sheridan is serving:
[sheridan] # cfs -h sheridan
CFS Server        Mount Point                File System          FS Type
----------------- -------------------------- -------------------- -------
sheridan          /                          cluster_root#root    AdvFS
sheridan          /usr                       cluster_usr#usr      AdvFS
sheridan          /var                       cluster_var#var      AdvFS
sheridan          /kits                      extra#kits           AdvFS
sheridan          /u1                        home#u1              AdvFS
sheridan          /cluster/members/member2/  root2_domain#root    AdvFS
                    boot_partition
From the output of the cfs script, it appears that sheridan is the CFS server for the cluster-common file systems and its own boot_partition.
In our cluster configuration, dsk1 is the disk that holds cluster_root, cluster_usr, and cluster_var while dsk3 holds sheridan's boot_partition. So, let's see what information the DRD has for dsk1 and dsk3.
[sheridan] # drdmgr -a server -a accessnode dsk1 dsk3
View of Data from member sheridan as of 2001-07-23:23:53:46

    Device Name: dsk1
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 2
        Server Name: molari
        Server State: Server
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan

    Device Name: dsk3
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 2
        Server Name: molari
        Server State: Server
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan
Now that we know which disks are on the shared bus, which file systems are using those disks, which file systems sheridan is acting as CFS server for, and how the DRD is configured on sheridan, let's get ready to pull the plug (or cable in this case).
On molari and sheridan, let's monitor EVM for events so that we can see what the cluster sees when the cable is disconnected. We'll use the following commands:
# export EVM_SHOW_TEMPLATE="@member_id [@priority] @name"
# evmwatch -A -f "[name *.scsi]|[name *.drd]|[name *.cfs]"
We pulled the plug; let's see what happened. We will look at the output from molari:
[molari] # export EVM_SHOW_TEMPLATE="@member_id [@priority] @name"
[molari] # evmwatch -A -f "[name *.scsi]|[name *.drd]|[name *.cfs]"
2 [200] sys.unix.clu.drd.server_leave._hwid.50
2 [200] sys.unix.clu.drd.server_leave._hwid.53
2 [200] sys.unix.clu.drd.server_add._hwid.50
2 [200] sys.unix.clu.drd.new_accessnode._hwid.50
2 [200] sys.unix.clu.drd.new_accessnode._hwid.50
2 [200] sys.unix.clu.drd.server_leave._hwid.52
2 [200] sys.unix.clu.drd.server_add._hwid.53
2 [200] sys.unix.clu.drd.new_accessnode._hwid.53
2 [200] sys.unix.clu.drd.new_accessnode._hwid.53
2 [200] sys.unix.clu.drd.server_add._hwid.52
2 [200] sys.unix.clu.drd.new_accessnode._hwid.52
2 [200] sys.unix.clu.drd.new_accessnode._hwid.52
It looks like the DRD on sheridan (member2) detected a problem. We only see a small number of events on molari as compared to what was seen from sheridan. "Do you mean that sheridan did not hang or crash?" you ask. That's correct, sheridan is still running. In fact, here is what sheridan saw when we pulled the cable:
[sheridan] # export EVM_SHOW_TEMPLATE="@member_id [@priority] @name"
[sheridan] # evmwatch -A -f "[name *.scsi]|[name *.drd]|[name *.cfs]"
2 [200] sys.unix.clu.drd.server_leave._hwid.50
2 [200] sys.unix.clu.drd.server_leave._hwid.53
2 [200] sys.unix.clu.drd.server_add._hwid.50
2 [200] sys.unix.clu.drd.new_accessnode._hwid.50
2 [200] sys.unix.clu.drd.new_accessnode._hwid.50
2 [200] sys.unix.clu.drd.server_leave._hwid.52
2 [200] sys.unix.binlog.hw.scsi._hwid.50
...
2 [700] sys.unix.binlog.hw.scsi
2 [400] sys.unix.binlog.hw.scsi
...
2 [400] sys.unix.binlog.hw.scsi._hwid.51
2 [200] sys.unix.binlog.hw.scsi._hwid.51
2 [400] sys.unix.binlog.hw.scsi._hwid.52
2 [200] sys.unix.binlog.hw.scsi._hwid.52
2 [200] sys.unix.binlog.hw.scsi._hwid.53
2 [400] sys.unix.binlog.hw.scsi._hwid.54
2 [200] sys.unix.binlog.hw.scsi._hwid.54
2 [400] sys.unix.binlog.hw.scsi._hwid.55
2 [200] sys.unix.binlog.hw.scsi._hwid.55
...
2 [200] sys.unix.clu.drd.server_add._hwid.53
2 [200] sys.unix.clu.drd.new_accessnode._hwid.53
2 [200] sys.unix.clu.drd.new_accessnode._hwid.53
2 [200] sys.unix.binlog.hw.scsi._hwid.50
2 [200] sys.unix.clu.drd.server_add._hwid.52
2 [200] sys.unix.clu.drd.new_accessnode._hwid.52
2 [200] sys.unix.clu.drd.new_accessnode._hwid.52
2 [200] sys.unix.binlog.hw.scsi._hwid.51
...
We truncated the output to save a page or two, but as you can see, in addition to DRD events, sheridan also saw SCSI hardware events. This is no big surprise considering that it can no longer see several disks.
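If you capture the evmwatch output to a file, a one-liner makes the event mix easier to digest (events.log is a hypothetical capture file; with the template above, the event name is the third field):

```shell
# Tally captured evmwatch output by event name, most frequent first.
# Guard so the sketch runs even if no capture exists yet.
[ -f events.log ] || : > events.log
awk '{ count[$3]++ } END { for (n in count) print count[n], n }' events.log | sort -rn
```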
How is it that sheridan is still running? Let's find out. Using the hwmgr command, let's check which devices are still visible:
[sheridan] # hwmgr -show scsi -bus 3
        SCSI      DEVICE   DEVICE   DRIVER  NUM   DEVICE  FIRST
 HWID:  DEVICEID  HOSTNAME TYPE     SUBTYPE OWNER PATH    FILE   VALID PATH
-------------------------------------------------------------------------
   50:  3         sheridan disk     none    0     1       dsk1
   51:  4         sheridan disk     none    2     1       dsk2
   52:  5         sheridan disk     none    0     1       dsk3
   53:  6         sheridan disk     none    0     1       dsk4
   54:  7         sheridan disk     none    2     1       dsk5
   55:  8         sheridan disk     none    2     1       dsk6
If you compare this output to the output from the hwmgr command we received before disconnecting the cable, you can see that we no longer have any valid paths to the devices.
What does the DRD on sheridan see? We can use the same drdmgr command we used before to find out:
[sheridan] # drdmgr -a server -a accessnode dsk1 dsk3
View of Data from member sheridan as of 2001-07-24:00:27:02

    Device Name: dsk1
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 1
        Server Name: molari
        Server State: Server
    Access Member Name: molari

    Device Name: dsk3
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 1
        Server Name: molari
        Server State: Server
    Access Member Name: molari
You can see by the output above that molari is now serving the data to sheridan.
Okay, now for the real tough question, "Which member is the CFS server for cluster_root, cluster_usr, cluster_var, and sheridan's boot_partition?" Well, we did not see any CFS events. Can sheridan still serve the file systems despite no longer having direct access to the devices? You bet.
[sheridan] # cfs
CFS Server        Mount Point                File System          FS Type
----------------- -------------------------- -------------------- -------
molari            /cluster/members/member1/  root1_domain#root    AdvFS
                    boot_partition
molari            /fafrak                    tcrhb#fafrak         AdvFS
sheridan          /                          cluster_root#root    AdvFS
sheridan          /usr                       cluster_usr#usr      AdvFS
sheridan          /var                       cluster_var#var      AdvFS
sheridan          /kits                      extra#kits           AdvFS
sheridan          /u1                        home#u1              AdvFS
sheridan          /cluster/members/member2/  root2_domain#root    AdvFS
                    boot_partition
How can this be? Well, the CFS server sends I/O requests to the DRD. The CFS never sees an error because the DRD handled the problem by automatically rerouting where it dispatched the request. The DRD sensed that it could no longer use the access member it was using, so it got a new access member from the list of servers for the device.
After we reattached the cable, the DRD almost immediately noticed the devices again. Checking the path to one of the devices, we see:
[sheridan] # drdmgr -a check_path dsk1
View of Data from member sheridan as of 2001-07-24:00:42:42

    Device Name: dsk1
    Local Device Path: Exists
The path has returned. The DRD on sheridan is also once again serving the disks locally.
[sheridan] # drdmgr -a server -a accessnode dsk1 dsk3
View of Data from member sheridan as of 2001-07-24:00:42:53

    Device Name: dsk1
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 2
        Server Name: molari
        Server State: Server
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan

    Device Name: dsk3
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 2
        Server Name: molari
        Server State: Server
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan
All we did was reattach the cable – the DRD did the rest, automatically and transparently. This is incredible stuff!