In this section we explore the drdmgr command in more detail and show some examples of the DAIO and served device I/O. We've discussed how I/O is supposed to work in a cluster for a DAIO device versus a served device, so let's prove it with a few examples. Another interesting thing to do here is demonstrate what happens if we create a path failure within the cluster. Sound like fun? A little bit scary? Okay, a lot scary? Well, at least our cluster is not in a production environment, so we can play around without the risk of getting fired – or so we hope.
Let's look at how the DRD responds to I/O requests to character (/devices/rdisk/*) and block (/devices/disk/*) device special files. We discussed direct I/O and file I/O in chapter 13, so we will not repeat those examples here. In the following examples, we will use a two-member cluster, although they could just as easily be run on a larger cluster.
We will use the dd(1) command to read and write to a spare device on the shared bus. We are not currently using dsk5, and there is no data on the disk that we cannot afford to lose. If you choose to try this exercise, we recommend that you make very, very sure that the device or partition is not in use. If you do not have a disk or partition available to write to, just read from it instead.
We have created two files, MyFile.molari and MyFile.sheridan, each containing its member's host name repeated throughout. Each file is between 5 and 8 MB.
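The chapter does not show how these files were created; one way to build such a file (a sketch — the exact command used is our assumption) is to repeat the host name until the file reaches the desired size:

```shell
# Build a ~6 MB file consisting of the host name repeated on every line.
# "molari" is hard-coded here for illustration; on a real member you
# would substitute `hostname`.
yes "molari" | head -c 6000000 > MyFile.molari
ls -l MyFile.molari
```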
Let's verify that the disk has not been recently accessed. We will get the statistics for the device using the drdmgr command.
[molari] # drdmgr dsk5
View of Data from member molari as of 2001-07-19:16:39:53

    Device Name: dsk5
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 2
        Server Name: molari
        Server State: Server
        Server Name: sheridan
        Server State: Server
    Access Member Name: molari
    Open Partition Mask: 0
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
[molari] # drdmgr -h sheridan dsk5
View of Data from member sheridan as of 2001-07-19:16:39:47

    Device Name: dsk5
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 2
        Server Name: molari
        Server State: Server
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan
    Open Partition Mask: 0
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 15822
        Number of Bytes Read: 0
        Number of Bytes Written: 8912310
Notice that the second drdmgr command uses the "-h" switch to get the statistics for member sheridan from member molari. The device is a DAIO device and each member is a server. Also note that each member's access member name is itself.
Since the statistics for sheridan are not zero, we will set them to zero before we begin.
[molari] # drdmgr -a statistics=0 -h sheridan dsk5
[molari] # drdmgr -a statistics -h sheridan dsk5
View of Data from member sheridan as of 2001-07-19:16:46:55

    Device Name: dsk5
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
Let's write each member's "MyFile" to the "h" partition of dsk5 using the character device special file. We will write from molari and wait until the write is complete before starting to write from sheridan.
[molari] # dd if=MyFile.molari of=/dev/rdisk/dsk5h
11718+1 records in
11718+1 records out

[molari] # drdmgr -a statistics dsk5
View of Data from member molari as of 2001-07-19:16:55:19

    Device Name: dsk5
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 11719
        Number of Bytes Read: 0
        Number of Bytes Written: 6000000

[molari] # drdmgr -a statistics -h sheridan dsk5
View of Data from member sheridan as of 2001-07-19:16:55:25

    Device Name: dsk5
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
Notice that the statistics on sheridan are still zero.
[sheridan] # dd if=MyFile.sheridan of=/dev/rdisk/dsk5h
15625+0 records in
15625+0 records out

[sheridan] # drdmgr -a statistics dsk5
View of Data from member sheridan as of 2001-07-19:17:00:01

    Device Name: dsk5
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 15625
        Number of Bytes Read: 0
        Number of Bytes Written: 8000000

[sheridan] # drdmgr -a statistics -h molari dsk5
View of Data from member molari as of 2001-07-19:17:00:09

    Device Name: dsk5
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 11719
        Number of Bytes Read: 0
        Number of Bytes Written: 6000000
Since the device supports DAIO and each member has physical access to the device, each member is able to write to the device independently without passing the data across the cluster interconnect.
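The dd and drdmgr numbers above are consistent, and the cross-check is simple shell arithmetic: dd reported 11718 full records plus one partial record, which with the default 512-byte block size accounts for the 6,000,000 bytes and 11719 write operations the DRD counted on molari.

```shell
# Cross-check dd's "11718+1 records" against the DRD's byte count.
bs=512
full_records=11718
total_bytes=6000000
partial=$((total_bytes - full_records * bs))   # size of the final short record
writes=$((full_records + 1))                   # one write operation per record
echo "partial record: $partial bytes, total writes: $writes"
```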
Performing block I/O (i.e., reading from and writing to the block device special file) is handled at the Cluster File System (CFS) layer before being sent to the DRD. Because CFS is implemented in a client/server fashion, the DRD will not get involved on members that are CFS clients, even if the device supports DAIO and is on a shared bus directly accessible by the members performing the I/O (unless Direct I/O or Direct Access Cached Reads are used [see chapter 13]). The block device special files are located in the /devices directory hierarchy and are therefore served by the CFS server for the cluster_root domain.
So, let's demonstrate how block I/O is performed in a cluster. We will once again use the dd command, but this time we will use the block device special file for dsk5.
The first thing that we need to do is find out which member is the CFS server for the cluster_root domain. For this we will use the cfsmgr(8) command.
# cfsmgr -d cluster_root
Domain or filesystem name = cluster_root
Server Name = sheridan
Server Status : OK
The node sheridan is the CFS server.
Let's reset the statistics on both members before we do any I/O.
[molari] # drdmgr -a statistics=0 dsk5
[molari] # drdmgr -a statistics=0 -h sheridan dsk5
We will issue the dd command on molari to illustrate that the DRD on sheridan will do the actual work. The request for I/O will go to the CFS server for the root (/) file system. The CFS server will dispatch the I/O request to the physical file system layer. The I/O will then be dispatched to the I/O mapper subsystem and then down to the DRD on the same member. The I/O architecture was illustrated in Figure 15-2.
[molari] # dd if=MyFile.molari of=/dev/disk/dsk5h
11718+1 records in
11718+1 records out
Notice that we are using the block device special file (/dev/disk/dsk5h) and not the character device special file (/dev/rdisk/dsk5h). Let's see what the DRD statistics for dsk5 tell us from molari.
[molari] # drdmgr -a statistics dsk5
View of Data from member molari as of 2001-07-19:22:30:12

    Device Name: dsk5
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
As you can see, the DRD on molari never saw any I/O. The DRD on sheridan, however, was rather busy.
[molari] # drdmgr -a statistics -h sheridan dsk5
View of Data from member sheridan as of 2001-07-19:22:34:01

    Device Name: dsk5
    Statistics for Client Member: sheridan
        Number of Read Operations: 2930
        Number of Write Operations: 11719
        Number of Bytes Read: 6000640
        Number of Bytes Written: 24000512
When any disk is on a private bus, all I/O requests will be funneled to the member with direct physical access to the disk. This also means that if the only member with access to a device is down, then the device will not be accessible.
For example, dsk8 is on a bus that is local only to sheridan. How do we know? The most straightforward way to determine this is to use the hwmgr(8) command as follows:
[molari] # hwmgr -view devices -dsf dsk8
hwmgr: No such hardware ID or category.

[molari] # hwmgr -view devices -dsf dsk8 -m sheridan
HWID: Device Name       Mfg     Model       Hostname  Location
------------------------------------------------------------------------
 103: /dev/disk/dsk8c   COMPAQ  BB009235B6  sheridan  bus-2-targ-1-lun-0
The device does not exist on molari, but it does exist on sheridan. To take this one step further, we can check the bus.
[molari] # hwmgr -show scsi -bus 2
        SCSI      DEVICE   DEVICE   DRIVER  NUM   DEVICE  FIRST
 HWID:  DEVICEID  HOSTNAME TYPE     SUBTYPE OWNER PATH    FILE   VALID PATH
-------------------------------------------------------------------------
   47:  1         molari   disk     none    0     1       dsk0   [2/0/0]

[molari] # hwmgr -show scsi -bus 2 -m sheridan
        SCSI      DEVICE   DEVICE   DRIVER  NUM   DEVICE  FIRST
 HWID:  DEVICEID  HOSTNAME TYPE     SUBTYPE OWNER PATH    FILE   VALID PATH
-------------------------------------------------------------------------
  102:  1         sheridan disk     none    2     1       dsk7   [2/0/0]
  103:  2         sheridan disk     none    2     1       dsk8   [2/1/0]
The fact that each member has a bus-2 is not as interesting as the fact that each member sees different devices on bus-2. Because storage device names are unique across a cluster, this proves that each member's bus-2 is a private bus. For more information on the hwmgr command, see chapter 7 as well as the hwmgr(8) reference page.
You can use the drdmgr command to check the path to a device, but this will only tell you if there is an active path to a device and not if the device is on a private bus.
Looking at the drdmgr output on molari, you will see that it does not even include statistics, because the DRD on molari is only a client for dsk8 (i.e., molari does not have direct physical access to the disk). Notice that the output as seen from sheridan includes statistics for both members, but molari's statistics are zero.
[molari] # drdmgr dsk8
View of Data from member molari as of 2001-07-19:23:00:56

    Device Name: dsk8
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 1
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan
    Open Partition Mask: 0x80 < h >

[molari] # drdmgr -h sheridan dsk8
View of Data from member sheridan as of 2001-07-19:23:33:52

    Device Name: dsk8
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 1
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan
    Open Partition Mask: 0x80 < h >
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
    Statistics for Client Member: sheridan
        Number of Read Operations: 20319
        Number of Write Operations: 12819
        Number of Bytes Read: 153012224
        Number of Bytes Written: 56055808
You can see that there has already been quite a bit of I/O on dsk8. This is primarily because of the "h" partition (as indicated in the "Open Partition Mask"). This partition contains the /kits file system (extra#kits).
# cfsmgr -a devices -d extra
**************************************************************
List of Devices Used for extra
Number of Devices = 1
    1 - dsk8h
**************************************************************
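The "Open Partition Mask: 0x80 < h >" shown in the drdmgr output is simply a bitmask over the eight partitions, a through h. A quick sketch of the decoding (the bit-to-partition mapping follows from the "0x80 = h" output above):

```shell
# Decode a DRD Open Partition Mask: bit 0 is partition "a", bit 7 is "h",
# so a mask of 0x80 means only the "h" partition is open.
mask=$((0x80))
i=0
for p in a b c d e f g h; do
    if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
        echo "partition $p is open"
    fi
    i=$((i + 1))
done
```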
All I/O to dsk8 goes through the DRD on sheridan.
If we write to the "a" partition on dsk8 (which we know is unused) from molari and then query the DRD statistics, molari's statistics will remain zero while sheridan's will increase.
[molari] # drdmgr -a statistics=0 -h sheridan dsk8
[molari] # dd if=MyFile.molari of=/dev/disk/dsk8a
11718+1 records in
11718+1 records out

[molari] # drdmgr -a statistics -h sheridan dsk8
View of Data from member sheridan as of 2001-07-19:23:50:25

    Device Name: dsk8
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
    Statistics for Client Member: sheridan
        Number of Read Operations: 2933
        Number of Write Operations: 11719
        Number of Bytes Read: 6006784
        Number of Bytes Written: 24000512
When a device is a served device, only one member serves it at a time. If the served device is on a shared bus, any member connected to the bus can be the server for the device, but only one member will be the server at any point in time. We illustrated this back in Figure 15-9. In this section, we'll show how the I/O flows when accessing a served tape device.
On our two-member cluster, we have a tape device on a shared bus. As we stated in section 15.1.2.2, a tape device is not a DAIO device. Before showing how the I/O flows through the DRD for our served tape device, let's see which member is currently serving it.
[sheridan] # drdmgr tape0
View of Data from member sheridan as of 2001-07-24:14:34:56

    Device Name: tape0
    Device Type: Served Tape
    Device Status: OK
    Number of Servers: 2
        Server Name: sheridan
        Server State: Not Server
        Server Name: molari
        Server State: Server
    Access Member Name: molari
    Open Partition Mask: 0
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0

[sheridan] # drdmgr -h molari tape0
View of Data from member molari as of 2001-07-24:14:35:24

    Device Name: tape0
    Device Type: Served Tape
    Device Status: OK
    Number of Servers: 2
        Server Name: sheridan
        Server State: Not Server
        Server Name: molari
        Server State: Server
    Access Member Name: molari
    Open Partition Mask: 0
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
As we can see from the output above, molari is the server. Since the statistics on the device are currently zero for both members, we will not need to reset the statistics before starting our example. If you're playing along with the home version (i.e., you are doing these examples on your cluster), however, your device might have counters greater than zero, so you may want to reset them before continuing. Okay, let's do some I/O.
[sheridan] # tar -cvf /dev/tape/tape0 ./someCool.txtFile
a ./someCool.txtFile 50 Blocks

[sheridan] # tar -tvf /dev/tape/tape0
blocksize = 20
-rw------- 0/0    25157 Jul 24 14:13:41 2001 ./someCool.txtFile
Since molari is the server and sheridan is where we ran the tar(1) command, if everything works the way we would expect, then the I/O counters should increase on molari, but not on sheridan.
[sheridan] # drdmgr -a statistics tape0
View of Data from member sheridan as of 2001-07-24:14:38:20

    Device Name: tape0
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
    Statistics for Client Member: molari
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0

[sheridan] # drdmgr -h molari -a statistics tape0
View of Data from member molari as of 2001-07-24:14:38:50

    Device Name: tape0
    Statistics for Client Member: sheridan
        Number of Read Operations: 3
        Number of Write Operations: 3
        Number of Bytes Read: 151552
        Number of Bytes Written: 30720
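The write counters are consistent with tar's default blocking: "blocksize = 20" means each tar record is 20 512-byte blocks, so three write operations of one record each account for the 30,720 bytes written. A quick check:

```shell
# tar's default blocking factor is 20 512-byte blocks per record.
record=$((20 * 512))
echo "record size: $record bytes; 3 writes = $((3 * record)) bytes"
```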
Let's change servers and verify that the change succeeded.
[sheridan] # drdmgr -a server=sheridan tape0
[sheridan] # drdmgr -a server tape0
View of Data from member sheridan as of 2001-07-24:14:39:48

    Device Name: tape0
    Device Type: Served Tape
    Device Status: OK
    Number of Servers: 2
        Server Name: sheridan
        Server State: Server
        Server Name: molari
        Server State: Not Server
Now that sheridan is acting as the server, let's reset the statistics and do some I/O from molari and sheridan this time.
[molari] # drdmgr -a statistics=0 tape0

[molari] # drdmgr -h sheridan -a statistics=0 tape0
[molari] # tar -cvf /dev/tape/tape0 ./someCool.txtFile
a ./someCool.txtFile 50 Blocks

[molari] # tar -tvf /dev/tape/tape0
blocksize = 20
-rw------- 0/0    25157 Jul 24 14:13:41 2001 ./someCool.txtFile

[sheridan] # tar -cvf /dev/tape/tape0 ./someCool.txtFile
a ./someCool.txtFile 50 Blocks

[sheridan] # tar -tvf /dev/tape/tape0
blocksize = 20
-rw------- 0/0    25157 Jul 24 14:13:41 2001 ./someCool.txtFile
This time, the I/O should be tallied on sheridan.
[sheridan] # drdmgr -a statistics tape0
View of Data from member sheridan as of 2001-07-24:14:44:23

    Device Name: tape0
    Statistics for Client Member: sheridan
        Number of Read Operations: 3
        Number of Write Operations: 3
        Number of Bytes Read: 151552
        Number of Bytes Written: 30720
    Statistics for Client Member: molari
        Number of Read Operations: 3
        Number of Write Operations: 3
        Number of Bytes Read: 151552
        Number of Bytes Written: 30720

[sheridan] # drdmgr -h molari -a statistics tape0
View of Data from member molari as of 2001-07-24:14:44:47

    Device Name: tape0
    Statistics for Client Member: sheridan
        Number of Read Operations: 0
        Number of Write Operations: 0
        Number of Bytes Read: 0
        Number of Bytes Written: 0
What happens if sheridan loses its direct physical access to a device? Let's find out.
In this example, we are going to simulate a path failure by pulling the SCSI cable out from one of sheridan's SCSI adapters. Specifically, we will pull the cable connected to the shared bus. "The shared bus? No, not the shared bus! You can't pull the cable on the shared bus because sheridan's boot_partition is on the shared bus! The cluster-common partitions are on the shared bus!" you exclaim. As we stated earlier in the chapter, our cluster is a test cluster, so we have nothing to lose. Will sheridan crash? Will sheridan hang? Let's find out but first let's see which disks are on the shared bus.
[sheridan] # hwmgr -show scsi -bus 3
        SCSI      DEVICE   DEVICE   DRIVER  NUM   DEVICE  FIRST
 HWID:  DEVICEID  HOSTNAME TYPE     SUBTYPE OWNER PATH    FILE   VALID PATH
-------------------------------------------------------------------------
   50:  3         sheridan disk     none    2     1       dsk1   [3/0/0]
   51:  4         sheridan disk     none    2     1       dsk2   [3/1/0]
   52:  5         sheridan disk     none    2     1       dsk3   [3/2/0]
   53:  6         sheridan disk     none    2     1       dsk4   [3/3/0]
   54:  7         sheridan disk     none    2     1       dsk5   [3/4/0]
   55:  8         sheridan disk     none    2     1       dsk6   [3/5/0]
The shared bus for both sheridan and molari is bus 3, and there are six disks (dsk1-dsk6) on the bus. Using the cfs script that we introduced in chapter 13, we'll see which file systems are using which disks so that we can watch what happens when we pull the cable. You can also obtain this information by using the "cfsmgr -a devices" command, which we previously demonstrated in section 15.3.2.
[sheridan] # cfs -s | grep dsk
/ [cluster_root#root] (dsk1a):
/usr [cluster_usr#usr] (dsk1g):
/var [cluster_var#var] (dsk1h):
/kits [extra#kits] (dsk8h):
/u1 [home#u1] (dsk7h):
/cluster/members/member1/boot_partition [root1_domain#root] (dsk2a):
/cluster/members/member2/boot_partition [root2_domain#root] (dsk3a):
/fafrak [tcrhb#fafrak] (dsk6c):
Next, let's see which file systems sheridan is serving:
[sheridan] # cfs -h sheridan
CFS Server        Mount Point                File System          FS Type
----------------- -------------------------- -------------------- -------
sheridan          /                          cluster_root#root    AdvFS
sheridan          /usr                       cluster_usr#usr      AdvFS
sheridan          /var                       cluster_var#var      AdvFS
sheridan          /kits                      extra#kits           AdvFS
sheridan          /u1                        home#u1              AdvFS
sheridan          /cluster/members/member2/  root2_domain#root    AdvFS
                    boot_partition
From the output of the cfs script, it appears that sheridan is the CFS server for the cluster-common file systems and its own boot_partition.
In our cluster configuration, dsk1 is the disk that holds cluster_root, cluster_usr, and cluster_var while dsk3 holds sheridan's boot_partition. So, let's see what information the DRD has for dsk1 and dsk3.
[sheridan] # drdmgr -a server -a accessnode dsk1 dsk3
View of Data from member sheridan as of 2001-07-23:23:53:46

    Device Name: dsk1
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 2
        Server Name: molari
        Server State: Server
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan

    Device Name: dsk3
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 2
        Server Name: molari
        Server State: Server
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan
Now that we know which disks are on the shared bus, which file systems are using those disks, which file systems sheridan is acting as CFS server for, and how the DRD is configured on sheridan, let's get ready to pull the plug (or cable in this case).
On molari and sheridan, let's monitor EVM for events so that we can see what the cluster sees when the cable is disconnected. We'll use the following commands:
# export EVM_SHOW_TEMPLATE="@member_id [@priority] @name"
# evmwatch -A -f "[name *.scsi]|[name *.drd]|[name *.cfs]"
We pulled the plug; let's see what happened. We will look at the output from molari:
[molari] # export EVM_SHOW_TEMPLATE="@member_id [@priority] @name"
[molari] # evmwatch -A -f "[name *.scsi]|[name *.drd]|[name *.cfs]"
2 [200] sys.unix.clu.drd.server_leave._hwid.50
2 [200] sys.unix.clu.drd.server_leave._hwid.53
2 [200] sys.unix.clu.drd.server_add._hwid.50
2 [200] sys.unix.clu.drd.new_accessnode._hwid.50
2 [200] sys.unix.clu.drd.new_accessnode._hwid.50
2 [200] sys.unix.clu.drd.server_leave._hwid.52
2 [200] sys.unix.clu.drd.server_add._hwid.53
2 [200] sys.unix.clu.drd.new_accessnode._hwid.53
2 [200] sys.unix.clu.drd.new_accessnode._hwid.53
2 [200] sys.unix.clu.drd.server_add._hwid.52
2 [200] sys.unix.clu.drd.new_accessnode._hwid.52
2 [200] sys.unix.clu.drd.new_accessnode._hwid.52
It looks like the DRD on sheridan (member2) detected a problem. We only see a small number of events on molari as compared to what was seen from sheridan. "Do you mean that sheridan did not hang or crash?" you ask. That's correct, sheridan is still running. In fact, here is what sheridan saw when we pulled the cable:
[sheridan] # export EVM_SHOW_TEMPLATE="@member_id [@priority] @name"
[sheridan] # evmwatch -A -f "[name *.scsi]|[name *.drd]|[name *.cfs]"
2 [200] sys.unix.clu.drd.server_leave._hwid.50
2 [200] sys.unix.clu.drd.server_leave._hwid.53
2 [200] sys.unix.clu.drd.server_add._hwid.50
2 [200] sys.unix.clu.drd.new_accessnode._hwid.50
2 [200] sys.unix.clu.drd.new_accessnode._hwid.50
2 [200] sys.unix.clu.drd.server_leave._hwid.52
2 [200] sys.unix.binlog.hw.scsi._hwid.50
...
2 [700] sys.unix.binlog.hw.scsi
2 [400] sys.unix.binlog.hw.scsi
...
2 [400] sys.unix.binlog.hw.scsi._hwid.51
2 [200] sys.unix.binlog.hw.scsi._hwid.51
2 [400] sys.unix.binlog.hw.scsi._hwid.52
2 [200] sys.unix.binlog.hw.scsi._hwid.52
2 [200] sys.unix.binlog.hw.scsi._hwid.53
2 [400] sys.unix.binlog.hw.scsi._hwid.54
2 [200] sys.unix.binlog.hw.scsi._hwid.54
2 [400] sys.unix.binlog.hw.scsi._hwid.55
2 [200] sys.unix.binlog.hw.scsi._hwid.55
...
2 [200] sys.unix.clu.drd.server_add._hwid.53
2 [200] sys.unix.clu.drd.new_accessnode._hwid.53
2 [200] sys.unix.clu.drd.new_accessnode._hwid.53
2 [200] sys.unix.binlog.hw.scsi._hwid.50
2 [200] sys.unix.clu.drd.server_add._hwid.52
2 [200] sys.unix.clu.drd.new_accessnode._hwid.52
2 [200] sys.unix.clu.drd.new_accessnode._hwid.52
2 [200] sys.unix.binlog.hw.scsi._hwid.51
...
We truncated the output to save a page or two, but as you can see, in addition to DRD events, sheridan also saw SCSI hardware events. This is no big surprise considering that it can no longer see several disks.
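If you capture the evmwatch output to a file, a one-liner makes the event mix easier to digest (events.log is a hypothetical capture file; with the template above, the event name is the third field):

```shell
# Tally captured evmwatch output by event name, most frequent first.
# Guard so the sketch runs even if no capture exists yet.
[ -f events.log ] || : > events.log
awk '{ count[$3]++ } END { for (n in count) print count[n], n }' events.log | sort -rn
```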
How is it that sheridan is still running? Let's find out. Using the hwmgr command, let's check which devices are still visible:
[sheridan] # hwmgr -show scsi -bus 3
        SCSI      DEVICE   DEVICE   DRIVER  NUM   DEVICE  FIRST
 HWID:  DEVICEID  HOSTNAME TYPE     SUBTYPE OWNER PATH    FILE   VALID PATH
-------------------------------------------------------------------------
   50:  3         sheridan disk     none    0     1       dsk1
   51:  4         sheridan disk     none    2     1       dsk2
   52:  5         sheridan disk     none    0     1       dsk3
   53:  6         sheridan disk     none    0     1       dsk4
   54:  7         sheridan disk     none    2     1       dsk5
   55:  8         sheridan disk     none    2     1       dsk6
If you compare this output to the output from the hwmgr command we received before disconnecting the cable, you can see that we no longer have any valid paths to the devices.
What does the DRD on sheridan see? We can use the same drdmgr command we used before to find out:
[sheridan] # drdmgr -a server -a accessnode dsk1 dsk3
View of Data from member sheridan as of 2001-07-24:00:27:02

    Device Name: dsk1
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 1
        Server Name: molari
        Server State: Server
    Access Member Name: molari

    Device Name: dsk3
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 1
        Server Name: molari
        Server State: Server
    Access Member Name: molari
You can see by the output above that molari is now serving the data to sheridan.
Okay, now for the real tough question, "Which member is the CFS server for cluster_root, cluster_usr, cluster_var, and sheridan's boot_partition?" Well, we did not see any CFS events. Can sheridan still serve the file systems despite no longer having direct access to the devices? You bet.
[sheridan] # cfs
CFS Server        Mount Point                File System          FS Type
----------------- -------------------------- -------------------- -------
molari            /cluster/members/member1/  root1_domain#root    AdvFS
                    boot_partition
molari            /fafrak                    tcrhb#fafrak         AdvFS
sheridan          /                          cluster_root#root    AdvFS
sheridan          /usr                       cluster_usr#usr      AdvFS
sheridan          /var                       cluster_var#var      AdvFS
sheridan          /kits                      extra#kits           AdvFS
sheridan          /u1                        home#u1              AdvFS
sheridan          /cluster/members/member2/  root2_domain#root    AdvFS
                    boot_partition
How can this be? Well, the CFS server sends I/O requests to the DRD. The CFS never sees an error because the DRD handled the problem by automatically rerouting where it dispatched the request. The DRD sensed that it could no longer use the access member it was using, so it got a new access member from the list of servers for the device.
After we reattached the cable, the DRD almost immediately noticed the devices again. Checking the path to one of the devices, we see:
[sheridan] # drdmgr -a check_path dsk1
View of Data from member sheridan as of 2001-07-24:00:42:42

    Device Name: dsk1
    Local Device Path: Exists
The path has returned. The DRD on sheridan is also once again serving the disks locally.
[sheridan] # drdmgr -a server -a accessnode dsk1 dsk3
View of Data from member sheridan as of 2001-07-24:00:42:53

    Device Name: dsk1
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 2
        Server Name: molari
        Server State: Server
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan

    Device Name: dsk3
    Device Type: Direct Access IO Disk
    Device Status: OK
    Number of Servers: 2
        Server Name: molari
        Server State: Server
        Server Name: sheridan
        Server State: Server
    Access Member Name: sheridan
All we did was reattach the cable – the DRD did the rest, automatically and transparently. This is incredible stuff!