26.4 Migration Example

Let's run through our "option 3" example. We will mostly use existing hardware, configure and test the version 5.X cluster while a reduced version 4.X cluster continues our live production, then complete the migration with a minimum of downtime. If there is a problem with the applications or CAA resources, you will see how to return to the full version 4.X cluster.

26.4.1 Planning

Plan. We cannot emphasize enough the importance of planning your migration and your cluster layout. If you skip this step, you may be forced to rebuild the cluster in the not too distant future. Part of your planning process should include reading the TruCluster manuals. The TruCluster Server Cluster Technical Overview is a small manual that you should read cover to cover; it will ground you in the TruCluster Server components and technology. Another crucial part of planning is determining which layered products you will need to upgrade. Don't gloss over this point: the V5.X versions of some of your layered products may differ from the versions you run today, which can mean learning an updated product.

26.4.2 Check Your Existing Hardware for Compatibility

After you've planned how the cluster will be laid out and migrated, you are ready to begin the process. Install the TCRMIGRATE5XX subset and run clu_migrate_save (to build the data that will automate part of the migration). Also run "sys_check -escalate" on each member, both as a precautionary measure (you want this data in case you have to recreate part of the V1.[56] infrastructure) and so you can see what hardware you have on the cluster. As a result of planning and reviewing the "sys_check -escalate" data, purchase any hardware necessary for the new cluster. This would also be a good time to consider converting any UFS file systems to AdvFS. If you convert the file system(s) now, you won't have to repeat the task should the migration fail to complete for some reason. Figure 26-1 shows our V1.[56] cluster starting point.

Figure 26-1: V1.[56] Cluster
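
As a rough sketch of this step (the kit path and the exact migration subset name below are placeholders; use the names from your TruCluster Server V5.X kit), the commands on each V1.[56] member might look like this:

    # setld -l /mnt/TruCluster/kit TCRMIGRATE5XX    # load the migration tools subset
    # clu_migrate_save                              # gather the ASE and storage data
    # sys_check -escalate                           # capture configuration data for safekeeping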

Note: By converting to AdvFS at this stage, you will be taking the old AdvFS on-disk structure forward. You should, at a later time, convert to the new structure by creating a new file domain and restoring the data to that domain.
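
If you do convert a UFS file system now, a minimal sketch of one way to do it under V4.X device naming (the disks rz18 and rz19, the domain and fileset names, and the mount points are all hypothetical, and the AdvFS target must be a separate disk from the UFS source) is:

    # mount -r /dev/rz18c /ufs_data                 # original UFS file system, mounted read-only
    # mkfdmn /dev/rz19c data_dmn                    # create an AdvFS file domain on another disk
    # mkfset data_dmn data_fs                       # create a fileset in the domain
    # mkdir /advfs_data
    # mount data_dmn#data_fs /advfs_data
    # vdump -0f - /ufs_data | vrestore -xf - -D /advfs_data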

26.4.3 Divide and Conquer

Pick a member to remove from the V1.[56] cluster (this will be our V5.X cluster beachhead). Add any of the identified required hardware to that system. Also update the firmware and install Tru64 UNIX V5.X on the removed system as shown in Figure 26-2. Notice that neither the storage nor the cluster interconnect is shared at this point. Remember, we recommend a fresh install of V5.X rather than an installupdate to V5.X, because the OS has changed significantly since V4.X (AdvFS, LSM, and clustering, to name just a few areas).

Figure 26-2: Preparing for the New Cluster – Install Tru64 UNIX
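
Before and after the firmware update, you can confirm the console firmware level and the devices the console sees from the SRM prompt, for example:

    >>> show version
    >>> show config
    >>> show device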

26.4.4 Create a TruCluster Server Cluster

Install TruCluster Server on your newly installed V5.X system. This is a good time to apply the latest patch kit. (Simply installing the TruCluster subsets does not create a cluster.) Also, you should partition your disks for the cluster-common file systems (cluster_root, cluster_usr, and cluster_var). Now run clu_create to create a single-member cluster. Your new "split" cluster should look something like Figure 26-3. Again, these are two completely separate one-member clusters.

Figure 26-3: Create a One-Member TruCluster Server Cluster
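
A minimal sketch of this step (the disk name dsk3 and its use for the cluster-common file systems are hypothetical; clu_create itself is interactive and prompts for the cluster name, interconnect, and the cluster_root, cluster_usr, and cluster_var locations):

    # disklabel -r dsk3                             # verify the partition layout
    # disklabel -e dsk3                             # adjust partitions if necessary
    # clu_create                                    # create the single-member cluster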

26.4.5 Test CAA Resources

Install the applications and set up any test databases if necessary. Depending on the application, it may be best to install it before running clu_create; consult the application's installation guide for details. Create the CAA application resources for the ASE services that will be migrated to the V5.X cluster and test them. After testing, set them OFFLINE so that the production storage can be moved into the new cluster (in place of the prototype) as shown in Figure 26-4.

Figure 26-4: Create and Test Application Resources
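
A minimal sketch of creating and exercising one application resource (the resource name websrv is hypothetical; create one resource per ASE service you are migrating):

    # caa_profile -create websrv -t application     # create the resource profile
    # caa_register websrv                           # register the resource with CAA
    # caa_start websrv                              # start it and test the application
    # caa_stat websrv                               # check the resource state
    # caa_stop websrv                               # set it OFFLINE before the storage move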

26.4.6 Gather the Configuration Data

On the V1.[56] cluster, make sure that no service has a placement policy that prevents it from running on the remaining node. (Otherwise we won't pick up that service, and the service won't be available during this phase of the migration.) Change the placement policy if necessary and run clu_migrate_save. Then set all of the ASE services OFFLINE (with an "asemgr -x <service>" command) in case you need to return to the V1.[56] cluster, and shut down the V1.[56] cluster (only one member in our example) as shown in Figure 26-5. Example clu_migrate_save output follows:

Figure 26-5: Save the Current Cluster's Configuration Information

beginLog molari Thu May 16 15:24:22 EDT 2002

             TruCluster Migration Data Gather Tool
             _____________________________________

    ********************* Running preliminary checks *********************

    *********************** Backing up for Recovery ***********************

Backing up console variables
        (to /var/TruCluster_migration/molari-mc0/ConsoleVars)
Backing up LSM configuration
        (to /var/TruCluster_migration/molari-mc0/Backup.d/LSM.d/)
voldg: Volume daemon is not accessible
Backing up disk labels
        (to /var/TruCluster_migration/molari-mc0/Backup.d/Disklabels.d/)
        Saving disk label for rz0
        Saving disk label for rz16
        Saving disk label for rz17
        Saving disk label for rz18
        Saving disk label for rz19
        Saving disk label for rz20
*** Warning: Cannot access /dev/rrz21c
scu: Unable to open device '/dev/rrz21c', ENXIO (6) - No such device or address

    *********** Labeling Disks for Device Name Mapping on V5.* ***********

Labeling disk label packids ...
Labeling device /dev/rrz0c with "@rz0"[2]
Labeling device /dev/rrz16c with "@rz16"
Labeling device /dev/rrz17c with "@rz17"
Labeling device /dev/rrz18c with "@rz18"
Labeling device /dev/rrz19c with "@rz19"
Labeling device /dev/rrz20c with "@rz20"
*** Warning: Cannot access /dev/rrz21c
scu: Unable to open device '/dev/rrz21c', ENXIO (6) - No such device or address

    ***************** Gathering Information for Migration *****************

Gathering ASE database information (might take a long time for large ASEs) ...
Saving ASE information ...
       LSM disk group information
       Distributed Raw Disk information
       AdvFS domain information
       Mount point information
       NFS export information
       Service information
       User-defined action scripts

    ******* Copy Migration Information to a Tru64 UNIX V5.* System *******

The directories containing the information gathered by this utility must be
copied to the system that will become the first member of the new cluster.

If the /.rhosts file on that system allows access for root@molari, this
utility can copy these directories automatically. Otherwise, manually copy
the directories after this utility exits.

Copy the directories? (y/n):

After this utility exits, copy the following directory and its contents to
the Tru64 UNIX system that will be the first member of the new cluster.

          /var/TruCluster_migration/molari-mc0/

Press Return to continue ...

    ****************** TruCluster Data Gather Completed ******************

Information regarding the gathered information can be found in the file:
  /var/TruCluster_migration/molari-mc0/README

A log of this session can be found in the file:
  /var/TruCluster_migration/molari-mc0/Log.d/clu_migrate_save.log

If you happen to have more than one ASE, it's possible that the same device name (/dev/rz10c) references two different physical devices (once in ASE 1 and once in ASE 2). In that case, relocate all of the services within each ASE to a single member and run clu_migrate_save on that member (which produces one set of saved data per ASE). Once the V5.X cluster is installed and all members have been added, run clu_migrate_configure twice (once for each set of saved data), and then manually resolve any conflicts caused by duplicate disk device names.
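
A rough sketch of the two-ASE case (member names are placeholders; as the output above shows, clu_migrate_save writes its data under /var/TruCluster_migration/<member>-<interconnect>/, so the two sets of saved data land in separately named directories):

    On the ASE 1 member holding all of that ASE's services:
    # clu_migrate_save

    On the ASE 2 member holding all of that ASE's services:
    # clu_migrate_save

    Later, on the new V5.X cluster (all members added):
    # clu_migrate_configure                         # run once for each set of saved data,
    # clu_migrate_configure                         # then resolve duplicate device names by hand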

26.4.7 Apply the Configuration Data

As shown in Figure 26-6, move the data (storage) to the V5.X cluster and run clu_migrate_configure to configure the storage. Then test the application resources with the actual production data. By the way, "clu_migrate_configure -x" will show you what the script would do without actually doing it.

Figure 26-6: Apply the Configuration Data to the New Cluster
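
For example, once the saved data has been copied onto the new cluster:

    # clu_migrate_configure -x                      # preview what the script would do
    # clu_migrate_configure                         # apply the saved ASE and storage configuration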

If testing is successful, continue with section 26.4.8; otherwise, return for now to the V1.[56] cluster by following the directions starting in section 26.4.11.

26.4.8 Connect the Other Member to the New Cluster

Add any required hardware to the other member; update its firmware; set the SRM environment variables (bootdef_dev, boot_reset, auto_action); and connect the "new" system back to the cluster interconnect and shared storage so that we can add it to the cluster as shown in Figure 26-7.

Figure 26-7: Connect the Second System to the New Cluster
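
At the second system's SRM console, the settings might look like the following (the boot device dga0.1001.0.6.0 is purely illustrative; use the device that will hold this member's boot partition):

    >>> set bootdef_dev dga0.1001.0.6.0
    >>> set boot_reset on
    >>> set auto_action halt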

26.4.9 Add the Old Member to the New Cluster

As shown in Figure 26-8, run clu_add_member on the V5.X cluster to add the new member. Once clu_add_member is complete, you can boot the new V5.X cluster member, configure its network interface(s), and configure a quorum disk (at least we will configure a quorum disk since this is a two-member cluster).

Figure 26-8: Add the Second System to the New Cluster
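
A minimal sketch of this step (the quorum disk name dsk10 is hypothetical; clu_add_member itself is interactive):

    On the existing V5.X cluster member:
    # clu_add_member                                # prompts for member ID, boot disk, votes, etc.
    # clu_quorum -d add dsk10 1                     # add a quorum disk with one vote

    At the new member's SRM console:
    >>> boot                                        # boot the second member into the cluster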

26.4.10 Admire New Cluster

If you've made it this far, congratulations! See Figure 26-9 for a picture of what the new migrated cluster looks like. Label the Emergency Repair disk as such and keep it for emergencies, for example, if you ever have to boot a non-clustered member to restore a root file system (see Chapter 22).

Figure 26-9: Welcome to TruCluster Server!
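
For a quick look at the finished cluster, you might run:

    # clu_get_info                                  # cluster name, members, and member state
    # clu_quorum                                    # current vote and quorum disk configuration
    # caa_stat                                      # state of the CAA application resources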

26.4.11 Retreat to the V1.[56] Cluster

If the migration failed, reconnect the production data (storage) to the V1.[56] cluster; boot the V1.[56] cluster (which consists of the one member); run clu_migrate_recover; and set the ASE services ONLINE (asemgr -s <service>) as shown in Figure 26-10.

Figure 26-10: Failed Migration Recovery
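
A minimal sketch of the recovery (service names are placeholders):

    # clu_migrate_recover                           # restore the recovery data saved by clu_migrate_save
    # asemgr -s <service>                           # set each ASE service back ONLINE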

If you wish to completely undo the migration, restore V4.0X/V1.[56] onto the removed node and add it back into the V1.[56] cluster.

26.4.12 Where's the Log?

A log of a successful option 3 upgrade can be found at:

http://www.tru64unix.compaq.com/docs/highavail/migration/migration_log.htm

[2]The disklabel "label" field contains the old device name, for example: label: @rz34.



