Using Solaris Volume Manager

The Solaris Volume Manager (SVM) is packaged with Solaris 9 and provides advanced hard disk management capabilities, including creating RAID volumes, soft partitions, hot spare pools, and transactional volumes. Using SVM can help increase your storage capacity, ensure data availability, improve disk I/O, and lower administrative overhead. SVM was formerly known as Solstice DiskSuite.

Note 

One new enhancement of Solaris Volume Manager is the inclusion of soft partitions. A soft partition enables you to create more than the standard eight slices available per hard disk.

Overview of Solaris Volume Manager

Solaris Volume Manager enables you to manage large numbers of hard disks and data volumes. SVM works by using virtual disks to manage physical disks and the data on those disks. A virtual disk is also called a volume or a metadevice. Sun currently prefers the term volume, although some documentation and command-line utilities still refer to metadevices.

Volumes are built from disk slices or other volumes. In terms of functionality, applications and utilities see volumes just as they see physical disks or file systems. In a sense, the virtual disk sits transparently on top of the physical disk. So when a request to read or write data is passed from an application to the operating system, Solaris Volume Manager intercepts the request on behalf of the virtual disk and passes it to the appropriate physical disk.

Note 

Do not confuse Solaris Volume Manager with the volume manager daemon (vold), which automatically mounts removable media such as CD-ROMs.

You can access the Solaris Volume Manager in one of two ways: through a graphical interface or from the command line. The graphical interface provided by the Solaris Management Console is easy to use and affords you complete flexibility when managing storage volumes. The Enhanced Storage node of the Solaris Management Console is shown in Figure 12.4.

Figure 12.4: Solaris Volume Manager's graphical interface

If you prefer using a command line, most of the SVM commands begin with meta, such as metainit, metadb, and metastat. The available SVM commands are listed in Table 12.1, and their usage is covered in depth throughout the rest of the chapter.

Table 12.1: Solaris Volume Manager Commands

Command        Function

growfs         Increases the size of a UFS file system without destroying data
metaclear      Removes active volumes and hot spare pools
metadb         Creates and deletes state database replicas
metadetach     Detaches a volume from a RAID 1 volume, or a logging device from a transactional volume
metadevadm     Checks the configuration of device IDs
metahs         Manages hot spare devices and hot spare pools
metainit       Configures volumes
metaoffline    Takes submirrors offline
metaonline     Brings submirrors online
metaparam      Modifies volume parameters
metarecover    Recovers configuration information for soft partitions
metarename     Renames volumes
metareplace    Replaces components in submirrors and RAID 5 volumes
metaroot       Creates mirroring of the root file system (/)
metaset        Manages disk sets
metastat       Displays volume status or hot spare pool status
metasync       Synchronizes volumes
metattach      Attaches a component to a RAID 0 or RAID 1 volume, or a log device to a transactional volume

Regardless of whether you choose to use the graphical or command-line interface, you need root privileges to use Solaris Volume Manager.

Solaris Volume Manager and Crash Dumps

Although the Solaris Volume Manager is capable of managing all hard disks and storage devices in your system, it's best to keep your dump device for crash dumps out of the control of SVM.

Think about it this way: SVM is a software application that manages your hard disks. If the system crashes (and produces a crash dump), by definition, SVM is not running. So, if your dump device is controlled by SVM, you now have an inaccessible device to which a crash dump is supposed to be written. It won't work.

By default, your swap space is the crash dump device. It's okay to have Solaris Volume Manager control your swap space, but if you do, designate a dedicated dump device that's not controlled by SVM. For more information on crash dumps, see Chapter 11, "Virtual File Systems and NFS."
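
For example, assuming an unused slice such as c0t3d0s1 (a hypothetical device name) is available, you could designate it as a dedicated dump device with the dumpadm command:

 # dumpadm -d /dev/dsk/c0t3d0s1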

Key Files

Like any other Solaris service, Solaris Volume Manager requires several files to operate properly. The Solaris Volume Manager is started by the /etc/rcS.d/S35svm.init script during boot and synchronized (if necessary) by the /etc/rc2.d/S95svm.sync script, which also starts the active monitoring daemon.

The /etc/lvm/mddb.cf file contains the locations of your state database replicas and is considered the "metadevice database." The /etc/lvm/md.cf file holds information automatically generated for the default (unspecified or local) disk set. Never edit either of these files directly or you will corrupt your Solaris Volume Manager configuration.

The /kernel/drv/md.conf file is read by SVM at system startup. It contains configuration information, and there are two fields within this file you can edit. The first field is nmd, which sets the number of metadevices that your computer can support, and the second is md_nsets, which is the number of disk sets that your computer can support. The nmd field has a default of 128, but can be increased to a maximum of 8192. The md_nsets field defaults to 4 and can be increased up to 32.
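
As a rough sketch, the tunable entry in /kernel/drv/md.conf typically looks similar to the following single line; here both values are raised (the numbers shown are purely illustrative):

 name="md" parent="pseudo" nmd=256 md_nsets=8;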

Note 

Do not increase the nmd and md_nsets values unless you need to support additional volumes or disk sets. Increasing these values causes more memory to be reserved for volume and disk set management, even if the volumes or disk sets do not exist. Using unnecessarily high values could degrade system performance.

One last file important to Solaris Volume Manager configuration is /etc/lvm/md.tab. It contains configuration information that an administrator can use to reconstruct an SVM configuration. SVM can use this file for input with utilities such as metadb, metahs, and metainit.

Solaris Volume Manager does not write information into the md.tab file. It exists so that an administrator can describe a configuration and then supply that configuration to SVM.
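
For instance, an administrator might place a hypothetical stripe definition in /etc/lvm/md.tab and then have metainit read it by name; this is only a sketch of the workflow, not a required procedure:

 # cat /etc/lvm/md.tab
 d60 1 2 c0t1d0s2 c0t2d0s2 -i 32k
 # metainit d60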

Volume Details

Logical storage devices created by Solaris Volume Manager are called volumes. A volume can reside on one physical disk or be a combination of slices from multiple physical disks.

There are five classes of volumes you can create in Solaris 9: RAID 0, RAID 1, RAID 5, transactional, and soft partitions. You have already had an introduction to RAID levels. Details about transactional volumes and soft partitions are covered in their sections later in this chapter.

After volumes are created, you can use most file system management commands to manipulate them. Commands such as mount, umount, mkfs, and ufsdump work normally, as do all file system navigation commands, such as ls and cd. The one command that does not work on volumes is format.
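
For example, assuming a hypothetical volume d0 and mount point /export/data, you could create and mount a UFS file system on the volume just as you would on a physical slice:

 # newfs /dev/md/rdsk/d0
 # mount /dev/md/dsk/d0 /export/data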

Expanding Volumes

Existing volumes can be enlarged by adding additional slices. After you grow your volume, you will probably want to grow the file system that uses that volume by using the growfs command. Although growing file systems does not result in data loss, it's still a good idea to back up your file system before attempting configuration changes. Volumes can be expanded but cannot be shrunk without destroying the volume and creating a new one.

Volume Names

Each volume is assigned a name, just as each disk slice is assigned a name by Solaris. Here are volume naming requirements and guidelines:

  • Volume names must have the format of d followed by a number, such as d19 or d74.

  • By default, Solaris supports 128 volume names, from d0 through d127. If you need more volumes, you must edit the /kernel/drv/md.conf file.

  • Each volume has a logical name that appears in the file system. For example, block device names are in the /dev/md/dsk directory, and raw device names are in the /dev/md/rdsk directory.

  • When using meta* commands, instead of supplying the full volume name, such as /dev/md/dsk/d0, you can use abbreviated names, such as d0.

  • You can rename a volume if it is not currently in use. However, each volume must have a unique name.

Before you can create volumes with Solaris Volume Manager, you must first create a state database.

Understanding State Databases

A state database is a database on the hard disk that stores information about your Solaris Volume Manager configuration. Changes made to your configuration are automatically updated within the state database.

The state database is a collection of multiple replicated database copies. Each copy is known as a state database replica. You should create at least three replicas for the state database and place the replicas on different controllers and/or different hard disks for maximum reliability. Obviously, if you have only one hard disk, this is not possible, so place all three replicas on the same slice. Replicas can be placed on dedicated slices, but it's not necessary. Placing replicas on slices that will become part of a volume is acceptable.

Note 

Replicas cannot be placed on existing file systems or on the root (/), /usr, or swap file systems.

State database replicas are only 4MB each, so they don't take up a lot of room. You can add additional state database replicas to your system at any time.

If your state database becomes lost or corrupted, Solaris will poll the various replicas to see which ones still contain valid data. It determines this by using a majority consensus algorithm. Basically, Solaris reads the replicas and decides which information is valid by requiring a majority of the replicas to be available and in agreement. After valid data is determined, the state database can be restored, and the configuration corrected, if necessary.

You can create state database replicas from a command line by using the metadb command or from the Solaris Management Console.

Managing State Database Replicas from the Command Line

State database replicas are managed with the metadb command. To create the first state database replica, use metadb -a -f slice, where slice is the partition that will hold the replica. For example, you could use:

 # metadb -a -f c0t2d0s0 

After you have added the first replica, you can add additional ones; additional replicas do not need the -f switch. If you wanted to add two replicas to one slice, you could use:

 # metadb -a -c 2 c0t0d0s7 

Table 12.2 lists the switches available for metadb.

Table 12.2: metadb Arguments

Argument       Function

-a             Adds a new device to the database.
-c number      Specifies the number of replicas to add to the device. The default is 1.
-d             Deletes all replicas on the device.
-f             Creates the initial state database. Also forces the deletion of the last state database replica.
-h             Displays a usage message.
-i             Inquires about replica status.
-k file        Specifies the name of the kernel file to which the replica information should be written. It can be used only with the local disk set. The default file is /kernel/drv/md.conf.
-l blocks      Specifies the length of the replica. The default is 8192 blocks.
-p             Updates the system file (/kernel/drv/md.conf) with entries from the /etc/lvm/mddb.cf file. Normally used to update a newly built system before it is rebooted for the first time.
-s             Specifies the name of the disk set to use.

Using metadb, you can also specify multiple devices at the same time. For example, if you wanted to create an initial state database on four devices, you could use:

 # metadb -a -f c0t0d0s7 c0t2d0s1 c0t2d0s5 c0t2d0s7 

To check the status of your state database replicas, use metadb -i, as shown here:

 # metadb -i
         flags           first blk       block count
      a    u             16              8192            /dev/dsk/c0t0d0s7
      a    u             16              8192            /dev/dsk/c0t2d0s0
  r - replica does not have device relocation information
  o - replica active prior to last mddb configuration change
  u - replica is up to date
  l - locator for this replica was read successfully
  c - replica's location was in /etc/lvm/mddb.cf
  p - replica's location was patched in kernel
  m - replica is master, this is replica selected as input
  W - replica has device write errors
  a - replica is active, commits are occurring to this replica
  M - replica had problem with master blocks
  D - replica had problem with data blocks
  F - replica had format problems
  S - replica is too small to hold current data base
  R - replica had device read errors
 #

You don't need to memorize all the state flags. However, do notice that lowercase flags indicate proper operation, whereas uppercase flags indicate some sort of problem.

To delete a state replica database, use metadb -d. If it's the last replica, you will need to use -f as well. Here's an example:

 # metadb -d -f c0t2d0s7 

Warning 

If your computer has metadevices defined, do not delete all of your state database replicas. Doing so will cause the defined metadevices to fail.

Graphically Managing State Database Replicas

Here is how to create state database replicas by using the Solaris Management Console:

  1. Select the State Database Replicas node to highlight it and then choose Action > Create Replicas.

  2. The first screen that appears will ask whether you want to use a disk set. If you haven't defined any disk sets or don't want to use an existing disk set for replicas, choose <none> and click Next.

  3. You will then be asked to select the components on which you want to create the replicas.


  4. Next, you will be asked to specify a replica length (the default is 8192 blocks, or 4MB) and number of replicas per device. After you have entered values (the defaults are usually okay), click Next.

  5. The last screen enables you to review your selections and shows you the commands that will be executed by the Solaris Volume Manager. To make any changes, use the Back button. If you are satisfied with your selections, click Finish.


When the replicas are created, you will see them appear on the right side of the Solaris Management Console, as shown in Figure 12.5.

Figure 12.5: State Database Replicas

Double-clicking a replica will provide status information.

To delete a replica, right-click it in the right pane and choose Delete. Or, you can select it to highlight it and then choose Edit > Delete.

Managing RAID with Solaris Volume Manager

Because of their versatility, RAID volumes are the most common type of volume you will create with Solaris Volume Manager. Depending on your needs, you can create a RAID 0 volume for speed or a RAID 5 volume for fault tolerance. As stated earlier, Solaris 9 supports three RAID levels: RAID 0, RAID 1, and RAID 5.

Managing RAID 0

Solaris 9 supports two types of RAID 0 volumes: striped volumes and concatenated volumes. A striped volume, as you learned earlier in this chapter, writes data evenly across all member disks, making the written data appear to create a "stripe" among the disks.

A concatenated volume also uses multiple disks, but unlike a stripe (where data is evenly distributed), a concatenation uses all available space on one component (slice, soft partition, or disk) before moving on to fill the next component. These types of volumes are not as fast as striped volumes but enable you to combine small areas of hard disk space into a larger logical volume.

A hybrid of the two RAID 0 levels is called a concatenated stripe. A concatenated stripe is a striped volume that has been expanded beyond its original size. Data in the original stripe will be written as a stripe; data in the expanded section will be written as a concatenation. None of the RAID 0 levels provide fault tolerance. If one device that's part of a RAID 0 volume fails, the entire volume fails.

RAID 0 volumes can be composed of slices, soft partitions, or entire hard disks. However, a RAID 0 volume cannot contain the following file systems: root (/), /usr, swap, /var, /opt, or any other file system used during an operating system installation or upgrade.

Striped Volumes

Striped volumes use equal amounts of space from two or more components. In fact, if you were to attempt to create a stripe out of two hard disks, one 10GB and the other 20GB, you would get a stripe 20GB in size (10GB from each disk). The remaining 10GB on the second disk would be free to use for another volume.

Tip 

For striped volumes, use components located on different controllers for optimal performance. This enables multiple sets of disk heads to be reading and/or writing at the same time.

A block of data written to a stripe is called an interlace. If you were to create a stripe from three disks, interlace 1 would be on disk 1, interlace 2 on disk 2, and interlace 3 on disk 3. Interlace 4 would be on disk 1 again, and the sequence would repeat itself. The default interlace value is 16KB, and the valid interlace range is 8KB to 100MB. You can adjust the interlace value when creating the striped volume, but after the volume is created, you cannot change the interlace value without re-creating the stripe.

Tip 

Set your interlace value to optimize disk I/O. Generally speaking, you want your disk I/O size to be larger than your interlace size. If your computer performs a lot of large file transfers, a larger interlace will be more efficient. For computers that utilize many smaller file transfers, you will want a smaller interlace.

Existing file systems cannot be converted into a stripe. You must back up the data, create the stripe, and then restore the data from backup.

Note 

You must have a state database before you can create volumes.

Striped volumes are created with the metainit command. Here is the syntax:

 # metainit volume_name #_stripes components_per_stripe component_name1 component_name2 ... component_nameN -i interlace 

The syntax for metainit might look complex, but it doesn't need to be difficult in practice. For example, if you wanted to create a stripe named d7, made of four slices with an interlace of 64KB, you could use:

 # metainit d7 1 4 c0t2d0s2 c0t3d0s2 c0t4d0s2 c0t5d0s2 -i 64k 

The 1 in the metainit command means that you are creating one stripe, and the 4 indicates four components, which are listed individually after the 4. If you don't specify an interlace value, the default of 16KB will be used.

Concatenation Volumes

Concatenated volumes enable you to combine space from multiple physical disks into a larger logical volume. Concatenations also enable you to expand active UFS volumes without needing to bring the server down. During the expansion, no write operations can be performed on the file system, but the inconvenience is much less than if you had to reboot the computer.

Data is written sequentially among the member components. Interlaces 1, 2, and 3 (and so forth) will be on the first component until it fills up. Then, data will be written to the second component, and so on.

Note 

Because of the way disk reads and writes are handled, concatenated volumes use less CPU time than striped volumes.

Concatenated volumes are created by using the metainit command. The syntax is very similar to the syntax used for creating a stripe:

 # metainit volume_name #_stripes [components_per_stripe component_names] 

Here is an example. If you want to create a concatenation made of three slices, you could use:

 # metainit d12 3 1 c0t2d0s7 1 c0t3d0s5 1 c0t4d0s2 

The newly created d12 volume consists of three components (stripes), each made of a single slice per disk (the 1 in front of each component).

Note 

The syntax for metainit differs only slightly for each type of volume you want to create. When using metainit, you must be careful because the syntax itself determines the type of volume you create. This will become evident again when you look at how to create RAID 1 and RAID 5 volumes, as well as soft partitions and transactional volumes.

Concatenations are also used as building blocks for mirrored volumes or volumes that you might want to easily expand later. To begin the process of mirroring a volume, you typically start by creating a concatenation of one volume. To do this, you would use a command such as:

 # metainit d55 1 1 c0t0d0s7 

This creates a concatenation of one volume. Creating a mirror by using this concatenation is covered later in this chapter in the "Managing RAID 1 Volumes" section.

EXPANDING EXISTING FILE SYSTEMS

You can use concatenated volumes to expand existing file systems without needing to reboot your computer. In the following example, you will expand the /files1 file system, which is located on the /dev/dsk/c0t0d0s5 slice:

  1. Unmount the existing file system.

     # umount /files1 

  2. Expand the file system by using metainit. The first slice indicated in the command must be the slice that contains the existing data, or else the data will be corrupted.

     # metainit d44 2 1 c0t0d0s5 1 c0t1d0s2 

  3. Edit the /etc/vfstab file to reference the new d44 volume instead of the c0t0d0s5 slice (see the sample entry after these steps).

  4. Remount the file system.

     # mount /files1 
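
The /etc/vfstab change in step 3 might look like the following sketch, with the old slice entry replaced by the new d44 volume (the /files1 entry shown is illustrative):

 #device to mount   device to fsck       mount point  FS type  fsck pass  mount at boot  mount options
 #/dev/dsk/c0t0d0s5 /dev/rdsk/c0t0d0s5   /files1      ufs      2          yes            -
 /dev/md/dsk/d44    /dev/md/rdsk/d44     /files1      ufs      2          yes            -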

EXPANDING STRIPED VOLUMES

Expanding a striped volume creates a concatenated stripe. This is done with the metattach command, which uses the following syntax:

 # metattach volume_name component_names 

If you have a striped volume named d53, you could attach an additional slice with the following command:

 # metattach d53 c0t4d0s2 

To add additional slices to the volume, simply specify each component you want to add.

If the volume you are expanding contains a UFS file system, after you have expanded the volume, you need to run the growfs command to grow the UFS file system to utilize the full space of the volume. Although the growfs command works on mounted file systems, the file system will be unavailable for write operations during the growing period.
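
For example, if the expanded d53 volume held a UFS file system mounted at a hypothetical /files2 mount point, a sketch of growing the file system in place would be:

 # growfs -M /files2 /dev/md/rdsk/d53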

Removing Volumes

Volumes are deleted with the metaclear command. Before removing any volume, you must unmount it if it's currently mounted. After it's unmounted, use metaclear volume_name. For example, to remove the d92 volume, use:

 # metaclear d92 

If the volume you removed has an entry in the /etc/vfstab file, be sure to remove the entry so Solaris does not try to mount the nonexistent volume during the next reboot or mountall command usage.

Creating RAID 0 with Solaris Volume Manager

To create RAID volumes in Solaris Volume Manager, you need to select the Volumes node to highlight it, as shown in Figure 12.6.

Figure 12.6: Volumes node of Enhanced Storage

After you are in the Volumes node, click Action > Create Volume and follow these steps:

  1. Choose whether to use an existing state database or to create a new state database. If you already have one, there is no need to create another. Click Next.

  2. Choose a disk set if you want to. If not, leave the disk set choice at <none>. Click Next.

  3. Select the type of volume you want to create. After you have made your choice (Stripe for this example), click Next.


  4. Specify a volume name for your volume. Make sure that the name you choose isn't already being used.

  5. Choose the components you want to create the volume on. After you have made your selection, click Next.


  6. Set the order of components in the volume and select an interlace value. Click Next.

  7. Choose a hot spare pool, if you want to use one. Click Next.

  8. Confirm your choices in the Review screen. If you need to make any changes, use the Back button. Otherwise, click Finish.


The computer will take a moment to create the volume. After it's finished, the volume will appear in the right pane of the Solaris Management Console.

Volumes cannot be expanded through the Solaris Management Console. To delete a volume, right-click it and choose Delete, or choose Edit > Delete.

Managing RAID 1

Mirrored (RAID 1) volumes maintain identical copies of data on separate RAID 0 volumes. Mirrors provide data redundancy but cost more to implement than standard volumes or RAID 5 volumes because mirrors require double the hard disk space.

Mirroring requires writing data to two or three hard disks instead of one, so it slows down your disk write speed. Data can be read from any of the mirrored components, meaning that disk reads are faster than from standard volumes. After the mirror is created, applications and users see it as a single volume; they don't have to worry about writing the data twice or where they're reading it from because SVM handles all of that in the background.

Solaris implements mirrors by using a series of submirrors. Mirrors must have between one and three submirrors. A one-way mirror would contain one submirror, but does not provide any data redundancy (however, you typically start the creation of a mirrored volume by creating one submirror, the volume that has the data, and then adding a second submirror for redundancy). For practical purposes, a two-way mirror (one that contains two submirrors) is sufficient for fault tolerance. By creating a three-way mirror, you can bring one submirror offline for backup purposes while retaining your online fault tolerance.

If you are using multiple submirrors, one or more can be detached from the mirror at any time, as long as one submirror is always attached. When submirrors are detached, Solaris Volume Manager keeps track of the data written to the active mirrored volume. After the detached submirrors are brought back online, the portion of data written during the downtime is written to the out-of-date submirrors. This portion of data that needs to be updated across all mirrors is called a resynchronization region.

Combining RAID 0 and RAID 1

Solaris supports both RAID 1+0 and RAID 0+1. RAID 1+0 is a mirrored volume that is then striped, and is not always possible to create because of software limitations. However, if the submirrors are identical and made up of disk slices (not soft partitions), implementing RAID 1+0 should not be a problem.

RAID 0+1 consists of striped sets that are mirrored. Technically, all RAID 1 in Solaris is RAID 0+1, because you create either a stripe or a concatenated volume first and then mirror it. Although at first this sounds overly complex for creating a mirror, it has a distinct advantage.

Imagine that you have a two-way mirror of a striped volume. For the sake of ease, assume that slices A, B, and C are part of submirror 1, and slices D, E, and F are part of submirror 2. Slices D, E, and F mirror A, B, and C, respectively. In Solaris, if slice A were to fail, the system could operate normally with the data from slice D. In fact, three slices could fail, and as long as one good copy of each set of data was present, data reads would proceed normally. If both parts of a mirror were to fail, such as slices A and D, the mirrored volume would function like a hard disk that has bad blocks. The data on slices B and C would still be accessible.

If Solaris used strict RAID 1 instead of RAID 0+1, then the previous scenario would not be possible. If one slice were to fail, then one entire side of the mirror would fail. A second slice failure on the second submirror would cripple the storage unit.

RAID 1 Read and Write Policies

Solaris Volume Manager enables you to configure three mirror read policies and two mirror write policies. Using the right policy for your mirrored volumes can increase your overall disk access speed. These policies are configured when the mirror is set up but can be changed while the mirror is operational.

Here are the three read policies:

Round robin This is the default read policy, which attempts to balance reads across all submirrors. Disk reads are made in a round-robin fashion from all submirrors in the mirrored volume.

Geometric This policy divides reads among submirrors based on logical disk block addresses. When you configure a hard disk, each block on the disk is assigned a logical address. Because the data is mirrored, the logical block addresses will be identical across submirrors.

For example, with a two-way submirror, each disk is "assigned" half of the logical block addresses from which to read. Data recalled from the range assigned to the first submirror will be read exclusively from the first submirror.

The geometric policy can reduce disk read time, because the seek time for data is effectively reduced. The amount of speed increase depends on your computer, configuration, and volume of disk access.

First With the first policy, all reads are directed to the first submirror. Only use this policy if the first submirror is significantly faster than other submirrors, or if one of the submirrors is offline.

You can use any of the three read policies with either of the two write policies. Different combinations of policies can be used on different mirrored volumes. There are two write policies, parallel and serial:

Parallel The default write policy, parallel writes data to all submirrors simultaneously.

Serial Serial writes to one submirror at a time. The write operation must be completed on the first submirror before it can be started on the second submirror. If one submirror is unavailable, this option might increase disk efficiency.

Resynchronization

If a submirror fails for any reason, or is taken offline for backup or maintenance purposes, the data between the submirrors can become out of synch. When the failed or offline submirror is brought back online, Solaris Volume Manager automatically resynchronizes the submirrors to ensure that users are able to access the proper data. Not only is resynchronization automatic-it's mandatory. You do not need to do anything to make it happen.

During the resynchronization process, all involved submirrors are available for reading and writing data. Solaris Volume Manager can perform three types of synchronization: full, optimized, and partial. Synchronization order is determined by a pass number.

Full resynchronization A full resynchronization happens when a new submirror is attached to a mirror. All data is copied to the new submirror. After the data is written to the new submirror, the data is available for reading.

Optimized resynchronization If you experience a system failure or if you bring a submirror offline, an optimized resynchronization occurs when the system or submirror is brought back online. Solaris Volume Manager tracks which regions of the mirror might be out of synch (known as dirty regions) and synchronizes those regions only.

Partial resynchronization If you've replaced part of a mirror, say a slice that failed, Solaris will perform a partial resynchronization after the replacement is complete. Data will be copied from working slices to the new slice.

Pass numbers When performing a resynchronization, Solaris Volume Manager uses a pass number to determine the order in which mirrors are resynchronized. The default pass number is 1, and smaller pass numbers are resynchronized first. A pass number of 0 indicates that mirror resynchronization should be skipped. Pass numbers should be set to 0 only on read-only mirrors.
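
As a sketch, you could assign a pass number of 2 to a less critical mirror (d20 is a hypothetical mirror name here) by using the metaparam command covered later in this chapter:

 # metaparam -p 2 d20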

Note 

If a computer with a mirror on the root (/), /usr, or swap partitions is booted into single-user mode (run level S), the mirror will display the Needs Maintenance state when viewed with metastat. This situation, although apparently problematic, is normal. It happens because the metasync -r command, which is normally run during boot, is interrupted when Solaris is booted into single-user mode. After the system is rebooted into multiuser mode, the error will disappear.

Creating and Managing RAID 1 Volumes

Mirrors are created by using the metainit command or through the Solaris Management Console. You will start by creating a one-way mirror, called the primary submirror. Then you will attach additional submirrors.

Note 

It is not always necessary to create a one-way mirror first and then attach additional submirrors. However, following this procedure ensures that your data is properly replicated across all submirrors.

Here are some guidelines for creating and using mirrors:

  • Create a RAID 0 stripe or concatenation to serve as components for your mirror.

  • When creating a mirror of an existing file system, be sure to create the primary submirror on the existing file system, or data will be destroyed.

  • You can mirror file systems that cannot be unmounted, such as root (/) and /usr, for fault tolerance.

  • To increase storage efficiency, use same-sized components for creating mirrors.

  • If possible, keep mirrored hard disks on different disk controllers, and mirrored volumes on different hard disks. This reduces the chance of a catastrophic failure.

  • Do not mount submirrors directly; mount only the mirror. Mounting submirrors could cause data corruption or crash the system.

  • Just because you have a mirrored volume doesn't mean you can avoid making system backups. You still need to do this regularly.

The method you will use to create a mirror depends on the disk configuration you have before you mirror, as well as the volumes you want to mirror. Here is the first example, creating a mirror from two unused slices:

  1. Create two RAID 0 volumes (stripes or concatenations) to use as submirrors.

  2. Create a one-way mirror by using metainit volume_name -m submirror_name. Using this example, you will create a mirror named d40, with a submirror of d41.

     # metainit d40 -m d41 

  3. Add a second submirror by using metattach mirror_name submirror_name. In this example, you are adding the d42 submirror to the d40 mirror created in step 2.

     # metattach d40 d42 

After creating the d40 mirror, you should be able to use metastat or the Solaris Management Console to see the d41 and d42 submirrors as part of the d40 mirror. You can also create a mirror without creating a primary submirror first, but it's not recommended because proper synchronization cannot be guaranteed. (However, if you know that the d41 volume is empty, it's not a major problem.) If you wanted to perform the preceding example in one step, without guaranteeing synchronization, you could use the following command:

 # metainit d40 -m d41 d42 

Solaris will set up the mirror d40, but it will also give you a warning about using metainit in that way.

Setting up a mirror of an existing file system is similar to the previous procedure. If you are going to mirror an existing file system, though, you want to be sure to create a one-way mirror first and then attach the second submirror. Mirroring a file system that cannot be unmounted, such as the root (/) or /usr, requires a few extra steps, including rebooting. Here is how to create a mirror for an existing file system:

  1. Identify the slice that contains the file system you want to create a mirror of. Create a new RAID 0 volume on that slice. For example, if the slice is c0t1d0s0, you could use:

     # metainit d101 -f 1 1 c0t1d0s0 

  2. Create a second RAID 0 volume on the slice you want to make the second submirror. For example, if the slice is c0t2d0s0, you could use:

     # metainit d102 1 1 c0t2d0s0 

  3. Create a one-way mirror using the metainit command. In this example, you are creating a mirror named d100.

     # metainit d100 -m d101 

  4. Unmount the file system (if possible) by using the umount command.

  5. If you are mirroring a file system other than the root (/), edit the /etc/vfstab file to mount the mirror instead of the old device. For example, you would change all instances of c0t1d0s0 to d100. Failure to edit the /etc/vfstab file to mount the mirror instead of the submirror could cause data corruption.

  6. Remount your newly mirrored file system. How you do this depends on the type of file system you are mirroring.

    1. If you are mirroring a file system that can be unmounted, then unmount and remount the file system with the umount and mount commands.

    2. If you are mirroring a file system that cannot be unmounted, other than the root (/), reboot your system.

    3. If you are mirroring the root file system (/), run the metaroot d100 command. (Replace d100 with the name of your mirror if you used a different volume number.) Then run the lockfs -fa command and reboot.

  7. Attach the second submirror.

     # metattach d100 d102 

  8. Verify the creation of the mirrored volume.

     # metastat d100 

Tip 

If you are mirroring the root file system (/), it's a good idea to record the path to the alternate boot device. That way, if your original root fails, you can modify the boot device in OpenBoot to boot to the mirrored root.
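
As a rough sketch (the device path shown is hypothetical and varies by system), you could record the physical path behind the second submirror's slice and later use it at the OpenBoot prompt:

 # ls -l /dev/dsk/c0t2d0s0
 lrwxrwxrwx ... -> ../../devices/pci@1f,0/pci@1/scsi@1/sd@2,0:a
 ok nvalias mirrorboot /pci@1f,0/pci@1/scsi@1/disk@2,0:a
 ok boot mirrorboot

On many SPARC systems, the sd in the /devices path corresponds to disk in the OpenBoot device name; check your hardware documentation for the exact conversion.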

Note 

If you're mirroring the root file system (/) on an IA system, be sure to install the boot information on the alternate boot disk before creating the RAID 0 or RAID 1 device.

SUBMIRROR MANAGEMENT

In addition to creating mirrors, you have a variety of commands to attach and detach submirrors, as well as bring submirrors online and offline. Submirrors are attached with the metattach command and detached with the metadetach command. You can have up to three submirrors per mirror.

To attach a submirror, use the metattach mirror submirror command. For example, to add the d33 submirror to the d30 mirror, you would use:

 # metattach d30 d33 

To detach a submirror, the metadetach command is used. It has the same syntax as metattach. To detach the d33 submirror, you would use:

 # metadetach d30 d33 

The metaoffline and metaonline commands are used to bring submirrors offline and online, respectively. Both commands follow the syntax of metattach. To bring the d33 submirror offline, use:

 # metaoffline d30 d33 

And to bring it back online, use:

 # metaonline d30 d33 

RAID 1 VOLUME STATUS

The metastat command is used to check the status of volumes. Submirrors can be in one of three states: okay, resynching, or needs maintenance. The okay and resynching statuses indicate that the submirror is operational, although resynching does indicate that a resynchronization is in progress. If your submirror is in the needs maintenance state, then all reads and writes from this submirror have been suspended, and you need to investigate further.

Slice states give you more detail than do submirror states. Submirror slices can be in one of four states: okay, resynching, maintenance, and last erred. As with submirror states, okay and resynching are acceptable states.

If you encounter the maintenance state, the component has failed, and no reads or writes are being performed. You must enable or replace the failed component with the metareplace command; the Invoke: line in the metastat output shows the exact command to run.

The last erred state means that the slice has failed to replicate information because of another slice failure. If you see a maintenance state, you will usually encounter a last erred state on another slice as well. Fix the slice needing maintenance, and the error should clear itself up after resynching.

Here is a sample metastat output for the d10 mirrored volume:

 # metastat d10
 d10: Mirror
     Submirror 0: d12
       State: Okay
     Submirror 1: d13
       State: Resyncing
     Resync in progress: 12 % done
     Pass: 1
     Read option: roundrobin (default)
     Write option: parallel (default)
     Size: 10240272 blocks

 d12: Submirror of d10
     State: Okay
     Size: 10240272 blocks
     Stripe 0:
         Device     Start Block  Dbase     State  Reloc  Hot Spare
         c0t2d0s3             0  No        Okay   Yes

 d13: Submirror of d10
     State: Resyncing
     Size: 10477152 blocks
     Stripe 0:
         Device     Start Block  Dbase     State  Reloc  Hot Spare
         c0t0d0s7          9072  Yes       Okay   Yes

 Device Relocation Information:
 Device   Reloc  Device ID
 c0t2d0   Yes    id1,dad@AWDC_WD200BB-00DGA0=WD-WMADL1128126
 c0t0d0   Yes    id1,dad@AST320011A=3HT3DP98
 #

As you can see, the d13 submirror is resynching and is currently 12 percent complete. This is because the mirror was just created and is in the process of initially synchronizing all submirrors.

CHANGING RAID 1 VOLUME OPTIONS

The metaparam command is used to change volume options, such as read policies, write policies, and pass number. The -r switch is used to change read policy, the -w switch changes write policy, and the -p switch changes pass number. For example, to change the read policy of mirror d10 to geometric, you would use:

 # metaparam -r geometric d10 

You can also use metaparam to check the volume's parameters. After you have changed the read policy, verify that your setting took effect:

 # metaparam -r geometric d10
 # metaparam d10
 d10: Mirror current parameters are:
     Pass: 1
     Read option: geometric (-g)
     Write option: parallel (default)
 #

UNMIRRORING

There might be a time when you want to delete a mirrored volume. To do this, you must first unmount the volume, detach the mirror, and then clear the mirror.

At least one of the submirrors must be in the okay state before you can detach the mirror. Here is an example of detaching a mirror. The mirror is d10, composed of the d12 and d13 submirrors. The mirror is mounted as /docs. Here are the steps:

  1. Verify that at least one submirror is in the okay state.

     # metastat d10 

  2. Unmount the file system.

     # umount /docs 

  3. Use metadetach to detach the first submirror. This submirror will be used for the remaining file system after the mirror is destroyed.

     # metadetach d10 d12 

  4. Delete the mirror and the second submirror with the metaclear command.

     # metaclear -r d10 

  5. If necessary, edit the /etc/vfstab file to point to the underlying volume instead of the mirror. In this case, you would change all references to d10 to d12.

For a file system that cannot be unmounted, such as the root (/), you need to follow a slightly different procedure:

  1. Verify that at least one submirror is in the okay state.

     # metastat d10 

  2. Use metadetach on the mirror that contains the root file system (/).

     # metadetach d10 d13 

  3. Run the metaroot command on the slice that will be the boot slice.

     # metaroot /dev/dsk/c0t0d0s0 

  4. Reboot.

  5. Clear the mirror and clear the remaining submirror.

     # metaclear -r d10
     # metaclear d13

The mirrored root should be cleared.

Managing RAID 1 with the Solaris Management Console

The Solaris Management Console enables you to create mirrors through a graphical interface that is much easier to use than the command line. First, open the Solaris Management Console and navigate to the Volumes node. From there, choose Action > Create Volume and follow these steps:

  1. Choose to create an additional state database replica or use an existing one. Click Next.

  2. Choose a disk set. If you do not want to use a disk set, choose <none> and click Next.

  3. Select the Mirror (RAID 1) radio button and click Next.

  4. Specify a volume name and click Next.

  5. Select a volume to be a primary submirror. After you have made your selection, click Next.


  6. Select your secondary submirrors. Click Next.


  7. Configure your mirror read, write, and pass number parameters. Click Next.


  8. The final screen displays your choices. If you need to change any parameters, use the Back button. Otherwise, click Finish.

You can change the mirror policies by selecting the mirror to highlight it in the right pane of the Solaris Management Console and choosing Action > Properties. Select the Parameters tab and click the Set Parameters button to display the same read, write, and pass number parameters you configured in step 7.

To delete the mirror, right-click it in the Solaris Management Console and choose Delete, or choose Edit > Delete.

Managing RAID 5

RAID 5 volumes combine speed with data redundancy. Like RAID 0 volumes, RAID 5 volumes write data in stripes, and like RAID 1 volumes, they provide fault tolerance. RAID 5 isn't as fast as a striped volume but is more efficient in terms of disk usage than a mirrored volume.

The Solaris Volume Manager will automatically resynchronize a RAID 5 volume if a component fails (and is replaced) or if it detects that data is out of synch. RAID 5 volumes cannot be used for the root (/), /usr, or swap file systems. Creating a RAID 5 volume on an existing volume will destroy all data on that volume.

After you have created a RAID 5 volume, you can concatenate the volume by adding additional components. However, the new components will not have parity contained on them. The parity blocks assigned during the original creation of the volume will hold parity for the new interlaces. Concatenated RAID 5 volumes are not designed for long-term use; rather, they are a temporary solution if your RAID 5 volume needs more space and you do not have enough time to back the volume up, re-create it, and restore data.

Here are some guidelines for creating and using RAID 5 volumes:

  • You must use at least three components. Using three or more physical hard disks is recommended.

  • RAID 5 volumes cannot be part of a mirror or other striped volume, nor can RAID 5 volumes be concatenated with each other.

  • Creating a RAID 5 volume from a component containing existing data will destroy the existing data.

  • RAID 5 volumes have a configurable interlace value, just like RAID 0 volumes.

  • A RAID 5 volume can operate with a single component failure. Multiple component failures will cause the volume to fail.

  • When creating a RAID 5 volume, use components of the same size.

  • If your volume writes more than 20 percent of the time, RAID 5 volumes are inefficient because of parity calculation. Using a mirrored volume would be a better solution.

As with creating RAID 0 and RAID 1 volumes, you need to be the superuser or have an equivalent role to create RAID 5 volumes. RAID 5 volumes are created by using the metainit command with the -r switch. Here is an example of creating a RAID 5 volume consisting of three slices:

 # metainit d70 -r c0t1d0s2 c1t1d0s2 c1t2d0s2 

This command creates a RAID 5 volume named d70. If you wanted to specify an interlace value, you could do so with the -i switch, just as you did when you created a striped volume.
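
For example, a hypothetical variant of the same command that specifies a 32KB interlace would be:

 # metainit d70 -r c0t1d0s2 c1t1d0s2 c1t2d0s2 -i 32k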

You will not be able to use the RAID 5 volume immediately. Solaris Volume Manager needs to initialize the volume, which might take several minutes, depending on how large the components are as well as how busy the computer is.

RAID 5 Maintenance, Repair, and Expansion

To check the status of your RAID 5 device, use the metastat command. For example, to check the status of the device you just created, you could use:

 # metastat d70 

The metastat command will show the status of the RAID 5 volume, as well as the status of each member component in the volume. After you create a RAID 5 volume, the state of the volume will be initializing. After it's done initializing, the status will change to okay. If there are any problems, the needs maintenance state will be displayed. If you see a needs maintenance status, look to see which component is causing the problem.

Here is an example of a metastat output indicating that a RAID 5 component needs to be replaced:

 # metastat d70
 d70: RAID
     State: Needs Maintenance
     Invoke: metareplace d70 c0t5d0s2 <new device>
     Interlace: 32 blocks
     Size: 16305020 blocks
 Original device:
     Size: 16305520 blocks
         Device         Start Block       Dbase   State         Hot Spare
         c0t3d0s2           330           No      Okay
         c0t4d0s2           330           No      Okay
         c0t5d0s2           330           No      Maintenance
         c0t6d0s2           330           No      Okay

This output shows that the d70 RAID 5 volume needs maintenance. The line beginning with Invoke: even shows you which command you should use to fix the problem. To fix the problem, you need another slice as large or larger than the slice you are replacing, c0t5d0s2 in this case. If you had another suitable device available, say c0t7d0s2, you could issue the following command:

 # metareplace d70 c0t5d0s2 c0t7d0s2 

You will receive a message that c0t5d0s2 was replaced with c0t7d0s2. Running metastat d70 again should show that the volume is resynching.

There is one other way in which you can use metareplace. It's for cases when you are replacing a failed component with another component using the same identifier. Of course, two slices cannot have the same logical ID. However, perhaps the original slice had a soft failure, or you replaced the failed hard disk and labeled the new disk the same as the disk it replaced. In that case, you could use the following command (assuming the same problem component as in the last example):

 # metareplace -e d70 c0t5d0s2 

This command will reactivate the c0t5d0s2 component and begin resynchronizing the RAID 5 volume.

RAID 5 volumes can also be expanded by using the metattach command. For example, if you wanted to add the c0t8d0s2 component to the existing d70 volume, you could use:

 # metattach d70 c0t8d0s2 

But because parity will not be stored on this new component, it's recommended that you not use concatenated RAID 5 volumes as a long-term storage solution. If you do expand your RAID 5 volume, be sure to use the growfs command to expand the UFS file system residing on the volume as well.

RAID 5 volumes are removed with the metaclear command.

Managing RAID 5 with the Solaris Management Console

If you prefer a graphical interface, the Solaris Management Console provides complete RAID 5 administration. First, open the Solaris Management Console and navigate to the Volumes node. From there, choose Action > Create Volume and follow these steps:

  1. Choose to create an additional state database replica or use an existing one. Click Next.

  2. Choose a disk set. If you do not want to use a disk set, choose <none> and click Next.

  3. Select the RAID 5 radio button and click Next.

  4. Specify a volume name and click Next.

  5. Select the components you want to make part of the volume. After you have made your selection, click Next.


  6. Choose an interlace value in KB, MB, or blocks. Click Next.

  7. Choose whether you want to use a hot spare pool. You can create a new hot spare pool now if you want. Make your selection and click Next.

  8. The final screen displays your choices. If you need to change any parameters, use the Back button. Otherwise, click Finish.

The Solaris Volume Manager will create the RAID 5 volume. During initialization of the volume, system performance will be degraded. The RAID 5 volume will appear in the right pane of the Solaris Management Console, as shown in Figure 12.7, before initialization of the volume is complete.

Figure 12.7: RAID 5 volume

By highlighting the RAID 5 volume in the right pane of the Solaris Management Console and choosing Action > Properties, you can add additional components (on the Components tab) or view volume performance statistics (on the Performance tab).

To delete the volume, right-click it in the Solaris Management Console and choose Delete, or choose Edit > Delete.

Managing Soft Partitions

A soft partition enables you to overcome the traditional Solaris limit of eight slices per hard disk. When you create a soft partition, you can create as many logical volumes as you want to within the soft partition. You are, however, still limited to 8192 logical volumes because that's the maximum number of volumes that Solaris Volume Manager supports.

Each soft partition is named, just as other volume types (stripes, mirrors, and RAID 5) are named. Soft partitions can be created on top of hard disk slices, or on stripes, mirrors, or RAID 5 volumes. However, nesting of soft partitions is not allowed. For example, you cannot create a soft partition on a striped volume and then create a mirror on the soft partition.

Here are some guidelines for creating and using soft partitions:

  • To allow soft partitions on an entire hard disk, create a single slice that occupies the entire hard disk and then create the soft partition on that slice.

  • The maximum size of a soft partition is dependent upon the size of the slice on which it is created.

  • For speed or data redundancy, create a RAID 0, 1, or 5 volume and then create soft partitions within that RAID volume (see the sketch after this list).
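
For example, assuming an existing RAID 5 volume named d70, a 2GB soft partition could be layered on top of it (the names and size here are illustrative):

 # metainit d35 -p d70 2g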

The management of soft partitions is done with the same Solaris Volume Manager commands that you are already familiar with: metainit, metastat, metattach, and metaclear.

Using the Command Line to Manage Soft Partitions

Soft partitions are created with the metainit -p command. The syntax for metainit is similar to what you've seen before, but with soft partitions, you need to specify a partition size as well. For example, to create a 12GB soft partition named d33 on the c0t2d0s0 slice, you would use:

 # metainit d33 -p c0t2d0s0 12g 

One of the interesting metainit switches you can use with soft partitioning is -e. The -e switch tells metainit to reformat the entire disk as slice 0, except for a 4MB slice 7 reserved for a state database replica. This switch is useful if you know that you want your entire hard disk to be used for soft partitions.
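
Here is a sketch of the -e form, assuming c0t3d0 is a disk you intend to dedicate entirely to soft partitions (the device name and size are hypothetical):

 # metainit d34 -p -e c0t3d0 6g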

Warning 

Using metainit -e will destroy any existing data on the hard disk.

After your soft partition is created, you can use the metastat command to view the volume's status. To view the status of the soft partition just created (d33), you would use:

 # metastat d33 

To increase the size of a soft partition, use the metattach command. With metattach, you must specify the soft partition you are growing, as well as the amount you want to add to the soft partition. For example, to grow d33 by 5GB, you would use:

 # metattach d33 5g 

You might think this metattach command looks wrong; after all, it doesn't specify where to take the additional disk space from. When expanding a soft partition, the space is taken from the available free space designated for soft partitions. If you do not have enough space available, you will get an error message.

Soft partitions are removed with the metaclear command. You do have a few options as to what to delete, though, because soft partitioning on a volume can get complex. When using metaclear, you can specify a soft partition to delete or you can instruct metaclear to delete all soft partitions on a particular component. For example, metaclear -p c0t2d0s0 clears all soft partitions on the c0t2d0s0 slice, whereas metaclear d33 deletes only the d33 soft partition.

Using the Solaris Management Console to Manage Soft Partitions

Once again, it's time to look at your favorite graphical management interface: the Solaris Management Console. In the Solaris Management Console, navigate to the Volumes node. To create a soft partition, choose Action > Create Volume and follow these steps:

  1. Choose to create an additional state database replica or use an existing one. Click Next.

  2. Choose a disk set. If you do not want to use a disk set, choose <none> and click Next.

  3. Select the Soft Partition(s) radio button and click Next.

  4. Select the components on which you want to create the soft partition. You will notice that if you have RAID volumes, they will appear here as valid choices. After you have made your selection, click Next.


  5. Choose how you want to allocate the space selected for soft partitions. Click Next.


  6. Provide a name for the soft partition you just created and click Next.

  7. The final screen displays your choices. If you need to change any parameters, use the Back button. Otherwise, click Finish.

The Solaris Volume Manager will create your soft partitions; this might take some time. Figure 12.8 shows the finished product. Volumes d33 through d44 are soft partitions. Volume d10 is a striped volume, and volume d70 is a RAID 5 volume.

Figure 12.8: Solaris Management Console Volumes node

If you have space left in your soft partition, you can grow any of your existing soft partitions by selecting Action > Properties and clicking the Grow button on the General tab. Soft partitions can be deleted individually, by right-clicking each one in the Solaris Management Console and choosing Delete, or choosing Edit > Delete.

Overview of Transactional Volumes

There are two types of file system logging available in Solaris 9: transactional volumes and UFS logging. A transactional volume is a logical volume on a hard disk slice used specifically to log UFS transactions.

When you use file system logging, disk writes are first recorded in a log and then applied to the file system later. If the hard disk is incredibly busy, file system logging can help alleviate poor performance by saving the information and writing it to the disk when the disk is less busy.

With the release of Solaris 8, UFS logging was introduced. UFS logging provides the same capabilities as transactional volumes, but with superior performance and lower system administration overhead. Another advantage of UFS logging is that it enables administrators to log the root file system (/), which transactional volumes cannot do.

Because of the obvious superiority of UFS logging over transactional volumes, Sun is quick to promote the use of UFS logging. In fact, transactional volumes are scheduled to be removed in an upcoming Solaris release. If you want to log your UFS file systems, use UFS logging.

Note 

For information on how to enable UFS logging, see Chapter 7.
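As a quick preview of what Chapter 7 covers, UFS logging is enabled with the logging mount option, either in the mount options field of the file system's /etc/vfstab entry or directly on the command line. The device and mount point below are only examples:

 # mount -F ufs -o logging /dev/dsk/c0t2d0s5 /export/home 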

A transactional volume consists of two devices: the master device and the log device. The master device is the device that's being logged, and the log device is the slice or volume that contains the log file. Master and log devices can be a physical slice or a logical volume, but should always be separate devices. Multiple master devices can share one log device. For speed and data redundancy, you can place log files on RAID volumes.

Transactional volumes are managed with the meta* commands used to manage other volume types.
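Should you ever need to work with one on an older configuration, a transactional volume is created with metainit and the -t flag, naming the master device first and the log device second. The volume name and slices below are purely illustrative:

 # metainit d50 -t c0t2d0s4 c0t3d0s4 

This would create transactional volume d50 with c0t2d0s4 as the master device and c0t3d0s4 as the log device.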

Managing Hot Spare Pools

RAID 1 and RAID 5 volumes provide data redundancy and fault tolerance. If one component in your RAID 1 or RAID 5 volume fails, the volume can still perform disk reads and writes. Granted, system performance will be slower, but at least Solaris will still be operational.

Multiple component failures can cause the operating system to crash, however. That's where hot spare pools become critical. A hot spare pool is a collection of slices used to provide additional fault tolerance for RAID 1 or RAID 5 volumes in the event of a component failure. Individually, the slices that are part of the pool are called hot spares.

If you are using a hot spare pool and a RAID 1 or RAID 5 component fails, Solaris Volume Manager automatically replaces the failed component with a hot spare. The hot spare is then synchronized with the rest of the volume. For hot spares to work, redundant data must be available. Because of this, hot spare pools do not work with RAID 0 volumes or one-way mirrors.

The only disadvantage to using hot spare pools is cost. The slices in the hot spare pool cannot be used to hold data while they are in the pool. They must remain idle and are used only in the event of a component failure. Consequently, you must spend more on hard disks than you would if you didn't use hot spares. However, if computer uptime is critical to you, the benefits far outweigh the costs.

Understanding How Hot Spare Pools Work

When you create a hot spare pool in Solaris Volume Manager, you specify one or more slices that will be part of the pool. You then associate the pool with one or more active RAID 1 or RAID 5 volumes. You can create multiple pools if you choose, and individual hot spares can be part of more than one pool. Although a hot spare pool can be associated with several RAID volumes, each RAID volume can use only one hot spare pool.

In the event of a component failure, Solaris Volume Manager searches the hot spare pool in the order in which the hot spares were added to the pool. After Solaris Volume Manager finds a hot spare as large or larger than the component that needs to be replaced, that hot spare is taken, marked "In-use," and synchronized with the impaired volume. The used hot spare cannot be used by any other volume, unless the original volume is repaired and the hot spare returned to "Available" status in the pool.

Because Solaris Volume Manager searches the pool based on the order that the slices were added, it's a good idea to add slices in order of size; add the smallest ones first. If you do this, then you reduce the chance of Solaris Volume Manager using a slice that's far bigger than the one it's replacing and wasting space. For example, if you have a 1GB slice fail and you can replace it with a 1GB or a 10GB hot spare, which would you choose? Obviously, all you need is the 1GB hot spare. If Solaris Volume Manager sees the 10GB hot spare first, though, it will use that slice.

As with RAID volumes, hot spare pools can be managed from the command line or through the Solaris Management Console.

Managing Hot Spare Pools from the Command Line

You must give a hot spare pool a name, just as you did for RAID volumes. However, hot spare pools are named with an hsp prefix. Some administrators like to name their hot spare pools similarly to the RAID volume that they serve. For example, if your RAID 5 volume is d10, then your hot spare pool for that volume could be hsp010.

Note 

Hot spare pools will have a three-digit number in their name. If you tell Solaris Volume Manager to create the hot spare pool hsp10, it will be created as hsp010.

Hot spare pools are created with the metainit command. If you wanted to create a hot spare pool named hsp010 containing two disk slices, you could use:

 # metainit hsp010 c0t2d0s2 c0t3d0s2 

When you create a hot spare pool, be sure that the slices you pick as hot spares are large enough to replace the components that they are designed to replace. Solaris Volume Manager doesn't check this for you, nor will it give you an error if your hot spare slices are too small.
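One way to double-check slice sizes before adding them to a pool (this is only a suggestion, and the device name below is an example) is to print the disk's VTOC with prtvtoc and compare the sector counts of the slices involved:

 # prtvtoc /dev/rdsk/c0t2d0s2 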

You can add another slice to a hot spare pool with the metahs command. The metahs command has several important switches; the -a switch adds a slice to a single hot spare pool. Here's how you would add a new slice to the hot spare pool created earlier:

 # metahs -a hsp010 c0t4d0s2 

Or, if you had several hot spare pools and wanted the same slice added as a hot spare to all pools, you would use:

 # metahs -a all c0t4d0s2 

The metahs command is also used to manage several details of hot spare pools. Table 12.3 shows some of the options for the metahs command.

Table 12.3: metahs Usage Options

Syntax: metahs -a pool slice
Example: metahs -a hsp010 c0t2d0s2
Description: Adds the slice to the specified hot spare pool. You can also use all instead of a pool name to add the slice to all pools.

Syntax: metahs -d pool slice
Example: metahs -d hsp011 c0t2d0s2
Description: Removes the slice from the hot spare pool.

Syntax: metahs -e slice
Example: metahs -e c0t2d0s2
Description: Enables the hot spare slice. This is used after a slice is repaired or if the slice needs to be placed back in "Available" status.

Syntax: metahs -i pool
Example: metahs -i hsp013
Description: Displays the status of the hot spare pool.

Syntax: metahs -r pool old_slice new_slice
Example: metahs -r hsp014 c0t2d0s2 c0t3d0s2
Description: Replaces the old hot spare slice with a new hot spare slice in the pool.

The all argument can be used in place of the hot spare pool name with the -a, -d, and -r switches. Although operating on all pools at once can be useful, it can also be dangerous. Use it carefully, or you might end up having to re-create your hot spare pool configuration.

After your hot spare pool is created, you can attach it to a RAID 1 or RAID 5 volume. This is done with the metaparam command; here's the syntax:

 # metaparam -h hot_spare_pool component 

So, to add your hot spare pool hsp010 as the pool for the d10 RAID 5 volume, you would use:

 # metaparam -h hsp010 d10 

If you want to remove the hot spare pool association, use none instead of hsp010 in the previous command example.
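Spelled out, removing the association from the d10 volume would look like this:

 # metaparam -h none d10 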

To check the status of your hot spare pool, use the metastat command. Here is an example:

 # metastat hsp010
 hsp010: 2 hot spares
         c0t2d0s2             Available          44000 blocks
         c0t3d0s2             Available          56000 blocks
 #

Hot spares can be in one of three states: Available, In-use, or Broken. Available means that the hot spare is ready for use; In-use means that it's being used. The Broken state indicates that there is a problem with the hot spare or that all hot spares are being used.

Managing Hot Spare Pools from the Solaris Management Console

The Solaris Management Console enables you to manage hot spare pools through a convenient graphical interface. In the Solaris Management Console, navigate to the Hot Spare Pools node as shown in Figure 12.9.

Figure 12.9: Hot Spare Pools

To create a hot spare pool, click Action > Create Hot Spare Pool and follow these steps:

  1. Choose whether to use an existing state database replica or to create a new state database replica. Click Next.

  2. Choose a disk set. If you do not want to use a disk set, choose <none> and click Next.

  3. Select a name for the hot spare pool you are creating and click Next.

  4. Select the components you want to place in the hot spare pool. After you have made your selection, click Next.


  5. The Review screen displays a summary of your selections and the commands that will be executed to configure the system. If you want to make any changes now, use the Back button. Otherwise, click Finish.

Solaris Volume Manager will configure your hot spare pool. By right-clicking the hot spare pool (as shown in Figure 12.9) and choosing Properties, or by highlighting the hot spare pool and choosing Action > Properties, you can configure additional information about the pool. For example, adding, deleting, replacing, and enabling hot spares is done from the Hot Spares tab, and associating the hot spare pool with a RAID 1 or RAID 5 volume is done from the Used By tab.

Hot spare pools can be deleted by right-clicking them in the Solaris Management Console and choosing Delete, or choosing Edit > Delete.

Managing Disk Sets

A disk set (also known as a shared disk set) is a set of disk drives that can be shared by multiple hosts. A disk set contains its own volumes and hot spares, and in a sense, is like its own separate, independent drive configuration within your Solaris computer. Disk sets can be shared between hosts, but only one host can control the disk set at once.

If a host with a shared disk set fails, then another host can take over the failed host's disk set. This is called a failover configuration. The version of Solaris Volume Manager supplied with Solaris 9 does not provide all the necessary components needed to fully implement failover configurations. To provide this, you need Sun Cluster, Solstice HA (High Availability), or another third-party HA package.

Each host has one disk set by default, known as the local disk set. The devices configured within the local disk set are not shared and cannot be taken over by another host in the event of a system failure. By default, all storage devices and volumes are considered to be part of the local disk set.

Because Solaris 9 does not support the full management of disk sets, this chapter doesn't cover them in great detail. However, here are some basic facts for you:

  • Volumes and hot spare pools built within a shared disk set must be built upon hard disks contained in that shared disk set.

  • Volumes created in shared disk sets cannot be mounted automatically at boot through the /etc/vfstab file.

  • State database replicas are created automatically by the Solaris Volume Manager.

  • When you add disks to a shared disk set, Solaris Volume Manager will repartition the disk automatically, placing the state database replica on slice 7.

  • Shared disk sets are given a name, and that name becomes part of the full path of the volume. For example, if your disk set is named set01 and you have a RAID volume named d0, the full path to the block device would be /dev/md/set01/dsk/d0.

The default maximum number of disk sets per computer is four, as specified by the md_nsets parameter in the /kernel/drv/md.conf file. You can edit this parameter to support a maximum of 32 disk sets. The number of shared disk sets you can create is one less than the value of this parameter, because the default local disk set counts as one.
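For reference, the relevant entry in /kernel/drv/md.conf resembles the following line (the nmd value, which controls how many volume names are available, may differ on your system); changes take effect only after a reboot:

 name="md" parent="pseudo" nmd=128 md_nsets=4; 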

Disk sets are created and managed with the metaset command or from the Solaris Management Console. For more information, see man metaset.
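As a brief illustration (the set name set01 matches the path example above, while the host name and disk address are hypothetical), a shared disk set is typically created by adding a host, then adding drives, and finally taking ownership of the set:

 # metaset -s set01 -a -h host1 
 # metaset -s set01 -a c1t2d0 
 # metaset -s set01 -t 

Running metaset with no arguments displays the status of all disk sets on the system.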

Real World Scenario: Optimizing RAID on Your Server

You just got permission from management to upgrade your Solaris 8 server to Solaris 9. It took a lot of bargaining, but you also appropriated funds to purchase an additional SCSI controller and new hard disks to place in the upgraded server. When your new hardware arrives, you will have two SCSI controllers in the server, along with two 20GB hard disks and ten 40GB hard disks (even though you could have gotten bigger drives, you got a great deal on the 40GB ones and you decided to impress management with your economic savvy).

Because your office runs on a Monday-Friday schedule, you are planning to come in on the weekend to perform a complete upgrade of the server. You are going to back up the existing data on the server, redesign the volume structure as necessary, and restore the data. You are also going to move a large amount of data onto this server from an older server that desperately needs to be retired.

Here is what you will need to accommodate: a bootable system disk (of course), user storage for approximately 150 users (users are given, on average, 200MB of storage each on the server), two mission-critical databases (one customer service and one development), and various applications and miscellaneous data. The two databases are perhaps the most important. The customer service database is about 25GB, and the development database is nearly 40GB. Both databases must be accessible at all times.

There are a variety of ways you could tackle this problem with your newly available hardware. Here is one solution (feel free to design your own, and if possible, debate the merits of each design with your peers). Use one of the 20GB hard disks for the root (/) and /usr partitions (slices 0 and 6). This, of course, will be your system disk. Mirror the system disk with your other 20GB hard disk. Create a RAID 0 volume from two of the 40GB hard disks for user file storage. (Currently, you need about 30GB for user storage, and this gives you room to grow.) Create two RAID 5 volumes (three 40GB disks each), one for each database. This will provide a speed increase, as well as fault tolerance for the databases. Use one 40GB hard disk for applications and miscellaneous storage. You could perhaps create one slice and use soft partitions if you would like. Make the remaining 40GB hard disk a hot spare. Assign it to both of the RAID 5 volumes.
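As a rough command-line sketch of this design (all volume names and disk addresses below are hypothetical, and the root-mirroring steps, which also require state database replicas, metaroot, and a reboot, are omitted for brevity), the data volumes might be built like this:

 # metainit d20 1 2 c1t0d0s0 c1t1d0s0 
 # metainit d30 -r c1t2d0s0 c1t3d0s0 c1t4d0s0 
 # metainit d40 -r c1t5d0s0 c2t0d0s0 c2t1d0s0 
 # metainit hsp001 c2t3d0s0 
 # metaparam -h hsp001 d30 
 # metaparam -h hsp001 d40 

Here d20 is the RAID 0 stripe for user storage, d30 and d40 are the RAID 5 volumes for the two databases, and hsp001 is the hot spare pool containing the spare 40GB disk, attached to both RAID 5 volumes.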

Again, this is only one possible solution that could be used. However, the volume types presented in this chapter are there to make your storage situation easier to deal with. Use them and combine them as you see fit, to optimize your storage and meet your storage needs.




