While much is known about standard SCSI disks, knowledge of Sun StorEdge T3b arrays is less common. This section aims to fill that gap. The Sun StorEdge T3b array is the second release of the Sun StorEdge T3 array, which was originally sold as the Sun StorEdge T300 array and soon renamed the Sun StorEdge T3 array. The T3b is a mid-life upgrade to the Sun StorEdge T3 line, consisting of a new controller and an increase in cache size from 256 Mbytes to 1 Gbyte.

Sun StorEdge T3b Array Architecture

Sun StorEdge T3b arrays can be deployed in two ways.
A standalone workgroup array delivers higher performance than a partner pair, but at the expense of resiliency. Regardless of the deployment method, the basic hardware architecture is the same. The Sun StorEdge T3b array has the following characteristics:
LUN RAID Configuration

A Sun StorEdge T3b array LUN must be configured as a hardware RAID volume. RAID levels are discussed in "RAID Explained for Directory Administrators" on page 459; for now, it is only necessary to know the basics of RAID and which levels can be configured on a Sun StorEdge T3b array's LUNs. The following levels can be used:
Given that a Sun StorEdge T3b array can have up to two LUNs, the combinations of volumes shown in TABLE 8-1 can be used.

Table 8-1. Sun StorEdge T3 Array Volumes
Note: The hot spare is only applicable to RAID 5 and RAID 1 volumes. It provides further resiliency over and above that already provided by these RAID levels. Should a disk fail, the hot spare is automatically swapped into the volume and populated with data and parity information (RAID 5 only) to bring the volume up to its full complement of disks.

LUN Block Size Configuration

To create a volume, a block size must be set. The Sun StorEdge T3b array has three possible block sizes: 16 Kbytes, 32 Kbytes, and 64 Kbytes. To determine the best block size, consider the type of I/O the application performs. For example, if the application performs many small reads and writes, the 16-Kbyte size is probably most appropriate, whereas for very large reads and writes (for example, data warehousing) a 64-Kbyte block size might be a better fit.

As far as the Sun ONE Directory Server software is concerned, the internal database page size is 8 Kbytes, and this is the size of I/O to and from disk. The data values passed between the directory server and the database are of arbitrary size; the database fits as many values onto a page as it can, and if a value is larger than 8 Kbytes, it uses overflow pages, again 8 Kbytes in size. This means that every write to the database is made in 8-Kbyte chunks. In the index files (all except id2entry), data values are kept at or below 8 Kbytes by splitting them into blocks of approximately 8 Kbytes; this is the idlist block size, which is a tunable configuration parameter.

The parameter in the directory server that controls the page size used by the database is nsslapd-db-page-size in the entry cn=config,cn=ldbm database,cn=plugins,cn=config. Note, however, that if you want to change it, you must first export the database to LDIF, make the change, and then re-import from LDIF. Furthermore, any binary backups you previously made become invalid. Therefore, change this parameter only if you have an extremely good reason to do so.

Note: Block size is a system-wide parameter. All LUNs and volumes on the Sun StorEdge T3b array, or, in the case of a partner pair, both arrays, must have the same block size.
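To make the export, change, and re-import workflow concrete, here is a minimal sketch. It assumes a hypothetical instance directory of /var/mps/serverroot/slapd-example, a back end named userRoot, and a placeholder bind password; adapt these to your deployment, and check the release documentation for whether each step must be run with the instance online or offline.

# Read the current database page size (8192 bytes by default)
ldapsearch -h localhost -p 389 -D "cn=Directory Manager" -w password \
    -b "cn=config,cn=ldbm database,cn=plugins,cn=config" \
    -s base "(objectclass=*)" nsslapd-db-page-size

# 1. Export the back end to LDIF before touching the page size
cd /var/mps/serverroot/slapd-example
./db2ldif -n userRoot -a /tmp/userRoot.ldif

# 2. Change the page size (illustration only; keep the default unless you
#    have measured a clear benefit)
ldapmodify -h localhost -p 389 -D "cn=Directory Manager" -w password <<EOF
dn: cn=config,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-db-page-size
nsslapd-db-page-size: 16384
EOF

# 3. Re-import from the LDIF produced in step 1 (offline import); any binary
#    backups taken before this point are no longer valid
./stop-slapd
./ldif2db -n userRoot -i /tmp/userRoot.ldif
./start-slapd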
Other LUN Parameters

Other parameters that need to be considered are:

The latter two parameters affect only partner-pair configurations. The LUN reconstruction rate impacts performance only while a hot spare is being swapped in and rebuilt. The read-ahead policy is either on, meaning reads can be cached, or off, meaning no data is brought into the cache early for future retrieval.

The cache mode can be set to one of four modes: auto, writebehind, writethrough, or none. If set to auto, the Sun StorEdge T3b array decides on the fly whether the cache should be used in writebehind or writethrough mode. Writebehind means that writes are made into the cache and then destaged to physical disk at a time of the Sun StorEdge T3b array's choosing. Writethrough means that a write goes straight through the cache to disk. Setting the mode to writebehind or writethrough forces the cache to always behave in the chosen manner. Cache mode can affect resiliency.
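These parameters are set from the array's administrative command-line interface, reached over a serial or telnet connection. The following is a hedged sketch of what the relevant sys commands typically look like; exact parameter names and accepted values vary by firmware release, so verify them against sys list and the array documentation before applying them.

# Display the current system-wide settings
sys list

# Block size is system wide: 16k, 32k, or 64k (typically changeable only
# while no volumes exist)
sys blocksize 16k

# Cache behavior: auto lets the array choose writebehind or writethrough
sys cache auto

# Cache mirroring between partner-pair controllers
sys mirror auto

# Read-ahead policy
sys rd_ahead on

# LUN reconstruction rate used when a hot spare is swapped in
sys recon_rate med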
Sun StorEdge T3 Array Configuration Considerations

FIGURE 8-1 shows a Sun StorEdge T3b workgroup array in its recommended default configuration of eight disks in a single RAID 5 LUN with a hot spare. This is sometimes confusingly referred to as a 7+1+1 RAID 5 layout, presumably to indicate that one disk's worth of capacity is lost to parity, although the parity is actually distributed across all eight disks.

Figure 8-1. Basic Sun StorEdge T3b Array Setup

When a Sun StorEdge T3b array is configured as shown in FIGURE 8-1 and connected to a server, its hardware volume, or LUN, presents itself to the server's operating system as if it were a single very large disk. FIGURE 8-2 helps explain how a host machine's operating system views a Sun StorEdge T3b array LUN.

Figure 8-2. Solaris OS View of the Sun StorEdge T3b Array Volume
Sun StorEdge T3b Array High-Availability Considerations

Certain resiliency features are common to both workgroup arrays and partner-pair arrays. These features include a hot-spare disk (assuming the LUNs are configured with one), redundant power supplies, and batteries. If any of these items fails, the array keeps serving I/O requests, but if the cache mode was set to auto, the cache is destaged to disk and the mode is set to writethrough. This way, the Sun StorEdge T3b array can guarantee that all subsequent requests are immediately written to disk, so the data is less susceptible to corruption should another failure occur. If the cache mode had been forced to writebehind, the Sun StorEdge T3b array would remain in that mode even after a failure and would therefore be more susceptible to corruption following a subsequent failure.

Partner-pairing provides greater resilience. If a controller, or the path to a controller, on one of the Sun StorEdge T3b arrays becomes disabled, I/O requests are automatically redirected through the interconnecting cables. FIGURE 8-3 conceptually shows how this happens.

Figure 8-3. Controller and Fiber Path Failover With Sun StorEdge T3b Partner Pair Arrays

Note: For redirection to work, alternate pathing software must be installed on the host. On Solaris OE platforms, alternate pathing is provided by either Veritas DMP (part of the Veritas Volume Manager software) or MPxIO, which is part of the Solaris Operating Environment. The multipathing system parameter on the Sun StorEdge T3b array must also be enabled for controller and fiber path failover to work correctly.
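As a hedged illustration of what enabling multipathing typically involves, the sketch below assumes a Solaris 8 or 9 OE host using the bundled MPxIO (Sun StorEdge Traffic Manager) stack; file locations and the accepted mp_support values vary with the Solaris release and T3b firmware, so treat this as an outline rather than a procedure. Veritas DMP, by contrast, is enabled as part of the Veritas Volume Manager installation.

# Host side: enable MPxIO for fibre-channel devices by setting the following
# in /kernel/drv/scsi_vhci.conf, then performing a reconfiguration reboot
mpxio-disable="no";

# Array side (entered on the T3b administrative CLI): declare that
# multipathing software is in use, then confirm the setting
sys mp_support mpxio
sys list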
On a pathing or controller failure, the cache mode is set to writethrough if it was initially set to auto. To protect the caches from power failures and memory errors, cache mirroring is usually set to on in partner-pair deployments. This means that every write is copied to the partner's cache, so if a cache fails, data integrity is preserved. Cache mirroring carries overhead: copying and routing data over the interconnect takes time, and the amount of cache available is reduced, so the likelihood of the cache flooding increases. Thus, performance might decrease.

In summary, partner-pairing does provide resilience by giving all components a level of redundancy, assuming the use of hot-spare disks. In most cases, however, data integrity and Sun StorEdge T3b array availability cannot be guaranteed should a further, similar failure occur. Multiple levels of redundancy, along with predictive fault analysis and repair, are the domain of high-end storage solutions such as the Sun StorEdge 9900 range.

Recommended Configurations for the Sun ONE Directory Server 5.2 Software

One of the main considerations when architecting a directory solution is providing a performant solution without ignoring resiliency or what would be reasonable to deploy in a production environment. If you are architecting an enterprise directory solution, the default single RAID 5 LUN layout of eight disks and one hot spare can be used with the Sun StorEdge T3b arrays configured as workgroup arrays. This protects against disk failures, which are the most likely failures, but not against controller failures. This configuration also allows the use of odd numbers of arrays, which may be desirable for smaller directory server implementations. For example, you might not want the expense of purchasing two Sun StorEdge T3b arrays, given a small directory data size and budget. Using the workgroup array configuration instead of partner-pairs also means that cache mirroring and its overhead do not have to be considered.

RAID 0 over the nine disks of an array is possibly the most performant layout, but the only way to add resiliency to it is to use a software mirror on another array, which is the most expensive option available. RAID 5's read performance is not much worse than RAID 0's, because no parity calculations have to be made on reads (an eight-disk RAID 5 layout approximates a seven-disk RAID 0 stripe), and it can be a good choice for data that is mostly read and seldom written. A RAID 1 (1+0) layout over nine disks buys very little advantage, because the stripe spans only four disks, with the mirror on another four and the last disk used as a hot spare. Directory server performance tests at Sun have shown that RAID 5 over eight disks performs better than a 4+4 RAID 1 layout, although the latter is slightly more resilient.
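For reference, the recommended single RAID 5 LUN with a hot spare is created from the array's administrative CLI. The following is a hedged sketch of the typical command sequence; disk addressing (u1d1 through u1d9 for the nine disks in unit 1) and the exact syntax should be confirmed against the firmware documentation, and vol init can take a long time because it writes parity across the whole volume.

# Create one RAID 5 volume over disks 1-8 of unit 1, with disk 9 as the hot spare
vol add v0 data u1d1-8 raid 5 standby u1d9

# Initialize data and parity, then mount the volume so that the host can see it
vol init v0 data
vol mount v0

# Verify the volume configuration and status
vol list
vol stat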
Sun ONE Directory Server 5.2 Enterprise and Software Volume Managers

Volume management provides a logical view of the disks under its control. It can create volumes smaller than a physical disk or partition, or larger than a physical disk by combining disks into a volume. A volume manager can also provide resiliency by mirroring volumes, or by using algorithms that allow data to be rebuilt should a disk in a volume fail. Thus, volume management provides an easy, flexible way to manage storage. Volume management is closely associated with RAID levels (see "RAID Explained for Directory Administrators" on page 459).

Solaris Volume Manager (formerly known as Solstice DiskSuite) software and Veritas Volume Manager are the volume management products available for use with the Sun ONE Directory Server 5.2 software. Traditionally, the Solaris Volume Manager software was considered a good volume manager for small numbers of disks, but it became more difficult to use and administer with larger disk subsystems. This manageability gap, coupled with advanced features implemented by Veritas, such as importing and exporting volumes and dynamic multipathing, led to Veritas becoming the de facto standard for Sun's customers. Veritas software might still be a viable choice for use in some Sun ONE Directory Server 5.2 environments. However, in new deployments, you should weigh the pros and cons carefully because the latest releases of the Solaris Volume Manager software now compare very favorably with Veritas software.

Sun StorEdge T3b Array and Veritas Volume Manager

In the previous section, we described how a hardware volume, or LUN, is viewed as a disk by the operating system. Here is example output from the format utility:

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
          /ssm@0,0/pci@18,700000/pci@1/SUNW,isptwo@4/sd@0,0
       1. c0t1d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
          /ssm@0,0/pci@18,700000/pci@1/SUNW,isptwo@4/sd@1,0
       2. c0t6d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
          /ssm@0,0/pci@18,700000/pci@1/SUNW,isptwo@4/sd@6,0
       3. c16t1d0 <SUN-T300-0200 cyl 34530 alt 2 hd 224 sec 64>
          /ssm@0,0/pci@1b,700000/pci@2/SUNW,qlc@4/fp@0,0/ssd@w50020f230000096b,0
       4. c17t1d0 <SUN-T300-0200 cyl 34530 alt 2 hd 224 sec 64>
          /ssm@0,0/pci@1b,600000/pci@1/SUNW,qlc@4/fp@0,0/ssd@w50020f2300000947,0
       5. c18t1d0 <SUN-T300-0200 cyl 34530 alt 2 hd 224 sec 64>
Specify disk (enter its number): ^D

Note: T300 is a legacy name for the T3/T3b.

Note: The operating system has no idea what the underlying volume is. For example, it could be RAID 0, RAID 5, or no RAID configuration at all.

Before Veritas can use the Sun StorEdge T3b array LUNs attached to the host, it must add them to a logical construct called a disk group. Once added, the LUNs are known as Veritas disks (FIGURE 8-4). These logical disks can then be carved up or combined to form volumes.

Figure 8-4. Veritas View of the LUNs
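Before a volume can be created, the LUNs must be initialized for Veritas use and placed into a disk group. Here is a minimal sketch, assuming the two T3b LUNs shown in the format output above (c16t1d0 and c17t1d0) and the disk group name blueprints used in this chapter; the device and disk names are illustrative.

# Initialize the two T3b LUNs for use by Veritas Volume Manager
/etc/vx/bin/vxdisksetup -i c16t1d0
/etc/vx/bin/vxdisksetup -i c17t1d0

# Create the disk group, adding the LUNs as the Veritas disks disk1 and disk2
vxdg init blueprints disk1=c16t1d0 disk2=c17t1d0

# Confirm that both disks are online and in the group
vxdisk list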
Here is an example of a command line to create a stripe over two Veritas disks:

# vxassist -g blueprints make vol1 20g layout=striped stripeunit=128 ncolumn=2 alloc="disk1 disk2"

In this command, blueprints is the Veritas disk group, vol1 is the volume being made, ncolumn is the number of Veritas disks being used, and alloc is the list of the logical disks being used. The stripeunit is expressed in Veritas blocks. A Veritas block is 512 bytes, so 128 equates to a stripe width of 64 Kbytes. This means that the first 64 Kbytes of the stripe will be on disk1, the second 64 Kbytes on disk2, the third 64 Kbytes on disk1, and so on until the requested volume size (20 Gbytes) is reached.

Once a volume is created, it is added to the host's device tree, can have a file system applied to it, and can be mounted in the normal way.

Note: The device path takes the Veritas abstractions into account. TABLE 8-2 contrasts the device paths for a partition (slice) of a standard SCSI disk and a Veritas volume.

Table 8-2. Device Paths
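As a hedged illustration of the contrasting device paths and of putting the new volume to use: a Veritas volume appears under /dev/vx/dsk/<diskgroup>/<volume>, with a matching raw device under /dev/vx/rdsk, rather than under a slice path such as /dev/dsk/c0t0d0s0. The mount point /ds-data is hypothetical.

# Create a UFS file system on the raw Veritas device
newfs /dev/vx/rdsk/blueprints/vol1

# Mount the block device with UFS logging enabled
mkdir -p /ds-data
mount -F ufs -o logging /dev/vx/dsk/blueprints/vol1 /ds-data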
As discussed in the previous sections, striping over disks (physical and logical) improves performance. Striping over controllers improves performance further because the I/O operations and data throughput of the controllers are aggregated (for example, a single fiber controller can handle 100-Mbyte/sec transfers, but two controllers can handle 200 Mbytes/sec). FIGURE 8-5 shows how the blocks that make up the software stripe are laid out on the hardware RAID 5 stripe. Conceptually, the software stripe can be thought of as running vertically and the hardware RAID 5 stripe as running horizontally, forming a weave known as plaiding. Therefore, when a write is made to the file system mounted on the software stripe (for example, /dev/vx/dsk/blueprints/vol1), the underlying software and hardware spread the write over two 100-Mbyte/sec controllers and sixteen disks. Similarly, when a read is made, both controllers and all the disks are involved. Even without the use of the caches of the two Sun StorEdge T3b arrays, this parallelism greatly increases I/O speed compared to serialized I/O on single disks. Plaiding can be advantageous when architecting and deploying the Sun ONE Directory Server software.

Figure 8-5. Plaiding Striped Software Volumes Over RAID 5 Hardware Volumes
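To get the full benefit of plaiding, the two Veritas disks in the software stripe must sit behind different host controllers. A quick, hedged way to confirm the layout and to watch both channels being driven under load (output omitted here):

# The DEVICE column shows the controller (c16..., c17...) behind each Veritas disk
vxdisk list

# Show the volume-to-disk layout for the disk group
vxprint -g blueprints -ht

# Watch per-device throughput on both controllers while the directory is under load
iostat -xn 5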
File Systems: UFS versus Veritas File System (VxFS)

Historically, the choice between UFS (the default file system type on the Solaris OE) and VxFS was clear cut. If a file system was relatively static, extent sizes could be easily calculated and deployed, and performance gains over UFS could be made with VxFS because seeks to data blocks could be calculated more easily and took less time than inode traversal in UFS. The other major reason for using VxFS was that its metadata was written to a special area on disk known as a journal. Should a system crash damage the file system, the file system could be quickly repaired using this journal, whereas for a UFS file system such a crash would result in a tedious, time-consuming file system check (fsck(1M)). The major disadvantage of VxFS is the management required to properly maintain the extents on a file system that is continually updated; if this work is neglected, performance can quickly degrade. The other disadvantage is that license fees are required for its use.

Since the Solaris 2.6 Operating Environment, Sun Microsystems has invested considerable effort in improving UFS. Indeed, tests by Richard McDougall and Jim Mauro of Sun Microsystems have shown that UFS performance on the Solaris 8 OE closely approximates that of optimally tuned VxFS. Given that UFS requires no real maintenance and costs no money, it is advantageous to use it in place of VxFS. The other main advantage of VxFS, journaling, has also been negated as of the Solaris 7 OE, where UFS can be mounted with the logging option. Logging performs the same function as journaling, so it obviates the need for file system checks and allows quick recovery in the event of a crash. Thus, from a performance, availability, and cost perspective, UFS should be chosen for directory server deployments.

The UNIX File System (UFS) performance improvements made in the Solaris 9 12/02 OE release further change the storage management outlook. Veritas might have been a better choice than UFS in previous versions of the Solaris OE, but this is no longer the case. A recent study conducted by Sun Microsystems compared the performance of UFS (Sun's integrated and preferred file system for general-purpose Solaris OE software installations) against the Veritas Foundation Suite 3.5 software (Veritas File System, VxFS 3.5, on Veritas Volume Manager, VxVM 3.5). The results of the study reveal the following.
While there were once compelling performance and feature reasons to use Veritas, nearly all of them have now been eliminated, either directly in UFS or through other Sun technologies such as the Sun StorEdge QFS (Quick File System) software and SAMFS (Storage and Archive Manager File System) software. In addition to the points above, there are a few further reasons to favor UFS over VxFS.
Veritas software might still be a viable choice for use in some Sun ONE Directory Server 5.2 environments. However, in new deployments, you should weigh the pros and cons carefully, because the latest releases of the Solaris Volume Manager software now compare very favorably with Veritas software.
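Because the logging mount option is what closes the journaling gap with VxFS, directory data file systems are normally mounted with it persistently. Here is a hedged sketch of the corresponding /etc/vfstab entry, reusing the hypothetical volume and mount point from the earlier examples.

# device to mount             device to fsck                mount point  FS type  fsck pass  mount at boot  options
/dev/vx/dsk/blueprints/vol1   /dev/vx/rdsk/blueprints/vol1  /ds-data     ufs      2          yes            logging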