A new installation of a system can be a difficult task, especially if it is the first system being purchased from a specific vendor. New contacts must be made, and business relationships must be established. New skills might also have to be recruited, or an investment in training might have to be made. All of these things need to be addressed well in advance of the actual delivery.

New System as Part of a Project

Chapter 2 discussed the use of projects. A new system installation is likely to be part of a project, with an associated budget cost code. Any training requirements should be funded as part of the project, potentially saving the system manager a considerable amount of money. Additionally, a project sometimes funds the support and implementation for the first year or so. Be aware of the potential to charge these costs to the project. With a new system, the system manager will be heavily involved in the technical assurance of the project. He will have significant input into the specification of the system and the requirements that need to be satisfied. In addition to considering the project itself, he will have to take account of any likely effect on the business as a whole. An example of this is when a new installation threatens to overload the existing power capacity in the computer room: the risk exists that services already being provided from other computer systems residing in the same computer room could be severely disrupted. This example highlights not only the question of who funds any necessary work to provide the additional power capacity, but also the issue of the disruption to operational services while the work is carried out. In mission-critical operations, this will be a serious consideration if all power has to be removed for several hours or, worse, several days.
Most, if not all, businesses will simply not allow this kind of interruption to service and will demand a high level of resilience, a requirement that can be satisfied only through the use of multiple sites, automatic failover, and clustering. The system manager needs to consider the purchase options available for a new system, particularly if it is planned to expand considerably. After a finite period of project support, the new system will become the responsibility of the system manager, so he must take a longer-term view of the project, assessing the life expectancy of the system being purchased, the potential for upgrade and expansion, and so on. A purchasing initiative from Sun Microsystems addresses this specific issue. The "capacity on demand" suite enables customers to purchase fully configured systems but pay only for those components actually being used, or to spread the cost over more than one year, thereby allowing better management of the IT budget. (The term system does not necessarily apply to one physical computer; it could mean 5 servers and 200 clients, for example, all of which form the system.) The level of availability required can often dictate how a new configuration will be planned. Larger companies can opt for multiple sites acting in a clustered configuration, providing greater contingency against a disaster, whereas medium-sized businesses might choose hardware redundancy with automatic failover. A small business probably makes use of an uninterruptible power supply (UPS) to protect the system from unscheduled power loss. The next few sections detail some of the issues surrounding the purpose, configuration, and implementation of a new system installation. A sample scenario sets the scene.

Scenario

The purpose of the scenario outlined here is to identify a number of issues that need to be addressed. It is designed to encourage consideration of how you, the reader, would deal with such a situation.
The issues are then discussed in the following sections. Consider this hypothetical example of a company wanting to expand into a new market: CoverMe, Inc., the insurance company from Chapter 3, "Delivering the Goods," is expanding into the motor insurance business. The current systems are a few years old and are not capable of supporting the new requirements. It is anticipated that the new motor insurance system will grow by approximately 70% over the next 12 months, but this could be as much as 200% if the new advertising campaign is successful. The requirements for the new system are as follows:
The implementation is to be carried out over two sites in a clustered configuration so that the required level of availability and resilience can be achieved. Additionally, a number of Web servers are being installed, but these are being handled by a separate Internet project.

Planning for a New System

A new system requires careful planning. Several steps are involved in planning for the arrival of a new system, apart from the configuration and purchase of the system itself. Some of them are often overlooked, so they are discussed here:
Room for Expansion

Many new projects seriously underestimate the amount of growth that will occur in the first year of production. Consequently, they find that the system cannot cope with the demands being placed on it and that an upgrade must be carried out, causing major disruption and loss of availability. Earlier, this chapter briefly mentioned the "capacity on demand" purchasing initiative from Sun Microsystems. This allows the capacity options to be decided early in the project. The options are discussed here:
For further details on purchasing initiatives available from Sun Microsystems, consult the Web site http://www.sun.com, or contact your Sun sales department.

Multiprocessing Capabilities

Sun Microsystems provides unrivaled multiprocessing capabilities, not only in its hardware, but also through its Solaris operating environment, which exploits the facility fully. Sun provides a range of enterprise servers that can accommodate up to 64 processors and 64GB of main memory (the Enterprise 10000). For some companies, though, this is probably far too large, so other systems are available to cater to enterprises of all sizes. The range of servers and workstations can be viewed online from the Sun Web site. An interesting aspect of these multiprocessing systems and the operating environment is that a number of processors can be configured to work together as an independent domain. This allows resources to be directed to where they are needed most, greatly increasing the flexibility that businesses of today require. If a problem occurs, for example, a specific domain of processors can even be rebooted without affecting the rest of the system, a facility that enhances the availability of the system. The overall result is that a single enterprise server can be segmented to run different applications independently, while also being capable of sharing resources when required.

Integration into the Existing Infrastructure

Any new installation must conform to certain standards that are already in place within the organization. The system manager is responsible for ensuring that the new system will fit in. This includes allocating physical space in the computer room, ensuring that the power requirements of the system can be accommodated, and ensuring that the controlled environment (air conditioning and so on) can cope with the heat output produced by the new system.
Network connections are needed to enable the new system to connect to the LAN/WAN, and these must be allocated and configured in advance so that there is no delay when the system is delivered. Naming conventions also must be adhered to. For example, setup of a new server running an Oracle database might involve the use of defined mount points when creating the database file systems, so that consistency across all servers is maintained. Larger installations, and those with external communications facilities such as the Internet, will undoubtedly have a system security policy that all new systems must comply with before being allowed to connect to the corporate network. System security is discussed in more detail in Chapter 6, "Solaris Security."

Purpose of the System

How a system is to be used often determines, to an extent, how it will be configured and managed. The system manager is responsible for ensuring that the required level of service can be provided. The next few sections cover a variety of categories that a large system might be used for. This could be a server providing operating system services to a number of clients, or distributed access to data and print facilities. A database server, for example, will be providing remote access to a database and must be configured to make optimal use of the resources so that acceptable response times for queries and similar operations can be achieved. All of these are discussed in the following pages, along with the administration and management overheads incurred with each.

Operating System Server

A server that is providing operating system resources to a number of clients must be configured so that it can respond to client requests in a timely manner. Failure to achieve this is normally because either the server has been set up incorrectly or it is trying to support too many clients.
As an approximate rule, a large server would be expected to be capable of supporting only about 50 diskless clients. So, it can be seen from the scenario earlier in the chapter that 4 or 5 operating system servers would probably be required to support the 250 client workstations being delivered as part of the business expansion plan.

Mount Points

The use of mount points applies only to databases using UFS file systems. A popular way of storing the database is to use raw disk partitions and the associated device filenames, but even in this case, there might be symbolic links that adhere to a naming convention.

Two major considerations must be addressed for operating system servers: their physical position on the network and the organization of the disk partitions. Both of these are addressed next:
Client Configurations

The clients being supported will be either diskless or autoclient systems. Standalone systems are not included because they require no system resources from a server in order to function.
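On Solaris, per-client swap files like those shown in Listing 5.1 that follows would typically be created with mkfile(1M), one 64MB file per client. The loop below is a runnable sketch of the idea: it uses dd, a scratch directory, and much smaller files so that it can be tried anywhere, and the coverme names follow the scenario.

```shell
# Sketch: generate even-numbered per-client swap files, as an
# operating system server for the scenario might hold them.
# On Solaris the real command would be:
#   mkfile 64m /export/swap1/coverme02
# Here dd and a scratch directory stand in, with 64KB files for speed.
SWAPDIR=$(mktemp -d)
for n in $(seq 2 2 50); do
    name=$(printf 'coverme%02d' "$n")    # coverme02, coverme04, ... coverme50
    dd if=/dev/zero of="$SWAPDIR/$name" bs=1024 count=64 2>/dev/null
done
ls "$SWAPDIR" | wc -l    # 25 swap files, one per even-numbered client
```

The odd-numbered clients would presumably live on a second file system (such as /export/swap2) so that the swap I/O load is spread across disks, which is what the listing's title implies.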
Listing 5.1 Sample Output Showing the Distribution of Swap Space for Even-Numbered Clients

coverme# cd /export/swap1
coverme# ls -l
total 1704976
-rw-------   1 root     other    67108864 Jul  2 18:05 coverme02
-rw-------   1 root     other    67108864 Jul  2 18:07 coverme04
-rw-------   1 root     other    67108864 Jul  2 18:07 coverme06
-rw-------   1 root     other    67108864 Jul  2 18:07 coverme08
-rw-------   1 root     other    67108864 Jul  2 18:08 coverme10
-rw-------   1 root     other    67108864 Jul  2 18:08 coverme12
-rw-------   1 root     other    67108864 Jul  2 18:08 coverme14
-rw-------   1 root     other    67108864 Jul  2 18:09 coverme16
-rw-------   1 root     other    67108864 Jul  2 18:09 coverme18
-rw-------   1 root     other    67108864 Jul  2 18:09 coverme20
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme22
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme24
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme26
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme28
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme30
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme32
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme34
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme36
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme38
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme40
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme42
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme44
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme46
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme48
-rw-------   1 root     other    67108864 Jul  2 18:10 coverme50
coverme#

Creating the clients can be done easily by using Solstice AdminSuite, a package that provides an easy-to-use graphical user interface (GUI) for system administrators to manage the network. AdminSuite is described in Chapter 14, "Network Management Tools."

File/Print Server

A file server is one that provides access to files or data held on the server and distributed to a number of clients. Sharing of the files is achieved using the Network File System (NFS). A popular use for such a server is for users' home directories and shared areas, such as the online manual pages.
When a file server is providing these services to a large number of clients, it must be configured so that it can accommodate the requests being made of it. By default, the server starts 16 NFS daemons (nfsd) at boot time. Increased NFS performance can be obtained by modifying the following line in the NFS server startup file /etc/rc3.d/S15nfs.server:

/usr/lib/nfs/nfsd -a 16

If the value is increased to, say, 32, then 32 NFS daemon processes will be started, allowing a greater throughput of requests. The actual number of daemons required depends on the number of clients requesting data access from the server. Another configuration option for file servers is to use the automounter, a facility that mounts remote directories or file systems dynamically, only when they are accessed. The directory or file system is automatically unmounted after a specified period of inactivity. One of the features of the automounter is that it provides resilience and flexibility. User home directories, for example, can easily be moved to another file server with minimal disruption, while static data, such as the online manual pages, can be made to use multiple file servers. The advantage of this is that, if one file server becomes unavailable, users can mount the specified data from the next server in the list. If a user already had a directory mounted when a server failed, restarting the workstation would enable that user to mount the data from an alternate server. A print server is one that provides printing resources to users across the network. The print server holds files for spooling before the actual printing takes place. If a server is being configured to provide printing resources, it should definitely have the /var file system created separately when Solaris is installed.

The /var File System

With Solaris, /var does not exist as a separate file system by default; it is usually a directory contained within the root (/) file system.
It must be manually specified if a separate file system is required. All spool files are held within the /var directory and can potentially become very large. If a separate file system is not created, there is a risk that the root (/) file system might fill to capacity, endangering the running of the entire system.

Database Server

A database server contains a physical database, or a number of databases, and provides users with access to the information stored within the various tables of the database. Some of the issues surrounding database servers are discussed here, using Oracle as an example of a relational database management system:
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=10
set semsys:seminfo_semmns=200
set semsys:seminfo_semmni=70

Backing Up Before Making Changes

Before editing such an important file as /etc/system, always make a copy so that recovery is possible if any problems occur. To reference the original file, an interactive boot would be necessary.
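The safety copy can be as simple as a cp before the edit. This sketch uses a scratch file in place of the real /etc/system so it is harmless to run anywhere; on a live Solaris system the same two commands would operate on /etc/system itself.

```shell
# Stand-in for /etc/system so the sketch can run anywhere.
ETC_SYSTEM=$(mktemp)
echo 'set maxusers=128' > "$ETC_SYSTEM"

# Take the safety copy BEFORE editing
# (on Solaris: cp /etc/system /etc/system.orig).
cp "$ETC_SYSTEM" "$ETC_SYSTEM.orig"

# ... now make the change ...
echo 'set shmsys:shminfo_shmmax=4294967295' >> "$ETC_SYSTEM"

# If a bad edit later stops the system from booting, an interactive
# boot ("boot -a" at the OpenBoot prompt) lets you name
# /etc/system.orig as the system file to use instead.
cmp -s "$ETC_SYSTEM" "$ETC_SYSTEM.orig" || echo "working copy has changed"
```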
Application Server

An application server provides software resources to a number of clients. In larger installations, there will likely be a number of application servers so that a high level of resilience and availability can be delivered in case a server becomes unavailable unexpectedly. It is quite feasible for large application servers to have hundreds of users running the applications resident on the server, so these servers need to be fairly powerful, preferably with multiple processors. Because of the large number of potential users, it might be necessary to increase the number of concurrent processes that can be run. This is achieved by modifying the kernel parameter maxusers in the file /etc/system, which increases the size of the process table. This parameter is set, by default, to be approximately equal to the number of megabytes of physical memory present in the system. This example shows the line to modify or add in /etc/system to change the value of maxusers to 256:

set maxusers=256

As with the database server changes, a reboot is necessary to make the change effective. As with file servers, extensive use can be made of the automounter software to enable users to automatically run an application from the nearest available server and to switch to another one if a failure occurs. This will probably require a restart of the workstation if a directory was being mounted from the failing server. For large installations with many applications, I always used to set up a separate automounter map, named auto_packages, which made administration and management much easier. Using application servers eliminates the need for software packages to be fully installed on each user's workstation; instead, central copies of the software are accessed by many users. The users can also take advantage of the significant processing power available on such a server, relieving the load on user workstations considerably.
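The auto_packages idea mentioned above might look something like the following indirect automounter map. The server and package names are invented for illustration; the comma-separated server list is what gives clients the automatic switch to an alternate server for read-only software areas.

```
# Hypothetical auto_packages automounter map (all names illustrative).
# Each entry mounts a read-only software area from whichever of the
# listed servers responds, giving failover for the application files.
frame        -ro   appserv1,appserv2:/export/packages/frame
office       -ro   appserv1,appserv2:/export/packages/office
devtools     -ro   appserv2,appserv3:/export/packages/devtools
```

The map would be referenced from the auto_master map (for example, as /packages auto_packages) so that clients mount /packages/frame and so on, on demand.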
Desktop Workstation

Standalone workstations are self-sufficient: they do not require any resources from a server, and they can exist on a network independently of any other Solaris systems. No special configuration options are required for this type of system, but it is neither a server nor a client. A popular use for this type of installation is a network management console. To be capable of monitoring and reporting on other systems in the network, it is of prime importance that this system not rely on any other system for its resources, except for the network itself, of course. A further use for a standalone system might be for an engineer, particularly one using CAD or other technical software, although this use would be primarily for performance. This type of installation, however, requires a much greater system administration overhead than diskless and autoclient systems because there is no central control. Normally, a standard user workstation making use of shared resources, such as a sales or marketing workstation, would not be expected to be configured as a standalone workstation.

Support of Clients and Architectures

An operating system server, as described earlier in this section, provides the resources necessary for certain types of clients to function, namely diskless and autoclient systems, both of which are briefly discussed here along with their advantages and disadvantages:

The Standalone Workstation

The standalone workstation configuration is deliberately not considered here, even though it is often termed a "client." This is because it does not require any resources from a server to function. It contains everything that it needs to run independently.
Autoclient Versus Dataless Client

The autoclient configuration replaced the dataless client as of Solaris 2.6.

Additionally, an operating system server could be required to support clients of differing architectures. For example, suppose that the 250 clients in the scenario comprise the following:

100 Sparc Ultra 5 (architecture sun4u)
100 Sparc 20 (architecture sun4m)
50 Sparc 10 (architecture sun4m)

The operating system server will have to contain the binaries and support for two architectures, and space must be allocated for this on the following basis:

15MB for each client architecture to be supported
20MB for each diskless client
20MB for each autoclient client

RAID Options

This section runs briefly through the more popular options for configuring storage arrays, usually referred to as Redundant Array of Inexpensive Disks (RAID) arrays. RAID arrays provide advantages through increased performance and higher availability. Most larger companies use a RAID configuration for management of their disk storage. The more popular options use a storage management subsystem in which the intelligence is provided via additional hardware. It is possible, however, to implement a pseudo-RAID configuration through software using a product such as Solstice DiskSuite, which is covered briefly later in this section. The concept of RAID is that a number of physical disk modules are bound together to form a logical unit, sometimes referred to as a stripe. It is worth noting that, to make a RAID configuration truly reliable, multiple SCSI controllers and storage processors (SPs) should be used to ensure that disk access is maintained in case of a controller failure, creating dual access to the disk modules. Storage processors are part of the RAID device; they control the physical access to the RAID elements (disks). Multiple SPs improve the resilience further. The CLARiiON storage subsystem supplied by EMC is a good example of such a device.
The following subsections describe the more popular RAID configurations, along with some recommended uses for each.

RAID 0

This option is sometimes called a nonredundant array. It is not a true RAID configuration because it does not have any fault tolerance. Figure 5.3 shows how a RAID 0 stripe is organized. RAID 0 offers significantly improved performance through simultaneous I/O to different disks; the data blocks are written and read simultaneously and independently. Data is written sequentially in blocks across the stripe, spanning all the physical disk modules. RAID 0 does not offer any enhanced reliability, so the failure of any one of the disks means that the whole stripe is lost. Recommended uses for this type of configuration might be video editing and related applications requiring high bandwidth. This option is not recommended for mission-critical data.

Figure 5.3. A RAID 0 configuration provides significant performance improvements, but the failure of one module will render the whole stripe unusable.
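The sequential, block-at-a-time striping described above is easy to picture with a little arithmetic: for an N-disk stripe, data block i lands on disk i mod N. A minimal sketch (the four-disk array is an assumption for illustration):

```shell
# Map the first eight data blocks of a RAID 0 stripe onto a
# hypothetical 4-disk array: block i goes to disk (i mod 4).
DISKS=4
for block in 0 1 2 3 4 5 6 7; do
    echo "block $block -> disk $(( block % DISKS ))"
done
```

Because consecutive blocks sit on different spindles, a large sequential transfer keeps all the disks busy at once, which is where the performance gain comes from; it is also why losing any one disk destroys the whole stripe.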
RAID 1

In this configuration, also known as a mirrored pair, two physical disks are bound together. The hardware writes the same data to both disks, thereby creating a mirror. Reading from the mirror can be done from either disk. This option is highly fault-tolerant and is ideal for storing mission-critical data and for applications requiring very high availability. If one of the disk drives fails, the other continues, while copying to a replacement module (see the upcoming section "Hot Spares"). A further use for this option is the system disk, that is, the disk containing the root (/) and /usr file systems. Figure 5.4 shows the RAID 1 configuration.

Figure 5.4. The RAID 1 configuration writes to both modules and is ideal for protecting critical system information.
RAID 1/0

This RAID configuration provides the reliability that RAID 0 fails to offer. It is essentially a RAID 0 stripe duplicated in a RAID 1 configuration, that is, a mirrored RAID 0. The RAID 1/0 stripe is shown in Figure 5.5.

Figure 5.5. A RAID 1/0 configuration is a good compromise for RAID 0, providing the performance enhancement along with reliability in case of failure.
RAID 3

RAID 3 dedicates one of the physical disks to parity information. It provides the necessary fault tolerance if one of the disk modules fails: the replacement module can be re-created from the parity information, and the user does not lose access to the data, including data that was stored on the failed module. However, performance is degraded during the module rebuild, which could take several hours. This option suits applications that require high throughput, although single-task applications are best suited. Figure 5.6 demonstrates how the data blocks are organized in a RAID 3 configuration.

Figure 5.6. RAID 3 is good for single-task applications, such as CAD, in which high throughput is required. Multitask applications will suffer because all the parity information is stored on a single module.
RAID 5

This is the most popular option for larger systems, especially those running relational database applications. The data is striped across multiple physical disk modules but, unlike the RAID 3 configuration, the parity data is distributed across all the modules, providing high availability and reliability. Again, if one of the disk modules fails, its contents can be re-created from the information contained on the remaining disks, allowing users to continue accessing the data, including data that was stored on the failed module. As with RAID 3, performance is degraded while a failed disk module is rebuilt, and the process could take several hours to complete. File servers, application servers, and systems using databases benefit from using RAID 5, but in the case of database systems, it is generally recommended to store online transaction logs (redo logs) on a mirrored pair of disks (RAID 1) for added resilience. Figure 5.7 describes how the RAID 5 configuration is implemented.

Figure 5.7. The RAID 5 configuration distributes parity and data across all the modules, providing enhanced performance and reliability.
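How the parity is distributed across the modules varies by implementation; one common rotation (an illustrative assumption, not a statement about any particular array) places the parity block of stripe s on disk s mod N of an N-disk set, so that no single module carries all the parity load as it does in RAID 3:

```shell
# Rotate parity across a hypothetical 5-disk RAID 5 set: the parity
# block for stripe s sits on disk (s mod 5), data on the other four.
# This rotation scheme is an illustrative assumption.
N=5
for stripe in 0 1 2 3 4; do
    echo "stripe $stripe: parity on disk $(( stripe % N ))"
done
```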
Hot Spares

Hot spares are essential for any storage subsystem to make full use of the fault tolerance made available by these configurations. A hot spare is a disk module that is managed by the storage hardware and allocated as a "spare." Frequently, there will be a pool of hot spare disks in a larger array configuration, allowing the system to continue operating even if several disk modules fail. If a failure occurs, the storage subsystem dynamically allocates one of the hot spare disks to replace the failed module and initiates the rebuild onto the spare disk module. The advantage of doing this is that the failed disk module can be replaced at a more convenient time, and the system continues to operate throughout. Figure 5.8 shows a typical disk array configured with three hot spare modules. Notice that the configuration displayed consists of more than one RAID stripe and more than one RAID configuration. A hot spare disk module automatically replaces any module in the array that suffers a failure.

Figure 5.8. Hot spare modules allow dynamic replacement of a failed unit without interrupting the operation of the system. The failed unit can be replaced at a more convenient time.
Solstice DiskSuite

Solstice DiskSuite is a software solution for emulating disk mirroring and striping without the additional hardware normally associated with a RAID configuration. It is normally used in smaller systems, where SCSI disks can be grouped together into structures that resemble RAID arrays. It also allows the creation of logical volumes that can span multiple physical disks; this, in turn, allows a file system to be larger than one physical disk, something that the standard Solaris implementation does not facilitate. Figure 5.9 shows the concatenation of several disks to create a large file system that is also mirrored.

Figure 5.9. Even though the concatenation of multiple disks through software does not provide the same performance enhancements found in true RAID configurations, the pseudo-array can be cost-effective for smaller companies.
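As a sketch of how a concatenated, mirrored file system like that of Figure 5.9 might be built with DiskSuite commands (the device names are invented, and the exact syntax should be checked against the metainit(1M) manual page for your release):

```
# Concatenate two slices into each submirror (2 stripes of 1 slice):
metainit d1 2 1 c0t1d0s2 1 c0t2d0s2
metainit d2 2 1 c0t3d0s2 1 c0t4d0s2

# Create a one-way mirror on d1, then attach d2 as the second half:
metainit d10 -m d1
metattach d10 d2

# Build a file system on the mirrored metadevice:
newfs /dev/md/rdsk/d10
```

The result is a single file system larger than any one of the underlying disks, with every block held on two spindles.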