Because World Wide Web technologies are rapidly becoming the platform of choice for supporting enterprise-wide applications, the infrastructure required to develop and host applications has grown in scale and complexity. Server technology is particularly hard-pressed to keep up with the daily client demands for Web pages. One of the greatest concerns of vendors today is to make their products and services available 24 hours a day, 7 days a week. Providing this kind of service isn’t only a business consideration, but also a matter of brand reputation. Businesses have spent millions of dollars to achieve the ideal of very high uptime. Even a small amount of downtime can cost a business a significant amount of revenue and damage its reputation. An outage can be caused by a variety of factors. The hardware, operating system, data storage, network, and management applications are some of the vulnerable areas that can lead to downtime. The system might not be resilient against disasters and faults in the system. To meet the demands of highly available Web sites, Microsoft Windows 2000 Advanced Server has been designed to address mission-critical needs. This lesson introduces you to Windows 2000 Advanced Server and provides an overview of some of the key terminology used in designing highly available Web solutions. In addition, this lesson provides an overview of the architectural changes that have occurred as networks have moved toward a Web computing model for business.
The Microsoft Windows 2000 Server family currently includes Windows 2000 Server, Windows 2000 Advanced Server, and Windows 2000 Datacenter Server. Windows 2000 Server offers core functionality appropriate to small and medium-sized organizations that have numerous workgroups and branch offices and that need essential services, including file, print, communications, infrastructure, and Web. Windows 2000 Advanced Server is designed to meet mission-critical needs—such as large data warehouses, online transaction processing (OLTP), messaging, e-commerce, and Web hosting services—for medium-sized and large organizations and for Internet service providers (ISPs). Datacenter Server includes all the functionality of Advanced Server, but provides greater reliability and availability. Datacenter Server is the best platform for large-scale line-of-business and enterprise.com back-end usage.
Windows 2000 Advanced Server evolved from Microsoft Windows NT Server 4, Enterprise Edition. It provides an integrated and comprehensive clustering infrastructure for high availability and scalability of applications and services, including main memory support of up to 8 gigabytes (GB) on Intel Physical Address Extension (PAE) systems. Designed for demanding enterprise applications, Advanced Server supports new systems with up to eight-way symmetric multiprocessing (SMP). SMP enables any one of the multiple processors in a computer to run operating system or application threads simultaneously with the other processors in the system. Windows 2000 Advanced Server is well suited to database-intensive work and provides high availability server services and load balancing for excellent system and application availability.
Windows 2000 Advanced Server includes the full feature set of Windows 2000 Server and adds the high availability and scalability required for enterprise and larger departmental solutions. Windows 2000 Advanced Server includes the following functionality to support high availability:
If you’re uncertain about whether you have an Intel PAE computer system, contact your hardware vendor.
In various types of documentation, the terminology used to describe specific characteristics of networks and Web sites often differs from one source to the next. In this section several key terms are defined to help you understand how specific terminology is used within this book.
Availability is a measure (from 0 to 100 percent) of the fault tolerance of a computer and its programs. The goal of a highly available computer is to run 24 hours a day, 7 days a week (100 percent availability), which means that applications and services are operational and usable by clients most of the time. Availability measures whether a particular service is functioning properly. For example, a service with an availability of 99.999 percent is available (functioning properly) for all but 5.3 minutes of unplanned downtime a year.
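The downtime figures behind these percentages are simple arithmetic. The following sketch (the function name is illustrative, not from any product) converts an availability percentage into the minutes of unplanned downtime it permits per year:

```python
# Annual unplanned downtime allowed by a given availability percentage.
# A "five nines" (99.999%) service may be down only about 5.3 minutes a year.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days

def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year for a given availability percentage."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR

for nines in (99.0, 99.9, 99.99, 99.999):
    print(f"{nines:>7}% -> {downtime_minutes_per_year(nines):8.1f} minutes/year")
```

Note that each additional "nine" divides the annual downtime budget by ten, which is why the last few nines are by far the most expensive to achieve.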
You can use many different methods to increase the availability of your Web site. They range from using servers with fault-tolerant components (such as hot swappable drives, RAID controllers, redundant network interfaces, and hot swappable system boards) to load-balanced clustered solutions (such as Cisco Local Directors or Microsoft Application Center Server 2000) to failover clustered solutions (such as the Cluster service or Veritas Cluster Server). In the case of a completely redundant computer system, the software model for using the hardware is one in which the primary computer runs the application while the other computer idles, acting as a standby in case the primary system fails. The main drawbacks to redundant systems are increased hardware costs with no improvement in system throughput, and, in some cases, no protection from application failure.
You can make front-end systems at the Web tier highly available through the use of clustered servers that provide a single virtual Internet Protocol (IP) address to their clients. You can use load balancing to distribute the load across the clones. Building a failure-detection process into the load-balancing system increases the service’s availability. A clone that’s no longer offering a service can be automatically removed from the load-balanced set while the remaining clones continue to offer the service.
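As a rough sketch of this behavior (the class and clone names are hypothetical, not any vendor's API), a load-balanced set can drop a failed clone from rotation while the survivors keep answering requests at the single virtual address:

```python
class LoadBalancedSet:
    """Minimal sketch of a load-balanced clone set with failure detection.
    Clones marked failed are removed from rotation; the remaining clones
    continue to offer the service."""

    def __init__(self, clones):
        self.healthy = list(clones)

    def mark_failed(self, clone):
        # Failure detection (for example, a missed health check) removes
        # the clone from the load-balanced set.
        if clone in self.healthy:
            self.healthy.remove(clone)

    def next_clone(self):
        # Simple round-robin over the clones still in the set.
        if not self.healthy:
            raise RuntimeError("no healthy clones remain")
        clone = self.healthy.pop(0)
        self.healthy.append(clone)
        return clone

pool = LoadBalancedSet(["web1", "web2", "web3"])
pool.mark_failed("web2")  # detected failure: web2 leaves the set
print([pool.next_clone() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```

Real load balancers add weighting, session affinity, and automatic re-admission of recovered clones, but the core idea is the same: the failure-detection loop edits the set that the dispatcher draws from.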
You can make back-end systems highly available through the use of failover clustering. In failover clustering, if one node fails, the other node takes ownership of its resources. Failover clustering assumes that an application can resume on another computer that's been given access to the failed system's disk subsystem. The primary node will automatically fail over to the secondary node when a clustered application, the operating system, or a hardware component of the primary node fails. The secondary node, which should be a replica of the primary node, must have access to the same data storage.
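The ownership transfer can be sketched as follows. This is an illustrative model only (the names are invented, and it is not the Cluster service API): when the current owner's heartbeat is lost, the standby node takes ownership of the shared resources.

```python
class FailoverCluster:
    """Sketch of a two-node failover cluster: the secondary takes ownership
    of the primary's resources (including the shared disk) when the primary
    fails."""

    def __init__(self, primary, secondary):
        self.owner = primary      # node currently owning the resources
        self.standby = secondary  # replica node with access to the same storage

    def heartbeat_missed(self):
        # Failure detected on the current owner: fail over to the standby,
        # which becomes the new owner of the shared storage and services.
        self.owner, self.standby = self.standby, self.owner
        return self.owner

cluster = FailoverCluster(primary="node-a", secondary="node-b")
print(cluster.owner)               # node-a serves while healthy
print(cluster.heartbeat_missed())  # node-b takes ownership after the failure
```

The sketch glosses over the hard parts a real cluster service handles, such as arbitrating for the shared disk so both nodes never own it at once.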
Failure is defined as a departure from expected behavior on an individual computer system or a network system of associated computers and applications. Failures can include behavior that simply moves outside of defined performance parameters. If the system’s specified behavior includes time constraints, such as a requirement to complete processing in a specified amount of time, performance degradation beyond the specified threshold is considered a failure. For example, a system that must process a transaction within 2 seconds may be in a failed state if transaction processing degrades beyond this 2-second window.
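Using the 2-second example, a failed state can be detected even while the system is still running; a minimal sketch (the function name and threshold constant are illustrative):

```python
# A system can be "failed" while still running: here any transaction that
# takes longer than the specified 2-second window counts as a failure.

RESPONSE_LIMIT_SECONDS = 2.0

def in_failed_state(processing_times):
    """True if any transaction exceeded its specified time constraint."""
    return any(t > RESPONSE_LIMIT_SECONDS for t in processing_times)

print(in_failed_state([0.4, 1.1, 1.9]))  # False: all within the window
print(in_failed_state([0.4, 2.6, 1.9]))  # True: degraded beyond the threshold
```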
Software, hardware, operator, and procedural errors, as well as environmental factors, can each cause a system failure. A single point of failure is any component in your network environment whose failure halts network communication, data transfer, or application availability. Single points of failure can include hardware, software, or external factors, such as power supplied by a utility company. One recent survey indicates that although hardware component failure accounts for up to 30 percent of all system outages, operating system and application failures account for almost 35 percent of all unplanned downtime. Typical hardware components that may fail include computer cooling fans, disk-drive hardware, and power supplies. Minimizing single points of failure or eliminating them completely will increase a site's overall reliability.
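The value of removing single points of failure can be quantified. Assuming independent failures (a simplifying assumption), the availabilities of required components multiply along a chain, while redundancy multiplies only the small failure probabilities:

```python
def serial_availability(components):
    """Availability of a chain in which every component is a single point
    of failure: all must be up at once, so availabilities multiply."""
    a = 1.0
    for availability in components:
        a *= availability
    return a

def redundant_availability(a, n):
    """Availability of n redundant copies of a component: the set fails
    only if every copy fails at the same time."""
    return 1 - (1 - a) ** n

# Three 99%-available components in series drop the site below 98%...
print(f"{serial_availability([0.99, 0.99, 0.99]):.4f}")
# ...while doubling one 99% component lifts that link to 99.99%.
print(f"{redundant_availability(0.99, 2):.4f}")
```

This is why a modestly available site built from many chained components needs redundancy at its weakest links before anything else.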
Fault tolerance is the ability of a system to continue functioning when part of the system fails. Fault tolerance combats problems such as disk failures, power outages, or corrupted operating systems, which can affect startup files, the operating system itself, or system files. Windows 2000 Server includes features that support certain types of fault tolerance.
For example, Windows 2000 supports two implementations of RAID: RAID-1 and RAID-5. RAID-1 provides fault tolerance through the use of mirroring. All data that is written to the primary volume is also written to a secondary volume, or mirror. If one disk fails, the system uses data from the other disk. RAID-5 provides fault tolerance by sharing data across all the disks in an array. The system generates a small amount of data, called parity information, which is used to reconstruct lost information in case a disk fails.
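The parity arithmetic is plain XOR. A minimal sketch follows (RAID-5 actually rotates the parity blocks across all disks in the array; for clarity this example keeps parity in a single block):

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-sized blocks together byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

data = [b"ABCD", b"EFGH", b"IJKL"]  # data blocks striped across three disks
parity = xor_blocks(data)           # parity block stored on a fourth disk

# The second data block is lost: rebuild it by XOR-ing the surviving
# data blocks with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True
```

Because XOR is its own inverse, any single missing block (data or parity) can be regenerated from the survivors, which is exactly why a RAID-5 array tolerates one disk failure but not two.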
Although the data is always available and current in a fault-tolerant system, you still need to make tape backups to protect the information on your disk subsystem against user errors and natural disasters. Disk fault tolerance is not an alternative to a backup strategy with off-site storage.
Fault tolerance can also be achieved through hardware implementations of RAID. Many hardware RAID solutions provide power, bus, and cable redundancy within a single cabinet and can track the state of each component in the hardware RAID firmware. These capabilities provide data availability with multiple redundancies to protect against multiple points of failure. Hardware RAID solutions can also use an onboard processor and cache. Windows 2000 Advanced Server can use these disks as standard disk resources. Though more costly than the software RAID supported by Windows 2000 Server, hardware RAID is generally considered the superior solution.
Manageability is the ability to make changes to the system easily. Management has many facets, but it can be loosely divided into the following disciplines:
Reliability is a measure of the time that elapses between failures in a system. Hardware and software components have different failure characteristics. Although formulas based on historical data exist to predict hardware reliability, such formulas for predicting software reliability are harder to find.
Hardware components usually have what is known as an exponential failure distribution, which means that (under normal circumstances and after an initial burn-in phase) a component fails at a roughly constant rate, regardless of how long it has been operating. Therefore, if you know a component's mean time between failures (MTBF), you can estimate that component's reliability over a given operating period.
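A common way to estimate reliability is to assume a constant failure rate (the exponential model) and work from the MTBF; long-run availability then follows from the MTBF and the mean time to repair (MTTR). The figures below are illustrative, not vendor ratings:

```python
import math

def reliability(hours, mtbf_hours):
    """Probability that a component with a constant failure rate
    (exponential distribution) survives the given operating period."""
    return math.exp(-hours / mtbf_hours)

def steady_state_availability(mtbf_hours, mttr_hours):
    """Long-run availability from mean time between failures and
    mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# A disk rated at 500,000 hours MTBF, repaired in 4 hours on average:
print(f"{reliability(8760, 500_000):.4f}")             # chance of a failure-free year
print(f"{steady_state_availability(500_000, 4):.6f}")  # long-run availability
```

Note that even a component with a very long MTBF is almost certain to fail eventually; the MTBF predicts how often, not whether, which is why repair time matters as much as failure rate.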
Scalability is a measure of how easily a computer, service, or application can expand to meet increasing performance demands. For server clusters, scalability refers to the ability to incrementally add one or more systems to an existing cluster when the cluster’s load exceeds its capabilities. The ability to "scale" is traditionally thought of as the ability to handle increased load over time with minimal intervention. Scalability is a critical component of Web-enabled applications because the nature of the Web is such that load can’t be predicted (but must always be handled). Scalability is also a critical component of intranet applications because these applications must support an ever-growing business.
Scalability can also be defined as the capacity of an application to perform increasing work while sustaining acceptable performance levels. In order to scale, business Web sites split their architecture into two parts: front-end systems (client accessible) and back-end systems where long-term persistent data is stored or where business-processing systems are located. Load-balancing systems are used to distribute the work across systems at each tier.
When Internet technology, notably the Web, moved into the computing mainstream in the mid-1990s, the model for business computing changed dramatically. This shift (as shown in Figure 1.1) was centered on the industry’s notion of client/server computing, which previously had been very complex, costly, and proprietary.
Figure 1.1 - Application architecture shifts since 1970
The Web model is characterized by loosely connected tiers of diverse collections of information and applications that reside on a broad mix of hardware platforms. This platform is flexible by design and not limited to one or two computing tiers. The only real limits to application development in the Internet world are computer capacity and the application designer’s imagination.
Because today’s businesses are models of dynamic change—often growing quickly and instituting rapid directional shifts—the Web model is ideally suited for business computing. Web sites can grow exponentially with demand and provide a full range of services that can be tailored according to user requirements. These services are often complex and need to be integrated with other services in the organization.
An architecture that addresses business computing needs must meet the following goals:
The key architectural elements of an n-tier business Web site (as illustrated in Figure 1.2) are as follows:
Figure 1.2 - Architectural elements of an n-tier business Web site
The site architect and application developer must consider all these elements in the context of scalability and reliability, security, and management operations.
Figure 1.2 shows the split between the front-end and back-end systems as well as the firewall and network segmentation, which are key security elements in site architectures.
Clients issue service requests to the server that's hosting the application that the client is accessing. From the user's perspective, the only things visible are a Uniform Resource Locator (URL) that identifies a page on a site, hyperlinks for navigation once the page is retrieved, and forms that require completion. Neither the client nor the user has any idea of the inner workings of the server that satisfies the request.
Front-end systems consist of the collections of servers that provide core services, such as Hypertext Transfer Protocol/Hypertext Transfer Protocol Secure (HTTP/HTTPS) and File Transfer Protocol (FTP), to the clients. These servers host the Web pages that are requested and all usually run the same software. For efficiency’s sake, it’s not uncommon for these servers (known as Web farms, or clusters) to have access to common file shares, business-logic components, or database systems located on the back-end systems (or middle-tier systems in more extended models).
The back-end systems are the servers hosting the data stores that are used by the front-end systems. In some cases a back-end server doesn’t store data but accesses it from a data source elsewhere in the corporate network. Data can be stored in flat files, inside other applications, or in database servers such as Microsoft SQL Server. Table 1.1 summarizes data and storage areas.
Table 1.1 Types of Data in Storage Areas
|Type of Storage Area|Example|Type of Data|
|---|---|---|
|File systems| |Hypertext Markup Language (HTML) pages, images, executables, scripts, Component Object Model (COM) objects|
|Database servers|Microsoft SQL Server|Catalogs, customer information, logs, billing information, price lists|
|Other applications|Ad insertion, SAP Agent|Banner ads, accounting information, inventory/stock information|
Because of the data and state that back-end systems must maintain, they’re described as stateful systems. As such, they present more challenges to scalability and availability.
Securing the assets of today’s businesses—with their mobile workers, business-to-business computer direct connections, and a revolving door to the Internet—is complex and costly. But not implementing computer security, or doing it poorly, can lead to an even more costly disaster.
At a high level, security domains—not to be confused with Internet or Windows NT/Windows 2000 domains—provide regions of consistent security with well-defined and protected interfaces between them. Large organizations may partition their computing environment into multiple domains, according to business division, geography, or physical network, to name but a few types. Security domains may be nested within one another or even overlap. There are as many security architectures as there are security mechanisms.
At a low level, the basic model for securing a single site involves setting up one or several perimeters to monitor and, if necessary, block incoming or outgoing network traffic. This perimeter defense (firewall) may consist of routers or specialized secure servers. Most organizations use a second firewall system, as shown in Figure 1.3. Security specialists refer to this area between the firewalls as the perimeter network.
Figure 1.3 - Using firewalls to establish a secure zone
Note that the configuration shown in Figure 1.3 is only a model; every organization builds its security architecture to meet its own requirements. In fact, some put their Web servers on the Internet side of the firewall because they’ve determined that the risk to, or cost of reconstructing, a Web server isn’t high enough to warrant more protection. Another factor in this decision is the performance cost of providing more protection.
Site management systems are often built on separate networks to ensure high availability and to avoid having a harmful impact on the application infrastructure. The core architectural elements of a management system are as follows:
As systems scale or their rate of change accelerates, the management and operation of a business Web site becomes critical in terms of reliability, availability, and scalability. Administrative simplicity, ease of configuration, and ongoing health/failure detection and performance monitoring become more important than application features and services.
Windows 2000 Advanced Server is designed to meet mission-critical needs and provides a comprehensive clustering infrastructure for high availability and scalability of applications and services. Availability is a measure (from 0 to 100 percent) of the fault tolerance of a computer and its programs. Failure is defined as a departure from expected behavior on an individual computer system or a network system of associated computers and applications. Fault tolerance is the ability of a system to continue functioning when part of the system fails. Manageability is the ability to make changes to the system easily. Reliability is a measure of the time that elapses between failures in a system. Scalability is a measure of how easily a computer, service, or application can expand to meet increasing performance demands. To support availability, manageability, reliability, and scalability, the Web computing model has evolved into loosely connected tiers of diverse collections of information and applications that reside on a broad mix of hardware platforms. This platform is flexible by design and not limited to one or two computing tiers. The key architectural elements of an n-tier business Web site are clients, front-end systems, and back-end systems. At a high level, security domains provide regions of consistent security with well-defined and protected interfaces between them. The core architectural elements of a management system are management consoles, servers, and agents.