Architecture Overview

Introduction

Large business sites are models of dynamic change: They usually start small and grow exponentially with demand. They grow both in the number of unique users supported, which can increase extremely quickly, and in the complexity and integration of the user services offered. The business plans of many site startups are vetted by their investors for a believable 10x to 100x scalability projection. Successful business sites manage this growth and change by incrementally adding servers that provide logical services to their clients, either by running multiple instances of a server (clones) or by partitioning the workload among servers, and by creating services that integrate with existing computer systems. This growth is built on a solid architectural foundation that supports high availability, a secure infrastructure, and a management infrastructure.

Architectural Goals

The architecture described in this document strives to meet four goals:

  • Linear scalability: continuous growth to meet user demand and business complexity.
  • Continuous service availability: using redundancy and functional specialization to contain faults.
  • Security of data and infrastructure: protecting data and infrastructure from malicious attacks or theft.
  • Ease and completeness of management: ensuring that operations can match growth.

Scalability

To scale, business Web sites split their architecture into two parts: front-end (client accessible) systems and back-end systems where long-term persistent data are stored or where business-processing systems are located. Load-balancing systems are used to distribute the work across systems at each tier. Front-end systems generally do not hold long-term state. That is, the per-request context in a front-end system is usually temporary. This architecture scales the number of unique users supported by cloning or replicating front-end systems coupled with a stateless load-balancing system to spread the load across the available clones. We call the set of IIS servers in a clone set a Web cluster. Partitioning the online content across multiple back-end systems allows it to scale as well. A stateful or content-sensitive load-balancing system then routes requests to the correct back-end systems. Business logic complexity is increased in a manageable way by functional specialization. Specific servers are dedicated to task-specific services, including integration with legacy or offline systems. Cloning and partitioning, along with functionally specialized services, enable these systems to have an exceptional degree of scalability by growing each service independently.

Availability

Front-end systems are made highly available, as well as scalable, by using multiple cloned servers that all offer a single address to their clients. Load balancing distributes the load across the clones. Building failure detection into the load-balancing system increases service availability: a clone that is no longer offering a service can be automatically removed from the load-balance set while the remaining clones continue to offer the service. Back-end systems are more challenging to make highly available, primarily because of the data or state they maintain. They are made highly available by using failover clustering for each partition. Failover clustering assumes that an application can resume on another computer that has been given access to the failed system's disk subsystem. Partition failover occurs when the primary node supporting requests to the partition fails and requests to the partition automatically switch to a secondary node. The secondary node must have access to the same data storage, which should also be replicated, as the failed node. A replica can also increase the availability of a site by being available at a remote geographic location. Availability also depends heavily on enterprise-level IT discipline, including change controls, a rigorous test process, and quick upgrade and fallback mechanisms.

Security

Security—managing risks by providing adequate protections for the confidentiality, privacy, integrity, and availability of information—is essential to any business site's success. A business site uses multiple security domains, in which systems with different security needs are placed; each domain is protected by a network filter or firewall. The three principal domains, each separated by a firewall, are: the public network; a DMZ (derived from the military term "demilitarized zone"), where front ends and content servers are placed; and a secure network, where content is created or staged and secure data is managed and stored.

Management

Management and Operations broadly refers to the infrastructure, tools, and staff of administrators and technicians needed to maintain the health of a business site and its services. Many sites are located in what is often called a hosted environment. That is, the systems are collocated with an Internet Service Provider (ISP) or a specialist hosting service, where rich Internet connectivity is available. Consequently, the management and monitoring of the systems must be done remotely. In this architecture, we describe such a management network and the types of management functions the network must support.

Architectural Elements

The key architectural elements of a business Web site highlighted in this section are client systems; load balanced, cloned front-end systems (that client systems access); load balanced, partitioned back-end systems (that front-end systems access for persistent storage); and three overarching architectural considerations: disaster tolerance, security domains, and management and operations.

Elements of a Large Business Web Site

Figure A.1 captures the concepts and essential elements of a business Web site as described in more detail in the remainder of this section.


Figure A.1 Architectural elements

Figure A.1 shows the split between the front end and back end and the load balancing layers as described in this document. Firewall and network segmentation are key security elements.

Clients

In this site architecture, clients issue requests to a service name, which represents the application being delivered to the client. The end user and the client software have no knowledge of the inner workings of the system that delivers the service. The end user typically types the first URL, for example, http://www.thebiz.com/, and then either clicks hyperlinks or completes forms on Web pages to navigate deeper into the site.

For a Web site with a very broad reach, an important decision is whether to support the lowest common set of features in browsers or whether to deliver different content to different browser versions. Currently, HTML 3.2 is usually the lowest version supported, although there are still older browsers in use. For example, browsers could be classified into those that support HTML 3.2, such as Microsoft Internet Explorer 3.0; those that support dynamic HTML (DHTML), such as Internet Explorer 4.0; and those that support Extensible Markup Language (XML), such as Internet Explorer 5.0. Different content would then be delivered to each. IIS and its associated tools can create pages that are rendered dynamically for different browsers.

Front-End Systems

Front-end systems are the collection of servers that provide the core Web services, such as HTTP/HTTPS, LDAP, and FTP, to Web clients. Developers usually group these front-end systems into sets of identical systems called clones. They all run the same software and have access either through content replication or from a highly available file share to the same Web content, HTML files, ASPs, scripts, etc. By load balancing requests across the set of clones, and by detecting the failure of a clone and removing it from the set of working clones, very high degrees of scalability and availability can be achieved.

Clones (stateless front ends)

Cloning is an excellent way to add processing power, network bandwidth, and storage bandwidth to a Web site. Because each clone replicates the storage locally, all updates must be applied to all clones. However, coupled with load balancing, failure detection, and the elimination of client state, clones are an excellent way to both scale a site and increase its availability.

Stateless load balancing

The load-balancing tier presents a single service name to clients and distributes the client load across multiple Web servers. This provides availability, scalability, and some degree of manageability for the set of servers. There are a variety of approaches to load balancing, including Round Robin Domain Name Service (RRDNS) and various network-based and host-based load-balancing technologies.
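A minimal round-robin distributor, in the spirit of RRDNS, can be sketched as follows. This is an illustration of the distribution policy only; the addresses are made up, and a real deployment would use DNS records or a network load balancer rather than application code:

```python
from itertools import cycle

# Illustrative clone addresses; clients see only the single service name.
clones = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rotation = cycle(clones)

def next_server():
    """Hand out clone addresses in strict rotation, as RRDNS does."""
    return next(rotation)
```

Note that plain round robin, like RRDNS, is unaware of clone health or load; that is why the availability section pairs load balancing with failure detection.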

Maintaining client state

It is not desirable to maintain client state in the cloned front-end systems because this works against transparent client failover and load balancing. There are two principal ways to maintain client state across sessions. One is to store client state in a partitioned back-end server. (Client state can be partitioned perfectly, and therefore it scales well. However, it is necessary to retrieve this state on each client request.) Another way to maintain client state across sessions is to use cookies and/or URLs. Cookies are small files managed by the client's Web browser. They are invaluable for minimizing load on stateful servers and maximizing the utility of stateless front ends. Data can also be stored in URLs and returned when the user clicks on the link on the displayed Web page.
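The cookie approach above can be sketched by keeping only an opaque session key on the client and the actual state in a back-end store, so that any clone can serve any request. The in-memory dictionary here is a stand-in for the partitioned back-end server the text describes:

```python
import secrets

# Stand-in for a partitioned back-end state store.
state_store = {}

def begin_session(initial_state):
    """Create server-side state; return the opaque key to place in a cookie."""
    key = secrets.token_hex(8)
    state_store[key] = initial_state
    return key

def load_session(cookie_value):
    """On each request, recover state using only the key from the cookie."""
    return state_store.get(cookie_value)
```

Because the front end holds nothing between requests, a clone failure costs the client nothing but a retried request.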

Front-end availability

Because application code runs in these front-end servers, whether written in a high-level language such as Microsoft Visual Basic or C++ or written as a script, it is important to isolate the programming errors of different Web applications from one another. Running this application code out of process from the Web server is the best way to isolate programming errors from each other and to avoid causing the Web server to fail.

Back-End Systems

Back-end systems are the data stores that maintain the application data or enable connectivity to other systems, which maintain data resources. Data can be stored in flat files, in database systems such as Microsoft SQL Server, or inside other applications as shown in Table A.1.

Table A.1 Different Types of Data Stores

              File systems                Databases                           Other applications
  Example     File shares                 SQL                                 Ad insertion, SAP, Siebel
  Data        HTML, images, executables,  Catalogs, customer information,     Inventory/stock, banner ads,
              scripts, COM objects        logs, billing information,         accounting
                                          price lists

Back-end systems are more challenging to scale and make highly available, primarily due to the data and state they must maintain. Once the scalability of a single system is reached, it is necessary to partition the data and use multiple servers. Continuous scalability is therefore achieved through data partitioning and a data-dependent routing layer or a stateful load-balancing system, which maps the logical data onto the correct physical partition.

For increased availability, a cluster—which typically consists of two nodes with access to common, replicated or RAID (Redundant Array of Independent Disks) protected storage—supports each partition. When the service on one node fails, the other node takes over the partition and offers the service.

Partitions (stateful back-end systems)

Partitions grow a service by replicating the hardware and software and by dividing the data among the nodes. Normally, data is partitioned by object, such as mailboxes, customer accounts, or product lines. In some applications partitioning is temporal, for example by day or quarter. It is also possible to distribute objects to partitions randomly. Tools are necessary to split and merge partitions, preferably online (without service interruption), as demands on the system change. Increasing the number of servers hosting the partitions increases the scalability of the service. However, the choice of the partitioning determines the access pattern and subsequent load. Even distribution of requests, especially avoiding hot spots (a single partition that receives a disproportionate number of requests), is an important part of designing the data partitioning. Sometimes this is difficult to achieve, and a large multiprocessor system must host the partition. Partition failover, a situation in which services automatically switch to the secondary node (rolling back uncommitted transactions), provides continuous partition availability.
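Partitioning by object can be illustrated with a stable hash that spreads keys evenly across partitions and so avoids the hot spots mentioned above. This is a sketch only; the partition count and the hashing scheme are assumptions, not part of the architecture as described:

```python
import hashlib

NUM_PARTITIONS = 4  # illustrative; real sites size this to their data

def partition_for(object_key):
    """Map an object key (mailbox, customer account, product line) to a
    partition. A stable hash spreads keys evenly, avoiding the hot spots
    that alphabetical or temporal ranges can create."""
    digest = hashlib.md5(object_key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS
```

One consequence of this scheme is that changing NUM_PARTITIONS remaps most keys, which is exactly why the text calls for tools to split and merge partitions, preferably online.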

Stateful load balancing

When data is partitioned across a number of data servers, or functionally specialized servers have been developed to process specific types of Web requests, software must be written to route each request to the appropriate data partition or specialized server. Typically, this application logic runs on the Web server. It is coded to know the location of the relevant data and, based on the contents of the client request, the client ID, or a client-supplied cookie, it routes the request to the server where the data partition resides. It also knows the location of any functionally specialized servers and sends requests to be processed there. This application software performs stateful load balancing; it is called stateful because the decision on where to route the request is based on client state or state in the request.
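Such data-dependent routing logic can be sketched as follows. The hostnames and the key-to-partition rule are hypothetical; the point is only that the client ID in the request determines the destination server:

```python
# Hypothetical map from partition number to the server hosting it.
partition_servers = {0: "db0.internal", 1: "db1.internal"}

def route(client_id):
    """Stateful routing: the client ID carried in the request (or in a
    client-supplied cookie) determines which back-end partition serves it."""
    partition = sum(client_id.encode()) % len(partition_servers)
    return partition_servers[partition]
```

Because the same client ID always yields the same server, all of a client's data stays on one partition.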

Back-end service availability

In addition to using failover and clustering to achieve high availability, an important consideration in the overall system architecture is the ability of a site to offer some limited degree of service, even in the face of multiple service failures. For example, a user should always be able to log on to an online mail service, possibly by replication of the user credentials, and then send mail using cloned Simple Mail Transfer Protocol (SMTP) routers, even if the user's mail files are unavailable. Similarly, in a business site the user should be able to browse the catalog even if the ability to execute transactions is temporarily unavailable. This requires the system architect to design services that degrade gracefully, preventing partial failures from appearing to be complete site failures to the end user.
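Graceful degradation of the kind described can be expressed as a simple fallback path in a request handler. The statuses and message below are illustrative, not part of any real API:

```python
def handle_checkout(order, order_service_up):
    """Degrade gracefully: if the transactional back end is down, answer
    with a polite retry rather than letting a partial failure look like
    a whole-site failure to the end user."""
    if order_service_up:
        return {"status": "accepted", "order": order}
    return {"status": "retry-later",
            "message": "Checkout is briefly unavailable; browsing still works."}
```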

Disaster Tolerance

Some business Web sites require continuous service availability, even in the face of a disaster: Their global business depends on the service being available. Disasters can either be natural—earthquake, fire, or flood—or may result from malicious acts such as terrorism or an employee with a grudge. Disaster-tolerant systems require that replicas or partial replicas of a site be located sufficiently far away from the primary site so that the probability of more than one site being lost through a disaster is acceptably small. At the highest level, there are two types of replicated sites. Active sites share part of the workload. Passive sites do not come into service until after the disaster occurs. Where very quick failover is required, active sites are usually used. Passive sites may simply consist of rented servers and connectivity at a remote location where backup tapes are stored that can be applied to these servers when necessary. Even this minimal plan should be considered for any business.

The challenge is keeping replicated sites up to date with consistent content. The basic methodology is to replicate the content from central staging servers to staging servers at the remote sites, which then update the live content on each site. For read-only content this method is sufficient. However, for more sophisticated sites where transactions are executed, it is also necessary to keep the databases up to date. Database replication and log shipping are usually used, whereby transactional updates to the database are shipped to a remote site. Typically, the databases will be several minutes out of synchronization; however, this is preferable to the complete loss of a site.

Security Domains

Security mechanisms protect the privacy and confidentiality of sensitive information from unauthorized disclosure; they protect system and data integrity by preventing unauthorized modification or destruction; and they help ensure availability through prevention of denial-of-service attacks and by providing contingency or disaster planning.

Security domains are regions of consistent security, with well-defined and protected interfaces between them. Application of this concept can help to ensure the right levels of protection are applied in the right places. A complex system such as a large business site and its environment can be partitioned into multiple security domains. Region can mean almost any desired division—for example, by geography, organization, physical network, or server or data types. For a business site, major divisions might reasonably correspond to the Internet, the site's DMZ, and the secure, corporate, and management networks. Domains may also be nested or overlapping. For example, credit card numbers within a database may require additional protection. Additional security controls, such as encrypting the card numbers, can provide this.

An analogy may help you to visualize security domains. The Internet is like a medieval castle and its surroundings: Outside its walls, few rules apply and unscrupulous characters are in abundance. Corresponding to this castle, the key architectural element used to protect a Web site is to construct a wall around it, with a main gate that is heavily guarded to keep out undesirables. The wall and any other gates need to be built to equivalent standards in order to maintain a given level of security. It certainly would not do to have an unprotected back door! For large business sites, this wall is known as the site's perimeter. In network terms, it means that the site's own communications facilities are private and isolated from the Internet, except at designated points of entry. The site's main gate is known as a firewall. It inspects every communications packet to ensure that only desirables are allowed in. Continuing the analogy, the stronghold in a castle holds the crown jewels. Additional walls and locked doors, or walls within walls, provide additional protection. Business sites similarly protect very sensitive data by providing an additional firewall and an internal network as illustrated in Figure A.2.


Figure A.2 Firewall/DMZ

A firewall is a mechanism for controlling the flow of data between two parts of a network that are at different levels of trust. Firewalls can range from packet filters, which only allow traffic to flow between specific IP ports and/or ranges of IP addresses, to application-level firewalls, which actually examine the content of the data and decide whether it should flow or not. Sites often implement outward-facing firewalls that filter packets in conjunction with inward-facing firewalls that filter at the protocol and port layers.
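At the packet-filter end of that range, a firewall reduces, in essence, to a rule lookup on protocol and destination port. This deliberately minimal sketch assumes a default-deny policy that admits only HTTP and HTTPS into the DMZ:

```python
# Illustrative packet-filter rules: allow only HTTP and HTTPS inbound;
# everything else is dropped by the default-deny policy.
ALLOW = {("tcp", 80), ("tcp", 443)}

def filter_packet(protocol, dest_port):
    """Return True if the packet may pass the outward-facing firewall."""
    return (protocol, dest_port) in ALLOW
```

An application-level firewall would additionally inspect the payload, which a rule table like this cannot express.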

Securing a site is complex; nevertheless the firewall/DMZ is a key architectural component. (It is actually a subset of network segmentation.) It is a necessary, but by no means sufficient, security mechanism to ensure a desired level of protection for a site. The "Security" section of this document is completely dedicated to securing a site.

Management Infrastructure

A site management system is often built on a separate network to ensure high availability. Using a separate network for the management system also relieves the back-end network of the management traffic, which improves overall performance and response time. Sometimes, management and operations use the back-end network; however, this is not recommended for large, highly available sites.

The core architectural components of a management system are management consoles, management servers, and management agents. All core components can scale independently of each other. Management consoles are the portals that allow the administrators to access and manipulate the managed systems. Management servers continuously monitor the managed systems, receive alarms and notifications, log events and performance data, and act as the first line of response to predefined events. Management agents are programs that perform primary management functions within the device in which they reside. Management agents and management servers communicate with each other using standard or proprietary protocols.

Once systems reach a certain scale and rate of change, the management and operation of a Web site becomes the critical factor. Administrative simplicity, ease of configuration, ongoing health monitoring, and failure detection are perhaps more important than adding application features or new services. Therefore, the application architect must deeply understand the operational environment in which the application will be deployed and run. In addition, operations staff must understand the cloning and partitioning schemes, administrative tools, and security mechanisms in depth to maintain continuously available Internet-based services.



Microsoft Application Center 2000 Resource Kit 2001