Scalability and Availability

Scaling Your Web Service

To be successful, your Web service must scale to handle an increasing number of client requests. If your Web service is hosted on the Internet, it might eventually receive enormous numbers of requests from clients all over the world.

You should establish scalability goals early in the project. One classic mistake is setting scalability goals based on the average number of requests over a period of time. You should instead establish goals based on the number of requests the service must handle at peak load.

For example, let's say the Banking Web service has an expected usage of 300,000 requests per month. Assuming a 30-day month, that equates to an average of 10,000 requests per day. However, 40 percent of the transfer requests occur on the 1st and 15th days of the month, when customers typically get paid. This means that the Web service must actually be capable of handling 60,000 requests per day.
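To make the arithmetic concrete, the following minimal C# sketch computes both figures from the numbers in the example above:

using System;

class CapacityPlanning
{
    static void Main()
    {
        const int requestsPerMonth = 300000;
        const double peakFraction = 0.40;  // 40 percent of the monthly traffic...
        const int peakDays = 2;            // ...arrives on the 1st and the 15th

        int averagePerDay = requestsPerMonth / 30;
        int peakPerDay = (int)(requestsPerMonth * peakFraction) / peakDays;

        Console.WriteLine("Average load: {0} requests/day", averagePerDay); // 10000
        Console.WriteLine("Peak load:    {0} requests/day", peakPerDay);    // 60000
    }
}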

In the following sections, I examine two strategies for scaling a Web service and the resources it uses: scaling up and scaling out.

Scaling Up

Scaling up a Web service involves moving it to a bigger, faster, more powerful machine to accommodate increased workloads. One main advantage of this strategy is that it makes the infrastructure easier to manage: it does not increase the number of servers the system administrator has to maintain.

One of the main disadvantages of the scale-up strategy is cost. You typically pay premium prices for higher-end computers, so the cost per transaction for high-end servers is often higher than for their commodity counterparts. This is further compounded when redundant servers are required to meet availability requirements.

Another disadvantage of the scale-up strategy is that you can scale only as much as the fastest machine will allow. Also, high-end servers are often multiprocessor boxes, so resources must be designed to take advantage of multiprocessors to fully utilize the box.

In general, you should consider a scale-up strategy for resources that are difficult to scale out. (I address the scale-out strategy in the next section.) For example, stateful resources such as relational databases are often difficult to scale out, especially if the data is dynamic, highly relational, and shared across multiple clients.

Recall that the Banking Web service stores all user state within a SQL Server database. You can often scale up the machine hosting SQL Server and still keep hardware expenditures within reasonable levels. If so, the scale-up strategy is probably your ideal course of action.

For resources that are difficult to scale up, look for opportunities to minimize the work they execute. For example, avoid implementing business logic within database stored procedures or performing data transformations within the database engine itself. Instead, move these activities out of the database and into a business logic layer that can be more easily scaled out.
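As a sketch of the idea, the database below performs only a plain lookup while a hypothetical business logic class, which can run on any Web server in a farm, applies the fee rule in code rather than in a stored procedure. The table, column names, and fee rule are illustrative assumptions, not part of the Banking example as given.

using System.Data.SqlClient;

// Hypothetical business logic layer class. The database performs only a
// simple lookup; the fee rule itself runs here, on a Web server that can
// be scaled out, rather than inside a stored procedure.
class TransferFeeCalculator
{
    public static double GetTransferFee(string connectionString,
                                        int accountNumber, double amount)
    {
        double balance;
        using (SqlConnection connection = new SqlConnection(connectionString))
        {
            // Keep the database's work to a plain lookup....
            SqlCommand command = new SqlCommand(
                "SELECT Balance FROM Accounts WHERE AccountNumber = @account",
                connection);
            command.Parameters.Add(new SqlParameter("@account", accountNumber));
            connection.Open();
            balance = (double)command.ExecuteScalar();
        }

        // ...and apply the business rule in the business logic layer.
        return (balance >= 10000) ? 0.0 : amount * 0.01;
    }
}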

Scaling Out

When scaling up is not feasible, you can scale out a resource by hosting it on a cluster of machines and then distributing the requests made to that resource across multiple machines in the cluster. As the load on the resource increases, you can add more computers to the cluster to accommodate the increase. (I realize that you might be accustomed to a more specific definition of cluster, but here I use the word in a broader sense, to refer to a group of computers that are used to host a particular resource.)

One advantage of the scale-out strategy is that you can often achieve near-linear scalability as you add more computers. The cost per transaction remains relatively constant as the infrastructure is scaled.

One disadvantage of scaling out is increased complexity. Instead of maintaining a single box, you must maintain multiple machines in the cluster. For example, you must install and maintain each Web server in the Web farm.

You can use products such as Microsoft Application Center to help reduce the costs associated with maintaining multiple machines in a cluster. The primary goal of Application Center is to allow an administrator to maintain a clustered resource as if it were installed on a single system. Application Center provides out-of-the-box support for Web-based applications, so it is well suited for deploying and managing HTTP-based Web services.

Network Load Balancing

A clustering technology known as network load balancing (NLB) involves distributing requests across the nodes in the cluster at the network protocol level. The client sends a request to a particular IP address, and the NLB system intercepts the request and ensures that only one node in the cluster processes it.

Because the requests are handled at the network protocol level, the client sees the resource as a single system image. The client is oblivious to which node is actually handling the request, so in most cases it is not required to make any changes in the way the resource is accessed.

One common use of NLB is in the creation of a Web farm. A Web farm is a cluster of Web servers that are front-ended by a hardware- or software-based NLB system. Because a Web farm is designed to handle HTTP requests, you can use it to host an HTTP-based Web service.

NLB is not limited to distributing HTTP requests; you can use it to distribute network requests for a variety of protocols, including such non-HTTP resources as an FTP server or even a Common Internet File System (CIFS) file share.

To ensure that your network load–balanced resource offers the highest degree of availability, make sure it has the characteristics described in the following three sections.

The Nodes in the Cluster Should Be Independent of One Another

Each node needs to be capable of handling the client's request independently of the other nodes in the cluster because any node could fail at any time. Such a failure should not hinder any other nodes from processing requests.

For example, a node is not independent if it has data stored locally that is required for completion of the client's request. If the node fails, no other node in the cluster can complete the request.

Any Node Should Be Able to Handle Any Request

If a request can be handled by any node in the cluster, the load balancing system can more evenly distribute the requests across the nodes in the cluster. This characteristic also ensures that nodes can be easily added or removed, allowing the cluster to be expanded or contracted to meet changes in demand.

For any node to be able to handle any request, a resource cannot rely on state stored locally between requests. If the resource is stateful, all requests from a given client must be routed to the same node in the cluster. The following code shows an example of a stateful Web service:

using System;
using System.Web.Services;

class Banking : WebService
{
    [WebMethod(EnableSession=true)]
    public void Initialize(int accountNumber)
    {
        this.Session["AccountNumber"] = accountNumber;
    }

    [WebMethod(EnableSession=true)]
    public void RequestWireTransfer(int destinationAccount,
                                    double amount)
    {
        // Retrieve the account number saved by Initialize.
        // (Session returns an object, so a cast is required.)
        int accountNumber = (int)this.Session["AccountNumber"];

        // Set up the wire transfer of funds
        // from the designated account....
    }
}

This implementation of the Banking Web service relies on session state between the call to Initialize and the call to RequestWireTransfer. By default, session state is saved in memory on the server that processed the request, so the call to RequestWireTransfer must be routed to the same node that handled the call to Initialize.

You can maintain server affinity based on the client's IP address, but this approach is problematic because many clients access the Internet through a cluster of proxy servers, and it is possible for two requests from the same client to go through two different proxy servers with two different IP addresses.

In most cases, you can solve the problem by routing all requests from a class C address space to a particular node in the cluster. However, large ISPs such as AOL might have a cluster of proxy servers that span multiple class C address spaces. In such cases, a more sophisticated server affinity strategy is needed, such as the cookie-based system provided by Microsoft Application Center.

It is best to avoid imposing server affinity altogether. In the previous scenario, you can take two approaches to avoiding server affinity. The first is to configure ASP.NET session state so that it is stored on a central server that is accessible to all Web servers in the Web farm. The second is to look for opportunities to remove the dependency on session state altogether. For example, you can require the client to pass the account number with every call to RequestWireTransfer, thereby avoiding the need to implement the Initialize method, as sketched below.
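The following is a minimal sketch of the stateless design, reusing the hypothetical Banking service shown earlier. Because each call carries the account number, the service holds no per-client state and any node in the Web farm can handle any request:

using System.Web.Services;

class Banking : WebService
{
    // The client passes the account number with every call, so the
    // service needs no session state and Initialize is unnecessary.
    [WebMethod]
    public void RequestWireTransfer(int accountNumber,
                                    int destinationAccount,
                                    double amount)
    {
        // Set up the wire transfer of funds
        // from the designated account....
    }
}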

There are a few reasons why you should look for opportunities to avoid using session state. First, the Web service client must support cookies; as you recall from Chapter 6, ASP.NET proxies do not support cookies by default. Also, with a central state server, the implementation of RequestWireTransfer incurs the cost of a network round-trip to obtain the account number. Finally, the central session state server introduces a single point of failure into the system.

Requests Should Be Distributed Evenly Across All Nodes in the Cluster

The technique you use to distribute requests evenly across all nodes in the cluster is often determined by how resource intensive an individual request is to process. If a request is not very CPU or memory intensive, you can employ a load-balancing mechanism that uses a hash algorithm or a round-robin algorithm and achieve fairly uniform distribution across all nodes in the cluster.

NLB is one technology that you can use to distribute lightweight requests across nodes in a cluster. NLB ships with Windows 2000 Advanced Server and uses a hash algorithm based on the client's IP address and port number to determine which node will process the client's request.
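The hash NLB actually uses is internal to the product, but the general idea can be sketched as follows. This is an illustrative C# sketch only, not the real NLB algorithm:

using System;

class HashDistributionSketch
{
    // Illustrative only: map a client endpoint onto one of the nodes in
    // the cluster. The same client IP address and port always map to the
    // same node, and the mapping is roughly uniform across the nodes.
    static int SelectNode(string clientAddress, int clientPort, int nodeCount)
    {
        int hash = (clientAddress + ":" + clientPort).GetHashCode();
        return (hash & 0x7FFFFFFF) % nodeCount;  // force non-negative
    }

    static void Main()
    {
        // A request from this client endpoint goes to one of 4 nodes.
        Console.WriteLine(SelectNode("192.168.1.17", 3205, 4));
    }
}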

If requests are CPU or memory intensive, you might want to use a load-balancing system that routes requests based on utilization of the nodes in a cluster. Such a system monitors the state of each node and then routes requests to the least-utilized nodes.

Partitioning the Resource

You can use partitioning to provide a scale-out strategy for resources that cannot effectively be network load balanced. Partitioning means dividing a particular resource across multiple servers. For example, say I have one database server that handles all client requests for the Banking Web service. As the load increases, I can split the data contained in the database across two or more servers based on ranges of account numbers.
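As a sketch of the idea, the routing logic might look like the following; the ranges and server names are illustrative assumptions:

using System;

class PartitionRouter
{
    // Hypothetical partition map: each entry is the highest account
    // number hosted by the corresponding database server.
    static readonly int[] upperBounds = { 249999, 499999, 749999, 999999 };
    static readonly string[] servers =
        { "BANKDB01", "BANKDB02", "BANKDB03", "BANKDB04" };

    // Return the server that hosts the partition containing the account.
    public static string GetServer(int accountNumber)
    {
        for (int i = 0; i < upperBounds.Length; i++)
        {
            if (accountNumber <= upperBounds[i])
                return servers[i];
        }
        throw new ArgumentOutOfRangeException("accountNumber");
    }
}

Note that rebalancing an overloaded partition means changing this map and moving the affected rows, which is exactly the maintenance cost described in the paragraphs that follow.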

Devising a way to partition a resource so that requests are evenly distributed across all the servers can be a tough challenge. In general, it is easier to partition a resource when you have a small number of clients that need access to a particular subset of the data that must be partitioned. Partitioning becomes more challenging when you have a large number of clients that need access to the same set of data.

For example, a client of the Banking Web service has access only to data associated with its account number. Therefore, it is relatively easy to partition the data across multiple servers in the cluster based on ranges of account numbers. However, it is relatively difficult to partition data in a reporting system that supports ad hoc queries.

In general, it is costly to create and maintain a partitioned resource. Without support from the application, partitioning usually requires a lot of manual and time-intensive work. Not only do you have to design and implement a partitioning scheme, but you also have to maintain it.

You also have to constantly monitor the workload of each node in the cluster to ensure that no nodes are overloaded. When a particular node becomes overloaded, you must repartition the data. For example, a number of highly active accounts might happen to reside within the same database partition used by the Banking Web service. In that case, you would need to adjust the ranges of accounts hosted on each partition in the cluster.

SQL Server supports a feature called updateable distributed partitioned views that simplifies partitioning data contained in one or more tables across multiple servers. However, this approach makes performing backups and disaster recovery operations more difficult. You must synchronize backups across each partition to ensure that referential integrity is maintained. Due to this increased complexity, you should consider partitioning only when scaling up is not feasible.

Replicating the Resource

The final scale-out strategy I will examine is replication. Replication involves duplicating the data hosted by a resource across all nodes in the cluster. This is an especially effective strategy for scaling out resources that provide access to read-only or mostly read-only data.

For example, suppose the Banking Web service has a database table that contains the fees charged to a client for using its service. Because the fees are relatively static, they can be replicated on multiple database servers. The implementation of the Web service can then obtain the fees from any database server in the cluster.
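As a sketch of this read path (the connection strings, table, and column are illustrative assumptions), the service can satisfy the read from any replica:

using System.Data.SqlClient;

class FeeLookup
{
    // Hypothetical replicas, each holding a copy of the fee table.
    static readonly string[] replicas =
    {
        "server=FEEDB01;database=Banking;Integrated Security=SSPI",
        "server=FEEDB02;database=Banking;Integrated Security=SSPI"
    };
    static int next;  // simple round-robin counter (not thread-safe)

    public static double GetWireTransferFee()
    {
        // Because the data is replicated, any copy can satisfy the read.
        string connectionString =
            replicas[(next++ & 0x7FFFFFFF) % replicas.Length];
        using (SqlConnection connection = new SqlConnection(connectionString))
        {
            SqlCommand command = new SqlCommand(
                "SELECT Fee FROM Fees WHERE FeeType = 'WireTransfer'",
                connection);
            connection.Open();
            return (double)command.ExecuteScalar();
        }
    }
}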

If the data is writable, implementing a replication strategy becomes more complicated. One issue with replicating writable data is maintaining the coherence of the data across the nodes in the cluster. Because multiple copies of the data reside within the cluster, you must ensure that updates made to one node are reflected across the other nodes in the cluster.

You also need to resolve merge conflicts. A merge conflict occurs when the same data is updated with two different values on two different nodes at the same time. One way to resolve a merge conflict is to allow the last write to win. This technique is used by Active Directory. For this strategy to be effective, the nodes in the cluster must have synchronized clocks.
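Here is a minimal sketch of last-write-wins resolution. The type and fields are hypothetical (real systems such as Active Directory track versions per attribute), and the comparison assumes synchronized clocks, as noted above:

using System;

class ReplicatedItem
{
    public string Data;
    public DateTime LastWriteUtc;

    // Last-write-wins: when the same item was updated on two nodes,
    // keep the copy with the later timestamp. This works only if the
    // nodes' clocks are synchronized.
    public static ReplicatedItem Merge(ReplicatedItem a, ReplicatedItem b)
    {
        return (a.LastWriteUtc >= b.LastWriteUtc) ? a : b;
    }
}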

Another way to resolve merge conflicts is to avoid them altogether. For example, you can allow writes to occur on only one node in the cluster. However, this scenario is practical only if the data is read more often than it is written because all writes are performed on one server.

Another design issue is how the cluster behaves while replication is in progress. Replication takes a certain amount of time to complete, and during that window a query against a node that has not yet received the update will return the original value. This can cause problems for your application.

For example, suppose a client modifies some data and then views the data to verify the results. If the data is written to one node in the cluster and then viewed from another node before the changes have had time to replicate, it will look like the data was not modified.

Some resources, such as SQL Server, support transactional replication in order to solve this problem. When a client adds, modifies, or deletes data on one node in the cluster, the data can be accessed only after it has been successfully replicated to all other nodes in the cluster. The downside to transactional replication is that modifications made to replicated data take longer to complete as more servers are added to the cluster.

Overcoming Scalability Bottlenecks

Sometimes your Web service will need to access resources that do not scale well. For example, the Banking Web service needs to coordinate fund transfer requests with other banks. This task is accomplished via a legacy line-of-business (LOB) application.

The LOB application can handle 15,000 requests per day, but the peak load is around 60,000 requests per day. Unfortunately, it is not practical to scale the LOB application to meet the needs of the Banking Web service. However, recall that when the requests are averaged across the month, the load comes out to only 10,000 requests per day, well under the 15,000-requests-per-day maximum.

What we need is a way to buffer the LOB application from the peak load on the 1st and the 15th of every month. We can accomplish this by placing a queue between the Banking Web service and the LOB application. Instead of issuing the requests to transfer funds synchronously, the system queues the requests and the LOB application processes the requests at a steady pace.
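Here is a hedged sketch of the queued design using MSMQ via the System.Messaging namespace. The queue path and message class are illustrative assumptions, and the queue must already exist (or be created with MessageQueue.Create):

using System;
using System.Messaging;

// Illustrative message type; public fields let the default
// XmlMessageFormatter serialize it.
public class TransferRequest
{
    public int AccountNumber;
    public int DestinationAccount;
    public double Amount;
}

class TransferQueue
{
    const string queuePath = @".\private$\FundTransfers";  // hypothetical

    // The Web service enqueues the request and returns immediately....
    public static void Enqueue(TransferRequest request)
    {
        using (MessageQueue queue = new MessageQueue(queuePath))
        {
            queue.Send(request, "Fund transfer");
        }
    }

    // ...and a small front end dequeues requests at the steady pace
    // the LOB application can sustain.
    public static TransferRequest Dequeue()
    {
        using (MessageQueue queue = new MessageQueue(queuePath))
        {
            queue.Formatter = new XmlMessageFormatter(
                new Type[] { typeof(TransferRequest) });
            using (Message message = queue.Receive())
            {
                return (TransferRequest)message.Body;
            }
        }
    }
}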

One downside to this technique is that a funds transfer request received from the Web service might not be processed promptly by the LOB application. In the case of the Banking Web service, the 1st and the 15th each leave a backlog of 45,000 requests, which takes three days to drain at 15,000 requests per day; the last requests received on a peak day are therefore not processed until roughly three days later. If you leverage queuing to address scalability issues, you must manage your clients' expectations about the time it might take to process their requests.


