Scaling Considerations


There are many issues to consider when you're building a clustered environment. Proper planning of your application architecture is important, as well. Many factors are involved, and laying out a plan before purchasing and building your clustered environment can save you many headaches later. Questions you may want to ask include:

  • How many servers do you need? The number of servers will depend on how much traffic you expect and how Web site functionality is distributed in your server farm.

  • What types of servers and operating systems do you want to deploy? Choosing servers and operating systems depends on many factors, including your team's skill sets and experience in these areas.

  • How do you balance traffic between the servers? The methods that you select for load balancing may affect your choice of load balancer. You may want users to stay on one machine for the length of their session. Failover and server monitoring are other considerations when balancing traffic in a cluster.

  • How will you keep the Web site content in sync among all the servers, and how will you deploy the Web site? This is potentially one of the most troublesome areas in Web site maintenance. Not only do you need to keep Web site content in sync, but each server requires periodic configuration changes, patches, and hot fixes to be deployed as well.

We'll try to answer some of these questions by breaking the Web site infrastructure into major elements and then discussing their implementation. These major elements include tiered application architecture, server and hardware components, and cluster load balancing.

What do you have when you have a Web site? You have a server or servers with operating systems, files, directories, configurations, hardware, and software. Your environment may be tiered, consisting of the Web server, application server, and a separate database server. Let's discuss this tiered application architecture first.

Tiered Application Architecture

One of the most common approaches to scaling a system is tiering. Tiering means logically or physically separating and encapsulating a set of processes or functionality. Generally, when you are looking to scale a system, you want to consider physical tiering, in which you separate specific system functions by putting them on their own machines or clusters. For example, most simple ColdFusion applications are three-tiered applications, where the browser is the client tier, the Web server and ColdFusion are the application tier, and a database is the data tier. Complex applications can have any number of tiers, and it's not uncommon to see authentication tiers, business object tiers, and others. This sort of architecture is usually called a physical architecture, in that the actual physical separation of software systems is represented on specific groups of servers or hardware.

Figure 3.1 shows a three-tiered Web site architecture where ColdFusion MX 7 is installed in the application server tier. This configuration can be accomplished by installing ColdFusion MX 7 on a supported J2EE application server platform. For more about deploying ColdFusion MX 7 on J2EE, see Chapter 4, "Scaling with J2EE."

Figure 3.1. Three-tiered server farm with ColdFusion MX 7.


Front-End Servers Vs. Back-End Servers

When creating your system infrastructure, it's important to design with security in mind. One of the best ways to do this is to limit public exposure to only those systems that absolutely need to be exposed, such as Web servers, and nothing else. This public set of servers and the network are often referred to as the front end; servers on the private network are referred to as the back end.

NOTE

In a two-tiered architecture, a single Web site's Web server, along with all its content and pages, is separate from the database server.


The front end is the network segment between the public Internet and your Web cluster. The front end should be optimized for speed. Place a switched segment with lots of bandwidth in front of your Web servers. Your two primary goals on the front end are to avoid collisions and to minimize the number of hops (intervening network devices) between your Web servers and the public Internet. If you are using hardware-based load balancing, you could have a hardware load balancer in front of your front-end network.

The back end is the network segment between your Web cluster and your supporting servers. Because your support servers need to talk only to your Web servers and your LAN, you don't need to make this segment directly accessible to the public Internet. In fact, you might do better to deliberately prevent any access to these machines from the public Internet by using private IP addresses or a firewall. Doing so enables you to take advantage of useful network protocols that would be a security risk if they were made available to the public Internet. In addition, be sure to spend some time trying to minimize collisions on your back-end network. A configuration like this might look like that in Figure 3.2. You have either a single firewall that separates the public from the private system or, better yet, a firewall in front of your Web servers and another in front of your back-end servers, which are connected to the public servers via a trusted connection.

Figure 3.2. This figure shows how a firewall can separate servers available to the public Internet from database servers and other systems on the back-end, or private internal, network.


To protect the back-end servers from unwanted traffic, you can implement dual-homed servers. This strategy employs two network interface cards (NICs) in a Web server: one that speaks to the front end and one that speaks to the back end. This approach improves your Web server's network performance by preventing collisions between front-end and back-end packets.
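To make the dual-homed split concrete, here is a minimal Python sketch of an internal-only service bound to the back-end NIC's address. The address and port are hypothetical; the point is that binding to the back-end interface, rather than to all interfaces, keeps the service invisible to the front end.

    import socket

    BACK_END_IP = "192.168.64.10"  # hypothetical address of the back-end NIC

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Binding to the back-end address instead of "0.0.0.0" means the service
    # never listens on the front-end interface at all.
    server.bind((BACK_END_IP, 9000))
    server.listen(5)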

NOTE

If you choose to dual-home your Windows 2000 servers, you must contend with a particularly nasty problem known as dead gateway detection. Your server must detect whether a client across the Net has ended communications even though the request has not been fulfilled. This problem commonly occurs when a user clicks the Stop button on a Web browser in the middle of a download and goes somewhere else. If errors occur, Windows 2000 will eventually stop responding. The solution to this problem in Windows is an advanced networking topic and beyond the scope of this book. You can find information on this subject at the Microsoft Web site at www.microsoft.com/. The concept in general is covered in RFC-816. The full text of this RFC is available on many public sites throughout the Internet.


In a dual-homed configuration, depending on which type of load balancing you are using, you can use private, nonroutable IP addresses to address machines on the back-end server farm (see Figure 3.2). Using private nonroutables introduces another layer of complexity to your setup but can be a significant security advantage.

Server and Hardware Components

Several considerations regarding server and hardware configurations crop up when you attempt to scale your site. These issues include the number of CPUs per box, the amount of RAM, and the hard drive speed and server configuration in general.

If your server is implemented with one CPU, turning it into a two-CPU system does not double your performance, even if the two processors are identical. Depending on the hardware, your operating system, and your application, you should expect only about a 60 percent performance increase. Adding a third CPU increases the performance even less, and the fourth CPU gives an even smaller boost. This is because each additional CPU consumes operating system resources simply to keep itself in sync with the others. Also, not every operating system or application can effectively and efficiently take advantage of multiple CPUs. Generally, if a two-processor machine is running out of processor resources, you're better off adding a second two-processor machine than adding two processors to your existing machine. To illustrate, see Figure 3.3, which shows performance gains from adding up to four CPUs on one server. Notice that the gains are not linear: each additional CPU yields a smaller improvement than the one before it.

Figure 3.3. Performance gains by adding CPUs to a server are not linear.
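The figures above are empirical, but Amdahl's law offers a simple model of why the curve flattens: only part of the workload can run in parallel, and synchronization overhead eats into the rest. The sketch below uses an assumed 75 percent parallel fraction, chosen to match the 60 percent figure cited above; it is an illustration, not a prediction for any particular hardware.

    def speedup(cpus, parallel_fraction=0.75):
        """Amdahl's law: serial work caps the benefit of extra CPUs."""
        serial = 1.0 - parallel_fraction
        return 1.0 / (serial + parallel_fraction / cpus)

    for n in range(1, 5):
        print(f"{n} CPU(s): {speedup(n):.2f}x")
    # 1 CPU(s): 1.00x
    # 2 CPU(s): 1.60x   <- roughly the 60 percent gain described above
    # 3 CPU(s): 2.00x
    # 4 CPU(s): 2.29x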


You might ask why you would want a two-processor machine at all. Why not use four single-processor machines instead? By an abstract measure of processor utilization, you might be right. But you also must deal with the user experience. Even though you're not using 100 percent of the second processor on the server, you are getting a strong performance boost. With this increase, a page that takes 2 seconds to process on a one-processor box might take just over 1 second on a two-processor box. This can be the difference between a site that feels slow and a site with happy users. Another point in favor of two-processor machines: many server-class machines that support the other advanced hardware features necessary for a robust server also support dual processors as part of their standard feature sets. If you're investing in server-class machines, adding a second processor before adding a second server can be cost effective.

Macromedia has worked with Intel and Microsoft to greatly improve multiple-processor performance on Windows. If you are using Windows 2000 Server, Advanced Server, or Datacenter Server, or Windows Server 2003, you will see a far better performance improvement from additional processors than you would with earlier versions of Windows. If you are developing a new site and you haven't yet chosen a Windows-based operating system, look into Windows Server 2003 for better performance.

Unix environments, on the other hand, are designed to take advantage of multiple processors and use them efficiently; ColdFusion profits from the extra processing power Unix environments provide. To determine how to scale a Unix environment (that is, whether to add processing power or another server), make your best judgment using your performance-testing data. Bear in mind, however, that although adding a few more processors will definitely increase your Unix site's performance, if you have only one Web server and that server goes down, no amount of processors will beat having an additional machine for redundancy.

Linux has become especially popular among ColdFusion MX 7 developers and hosting companies. ColdFusion MX 7 performs extremely well on Red Hat Linux, as well as on some other, unsupported distributions.

RAM is another hardware issue to consider. The bottom line is that RAM is cheap, so put as much RAM in each machine as you can afford. I recommend at least 512 MB. Additional RAM allows for more cached database queries, templates, and memory-resident data. The more RAM you have, the more information you will be able to cache in memory rather than on disk, and the faster your site will run.

Another memory consideration is that Java, and thus ColdFusion MX 7, tends to be memory intensive because of the nature of the Java Virtual Machine. Assigning plenty of memory to your JVM heap can have a dramatic performance impact on systems that you expect to experience high usage. For more on this, read Chapter 4, "Tuning the JVM for Performance."
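As a concrete illustration, the heap settings live in ColdFusion's jvm.config file, typically found under cf_root/runtime/bin in the server configuration. The path and sizes below are examples only, not recommendations; size the heap from your own load testing.

    # Fragment of jvm.config; paths and sizes are illustrative only.
    java.home=C:/CFusionMX7/runtime/jre
    # -Xms sets the initial heap and -Xmx the maximum; setting them to the
    # same value avoids costly heap resizing under load.
    java.args=-server -Xms512m -Xmx512m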

Hard-disk drive speed is an often-overlooked aspect of server performance. Traditionally, SCSI drives have offered better performance and stability than IDE drives and are usually recommended. Recently, however, speeds on IDE and Serial Advanced Technology Attachment (SATA) drives have greatly improved. SCSI, IDE, and SATA drives can all offer good performance, as well as high availability in the case of drive failure, when configured as a Redundant Array of Independent Drives (RAID) on a dedicated drive controller. Most production-level RAID controllers enable you to add RAM to the controller itself. This memory, called the first-in, first-out (FIFO) cache, allows recently accessed data to be stored and served directly from the RAM on the controller. You get a pronounced speed increase from this type of system because that data never has to be sought out and read from the drive.

If you use a RAID controller with a lot of RAM on board, you also should invest in redundant power supplies and a good uninterruptible power supply (UPS). The RAM on the RAID controller is written back to the hard disk only if the system is shut down in an orderly fashion. If your system loses power, all the data in RAM on the controller is lost. If you don't understand why this is bad, imagine that the record of your last 50 orders for your product was in the RAM cache, instead of written to disk, when the power failed. The more RAM you have on the controller, the greater the magnitude of your problem in the event of a power outage.

Many people believe that all servers should make use of RAID, but it often makes more sense to use RAID only on systems that are actually doing substantial data storage and file I/O, such as database servers. Often, a whole application layer is actually designed to run in an application server cluster's RAM; thus, minimal RAID or even mirrored drives might make sense. This decision will in large part be dictated by your application design, architecture, and available budget.

The type of load-balancing technology you use has a big impact on the way you build your system. If you are using load-balancing technology that distributes traffic equally to all servers, you want every system within a tier to be configured identically. (Your Web servers will likely have much different hardware requirements than your database servers.) Most dedicated load-balancing hardware can detect a failed server and stop sending traffic to it. If your system works this way, and you have some extra capacity in your cluster, it's acceptable for each box to be somewhat less reliable, because the others can pick up the slack if one goes down. But if you're using a simple load-balancing approach such as round robin DNS (RRDNS), which can't detect a down server, you need each box to be as reliable as possible, because a single failure means some of your users cannot use your site.
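A minimal sketch of the health check behind that "detect a failed server" behavior follows, assuming a plain HTTP probe against each Web server. The addresses are illustrative, and real load-balancing devices offer far richer checks (TCP probes, content matching, response-time thresholds).

    import http.client

    SERVERS = ["192.168.64.1", "192.168.64.2", "192.168.64.3"]

    def is_alive(host, timeout=2):
        """Probe the default page; treat 5xx, refusal, or timeout as down."""
        try:
            conn = http.client.HTTPConnection(host, 80, timeout=timeout)
            conn.request("GET", "/")
            return conn.getresponse().status < 500
        except OSError:
            return False

    # Only healthy servers stay in the rotation; RRDNS has no such step.
    pool = [host for host in SERVERS if is_alive(host)]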

Because you want your users to have a uniform experience on your site, regardless of which server responds to their requests, keep your system configurations as close to identical as possible. Unfortunately, because of the complexity of today's operating systems and applications, this consistency is harder to accomplish than it sounds. Identical configurations also help to alleviate quality assurance issues for your Web site. If your servers are not identical, your Web site may not function the same way on different servers, which makes managing your Web site unnecessarily complex. If you must have different servers in your configuration, plan to spend extra time performing quality assurance on your Web applications to ensure that they will run as expected on all servers in the cluster.

Considerations for Choosing a Load-Balancing Option

Before deploying your clustered server farm, consider how you want your servers to handle and distribute load, as well as what your budget is. Also take into account how much traffic you expect to handle and how much that traffic will grow. There are a variety of approaches to handling and distributing load, including dedicated load-balancing hardware, load-balancing software, and RRDNS. Software and hardware load-balancing systems employ user-request distribution algorithms, which can distribute user requests to a pre-specified server, to the server with the least load, or by other methods. A round robin configuration passes each user request to the next available server, sometimes regardless of the selected server's current load. Round robin configurations may involve DNS changes, so consult with your network administrator when considering this option.

Round Robin DNS

The RRDNS method of load balancing takes advantage of some capabilities that are the result of the way the Internet's domain name system handles multiple IP addresses with the same domain name. To configure RRDNS, you should be comfortable with making changes to your DNS server.

CAUTION

Be careful when making DNS server changes. Making an incorrect DNS change is roughly equivalent to sending out change-of-address and change-of-phone-number forms to incorrect destinations for every one of your customers and vendors, and having no way to tell the people at the incorrect postal destination or the incorrect phone number to forward the errant mail back to you. If you broadcast incorrect DNS information, you could cut off all traffic to your site for days or weeks.


Simply put, RRDNS centers on the concept of giving your public domain name (www.mycompany.com) more than one IP address. You should give each machine in your cluster two domain names: one for the public domain, and one that lets you address each machine uniquely. See Table 3.1 for some examples.

Table 3.1. Examples of IP Addresses

SERVER   PUBLIC ADDRESS   MACHINE NAME   IP ADDRESS
#1       www              Web1           192.168.64.1
#2       www              Web2           192.168.64.2
#3       www              Web3           192.168.64.3


When a remote domain-name server queries your domain-name server for information about www.mycompany.com (because a user has requested a Web page and needs the address of your server), your DNS server returns one of the multiple IP addresses you've listed for www.mycompany.com. The remote DNS server then uses that IP address until its DNS cache expires, at which point it queries your DNS server again, possibly getting a different IP address. Each sequential request from a remote DNS server receives a different IP address in response.
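As a sketch, the multiple-address setup from Table 3.1 might look like the following fragment of a BIND-style zone file; the names and TTL values are examples only. The low TTL on the www records anticipates the cache-expiry tradeoff discussed below.

    ; Illustrative zone fragment for mycompany.com
    www    300    IN  A  192.168.64.1
    www    300    IN  A  192.168.64.2
    www    300    IN  A  192.168.64.3
    web1   3600   IN  A  192.168.64.1
    web2   3600   IN  A  192.168.64.2
    web3   3600   IN  A  192.168.64.3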

Round robin DNS is a crude way to balance load. When a remote DNS gets one of your IP addresses in its cache, it uses that same IP address until the cache expires, no matter how many requests originate from the remote domain and regardless of whether the target IP address is responding. This type of load balancing is extremely vulnerable to what is known as the mega-proxy problem.

Internet service providers (ISPs) manage user connections by caching Web site content and rotating their IP addresses among users by means of proxy servers. This allows an ISP to manage more user connections than it has available IP addresses. A user on your e-commerce site may be in the middle of checking out when the ISP rotates its IP addresses; the user's connection to your Web site would be broken, and the shopping cart would be empty. Similarly, an ISP's cached content may point to only one of your Web servers. If that server crashes, any user who tries to access your site from the ISP is still directed to the down IP address. The user's experience will be that your site is down, even though you might have two or three other Web servers ready to respond to the request.

Because DNS caches generally take one to seven days to expire, any DNS change you make to a RRDNS cluster will take a long time to propagate. So in the case of a server crash, removing the down server's IP address from your DNS server doesn't solve the mega-proxy problem because the IP address of the down server is still in the ISP's DNS cache. You can partially correct this problem by setting your DNS record's time to live (TTL) to a very low value, so that remote DNSs are instructed to expire their records of your domain's IP address after a brief period of time. This solution can cause undue load on your DNS, however. Even with low TTL, an IP address that you remove from the RRDNS cluster might still be in the cache of some remote DNS for a week or more.

RRDNS should really only be considered for applications that need not be highly available or that do not require real failover, and are often best for systems with relatively static content. Most system designers do not even consider RRDNS a real solution to load balancing, especially in light of the plentiful software-based load balancing solutions (such as ColdFusion Load Balancing). We have included RRDNS in this discussion for the sake of completeness, as well as to make you aware of all the options available to system developers and designers.

User-Request Distribution Algorithms

Most load-balancing hardware and software devices offer customizable user-request distribution algorithms. Based on a particular algorithm, users will be directed to an available server. Hardware and software load-balancing systems offer a number of sophisticated features besides load balancing, depending on the product and vendor. Work with a knowledgeable resource to pick a product for your specific system.

User-request distribution algorithms can include the following:

  • Users are directed to the server with the least amount of load or CPU utilization.

  • Clustered servers are set up with a priority hierarchy. The available server with the highest priority handles the next user request.

  • Web site objects can be clustered and managed when deployed with J2EE. Objects include Enterprise JavaBeans (EJBs) and servlets.

  • Web server response is used to determine which server handles the user's request. For example, the fastest server in the cluster handles the next request.

The distribution algorithms listed above are not meant to be a complete list, but they do illustrate that many methods are available to choose from. They offer very granular and intelligent control over request distribution in a cluster. Choosing your load-balancing device may depend on deciding among these methods for your preferred cluster configuration.
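To make two of these methods concrete, here is a hedged Python sketch of least-load and priority-based selection. Real devices implement these algorithms in firmware with live load and response-time feeds, so the hard-coded numbers here are illustrative only.

    servers = {
        "web1": {"load": 0.62, "priority": 1},
        "web2": {"load": 0.35, "priority": 2},
        "web3": {"load": 0.48, "priority": 3},
    }

    def least_load(pool):
        """Direct the request to the server reporting the lowest load."""
        return min(pool, key=lambda name: pool[name]["load"])

    def highest_priority(pool, available):
        """Priority hierarchy: pick the available server with the best rank."""
        return min(available, key=lambda name: pool[name]["priority"])

    print(least_load(servers))                          # -> web2
    print(highest_priority(servers, ["web2", "web3"]))  # -> web2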

Session State Management

Another load-balancing consideration is session-aware, or "sticky," load balancing. Session-aware load balancing keeps each user on the same server as long as the user's session is active. This is an effective approach for applications that require a session's state to be maintained while the user's requests are processed. It fails, however, if the server fails: the user's session is effectively lost, and even if the request fails over to an alternate server in the cluster, the user must start a new session, because all information accumulated by the original session no longer exists. Storing session information centrally among all clustered servers helps alleviate this issue. See Chapter 5, "Managing Session State in Clusters," for more information on implementing session state management.
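A minimal sketch of sticky routing follows, assuming the balancer hashes a session cookie to pick a server; the cookie string uses ColdFusion's usual CFID/CFTOKEN pair, but any stable session token works. Note the failure mode described above: the mapping survives across requests, not across a server crash.

    import hashlib

    SERVERS = ["web1", "web2", "web3"]

    def sticky_server(session_cookie):
        """Hash the session token so a session always maps to one server."""
        digest = hashlib.md5(session_cookie.encode()).digest()
        return SERVERS[digest[0] % len(SERVERS)]

    # The same session lands on the same server on every request...
    assert sticky_server("CFID=42;CFTOKEN=987") == sticky_server("CFID=42;CFTOKEN=987")
    # ...but if that server dies, its in-memory session state dies with it,
    # which is why Chapter 5's centralized session storage is more robust.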

Failover

Consider how you want your application to respond when a server in the cluster fails. Consider how you want to deal with loss of state information, and how you might create a state management system that is impervious to loss of a specific Web server or even a database server. An effective strategy will allow seamless failover to an alternate server without the user's knowing that a problem occurred. Utilizing a load-balancing option with centralized session state management can help maintain state for the user while the user's session is transferred to a healthy machine. Understanding the capability of your hardware and networking infrastructure and designing with specific capabilities in mind can make application developers' jobs much easier. It's important to include system administrators, system designers, and networking professionals in the cycle of application and software design.

Failover considerations also come into play with Web site deployment. You can shut down a server that is ready for deployment without having to shut down your entire Web site, enabling you to deploy to each server in your cluster, in turn, while maintaining an active functioning Web site. As each server is brought back into the cluster, another is shut down for deployment.

Mixed Web Application Environments

If your Web site consists of mixed applications and application servers, choosing your load-balancing solution becomes even more difficult. Take an example where your current Web site is being rewritten and transformed from an ASP (Active Server Pages) Web site to a ColdFusion (CFML) Web site. The site is in the middle of this transformation, and ASP pages coexist with CFML pages. Not all load-balancing solutions will be able to handle server load effectively at the application level; some can handle load at the Web-server level only. In addition, session state management may not work as planned. Because ASP sessions and ColdFusion sessions are not shared between the two systems, you may want to implement session-aware load balancing in this "mixed" environment, based on cookies or other variables that both applications can read.


