Lesson 3: Planning Network Capacity | MCSE Training Kit (Exam 70-226): Designing Highly Available Web Solutions with Microsoft Windows 2000 Server Technologies (MCSE Training Kits)

When planning the capacity requirements for your Web site, you must determine the installation’s current and future needs and then choose the hardware and software that meet those needs. The planning process includes many steps, such as determining the site’s purpose, identifying the user base, and finding potential bottlenecks. This lesson describes each of the steps that make up the capacity planning process.

After this lesson, you will be able to

Identify and describe the steps necessary to plan your Web site’s capacity
Plan the capacity requirements for your Web site

Estimated lesson time: 30 minutes

The Planning Process

Planning for your network’s capacity requirements involves a number of steps. This section describes each of them.

Determining the Purpose and Type of Site

When you begin to plan the capacity requirements for your site, you must identify the site’s purpose and what type of site you’ll create. For example, you might be creating a transaction site that allows users to retrieve and store information, typically in a database. A transactional site involves both reliability and security requirements that don’t apply to other types of sites.

In addition to determining the type of site, you must determine whether the site will support some form of dynamic content. Dynamic content takes many forms and can be provided by a wide variety of Internet and database technologies, such as SQL, ASP, ISAPI, or CGI.

At its simplest, dynamic content involves the Web server contacting a database, retrieving data, formatting it, and then sending it to a user’s browser as a Web page. For example, if a user wants to see information on a specific product, the server might contact a SQL Server database to retrieve the product’s description, a photo, price information, and whether or not the product is in stock. The resulting page would display in the user’s browser using conventional HTML as if it were a static Web page, but it would be created on the fly by the server when the user requests it. A site that makes use of dynamic content requires much more processing capability than a static site.

Identifying the User Base

The next step in calculating your Web site’s capacity is to determine how many people typically use the site concurrently. This data usually comes from two main sources: market analysis and systems analysis.

If a site has yet to be built or launched, the site’s owners and operators will probably have commissioned a market analysis report that seeks to predict how much traffic the site can expect to receive at the time of the launch and afterward. Keep in mind that, as with any forecast, these numbers can be inaccurate.

If the site is already up and running, you should analyze your Web server log files to get a picture of how many hits the site receives at any given time as well as any usage trends that would indicate whether parts of the site have become more or less popular over time. When calculating how many users a site currently supports, remember to base your calculations on peak usage, rather than on typical or average usage.

Figure 7.7 shows an example of site usage. The chart illustrates average usage figures for a Web site that receives a great number of hits in the morning, fewer in the afternoon, and fewer still in the evenings. When planning capacity, this site’s operators should use data from the mornings as a baseline.

Figure 7.7 - Web site usage numbers

The site draws about 29,000 hits over six hours on Friday mornings, averaging 1.34 users per second. Taking the total number of weekly hits and averaging them over all seven days yields a much lower figure, 0.54 users per second. Using this lower figure as a baseline for capacity planning can lead to capacity shortfalls during busy periods.
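The arithmetic behind these rates can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson: the function name is made up for the example, and only the figures quoted above (29,000 hits, six hours, 0.54 per second) come from the text.

```python
def hits_per_second(total_hits: float, hours: float) -> float:
    """Average arrival rate over a measurement window, in hits per second."""
    return total_hits / (hours * 3600)


# Friday morning figures quoted in the text: 29,000 hits over six hours.
friday_morning_rate = hits_per_second(29_000, 6)

# Weekly average quoted in the text (the weekly hit total itself is not given).
weekly_average_rate = 0.54

# How far the busy period exceeds the weekly average.
peak_to_average = friday_morning_rate / weekly_average_rate

print(f"Friday morning rate: {friday_morning_rate:.2f} per second")  # about 1.34
print(f"Peak-to-average ratio: {peak_to_average:.1f}")               # roughly 2.5
```

The ratio shows why the weekly average is a poor baseline: on these figures the busy period runs at roughly two and a half times the average rate.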

You can estimate the average number of concurrent users by dividing the number of users per day by the number of average-length sessions that fit into a day. For example, if the site receives 500,000 users per day at an average session time of 11 minutes, you would use the following calculation to find the number of concurrent users:

500,000 ÷ (24 hours × 60 minutes ÷ 11 minutes per session) ≈ 3,819 users

Of course, this doesn’t mean you can expect to see 3,819 users at any random time. Sometimes the traffic will peak at a much higher figure; therefore, one of the most important principles of capacity planning is to use peak activity, rather than average activity, as a baseline. As a rule of thumb, account for usage spikes and peaks by multiplying the average number of concurrent users by 2 (although this multiplier may differ depending on the nature of the site). In this case, 2 × 3,819 yields 7,638, or roughly 7,600 concurrent users. If your site experiences peak traffic that’s more than twice the average traffic, consider this fact when determining where to set the baseline.
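The concurrent-user estimate and the rule-of-thumb peak multiplier can be expressed as a short Python sketch. The function names are illustrative assumptions; the inputs (500,000 users per day, 11-minute sessions, a multiplier of 2) are the ones used in the text.

```python
MINUTES_PER_DAY = 24 * 60


def average_concurrent_users(users_per_day: float, session_minutes: float) -> float:
    """Users per day divided by the number of average-length sessions in a day."""
    sessions_per_day_slot = MINUTES_PER_DAY / session_minutes
    return users_per_day / sessions_per_day_slot


def peak_concurrent_users(average: float, multiplier: float = 2.0) -> float:
    """Rule-of-thumb peak: the average scaled by a site-specific multiplier."""
    return average * multiplier


avg = average_concurrent_users(500_000, 11)
peak = peak_concurrent_users(avg)

print(round(avg))   # 3819
print(round(peak))  # 7639, which the lesson rounds to 7,600
```

Keeping the multiplier as a parameter makes it easy to rerun the estimate for a site whose measured peaks exceed twice the average.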

Determining Hardware Needs

You can determine your site’s hardware requirements by taking the number of anticipated or measured visitors that a site will have during a period of time and comparing it with the hardware’s capacity.

CPU Requirements

Web applications tend to be processor-bound. Contention issues, caused by more than one thread trying to execute the same critical section or access the same resource, can cause frequent expensive context switches and keep the CPU busy even though the throughput is low. It’s also possible to have low CPU utilization with low throughput if most threads are blocked, such as when waiting for the database.

There are two basic ways to get the processing power you need: you can add additional processors to each server, or you can add servers.

Adding processors to an existing server is often less expensive (and less troublesome) than adding servers. But for most applications there comes a point when adding processors doesn’t help. In addition, there is a limit to the number of processors that the operating system can support.

Adding servers allows you to scale linearly to as large a Web farm as you need. (Linear scaling means that two servers handle twice the load of one, three servers handle three times the load, nine servers handle nine times the load, and so on.)

Suppose you’re using a dual-processor 400 MHz Pentium II computer running Windows 2000 Server, and you determine that the CPU capacity is 1,350 users per server. Your site should support 7,600 concurrent users. To determine how many servers you’ll need to handle peak traffic, divide 7,600 by 1,350 and round up, as follows:

7,600 concurrent users ÷ 1,350 users per server = 5.63, rounded up to 6 servers

At times of normal usage, the load on the six servers will be lower, as shown here:

3,800 concurrent users ÷ 6 servers ≈ 633 users per server

This means that the site operates at a little under 50 percent of its capacity (about 47 percent of the 8,100 users that six servers can support) when serving the anticipated number of users. This headroom is very important to sites that might experience usage spikes from time to time.
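The server-count calculation can be sketched as follows. The helper name is an assumption made for the example; the per-server capacity of 1,350 users and the load figures are those given in the text, and the sketch assumes the linear scaling described earlier.

```python
import math


def servers_needed(peak_users: int, users_per_server: int) -> int:
    """Round up, because a fraction of a server cannot be deployed."""
    return math.ceil(peak_users / users_per_server)


USERS_PER_SERVER = 1_350   # measured CPU capacity from the text
PEAK_USERS = 7_600
NORMAL_USERS = 3_800

servers = servers_needed(PEAK_USERS, USERS_PER_SERVER)
normal_load_per_server = NORMAL_USERS / servers
normal_utilization = NORMAL_USERS / (servers * USERS_PER_SERVER)

print(servers)                               # 6
print(round(normal_load_per_server))         # 633
print(f"{normal_utilization:.0%}")           # 47%
```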

Memory Requirements

RAM access (at about 10 ns) is about a million times faster than disk access (about 10 ms), so every time you have to swap a page into memory, you’re slowing down the application. Adding sufficient RAM is the best and least expensive way to get good performance from any system.
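To see why even occasional paging is so costly, consider the following sketch. It uses a deliberately simplified model, which is an assumption of this example rather than something stated in the lesson: every memory access costs either the RAM time or the disk time quoted above, weighted by the fraction of accesses that cause a page to be read from disk.

```python
RAM_ACCESS_SECONDS = 10e-9    # about 10 ns, from the text
DISK_ACCESS_SECONDS = 10e-3   # about 10 ms, from the text


def average_access_seconds(page_fault_fraction: float) -> float:
    """Weighted average access time under the simplified two-cost model."""
    hit_fraction = 1.0 - page_fault_fraction
    return hit_fraction * RAM_ACCESS_SECONDS + page_fault_fraction * DISK_ACCESS_SECONDS


speed_ratio = DISK_ACCESS_SECONDS / RAM_ACCESS_SECONDS

# Hypothetical case: one access in a thousand has to go to disk.
slowdown = average_access_seconds(0.001) / RAM_ACCESS_SECONDS

print(f"Disk is about {speed_ratio:,.0f} times slower than RAM")
print(f"Average access is about {slowdown:,.0f} times slower than pure RAM")
```

Under this model, sending only one access in a thousand to disk makes the average access about a thousand times slower, which is why paging should be rare once the application is running.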

You can make sure your application has enough memory by checking the paging counters (paging should be rare once the application is running) and by checking the working set size, which should be significantly smaller than available RAM in Windows 2000.

Storage Requirements

Network storage solutions are rapidly becoming available as the number and size of enterprise networks increase. Every organization has different priorities for selecting media and methods for data storage. Some are constrained by costs, and others place performance before all other considerations.

As you assess your storage needs, you need to compare the possible loss of data, productivity, and business to the cost of a storage system that provides high performance and availability. Consider the following needs before you develop your storage management strategy:

Technologies that are the most cost-effective for your organization
Adequate storage capacity that can easily grow with your network
The need for rapid, 24-hour access to critical data
A secure environment for data storage

When looking for the most cost-effective solution, you need to balance the costs of purchasing and maintaining hardware and software with the consequences of a disastrous loss of data. Costs can include the following expenses:

Initial investment in hardware, such as tape and disk drives, power supplies, and controllers
Associated media such as magnetic tapes and compact discs
Software, such as storage management tools and a backup tool
Ongoing hardware and software maintenance costs
Staffing
Training in how to use new technologies
Off-site storage facilities

Compare these costs to the following expenses:

Replacement costs for file servers, mail servers, or print servers
Replacement costs for servers running applications such as SQL Server or Systems Management Server (SMS)
Replacement costs for gateway servers running Routing and Remote Access Service (RRAS), SNA Server, Proxy Server, or Novell NetWare
Workstation replacement costs for personnel in various departments
Replacement costs for individual computer components, such as a hard disk or a network card

Another important factor to consider when you select a storage system is speed of data recovery. If you lose the data on a server, how fast can you reinstate that data? How long can you afford to have a server (or an entire network) down before it begins to have a serious impact on your business?

Storage technology changes rapidly, so it’s best to research the relative merits of each type before you make a purchasing decision. The storage system you use needs to have more than enough capacity to back up your most critical data. It should also provide error detection and correction during backup and restore operations.

Database Server and Disk Requirements

The database is a potential bottleneck that can be very difficult to fix. For read/write real-time data you have to have exactly one copy of the data, so increasing database capacity is much trickier. Sometimes the bottlenecks will be in the database server, sometimes they’ll be in the disk array.

If database server capacity becomes an issue, you have a number of options. If CPU capacity is the issue, add additional CPUs. Database applications such as SQL Server make good use of additional processors. If the disk is the bottleneck, use a faster disk array. More RAM can help as well if the database application uses advanced caching techniques.

Another option is to split the database across multiple servers. The first step is to put the catalog database on a server or set of servers. Because the catalog is usually read-only, it’s safe to replicate the data. You can also split off read-mostly data, such as customer information. But if you need multiple copies, replicating the information properly is more difficult.

If your site gets busy enough, however, even the read/write data might have to be segmented. This is relatively simple for most applications; you can segment based on postal code, name, or customer ID. But for some database applications, it takes application programming in the database access layer to make this work, because the layer has to know which server to go to for each record. Applications such as SQL Server, by contrast, support splitting a table across multiple computers with no application programming.
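Where the database access layer does have to route each record to the right server, the core of that logic might look like the following sketch. Everything here is hypothetical: the class name, the server names, and the choice of a simple modulo on the customer ID are illustrative assumptions, not a description of how SQL Server or any particular product segments data.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SegmentRouter:
    """Maps a customer ID to the database server that holds its records."""

    servers: Tuple[str, ...]  # hypothetical server names, one per segment

    def server_for(self, customer_id: int) -> str:
        # The same ID always maps to the same server, so there is still
        # exactly one copy of each read/write record.
        return self.servers[customer_id % len(self.servers)]


router = SegmentRouter(servers=("db-segment-0", "db-segment-1", "db-segment-2"))

print(router.server_for(1042))  # db-segment-1
```

A modulo scheme is easy to reason about but forces data to move when the number of servers changes; segmenting by ranges of postal code or name, as the text suggests, trades that for the need to keep the ranges balanced.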

Determining Network Bandwidth

Once you determine how many users you want to serve during a given time, you have the lower limit for your network connection bandwidth. You need to accommodate both normal load and usage spikes.

Of course, the type of site you operate has a large effect on this issue. For example, if you’re largely or entirely subscriber-based, or if your site is only on an intranet or an intranet/extranet combination, you probably already have a good idea of the maximum spike size. If, on the other hand, you issue software revisions to an audience of unknown size on the Web, there may not be a good way to predict the size of resulting spikes. You might, in fact, have to measure one or more actual occurrences to decide whether your bandwidth is sufficient.

A number of potential bottlenecks can occur in your networking hardware. First, your connection to the Internet might not be fast enough for all the bits you’re sending. If your application becomes very popular, you might need to obtain a higher-speed connection or redundant connections. Redundant connections also increase your reliability and availability. You can reduce your bandwidth requirements to prevent bottlenecks by reducing the amount of data you send, especially graphics, sound, and video. Your firewall can also become a bottleneck if it’s not fast enough to handle the traffic you’re asking it to handle.
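A rough lower limit for the Internet connection can be estimated from the peak request rate and the amount of data sent per request. The following sketch is illustrative only: the function name, the 50,000-byte average response size, and the 50 percent utilization target are assumptions made for the example, and only the rate of 1.34 requests per second comes from the usage figures earlier in this lesson.

```python
def required_bandwidth_bps(
    peak_requests_per_second: float,
    average_response_bytes: float,
    utilization_target: float = 0.5,
) -> float:
    """Link speed needed so that peak traffic fills only part of the link."""
    if not 0 < utilization_target <= 1:
        raise ValueError("utilization_target must be in the range (0, 1]")
    peak_bits_per_second = peak_requests_per_second * average_response_bytes * 8
    return peak_bits_per_second / utilization_target


# Assumed inputs: 1.34 requests/sec at peak, 50,000 bytes per response.
needed = required_bandwidth_bps(1.34, 50_000)
print(f"{needed / 1_000_000:.2f} Mbps")  # about 1.07 Mbps
```

Because the response size appears as a direct multiplier, halving the bytes sent per page (for example, by trimming graphics) halves the bandwidth requirement.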

Note that you can’t run an Ethernet network at anywhere near its theoretical capacity because you’ll create many collisions (two senders trying to transmit at the same time). When a collision happens, both senders must wait a random amount of time before resending. Some collisions are inevitable, but they increase rapidly as your network becomes saturated, leaving you with almost no effective bandwidth.
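The lesson does not name the algorithm behind the random wait. On classic shared (half-duplex) Ethernet it is truncated binary exponential backoff, sketched below; the function names are illustrative, and the 51.2-microsecond slot time applies to 10 Mbps Ethernet.

```python
import random

SLOT_TIME_US = 51.2    # slot time for 10 Mbps Ethernet, in microseconds
MAX_EXPONENT = 10      # the backoff window stops growing after 10 collisions


def max_backoff_us(collision_count: int) -> float:
    """Longest possible wait after the given number of successive collisions."""
    exponent = min(collision_count, MAX_EXPONENT)
    return (2 ** exponent - 1) * SLOT_TIME_US


def backoff_delay_us(collision_count: int, rng=random) -> float:
    """Pick a random whole number of slot times from the current window."""
    exponent = min(collision_count, MAX_EXPONENT)
    slots = rng.randint(0, 2 ** exponent - 1)
    return slots * SLOT_TIME_US


for n in (1, 2, 3, 10):
    print(n, round(max_backoff_us(n), 1))
```

Each successive collision doubles the window from which the wait is drawn, which is one reason effective bandwidth falls off so sharply as a shared segment approaches saturation.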

You can reduce collisions a great deal by using switches rather than hubs to interconnect your network. A switch connects two ports directly rather than broadcasting the traffic to all ports so that multiple pairs of ports can communicate without collisions. Given that the prices of switches have significantly decreased in the last few years, it’s usually a good idea to use a switch rather than a hub.

Defining the Site Topology

Each site has unique capacity requirements that can be affected by various considerations, such as available hardware and budget, available physical space for servers, and the amount of time the site is allowed to be offline. These requirements can have a direct effect on the design and construction of the site’s physical infrastructure.

Figure 7.8 provides a sample of a site topology. Note that the diagram represents only one possible strategy for designing the network topology. Each organization has its own needs and consequently necessitates its own network design.

Figure 7.8 - Sample site topology

When addressing site topology, consider the following questions:

What other server operations (such as backups and content replication) can influence site capacity? Because capacity measurements don’t include these extraneous operations, you should measure the resources that these operations use and add them to the server in addition to the capacity required to handle Web traffic.

For example, some calculations are based on the fact that ASP pages are cached and objects are in memory. When content replication takes place, IIS flushes most of its cache or Windows swaps a lot of the cache to disk, which causes paging and degrades system performance. You should determine how much memory content replication takes and then add it to the amount of memory your calculations predict are necessary to achieve the desired capacity.

A server rarely does only one thing at a time but rather performs a "symphony" of different operations at once. A carefully tuned server environment is like a well-conducted symphony. You can’t let only one operation run and then measure it and expect that measurement to be accurate. Often servers and entire sites have scheduled operations, such as content replication or content precaching, that take place at regular intervals.

How often do you expect usage spikes to occur? The general rule is to plan enough capacity for twice the average number of concurrent users. If you anticipate significant usage spikes that exceed this baseline, plan for surplus CPU, disk, memory, and network capacity. Remember to take growth into consideration, as well as possibly more complex content in the future.
How important is it to be operational 100 percent of the time? How often will servers be offline for maintenance? If 100 percent site availability is critical, plan for system redundancy. Duplicate critical resources and eliminate single points of failure.
One way to do this is to use cluster technologies. You can use NLB to cluster Web servers, and the Cluster service to cluster SQL Server computers. With these technologies in place, you can take some servers offline for upgrades or repairs while the remaining servers continue to run the site.
When will you undertake capacity planning again? What growth do you expect? How will content complexity change? Over time, the average number of concurrent users on a site rises or falls, the content and content complexity change, and the typical user profile changes. These changes can have a big impact on a site’s capacity. Take change and growth into account when doing capacity planning and undertake it regularly or whenever these factors change sufficiently to affect site capacity.
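The availability question above interacts with the server-count arithmetic from earlier in this lesson. The following sketch reuses the figures of 7,600 peak users and 1,350 users per server to check whether the site can still carry peak load while servers are offline for maintenance. The function names are illustrative, and the sketch assumes that load is spread evenly and that capacity scales linearly.

```python
import math


def capacity_with_offline(total_servers: int, users_per_server: int, offline: int) -> int:
    """Users the cluster can support while some servers are out of service."""
    return (total_servers - offline) * users_per_server


def servers_with_spares(peak_users: int, users_per_server: int, offline: int = 1) -> int:
    """Servers needed to carry peak load even with `offline` servers down."""
    return math.ceil(peak_users / users_per_server) + offline


PEAK_USERS = 7_600
USERS_PER_SERVER = 1_350

print(capacity_with_offline(6, USERS_PER_SERVER, offline=1))    # 6750, below the peak
print(servers_with_spares(PEAK_USERS, USERS_PER_SERVER))         # 7
print(capacity_with_offline(7, USERS_PER_SERVER, offline=1))    # 8100
```

On these figures, six servers cover the peak only when all of them are running; a seventh is needed if one server must be removable during peak periods.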

Finding Potential Bottlenecks

Find out what’s likely to break first. Unless your site is extremely small, you’ll need a test lab to discover the bottlenecks. The following steps provide a guideline for determining potential bottlenecks:

Draw a block diagram showing all paths into the site. Include, for example, links to FTP download sites as well as other Uniform Resource Locators (URLs).
Determine what machine hosts each functional component (database, mail, FTP, and so on).
Draw a network model of the site and the connections to its environment. Define the topology throughout. Identify slow links.
For each page, create a user profile that answers the following questions:
- How long does the user stay on the page?
- What data gets passed to (or by) the page?
- How much database activity (or other activity) does the page generate?
- What objects live on each page? How system-friendly are they? (That is, how heavily do they load the system’s resources? If they fail, do they do so without crashing other objects or applications?)
- What is the threading model of each object? (The agile model, in which objects are specified as both-threaded, is typically preferable and is the only appropriate choice for application scope.)
Define which objects are server-side and which are client-side.
Build a lab. You’ll need at least two computers, because if you run all the pieces of WCAT on one computer, your results will be skewed by the tool’s own use of system resources. Monitor the performance counters at 1-second intervals. When ASP service fails, it does so abruptly, and an interval of 10 or 15 seconds is likely to be too long—you’ll miss the crucial action. Relevant counters include CPU utilization, pool nonpaged bytes, connections/sec, and so on.
Throw traffic at each object, or at a simple page that calls the object, until the object or the server fails. Look for the following:
- Memory leaks (steady increase in pool nonpaged bytes and pool paged bytes)
- Stop errors and Dr. Watsons
- Inetinfo failures and failures recorded in the Event Log
Increase the loading until you observe a failure; document both the failure itself and the maximum number of connections per second you achieve before the system tips over and fails.
Go back to the logical block diagram, and under each block fill in the amount of time and resources each object uses. This tells you which object is most likely to limit your site, presuming you know how heavily each one will actually be used by clients. Change the limiting object to make it more efficient if you can, unless it’s seldom used and well off the main path.
Use the Traceroute utility among all points on the network. Clearly, you can’t trace the route throughout the entire Internet; but you can certainly examine a reasonable number of paths between your clients and your servers. If you’re operating only on an intranet, trace the route from your desk to the server. This gives you a ballpark estimate of the routing latencies, which add to the resolution time of each page. Based on this information, you can set your ASP queue and database connection timeouts.
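The step of filling in the block diagram and identifying the limiting object can be sketched as a small calculation. All names and numbers below are hypothetical placeholders for values you would measure in the lab and take from your user profiles; the point is only that cost per call has to be weighted by how often clients actually call the object.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectProfile:
    name: str                   # hypothetical object name
    seconds_per_call: float     # measured in the test lab
    calls_per_page_view: float  # taken from the user profile

    @property
    def load_per_page_view(self) -> float:
        """Server time this object consumes per average page view."""
        return self.seconds_per_call * self.calls_per_page_view


def limiting_object(profiles: List[ObjectProfile]) -> ObjectProfile:
    """The object that consumes the most server time in real usage."""
    return max(profiles, key=lambda p: p.load_per_page_view)


profiles = [
    ObjectProfile("CatalogLookup", seconds_per_call=0.020, calls_per_page_view=1.0),
    ObjectProfile("CartUpdate", seconds_per_call=0.050, calls_per_page_view=0.2),
    ObjectProfile("OrderReport", seconds_per_call=0.400, calls_per_page_view=0.01),
]

print(limiting_object(profiles).name)  # CatalogLookup
```

In this made-up data, OrderReport is by far the slowest object per call, yet it is seldom used and so contributes the least load; the object on the main path is the one worth optimizing first.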

Upgrading Your Web Site

Once you’ve determined how many users per server your site can support, you can consider scaling the site to support more users or to better serve the users you already have.

You can use three basic strategies to upgrade your site:

Increase the number of users per server
Increase the total number of concurrent users the site can support
Decrease the latency of the site for faster response times

To implement these strategies, you can use one or more of the following options:

Optimizing content: Redesign your dynamic content to impose less of a burden on the site architecture. You can do this by writing smarter ASP or by changing the site so the average user (as defined by the user profile) calls heavy ASPs less often.
Improving server performance (scaling up): Add more and faster CPUs and more memory; upgrade to faster software, such as upgrading from Windows NT 4 to Windows 2000; and tune the server by optimizing software configuration.
Adding servers (scaling out): Add more servers to your Web clusters.

Measure the effect of these changes by analyzing your site before and afterward and then comparing the results. This can also help you predict the effects of future changes.

Scalability

Scalability refers to how well a computer system’s performance responds to changes in configuration variables, such as memory size or numbers of processors. This, however, is often difficult to gauge because of the interaction of multiple variables, such as system components of the underlying hardware, characteristics of the network, application design, and the operating system’s architecture and capabilities. Organizations need to have the flexibility to scale server-based systems up or out without compromising the multipurpose and price performance advantages of the operating system platform.

Scaling up is achieved by adding more resources, such as memory, processors, and disk drives, to a system. Hardware scalability relies on one large extensible machine to perform work.

Scaling out is achieved by adding more servers. Scaling out delivers high performance when an application’s throughput requirements exceed an individual system’s capabilities. By distributing resources across multiple systems, you can reduce contention for these resources and improve availability. Clustering and system services, such as reliable transaction message queuing, allow applications to scale out in this manner. Software scalability depends on a cluster of multiple moderately performing machines working in tandem. NLB, in conjunction with the use of clustering, is part of the scaling out approach to upgrading. The greater the number of machines involved in the load balancing scenario, the higher the throughput of the overall server farm.

Scaling out allows you to add capacity by adding more Web servers to an existing cluster or by adding more clusters. Although more expensive than upgrading a server, adding a server often gives a bigger performance increase and confers operational flexibility.

Making a Decision

The process for planning your network’s capacity requirements includes a number of steps. In each step you must make decisions about your site’s operations. Table 7.11 describes many of the considerations you should take into account for each step.

Table 7.11 Capacity Planning

Step	Description
Determining the purpose and type of site	You must decide on the type of site, such as transactional, e-commerce, or information. You must also decide whether the site will have dynamic (as opposed to static) content, and if so, how that content will be delivered. For example, will you be using ASP or ISAPI?
Identifying the user base	You must determine how many people will be using the site concurrently.
Determining hardware needs	You must base your hardware requirements on the number of anticipated concurrent users at peak usage times. To get the processor power you need, you can add additional processors to each server or add more servers. You should also ensure that you have plenty of RAM to avoid having to swap pages into memory. Decisions about disk storage strategies must balance the cost of equipment against the consequences of a loss of data.
Determining network bandwidth	You must base your bandwidth requirements on the number of anticipated concurrent users at peak usage times.
Defining the site topology	When defining the site topology, you must decide what server operations can influence site capacity, how often you expect usage spikes, how important it is to be operational 100 percent of the time, and what kind of growth you expect.
Finding potential bottlenecks	You should test your site to try to find any potential bottlenecks.
Upgrading your Web site	When necessary, you should either scale out or scale up your site. You can also streamline content to improve performance.

Recommendations

When planning your Web site’s capacity requirements, you should adhere to the following guidelines:

Identify your number of concurrent users by performing a market analysis (if the site hasn’t been launched yet) or by analyzing your Web server logs (if the site is already running).
Use peak traffic figures to determine the maximum number of concurrent users.
Base hardware and network bandwidth needs on the peak number of concurrent users. Processor power and RAM must be sufficient to avoid performance degradation at times of peak usage. Storage should be adequate to ensure the performance and availability required for your Web site.
The site topology must take into consideration server operations such as backup and replication, performance during expected usage spikes, availability requirements, and expected future growth.
Test for potential bottlenecks before implementing your site.
Upgrade your site by scaling out or scaling up. Scaling out generally provides a larger increase in performance and greater operational flexibility.

Example: Capacity Planning for Coho Vineyard

Coho Vineyard is implementing a Web site to help market its organization. Before implementing the site, the company tested for potential bottlenecks to try to determine where problems might arise. Figure 7.9 shows the network topology for the Web site.

Figure 7.9 - Coho Vineyard site topology

The Coho Vineyard site includes four identical Web servers that are configured as an NLB cluster. In addition to the network connecting the Web servers to the Internet, the Web servers are also connected through a private local area network (LAN) to the database tier and other specialized servers. These include the queued components server (for credit card authorizations and fulfillment), the box that runs the primary domain controller (PDC), and the Domain Name System (DNS) service. For data services, the site uses a failover cluster of two servers connected to a common RAID disk array. An administrative and monitoring network is connected to all of the computers. This means that the Web servers are connected to three networks; each of these servers is configured with three network adapters.

Each of the Web servers is configured with two 400 MHz processors. When the Coho Vineyard administrators load-tested the site, they discovered processing degradation at peak loads. When a third processor was added to each computer, performance increased by about 30 percent, which was enough to handle anticipated peak usage.
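The result of adding a third processor illustrates the earlier point that scaling up is usually less than linear. A minimal sketch of that comparison follows; the function name and the notion of "scaling efficiency" are introduced here for illustration, while the processor counts and the 30 percent gain come from the example.

```python
def scaling_efficiency(old_cpus: int, new_cpus: int, throughput_gain: float) -> float:
    """Fraction of the added processing resource that shows up as throughput."""
    resource_gain = (new_cpus - old_cpus) / old_cpus
    return throughput_gain / resource_gain


# Coho Vineyard: going from two to three processors gave about 30 percent more.
efficiency = scaling_efficiency(old_cpus=2, new_cpus=3, throughput_gain=0.30)

print(f"{efficiency:.0%}")  # 60%
```

Going from two to three processors is a 50 percent increase in CPU resources, so a 30 percent performance gain means about 60 percent of the added resource was realized. That was enough here, but it shows why adding processors eventually stops helping.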

Each Web server is configured with 256 MB of memory. Throughout the testing and analysis process, memory never appeared to be an issue. Paging was rare, so no configuration changes were made to memory.

During the test phase, the administrators discovered that the database wasn’t working very hard and that the Web servers were sometimes very busy and then sometimes very slow. The problem resulted from using a 100 Mbps hub to connect the Web servers with the database servers. Because all the traffic was going through the hub, it had become swamped, thereby blocking the system from processing transactions quickly. When the administrators replaced the hub with a switch, the bottleneck was removed.

Database server capacity hasn’t been an issue for Coho Vineyard’s site. Only about 25 percent of the data servers’ capacity is being used, even when all four Web servers are running at 100 percent CPU utilization. Disk and memory capacity have also proven to be more than adequate.
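These utilization figures also give a rough idea of how far the Web tier could be scaled out before the data servers become the bottleneck. The sketch below assumes, purely for illustration, that database load grows in direct proportion to the number of fully loaded Web servers; the function name is hypothetical, and real database load rarely scales this cleanly.

```python
def max_web_servers_before_db_limit(current_web_servers: int, db_utilization: float) -> float:
    """Web servers the data tier could feed if its load scaled linearly."""
    if not 0 < db_utilization <= 1:
        raise ValueError("db_utilization must be in the range (0, 1]")
    return current_web_servers / db_utilization


# Coho Vineyard: four Web servers at 100 percent CPU use about 25 percent
# of the data servers' capacity.
limit = max_web_servers_before_db_limit(current_web_servers=4, db_utilization=0.25)

print(limit)  # 16.0
```

Under that assumption the data tier would saturate at around 16 Web servers, which suggests that the Web servers, not the database, are the component to watch as this site grows.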

Lesson Summary

Planning your network’s capacity requirements involves several steps. The first step is to determine the site’s purpose and type. You must know what the site will be used for and what kind of content it will support. You must also identify how many people will be using the site concurrently. Your estimate should take into account peak usage as well as average usage. Once you’ve estimated the number of concurrent users, you can plan your hardware and network bandwidth needs. Both should be based on the number of concurrent users at peak usage. Your system should have enough processing power and memory to meet the demands of peak usage. When assessing your storage requirements, you must weigh the expense of a system that provides high performance and availability against possible loss of data, productivity, and business. Overall, your site topology should take into consideration server operations, expected peak usage, availability requirements, and growth. You should also look for potential bottlenecks in your site to find out what’s likely to break first. When you upgrade your site, you can improve performance and availability by optimizing content, improving server performance, or adding servers.