WebSphere and Domino Performance Issues | IBM(R) WebSphere(R) and Lotus: Implementing Collaborative Solutions

In this chapter, we discuss performance issues for Domino, then for WebSphere, and finally for a combined WebSphere/Domino system.

NOTE

Throughout the chapter, we include examples of "real world" performance issues for both Domino and WebSphere.

Performance Aspects of Domino on Multiprocessor Servers

What are some of the performance aspects of using Domino on multiprocessor servers? Are there advantages to running Domino on a 4-way server with 200MHz processors rather than a single processor server with an 800MHz processor? What about a single 1,000MHz processor vs. four 200MHz processors? With Domino R5, it's possible to run multiple MAIL.BOX files, multiple indexers, multiple cluster replicators, etc. This section gives numerous recommendations on optimizing Domino for multiprocessor servers. Often the Lotus guideline for the number of concurrent tasks, such as indexers, is based on the number of processors in your server (e.g., use multiple indexers up to the number of processors in your server). So, perhaps that's a significant advantage of a multiprocessor server. IBM Global Services has implemented Domino systems on many types of multiprocessor servers and has looked into these issues. A second important aspect of multiprocessor Domino servers is partitioning. Under R4 of Domino, partitioning a multiprocessor mail server usually allowed administrators to register many more users without sacrificing performance. With R5 of Domino, the performance and capacity improvement using partitions has changed to the extent that it is often better not to use partitions for Domino mail servers.

Introduction

To boost Domino R5's performance, you may have considered running it on multiprocessor servers. But doing so can prompt a lot of questions. Should you run Domino on a four-way server with 200MHz processors or a one-way 800MHz processor server? Or what about a single 1,000MHz processor versus four 200MHz processors? With Domino R5, you can run multiple tasks, such as multiple Mail.BOX files, Update tasks, and cluster replicators to improve your server's performance. But what are your best choices?

Another important consideration is whether to partition multiprocessor Domino servers. With Domino R4, partitioning a multiprocessor mail server typically let administrators register many more users without sacrificing performance. But with Domino R5, performance and capacity have improved to the extent that it's often better to avoid using partitions for Domino mail servers. IBM Global Services has implemented Domino systems on many types of multiprocessor servers and has examined these issues. In this section, we share the lessons we've learned and provide recommendations for the use of Domino on multiprocessor servers.

Overview of Multiprocessor Servers

Many large Domino servers (including UNIX, NT, and the AS/400) have multiple processors that use Symmetric Multiprocessing (SMP), a computer architecture in which tasks are distributed among two or more processors. Other computer architectures, such as Massively Parallel Processing (MPP), are also used for multiprocessor servers, specifically large servers that may have hundreds of processors (e.g., IBM's chess-playing supercomputer "Deep Blue").

A few years ago, high-performance multiprocessor machines carried a price tag of $100,000 or more. The multiprocessor market consisted of proprietary architectures that commanded a higher price because they lacked economies of scale. In 1995, Intel helped to change this scenario by bringing high-performance computing to the mainstream with its Pentium Pro processor. The Pentium Pro was a high-performance processor with built-in support for multiprocessing. This support, coupled with the low cost of components, enabled computer manufacturers to build high-performance multiprocessor machines inexpensively. Today, a four-processor machine costs less than $12,000.

SMP is the primary parallel architecture employed in these inexpensive machines. SMP architecture is a tightly coupled multiprocessor system where processors share one copy of the operating system (OS) and resources, which often include a common bus, memory, and an I/O system. However, the machine is handicapped unless it has an OS that can take advantage of its multiprocessors.

In the past, manufacturers of multiprocessor machines were responsible for the design and implementation of a machine's OS. Frequently, machines could operate only under the OS that the manufacturer provided. However, typical multiprocessor Domino servers are built from common PC components and have open architectures. This makes possible an environment where any software developer can design and implement an OS for these machines.

Two mainstream examples of such operating systems are UNIX and Microsoft's Windows NT/2000. These operating systems exploit the power of multiple processors by incorporating multitasking and multithreading architectures. Although the OS implementations of these architectures are different, the results for Domino servers are similar. Windows NT/2000 and UNIX/Linux are two of the most popular Domino server platforms. The AS/400 (iSeries), another popular Domino server platform, also uses SMP architecture and can contain up to 24 processors.

Lotus Guidelines

Lotus's guidelines for the number of allowable concurrent tasks, such as Update (a default task that continually checks and updates folders and views), are often based on the number of processors in your server. For example, you can run as many Update tasks (also called indexers or view-indexers) as your server has processors. You can also run one fewer Update task than that.

Updating is a CPU-intensive process, so you might presume that it's useful to leave one processor free to handle other tasks while multiple indexers employ the other processors. As we'll see, Domino on most multiprocessor systems does work that way: a single task runs on one processor. Because the Domino Update task is threaded, it can also handle several pieces of work concurrently on a server without multiple CPUs. In addition, tests conducted at Lotus's Enterprise Center of Competence (ECoC) indicate that you can run R5 with a single partition (on a four-way server) and achieve the same (or slightly better) capacity and performance as a multipartition configuration.

With Domino R4 on early versions of multiprocessor NT hardware, many technical people believed that a mail server with more than two processors was a waste because the server would effectively use only two of the processors.

We've captured several graphs using the NT Performance Monitor on a customer's production mail server (Figure 14-1). The Perfmon screen captures showed an even distribution of CPU utilization over the four processors. When we captured the Perfmon screen, the average overall CPU utilization was a moderate 26 percent for the mail server. Of course, the same even distribution of processing is also true for Domino R5. But does that even distribution hold true when one CPU-intensive Domino server task (e.g., updaters, replicators, or cluster replicators that are sized based on the number of processors in the server) is running? The next few sections discuss our findings for these tasks.

Figure 14-1. Perfmon display of CPU utilization for four-processor R4.6 mail server.

Multiple Update Tasks

Update tasks read a request from the queue, remove it from the queue, and then perform the indexing functions on the view. A single Update task works on one database that's pulled from the queue. If a second request enters the queue, the next Update task removes the request from the queue and starts working on it. If both requests are for the same database, the two tasks will work in tandem on that database. But more likely, the two tasks will work on different databases.

Multiple Update tasks can update different view indexes simultaneously within the same database. However, they cannot update the same view or run on the same full-text index (because it's one index) at once.

Multiple Update tasks are helpful from a response-time standpoint when they work on different databases because two Update tasks, for example, can empty a queue twice as fast. Additionally, multiple Update tasks that continually update views reduce the time users wait for an indexer to complete before a view is displayed. At IBM server farms, we use three Update tasks on each four-way NT server running Domino to improve performance for users accessing large databases.
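The "twice as fast" claim can be illustrated with a toy model. The sketch below is not how Domino schedules work internally; it simply assumes each Update task takes the next request from the queue as soon as it is free, and the job durations are hypothetical.

```python
import heapq


def drain_time(job_minutes, update_tasks):
    """Return the minutes needed to finish every queued request.

    Toy model (an assumption, not Domino internals): each Update task
    takes the next request from the queue as soon as it becomes free.
    """
    if update_tasks < 1:
        raise ValueError("need at least one Update task")
    # Each heap entry is the time at which one Update task becomes free.
    free_at = [0.0] * update_tasks
    heapq.heapify(free_at)
    finish = 0.0
    for minutes in job_minutes:
        start = heapq.heappop(free_at)  # earliest available Update task
        end = start + minutes
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


# Six hypothetical indexing requests of five minutes each.
queue = [5, 5, 5, 5, 5, 5]
print(drain_time(queue, 1))  # 30.0
print(drain_time(queue, 2))  # 15.0
print(drain_time(queue, 3))  # 10.0
```

The speedup is this clean only because the requests are equal in length and target different databases; two tasks that need the same view cannot work in parallel.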

Beginning with R4, you can run multiple Update tasks on a server. To enable multiple Update tasks, you must perform one of the two following steps:

Add the parameter Updaters=n to the Notes.INI file, where n is the number of Update tasks to run on the server.
Manually load the Update task multiple times at the server console by typing LOAD UPDATE.
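As a rough illustration of the guideline discussed earlier (as many Update tasks as processors, or one fewer), the following sketch derives a candidate Updaters line from a processor count. The function names are hypothetical helpers, and the result is a starting point to test, not a tuned value.

```python
def updaters_range(processors):
    """Return (low, high) Update task counts suggested by the guideline:
    one fewer than the processor count, up to the processor count."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return max(1, processors - 1), processors


def updaters_ini_line(count):
    """Format the Notes.INI setting that loads `count` Update tasks."""
    return f"Updaters={count}"


low, high = updaters_range(4)   # a four-way server
print(low, high)                # 3 4
print(updaters_ini_line(low))   # Updaters=3
```

Choosing the low end of the range matches the practice of leaving one processor free for other server tasks.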

Multiple Update Tasks Test Results

We ran our tests at IBM's server farms using a test system that duplicated a customer's production system. The test system used a four-processor Netfinity 7000 server running Domino R5.0.7 server code. Here's part of the test scenario we used:

Test 1 (using one Update task)

Obtain a copy of the large workflow database (about 750MB) from the production system that must be indexed before you can open views.
Obtain a copy of the second production database that must be indexed before you can open views.
Have one Update task available.
Start the capture of CPU utilization for each processor.
From the Domino server command line, type UPDATE to update the second database.
From a user workstation, open a view on the workflow database (that must be indexed) and time how long it takes to open.
Save CPU utilization results for each of the four processors.

Test 2 (using one Update task with the same test steps as Test 1 but without the Domino server UPDATE command)

Restore the two databases to their original state (i.e., copy the two databases from saved copies outside of Domino\Data).
Start the capture of CPU utilization for each processor.
From a user workstation, open a view on the workflow database (that must be indexed) and time how long it takes to open.
Save CPU utilization results for each of the four processors.

After we ran these two tests, we tried the same tests again with two Update tasks defined. Our test results indicate that multiple Update tasks can have a significant impact on the consistency of performance when many users employ numerous large databases. Our tests with NT servers showed that the processing for each Update task uses only one processor in a multiprocessor server. So the concept of running three Update tasks on a four-way NT server and leaving one processor free for other tasks is valid. For some of our test scenarios, running three Update tasks proved optimal for our large workflow application.

Multiple Mail.BOX Files

Support for multiple Mail.BOX databases, new in R5, can improve performance for Domino mail systems. IBM recommends configuring multiple Mail.BOX databases on both dedicated R5 SMTP servers (that handle SMTP/MTA function for remaining R4 servers) and R5 mail servers to solve bottleneck problems. We usually recommend that SMTP/MTA and mail servers have three or four Mail.BOX databases because these servers are typically four-way NT platforms.

Before support was available for multiple Mail.BOX databases, a corrupted Mail.BOX could prevent mail flow on a server. But by moving to multiple Mail.BOX databases, you can enable mail flow to continue even if one Mail.BOX database is corrupted. Although the use of multiple Mail.BOX databases is a handy feature for R5 Domino mail servers, you still must determine how many Mail.BOX databases to define. Lotus usually recommends two Mail.BOX databases and suggests that more is a waste. But is that true?

To test the multiple Mail.BOX feature, we first set up a multiprocessor server with one Mail.BOX and ran a test script to drive the mail system to measure delays and monitor CPU and I/O utilization. Next, we set up a multiprocessor server with two Mail.BOX databases and repeated the prior procedure. Finally, we set up a multiprocessor server with three Mail.BOX databases and repeated the procedure used in the previous two test scenarios. According to our results with test scripts that drive heavy mail flow, defining three or four Mail.BOX databases can be optimal for some mail systems.

Multiple Scheduled Replicators and Cluster Replicators

Domino supports both multiple scheduled replicators and multiple cluster replicators. Of course, you must determine how many of these replicators to run to optimize performance on your Domino system. But what criteria can you use to determine the optimal number of replicators to use?

Scheduled Replication. You can load two instances of the scheduled replicator task per processor (but not more than eight). Each replicator can process one server at a time and queue up to five requests. To specify that a server loads multiple replicators whenever it starts, add the Replicators=n variable to the Notes.INI file, where n is the number of replicators the server should run. Each replicator occupies about 3MB of RAM when idle, so adding replicators even when they're not in use carries a memory cost. Replication in R5 is much more efficient than in R3 and somewhat more efficient than in R4. Domino 6 provides significant performance improvements in both scheduled replication and cluster replication via data compression and data streaming techniques.
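The sizing rules in this paragraph reduce to simple arithmetic. The sketch below applies them as stated (two replicators per processor, capped at eight, at about 3MB each when idle); the function names are hypothetical, and the 3MB figure is the approximation quoted in the text.

```python
MAX_REPLICATORS = 8          # upper limit stated in the text
IDLE_MB_PER_REPLICATOR = 3   # approximate idle footprint from the text


def max_scheduled_replicators(processors):
    """Two replicators per processor, but never more than eight."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return min(2 * processors, MAX_REPLICATORS)


def idle_memory_mb(replicators):
    """Approximate RAM held by idle replicators, in MB."""
    return replicators * IDLE_MB_PER_REPLICATOR


for cpus in (1, 2, 4, 8):
    n = max_scheduled_replicators(cpus)
    print(f"{cpus}-way: up to Replicators={n}, about {idle_memory_mb(n)}MB idle")
```

The upper limit is rarely the right setting. Two replicators on a four-way server, for example, cost only about 6MB while idle.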

Cluster Replication. The Cluster Replicator task synchronizes database replicas within the cluster and ensures that clients see up-to-date information. When you make changes to a database in a cluster, the Cluster Replicator pushes the change to all other replicas in the cluster. Therefore, the Cluster Replicator is event-driven rather than schedule-driven, as in traditional replication. One Cluster Replicator is automatically run for each server in the cluster. However, you can run more than one Cluster Replicator to improve performance. Multiple cluster replicators work in tandem, sharing the replication workload. To enable multiple cluster replicators, append an additional CLREPL task to the ServerTasks setting in the Notes.INI file or type LOAD CLREPL at the server console.
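Editing the ServerTasks line by hand is error-prone, so here is a minimal sketch of appending one more CLREPL entry to Notes.INI text. The helper name and the sample task list are hypothetical; the sketch assumes tasks are comma-separated on a single ServerTasks line and leaves every other line untouched.

```python
def add_server_task(ini_text, task="CLREPL"):
    """Return ini_text with `task` appended to the ServerTasks line."""
    result = []
    for line in ini_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "servertasks":
            tasks = [t.strip() for t in value.split(",") if t.strip()]
            tasks.append(task)  # a repeated entry loads another instance
            line = "ServerTasks=" + ",".join(tasks)
        result.append(line)
    return "\n".join(result)


# Hypothetical fragment of a Notes.INI file.
sample = "ServerTasks=Router,Replica,Update,CLREPL\nUpdaters=3"
print(add_server_task(sample))
```

Run against a copy of the file first, because a change to ServerTasks takes effect only when the server restarts.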

Our test results indicated that implementing both multiple scheduled replicators and multiple cluster replicators in a production system can improve performance. We usually run two scheduled replicators and two cluster replicators for Domino servers at our server farms.

Monitoring and Performance Analysis

All multiprocessor Domino server platforms, such as NT, UNIX, and the AS/400, have tools to monitor each processor's performance.

The NT server includes the Performance Monitor and Task Manager, which let you monitor your system's performance. The Performance Monitor has more capabilities than the Task Manager does. It can monitor how efficiently your computer is running, determine whether any components are causing system bottlenecks, plan for future growth, and set alert conditions.

You use Task Manager, a new utility in NT 4.0, to display an overview of system CPU and memory usage. Task Manager also provides monitoring capabilities for the Performance Monitor's more commonly tracked options. Task Manager is easy to access and requires no configuration. You can use it immediately to display graphs of CPU utilization for each of your Domino server's processors. With Task Manager, you can quickly determine the impact on each processor when a Domino task, such as an indexer or replicator, kicks off.

You use the UNIX command vmstat to monitor CPU utilization on each processor. For example, vmstat 3 5 gives you a running status every three seconds, five times. Using vmstat also provides information about paging, disks, and memory use. Other UNIX commands are:

sar: provides data similar to vmstat's for UNIX System V environments.
ps: provides a list of current processes and their use of resources.
iostat: provides status and usage of I/O resources (e.g., disks).
lsps -a: provides statistics about the use of paging space.
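Output from a run such as vmstat 3 5 can be summarized with a short script. The sketch below averages CPU busy time (100 minus the idle column) across the samples. The sample text is invented for illustration, and its column layout is an assumption modeled on a common Linux vmstat format; other UNIX variants print different columns, which is why the code locates the "id" column by name.

```python
def average_cpu_busy(vmstat_output):
    """Average CPU busy percentage (100 - idle) over vmstat sample lines."""
    header = None
    busy = []
    for line in vmstat_output.splitlines():
        fields = line.split()
        if "us" in fields and "id" in fields:
            header = fields  # the column-name row
            continue
        if header and len(fields) == len(header) and fields[0].isdigit():
            idle = float(fields[header.index("id")])
            busy.append(100.0 - idle)
    if not busy:
        raise ValueError("no vmstat sample lines found")
    return sum(busy) / len(busy)


# Invented sample with two readings; not captured from a real server.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 801234  10240 204800    0    0     5     3  110  220 20  6 74  0  0
 2  0      0 800100  10240 204900    0    0     0     8  130  250 24  6 70  0  0
"""
print(average_cpu_busy(SAMPLE))  # 28.0
```

Capturing one run before and one after a Domino task such as an indexer starts gives a simple comparison of its CPU impact.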

Using Operations Navigator (OpsNav) on the AS/400 and iSeries lets you monitor and manage your server. OpsNav also provides Management Central functions, which you can use to observe a wide range of performance metrics, including CPU utilization, disk arm utilization, memory usage, and LAN utilization. Management Central's realtime monitoring capability lets you drill down to tasks and resources that may be causing or experiencing high-resource utilization and determine the contributing factors. It also provides data archiving, replay, and exporting capabilities for studying performance trends.

For more comprehensive performance monitoring of the server, the IBM Performance Tools for iSeries includes data collection and reporting capabilities for all system resources and tasks, along with an advisor function that analyzes data and provides appropriate configuration and tuning recommendations.

Using Domino Partitions on Multiprocessor Servers

Now let's look at examples of using Domino partitions on different platforms such as NT, UNIX, and AS/400 servers. These examples reflect the experiences of IBM and Lotus as well as R5 test results.

At its test lab in Cambridge, Massachusetts, Lotus tested various aspects of performance with and without partitions on four-processor UNIX servers (RS/6000) and four-processor NT servers (Compaq Proliant 6500) running Domino. Lotus conducted the tests using Domino R4.5 and R5.0.2 server code.

The first test system consisted of an RS/6000 SP 332MHz four-way SMP thin node with 3GB of RAM, two 4.5GB SCSI disks for rootvg and paging, and forty-eight 9.1GB SSA disks for Notes data and logs. The server contained one partition with 3,500 registered users and two other partitions with 1,750 registered users each.

As Figure 14-2 shows, the probe response time for both partition scenarios is flat to 2,000 users but increases at 2,500 users. Notice that, at 1,500 users, CPU utilization for one partition is 80 percent and for two partitions is 95 percent. As you can see, using one partition works as well as using two partitions. Domino R5's capacity was 2,000 or more users, and for R4.5, it was about 650 users. Each server showed a fast response time up to capacity.

Figure 14-2. Probe response time with CPU utilization for 1 partition and 2 partitions.


The second test system consisted of a Compaq Proliant 6500 four-processor NT 4.0 machine with 1 GB of RAM and 90 GB of hard drive. Lotus conducted the test on a large customer application to determine the performance of R4 against R5. The results indicate that Domino R5 showed some performance improvement and had a much higher capacity than R4, especially when R5 ran on SMP systems.

IBM's guidelines for partitioning Domino servers have changed from R4 to R5. For Domino R4.5 mail servers, IBM recommends using one partition for every two processors on UNIX or NT platforms. This arrangement allows more mail users per physical box. With R5, IBM usually doesn't partition two-processor UNIX or four-processor NT platforms. IBM's tests show that partitioning doesn't improve the number of mail users allowed per physical box. But at IBM server farms, we partition the OS/400 subsystem architecture in AS/400 and iSeries Domino boxes to isolate each partition from the others (see "Partitioning on the AS/400" later in this chapter).

Other Considerations

With Domino R5, a four-way 200MHz server performs similarly to a one-way 800MHz server. But because SMP architecture carries some overhead, a single-processor server should always perform somewhat better than a multiprocessor server with the same total MHz. For servers greater than four-way, performance depends on the SMP architecture. Of course, availability may be a factor if losing a CPU is a concern (it usually isn't). One advantage of using a four-way NT server running Domino is that you get roughly four times the processing power of a single-processor server with the same processor speed. Conversely, if you need to replace your four-way 200MHz Domino server with an equivalent server, a one-way 800MHz server will do the job.

Another aspect of SMP processing is how it affects response time. Figure 14-3 shows the results of response time tests conducted in an IBM lab using three iSeries servers: a two-way 255MHz 170-2409, a one-way 450MHz 270-2423, and a one-way 540MHz 270-2423. These tests implemented a Web shopping application employed by 100 Web shopping users. As Figure 14-3 shows, using two 255MHz processors won't provide a faster response time than using one 450MHz processor will. The reason is that every SMP architecture requires some overhead, although the amount of overhead varies by architecture (see "NotesBench Performance of Domino on Multiprocessor Servers," for more information about various SMP architecture overhead).

Figure 14-3. Response time for single and multiprocessor servers.


The tests were made in an IBM lab using a Web shopping application involving 100 Web shopping users and three iSeries servers: a two-way 255MHz 170-2409, a 450MHz 270-2423, and a 540MHz 270-2423.

You'll also want to consider limiting the risk of having all your mail users on one large Domino server without partitions should that server fail. For example, suppose your company has 4,000 Domino mail users, and you want to limit the number of those affected by the failure of one Domino mail server to 1,000 users. To accomplish this, you'd want to partition a large four-way Domino server into four partitions rather than run a large Domino server with no partitions. This logic assumes that if one partition fails, it doesn't bring down the other partitions.
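The arithmetic behind this example is a ceiling division. The sketch below uses a hypothetical helper name and assumes, as the text does, that a failure is confined to a single partition.

```python
import math


def partitions_needed(total_users, max_users_affected):
    """Smallest number of partitions so that one failed partition
    affects at most max_users_affected users."""
    if total_users < 0 or max_users_affected < 1:
        raise ValueError("invalid user counts")
    return math.ceil(total_users / max_users_affected)


print(partitions_needed(4000, 1000))  # 4
print(partitions_needed(4500, 1000))  # 5
```

The second call shows why the result rounds up: 4,500 users cannot fit into four partitions of 1,000 users or fewer.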

A final point concerns Lotus's recommendation. Using a guideline based on the number of processors a server has to determine the maximum number of multiple tasks to run does make sense. However, the guideline should also be based on the overall power of your server, not only on the number of processors it has. We could argue that for most relatively new servers, a four-way server is typically about four times as powerful as a one-way server. But since processor power continues to increase rapidly (according to Moore's law, it doubles every 18 months) and because many slightly older servers are still in use, that argument is open to a lot of exceptions.

What's Next for Domino on Multiprocessor Servers?

Since its start with Notes R4.0, Domino on SMP servers has matured into a technologically solid and reliable product. Some server SMP architectures (e.g., iSeries) have little SMP penalty for Domino servers greater than four-way. In fact, NotesBench data indicates little penalty up to a 24-way iSeries Domino server. Multiprocessor UNIX Domino servers are also viable. This is not the case, however, for Intel-based Domino servers greater than four-way. But because Intel-based platforms are steadily becoming more powerful, eight-way (and higher) Domino servers based on Intel platforms will likely approach the effectiveness of current iSeries and UNIX servers.

Notes/Domino 6 (ND6) has many performance improvements related to the topics I've discussed in this chapter. With bigger, more powerful Domino servers to come, the need will be even greater for Domino network performance improvements. ND6 provides significant reductions in network bandwidth requirements via data compression and data streaming techniques. Lotus tests indicate that ND6 could decrease network traffic by up to 50 percent. This reduction will improve performance of all Domino's network aspects, including scheduled replication and cluster replication, and will make Domino clustered replication across the WAN even more feasible, which is critical to Domino's continued success on the more powerful Domino servers of the future.

NotesBench Performance of Domino on Multiprocessor Servers

One way to determine whether Domino on a four-way server with 200MHz processors is equivalent to Domino on a one-way 800MHz processor server is to look at NotesBench performance numbers. Because Symmetric Multiprocessing (SMP) architecture contains some overhead, a single 800MHz processor server will generally perform better than a four-way server with 200MHz processors. Also, not all SMP architectures exhibit the same efficiency.

Table 14-1 contains information from the NotesBench Web site (http://www.notesbench.org) that shows how Domino servers with various SMP architectures differ. Notice that the last column displays the number of Notes users per MHz, which is useful for comparing the efficiency of SMP systems. The number of users per MHz for the iSeries doesn't seem to vary much for the 24-way, eight-way, two-way, and one-way systems. Although the pSeries section contains only two entries for UNIX, these systems typically have efficient SMP architectures for numerous processors (more than four). The Intel boxes, however, differ significantly in the efficiency of Domino on multiprocessor servers. The single-processor Intel box supports twice as many users per MHz as the eight-way server does. As a colleague mentioned, this is one reason we don't see Domino solutions being rolled out on eight-way and larger SMP NT/2000 servers.

Table 14-1. NotesBench Results for Multiprocessor Servers

Server	Benchmark (Users)	Benchmark CPU Utilization	Number of Processors	Speed of Processors (MHz)	Total MHz Used for Benchmark	Benchmark Users per MHz
iSeries
iSeries 400 Model 840	100500	99.3%	24	600	14,299	7.03
iSeries 400 Model 840	75000	93%	24	500	11,160	6.72
iSeries internal Benchmark for 830 8way**	22900	70%	8	540	3,024	7.57
iSeries internal Benchmark for 820 4way**	11810	70%	4	600	1,680	7.03
iSeries internal Benchmark for 820 2way**	6660	70%	2	600	840	7.86
iSeries internal Benchmark for 820 1way**	3110	70%	1	600	420	7.4
Intel
Compaq ProLiant 8500	14000	96%	8	550	4,224	3.3
Compaq ProLiant 8500	10000	95%	4	550	2,090	4.78
IBM Netfinity 5600	8600	89%	2	933	1,661	5.18
Compaq ProLiant DL360	5800	94%	1	933	877	6.61
pSeries
IBM RS/6000 Enterprise Server S80	57600	78%	24	450	8,424	6.84
IBM RS/6000 Enterprise Server M80	28032	99%	8	500	3,960	7.08
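The last two columns of Table 14-1 appear to be derived from the others: total MHz used is the number of processors times processor speed times CPU utilization, and users per MHz is the benchmark user count divided by that total. The sketch below reproduces three rows under that assumption; the function names are hypothetical.

```python
def mhz_used(processors, mhz, cpu_utilization):
    """Total MHz consumed during the benchmark run."""
    return processors * mhz * cpu_utilization


def users_per_mhz(users, processors, mhz, cpu_utilization):
    """Benchmark users supported per MHz actually used."""
    return users / mhz_used(processors, mhz, cpu_utilization)


# (server, users, CPU utilization, processors, MHz) taken from Table 14-1.
rows = [
    ("iSeries 400 Model 840", 100500, 0.993, 24, 600),
    ("Compaq ProLiant 8500", 14000, 0.96, 8, 550),
    ("Compaq ProLiant DL360", 5800, 0.94, 1, 933),
]
for name, users, util, cpus, mhz in rows:
    print(f"{name}: {users_per_mhz(users, cpus, mhz, util):.2f} users per MHz")
```

The eight-way ProLiant comes out at about 3.31 users per MHz against about 6.61 for the single-processor DL360, which is the two-to-one gap noted in the text.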

Extrapolating from Table 14-1, it's clear that a one-way 800MHz processor Intel box should make a significantly more powerful Domino server than a four-way server with 200MHz processors will. But for iSeries and UNIX servers, it appears that a one-way 800MHz processor is only slightly more powerful than a four-way box with 200MHz processors. The single-processor server should always be somewhat more powerful than a multiprocessor SMP platform with the same total MHz, but the amount of extra power depends on the SMP architecture used.

Partitioning on the AS/400

IBM's commercial server farm runs a 12-way AS/400 with 2.5 terabytes of hard drive for a large commercial customer. Domino R5 runs on the AS/400 and contains 12 partitions: 10 are mail partitions with 1,600 users each, and two are application partitions, with the 2.5 terabytes of hard drive shared by all partitions in the pool.

The customer wanted to limit the number of users to about 1,500 per partition. Although Domino works well with fewer large partitions, increasing the number of partitions (and, hence, the number of Domino servers) can reduce the risk of impact should one of the Domino servers fail.

The administration cost of using 12 partitions on Domino is as follows:

Administration cost is not reduced for monitoring.
Hard drive pooling allows significant administration savings because you don't have to balance a load based on hard drive use.
You upgrade or apply fixes to Domino only once for all 12 partitions.
You upgrade or apply fixes to the operating system only once for all 12 partitions.

Through its experiences at its commercial server farms, IBM also realized these aspects of partitioning:

The AS/400 proved robust with partitions; it allows Domino to automatically restart and lets programmers with NT skills work with the AS/400 administration interface.
NT experience shows that if one Domino partition fails, it takes down the whole server. This rarely happens with the AS/400 (iSeries) because its subsystem architecture isolates each subsystem, and in this case, it isolates one partitioned Domino server from the others.
Windows 2000 proved more robust than NT.
Unlike the AS/400, you can't pool hard drive space on NT or UNIX.