How Many CPUs?


The first major hardware question is: How many CPUs? This relatively simple question actually has two parts. First, what is the expected maximum number of concurrent queries? This is a much more involved question than simply counting the reporting users who might access the data warehouse. The suggested procedure is to multiply the maximum number of concurrent users by the maximum number of concurrent reports per user that the end-user business intelligence tool permits. Plus, remember that, as noted earlier, typical business intelligence tools often submit dozens of queries per report execution, some of which may be submitted in parallel. So the actual number of Oracle connections submitting queries could be as high as:

Concurrent Users * Concurrent Reports per User * Concurrent Queries Spawned per Report Execution

Let's assume that we'll have 25 concurrent users and that each user may submit up to 10 concurrent report execution requests. Furthermore, let's assume that although our business intelligence tool typically generates about five SQL query statements per report, it nonetheless submits no more than four SQL query statements concurrently. This yields 1000 potential concurrent Oracle query processes!

Second, what degree of parallelism will the average query utilize (if any)? Oracle's Parallel Query Option (PQO) permits Oracle to spawn (i.e., fork) multiple processes to handle a given query. It is a true "divide and conquer" technique to improve query response time.

Returning to our simple example, let's further assume that the DBA has the warehouse tables set up for parallel query with a degree of four. This takes our total potential concurrent Oracle query process count to 4000! That's potentially 4000 concurrent query processes to support just 25 business intelligence users.
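The arithmetic above can be sketched in a few lines of Python. A minimal sketch, using the chapter's example figures (25 users, 10 reports, 4 queries, degree 4); substitute your own environment's numbers:

```python
# Example figures from the text; adjust for your own environment.
concurrent_users = 25      # BI users connected at once
reports_per_user = 10      # concurrent report executions per user
queries_per_report = 4     # concurrent SQL statements per report execution
pqo_degree = 4             # Oracle parallel query degree on the tables

# Potential concurrent Oracle query connections (before parallelism)
query_connections = concurrent_users * reports_per_user * queries_per_report

# Each query may spawn up to pqo_degree parallel slave processes
query_processes = query_connections * pqo_degree

print(query_connections)  # 1000
print(query_processes)    # 4000
```

The point of the multiplication is that each factor compounds: doubling any one of the four inputs doubles the total process count.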

I did not define this example to sound so outrageous. But it does hopefully make it clear that the Oracle parallel degree setting requires much more thought than just looking at how many CPUs there are on the data warehouse's server. I've seen several data warehousing projects with 32 to 64 CPUs and a parallel degree setting equal to the CPU count, and these systems are performing like dogs. Once I ask them how many concurrent users they have and inquire about their business intelligence tool's process architecture, I find that either or both the concurrent user count and concurrent query process count are greater than the number of CPUs. We then generally correct their performance problem by merely reducing the parallel degree, sometimes all the way back down to one (i.e., serial). I see this problem all the time, especially on smaller servers with just 4 to 16 CPUs.

So the logical question at this point is: Is there some way to quantify this need? If we assume that the effective throughput per CPU is roughly eight times that of the disk subsystem (i.e., one CPU can generally saturate eight cached disks) and that we must have at least four CPUs in a data warehouse, we arrive at the following equation:

No. of CPUs = Max (Concurrent Users * Avg. Concurrent Queries per User * PQO Degree/8, 4)

The results are shown in Figure 3-1.

Figure 3-1. Example Parallel Query CPU Usage

So what does this graph show us? Well, even for parallel degree one (i.e., serial query processing), we need to double our CPU count at 64 concurrent queries and double it yet again at 128 concurrent queries. Remembering our earlier comments about business intelligence users, their concurrent reports, and concurrent queries per report, 128 queries could well be just one business intelligence user!

Hopefully, the results in Figure 3-1 make it painfully clear that scalable, multi-processor hardware is truly required. Most data warehouses will have lots of concurrent users and queries, plus utilize PQO, so a hardware platform that can scale up to 32 processors is a genuine minimum requirement, and the ability to scale to 64 or more CPUs is a definite plus. An overview of available parallel architectures is shown in Table 3-1.

Table 3-1. Common Parallel Hardware Architectures

Architecture     CPU Limit   Shared Memory   Shared Disks   OS Instances   PQO         OPS
SMP              32          Yes             Yes            1              Preferred   Unnecessary
SMP + Crossbar   106         Yes             Yes            1              Preferred   Unnecessary
NUMA             64          Yes             Yes            1              Preferred   Unnecessary
Cluster          1024        No              Yes            No. of Nodes   Preferred   Preferred
MPP              4096        No              No             No. of CPUs    Required    Required

The first architecture to consider is SMP, shown in Figure 3-2. SMP is essentially a uniprocessor architecture with multiple CPUs, all sharing memory and disk. The advantages of SMP include:

  • Proven, reliable technology (has been around for years)

  • Wide vendor selection (most offer SMP servers)

  • Easy to set up, manage, and upgrade (single OS)

  • Excellent performance within scalability range

  • No special programming methods or tools required

Figure 3-2. Typical SMP Hardware Architecture

The primary disadvantage is that SMP servers are generally limited to 32 processors. This limitation eliminates SMP servers as a solution for highly concurrent and parallel data warehouse implementations. The reason is that SMP servers saturate the system bus as the number of CPUs increases: the SMP server must maintain inter-CPU cache consistency plus perform all memory, disk, and peripheral operations over a single high-speed system bus. Moreover, system bus length-versus-speed limitations come into play.

This does not mean that SMP servers are a bad choice. If you know your needs or budget will not exceed 32 CPUs, SMP is a great choice. SMP is like the perfect work car: common, inexpensive, reliable, economical, and fast, all at the same time.

The second architecture to consider is SMP with a crossbar interconnect topology. In this architecture, every board is directly connected to every other board, so interconnect requests do not have to share a single bus as in traditional SMP machines. Thus, these boxes can scale to more CPUs while maintaining a uniform memory access time. In essence, such machines are really second-generation SMP machines: you get all the benefits of SMP plus more CPUs. These platforms are a great choice.

The third architecture to consider is the non-uniform memory access (NUMA) architecture shown in Figure 3-3. Here nodes have one or more local processor groups connected via a high-speed interconnect, and each processor group implements a portion of the single memory address space and common disk pool. The key advantages of the NUMA architecture include:

  • Reliability similar to SMP architecture

  • More scalable than SMP architecture

  • Easy to set up, manage, and upgrade (single OS)

  • Performance approaching SMP architecture

  • No special programming methods or tools required

Figure 3-3. Typical NUMA Hardware Architecture

NUMA servers have few disadvantages. There are only a few vendors offering NUMA. The CPU count must be increased in increments of the group size, typically four. Current NUMA machines generally top out at 64 CPUs (i.e., 16 groups of 4). Also note that some vendors provide external interconnect devices to link multiple SMP boxes into a pseudo-NUMA architecture machine, validating that NUMA is basically a better SMP than SMP.

The fourth architecture to consider is clustering uniprocessor, SMP, or NUMA machines. Figure 3-4 demonstrates two clustered SMP machines. A cluster is essentially a loosely coupled network of autonomous nodes, each with its own CPU, memory, and OS, but with specialized hardware to provide a common pool of disks to all nodes in the system. The advantages of clustering include:

  • Proven technology (introduced mid-1980s by DEC with VAX/VMS clusters)

  • Wide vendor selection (many offer clustering for their SMP or NUMA servers)

  • High availability, or HA (eliminates any individual node as a single point of failure)

Figure 3-4. Typical Clustered SMP Architecture

However, there are some drawbacks:

  • Requires OPS to utilize all resources

  • Less reliable due to hardware and software coordination issues

  • Difficult to set up, manage, and upgrade (multiple OS and OPS instances)

  • Poor performance scalability due to multiple operating systems, OPS, and shared disk overheads

  • Long database startups and recoveries as only one node can do instance recovery

Nonetheless, clustering SMP and NUMA machines is quite popular.

A final architecture to consider is MPP, shown in Figure 3-5. MPP is essentially a very tightly coupled network of autonomous nodes, each with its own CPU, memory, disk, and OS, and with software to make all the disks available to every node in the system. The primary advantage is that MPP servers generally scale higher than any other parallel architecture, with configurations as high as 4096 CPUs.

Figure 3-5. Typical MPP Hardware Architecture

But, MPP servers have many disadvantages, including:

  • Require OPS to utilize all resources

  • Less reliable due to hardware and software coordination issues

  • Narrower vendor selection (few offer MPP servers)

  • Hard to set up, manage, and upgrade (multiple OS and OPS instances)

  • Poor performance scalability due to multiple operating systems, OPS, and shared disk overheads

  • Complex database design required to reduce interconnect data traffic by exploiting disk affinity

  • Long database startups and recoveries as only one node can do instance recovery



Oracle DBA Guide to Data Warehousing and Star Schemas
ISBN: 0130325848
Year: 2003
Pages: 79
Authors: Bert Scalzo
