It is facile to think of a server as a machine with a DBMS program running on it. Technically, that statement is true, but we really must stretch the definition of "a machine" and the definition of "a program" before it becomes true. All large DBMSs are oligolithic (from the Greek for "a few stones"). At any moment there are several processes or threads running simultaneously. (In this chapter, we'll say "process/thread" to mean both processes and threads.) There is a difference between a process and a thread.
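The difference is easy to demonstrate with a host language's concurrency primitives. Here is a minimal Python sketch (not DBMS code, and it assumes a Unix-like system where `fork` is available): a thread's write to a shared structure is visible to its parent, while a forked process's write stays in the child's private copy of the address space.

```python
import multiprocessing
import threading

def _increment(shared):
    shared["value"] += 1  # writes to whatever copy of the dict this code sees

def run_in_thread(shared):
    # A thread shares the caller's address space: it mutates the same dict.
    t = threading.Thread(target=_increment, args=(shared,))
    t.start()
    t.join()

def run_in_process(shared):
    # A forked process gets its own copy of the address space,
    # so the child's write never reaches the parent's dict.
    ctx = multiprocessing.get_context("fork")  # assumes a Unix-like system
    p = ctx.Process(target=_increment, args=(shared,))
    p.start()
    p.join()

if __name__ == "__main__":
    counter = {"value": 0}
    run_in_thread(counter)
    print(counter["value"])   # 1 -- the thread's write is visible
    run_in_process(counter)
    print(counter["value"])   # still 1 -- the process's write was lost
```

This is also why a process-per-client server needs explicit shared-memory machinery for anything the clients have in common, while a thread-per-client server gets sharing, and the attendant risk of interference, for free.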
The result is that a process tends to be large (a few megabytes) while a thread is smaller (a few hundred kilobytes). On the other hand, processes tend to be safer because they are less able to interfere with one another. Different DBMS vendors have different preferences. For example, IBM, Informix, and Oracle favor processes, while Microsoft and Sybase favor threads (or even fibers, which are very thin threads on MS Windows NT systems). The larger DBMSs can run most or all of the following types of processes or threads.

System Processes/Threads
Per-Client Processes/Threads

The temporary threads come into existence during execution of a single SQL statement. The threads run in parallel, and each thread works on a completely separate area of memory and disk. (If it were otherwise, threads would have to synchronize with each other constantly, and there would be no point in assigning parallel threads.) Teradata's list of possible threads, which is longer than most DBMSs mention, is as follows:
Multi-thread architecture involves some tremendously important performance considerations.
Remember that there are two ways to measure performance: response time (the time between arrival and departure of a particular statement) and throughput (the number of statements that the system can handle in a unit of time). Multi-threading is less helpful for throughput than for response time. This becomes obvious if the system is near saturation. For example, suppose one user does an INSERT. Because different parts of the statement are parallelized, the INSERT will be done more quickly, and response time improves. But now suppose that two users do a similar INSERT simultaneously. The statements will block one another with locking, will both try to get time-slices on the same CPU with their multifarious threads, and will both have to queue for the system processes like the Scheduler or the Commit Writer; thus throughput does not improve.
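A toy queueing model makes the arithmetic concrete. This Python sketch assumes an ideal linear speedup and a purely CPU-bound workload (both are assumptions for illustration, not measurements): parallel threads cut one statement's response time, but once every core is busy, throughput is fixed by the total CPU work per statement, not by the degree of threading.

```python
CORES = 4        # assumed number of CPUs on the server
WORK = 8.0       # assumed core-seconds of CPU work per statement

def response_time_alone(degree):
    # One statement on an otherwise idle system: ideal linear speedup
    # across its parallel threads, capped at the number of cores.
    return WORK / min(degree, CORES)

def saturated_throughput(degree):
    # Statements arriving faster than they finish: the CPUs are the
    # bottleneck, so throughput is cores/work whatever the threading.
    return CORES / WORK

print(response_time_alone(1))   # 8.0 seconds, serial
print(response_time_alone(4))   # 2.0 seconds with 4 parallel threads
print(saturated_throughput(4))  # 0.5 statements/second, same as degree 1
```

The model deliberately ignores locking and scheduler queueing, which, as described above, make the saturated case even worse for multi-threading, not better.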
Bottom line: Organize so that multi-threading is possible, but don't hope it will make things better on an overloaded system. The most important thing is to minimize synchronization. That entails separating parts of statements or parts of databases, so that different threads don't need to access the same thing or to communicate with one another. And if your DBMS is a process-per-client rather than a thread-per-client system, it behooves you to disconnect when you're done so the server won't waste space on you.

Separateness and Parallelism

We said earlier that there can be no parallel threading if there is no separateness. Separateness usually means partitioning. A partition is a part of a database that consists of its own data and indexes (and often, its own configuration files and transaction logs). Tables can be located in one or more database partitions; that is, a partitioned table can have some of its rows stored in one partition and other rows stored in other partitions. Because data is divided across partitions, you can use the power of multiple processors to execute SQL statements. The DBMS will automatically decompose SELECT and data-change statements into subrequests and execute the subrequests in parallel among the applicable partitions. The fact that a table is split across multiple partitions is transparent to users issuing SQL statements. Executing the components of a task (e.g., a SELECT statement) in parallel can enhance performance dramatically. The main types of parallelism are I/O and Query.
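The decompose-and-combine idea can be sketched in a few lines of Python (illustrative only; the table, the predicate, and the hash-partitioning scheme are all invented for the example): rows are hashed into partitions, each subrequest scans only its own partition with no shared state, and the subresults are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor

def hash_partition(rows, n_partitions):
    # Distribute rows across partitions by hashing the partitioning key.
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[hash(row["id"]) % n_partitions].append(row)
    return parts

def count_where(partition, predicate):
    # One subrequest: it scans its own partition only -- no synchronization.
    return sum(1 for row in partition if predicate(row))

def parallel_count(rows, predicate, n_partitions=4):
    parts = hash_partition(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        subcounts = pool.map(lambda part: count_where(part, predicate), parts)
    return sum(subcounts)  # combine step: add up the partial counts

# The moral equivalent of SELECT COUNT(*) FROM Sales WHERE amount > 100:
sales = [{"id": i, "amount": i * 10} for i in range(100)]
print(parallel_count(sales, lambda row: row["amount"] > 100))  # 89
```

Note that the caller never sees the partitions: `parallel_count` returns the same answer a serial scan would, which is the transparency property described above.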
For an example of how to ensure you're getting the most out of your DBMS's parallelism abilities, consider how it's done with IBM.
The Bottom Line: Server Processes and Threads

Processes have their own memory; a process tends to be large (a few megabytes). Threads share the same address space; a thread tends to be small (a few hundred kilobytes). Processes tend to be safer than threads, because they are less able to interfere with one another.

There can be no parallel threading if there is no separateness. That means it's important to partition your data and follow normalization rules. There can be no huge benefit from parallel threading if there is only one CPU. There is no way to predict response time based on the number of rows or the number of steps in the statement process.

There are two ways to measure performance: response time (the time between arrival and departure of a particular statement) and throughput (the number of statements that the system can do in a unit of time). Multi-threading is less helpful for throughput than for response time.

Organize your work so that multi-threading is possible, but don't hope it will make things better on an overloaded system. The most important thing is to minimize synchronization. Separate parts of statements or parts of databases, so that different threads don't need to access the same thing or communicate with each other. If your DBMS is a process-per-client rather than a thread-per-client system, disconnect when you're done so the server won't waste space on you.