Application and Database Design

The biggest performance gains come from changes to the application and database design. You might change your configuration settings and add heftier hardware and be thrilled when performance doubles, but changes to the application can often result in even larger performance increases. There are as many approaches to software development as there are pages in this book. No single approach is the right approach—yours must be tailored to the size of the project, your team, and the skill level of the team members.

Take a look at the following list of suggestions for planning and implementing good performance in your system. I'll explain most of these items in detail in this chapter and in Chapter 17. (Chapter 4 discusses hardware.)

  • Develop expertise on your development team.
  • Understand that there is no substitute for solid application and database design.
  • State performance requirements for peak, not average, use.
  • Consider perceived response time for interactive systems.
  • Prototype, benchmark, and test throughout the development cycle.
  • Create useful indexes.
  • Choose appropriate hardware.
  • Use cursors judiciously.
  • Use stored procedures almost all of the time.
  • Minimize network round-trips.
  • Understand concurrency and consistency tradeoffs.
  • Analyze and resolve locking (blocking) problems.
  • Analyze and resolve deadlock problems.
  • Monitor and tune queries using SQL Profiler.
  • Monitor the Microsoft Windows 2000 system using System Monitor.
  • Review and adjust operating system settings.
  • Review and adjust SQL Server configuration settings.
  • Make only one change at a time, and measure its effect.
  • Do periodic database maintenance.

Normalize Your Database

I assume that if you're reading this book, you understand the concept of normalization and terms such as third normal form. (If you don't, see Candace Fleming and Barbara von Halle's Handbook of Relational Database Design and Michael Hernandez's Database Design for Mere Mortals. A plethora of other books about database design and normalization are also available.)

A normalized database eliminates functional dependencies in the data so that updating the database is easy and efficient. But querying from that database might require a lot of joins between tables, so common sense comes into play. If the most important and time-critical function your system must perform is fast querying, it often makes sense to back off from a normalized design in favor of one that has some functional dependencies—that is, one that is not in third normal form or higher. Think of normalization as typically being good for updating but potentially bad for querying. Start with a normalized design and then look at all the demands that will be placed on the system.

NOTE


There really isn't a binary concept of being "normalized" or "not normalized." There are only degrees of normalization. It is common to refer to a database that is at least in third normal form as normalized and to refer to a database at a lower level of normalization as unnormalized or denormalized. To keep the discussion simple, I'll use the terms in that way, as imprecise as that might be. Also note that the terms unnormalized and denormalized have slightly different meanings. An unnormalized database is one that has never been normalized. A denormalized database is one that was normalized at some point, but for specific performance-related reasons the design was backed down from the normalized version. My apologies to those who make a living doing entity-relationship diagrams and are horrified by the loose use of these terms.

If you understand the data elements you need to record and you know how to do data modeling, the mechanics of producing a normalized database design are quite straightforward. What might take time is learning how the business actually operates, because the design has to model those underlying processes.

Once you produce a normalized design, which you can also think of as the logical design, you must decide if you can implement the design nearly "as is" or if you need to modify it to fit your performance characteristics. A lot of people have trouble with this. Rather than try to articulate specific performance characteristics, they generalize and strive for "as fast as possible" or "as many users as we can handle." Although goals can be difficult to articulate precisely, you should at least set relative goals. You should understand the tradeoffs between update and query performance, for example. If a salesperson must call up all of a customer's records while the customer is waiting on the phone, that action should be completed within a few seconds. Or if you want to run a bunch of batch processes and reports for your manufacturing operation each night and you have a window of four hours in which to do it, you have a pretty clear objective that must be met.

Evaluate Your Critical Transactions

One thing that you should do immediately is look at your critical transactions—that is, transactions whose performance will make or break the system. (In this context, I use the term transaction loosely, to mean any operation on the database.) Which tables and joins will be required for your critical transactions? Will data access be straightforward or complicated?

For example, if it is imperative that a given query have less than a 2-second response time but your normalized design would require a seven-way join, you should look at what denormalizing would cost. If tables are properly indexed, the query is well qualified, the search parameters are quite selective, and not a lot of data needs to be returned, the quick response might be possible. But you should note any seven-way joins and consider other alternatives. You should probably look for alternatives any time you get beyond a four-way join.

In our example, you might decide to carry a little redundant information in a couple of tables to make it just a three-way join. You'll incur some extra overhead to correctly update the redundant data in multiple places, but if update activity is infrequent or less important and query performance is essential, altering your design is probably worth the cost. Or you might decide that rather than compute a customer's balance by retrieving a large amount of data, you can simply maintain summary values. You can use triggers to update the values incrementally when a customer's records change; the trigger can take the old value and adjust it rather than recomputing the summary from scratch each time. When you need the customer balance, it is available, already computed. You incur extra update overhead for the trigger to keep the value up to date, and you need a small amount of additional storage.
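For instance, a trigger along the following lines can keep a stored balance current as order rows change. This is only a minimal sketch: the customers and orders tables, the balance and amount columns, and the trigger name are all hypothetical, so adapt them to your own schema.

    -- Minimal sketch: keep customers.balance current as orders change.
    -- All table, column, and trigger names here are hypothetical.
    CREATE TRIGGER trg_orders_maintain_balance ON orders
    FOR INSERT, UPDATE, DELETE
    AS
    BEGIN
        -- Apply only the net change from this statement instead of
        -- recomputing each customer's balance from all order rows.
        UPDATE c
        SET c.balance = c.balance
            + ISNULL((SELECT SUM(i.amount) FROM inserted i
                      WHERE i.customer_id = c.customer_id), 0)
            - ISNULL((SELECT SUM(d.amount) FROM deleted d
                      WHERE d.customer_id = c.customer_id), 0)
        FROM customers c
        WHERE c.customer_id IN (SELECT customer_id FROM inserted
                                UNION
                                SELECT customer_id FROM deleted)
    END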

Proper indexes are extremely important for getting the query performance you need. But you face query-vs.-update tradeoffs similar to those described earlier, because indexes speed up retrieval but slow down updating. Chapter 9 explains the extra work required when your updates require index maintenance. You might want to lay out your critical transactions and look for the likely problems early on. If you can keep joins on critical transactions to four tables or fewer and make them simple equijoins on indexed columns, you'll be in good shape.
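As a simple illustration, making sure the joining columns of a critical equijoin are indexed is often the cheapest improvement available. The table, column, and index names below are hypothetical.

    -- Sketch: index the foreign key column used by a critical equijoin.
    CREATE NONCLUSTERED INDEX idx_orders_customer_id
        ON orders (customer_id)

    -- The critical transaction can then resolve the join through the index:
    SELECT c.cust_name, o.order_date, o.amount
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    WHERE c.customer_id = 42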

None of these considerations is new, nor are they specific to SQL Server. Back in the mainframe days, there was a technique known as "completing a CRUD chart." CRUD stands for Create-Retrieve-Update-Delete. In SQL, this translates to ISUD—Insert-Select-Update-Delete. Conceptually, CRUD is pretty simple. You draw a matrix with critical transactions on the vertical axis and tables with their fields on the horizontal axis. The matrix gets very big very quickly, so creating it in Microsoft Excel or in your favorite spreadsheet program can be helpful. For each transaction, you record which fields must be accessed and how, marking each access with any combination of I, S, U, and D, as appropriate. Work at the granularity of individual fields so that you gain insight into which information belongs in each table; this is the information you need if you decide to carry some fields redundantly in other tables to reduce the number of joins required. Of course, some transactions require many tables to be accessed, so be sure to note whether the tables are accessed sequentially or via a join. You should also indicate the frequency and time of day that a transaction runs, its expected performance, and how critical it is that the transaction meet the performance metric.

How far you carry this exercise is up to you. You should at least go far enough to see where the potential hot spots are for your critical transactions. Some people try to think of every transaction in the system, but that's nearly impossible. And what's more, it doesn't matter: only a few critical transactions need special care so they don't lead to problems. For example, if you have to do frequent select operations simultaneously on the tables that are updated most often, you might be concerned about locking conflicts. You need to consider what transaction isolation level to use and whether your query can live with Read Uncommitted and not conflict with the update activity. In addition, you shouldn't worry much about such things as noncritical reports that run only during off-hours.
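If such a query really can tolerate uncommitted data, you can request Read Uncommitted behavior for the whole session or limit it to a single table with a hint. A sketch, using hypothetical table and column names:

    -- Sketch: read without requesting shared locks, accepting dirty reads.
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
    SELECT customer_id, balance
    FROM customers

    -- Or limit the behavior to one table with a hint:
    SELECT customer_id, balance
    FROM customers WITH (NOLOCK)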

If you're doing complex joins or expensive aggregate functions—SUM, AVG, and so on—for common or critical queries, you should explore techniques such as the following, weighing the query performance improvement against the cost to your update processes (a brief sketch of the last two techniques appears after the list):

  • Add logically redundant columns to reduce the number of tables to be joined.
  • Use triggers to maintain aggregate summary data, such as customer balances, the highest value, and so forth. Such aggregates can usually be incrementally computed quickly. The update performance impact can be slight, but the query performance improvement can be dramatic.
  • Define indexes on computed columns to maintain calculated data within a single table.
  • Create indexed views to automatically maintain aggregated data or data from multiple tables that might need to be joined together. Just as with triggers, the update performance impact can be minimal, but the query performance improvement can be dramatic.
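The following is a minimal sketch of the last two techniques. All table, view, column, and index names are hypothetical, and creating these indexes requires the usual session SET options (such as QUOTED_IDENTIFIER and ANSI_NULLS) to be on.

    -- Sketch: index a computed column to keep calculated data in the table.
    ALTER TABLE dbo.orders ADD extended_price AS (quantity * unit_price)
    GO
    CREATE NONCLUSTERED INDEX idx_orders_extprice
        ON dbo.orders (extended_price)
    GO

    -- Sketch: an indexed view that maintains per-customer order totals.
    -- CREATE VIEW must be the only statement in its batch, hence the GO above.
    -- Assumes the amount column is declared NOT NULL, as indexed views
    -- require for aggregated expressions.
    CREATE VIEW dbo.CustomerTotals
    WITH SCHEMABINDING
    AS
    SELECT customer_id,
           SUM(amount)  AS total_amount,
           COUNT_BIG(*) AS order_count   -- required in a grouped indexed view
    FROM dbo.orders
    GROUP BY customer_id
    GO

    -- The unique clustered index materializes the view; SQL Server then keeps
    -- the aggregated data up to date as the base table changes.
    CREATE UNIQUE CLUSTERED INDEX idx_CustomerTotals
        ON dbo.CustomerTotals (customer_id)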

Keep Table Row Lengths and Keys Compact

When you create tables, you must understand the tradeoffs of using variable-length columns. (See Chapter 6.) As a general rule, data with substantial variance in its actual storage length is appropriate for variable-length columns. Also remember that the more compact the row length, the more rows fit on a given page. Hence, a single I/O operation on compact rows retrieves more rows than one on longer rows, and more rows fit into a given amount of data cache memory. In addition, if a variable-length column starts out small and an UPDATE causes it to grow, the larger row can force the page to split, which leads to extra overhead.
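As a quick sketch of the rule, a column whose values are always the same length is usually better off fixed length, while one whose lengths vary widely belongs in a variable-length column. The table and column names below are hypothetical.

    -- Sketch: fixed-length for uniform data, variable-length for variable data.
    CREATE TABLE customer_notes
    (
        note_id     int          NOT NULL,
        state_code  char(2)      NOT NULL,  -- always two characters: fixed length
        note_text   varchar(500) NULL       -- length varies widely: variable length
    )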

As with tables, when you create keys you should try to make the primary key compact, because it frequently occurs as a foreign key in other tables. If no naturally compact primary key exists, you might consider using an identity or uniqueidentifier column as a surrogate. And recall that if the primary key is a composite of multiple columns, the columns are indexed in the order in which they are declared in the key. The order of the columns in the key can greatly affect how selective, and hence how useful, the index is.
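For example, rather than using a wide natural key (say, name plus phone number), a 4-byte identity column keeps both the primary key and every foreign key that references it compact. A sketch with hypothetical names:

    -- Sketch: a compact surrogate primary key instead of a wide natural key.
    CREATE TABLE customers
    (
        customer_id int IDENTITY(1, 1) NOT NULL PRIMARY KEY,  -- 4 bytes
        cust_name   varchar(60) NOT NULL,
        phone       varchar(20) NULL
    )

    -- Every referencing table then repeats only the 4-byte key:
    CREATE TABLE orders
    (
        order_id    int IDENTITY(1, 1) NOT NULL PRIMARY KEY,
        customer_id int NOT NULL REFERENCES customers (customer_id),
        amount      money NOT NULL
    )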

Your clustered key should also be as compact as possible. If your clustered key is also your primary key (which is the default when you declare a PRIMARY KEY constraint), you might already have made it compact for the reason mentioned above. There are additional considerations for your clustered key because SQL Server automatically keeps the clustered key in all nonclustered indexes, along with the corresponding nonclustered key. For example, if your clustered index is on zipcode and you have a nonclustered index on employee_id, every row in the nonclustered index stores the corresponding zipcode value along with the employee_id value. I discussed the structure of indexes in Chapters 3 and 8, and I'll look at the topic again later in this chapter when I discuss how to choose the best indexes.
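To make the zipcode example concrete, here is a hypothetical sketch; every entry in the nonclustered index on employee_id also carries the zipcode value because zipcode is the clustered key.

    -- Sketch: the clustered key (zipcode) travels into every nonclustered index.
    CREATE CLUSTERED INDEX idx_employees_zipcode ON employees (zipcode)
    CREATE NONCLUSTERED INDEX idx_employees_empid ON employees (employee_id)
    -- Each row of idx_employees_empid stores employee_id plus the zipcode
    -- value, so a wide clustered key widens every nonclustered index too.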

Occasionally, a table will have some columns that are infrequently used or modified and some that are very hot. In such cases, it can make sense to break the single table into two tables; you can join them back together later. This is kind of the reverse of denormalization as you commonly think of it. In this case, you do not carry redundant information to reduce the number of tables; instead, you increase the number of tables to more than are logically called for so that the hot columns go into a separate, narrower table. With the more compact row length, you get more rows per page and potentially a higher cache-hit ratio. As with any deviation from the normalized model, however, you should do this only if you have good reason. After you complete a CRUD chart analysis, you might see that although your customer table is frequently accessed, 99 percent of the time this access occurs just to find out a customer's credit balance. You might decide to maintain this balance via a trigger rather than by recomputing it each time the query occurs. Fields such as customer addresses, phone numbers, and e-mail addresses are large and give the table a wide row length, but none of that information is needed for the critical transactions—only the customer balance is. In this case, splitting the table into two might mean the difference between fitting, say, 150 rows on a page and fitting only 2 or 3. A narrower table means a greater likelihood that the customer balance can be read from cache rather than requiring physical I/O.
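A sketch of such a split, using hypothetical names: the hot balance column goes into a narrow table, the wide descriptive columns into another, and a join reassembles the full picture only when it is needed.

    -- Sketch: vertical split so the hot column lives in a narrow table.
    CREATE TABLE customer_balance
    (
        customer_id int   NOT NULL PRIMARY KEY,
        balance     money NOT NULL
    )

    CREATE TABLE customer_detail
    (
        customer_id int          NOT NULL PRIMARY KEY,
        cust_name   varchar(60)  NOT NULL,
        address     varchar(200) NULL,
        phone       varchar(20)  NULL,
        email       varchar(100) NULL
    )

    -- Join the two tables back together only when the full row is needed:
    SELECT b.balance, d.cust_name, d.address
    FROM customer_balance b
    JOIN customer_detail d ON d.customer_id = b.customer_id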


