Ugly Updates

   

Microsoft and Sybase have trouble when a data-change statement causes row movement (what they call an out-of-place update ). All DBMSs have trouble when a data-change statement causes temporary violation of UNIQUE constraints. Ugly updates won't cause errors, but they are markedly slower than "regular" updates. For example, the Big Eight take about 50% longer (on average) to do an out-of-place update than a regular data-change statement.

The marks of an ugly update vary depending on DBMS, and even on DBMS version. We have gathered together a list based on vendor recommendations. Because the list is a composite, not all items on it will apply to any single DBMS. Take it as a weak set of guidelines. [2]

[2] These guidelines do not apply for InterBase, because InterBase stores a record of the difference (a "delta" record) instead of just replacing the original data.

  • A data change should affect only one row if a UNIQUE-indexed column is changed. A well-known problem with UNIQUE-indexed columns can be seen in this statement:

     UPDATE Table1   SET unique_column = unique_column + 1 

    If the current values in unique_column are {1, 2} and the DBMS updates only one row at a time, then after the first row is changed, the values in unique_column will be {2, 2} a violation of the UNIQUE constraint. To avoid this situation, some DBMSs will do the UPDATE in two separate steps: First the DBMS will DELETE all the old values, then it will INSERT all the new values. That's a solid solution but rather slow. And it's unnecessary if the UPDATE statement contains a clause that makes it clear there's only one row to change, as in this example:

     UPDATE Table1   SET unique_column = unique_column + 1   WHERE unique_column = 55 

    Microsoft and Sybase call this the "deferred update" and also apply it for this situation:

     INSERT INTO Table1   SELECT * FROM Table1 
  • A data change should not cause changes to any variable-length columns. Even if the new value is shorter or the same length as the old value, the DBMS will adopt a pessimistic strategy on the assumption that a migration could occur. For example, Microsoft and Sybase will treat the change as an out-of-place update. However, if the total row size ends up shorter after the data change, the effects are less bad.

  • There should be no INSERT/UPDATE/DELETE trigger at all, or at least there should be no trigger that affects the same row that's being changed. The problem is that data-change triggers need "before" and "after" copies of the data, and the DBMS has to produce extra information in the log file so that the "before" and "after"copies contain static information.

  • The changed column should not be in a primary key. That's especially so if the primary key is the cluster key for a clustered index.

  • There should be no replication.

  • There should be only one table in the data-change statement. Watch out for views containing subqueries that refer to other tables.

  • When several columns are changed, they should be contiguous or nearly contiguous. Remember that columns are not always in the same order that you specify in the CREATE TABLE definition, because some DBMSs shift variable-length columns to the end of the row.

  • The WHERE clause of an UPDATE statement should not contain a reference to a column that's changed in the SET clause. That is, avoid statements like:

     UPDATE Table1   SET indexed_column = indexed_column + 1   WHERE indexed_column >= 0 

    Such statements can confuse a DBMS that retrieves and processes rows in one pass, instead of getting the entire set of matching rows before processing the first one. Consider what would happen with this example if all rows contain the value in indexed_column .

  • The SET clause of an UPDATE statement should not change columns that have just been used in a SELECT statement. SELECT statements are pooled in a global area (for example, in Oracle's SGA) so that repetitions of the same SELECT can bypass the optimizerthe DBMS just passes back the same result set. The updating of critical columns invalidates this plan.

In closing, let's look at a "tip" we've seen that is a really bad idea. Consider this SQL statement:

 UPDATE Table1   SET column1 = column1   WHERE column2 = 5 

The trick in this statement is that you can get a count of the rows WHERE column2 = 5 because data-change statements always return a count of the number of rows affected. But it's smarterand quickerto use good old COUNT(*).

The Bottom Line: Ugly Updates

Put relatedand frequently donedata-change statements into triggers and/or stored procedures.

A data change should affect only one row if a UNIQUE-indexed column is changed.

A data change should not cause changes to any variable-length columns.

A data change should not affect a primary key.

When you're doing a data change, there should be no INSERT/UPDATE/DELETE trigger at all, or at least no trigger should affect the same row that's being updated.

There should be no replication.

Only one table should be referenced in a data-change statement.

When several columns are updated, they should be contiguous or nearly contiguous.

The WHERE clause of an UPDATE statement should not contain a reference to a column that's changed in the SET clause.

An UPDATE SET clause should not change columns that have just been used in a SELECT statement.

   


SQL Performance Tuning
SQL Performance Tuning
ISBN: 0201791692
EAN: 2147483647
Year: 2005
Pages: 125

Similar book on Amazon

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net