10.1. Increasing Volumes

Some applications see the volume of data they handle increase considerably over time. In particular, any application that requires keeping several months or even years of mostly inactive data online, for regulatory or business analysis purposes, often passes through phases of crisis when (mostly) batch programs tend to overshoot the time allocated to them and interfere with regular, human activity.

When you start a new project, the volume of data usually changes as shown in Figure 10-1. Initially, hardly anything other than a relatively small amount of reference data is loaded into the database. As a new system replaces an older one, data inherited from the legacy system is painfully loaded into the new one. First, because of the radical rethink of the application, conversion from the old system to the new system is fraught with difficulties. When deadlines have to be met and some noncritical tasks have to be postponed, the recovery of legacy data is a prime candidate for slipping behind schedule. As a result, this recovery goes on for some time after the system has become operational and teething problems have been solved. Second, the old system is usually much poorer from a functional perspective than the new one (otherwise, the cost of the new project would have made for difficult acceptance up the food chain). All this means that the volume of prehistoric data will be rather small compared to the data handled by the new system, and several months' worth of old data will probably be equivalent to a few weeks of new data at most.

Meanwhile, operational data accumulates. Usually, one encounters the first serious performance issues about midway to the volume that the database is expected to hold at cruising speed. Bad queries and bad algorithms are almost invisible, from an end-user perspective, when volumes are low or moderate.
The raw power of hardware often hides gigantic mistakes and may give comfortable sub-second response times for full scans of tables that contain several hundreds of thousands of rows. You may be seriously misusing the hardware, balancing gross programming mistakes with power, but nobody will see that until your volume becomes respectable. At the first crisis point of the project, "expert tuning" is usually required to add a couple of indexes that should have been there from the start. The system then wobbles until it reaches the target volume.

Figure 10-1. The evolution of data in a new application

There are usually two target volumes: a nominal one (which has been grossly overestimated and which is the volume the system has officially been designed to manage) and the real target volume (which the system just manages to handle and which is often exceeded at some point because archiving of older data has been relegated to the very last lot in the project). The second and more serious crisis often comes in the wake of reaching that point. When archival has finally been put into production, architectural weaknesses reviewed, and some critical processes vigorously rewritten, the system finally reaches cruising speed, with an increase of data related to the natural growth of business, a growth that can lie anywhere between flatness and exponential exuberance.

This presentation of the early months in the life of a new database application is partly caricature; but it probably bears more resemblance to reality than it should, because the simple mistakes that lead to this caricature are not often avoided. However rigorously one tries to work, errors are made, because of pressure, lack of time for adequate testing, and ambiguous specifications. The only errors that can bring irredeemable failure are those linked to the design of the database and to the choice of the global architecture, two topics that are closely related and that are the foundation of a system.
If the foundation is not sturdy enough, you need to pull the whole building down before reconstructing it. Other mistakes may require a more or less deep overhaul of what is in place. Most crises, however, need not happen. You must anticipate volume increases when coding, and you must quickly identify and rewrite any query whose performance deteriorates too quickly in the face of increasing data volumes.

10.1.1. Sensitivity of Operations to Volume Increases

Not all SQL operations are equally susceptible to variations in performance when the number of rows to be processed increases. Some SQL operations are insensitive to volume increases, some see performance decrease linearly with volume, and some perform very badly with large volumes of data.

10.1.1.1. Insensitivity to volume increase

Typically, there will be no noticeable difference in a search on the primary key, whether you are looking for one particular key among 1,000 or one among 1,000,000. The common B-tree indexes are rather flat and efficient structures, and the size of the underlying table doesn't matter for a single-row, primary-key search. But insensitivity to volume increase doesn't mean that single primary-key searches are the ultimate SQL search method. When you are looking for a large number of rows, the "transactional" single-row operation can be significantly inefficient. Just consider the following, somewhat artificial, Oracle examples, each operating on the same range of values of a sequence-generated primary key:

    SQL> declare
      2      n_id number;
      3      cursor c is select customer_id
      4                  from orders
      5                  where order_id between 10000 and 20000;
      6  begin
      7    open c;
      8    loop
      9      fetch c into n_id;
     10      exit when c%notfound;
     11    end loop;
     12    close c;
     13  end;
     14  /

    PL/SQL procedure successfully completed.

    Elapsed: 00:00:00.27

    SQL> declare
      2      n_id number;
      3  begin
      4    for i in 10000 .. 20000
      5    loop
      6      select customer_id
      7      into n_id
      8      from orders
      9      where order_id = i;
     10    end loop;
     11  end;
     12  /

    PL/SQL procedure successfully completed.
    Elapsed: 00:00:00.63

The cursor in the first example, which does an explicit range scan, runs twice as fast as the iteration over single rows. Why? There are multiple technical reasons ("soft parsing," a fast acknowledgement at each iteration that the DBMS engine has already met this statement and knows how to execute it, is one of them), but the single most important one is that in the first example the B-tree is descended once, and then the ordered list of keys is scanned and row addresses found and used to access the table; while in the second example, the B-tree is descended anew for each searched value of the order_id column. The most efficient way to process a large number of rows is not to iterate and apply the single-row process.

10.1.1.2. Linear sensitivity to volume increases

End users usually understand well that a query returning twice as many rows will take more time to run; but many SQL operations double in time when operating on double the number of rows without the underlying work being as obvious to the end user as in the case of a full table scan returning rows one after the other. Consider the case of aggregate functions: if you compute a max( ), that aggregation will always return a single row, but the number of rows the DBMS has to operate on may vary wildly over the life of the application. This is perfectly understandable, but end users will always see a single row returned, so they may complain of performance degradation over time. The only way to ensure that the situation will not go from bad to worse is to put an upper bound on the number of rows processed by using another criterion, such as a date range. Placing an upper bound keeps data volumes under control. In the case of max( ), the idea might be to look for the maximum since a given date, and not necessarily since the beginning of time.
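To make the bounded-aggregate idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in DBMS; the table, dates, and cutoff are invented for the illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer primary key, order_date text)")
# Invented data: a thousand orders spread over the months of one year.
conn.executemany("insert into orders values (?, ?)",
                 [(i, "2006-%02d-01" % (i % 12 + 1)) for i in range(1, 1001)])

# Unbounded: the aggregate must consider the whole, ever-growing history.
unbounded = conn.execute("select max(order_date) from orders").fetchone()[0]

# Bounded: a date criterion caps the number of rows the aggregate has to visit,
# whatever the total size of the table becomes.
bounded = conn.execute(
    "select max(order_date) from orders where order_date >= '2006-06-01'"
).fetchone()[0]
print(unbounded, bounded)
```

Both queries return the same single row to the end user; the difference is that the bounded one keeps the volume of data it processes under control as the table grows.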
Adding a criterion to a query is not a simple technical issue and definitely depends on business requirements, but limiting the scope of queries is an option that certainly deserves to be pointed out to, and debated with, the people who draft specifications.

10.1.1.3. Non-linear sensitivity to volume increases

Operations that perform sorts suffer more from volume increases than operations that just perform a scan, because sorts are complex and require on average a little more than a single pass. Sorting 100 randomly ordered rows is not 10 times costlier than sorting 10 rows, but about 20 times costlier, and sorting 1,000 rows is, on average, something like 300 times costlier than sorting 10 rows. In real life, however, rows are rarely randomly stored, even when techniques such as clustering indexes (Chapter 5) are not used. A DBMS can sometimes use sorted indexes to retrieve rows in the expected order instead of sorting them after having fetched them, and the performance degradation resulting from retrieving a larger sorted set, although real, is rarely shocking. Be careful, though: performance degradation from sorts often proceeds by fits and starts, because smaller sorts will be fully executed in memory, while larger sorts will result from the merge of several sorted subsets that have each been processed in memory before being written to temporary storage. There are, therefore, some "dangerous surroundings" where one switches from a relatively fast full-memory mode to a much slower memory-plus-temporary-storage mode. Adjusting the amount of memory allocated to sorts is a frequent and efficient tuning technique to improve sort-heavy operations when flirting with the dangerous limit.

By way of example, Figure 10-2 shows how the fetch rate (number of rows fetched per unit of time) of a number of queries evolves as a table grows.
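Incidentally, the 20-fold and 300-fold figures quoted above are consistent with the classic n·log2(n) cost estimate for comparison sorts on randomly ordered input, which is easy to check:

```python
import math

def sort_cost(n):
    # Comparison sorts perform on the order of n * log2(n) comparisons
    # on randomly ordered input.
    return n * math.log2(n)

# Cost of sorting 100 and 1,000 rows, relative to sorting 10 rows.
print(sort_cost(100) / sort_cost(10))
print(sort_cost(1000) / sort_cost(10))
```

The two ratios come out at 20 and 300, matching the factors given in the text.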
The table used in the test is a very simple orders table, defined as follows:

    order_id         bigint(20)    (primary key)
    customer_id      bigint(20)
    order_date       datetime
    order_shipping   char(1)
    order_comment    varchar(50)

The queries are first a simple primary-key-based search:

    select order_date
    from orders
    where order_id = ?

then a simple sort:

    select customer_id
    from orders
    order by order_date

then a grouping:

    select customer_id, count(*)
    from orders
    group by customer_id
    having count(*) > 3

then the selection of the maximum value in a nonindexed column:

    select max(order_date)
    from orders

and finally, the selection of the "top 5" customers by number of orders:

    select customer_id
    from (select customer_id, count(*)
          from orders
          group by customer_id
          order by 2 desc) as sorted_customers
    limit 5

(SQL Server would replace the closing limit 5 with an opening select top 5, and Oracle would replace limit 5 with where rownum <= 5.)

The number of rows in the table varied between 8,000 and around 1,000,000, while the number of distinct customer_id values remained constant at about 3,000. As you can see in Figure 10-2, the primary key search performs almost as well with one million rows as with 8,000. There seems to be some very slight degradation at the higher number, but the query is so fast that the degradation is hardly noticeable. By contrast, the sort suffers. The performance (measured in rows returned per unit of time, and therefore independent of the actual number of rows fetched) of the sorting query decreases by 40% when the number of rows goes from 8,000 to over one million. The degradation of performance, though, is even more noticeable for the queries that, while returning the very same number of aggregated rows, have a great many more rows to visit to obtain the relatively few rows to be returned. These are typically the type of queries that are going to draw the most complaints from end users.
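For readers who want to replay the "top n" pattern, here is a scaled-down sketch with Python's sqlite3 module (SQLite shares MySQL's limit syntax); the data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_id integer primary key, customer_id integer)")
# Invented data: customer 1 places five orders, customer 2 three, customer 3 one.
conn.executemany("insert into orders values (?, ?)",
                 [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1),
                  (6, 2), (7, 2), (8, 2),
                  (9, 3)])

# "Top 2" customers by number of orders: aggregate, sort, then cut.
top_customers = [r[0] for r in conn.execute("""
    select customer_id
    from (select customer_id, count(*) as order_count
          from orders
          group by customer_id
          order by order_count desc) as sorted_customers
    limit 2""")]
print(top_customers)
```

The query returns the same two rows however many orders accumulate, but the grouping and the sort have to visit every row of the table, which is exactly why this family of queries degrades as volume grows.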
Note that the DBMS doesn't perform that badly: the performance decrease is very close to proportional to the number of rows, even for the two queries that require a sort (the queries labeled "Group by" and "Top" in Figure 10-2). But end users simply see the same amount of data, just returned much more slowly.

Figure 10-2. How some simple queries behave when the queried table grows

Not all database operations are equally sensitive to volume increases. Anticipate how queries will perform on target volumes.

10.1.1.4. Putting it all together

The main difficulty in estimating how a query will behave when data volumes increase is that high sensitivity to volume may be hidden deep inside the query. Typically, a query that finds the "current value" of an item by running a subquery that looks for the last time the price was changed, and then performs a max( ) over the price history, is highly sensitive. If we accumulate a large number of price changes, we shall probably suffer a slow performance degradation of the subquery, and by extension of the outer query as well. The degradation will be much less noticeable with an uncorrelated subquery, executed only once, than with a correlated subquery that compounds the effect by being fired each time it is evaluated. Such degradation may be barely perceptible in a single-item operation, but will be much more so in batch programs.
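A minimal sketch of that "current value" pattern, against a hypothetical price_history table (SQLite syntax via Python's sqlite3; names and values invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table price_history
                (item_id integer, change_date text, price real,
                 primary key (item_id, change_date))""")
conn.executemany("insert into price_history values (?, ?, ?)",
                 [(1, "2006-01-01", 9.90),
                  (1, "2006-03-01", 10.50),
                  (2, "2006-02-01", 4.25)])

# Correlated form: the inner max() is re-evaluated for every candidate row,
# and each evaluation has to inspect the item's full (and growing) history.
current = conn.execute("""
    select p.item_id, p.price
      from price_history p
     where p.change_date = (select max(p2.change_date)
                              from price_history p2
                             where p2.item_id = p.item_id)
     order by p.item_id""").fetchall()
print(current)
```

The result is tiny and stable (one "current" price per item), but the work behind it grows with the history, and in a batch that fires the pattern for every item the degradation compounds.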
Another issue is sorts. We have seen that an increase in the number of rows sorted leads to quite perceptible degradation of performance. Actually, what matters is not so much the number of rows proper as the number of bytes; in other words, the total amount of data to be sorted. This is why joins with what is mostly informational data, such as user-friendly labels associated with an obscure code (as opposed to the data involved in the filtering conditions driving the query), should be postponed to the very last stage of a query.

Let's take a simple example showing why some joins should be delayed until the end of a query. Getting the names and addresses of our 10 biggest customers for the past year will require joining the orders and order_detail tables to get the amount ordered by each customer, and joining to a customers table to get each customer's details. If we only want our 10 biggest customers, we must get everybody who has bought something from us in the past year, sort them by decreasing amount, and then limit the output to the first ten resulting rows. If we join all the information from the start, we will have to sort the names and addresses of all our customers from the past year. We don't need to operate on such a large amount of data. What we must do is keep the amount of data to be sorted to the strict minimum: the customer identifier and the amount. Once everything is sorted, we can join the 10 customer_ids we are left with to the customers table to return all the information that is required.
In other words, we mustn't write something like:

    select *
    from (select c.customer_name,
                 c.customer_address,
                 c.customer_postal_code,
                 c.customer_state,
                 c.customer_country,
                 sum(d.amount)
          from customers c,
               orders o,
               order_detail d
          where c.customer_id = o.customer_id
            and o.order_date >= some date expression
            and o.order_id = d.order_id
          group by c.customer_name,
                   c.customer_address,
                   c.customer_postal_code,
                   c.customer_state,
                   c.customer_country
          order by 6 desc) as A
    limit 10

but rather something like:

    select c.customer_name,
           c.customer_address,
           c.customer_postal_code,
           c.customer_state,
           c.customer_country,
           b.amount
    from (select a.customer_id,
                 a.amount
          from (select o.customer_id,
                       sum(d.amount) as amount
                from orders o,
                     order_detail d
                where o.order_date >= some date expression
                  and o.order_id = d.order_id
                group by o.customer_id
                order by 2 desc) as a
          limit 10) as b,
         customers c
    where c.customer_id = b.customer_id
    order by b.amount desc

The second sort is a safeguard, in case the join modifies the order of the rows resulting from the inner subquery (remember that relational theory knows nothing about sorts, and that the DBMS engine is perfectly entitled to process the join in whatever way the optimizer finds most efficient). We have two sorts instead of one, but the inner sort operates on "narrower" rows, while the outer one operates on only 10 rows. Remember what was said in Chapter 4: we must limit the "thickness" of the non-relational layer of SQL queries. The thickness depends on the number and complexity of operations, but also on the amount of data involved. Since sorts and limits of all kinds are non-relational operations, the optimizer will probably not rewrite a query to execute a join after having cut the number of customer identifiers to the bare minimum. Although an attentive reading of the two queries may make it obvious that they return the same result, mathematically proving that they always return the same result borders on mission impossible.
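On a toy dataset (invented schema and values, SQLite as a stand-in, and one level of nesting collapsed since SQLite lets limit sit next to order by), one can at least check that the two shapes agree:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customers (customer_id integer primary key, customer_name text);
create table orders (order_id integer primary key, customer_id integer);
create table order_detail (order_id integer, amount real);
""")
conn.executemany("insert into customers values (?, ?)",
                 [(i, "customer %d" % i) for i in range(1, 21)])
# Invented data: customer i places i orders of 100 each, so the
# biggest customer_ids are the biggest spenders.
oid = 0
for c in range(1, 21):
    for _ in range(c):
        oid += 1
        conn.execute("insert into orders values (?, ?)", (oid, c))
        conn.execute("insert into order_detail values (?, 100.0)", (oid,))

# "Fat" shape: sorts the wide rows (here just names, but picture addresses too).
fat_sort = [r[0] for r in conn.execute("""
    select c.customer_name, sum(d.amount) as amount
    from customers c, orders o, order_detail d
    where c.customer_id = o.customer_id
      and o.order_id = d.order_id
    group by c.customer_id, c.customer_name
    order by amount desc
    limit 3""")]

# "Narrow" shape: sort only (customer_id, amount), cut to 3, then join.
narrow_sort = [r[0] for r in conn.execute("""
    select c.customer_name, b.amount
    from (select o.customer_id, sum(d.amount) as amount
          from orders o, order_detail d
          where o.order_id = d.order_id
          group by o.customer_id
          order by amount desc
          limit 3) as b,
         customers c
    where c.customer_id = b.customer_id
    order by b.amount desc""")]
print(fat_sort, narrow_sort)
```

Same three customers either way; the difference, invisible at this size, is how many bytes pass through the sort before the limit is applied.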
An optimizer always plays it safe; a DBMS cannot afford to return wrong results by attempting daring rewrites, especially since it knows hardly anything about semantics. Our example is therefore a case in which the optimizer will limit its action to performing the joins inside the inner queries as efficiently as possible. But ordering and aggregates put a stop to the mingling of inner and outer queries, and therefore the query will for the most part run as it is written. The query that performs the sort of amounts before the joins is, no doubt, very ugly. But this ugly SQL code is the way to write it, because it is the way the SQL engine should execute it if we want resilience to a strong increase in the number of customers and orders.

To reduce the sensitivity of your queries to increases in the volume of data, operate only on the data that is strictly necessary at the deeper levels of a query. Keep ancillary joins for the outer level.

10.1.1.5. Disentangling subqueries

As I have said more than once, correlated subqueries must be fired for each row that requires their evaluation. They are often a major issue when volume increases transform a few shots into sustained rounds of fire. In this section, a real-life example will illustrate both how ill-used correlated subqueries can bog a process down and how one can attempt to save such a situation. The issue at hand, in an Oracle context, is a query that belongs to an hourly batch run to update a security management table. Note that this mechanism is already in itself a fudge to speed up security clearance checks on the system in question. Over time, the process takes more and more time, until reaching, on the production server, 15 minutes, which for an hourly process that suspends application availability is a bit too much. The situation sends all bells ringing and all whistles blowing. Red alert!
The slowness of the process has been narrowed down to the following statement:

    insert /*+ append */ into fast_scrty
           (emplid, rowsecclass, access_cd, empl_rcd, name,
            last_name_srch, setid_dept, deptid, name_ac,
            per_status, scrty_ovrd_type)
    select distinct emplid, rowsecclass, access_cd, empl_rcd, name,
           last_name_srch, setid_dept, deptid, name_ac,
           per_status, 'N'
    from pers_search_fast

Statistics are up to date, so we must focus our attack on the query. As it happens, the ill-named pers_search_fast is a view defined by the following query:

     1  select a.emplid,
     2         sec.rowsecclass,
     3         sec.access_cd,
     4         job.empl_rcd,
     5         b.name,
     6         b.last_name_srch,
     7         job.setid_dept,
     8         job.deptid,
     9         b.name_ac,
    10         a.per_status
    11  from person a,
    12       person_name b,
    13       job,
    14       scrty_tbl_dept sec
    15  where a.emplid = b.emplid
    16    and b.emplid = job.emplid
    17    and (job.effdt =
    18           (select max(job2.effdt)
    19            from job job2
    20            where job.emplid = job2.emplid
    21              and job.empl_rcd = job2.empl_rcd
    22              and job2.effdt <= to_date(to_char(sysdate,
    23                        'YYYY-MM-DD'),'YYYY-MM-DD'))
    24    and job.effseq =
    25           (select max(job3.effseq)
    26            from job job3
    27            where job.emplid = job3.emplid
    28              and job.empl_rcd = job3.empl_rcd
    29              and job.effdt = job3.effdt))
    30    and sec.access_cd = 'Y'
    31    and exists
    32           (select 'X'
    33            from treenode tn
    34            where tn.setid = sec.setid
    35              and tn.setid = job.setid_dept
    36              and tn.tree_name = 'DEPT_SECURITY'
    37              and tn.effdt = sec.tree_effdt
    38              and tn.tree_node = job.deptid
    39              and tn.tree_node_num between sec.tree_node_num
    40                                       and sec.tree_node_num_end
    41              and not exists
    42                     (select 'X'
    43                      from scrty_tbl_dept sec2
    44                      where sec.rowsecclass = sec2.rowsecclass
    45                        and sec.setid = sec2.setid
    46                        and sec.tree_node_num <> sec2.tree_node_num
    47                        and tn.tree_node_num between sec2.tree_node_num
    48                                                 and sec2.tree_node_num_end
    49                        and sec2.tree_node_num between sec.tree_node_num
    50                                                   and sec.tree_node_num_end))

This type of "query of death" is, of course, too complicated for us to understand at a glance!
As an exercise, though, it would be interesting for you to pause at this point, consider the query carefully, try to broadly define its characteristics, and try to identify possible performance stumbling blocks. If you are done pondering the query, let's compare notes: there are a number of interesting patterns that you may have noticed.
One of the very first things to do is to try to get an idea of the volumes involved; person_name is a view, but querying it indicates no performance issue. The data dictionary tells us how many rows we have:

    TABLE_NAME                       NUM_ROWS
    ------------------------------ ----------
    TREENODE                           107831
    JOB                                 67660
    PERSON                              13884
    SCRTY_TBL_DEPT                        568

None of these tables is really large; it is interesting to notice that one need not deal with hundreds of millions of rows to perceive a significant degradation of performance as tables grow. The killing factor is how we are visiting the tables. Finding out on the development server (obviously not as fast as the server used in production) how many rows are returned by the view is not very difficult, but requires nerves of steel:

    SQL> select count(*) from PERS_SEARCH_FAST;

      COUNT(*)
    ----------
        264185

    Elapsed: 01:35:36.88

A quick look at the indexes shows that both treenode and job are over-indexed, a common flaw of COTS packages. We do not have here a case of the "obviously missing index." Where must we look to find the reason the query is so slow? We should look mostly at the lethal combination of a reasonably large number of rows and of correlated subqueries. The cascading exists/not exists, in particular, is probably what does us in.
Take a closer look at the exists/not exists expression. The first-level subquery introduces the table treenode. The second-level subquery again hits the table scrty_tbl_dept, already present in the outer query, and compares it both to the current row of the first-level subquery (lines 47 and 48) and to the current row of the outer query (lines 44, 45, 46, 49, and 50)! If we want to get tolerable performance, we absolutely must disentangle these queries.
Dreadful jargon, especially when one has not the slightest idea of what the data is about. Can we try to express the same thing in a more intelligible way, in the hope that it will lead us to more intelligible and efficient SQL? The key point is probably in the "there is no other node" part. If there is no descendent node, then we are at the bottom of the tree for the node identified by the value of tree_node_num in treenode. The subqueries in the initial view text are hopelessly mingled with the outer queries. But we can write a single self-contained query that "forgets" for the time being about the link between treenode and job and computes, for every node of interest in scrty_tbl_dept (a small table, under 600 rows), the number of children that match it on setid and rowsecclass:

    select s1.rowsecclass,
           s1.setid,
           s1.tree_node_num,
           tn.tree_node,
           count(*) - 1 children
    from scrty_tbl_dept s1,
         scrty_tbl_dept s2,
         treenode tn
    where s1.rowsecclass = s2.rowsecclass
      and s1.setid = s2.setid
      and s1.access_cd = 'Y'
      and tn.tree_name = 'DEPT_SECURITY'
      and tn.setid = s1.setid
      and tn.effdt = s1.tree_effdt
      and s2.tree_node_num between s1.tree_node_num
                               and s1.tree_node_num_end
      and tn.tree_node_num between s2.tree_node_num
                               and s2.tree_node_num_end
    group by s1.rowsecclass,
             s1.setid,
             s1.tree_node_num,
             tn.tree_node

(The count(*) - 1 is there so as not to count the current row.) The resulting set will, of course, be small: at most a few hundred rows. We shall filter out nodes that are not leaf nodes (in our context) by using the preceding query as an inline view, and applying a filter:

    and children = 0

From here, and only from here, we can join to job and properly determine the final set. Giving the final text of the view would not be extremely interesting.
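The nested-range logic may be easier to grasp on a toy dataset. The following sketch (Python's sqlite3, entirely invented values) reduces the problem to one security class with two nested clearance ranges, [1, 100] and [10, 20], and two department nodes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table scrty_tbl_dept
   (rowsecclass text, setid text, access_cd text, tree_effdt text,
    tree_node_num integer, tree_node_num_end integer);
create table treenode
   (setid text, tree_name text, effdt text, tree_node text,
    tree_node_num integer);
""")
# Two nested clearance ranges for the same class: [1, 100] and [10, 20].
conn.executemany("insert into scrty_tbl_dept values (?, ?, ?, ?, ?, ?)",
                 [("RC1", "S1", "Y", "2006-01-01", 1, 100),
                  ("RC1", "S1", "Y", "2006-01-01", 10, 20)])
# Node D015 sits inside both ranges; node D050 only inside [1, 100].
conn.executemany("insert into treenode values (?, ?, ?, ?, ?)",
                 [("S1", "DEPT_SECURITY", "2006-01-01", "D015", 15),
                  ("S1", "DEPT_SECURITY", "2006-01-01", "D050", 50)])

leaves = conn.execute("""
    select rowsecclass, setid, tree_node_num, tree_node
    from (select s1.rowsecclass, s1.setid, s1.tree_node_num,
                 tn.tree_node, count(*) - 1 as children
          from scrty_tbl_dept s1, scrty_tbl_dept s2, treenode tn
          where s1.rowsecclass = s2.rowsecclass
            and s1.setid = s2.setid
            and s1.access_cd = 'Y'
            and tn.tree_name = 'DEPT_SECURITY'
            and tn.setid = s1.setid
            and tn.effdt = s1.tree_effdt
            and s2.tree_node_num between s1.tree_node_num
                                     and s1.tree_node_num_end
            and tn.tree_node_num between s2.tree_node_num
                                     and s2.tree_node_num_end
          group by s1.rowsecclass, s1.setid, s1.tree_node_num, tn.tree_node)
    where children = 0
    order by tree_node_num""").fetchall()
print(leaves)
```

For each department node, only the innermost clearance range containing it survives the children = 0 filter: D015 is matched by range [10, 20], D050 by range [1, 100].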
Let's just point out that the first succession of subqueries:

    and (job.effdt =
           (select max(job2.effdt)
            from job job2
            where job.emplid = job2.emplid
              and job.empl_rcd = job2.empl_rcd
              and job2.effdt <= to_date(to_char(sysdate,'YYYY-MM-DD'),
                                        'YYYY-MM-DD'))
      and job.effseq =
           (select max(job3.effseq)
            from job job3
            where job.emplid = job3.emplid
              and job.empl_rcd = job3.empl_rcd
              and job.effdt = job3.effdt))

is meant to find, for the most recent effdt for the current (emplid, empl_rcd) pair, the row with the highest effseq value. This condition is not, particularly in comparison to the other nested subquery, so terrible. Nevertheless, OLAP functions (or should we say analytical functions, since we are in an Oracle context?) can handle this type of "top of the top" case slightly more efficiently, when they are available. A query such as:

    select emplid, empl_rcd, effdt, effseq
    from (select emplid, empl_rcd, effdt, effseq,
                 row_number() over (partition by emplid, empl_rcd
                                    order by effdt desc, effseq desc) rn
          from job
          where effdt <= to_date(to_char(sysdate,'YYYY-MM-DD'),'YYYY-MM-DD'))
    where rn = 1

will easily select the (emplid, empl_rcd) values that we are really interested in, and can easily be reinjected into the main query as an inline view that will be joined to the rest. In real life, after this query was rewritten, the hourly process that had been constantly lengthening fell from 15 minutes to under 2.

Minimize the dependencies of correlated subqueries on elements from outer queries.

10.1.2. Partitioning to the Rescue

When the number of rows to process is on the increase, index searches that work wonders on relatively small volumes become progressively inefficient. A typical primary key search requires the DBMS engine to visit 3 or 4 pages while descending the index, and then the DBMS must visit the table page. A range scan will be rather efficient, especially when applied to a clustering index that constrains the table rows to be stored in the same order as the index keys.
Nevertheless, there is a point at which the constant to-and-fro between index pages and table pages becomes costlier than a plain linear search of the table. Such a linear search can take advantage of parallelism and of the read-ahead facilities made available by the underlying operating system and hardware. Index searches, which rely on key comparisons, are more sequential by nature. Large numbers of rows to inspect exemplify the case when accesses should be thought of in terms of sweeping scans, not isolated incursions, and joins performed through hashes or merges, not loops (all this was discussed in Chapter 6). Table scans are all the more efficient when the ratio of rows that belong to the result set to rows inspected is high. If we can split our table, using the data-driven partitioning introduced in Chapter 5, in such a way that our search criteria operate on a well-defined physical subset of the table, we maximize scan efficiency. In such a context, operations on a large range of values are much more efficient when applied brutishly to a well-isolated part of a table than when the boundaries have to be checked with the help of an index. Of course, data-driven partitioning doesn't miraculously solve all volume issues. For instance, a condition such as:
    where date_column_1 >= some value
      and date_column_2 <= some other value

will not enable us to use either partitioning or indexing any more efficiently than if only one of the conditions were specified. It is by specifying a between (or any semantic equivalent) encompassing a small number of partitions that we shall make the best use of partitioning.

Half-bounded conditions make poor use of both indexes and partitions.

10.1.3. Data Purges

Archival and data purges are too often considered ancillary matters, until they are seen as the very last hope for retrieving those by-and-large satisfactory response times of six months ago. Actually, they are extremely sensitive operations that, poorly handled, can put much strain on a system and contribute to pushing a precarious situation closer to implosion. The ideal case is when tables are partitioned (true partitioning or a partitioned view) and when archival and purges operate on partitions. If partitions can be simply detached, in one way or another, then an archival (or purge) operation is trivial: a partition is archived and a new, empty one possibly created. If not, we are still in a relatively strong position: the query that selects rows for archival will be a simple one, and afterwards it will be possible to truncate a partition, truncate being a way of emptying a table or partition that bypasses most of the usual mechanisms and is therefore much faster than regular deletes.
The less-than-ideal, but oh-so-common, case is when archival is triggered by age and other conditions. Accountants, for instance, are often reluctant to archive unpaid invoices, even rather old ones. This makes the rather simple and elegant partition shuffling or truncation look too crude. Must we fall back on the dull-but-trusted delete? It is interesting at this point to try to rank the data manipulation operations (inserts, updates, and deletes) in terms of overall cost. We have seen that inserts are pretty costly, in large part because when you insert a new row, all indexes on the table have to be maintained. Updates require only the maintenance of indexes on the updated columns. Their weakness, compared to inserts, is twofold: first, they are associated with a search (a where clause) that can be as disastrous as with a select, with the aggravating circumstance that in the meanwhile locks are held. Second, the previous value, nonexistent in the case of an insert, must be saved somewhere so as to be available in case of rollback. Deletes combine all the shortcomings: they affect all indexes, are usually associated with a where clause that can be slow, and need to save the deleted values for a possible transaction rollback. Of all the operations that change data, deletes offer the greatest potential for trouble. If we can therefore save on deletes, even at the price of other operations, we are likely to end up on the winning side. When a table is partitioned and archival and purge depend mostly on a date condition with strings attached, we can consider a three-stage purge:
Without partitioning, the situation is much more difficult. In order to limit lock duration, and assuming of course that once a row has attained the "ready for archival" state, no operation whatsoever can put it back into a "no, wait, I've had second thoughts" state, we can consider a two-step operation. This two-step operation will be all the more advantageous when the query that identifies rows for archiving is a slow-running one. What we may do in that case is:
A major justification for temporary tables is to enable massive, table-oriented operations that would outperform row-wise operations. |
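A sketch of such a two-step, set-oriented purge, with SQLite standing in for the DBMS and an invented invoices table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table invoices
                (invoice_id integer primary key, paid integer, invoice_date text)""")
conn.executemany("insert into invoices values (?, ?, ?)",
                 [(1, 1, "2004-05-01"),
                  (2, 0, "2004-06-01"),   # old but unpaid: must be kept
                  (3, 1, "2006-05-01"),
                  (4, 1, "2004-07-01")])

# Step 1: run the slow identification query once, materializing the keys
# of rows ready for archival into a temporary table.
conn.execute("""create temporary table to_archive as
                select invoice_id
                from invoices
                where paid = 1
                  and invoice_date < '2005-01-01'""")

# Step 2: archive, then delete, as fast table-oriented operations keyed
# on the small temporary table rather than re-running the slow search.
conn.execute("""delete from invoices
                where invoice_id in (select invoice_id from to_archive)""")

remaining = [r[0] for r in conn.execute(
    "select invoice_id from invoices order by invoice_id")]
print(remaining)
```

The expensive condition is evaluated once, the delete becomes a simple keyed operation, and the lock on the main table is held only for the second, fast step.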