2.18. Discerning Use of Exceptions

There is a thin line between courage and rashness; when I recommend coding aggressively, my model is not the charge of the Light Brigade at Balaclava.[*] Programming by exception can also be the consequence of an almost foolhardy bravado, in which our proud developers determine to "go for it." They have an overriding confidence that testing and the ability to handle exceptions will see them through. Ah, the brave die young!

[*] During the Crimean War of 1854, which saw England, France, and Turkey fight against Russia, a poorly specified order and personal enmity between some of the commanders led more than 600 British cavalrymen to charge down a valley in the full line of fire of the Russian guns. Around 120 men and half the horses were killed, for no result. The bravery of the men, celebrated in a poem by Tennyson and (later) several Hollywood movies, helped turn a stupid military action into a myth.

As their name implies, exceptions should be exceptional occurrences. In the particular case of database programming, not all exceptions require the same computer resources, and this is probably the key point to understand if they are to be used intelligently. There are good exceptions, conditions that are raised before anything has been done, and bad exceptions, which are raised only after the full extent of the disaster has actually happened.

For instance, a query against a primary key that finds no row will take minimal resources: the situation is detected while searching the index. However, if the query cannot use an index, you have to carry out a full table scan before being able to state positively that no data has been found. For a very large table, such a complete sequential read can be disastrous on a machine already running near maximum capacity.
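To make the contrast concrete, here is a minimal sketch; the orders table and its columns are hypothetical, not part of the example that follows:

     -- Cheap "no rows" outcome: the primary key index alone
     -- proves the absence of a match.
     select order_date
     from   orders
     where  order_id = 987654;

     -- Costly "no rows" outcome: every block of the table must be
     -- read before we can positively assert that nothing matches.
     select order_date
     from   orders
     where  customer_note = 'overdue';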

Some exceptions are extremely costly even in the best-case scenario; take the detection of duplicate keys. How is uniqueness enforced? Almost always by creating a unique index, and it is when a key is submitted for entry into that index that any violation of the unique constraint is revealed. However, when an index entry is created, the physical address of the row must be provided, which means that the insertion into the table takes place before the insertion into the index. A constraint violation therefore requires the partial insert to be undone, and the exact constraint violated to be identified and returned in an error message. All of these activities carry a significant processing cost. But the greatest sin is trying to fight at the individual exception level, because it forces one to think in terms of individual rows rather than data sets, the very antithesis of relational database processing. The consequence of repeated constraint violations can be a serious deterioration in performance.
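The mechanics are easy to demonstrate. In the following sketch (a hypothetical one-column table, not part of the example), the second insert is written to the table first, collides in the index, and must then be undone:

     create table t (k number constraint t_k_uq unique);

     insert into t values (1);   -- succeeds
     insert into t values (1);   -- the row goes into the table, the
                                 -- index entry then collides, and the
                                 -- partial insert has to be rolled back:
     -- ORA-00001: unique constraint (T_K_UQ) violated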

Let's look at an Oracle example that illustrates these points. Assume that, following the merger of two companies, email addresses are standardized on the <Initial><Name> pattern, limited to 12 characters, with all spaces and quotes replaced by an underscore character.
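This pattern translates into the substr/translate expression used throughout the statements that follow; as a quick sanity check on a made-up name (the literals are illustrative only):

     select substr(substr('Maria', 1, 1)
              ||translate('De La Cruz', ' ''', '__'), 1, 12) email
     from dual;
     -- MDe_La_Cruz : spaces become underscores and the result
     -- is capped at 12 characters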

Let's further assume that a new employees table is created with the new email addresses, derived from a 3,000-row employees_old table. We want each employee to have a unique email address, so we must assign, for instance, flopez to Fernando Lopez, and flopez2 to Francisco Lopez (no relation). In fact, our test data contains a total of 33 potential duplicates.
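The text does not show the tables themselves; a plausible minimal setup, including the unique constraint named in the error message below, might be:

     create table employees (
        emp_num        number        primary key,
        emp_name       varchar2(32)  not null,
        emp_firstname  varchar2(32)  not null,
        emp_email      varchar2(12)  constraint emp_email_uq unique
     );
     -- employees_old is assumed to hold the 3,000 pre-merger rows,
     -- with at least emp_num, emp_name, and emp_firstname.

Those 33 duplicates are the reason for the following result: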

     SQL> insert into employees(emp_num, emp_name,
                                emp_firstname, emp_email)
          select emp_num,
                 emp_name,
                 emp_firstname,
                 substr(substr(emp_firstname, 1, 1)
                        ||translate(emp_name, ' ''', '__'), 1, 12)
          from employees_old;
     insert into employees(emp_num, emp_name, emp_firstname, emp_email)
     *
     ERROR at line 1:
     ORA-00001: unique constraint (EMP_EMAIL_UQ) violated

     Elapsed: 00:00:00.85

Thirty-three duplicates out of 3,000 is about 1%, so perhaps we could quietly process the conformant 99% and handle the rest through exceptions? After all, the additional exception processing on such a small fraction of the rows should not cost much. Following is the code for this optimistic approach:

     SQL> declare
             v_counter    varchar2(12);
             b_ok         boolean;
             n_counter    number;
             cursor c is select emp_num,
                                emp_name,
                                emp_firstname
                         from employees_old;
          begin
            for rec in c
            loop
              begin
                insert into employees(emp_num, emp_name,
                                      emp_firstname, emp_email)
                values (rec.emp_num,
                        rec.emp_name,
                        rec.emp_firstname,
                        substr(substr(rec.emp_firstname, 1, 1)
                          ||translate(rec.emp_name, ' ''', '__'), 1, 12));
              exception
                when dup_val_on_index then
                  -- Duplicate email: retry with a numeric suffix
                  -- until the address no longer collides.
                  b_ok := FALSE;
                  n_counter := 1;
                  while not b_ok
                  loop
                    v_counter := ltrim(to_char(n_counter));
                    begin
                      insert into employees(emp_num, emp_name,
                                            emp_firstname, emp_email)
                      values (rec.emp_num,
                              rec.emp_name,
                              rec.emp_firstname,
                              substr(substr(rec.emp_firstname, 1, 1)
                                ||translate(rec.emp_name, ' ''', '__'), 1,
                                12 - length(v_counter)) || v_counter);
                      b_ok := TRUE;
                    exception
                      when dup_val_on_index then
                        n_counter := n_counter + 1;
                    end;
                  end loop;
              end;
            end loop;
          end;
          /
     PL/SQL procedure successfully completed.

     Elapsed: 00:00:18.41

But what exactly is the cost of this exception handling? If the same exercise is attempted after removing the 33 "problem" rows, comparing the loop with duplicates to the loop without them shows that the cost of processing the exceptions themselves is fairly negligible: with duplicates the procedure still takes about 18 seconds to run. However, running the insert ... select of our first attempt against the duplicate-free data is considerably faster than the loop: the switch to one-row-at-a-time logic alone adds close to 50% to the processing time. Could the row-at-a-time processing have been avoided? Yes, but only by avoiding exceptions: it is the decision to deal with problem rows through exception handling that forced sequential row processing upon us.

Alternatively, there might be value in attempting to identify those rows that contain email addresses subject to contention, and assigning those addresses some arbitrary number to achieve uniqueness.

It is easy to determine how many rows are involved in this contention by adding a group by clause to the SQL statement. Assigning the numbers, however, would be difficult without the analytic functions available in the major database systems (Oracle calls them analytic functions; DB2 knows them as online analytical processing, or OLAP, functions; SQL Server calls them ranking functions). It is worth exploring the solution to this problem in pure SQL.
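For instance, the counting query alluded to above might look like this; it is a sketch reusing the column names of the earlier statements:

     select substr(substr(emp_firstname, 1, 1)
              ||translate(emp_name, ' ''', '__'), 1, 12) emp_email,
            count(*)
     from employees_old
     group by substr(substr(emp_firstname, 1, 1)
              ||translate(emp_name, ' ''', '__'), 1, 12)
     having count(*) > 1;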

Each email address can be assigned a unique number: 1 for the oldest employee whose first-name initial and surname produce that address, 2 for the second oldest, and so on. By pushing this numbering into a subquery, it is possible to leave the first email address in each group untouched and to concatenate the sequence number (not in the Oracle sense of the word) to each of the following ones. The following code shows how this logic can be applied:

     SQL> insert into employees(emp_num, emp_firstname,
                                emp_name, emp_email)
          select emp_num,
                 emp_firstname,
                 emp_name,
                 decode(rn, 1, emp_email,
                               substr(emp_email,
                                      1, 12 - length(ltrim(to_char(rn))))
                               || ltrim(to_char(rn)))
          from (select emp_num,
                       emp_firstname,
                       emp_name,
                       substr(substr(emp_firstname, 1, 1)
                         ||translate(emp_name, ' ''', '__'), 1, 12)
                                                            emp_email,
                       row_number()
                          over (partition by
                                substr(substr(emp_firstname, 1, 1)
                                  ||translate(emp_name, ' ''', '__'), 1, 12)
                                order by emp_num) rn
                from employees_old)
          /

     3000 rows created.

     Elapsed: 00:00:11.68

We avoid the cost of row-at-a-time processing, and this solution takes only about 60% of the time required by the procedural loop.

Exception handling forces the adoption of procedural logic. Always try to anticipate possible exceptions by remaining within declarative SQL.



