Section 8.2. Abstract Layers

8.2. Abstract Layers

It is a common practice to create a succession of abstract layers over a suite of software primitives, ostensibly for maintenance reasons and software reuse. This is a worthy practice and provides superb material for exciting management presentations. Unfortunately, this approach can very easily be abused, especially when the software primitives consist of database accesses. Of course, such an industrial aspect of software engineering is usually associated with modern, object-oriented languages.

I am going to illustrate how not to encapsulate database accesses with some lines from a real-life program. Interestingly for a book entitled The Art of SQL, the following fragment of C# code (of questionable sharpness...) contains only bits of an SQL statement. It is nevertheless extremely relevant to our topic, for deplorable reasons.

 1    public string Info_ReturnValueUS(DataTable dt, 2                               string    codeForm, 3                                string    infoTxt) 4    { 5      string returnValue = String.Empty ; 6    try 7    { 8      infoTxt = infoTxt.Replace("'","''"); 9      string expression = ComparisonDataSet.FRM_CD 10                              + " = '" + codeForm 11                              + "' and " + ComparisonDataSet.TXT_US 12                              + " = '" + infoTxt + "'" ; 13        DataRow[] drsAttr = dt.Select(expression); 14 15        foreach (DataRow dr in drsAttr) 16        { 17         if (dr[ComparisonDataSet.VALUE_US].ToString().ToUpper().Trim(  ) 18                                   != String.Empty) 19            { 20             returnValue = dr[ComparisonDataSet.VALUE_US].ToString(  ) ; 21             break; 22            } 23        } 24      } 25      catch (MyException myex) 26      { 27        throw myex ; 28      } 29      catch (Exception ex) 30      { 31        throw new MyException("Info_ReturnValueUS " + ex.Message) ; 32      } 33      return returnValue ; 34    }

There is no need to be a C# expert to grasp the purpose of the above method, at least in general terms. The objective is to return the text associated with a message code. That text is to be returned in a given language (in this case American English, as US suggests). This code is from a multilingual system, and there is a second, identical function, in which two other letters replace the letters U and S. No doubt when other languages will be required, the same lines of code will be copied as many times as we have different languages, and the suitable ISO code substituted for US each time. Will it ease maintenance, when each change to the program has to be replicated to umpteen identical functions (...but for the ISO code)? I may be forgiven for doubting it, in spite of my legendary faith in what exciting management presentations promise modern languages to deliver.

But let's study the program a little more closely. The string expression in lines 912 is an example of shameless hardcoding , before being passed in line 13 to a Select( ) method that can reasonably be expected to perform a query. In fact, it would seem that two different types of elements are hardcoded: column names (stored in attributes ComparisonDataSet.FRM_CD and ComparisonDataSet.TXT_US--and here, apparently, there is one column per supported language, which is a somewhat dubious design) and actual values passed to the query (codeForm and infoTxt). Column names can only be hardcoded, but there should not be a very great number of different combinations of column names, so that the number of different queries that can be generated will necessarily be small and we will have no reason to worry about this. The same cannot be said of actual values: we may query as many different values as we have rows in the table; in fact we may even query more, generating queries that may return nothing. The mistake of hard-coding values from codeForm and infoTxt into the SQL statement is serious because this type of "give me the associated label" query is likely to be called a very high number of times. As it is written, each call will trigger the full mechanism of parsing, determining the best execution plan, and so onfor no advantage. The values should be passed to the query as bind variables--just like arguments are passed to a function.

The loop of lines 1523 is no less interesting. The program is looking for the first value that is not empty in the dataset just returneddare we say the first value that is not null? Why code into an external application something that the SQL language can do perfectly well? Why return from the server possibly many more rows than are required, just to discard them afterwards? This is too much work. The database server will do more work, because even if we exit the loop at the first iteration, it is quite common to pre-fetch rows in order to optimize network traffic. The server may well have already returned tens or hundreds of rows before our application program begins its first loop. The application server does more work too, because it has to filter out most of what the database server painstakingly returned. Needless to say, the developer has written more code than is required. It is perfectly easy to add a suitable condition to expression, so that unneeded rows are not returned. As the C# code generates the query, the server has no idea that we are interested only in the first non-null value and will simply do as instructed. If we were to try and check on the database side for a clue indicating wrongly written code, the only thing that may possibly hint at a problem in the code will be the multitude of nearly identical hardcoded statements. This anomaly is, however, only a part of the larger problem.

One can write very poor code in any language, from plain old COBOL down to the coolest object-oriented language. But the greater the degree of independence between each layer of software, the better written those layers must each be. The problem here is that a succession of software layers may be called. No matter how skilled the developer who assembles these layers into the overall module, the final performance will be constrained by the weakest layer.

The problem of the weakest layer is all the more perverse when you inherit bad librariesas with inheriting bad genes, there is not much you can do about it. Rewriting inefficient low-level layers is rarely allowed by schedules or budgets. I once learned about a case in which a basic operator in a programming language had been "overloaded" (redefined) and was performing a database access each time it was used by unsuspecting developers! It is all the more complicated to correct such a situation, because it is quite likely that individual queries, as seen from the database server, will look like conspicuously plain queries, not like the bad sort of SQL query that scans millions of rows and attracts immediate attention.

Cool database access libraries are not necessarily efficient libraries.