Dealing with NULL

The FULL OUTER JOIN example shown in Figure 7-4 has a necessary twist because of the NULL value that's generated for the missing author state column. Note the WHERE clause:

 WHERE state <> 'CA' OR state IS NULL 


 OR state IS NULL 

our favorite row, which contains the title The Psychology of Computer Cooking, wouldn't have been selected. Because this title is missing its corresponding author, the state column for the query is NULL.

The expression

 state <> 'CA' 

doesn't evaluate to TRUE for a NULL value. NULL is considered unknown. If you were to omit the expression

 OR state IS NULL 

and execute the query once with the predicate

 state = 'CA' 

and once with the predicate

 state <> 'CA' 

it might seem intuitive that, between the two queries, all rows must have been selected. But this isn't the case. A NULL state satisfies neither condition, and the row with the title The Psychology of Computer Cooking wouldn't be part of either result.

This example illustrates the essence of three-valued logic, so named because a comparison can evaluate to one of three conditions: TRUE, FALSE, or Unknown. Dealing with NULL entries requires three-valued logic. If you allow NULL entries or have cases in which outer joins are necessary, correctly applying three-valued logic is necessary to avoid introducing bugs into your application.

Assume that the table Truth_Table has columns X and Y . We have only one row in the table, as follows , with X being NULL and Y being 1 :

 X      Y  ----   - NULL   1 

Without looking ahead, test yourself to see which of the following queries returns 1 :

 SELECT 1 FROM Truth_Table WHERE X <> Y -- NOT EQUAL SELECT 1 FROM Truth_Table WHERE X = Y  -- EQUAL SELECT 1 FROM Truth_Table WHERE X != Y -- NOT EQUAL                                        -- (alternative formulation) SELECT 1 FROM Truth_Table WHERE X < Y  -- LESS THAN SELECT 1 FROM Truth_Table WHERE X !< Y -- NOT LESS THAN SELECT 1 FROM Truth_Table WHERE X > Y  -- GREATER THAN SELECT 1 FROM Truth_Table WHERE X !> Y -- NOT GREATER THAN 

In fact, none of the previous SELECT statements returns 1 , because none of the comparison operations against X , which is NULL, evaluate to TRUE. All evaluate to Unknown. Did you score 100 percent? If so, are you confident that all your developers would also score 100? If you allow NULL, your answer had better be yes.

Further complexity arises when you use expressions that can be Unknown with the logical operations of AND, OR, and NOT. Given that an expression can be TRUE, FALSE, or Unknown, the truth tables in Table 7-1 below summarize the result of logically combining expressions.

These truth tables can reveal some situations that otherwise might not be obvious. For example, you might assume that the condition (X >= 0 OR X <= 0) must be TRUE, because X for any number always evaluates to TRUE; it must be either 0 or greater, or 0 or less. However, if X is NULL, then X >= 0 is Unknown, and X <= 0 is also Unknown. Therefore, the expression evaluates as Unknown OR Unknown . As the OR truth table shows, this evaluates to Unknown. To further illustrate , the following SELECT statement doesn't return a 1 :

 SELECT 1 FROM Truth_Table WHERE (X >= 0 OR X <= 0) 

Table 7-1. Truth tables showing results of logically combining expressions.

AND with Value True Unknown False
Unknown Unknown Unknown FALSE
OR with Value True Unknown False
Unknown TRUE Unknown Unknown
NOT Evaluates to
Unknown Unknown
False TRUE

And because (X >= 0 OR X <= 0) is Unknown, the condition:

 WHERE NOT(X >= 0 OR X <= 0) 

is equivalent (in pseudocode) to:

 WHERE NOT (Unknown) 

The NOT truth table shows that NOT (Unknown) is Unknown, so the following negation of the previous SELECT statement also returns nothing:

 SELECT 1 FROM Truth_Table WHERE NOT(X >= 0 OR X <= 0) 

And, at the possible risk of belaboring this point, since neither expression evaluates to TRUE, OR'ing them makes no difference either; consequently, this SELECT also returns nothing:

 SELECT 1 FROM TRUTH_TABLE      WHERE (X >= 0 OR X <= 0) OR NOT(X >= 0 OR X <= 0) 

The fact that none of these SELECT statements evaluates to TRUE, even when negating the expression, illustrates that not evaluating to TRUE doesn't imply evaluating to FALSE. Rather, it means either evaluating to FALSE or evaluating to Unknown.

Two special operators, IS NULL and IS NOT NULL, exist to deal with NULL. Checking an expression with one of these operators always evaluates to either TRUE or FALSE. No Unknown condition is produced when using IS NULL or IS NOT NULL, as shown in Table 7-2.

Table 7-2. Truth tables for IS NULL and IS NOT NULL.

IS NULL Evaluates to IS NOT NULL Evaluates to

In the full outer-join query example, the search criteria was

 WHERE state <> 'CA' OR state IS NULL 

For the row with the title The Psychology of Computer Cooking (which was produced from the outer-join operation preserving the titles row with an un-matched authors entry), the authors.state column is NULL. Hence, the expression

 state <> 'CA' 

is Unknown for that row, but the expression

 state IS NULL 

is TRUE for that row. The full expression evaluates to (Unknown OR TRUE) , which is TRUE, and the row qualifies. Without

 OR state IS NULL 

the expression is only (Unknown) , so the row doesn't qualify.

NULL in the Real World

As you know, we strongly recommend that you minimize the use of NULL. In outer-join operations, you should carefully account for NULL values that are generated to preserve rows that don't have a match in the table being joined. And even if you're comfortable with three-valued logic, your developers and anyone querying the data using the SQL language typically won't be; therefore, introducing NULL will be a source of bugs just waiting to happen. You can, of course, make all column definitions NOT NULL and declare default values if no value is specified.

You can often avoid problems of having no matching rows ”without the need for outer-join operations ”by providing placeholder, or dummy , rows. For example, you could easily create a dummy titles row and a dummy authors row. By using constraints or triggers, you could insist that every row in authors must have a matching entry in titleauthor , even if the ultimate matching row in titles is simply the dummy row. Likewise, every row in titles could be made to reference titleauthor , even if the matching row in authors is just the dummy row. Having done this, a standard equijoin returns all rows because no unmatched rows exist. There's no outer join or NULL complication.

Here's an example: After adding a dummy row for authors , one for titles , and one for each unmatched row into the titleauthor table, a standard inner join (with INNER specified purely to emphasize the point) is used to produce essentially the same results as the FULL OUTER JOIN. The only difference in the results is that the placeholder values ***No Current Author*** and ***No Current Title*** are returned instead of NULL.

 BEGIN TRAN -- Transaction will be rolled back so as to avoid            -- permanent changes -- Dummy authors row INSERT authors  (au_id, au_lname, au_fname, phone, address, city, state, zip, contract) VALUES  ('000-00-0000', '***No Current Author***', ', 'NONE', 'NONE', 'NONE', 'XX', '99999', 0) -- Dummy titles row INSERT titles  (title_id, title, type, pub_id, price, advance, royalty, ytd_sales, notes, pubdate) VALUES  ('ZZ9999', '***No Current Title***', 'NONE', '9999', 0.00, 0, 0, 0, 'NONE', '1900.01.01') -- Associate authors with no current titles to dummy title INSERT titleauthor VALUES ('341-22-1782', 'ZZ9999', 0, 0) INSERT titleauthor VALUES ('527-72-3246', 'ZZ9999', 0, 0) INSERT titleauthor VALUES ('724-08-9931', 'ZZ9999', 0, 0) INSERT titleauthor VALUES ('893-72-1158', 'ZZ9999', 0, 0) -- Associate titles with no current author to dummy author INSERT titleauthor VALUES ('000-00-0000', 'MC3026', 0, 0) -- Now do a standard INNER JOIN SELECT  'Author'=RTRIM(au_lname) + ', ' + au_fname, 'Title'=title FROM          authors AS A             -- JOIN conditions         INNER JOIN titleauthor AS TA ON A.au_id=TA.au_id          INNER JOIN titles AS T ON t.title_id=TA.title_id WHERE  A.state <> 'CA' ORDER BY 1 ROLLBACK TRAN    -- Undo changes 

This is the result:

 Author                       Title -------------                ------------------ ***No Current Author***,     The Psychology of Computer Cooking Blotchet-Halls, Reginald     Fifty Years in Buckingham Palace                              Kitchens DeFrance, Michel             The Gourmet Microwave del Castillo, Innes          Silicon Valley Gastronomic Treats Greene, Morningstar          ***No Current Title*** Panteley, Sylvia             Onions, Leeks, and Garlic: Cooking                              Secrets of the Mediterranean Ringer, Albert               Is Anger the Enemy? Ringer, Albert               Life Without Fear Ringer, Anne                 The Gourmet Microwave Ringer, Anne                 Is Anger the Enemy? Smith, Meander               ***No Current Title*** 

Earlier, you saw how OUTER JOIN became complicated and prone to mistakes. It's comforting to know that, regardless of the order in which we specify the tables for the join operation using dummy rows, the semantics and results are the same because now this operation is simply an equijoin.

However, dummy values won't solve all the woes created by using NULL. First, using dummy values demands more work in your application to ensure that relationships to the placeholder values are maintained . Even more important, many queries still need to be able to discern that the placeholders are special values, or the results of the query could be suspect. For example, if you used the placeholder value of 0 (zero) instead of NULL for a salary field in an employee table, you'd have to be careful not to use such a value in a query looking for average values or minimum salaries, because a salary of 0 would alter your average or minimum calculations inappropriately. If the placeholder is simply another designation for "unknown," you must write your queries with that in mind. Placeholder values usually solve the issues that arise in joining tables, and they can be an effective alternative to using OUTER JOIN. With placeholder values, developers are less likely to write incorrect queries, especially those who have only basic SQL knowledge.

In the previous example, the na ve query returns the same result set as the query with an OUTER JOIN and the criteria OR state IS NULL . It took careful work to construct the placeholder data and maintain the relationships for this to be the case. But the group database expert (probably you!) can do that work, and once done, individuals with less SQL or database knowledge won't be as likely to introduce bugs in their queries simply because they're unfamiliar with the issues of OUTER JOINS and three-valued logic. Sometimes you might feel you have to use NULL values; however, if you can minimize or avoid these cases, you're wise to do so.

If dummy entries aren't practical for you to use, try a few other tricks. A handy and underused function is ISNULL. Using ISNULL( expression , value ) substitutes the specified value for the expression when the expression evaluates to NULL. In the earlier outer-join example, the WHERE clause was specified as

 WHERE state <> 'CA' OR state IS NULL 

By using the ISNULL function, a special value (in this case, XX ) can be assigned to any NULL state values. Then the expression <> 'CA' always evaluates to TRUE or FALSE rather than to Unknown. The clause OR state IS NULL isn't needed.

The following query produces the full outer-join result, without using IS NULL:

 SELECT  'Author'=RTRIM(au_lname) + ', ' + au_fname, 'Title'=title FROM      authors AS A             -- JOIN CONDITIONS     FULL OUTER JOIN titleauthor AS TA ON A.au_id=TA.au_id      FULL OUTER JOIN titles AS T ON t.title_id=TA.title_id WHERE  ISNULL(state, 'XX') <> 'CA' ORDER BY 1 


You've seen how the IS NULL operator exists to handle the special case of looking for NULL. Because NULL is considered Unknown, ANSI-standard SQL doesn't provide for an equality expression of NULL ”that is, there's no WHERE value = NULL . Instead, you must use the IS NULL operator WHERE value IS NULL . This can make queries more awkward to write and less intuitive to non-SQL experts. One of my favorite SQL gurus at Microsoft, Rick Vicik, says tongue-in-cheek that this is because "NULLs are evil and you should experience extra pain every time you use one." And it's somewhat curious , or at least nonorthogonal , that ANSI SQL doesn't force an UPDATE statement to set a value TO NULL but instead uses the = NULL formulation. (Table 7-3 below summarizes this state of affairs.)

Orthogonal is an adjective, somewhat jargonish, that's often applied to programming languages and APIs. It's a mathematical term meaning that the sum of the products of corresponding elements in a system equals 0. This implies that a system is orthogonal if all of its primitive operations fully describe the base capabilities of the system and those primitives are independent and mutually exclusive of one another. In broader use, the term also denotes a certain symmetry and consistency. For example, an API that has a GetData call would also have a PutData call, each taking arguments that are as consistent as possible between the two. The two APIs make up a mutually exclusive domain of operations that you can perform on data, and they are named and invoked in a consistent manner. Such a pair of APIs would be considered orthogonal. SQL Server 7 has a BULK INSERT command but no BULK EXPORT command, so this is another example of nonorthogonal behavior.

Table 7-3. ANSI SQL's treatment of NULLs.

ANSI SQL Doesn't Provide for This And Provides Only for This
Yet ANSI SQL Uses Only This And Not Something Like This
UPDATE employee SET salary = NULL UPDATE employee SET salary TO NULL

SQL Server provides an extension to ANSI SQL that allows queries to use = NULL as an alternative to IS NULL. (The SET ANSI_NULLS option has an oblique reference to this extension, but it's never been thoroughly explained.)

Note that the formulation is not = "NULL" ” the quotation marks would have the effect of searching for a character string containing the word NULL , as opposed to searching for the special meaning of NULL. This way of specifying = NULL is simply a shorthand for IS NULL and provides a more convenient formulation of queries, especially by allowing NULL within an IN clause. The IN clause is a standard SQL shorthand notation for an expression that checks multiple values, as illustrated below:

Multiple OR Conditions Equivalent Shorthand Formulation
WHERE state = 'CA' OR state = 'WA' OR state ='IL' WHERE state IN ('CA', 'WA', 'IL')

If you wanted to find states that are either one of the three values depicted in the previous example or NULL, ANSI SQL prescribes that the condition be stated using both an IN clause and an OR state IS NULL expression; SQL Server also allows all the conditions to be specified with the IN clause, as shown here:

ANSI SQL Prescribes SQL Server also Allows
WHERE state IN ('CA', 'WA', 'IL') OR state IS NULL WHERE state IN ('CA', 'WA', 'IL', NULL)

The = NULL shorthand can be particularly handy to use when you're dealing with parameters passed to a stored procedure, and this is the main reason it exists. Here's an example:

 CREATE PROCEDURE get_employees (@dept char(8), @class char(5)) AS     SELECT * FROM employee WHERE employee.dept=@dept         AND employee.class=@class 

By default, we can pass NULL to this procedure for either parameter; no special treatment is necessary. But without the = NULL shorthand, we need to put in extensive special handling even for this simple procedure:

 CREATE PROCEDURE get_employees (@dept char(8), @class char(5)) AS     IF (@dept IS NULL AND @class IS NOT NULL)         SELECT * FROM employee WHERE employee.dept IS NULL         AND employee.class=@class     ELSE IF (@dept IS NULL AND @class IS NULL)         SELECT * FROM employee WHERE employee.dept IS NULL         AND employee.class IS NULL     ELSE IF (@dept IS NOT NULL AND @class IS NULL)         SELECT * FROM employee WHERE employee.dept=@dept         AND employee.class IS NULL     ELSE         SELECT * FROM employee WHERE employee.dept=@dept         AND employee.class=@class 

This example is pretty trivial, and the situation becomes much more complex if you have multiple parameters and any one of them might be NULL. You then need a SELECT statement for every combination of parameters that might be NULL or NOT NULL. It can get ugly quickly.

The = NULL syntax is only possible when you have the ANSI_NULLS and ANSI_DEFAULTS options set to OFF, and if you have the database option ANSI NULLS set to FALSE. Since the ODBC driver and OLE DB provider can set these options on, you need to verify what is in effect for your connection.

As mentioned earlier, dealing with NULL becomes an issue of nearly religious fervor to some in the database industry. No doubt, a few people will find the SQL Server extension that allows an expression to be checked for = NULL to be a grave and serious deviation from three-valued logic (which is the primary reason this extension hasn't been well-documented, although many users have discovered it). As long as you realize that = NULL is purely a convenient shorthand for IS NULL, it should be clear to you that no violation of three-valued logic has occurred. Those of you who are vehemently opposed to such formulations don't have to use these extensions, of course. You can also prevent this extension from working by enabling the SET ANSI_NULLS option.

Inside Microsoft SQL Server 7.0
Inside Microsoft SQL Server 7.0 (Mps)
ISBN: 0735605173
EAN: 2147483647
Year: 1999
Pages: 144 © 2008-2017.
If you may any questions please contact us: