Dealing with NULL

The full outer join example shown in "The Obsolete *= OUTER JOIN Operator" has a necessary twist because of the NULL value that's generated for the missing author state column. Note the WHERE clause:

 WHERE state <> 'CA' OR state IS NULL 

Without the following statement

 OR state IS NULL 

our favorite row, which contains the title The Psychology of Computer Cooking, wouldn't have been selected. Because this title is missing a corresponding author, the state column for the query is NULL.

The expression

 state <> 'CA' 

doesn't evaluate to TRUE for a NULL value. NULL is considered unknown. If you were to omit the expression

 OR state IS NULL 

and execute the query once with the predicate

 state = 'CA' 

and once with the predicate

 state <> 'CA' 

it might seem intuitive that, between the two queries, all rows will have been selected. But this isn't the case. A NULL state satisfies neither condition, and the row with the title The Psychology of Computer Cooking won't be part of either result.

This example illustrates the essence of three-valued logic, which is so named because a comparison can evaluate to one of three conditions: TRUE, FALSE, or Unknown. Dealing with NULL entries requires three-valued logic. If you allow NULL entries or have cases in which outer joins are necessary, correctly applying three-valued logic is necessary to avoid introducing bugs into your application.

Assume that the table Truth_Table has columns X and Y. We have only one row in the table, as follows, with X being NULL and Y being 1:

 X Y ---- - NULL 1 

Without looking ahead, test yourself to see which of the following queries returns 1:

 SELECT 1 FROM Truth_Table WHERE X <> Y -- NOT EQUAL SELECT 1 FROM Truth_Table WHERE X = Y -- EQUAL SELECT 1 FROM Truth_Table WHERE X != Y -- NOT EQUAL -- (alternative formulation) SELECT 1 FROM Truth_Table WHERE X < Y -- LESS THAN SELECT 1 FROM Truth_Table WHERE X !< Y -- NOT LESS THAN SELECT 1 FROM Truth_Table WHERE X > Y -- GREATER THAN SELECT 1 FROM Truth_Table WHERE X !> Y -- NOT GREATER THAN 

In fact, none of them returns 1 because none of the comparison operations against X, which is NULL, evaluate to TRUE. All evaluate to Unknown. Did you get this right? If so, are you confident that all your developers would, too? If you allow NULL, your answer had better be yes.

Further complexity arises when you use expressions that can be Unknown with the logical operations of AND, OR, and NOT. Given that an expression can be TRUE, FALSE, or Unknown, the truth tables in Table 7-1 summarize the result of logically combining expressions.

These truth tables can reveal some situations that might not be obvious otherwise. For example, you might assume that the condition (X >= 0 OR X <= 0) must be TRUE because X for any number always evaluates to TRUE; it must be either 0 or greater, or 0 or less. However, if X is NULL, then X >= 0 is Unknown, and X <= 0 is also Unknown. Therefore, the expression evaluates as Unknown OR Unknown. As the OR truth table shows, this evaluates to Unknown. To further illustrate, the following SELECT statement doesn't return a 1:

 SELECT 1 FROM Truth_Table WHERE (X >= 0 OR X <= 0) 

Table 7-1. Truth tables showing results of logically combining expressions.

AND with Value True Unknown False
TRUE TRUE Unknown FALSE
Unknown Unknown Unknown FALSE
FALSE FALSE FALSE FALSE
OR with Value True Unknown False
TRUE TRUE TRUE TRUE
Unknown TRUE Unknown Unknown
FALSE TRUE Unknown FALSE
NOT Evaluates to
TRUE FALSE
Unknown Unknown
False TRUE

And because (X >= 0 OR X <= 0) is Unknown, the condition:

 WHERE NOT (X >= 0 OR X <= 0) 

is equivalent (in pseudocode) to:

 WHERE NOT (Unknown) 

The NOT truth table shows that NOT (Unknown) is Unknown, so the following negation of the previous SELECT statement also returns nothing:

 SELECT 1 FROM Truth_Table WHERE NOT (X >= 0 OR X <= 0) 

And, at the risk of belaboring the point, since neither expression evaluates to TRUE, OR'ing them makes no difference either; consequently, this SELECT also returns nothing:

 SELECT 1 FROM TRUTH_TABLE WHERE (X >= 0 OR X <= 0) OR NOT (X >= 0 OR X <= 0) 

The fact that none of these SELECT statements evaluates to TRUE, even when negating the expression, illustrates that not evaluating to TRUE doesn't imply evaluating to FALSE. Rather, it means either evaluating to FALSE or evaluating to Unknown.

Two special operators, IS NULL and IS NOT NULL, exist to deal with NULL. Checking an expression with one of these operators always evaluates to either TRUE or FALSE. The result returned by IS NULL depends on whether the original expression is a known or unknown value. No Unknown condition is produced when you use IS NULL or IS NOT NULL, as shown in Table 7-2.

Table 7-2. Truth tables for IS NULL and IS NOT NULL.

IS NULL Evaluates to IS NOT NULL Evaluates to
Known Value FALSE Known Value TRUE
Unknown Value TRUE Unknown Value FALSE
(NULL) (NULL)

In the full outer-join query example, the search criteria were

 WHERE state <> 'CA' OR state IS NULL 

For the row with the title The Psychology of Computer Cooking (which was produced by the outer join operation preserving the titles row with an unmatched authors entry), the authors.state column is NULL. Hence, the expression

 state <> 'CA' 

is Unknown for that row, but the expression

 state IS NULL 

is TRUE for that row. The full expression evaluates to (Unknown OR TRUE), which is TRUE, and the row qualifies. Without

 OR state IS NULL 

the expression is only (Unknown), so the row doesn't qualify.

NULL in the Real World

As you know, I strongly recommend that you minimize the use of NULL. In outer-join operations, you should carefully account for NULL values that are generated to preserve rows that don't have a match in the table being joined. And even if you're comfortable with three-valued logic, your developers and anyone querying the data using the SQL language typically won't be; therefore, introducing NULL will be a source of bugs just waiting to happen. You can, of course, make all column definitions NOT NULL and declare default values if no value is specified.

You can often avoid the problems of having no matching rows—without the need for outer-join operations—by providing placeholder, or dummy, rows. For example, you can easily create a dummy titles row and a dummy authors row. By using constraints or triggers, you can insist that every row in authors must have a matching entry in titleauthor, even if the ultimate matching row in titles is simply the dummy row. Likewise, every row in titles can be made to reference titleauthor, even if the matching row in authors is just the dummy row. In this way, a standard inner join returns all rows because no unmatched rows exist. There's no outer join or NULL complication.

Here's an example: after you add a dummy row for authors, one for titles, and one for each unmatched row to the titleauthor table, you use a standard inner join (with INNER specified purely to emphasize the point) to produce essentially the same results as the full outer join. The only difference in the results is that the placeholder values ***No Current Author*** and ***No Current Title*** are returned instead of NULL.

 BEGIN TRAN -- Transaction will be rolled back so as to avoid -- permanent changes -- Dummy authors row INSERT authors (au_id, au_lname, au_fname, phone, address, city, state, zip, contract) VALUES ('000-00-0000', '***No Current Author***', '', 'NONE', 'NONE', 'NONE', 'XX', '99999', 0) -- Dummy titles row INSERT titles (title_id, title, type, pub_id, price, advance, royalty, ytd_sales, notes, pubdate) VALUES ('ZZ9999', '***No Current Title***', 'NONE', '9999', 0.00, 0, 0, 0, 'NONE', '1900.01.01') -- Associate authors with no current titles to dummy title INSERT titleauthor VALUES ('341-22-1782', 'ZZ9999', 0, 0) INSERT titleauthor VALUES ('527-72-3246', 'ZZ9999', 0, 0) INSERT titleauthor VALUES ('724-08-9931', 'ZZ9999', 0, 0) INSERT titleauthor VALUES ('893-72-1158', 'ZZ9999', 0, 0) -- Associate titles with no current author to dummy author INSERT titleauthor VALUES ('000-00-0000', 'MC3026', 0, 0) -- Now do a standard INNER JOIN SELECT 'Author'=RTRIM(au_lname) + ', ' + au_fname, 'Title'=title FROM authors AS A -- JOIN conditions INNER JOIN titleauthor AS TA ON A.au_id=TA.au_id INNER JOIN titles AS T ON t.title_id=TA.title_id WHERE A.state <> 'CA' ORDER BY 1 ROLLBACK TRAN -- Undo changes 

This is the result:

 Author Title ------------- ------------------ ***No Current Author***, The Psychology of Computer Cooking Blotchet-Halls, Reginald Fifty Years in Buckingham Palace Kitchens DeFrance, Michel The Gourmet Microwave del Castillo, Innes Silicon Valley Gastronomic Treats Greene, Morningstar ***No Current Title*** Panteley, Sylvia Onions, Leeks, and Garlic: Cooking Secrets of the Mediterranean Ringer, Albert Is Anger the Enemy? Ringer, Albert Life Without Fear Ringer, Anne The Gourmet Microwave Ringer, Anne Is Anger the Enemy? Smith, Meander ***No Current Title*** 

Earlier, you saw how OUTER JOIN became complicated and prone to mistakes. It's comforting to know that, regardless of the order in which you specify the tables for the join operation using dummy rows, the semantics and results are the same because now this operation is simply an INNER JOIN.

However, dummy values won't solve all the woes created by using NULL. First, using dummy values demands more work in your application to ensure that relationships to the placeholder values are maintained. Even more important, many queries still need to be able to discern that the placeholders are special values, or else the results of the query might be suspect. For example, if you use the placeholder value of 0 (zero) instead of NULL for a salary field in an employee table, you have to be careful not to use such a value in a query that looks for average values or minimum salaries because a salary of 0 would alter your average or minimum calculations inappropriately. If the placeholder is simply another designation for "unknown," you must write your queries with that in mind. Placeholder values usually solve the issues that arise in joining tables, and they can be an effective alternative to using OUTER JOIN. With placeholder values, developers—especially those with only basic SQL knowledge—are less likely to write incorrect queries.

In the previous example, the naïve query returns the same result set as the query with an OUTER JOIN and the criteria OR state IS NULL. It took careful work to construct the placeholder data and maintain the relationships for this to be the case. But the group database expert (probably you!) can do that work, and once it is done, individuals with less SQL or database knowledge won't be as likely to introduce bugs in their queries simply because they're unfamiliar with the issues of OUTER JOINS and three-valued logic. Sometimes you might feel you have to use NULL values; however, if you can minimize or avoid these cases, you're wise to do so.

If dummy entries aren't practical for you to use, try a few other tricks. A handy and underused function is ISNULL. Using ISNULL(expression, value) substitutes the specified value for the expression when the expression evaluates to NULL. In the earlier outer-join example, the WHERE clause was specified as

 WHERE state <> 'CA' OR state IS NULL 

By using the ISNULL function, you can assign a special value (in this case, XX) to any NULL state values. Then the expression <> 'CA' will always evaluate to TRUE or FALSE rather than to Unknown. The clause OR state IS NULL isn't needed.

The following query produces the full outer-join result, without using IS NULL:

 SELECT 'Author'=RTRIM(au_lname) + ', ' + au_fname, 'Title'=title FROM authors AS A -- JOIN CONDITIONS FULL OUTER JOIN titleauthor AS TA ON A.au_id=TA.au_id FULL OUTER JOIN titles AS T ON t.title_id=TA.title_id WHERE ISNULL(state, 'XX') <> 'CA' ORDER BY 1 

IS NULL and = NULL

You've seen how the IS NULL operator exists to handle the special case of looking for NULL. Because NULL is considered Unknown, ANSI-standard SQL doesn't provide for an equality expression of NULL—that is, there's no WHERE value = NULL. Instead, you must use the IS NULL operator WHERE value IS NULL. This can make queries more awkward to write and less intuitive to non-SQL experts. And it's somewhat curious, or at least nonorthogonal, that ANSI SQL doesn't force an UPDATE statement to set a value TO NULL but instead uses the = NULL formulation. (See Table 73 for a summary of this state of affairs.)

NOTE


Orthogonal is a somewhat jargony adjective that's often applied to programming languages and APIs. It's a mathematical term meaning that the sum of the products of corresponding elements in a system equals 0. Thus, a system is orthogonal if all of its primitive operations fully describe the base capabilities of the system and those primitives are independent and mutually exclusive of one another. In a broader sense, the term denotes a certain symmetry and consistency. For example, an API that has a GetData call also has a PutData call, each taking arguments that are as consistent as possible between the two. The two APIs make up a mutually exclusive domain of operations that you can perform on data, and they are named and invoked in a consistent manner. Such a pair of APIs is considered orthogonal. SQL Server 2000 has a BULK INSERT command but no BULK EXPORT command, so this is an example of nonorthogonal behavior.

Table 7-3. ANSI SQL's treatment of NULLs.

ANSI SQL Doesn't Provide for This And Provides Only for This
SELECT… WHERE salary = NULL SELECT… WHERE salary IS NULL
Yet ANSI SQL Uses Only This And Not Something Like This
UPDATE employee SET UPDATE employee SET salary TO
salary = NULL NULL

SQL Server provides an extension to ANSI SQL that allows queries to use = NULL as an alternative to IS NULL. However, this alternative is available only when the ANSI_NULLS session option is OFF. When the ANSI_NULLS option is ON, a condition testing for = NULL is always UNKNOWN.

Note that the formulation is not = 'NULL'— the quotation marks would have the effect of searching for a character string containing the word NULL, as opposed to searching for the special meaning of NULL. The = NULL formulation is simply a shorthand for IS NULL and in some cases provides a more convenient formulation of queries, particularly by allowing NULL within an IN clause. The IN clause is a standard SQL shorthand notation for an expression that checks multiple values, as illustrated here:

Multiple OR Conditions Equivalent Shorthand Formulation
WHERE state = 'CA' OR state = 'WA' OR state ='IL' WHERE state IN ('CA', 'WA', 'IL')

If you want to find states that are either one of the three values depicted in the previous example or NULL, ANSI SQL prescribes that the condition be stated using both an IN clause and an OR state IS NULL expression; SQL Server also allows all the conditions to be specified using the IN clause, as shown here:

ANSI SQL Prescribes SQL Server Also Allows
WHERE state IN ('CA', 'WA', 'IL') OR state IS NULL WHERE state IN ('CA', 'WA', 'IL', NULL)

The = NULL shorthand can be particularly handy when you're dealing with parameters passed to a stored procedure, and this is the main reason it exists. Here's an example:

 CREATE PROCEDURE get_employees (@dept char(8), @class char(5)) AS SELECT * FROM employee WHERE employee.dept=@dept AND employee.class=@class 

By default, you can pass NULL to this procedure for either parameter; no special treatment is necessary. But without the = NULL shorthand, you need to put in extensive special handling even for this simple procedure:

 CREATE PROCEDURE get_employees (@dept char(8), @class char(5)) AS IF (@dept IS NULL AND @class IS NOT NULL) SELECT * FROM employee WHERE employee.dept IS NULL AND employee.class=@class ELSE IF (@dept IS NULL AND @class IS NULL) SELECT * FROM employee WHERE employee.dept IS NULL AND employee.class IS NULL ELSE IF (@dept IS NOT NULL AND @class IS NULL) SELECT * FROM employee WHERE employee.dept=@dept AND employee.class IS NULL ELSE SELECT * FROM employee WHERE employee.dept=@dept AND employee.class=@class 

This example is pretty trivial; the situation becomes much more complex if you have multiple parameters and any one of them might be NULL. You then need a SELECT statement for every combination of parameters that might be NULL or NOT NULL. It can get ugly quickly.

Because your ODBC driver or OLE DB provider can enable the ANSI_NULLS option for you, in effect disallowing the = NULL syntax, I recommend that you write your applications using only IS NULL. If you do have situations in which you might pass NULLs as parameters to a procedure, you have another solution besides the multiple SELECTs described above for every condition of NULL and NOT NULL parameter values. Stored procedures keep track of what the ANSI_NULLS setting was at the time the procedure was created. You can open a Query Analyzer session and explicitly SET ANSI_NULLS OFF. Then you can create all your procedures that might otherwise require separate SELECTs to check for NULL parameters and NOT NULL parameters. Once created, those procedures can be run in any connection, whether ANSI_NULLS is ON or OFF, and they will always be able to compare a column to a parameter, whether the parameter is NULL or not.



Inside Microsoft SQL Server 2000
Inside Microsoft SQL Server 2000
ISBN: 0735609985
EAN: 2147483647
Year: 2005
Pages: 179
Authors: Kalen Delaney

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net