NOT NULL | SQL Performance Tuning

The NULL "value" is a concept required by relational theory's data integrity rules. That is, relational theory recognizes that, in some situations, a datum is not available, not known, or inapplicable. NULL is used to represent all three of these cases. It's important to remember that NULL does not equate to a blank or a zero: it's something else entirely. Though a blank is equal to another blank and a zero is equal to another zero, a NULL is never equal to anything, not even another NULL.

NOT NULL constraints are used to enforce a rule that a column must always contain valid and applicable datain short, that a column may not contain NULL. In the SQL Standard, this syntax:

 ... <column>...NOT NULL

is just a shorthand form for:

 ... CHECK (<column> IS NOT NULL)

That is, NOT NULL constraints are just CHECK constraints according to the SQL Standard. With most DBMSs, though, NOT NULL constraints are effectively quite different from ordinary CHECK constraints in these ways:

NOT NULL constraints are usually unnamed and can't be dropped.
Defining a column as NOT NULL has an effect on the way that the column values are stored.
NOT NULL constraints are sometimes compulsory.

NOT NULL affects storage because it is impossible to store a NULL value in the column itself. For instance, if a column is defined as SMALLINT, then it may contain any negative or positive number that fits in 16 bitsbut there is no room for a reserved 16-bit value that could mean NULL. Therefore the DBMS must have a flag outside the column itself that says either "this is NULL" or "this is not NULL." In Microsoft, this flag is part of a bit list at the start of the row. In IBM and most other DBMSs, the flag is a byte at the start of the column. Either way, a column that can contain NULL takes more space to store than does a NOT NULL column.

A column that can contain NULL also takes slightly longer to retrieve, because you must always add a check for a NULL indicator to your program before processing the value itself. The severest effect is seen with Sybase, because Sybase silently changes fixed- size columns (such as CHAR) into variable-length columns (such as VARCHAR) if they can contain NULL. The result is that with unindexed columns, most DBMSs handle columns that can contain NULL slower than NOT NULL columns. On the other hand, with indexed columns, Oracle handles nullable columns faster.

Some DBMSs have one or more gotchas that make columns that can contain NULL unwieldy. For example:

They make the assumption that all column definitions imply NOT NULL.
They require that primary keys be explicitly declared NOT NULL. ^[1]

^[1] Early versions of MySQL required that all columns in any index must be defined as NOT NULL. This defect does not exist in MySQL version 3.23.
They refuse to allow multiple occurrences of NULL when a column is defined as UNIQUE. Some DBMSs have conflicting rules in this area because they treat a UNIQUE constraint differently from a UNIQUE INDEX key. Of the Big Eight, only IBM does this. With IBM, a UNIQUE constraint column must be defined as NOT NULL and therefore disallows any NULL values. A column that is part of a UNIQUE INDEX key, though, is allowed one NULL value.
They do not allow NOT NULL constraints to be dropped. This is a tricky area because, although the SQL Standard doesn't distinguish between NOT NULL in a column definition and a NOT NULL constraint, some DBMSs do. The difference lies in distinguishing between NOT NULL as a column attribute that cannot be changed and CHECK (<column> IS NOT NULL) as a constraint that can be dropped. The SQL Standard considers the two to be the same. That is, with SQL:1999, these two definitions both result in a named constraint that can be dropped:
```
 Case #1: CREATE TABLE Table1 (    column1 INTEGER NOT NULL) Case #2: CREATE TABLE Table1 (    column1 INTEGER,    CHECK (column1 IS NOT NULL)) 
```
IBM, Ingres, and InterBase consider Case #1 and Case #2 to be different. For these DBMSs, the first CREATE TABLE statement results in a column (column1) with a NOT NULL attribute, and this attribute may not be changeda clear violation of the SQL Standard. These DBMSs also support the second CREATE TABLE statement, which results in a named constraint that can be droppedas the SQL Standard requires. Informix, Microsoft, Oracle, and Sybase correctly conform to the SQL Standard for both Case #1 and Case #2. MySQL is SQL Standard-compliant for Case #1that is, MySQL lets you use the ALTER TABLE statement to change a nullable column to NOT NULL and vice versa. However, MySQL doesn't support named CHECK constraints and so is not SQL Standard-compliant for Case #2.

All four of these gotchas are flat-out violations of SQL:1999, but we can't do anything about that. Until these "rogue DBMSs" conform to the SQL Standard, portable programs should contain NOT NULL declarations in all or most column definitions. Table 10-1 shows which of the Big Eight treat NULL in a nonconformant fashion.

Notes on Table 10-1:

Auto NOT NULL column

This column is "Yes" if the DBMS assumes that column1 in this table definition is a NOT NULL column rather than a "NULL allowed" column as required by the SQL Standard:
```
 CREATE TABLE Table1 (    column1 INTEGER) 
```

Table 10-1. ANSI/DBMSs and NULL Gotchas

	Auto NOT NULL	Drop NULL	Multiple NULL	Force NOT NULL PKEY Column	Force NOT NULL PKEY Table
ANSI SQL	No	Yes	Many	No	No
IBM	No	No	One	Yes	Yes
Informix	No	Yes	One	No	No
Ingres	No	No	Zero	Yes	Yes
InterBase	No	No	Zero	Yes	Yes
Microsoft	Yes	Yes	One	No	No
MySQL	No	Yes	Many	No	Yes
Oracle	No	Yes	Many	No	No
Sybase	Yes	Yes	One	No	Yes

Drop NULL column

This column is "No" if the DBMS doesn't allow you to drop a NOT NULL constraint as required by the SQL Standard.
Multiple NULL column

This column is "Zero" if the DBMS doesn't allow any NULLs in a UNIQUE column; "One" if the DBMS allows only one NULL in a UNIQUE column; and "Many" if the DBMS allows multiple NULLs to be inserted into a UNIQUE column as required by the SQL Standard. (Note: By "UNIQUE column," we mean a column that is part of a UNIQUE constraint or a column that is part of a UNIQUE INDEX key. We do not mean a column that is part of a PRIMARY KEY constraint, though such columns are also constrained to contain only unique values. The difference between the two is that the SQL Standard allows UNIQUE columns to contain NULLs but disallows any NULLs in PRIMARY KEYs.)
Force NOT NULL PKEY Column column

This column is "Yes" if the DBMS requires an explicit NOT NULL constraint when you define a PRIMARY KEY column constraint. That is, the DBMS requires this definition:
```
 CREATE TABLE Table1 (    column1 INTEGER NOT NULL PRIMARY KEY) 
```
rather than this definition as required by the SQL Standard: ^[2]

^[2] We should note that Entry Level SQL-92the lowest conformance levelrequires the explicit NOT NULL definition we're deploring here and in the next paragraph. Because each of the DBMSs that force this syntax declares its SQL compliance level to be Entry Level SQL-92, it is not in violation of the outdated Standard. But SQL-92 was superseded in 1999, and Core SQL:1999 specifies that NOT NULL is to be implied because non-nullability is inherent in the definition of a primary key just as uniqueness is.
```
 CREATE TABLE Table1 (    column1 INTEGER PRIMARY KEY) 
```
Force NOT NULL PKEY Table column

This column is "Yes" if the DBMS requires an explicit NOT NULL constraint when you define a PRIMARY KEY table constraint. That is, the DBMS requires this definition:
```
 CREATE TABLE Table1 (    column1 INTEGER NOT NULL,    CONSTRAINT Constraint1 PRIMARY KEY (column1)) 
```
rather than this definition as required by the SQL Standard:
```
 CREATE TABLE Table1 (    column1 INTEGER,    CONSTRAINT Constraint1 PRIMARY KEY (column1)) 
```

NULL Indicators

Even if you define every column in every table as NOT NULL, you still need to add NULL indicator checks to your programs because there's still a possibility that NULLs might be returned. For example, these three SQL statements could return NULLs under some circumstances:

 SELECT MAX(column1) FROM Table1 SELECT Table1.column1, Table2.column1    FROM Table1 LEFT JOIN Table2         ON Table1.column1 = Table2.column1 SELECT SUM(column1) FROM Table1    WHERE column2 > 100 /* returns NULL if no column2 value is greater than 100 */

The Bottom Line: NOT NULL Constraints

To speed things up, define columns and domains with NOT NULL as often as possible.

Even if your definitions always include NOT NULL, make sure your program allows for the possibility of retrieved NULLs. Add NULL indicator checks for every OUTER JOIN retrieval and wherever it's possible that a query might be searching an empty table.