## NumbersThere are three kinds of numbers: integers, floats (approximate numbers), and fixed-decimal numbers. (Technically, integers are a type of fixed-decimal number with a scale of zero, but storage methods for integers and decimal numbers differ so we'll talk about them separately.) There is also a number type that auto-incrementswe'll call this a serial. All DBMSs except Oracle use similar numeric formats, so in our discussion of numeric data types, we'll talk about the "normal" DBMSs first and relegate Oracle's treatment of numbers to a sidebar. ## IntegersThe SQL Standard provides two data types for columns that contain integers: INTEGER (or INT) and SMALLINT. Table 7-5 shows the SQL Standard requirements and the level of support (data type and size in bytes) the Big Eight have for these data types. ## Table 7-5. ANSI/DBMS Integer Support
Notes on Table 7-5: -
INTEGER column A 32-bit binary (four bytes) allows for the range 2147483648 to +2147483647. ANSI and Informix disallow the value 2147483648 (hexadecimal 80000000). -
SMALLINT column A 16-bit binary (two bytes) allows for the range 32768 to +32767. ANSI and Informix disallow the value 32768 (hexadecimal 8000). -
BIGINT/LONGINT column A 64-bit binary (eight bytes) allows for the range 223372036854775808 to +9223372036854775807. Informix calls this data type INT8. -
TINYINT/INTEGER1 column An 8-bit binary (one byte) allows for the range 128 to +127. Microsoft and Sybase support unsigned TINYINT only, with a range from 0 to +255. MySQL supports both signed TINYINT (range 128 to +127) and unsigned TINYINT (range 0 to +255). -
Oracle accepts columns defined as INTEGER or SMALLINT but treats all numbers differently from other DBMSs; see the sidebar "Oracle Numbers."
Any number that has a scale of zero (no digits after the decimal point) should be in an integer column, because integer arithmetic is faster than arithmetic with a decimal point. In fact, the fastest arithmetic occurs with the specific SQL data type INTEGER because on a 32-bit machine (such as a Pentium), 32 bits is the native word size. Thus, for example: CREATE TABLE Table1 ( column1 SMALLINT) INSERT INTO Table1 ... SELECT column1 * 1234 FROM Table1 is slower than: CREATE TABLE Table1 ( column1 INTEGER) INSERT INTO Table1 ... SELECT column1 * 1234 FROM Table1 GAIN: 5/8 although the difference is unnoticeable unless millions of calculations happen. The INTEGER data type is also the default data type of any literal that contains all digits. Consider the expression: ... WHERE column1 = 555 If The other integer data types have special advantages though: -
SMALLINT is the only other SQL Standard data type and is just half the size of INTEGER. -
TINYINT is the smallest and is still large enough for personal information, but beware: Even seemingly restricted data like "number of wives" or "height in centimeters" could cause overflow. (Solomon had 700 wives, and Goliath was six cubits and a span, or about 290 centimeters tall.) -
BIGINT is appropriate for numbers larger than two billion, though working with compilers that can't handle such large numbers could be troublesome .
Here's another consideration: At the beginning of this chapter, we said it's important to choose a data type that won't allow impossible values for a column. For example, if you have a list of mileages between world cities, using SMALLINT (and perhaps UNSIGNED if the DBMS allows it) gives you a free check constraintit should be impossible to insert the absurd value
If input errors are a serious worry, only an explicit CHECK constraint will do the job right. If you decide to define Why are telephone numbers stored in CHAR(12) columns instead of BIGINT columns? The answer is that, although a CHAR(12) column is longer, there are built-in functions for all the operations that are likely to be performed on telephone numbers: LIKE, SUBSTRING, SIMILAR, and so on. Meanwhile, all the built-in numeric operators (+ / *) are useless for telephone numbers. The general rule in such cases is that a column should have a non-numeric data type if all appropriate operations for it are non-numeric. A similar generality can be stated for the temporal data types. Recommendation: Prefer INTEGER for integers unless maximum valuesincluding those from arithmetic operationsexceed the INTEGER range. ## FloatsThe SQL Standard provides three data types for columns that contain floats: REAL, FLOAT, and DOUBLE PRECISION (or DOUBLE). Table 7-6 shows the SQL Standard requirements and the level of support (data type and precision) the Big Eight have for these data types. Notes on Table 7-6: -
The letters IEEE mean "according to the IEEE 754 Standard for Binary Floating-Point Arithmetic." Informix and PostgreSQL use "the native C float," which coincidentally corresponds to IEEE 754 for any common C compiler. -
IEEE single storage is 32 bits (four bytes). It allows for seven-digit precision, with a usual range from 3.402E+38 to 1.175E37 for negative numbers, zero, and from +1.175E-37 to +3.402E+38 for positive numbers. -
REAL and FLOAT(n)where `n <= 23`are usually synonymous and refer to a 32-bit floating-point number, IEEE single precision. InterBase, though, allows you to define a column with a specific precision for FLOAT, for example:CREATE TABLE Table1 ( column1 FLOAT(20)) but ignores the precision specified. InterBase `FLOAT(n)`is always 64-bit IEEE double precision even though FLOAT alone is always 32-bit IEEE single precision. -
IEEE double storage is 64 bits (eight bytes). It allows for 15-digit precision, with a usual range from 1.798E+308 to 2.225E-307 for negative numbers, zero, and from +2.225E-307 to +1.798E+308 for positive numbers. -
DOUBLE PRECISION and FLOAT(n)where `n BETWEEN 24 AND 53`are usually synonymous and refer to a 64-bit floating-point number, IEEE double precision. For MySQL, REAL is synonymous with DOUBLE, rather than with FLOAT. -
Oracle accepts columns defined as REAL, FLOAT, or DOUBLE PRECISION, but treats all numbers differently from other DBMSs; see the sidebar "Oracle Numbers."
## Table 7-6. ANSI/DBMS Float Support
The official SQL term for FLOAT, REAL, and DOUBLE PRECISION values is "approximate numeric" but "floating point" or simply "float" is common. The key point for all such values is that the decimal point is floating. (If the decimal point is fixed, see the next section.) You should use float literals when working with floats. For example, use this type of expression: UPDATE Table1 SET float_column = 1.25E02 instead of: UPDATE Table1 SET float_column = 125 Portability MySQL won't accept a float literal unless it has a two-digit exponent. That is, the literal 1.25E02 is acceptable, but 1.25E2 is not. All other DBMSs allow you to drop the leading zero. Floating-point operations are fast if they go through the computer's Floating Point Unit (FPU), but a compiler can make the cautious assumption that no FPU is present. In that case, floating-point operations are slow because they are emulated instead of performed with the FPU. When you install a DBMS, the installer should detect the FPU automatically and bring in the right code, whether FPU dependent or emulated, so make sure you rerun the install program after hardware upgrades or moves. Take another look at Table 7-6 and the precisions shown for the float data types. The range of IEEE single-precision float is from 1.175E-37 to +3.402E+38, to 7 decimal digits precision, although some DBMSs are more cautious in stating the actual range supported. The range of IEEE double-precision float is from 2.225E-307 to +1.798E+308, to 15 decimal digits precision. Again, some DBMSs give a slightly smaller range. These sizes are shown in Table 7-7. Table 7-7 shows that the range of a single-precision float is 1.175E-37 to +3.402E+38. In reality it isn't possible to store all the real numbers in that range in a four-byte space; it isn't even possible to store all the integers in that range in four bytes (the range of a four-byte INTEGER is from 2.14E9 to +2.14E9). So for most numbers in the single-precision range, you'll need to use whatever number is closest that can be represented in a single float. In other words, a floating-point number is exact in bit combinationsthat is, all bit combinations are exactbut it might not be exactly the same as the number that was inserted originally. Hence the name approximate. The question that arises from this isIs it better to use DOUBLE PRECISION for float columns or REAL? (We'll ignore FLOAT entirely because REAL and DOUBLE are just synonyms for predefined sizes of FLOAT.) We checked the Big Eight to see what happens if the same number is stored both ways. First, we created this table: CREATE TABLE Table1 ( real_column REAL, double_column DOUBLE PRECISION) ## Table 7-7. IEEE 754 Floats
Then we inserted the same number into both columns and selected: INSERT INTO Table1 VALUES (0.01, 0.01) SELECT * FROM Table1 WHERE real_column = 0.01 /* result is zero rows, "incorrect" */ SELECT * FROM Table1 WHERE double_column = 0.01 /* result is one row, "correct" */ Most DBMSs returned the "correct" result for the second SELECT, but were unable to find a row where Recommendation: Prefer DOUBLE PRECISION for floats. ## DecimalsThe SQL Standard provides two data types for columns that contain fixed-decimal numbers: DECIMAL and NUMERIC. Table 7-8 shows the SQL Standard requirements and the level of support (data type and maximum precision in digits) the Big Eight have for these data types. ## Table 7-8. ANSI/DBMS Decimal Support
Notes on Table 7-8: -
IBM, Informix, Ingres, Microsoft, Oracle, and Sybase store decimal values as packed decimal values; see the sidebar "Oracle Numbers." -
InterBase stores decimal values inside SMALLINTs, INTEGERs, BIGINTs, or FLOATs, depending on the defined size. The longest string of 9s that could fit in a BIGINT is 999999999999999999 (eighteen 9s) so DECIMAL(18) is the largest precise fixed-decimal number that such a trick can accommodate. -
MySQL stores decimal values as unpacked floating-point numbersthat is, MySQL decimals are stored as strings, using one character for each digit of the value, plus another character for the sign and decimal point if applicable . The maximum number of digits that can be specified is 254, but the actual range for both DECIMAL and NUMERIC is the same as for a double-precision float (see DOUBLE PRECISION in Table 7-6).
A fixed-decimal number is one that has a fixed-decimal precision. For example, DECIMAL(7,2) has a precision of seven digits, with a fixed scalethe number of digits after the decimal pointof two digits. If the data type of a column is DECIMAL, then the actual precision must be at least as big as the defined precision. If the data type of a column is NUMERIC, then the actual precision should be exactly the same as the defined precision, but in practice most DBMSs treat NUMERIC as a synonym for DECIMAL. Any numeric literal can take three forms, as shown in Table 7-9. Notice that Table 7-9 shows that the number All DBMSs except for InterBase and MySQL store DECIMAL values as a packed string of decimal digits, usually with two decimal digits per byte; see the sidebar "Oracle Numbers." InterBase, as mentioned earlier, stores decimal values inside integers, while MySQL stores them as unpacked strings. ## Table 7-9. Forms of a Numeric Literal
The primary advantage of DECIMAL is that it is easy to cast to CHAR, because the number isn't stored in binary form. The primary disadvantage is that DECIMAL values must be converted to binary form before some arithmetic operations can be performed on them. In general the conversion requires only your patiencethe DBMS handles multiple-digit arithmetic slowly, but it gets there. One exception exists, though: MySQL avoids the slowness of multidigit arithmetic by converting to floats when one of the operands is decimal. So this work around is primarily of interest to MySQL users: -
Define two INTEGER columns instead of one DECIMAL column. For example, instead of Definition #1, use Definition #2: Definition #1: CREATE TABLE Table1 ( column1 DECIMAL(7,5), ...) Definition #2: CREATE TABLE Table1 ( column1_pre_decimal_point INTEGER, column1_post_decimal_point INTEGER, ...)
To add 7.35 to such a divided number, add 35 to We still live in an era where the questionShould I use floats for decimals?makes some sense, but the era is fast approaching its end. The fact is that most sensible bean-count values (i.e., money) can be stored and manipulated as integers now, and with 64-bit processors the values are trillions. The only reason to use floats for dollars is that some host languages still have no equivalent for DECIMAL. It is notable that all the DBMSs that support a MONEY data type store MONEY internally as a DECIMAL with a predefined scale. Recommendation: Prefer DECIMAL for fixed-decimal numbers and for most floats.
## SerialsThe SQL:1999 Standard does not provide a data type for columns that contain "serial" numbers but most DBMSs do provide support for auto-incremented, or monotonic, values. Table 7-10 shows the level of support the Big Eight have for serial data types. ## Table 7-10. ANSI/DBMS Serial Support
Notes on Table 7-10 -
Data Type column What is the serial attribute and/or data type? This column is "N/A" if no serial data type is supported and otherwise shows the name of the serial data type and necessary serial attribute, if required. For example, Informix supports serial data on columns defined with the SERIAL or SERIAL8 data types, while Sybase supports serial data on columns defined as NUMERIC(10,0) IDENTITY. -
Sequence Generator column What is the sequence generator? This column is "N/A" if serial data is supported via the serial attribute and/or data type shown in the Data Type column and otherwise shows the name of the function or statement the DBMS uses to generate sequences. For example, Oracle's CREATE SEQUENCE statement creates a SEQUENCE object. The next version of the SQL Standard is expected to support a SEQUENCE object as well.
The various "serial" data types are usually 4-byte integers, with values assigned by the DBMS. The idea is that the DBMS will increment the value for each row you INSERT. Serial data types are useful for ensuring that each row of a table has a unique identifier. For example, you could create a table with Informix's nonstandard SQL-extension serial data type: CREATE TABLE Table1 ( column1 SERIAL PRIMARY KEY, column2 INTEGER) Because the value of The problem with serials is that they can cause trouble with concurrency control; see Chapter 15, "Locks." Recommendation: Prefer INTEGER for serials and do your own value assignments. ## The Bottom Line: NumbersAll DBMSs except Oracle use similar numeric formats, so look for our Oracle-specific recommendations only if you use Oracle. Any number with a scale of zero (no digits after the decimal point) should be in an INTEGER column. INTEGER is the default data type of any literal that contains all digits. SMALLINT is the only other SQL Standard data type and is usually just half the size of INTEGER. TINYINT is the smallest integer data type and is still large enough for most personal information. BIGINT is appropriate for numbers larger than two billion though working with compilers that can't handle such large numbers could be problematic . Floating-point operations are fast if they go through the computer's FPU. When you install a DBMS, the installer should detect the FPU automatically and bring in the right code, whether FPU dependent or emulated, so make sure you rerun the install program after hardware upgrades. The primary advantage of DECIMAL is that it is easy to cast to CHAR, because the number isn't stored in binary form. The primary disadvantage of DECIMAL is that it must be converted to binary form before some arithmetic operations can be performed on it. Serial data types are useful for ensuring each row of a table has a unique identifier. The problem with serials is that they can cause trouble with concurrency control. Recommendation if you don't use Oracle: Prefer INTEGER for integers unless maximum valuesincluding those from arithmetic operationsexceed the INTEGER range. Prefer DECIMAL for fixed-decimal numbers and for most floats. Prefer DOUBLE PRECISION for floats where DECIMAL isn't appropriate. Prefer INTEGER for serials and do your own value assignments. Recommendation if you use Oracle: Data type doesn't really matter because all numbers are stored the same. Avoid multiply and divide because the packed decimal numbers will have to be converted before they can be operated on. Allow free space on each page because rows are all variable-length. Don't worry about floating-point storage because it's not there. |

SQL Performance Tuning

ISBN: 0201791692

EAN: 2147483647

EAN: 2147483647

Year: 2005

Pages: 125

Pages: 125

Authors: Peter Gulutzan, Trudy Pelzer

Similar book on Amazon

flylib.com © 2008-2017.

If you may any questions please contact us: flylib@qtcs.net

If you may any questions please contact us: flylib@qtcs.net