Join Considerations

Team-Fly    

 
Sams Teach Yourself SQL in 24 Hours, Third Edition
By Ronald R. Plew, Ryan K. Stephens
Table of Contents
Hour 13.  Joining Tables in Queries


Several things should be considered before using joins. Some considerations include what columns (s) to join on, whether there is no common column to join on, and performance issues. More joins in a query means the database server has to do more work, which means that more time is taken to retrieve data. Joins cannot be avoided when retrieving data from a normalized database, but it is imperative to ensure that joins are performed correctly from a logical standpoint. Incorrect joins can result in serious performance degradation and inaccurate query results. Performance issues are discussed in more detail in Hour 18, "Managing Database Users."

Using a BASE TABLE

What to join on? Should you have the need to retrieve data from two tables that do not have a common column to join, you must use another table that has a common column or columns to both tables to join on. That table becomes the BASE TABLE. A BASE TABLE is used to join one or more tables that have common columns, or to join tables that do not have common columns. Use the following three tables for an example of a base table:

CUSTOMER_TBL

 

CUST_ID

VARCHAR(10)

NOT NULL

PRIMARY KEY

CUST_NAME

VARCHAR(30)

NOT NULL

 

CUST_ADDRESS

VARCHAR(20)

NOT NULL

 

CUST_CITY

VARCHAR(15)

NOT NULL

 

CUST_STATE

VARCHAR(2)

NOT NULL

 

CUST_ZIP

INTEGER(5)

NOT NULL

 

CUST_PHONE

INYEGER(10)

   

CUST_FAX

INTEGER(10)

   

ORDERS_TBL

 

ORD_NUM

VARCHAR(10)

NOT NULL

PRIMARY KEY

CUST_ID

VARCHAR(10)

NOT NULL

 

PROD_ID

VARCHAR(10)

NOT NULL

 

QTY

INTEGER(6)

NOT NULL

 

ORD_DATE

DATETIME

   

PRODUCTS_TBL

 

PROD_ID

VARCHAR(10)

NOT NULL

PRIMARY KEY

PROD_DESC

VARCHAR(40)

NOT NULL

 

COST

DECIMAL(6,2)

NOT NULL

 

You have a need to use the CUSTOMERS_TBL and the PRODUCTS_TBL. There is no common column in which to join the tables. Now look at the ORDERS_TBL. The ORDERS_TBL has CUST_ID to join with CUSTOMERS_TBL, which also has CUST_ID. The PRODUCTS_TBL has PROD_ID, which is also in ORDERS_TBL. The JOIN conditions and results look like the following:

 graphics/input_icon.gif  SELECT C.CUST_NAME, P.PROD_DESC   FROM CUSTOMER_TBL C,   PRODUCTS_TBL P,   ORDERS_TBL O   WHERE C.CUST_ID = O.CUST_ID   AND P.PROD_ID = O.PROD_ID;  graphics/output_icon.gif CUST_NAME                      PROD_DESC ------------------------------ ----------------------- LESLIE GLEASON                 WITCHES COSTUME SCHYLERS NOVELTIES             PLASTIC PUMPKIN 18 INCH WENDY WOLF                     PLASTIC PUMPKIN 18 INCH GAVINS PLACE                   LIGHTED LANTERNS SCOTTYS MARKET                 FALSE PARAFFIN TEETH ANDYS CANDIES                  KEY CHAIN 6 rows selected. 
graphics/note_icon.gif

Note the use of table aliases and their use on the columns in the WHERE clause.


The Cartesian Product

graphics/newterm_icon.gif

The Cartesian product is a result of a CARTESIAN JOIN or "no join." If you select from two or more tables and do not JOIN the tables, your output is all possible rows from all the tables selected from. If your tables were large, the result could be hundreds of thousands, or even millions, of rows of data. A WHERE clause is highly recommended for SQL statements retrieving data from two or more tables. The Cartesian product is also known as a cross join.

The syntax is

 graphics/syntax_icon.gif FROM TABLE1, TABLE2 [, TABLE3 ] WHERE TABLE1, TABLE2 [, TABLE3 ] 

The following is an example of a cross join, or the dreaded Cartesian product:

 graphics/input_icon.gif  SELECT E.EMP_ID, E.LAST_NAME, P.POSITION   FROM EMPLOYEE_TBL E,   EMPLOYEE_PAY_TBL P;  graphics/output_icon.gif EMP_ID    LAST_NAM POSITION --------- -------- -------------- 311549902 STEPHENS MARKETING 442346889 PLEW     MARKETING 213764555 GLASS    MARKETING 313782439 GLASS    MARKETING 220984332 WALLACE  MARKETING 443679012 SPURGEON MARKETING 311549902 STEPHENS TEAM LEADER 442346889 PLEW     TEAM LEADER 213764555 GLASS    TEAM LEADER 313782439 GLASS    TEAM LEADER 220984332 WALLACE  TEAM LEADER 443679012 SPURGEON TEAM LEADER 311549902 STEPHENS SALES MANAGER 442346889 PLEW     SALES MANAGER 213764555 GLASS    SALES MANAGER 313782439 GLASS    SALES MANAGER 220984332 WALLACE  SALES MANAGER 443679012 SPURGEON SALES MANAGER 311549902 STEPHENS SALESMAN 442346889 PLEW     SALESMAN 213764555 GLASS    SALESMAN 313782439 GLASS    SALESMAN 220984332 WALLACE  SALESMAN 443679012 SPURGEON SALESMAN 311549902 STEPHENS SHIPPER 442346889 PLEW     SHIPPER 213764555 GLASS    SHIPPER 313782439 GLASS    SHIPPER 220984332 WALLACE  SHIPPER 443679012 SPURGEON SHIPPER 311549902 STEPHENS SHIPPER 442346889 PLEW     SHIPPER 213764555 GLASS    SHIPPER 313782439 GLASS    SHIPPER 220984332 WALLACE  SHIPPER 443679012 SPURGEON SHIPPER 36 rows selected. 

Data is being selected from two separate tables, yet no JOIN operation is performed. Because you have not specified how to join rows in the first table with rows in the second table, the database server pairs every row in the first table with every row in the second table. Because each table has 6 rows of data each, the product of 36 rows selected is achieved from 6 rows multiplied by 6 rows.

To fully understand exactly how the Cartesian product is derived, study the following example.

 graphics/input_icon.gif  SQL> SELECT X FROM TABLE1;  graphics/output_icon.gif X - A B C D 4 rows selected. 
 graphics/input_icon.gif  SQL> SELECT V FROM TABLE2;  graphics/output_icon.gif X - A B C D 4 rows selected. 
 graphics/input_icon.gif  SQL> SELECT TABLE1.X, TABLE2.X   2* FROM TABLE1, TABLE2;  graphics/output_icon.gif X X - - A A B A C A D A A B B B C B D B A C B C C C D C A D B D C D D D 16 rows selected. 
graphics/cautions_icon.gif

Be careful to always join all tables in a query. If two tables in a query have not been joined and each table contains 1,000 rows of data, the Cartesian product consists of 1,000 rows multiplied by 1,000 rows, which results in a total of 1,000,000 rows of data returned. Cartesian products, when dealing with large amounts of data, can cause the host computer to stall or crash in some cases based on resource usage on the host computer. Therefore, it is important for the DBA and system administrator to closely monitor for long-running queries.



Team-Fly    
Top
 


Sams Teach Yourself SQL in 24 Hours
Sams Teach Yourself SQL in 24 Hours (5th Edition) (Sams Teach Yourself -- Hours)
ISBN: 0672335417
EAN: 2147483647
Year: 2002
Pages: 275

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net