17.1 What is Oracle Partitioning?

 < Day Day Up > 



Oracle Partitioning is the splitting of data sets usually into separate physical files using separate partition tablespaces. However, partitions can actually co-exist in the same physical file. Separating partitions into separate datafiles is not essential with certain types of RAID array architectures. Separate partitions are accessible in parallel, individually or in groups. Why is partitioning relevant to tuning? Partitioning can be used to break large tables into smaller subsets. Processing of smaller subsets of data separately and in parallel is potentially much faster than serial processing on very large data sets. Partitioning and parallelism generally only applies to very large databases on high end multiple CPU server platforms. Figure 17.1 shows a picture of how partitioning can be used to split a physical datafile containing a table into multiple physical datafiles mapped individually to separate partitions within that table.


Figure 17.1: Physical Partitioning

17.1.1 Why is Oracle Partitioning Beneficial?

Oracle Partitioning is beneficial usually in very large database environments and sometimes in smaller databases, depending on applications, for the following reasons:

  • Parallel processing by way of splitting of data into separately located physical files. Separating table and index datafiles onto separate disks in a nonpartitioned database is a form of partitioning. Since table and index physical spaces are often read in parallel it is highly beneficial to read their data from separate locations. Partitioning can enhance the benefits of splitting of datafiles.

  • Different partitions within the same table can have different physical storage structures. Different partitions within a table can even be both read-write and read-only, perhaps allowing separation of archived and current data.

  • Backup, recovery, and utility usage (SQL*Loader, Export, and Import) can utilize parallel processing and activities on individual partitions.

  • The Optimizer can access individual partitions when processing SQL code. This process is termed partition pruning since partitions can potentially be removed from the database read required by a query. Additionally the Optimizer can execute against multiple partitions using parallel processing, generally on high end multiple CPU server platforms, with datafiles spread across multiple disks.

    Tip 

    Rows can also be accessed by partition name effectively allowing manual partition pruning.

  • Parallel processing with multiple CPUs and Oracle Partitioning applies to both DML and DDL commands.

17.1.2 How are Tables and Indexes Partitioned?

Partitions can be created on single or multiple columns of a table. A table can be divided up into separate partitions based on three methods. These methods are ranges of values, values in lists, and hashing algorithms on columns. Additionally partitions can be one of two composites of the subsequently mentioned partitioning methods.

A partition is divided on the basis of what is called a Partition Key. This key is internal to Oracle Database, effectively the data definition splitting table rows into separate partitions. For example, a range partition on a table could have rows separated into different partitions based on a date for each row, perhaps dividing financial data into quarters.

What are the specifics of the different partitioning methods?

  • Range Partition.   Splits rows in a table based on ranges of values, such as splitting a table of transactions into periods or quarters derived from a transaction date.

  • List Partition.   Splits rows based on lists of values, dividing rows into separate partitions based on matches between list entries and row column values.

  • Hash Partition.   A hashing algorithm separates rows based on a column specification into a specified number of separate partitions.

  • Composite Partition.   Partitions can contain subpartitions of two types. Firstly, a range partition can contain multiple hash subpartitions and secondly, a range partition can contain multiple list subpartitions.

Partitions can have indexes. How are indexes built for partitions? There are two types of partitioning indexes.

  • Local Index.   These indexes have the same structure as their relative table partitions. Local indexes are preferred due to more automated maintenance.

  • Global Index.   These indexes are created on partitioned tables but are not the same structure as the partitioning key.

    Note 

     Oracle Database 10 Grid   There is a new type of global partition index called a hash partitioned global index. It allows for an even spread of index values.

Let's now examine and experiment with different partitioning methods in my Accounts schema database. The machine used is once again my rather geriatric dual 300 MHz PII CPU box. Additionally this machine has different drives for separating different physical partition datafiles.

17.1.3 Oracle Partitioning Methods

I will use the Accounts schema to examine the usefulness of the different partitioning methods.

Partitioning by Range

The GeneralLedger table is most appropriately partitioned by date range. Partitioning by date may not make any difference to this table with respect to row inserts since rows are only added in the current period and thus the most recent partition. However, it could help performance for creating financial statements such as balance sheets. Since accounting reports of this nature are reported by date range periods it is more than apt to divide such a table based on a date range. Following is shown the division of data in the GeneralLedger table. These numbers are not exactly correct because the financial year in the Accounts data starts on March 1st, but for the purposes of this example this level of accuracy is good enough.

SELECT TO_CHAR(dte,'YYYY') "Year"       ,COUNT(TO_CHAR(dte,'YYYY')) "Entries" FROM generalledger GROUP BY TO_CHAR(dte,'YYYY') ORDER BY 1;     Year    Entries ----    ------- 1999     195888 2000     208624 2001     213660 2002     134568

New partitions can have separate tablespaces. How big is the GeneralLedger table?

COL Object FORMAT a24; COL Type FORMAT a5; SELECT segment_name "Object", segment_type "Type" , ROUND(SUM(bytes)/1024/1024) "Mb" , SUM(blocks) "Blocks" FROM dba_extents WHERE owner = 'ACCOUNTS' AND segment_name = 'GENERALLEDGER' GROUP BY segment_name, segment_type;

The following query result shows that the GeneralLedger table is sized at approximately 26 Mb.

Object           Type   Mb   Blocks -------------   -----   --   ------ GENERALLEDGER   TABLE   26     3328

Data in the GeneralLedger table covers 4 years. There are 200,000 rows for each of the 4 years from 1999 through to 2002. I will create four partitions of 6 Mb each. That should be enough space to contain the rows in each partition.

Tip 

The 6 Mb partition datafiles extended when rows were added because appropriate physical storage structure was not set for each separate partition. Partitions can have independent physical storage parameters. Physical storage tuning is covered in other chapters. This chapter focuses on partitioning.

I create four index partition tablespaces and grant unlimited quotas on all the tablespaces to the Accounts schema Oracle Database user. The years 2000, 2001, and 2002 are all on separate disk drives. Some scripts are included in Appendix A. Range partitions are created like this:

CREATE TABLE GLP (        generalledger_id    NUMBER NOT NULL       ,coa#                CHAR(5) NOT NULL       ,dr                  NUMBER(10,2) NOT NULL       ,cr                  NUMBER(10,2) NOT NULL       ,dte                 DATE NOT NULL       ,CONSTRAINT XPK_GLP PRIMARY KEY (generalledger_id)       ,CONSTRAINT FK_GLP_COA# FOREIGN KEY (coa#)           REFERENCES COA ) TABLESPACE indx PARTITION BY RANGE(dte)(     PARTITION DATAGLP1999 VALUES LESS THAN       (TO_DATE('2000-03-01','YYYY-MM-DD')) TABLESPACE         DATAGLP1999 ,PARTITION DATAGLP2000 VALUES LESS THAN       (TO_DATE('2001-03-01','YYYY-MM-DD')) TABLESPACE        DATAGLP2000 ,PARTITION DATAGLP2001 VALUES LESS THAN       (TO_DATE('2002-03-01','YYYY-MM-DD')) TABLESPACE         DATAGLP2001 ,PARTITION DATAGLP2002 VALUES LESS THAN       (MAXVALUE) TABLESPACE DATAGLP2002);     CREATE INDEX FKX_GLP_1 ON GLP(coa#) TABLESPACE indx; CREATE INDEX LK_GLP_1 ON GLP (dte, coa#) LOCAL(        PARTITION INDXGLP1999 TABLESPACE INDXGLP1999       ,PARTITION INDXGLP2000 TABLESPACE INDXGLP2000       ,PARTITION INDXGLP2001 TABLESPACE INDXGLP2001       ,PARTITION INDXGLP2002 TABLESPACE INDXGLP2002);

Now add the data from the GeneralLedger table into the partitioned table called GLP.

INSERT INTO glp SELECT * FROM generalledger; COMMIT; ANALYZE TABLE glp COMPUTE STATISTICS; ANALYZE INDEX fkx_glp_1 COMPUTE STATISTICS; ANALYZE INDEX lk_glp_1 COMPUTE STATISTICS

Now let's try some queries. This first query retrieves all rows from the nonpartitioned GeneralLedger table as a query to measure against.

EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM     generalledger;     Query                       Cost     Rows      Byte -------------------------   ----   ------   ------- 1. SELECT STATEMENT on 778 752740 1957124 2. TABLE ACCESS FULL on       GENERALLEDGER 778 752740 1957124

The next query reads only the 1999 financial year partition using partition pruning, indicated by the drastic decrease in the number of rows read. The cost is therefore much lower. The pruning out of unwanted partitions allows only the 1999 partition to be read.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT * FROM glp WHERE dte < TO_DATE('2000-01-          01','YYYY-MM-DD'); 

Note the values 1 and 1 at the tail of the Query column for the second line of the query plan. These columns are the PARTITION_START and PARTITION_STOP columns in the PLAN_TABLE, denoting the first and the last partitions read. In this case only the first partition was read.

Query                               Cost   Rows     Bytes ----------------------------        ----   ------   ------- 1. SELECT STATEMENT on               203   195888   4505424 2.  TABLE ACCESS FULL on GLP  1  1   203   195888   4505424

This next query reads even fewer rows and the cost is lower still, reading only the partition for the financial year 2002. Note once again the two numbers tailing the second line of the query plan Query column; only the fourth partition is read to satisfy the query.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT * FROM glp       WHERE dte BETWEEN TO_DATE('2002-03-01','YYYY-MM-DD')       AND TO_DATE('2003-02-28','YYYY-MM-DD');     Query                               Cost   Rows     Bytes ----------------------------        ----   ------   ------- 1. SELECT STATEMENT on               143   115657   2660111 2.  TABLE ACCESS FULL on GLP  4  4   143   115657   2660111

This proves the facts about pruning of partitions where SQL code accesses partitions containing only required data.

Tip 

Rows can also be accessed by partition name effectively allowing manual partition pruning.

Partitioning by List

In the Accounts schema the Transactions table could be split into two partitions based on the type column. The Transactions table contains both sales invoices sent to customers and purchase invoices sent from suppliers. The type column contains the value "S" for a sale and "P" for a purchase. The size of the Transactions table is about 10 Mb. The ratio of purchases to sales is 80%. This dictates the size of the tablespaces for the two partitions. Let's create the partitioned version of the Transactions table. This time I will ignore Referential Integrity and indexes, simply creating the partitioned table. We are not trying to prove anything about indexing. We are merely trying to show that partitioning by itself can increase performance dramatically, especially for full table scans and if used in appropriate places.

CREATE TABLE tsp(        transaction_id   NUMBER NOT NULL       ,type             CHAR(1) NOT NULL       ,customer_id      NUMBER NULL       ,supplier_id      NUMBER NULL       ,order_id         NUMBER NULL       ,amount           NUMBER(10,2) NOT NULL       ,dte              DATE NOT NULL       ,drcoa#           CHAR(5) NOT NULL       ,crcoa#           CHAR(5) NOT NULL) PARTITION BY LIST (type)(        PARTITION purchases VALUES ('P') TABLESPACE DATATSP       ,PARTITION sales VALUES ('S') TABLESPACE DATATSS);

Add the rows to the partition table.

INSERT INTO tsp SELECT * FROM transactions; COMMIT; ANALYZE TABLE tsp COMPUTE STATISTICS;

Once again retrieve data from the nonpartitioned table as a measure of performance.

EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM    transactions; Query                                   Cost   Rows     Bytes -------------------------------------   ----   ------   ------- 1. SELECT STATEMENT on                   297   188185   7903770 2.  TABLE ACCESS FULL on TRANSACTIONS    297   188185   7903770 

Now get all rows from the partitioned table. Fewer bytes are read with the same number of rows but the cost is very slightly higher, perhaps due to the second step in the query plan. The entire table with all partitions is being read anyway so there is no reason why reading a partitioned version of the table should be faster.

EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM tsp;     Query                                Cost     Rows   Bytes ----------------------------------   ----   ------   ------- 1. SELECT STATEMENT on                298   188185   6774660 2.  PARTITION LIST ALL on 1 2 3.   TABLE ACCESS FULL on TSP 1 2     298   188185   6774660 

Now let's go one step further and read all the rows from the partition, reading both partitions in parallel. The parallel read is suggested to the Optimizer by using the PARALLEL hint. The two separate partitions are stored in two separate tablespaces and the two datafiles mapped by the tablespace are stored on two separate disk drives. Both processors and both separate disks are assisting the performance of parallel execution. The cost is lower even though the number of rows is the same. There are only two partitions for this partitioned by list table, the PARTITION_START and PARTITION_STOP values tailing the Query column show both partitions read.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT /*+ PARALLEL(tsp, 2) */ * FROM tsp;     Query                           Cost   Rows -----------------------------   ----   ------- 1 SELECT STATEMENT on            149    188185 2  PARTITION LIST ALL on PARALLEL_      COMBINED_WITH_PARENT 1 2 3   TABLE ACCESS FULL on TSP       PARALLEL_TO_SERIAL 1 2      149   188185 

Now read only the sales transactions.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT * FROM tsp WHERE type = 'S';

In this case only the second partition is read since sales transactions occupy the second partition. Consequently the cost is much lower and fewer rows are read.

Query                               Cost   Rows Bytes ---------------------------------   ----   ------   ------- 1. SELECT STATEMENT on                54    33142   1193112 2.  TABLE ACCESS FULL on TSP 2 2      54    33142   1193112 

And now read the sales transactions and suggest parallel execution on both processors using the PARALLEL hint once again. The cost is again lower and the same number of rows is accessed.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT /*+ PARALLEL(tsp, 2) */ * FROM tsp WHERE           type = 'S';     Query Cost Rows ---------------------------   ----   ----- 1. SELECT STATEMENT on          27   33142 2. TABLE ACCESS FULL on  TSP PARALLEL_TO_SERIAL 2 2      27   33142 

Hash Partitions

A hash partition can be used when range or list partitioning is inappropriate. Most of the large tables in the Accounts schema have a potential range or list partition column in the form of a date or some kind of type column. Two tables not delimited in this fashion are the OrdersLine and TransactionsLine tables. When demonstrating how to use the Ordersline table firstly create a partitioned table.

CREATE TABLE olp(        order_id                NUMBER NOT NULL       ,seq#                    NUMBER NOT NULL       ,amount                  NUMBER(10,2) NOT NULL       ,stockmovement_id        NUMBER NOT NULL) PARTITION BY HASH(order_id) PARTITIONS 3       STORE IN(dataol1,dataol2,dataol3);

Now add the rows to the partitioned table.

INSERT INTO olp SELECT * FROM ordersline; COMMIT; ANALYZE TABLE olp COMPUTE STATISTICS;

Now query the nonpartitioned table.

EXPLAIN PLAN SET statement_id='TEST' FOR SELECT * FROM        ordersline;     Query                                   Cost   Rows     Bytes -------------------------------------   ----   ------   ------- 1. SELECT STATEMENT on                   412   540827   9194059 2.  TABLE ACCESS FULL on ORDERSLINE      412   540827   9194059 

And now query the partitioned table in parallel. The cost is halved indicating fewer rows and fewer bytes read.

EXPLAIN PLAN SET statement_id='TEST' FOR       SELECT /*+ PARALLEL(olp, 2) */ * FROM olp;     Query                                   Cost   Rows -------------------------------------   ----   ----- 1 SELECT STATEMENT on                    207   40827 2  PARTITION HASH ALL on       PARALLEL_COMBINED_WITH_PARENT 1 3 3   TABLE ACCESS FULL on OLP        PARALLEL_TO_SERIAL 1 3             207   40827 

Composite Partitions

Composite partitions can be of two forms. Those two forms are a range partition containing hash subpartitions or a range partition containing list subpartitions. Various tables in the Accounts schema could be partitioned using a range-hash partition. The Orders, Transactions, GeneralLedger, and StockMovement tables could be range-hash partitioned on the DTE column for the range partition and the primary key identifier column for the contained hash subpartitions.

CREATE TABLE glprh(        generalledger_id NUMBER NOT NULL       ,coa# CHAR(5) NOT NULL       ,dr NUMBER(10,2) NOT NULL       ,cr NUMBER(10,2) NOT NULL       ,dte DATE NOT NULL) PARTITION BY RANGE(dte) SUBPARTITION BY HASH    (generalledger_id)          SUBPARTITIONS 4 STORE IN (sp1, sp2, sp3, sp4) (PARTITION DATAGLP1999 VALUES LESS THAN         (TO_DATE('2000-01-01','YYYY-MM-DD')) TABLESPACE            DATAGLP1999 ,PARTITION DATAGLP2000 VALUES LESS THAN         (TO_DATE('2001-01-01','YYYY-MM-DD')) TABLESPACE            DATAGLP2000 ,PARTITION DATAGLP2001 VALUES LESS THAN         (TO_DATE('2002-01-01','YYYY-MM-DD')) TABLESPACE            DATAGLP2001 ,PARTITION DATAGLP2002 VALUES LESS THAN         (MAXVALUE) TABLESPACE DATAGLP2002); 

The GeneralLedger table in the Accounts schema could be rangelist partitioned on the DTE column for the range and the COA# column for the list subpartition. The Orders and Transactions tables are not functionally appropriate for range-list partitioning on their respective DTE and TYPE columns.

CREATE TABLE glprl(        generalledger_id NUMBER NOT NULL       ,coa# CHAR(5) NOT NULL       ,dr NUMBER(10,2) NOT NULL       ,cr NUMBER(10,2) NOT NULL       ,dte DATE NOT NULL) PARTITION BY RANGE(dte) SUBPARTITION BY LIST(coa#) (PARTITION DATAGLP1999 VALUES LESS THAN  (TO_DATE('2000-01-01','YYYY-MM-DD')) TABLESPACE     DATAGLP1999        (SUBPARTITION assets1999 VALUES('20001','20002')       ,SUBPARTITION liabilities1999 VALUES('10001','10002')       ,SUBPARTITION expenses1999 VALUES('50001','50002')       ,SUBPARTITION incomes1999 VALUES('60001','60002')) ,PARTITION DATAGLP2000 VALUES LESS THAN  (TO_DATE('2001-01-01','YYYY-MM-DD')) TABLESPACE     DATAGLP2000        (SUBPARTITION assets2000 VALUES('20001','20002')       ,SUBPARTITION liabilities2000 VALUES('10001','10002')       ,SUBPARTITION expenses2000 VALUES('50001','50002')       ,SUBPARTITION incomes2000 VALUES('60001','60002')) ,PARTITION DATAGLP2001 VALUES LESS THAN  (TO_DATE('2002-01-01','YYYY-MM-DD')) TABLESPACE     DATAGLP2001        (SUBPARTITION assets2001 VALUES('20001','20002')       ,SUBPARTITION liabilities2001 VALUES('10001','10002')       ,SUBPARTITION expenses2001 VALUES('50001','50002')       ,SUBPARTITION incomes2001 VALUES('60001','60002')) ,PARTITION DATAGLP2002 VALUES LESS THAN  (MAXVALUE) TABLESPACE DATAGLP2002        (SUBPARTITION assets2002 VALUES('20001','20002')       ,SUBPARTITION liabilities2002 VALUES('10001','10002')       ,SUBPARTITION expenses2002 VALUES('50001','50002')       ,SUBPARTITION incomes2002 VALUES('60001','60002')));



 < Day Day Up > 



Oracle High Performance Tuning for 9i and 10g
Oracle High Performance Tuning for 9i and 10g
ISBN: 1555583059
EAN: 2147483647
Year: 2003
Pages: 164

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net