14.4 Testbed performance analysis testing


A fair comparison of the four databases requires a neutral benchmark that does not take advantage of any special feature of any one database. To this end, our team reviewed the current benchmarks published by the Transaction Processing Performance Council (TPC, www.tpc.org). Three candidate benchmarks were found: TPC-C, TPC-H, and TPC-R. Of the three, TPC-C version 5 is designed as the latest OLTP benchmark; however, the TPC had not yet made it available for public use, so TPC-H and TPC-R were examined instead. Both of these benchmarks target decision-support databases in data warehousing, and they turned out to be variants of the same benchmark: the only difference between the two is their implementation rules.

Given the limited time the group had to learn the benchmarks, learn the databases, and still do any real analysis, the performance evaluation team decided it would be best to use TPC-H, since it left fewer options to worry about during implementation. The benchmark consists of a suite of business-oriented ad hoc queries and concurrent data modifications; its main purpose is to examine large volumes of data by executing queries with a high degree of complexity.

The next task was to determine what type of workloads the databases would run and how these loads would be run.

14.4.1 Workloads

The key concern in benchmarking a system is the specification of the workload. The workload of a computer is defined as the set of all inputs the system receives from its environment. The teams used the queries defined in the TPC-H benchmark (Table 14.4) as the basic workload; a representative query is sketched after the table.

Table 14.4: TPC-H Benchmark

Query 1-Pricing Summary Report

This query will select a pricing summary report for all line items shipped as of a given date (a substitution variable). The date is within 60 to 120 days of the greatest ship date contained in the database. A count of the number of line items is included in each group.

Query 2-Minimum Cost Supplier

This query will find, in a given region for each part of a certain type and size, the supplier that can supply it at the lowest cost. If multiple suppliers in that region offer the same lowest price for the part, the query will list the parts from the suppliers with the 100 highest account balances.

Query 3-Shipping Priority

This query will determine the shipping priority and potential revenue, defined as the sum of the extended price of the orders having the largest revenue among those that had not been shipped as of a given date. If more than ten unshipped orders exist, only the ten orders with the largest revenue are listed.

Query 4-Order Priority Checking

This query will count the number of orders that were ordered in a given quarter of a given year in which at least one line item was received later than its committed date.

Query 5-Local Supplier Volume

This query will list, for each country in a region, the revenue volume that resulted from line item transactions in which the customer ordering parts and the supplier filling them were both in the same country. The query only considers parts ordered in a certain year.

Query 6-Forecasting Revenue Change

This query will quantify the amount of revenue increase that would have resulted from eliminating certain company-wide discounts in a given percentage range in a given year.

Query 7-Volume Shipping

This query will determine the value of goods shipped between certain countries to help in the renegotiation of shipping contracts.

Query 8-National Market Share

This query will determine how the market share of a given country within a given region has changed over two years for a given part type.

Query 9-Product Type Profit Measure

This query determines how much profit is made on a given line of parts, broken out by supplier country and year.

Query 10-Returned Item Reporting

This query identifies customers who might be having problems with the parts that are shipped to them.

Query 11-Important Stock Identification

This query finds the most important subset of suppliers' stock in a given country.

Query 12-Shipping Modes and Order Priority

This query determines whether selecting less expensive modes of shipping is negatively affecting the critical-priority orders by causing more parts to be received by customers after the committed date.

Query 13-Customer Distribution

This query will determine the relationships between customers and the size of their orders.

Query 14-Promotion Effect

This query will find the percentage of revenue in a year from promotional parts (the time period is a substitution parameter selected when creating the query with the QGEN application using the Seed variable).

Query 15-Top Supplier

This query will find the supplier that contributed the most revenue for all parts shipped during a specific time period (the time period is a substitution parameter selected when creating the query with the QGEN application using the Seed variable).

Query 16-Parts/Supplier Relationship

This query will find the count of suppliers that can supply parts that meet particular customer requirements. The brand, type, and product sizes are substitution parameters selected when creating the query with the QGEN application using the Seed variable.

Query 17-Small Quantity/Order Revenue

This query will examine line items and parts for a given brand and container type, determine the average quantity of such parts ordered over the seven-year period covered by the database, and report the revenue from orders whose quantity is less than 20 percent of that average (the brand and container are substitution parameters selected when creating the query with the QGEN application using the Seed variable).

Query 18-Large-Volume Customer

This query will find the top 100 customers who have ever placed a large-quantity order (the quantity is the substitution parameter selected when creating the query with the QGEN application using the Seed variable).

Query 19-Discounted Revenue

This query will find the gross discounted revenue for all orders for three different types of parts (the part type, container, quantity, ship mode, and shipping instructions are substitution parameters selected when creating the query with the QGEN application using the Seed variable).

Query 20-Potential Part Promotion

This query will find the suppliers that have an excess of a given part available for a specific year (the part name and date are the substitution parameters selected when creating the query with the QGEN application using the Seed variable).

Query 21-Suppliers That Kept Orders Waiting

This query will find the suppliers, for a given country, whose product was part of a multiple supplier order where they failed to meet the committed delivery date (the country is a substitution parameter selected when creating the query with the QGEN application using the Seed variable).

Query 22-Global Sales Opportunity

This query will find the customers within a specific set of country codes who have not placed orders for seven years but still have a positive balance (the country codes are substitution parameters selected when creating the query with the QGEN application using the Seed variable).
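To make the workload concrete, the following is roughly the text of Query 1 (the pricing summary report) as QGEN emits it. The 90-day delta shown here stands in for the substitution variable, which QGEN draws from the range noted above using the Seed, and the tpch schema prefix matches the one used in Table 14.5:

-- Query 1: pricing summary report; DELTA = 90 days stands in for the
-- substitution variable
SELECT l_returnflag,
       l_linestatus,
       SUM(l_quantity) AS sum_qty,
       SUM(l_extendedprice) AS sum_base_price,
       SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
       SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
       AVG(l_quantity) AS avg_qty,
       AVG(l_extendedprice) AS avg_price,
       AVG(l_discount) AS avg_disc,
       COUNT(*) AS count_order        -- count of line items in each group
FROM tpch.lineitem
WHERE l_shipdate <= DATE '1998-12-01' - INTERVAL '90' DAY
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus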

14.4.2 Preparing for the testing

In order to ensure that the testing was standard, one of the performance tests in the TPC-H benchmark was chosen and modified. The planned modifications were the insertion of refreshes, as required by the TPC-H specification, and the use of indexing. Thus, two runs would be done: one without indexing and one with indexing, each including refreshes. Refreshes are required by the TPC-H specification, but where they fall among the queries is left to the tester. To ensure that all databases ran the queries in the same order, performance test #1 (Appendix A of the TPC-H benchmark) was used with three predetermined refreshes.
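For reference, the two refresh functions the TPC-H specification defines can be sketched as follows. The staging tables new_orders, new_lineitem, and old_sales are hypothetical names for the update and delete sets that DBGEN generates as flat files; a real run would load those files into such tables first:

-- RF1 ("new sales"): insert the rows from DBGEN's update set
INSERT INTO tpch.orders   SELECT * FROM tpch.new_orders
INSERT INTO tpch.lineitem SELECT * FROM tpch.new_lineitem

-- RF2 ("old sales"): remove the orders named in DBGEN's delete set,
-- deleting child line items before their parent orders
DELETE FROM tpch.lineitem
 WHERE l_orderkey IN (SELECT o_orderkey FROM tpch.old_sales)
DELETE FROM tpch.orders
 WHERE o_orderkey IN (SELECT o_orderkey FROM tpch.old_sales)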

No indexing

The nonindexed run was intended as a baseline against which to compare the indexed run. The group knew that indexing would greatly decrease the time needed to complete the performance test but wanted a quantitative result. At the time of this writing, no team had successfully completed a nonindexed run on any of the database systems; the procedures attempted are discussed later in the chapter. The primary reason for not completing a nonindexed test was lack of time. It was, however, possible to complete an individual test of each query in the performance test with no refreshes. Those results are described in greater detail in section 14.5.

Indexing

As can be deduced from the previous section, since the nonindexed run was never completed, neither was the indexed run. Representatives from each of the teams nevertheless met to determine what should have been indexed. Their work is presented in Table 14.5 to give future testers a starting point; a quick way to verify that such indexes are actually used is sketched after the table.

Table 14.5: Proposed Indexes for Benchmark Tests

Foreign Keys

CREATE INDEX tpch.c_nk ON tpch.customer(c_nationkey ASC)

CREATE INDEX tpch.s_nk ON tpch.supplier(s_nationkey ASC)

CREATE INDEX tpch.ps_pk ON tpch.partsupp(ps_partkey ASC)

CREATE INDEX tpch.ps_sk ON tpch.partsupp(ps_suppkey ASC)

CREATE INDEX tpch.l_ok ON tpch.lineitem(l_orderkey ASC)

Primary Keys

CREATE UNIQUE INDEX tpch.c_ck ON tpch.customer(c_custkey ASC)

CREATE UNIQUE INDEX tpch.p_pk ON tpch.part(p_partkey ASC)

CREATE UNIQUE INDEX tpch.s_sk ON tpch.supplier(s_suppkey ASC)

CREATE UNIQUE INDEX tpch.o_ok ON tpch.orders(o_orderkey ASC)

CREATE UNIQUE INDEX tpch.ps_pk_sk ON tpch.partsupp(ps_partkey ASC, ps_suppkey ASC)

CREATE UNIQUE INDEX tpch.ps_sk_pk ON tpch.partsupp(ps_suppkey ASC, ps_partkey ASC)

Useful Date Fields

CREATE INDEX tpch.o_od ON tpch.orders(o_orderdate ASC)

CREATE INDEX tpch.l_sd ON tpch.lineitem(l_shipdate ASC)
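One way to confirm that the optimizer actually picks up these indexes, on the Microsoft SQL Server configuration at least, is to display the estimated execution plan without running the query; other systems offer analogous facilities (e.g., an EXPLAIN statement). A minimal sketch, with an illustrative query:

-- show the estimated plan instead of executing the statements
SET SHOWPLAN_TEXT ON
GO
-- a date-range query that should show an index seek on tpch.o_od
SELECT COUNT(*) FROM tpch.orders WHERE o_orderdate >= '1995-01-01'
GO
SET SHOWPLAN_TEXT OFF
GO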

Running all queries together

A Persistent Stored Modules (PSM) committee was tasked with creating a file that would run all of the queries, in the TPC-H order defined in Appendix A of the standard, for all the teams. This committee needed to determine how to record the time each query ran, how to record the total time, and how the refreshes would be handled; the timing logic is sketched below. Unfortunately, the only team able to use the file was the Microsoft SQL Server team. The other teams would have had to modify the file extensively to make it work on their databases. Due to lack of time, the decision was made not to do this and instead to run the queries individually.
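The committee's file itself is not reproduced here, but the per-query timing it required can be sketched in the SQL Server dialect it targeted. The log table and variable names below are illustrative, not the committee's:

-- log table holding one row per query execution
CREATE TABLE tpch.run_log (query_no INT, started_at DATETIME, ended_at DATETIME)

DECLARE @t0 DATETIME
SET @t0 = GETDATE()
-- ... text of Query 1 goes here ...
INSERT INTO tpch.run_log VALUES (1, @t0, GETDATE())
-- repeat for queries 2 through 22, placing the three predetermined
-- refreshes at their points in the sequence

-- total elapsed time for the whole run, in seconds
SELECT DATEDIFF(second, MIN(started_at), MAX(ended_at)) AS total_seconds
FROM tpch.run_log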

14.4.3 Testbed procedures for each configuration

Four basic procedures were needed to run the benchmark on each configuration. First, the database and its tables were created. Second, the newly created tables were populated with the benchmark test data. Third, several sample runs of the individual queries were made to verify that the systems were running properly and to give each team a chance to optimize its system before the test. Finally, the performance tests were run on each system. The first two steps are illustrated below.
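As an illustration of the first two steps, the following creates the smallest table in the TPC-H schema and bulk-loads the pipe-delimited flat file that DBGEN produces for it. The syntax is SQL Server's, the file path is illustrative, and the row terminator absorbs the trailing field separator DBGEN writes at the end of each line:

-- step 1: region, the smallest table in the TPC-H schema
CREATE TABLE tpch.region (
    r_regionkey INTEGER      NOT NULL,
    r_name      CHAR(25)     NOT NULL,
    r_comment   VARCHAR(152)
)

-- step 2: load DBGEN's region.tbl; fields are separated by '|' and
-- each line ends with a trailing '|' before the newline
BULK INSERT tpch.region
FROM 'C:\tpch\region.tbl'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '|\n')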


