11.4 Handling Huge Amounts of Data


If you have to work with more than just a few records, using INSERT is far too slow. As you have already learned, PostgreSQL provides a command called COPY, which you can use to insert more than just one record into a table at once.

In this section the PHP functions related to the COPY command will be discussed. You will also learn about the performance impact of using COPY.

Using COPY works slightly differently than working with a set of INSERT commands. Here's the next example:

 <?php         $connstr = "dbname=phpbook user=postgres host=localhost";         $dbh = pg_connect($connstr);         if ($dbh) {echo "connection to phpbook established ...<br>";}         echo "starting process ...<br>\n";         echo "begin: ".date('Y-m-d H:i:s')."<br>\n";         $result = pg_exec($dbh, "COPY testdata FROM stdin");         for     ($i = 0; $i < 40000; $i++)         {                 $stat = pg_put_line($dbh, "$i\t".(cos($i))."\n");                 if      (!$stat)                 {                         echo "an error has occurred<br>\n";                 }         }         pg_put_line($dbh, "\\.\n");         pg_end_copy($dbh);         echo "end: ".date('Y-m-d H:i:s')."<br>\n";         pg_close($dbh);         echo "ending process ...<br>\n"; ?> 

Before running the script you have to create a table like that:

 CREATE TABLE testdata (idx int4, val numeric(9,8)); 

After connecting to the database, the head of the COPY command is sent to the server. After that, PostgreSQL is waiting for data until \. can be found. Every line of data is submitted to the database by using the pg_put_line command. The table you want to add data to consists of two columns. The first column contains a number and the second column contains the cosine of that number. After the data has been sent to PostgreSQL, the COPY command is terminated by using pg_end_copy. To see how fast the COPY command works, the current time is displayed on the screen as well:

 connection to phpbook established ... starting process ... begin: 2001-12-12 21:57:34 end: 2001-12-12 21:57:38 ending process ... 

The entire COPY command takes not longer than four seconds, which is fast (the demo has been run on a 466Mhz laptop). Because the entire set of data is added to the table in just one transaction, there is hardly any overhead.

Let's delete the records from the table:

 phpbook=# DELETE FROM testdata; DELETE 40000 

To understand why the COPY command should be used, the next listing shows how the same result can be achieved by using INSERT commands:

 <?php         $connstr = "dbname=phpbook user=postgres host=localhost";         $dbh = pg_connect($connstr);         if ($dbh) {echo "connection to phpbook established ...<br>";}         echo "starting process ...<br>\n";         echo "begin: ".date('Y-m-d H:i:s')."<br>\n";         for     ($i = 0; $i < 40000; $i++)         {                 $stat = pg_exec($dbh, "INSERT INTO testdata VALUES                         ($i, '".(cos($i))."')");                 if      (!$stat)                 {                         echo "an error has occurred<br>\n";                 }         }         echo "end: ".date('Y-m-d H:i:s')."<br>\n";         pg_close($dbh);         echo "ending process ...<br>\n"; ?> 

Remember, using COPY took about four seconds to insert 40,000 records. The next listing shows how long it takes when INSERT is used:

 connection to phpbook established ... starting process ... begin: 2001-12-12 22:00:03 end: 2001-12-12 22:13:20 ending process ... 

It takes more than 13 minutes, which is significantly slower. This impressive example shows how much overhead you can save with the help of COPY.

The more data you have to process, the more performance you can gain. Technically, pg_put_line does nothing more than send a NULL terminated string to the backend process handling the connection to PostgreSQL. After sending the data to the database, the backend and the frontend process have to be synchronized, which can be done by using pg_end_copy.



PHP and PostgreSQL. Advanced Web Programming2002
PHP and PostgreSQL. Advanced Web Programming2002
ISBN: N/A
EAN: N/A
Year: 2004
Pages: 201

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net