23.2 Building Failsafe Applications | PHP and PostgreSQL. Advanced Web Programming2002

After this brief theoretical overview about high availability, it is time to see what you can do to make your PHP-and PostgreSQL-based applications work more reliably.

When building reliable solutions, a few things cannot be influenced by the programmer. The first thing is the stability of the software packages you are building your applications on. If there are bugs in PHP's core distribution, you cannot do anything about it except use the latest packages and patches. In addition, we recommend using "old" functions that have been unchanged during the last few releases. These functions will most likely be more reliable than new functions.

However, you can't totally forecast or avoid most problems with software that you haven't written. Just try to use well-known components. What can be influenced is the code you are writing, and this is exactly what we are going to deal with in this section.

In real-world IT environments, it is just not enough when an application does what it is supposed to do. The reason for that is simple. Every user is a potential threat because a user might behave the wrong way, so errors can occur. Just think of a content management system. Several people have spent days adding content to the database. Then an accidental click removes all records from the database and you can do nothing about it. This is a horror scenario, but things like that happen every day, so a lot of money is spent for recovery purposes.

In addition to downtime caused by hardware or software failures, problems caused by users can harm your applications. This can even lead to a loss of data, which cannot be repaired. Such damage is quite dangerous and very bad for your business. An additional point is that it can be useful to monitor the people working on your system.

When designing your applications properly, there is no need to allow such disasters to harm your IT environment. A possible approach to solving problems like that would be to work with rules and various flags. Do you think that all the things I have mentioned are possible with just a few lines of code? Well, I guess we have to prove it.

23.2.1 A Simple Example

The goal of this example is to implement a simple shopping application that is capable of tracking everything happening inside the database. When building up logging information, it can be useful to see which transaction has done what. Therefore it is necessary to retrieve the current transaction id. Up to now (version 7.2), there is no official function for retrieving the transaction id. Therefore it is necessary to implement a function like that quickly. Maybe we will see this feature in future versions of PostgreSQL. However, because it is truly simple to implement this feature on your own, we have decided to include it. Take a look at the following piece of C code:

 #include "postgres.h" #include "access/xact.h" Datum getxid(PG_FUNCTION_ARGS) {         TransactionId xid = GetCurrentTransactionId();         PG_RETURN_INT32((int32) xid); }

As you can see, it is hardly more than calling the GetCurrentTransactionId function and returning the result as a variable. In this case we have chosen int32. To compile the code and to generate a shared object out of it, you can use GCC as shown in the next listing (this time we haven't made a makefile because the code is just four lines long):

 gcc -Wall -fPIC -O2 -I/usr/src/postgresql-7.2/src/include  \         -c -o getxid.o getxid.c gcc -shared -Wl,-soname,/home/hs/datatype/transaction/libgetxid.so \         -o /home/hs/datatype/transaction/libgetxid.so getxid.o

Just modify the paths in the code to your needs and compile the sources.

Now it is time to add the new function to your database. This can easily be done with the help of some SQL code:

 CREATE FUNCTION getxid()         RETURNS int4         AS '/home/hs/datatype/transaction/libgetxid.so'         LANGUAGE 'C';

If no error occurred, a function called getxid will be available, which returns the id of the current transaction. Let's see how it works:

 phpbook=# SELECT getxid();  getxid --------    1946 (1 row)

In this example 1946 is returned. Keep in mind that the result returned on your machine will most likely be different. Let's use the function inside a transaction:

 phpbook=# BEGIN; BEGIN phpbook=# SELECT getxid();  getxid --------    1947 (1 row) phpbook=# SELECT getxid();  getxid --------    1947 (1 row) phpbook=# COMMIT; COMMIT

As you can see, the same transaction id is returned twice because you are inside a transaction. In this way the transaction id can be used to identify all queries belonging to one transaction.

After the transaction is committed, the transaction id will be different because the SELECT statement is executed as one transaction:

 phpbook=# SELECT getxid();  getxid --------    1948 (1 row)

After implementing the fundamentals, it is time to have a look at the data structure of the shop. It consists of just two tables plus one parent table because this is enough to show you how to build a failsafe application:

 CREATE TABLE parent (         id serial8,                     -- id of every record in the database         transid int4                 DEFAULT getxid(),       -- transaction id         valid boolean DEFAULT 't',      -- record valid/yes-no         tstamp timestamp with time zone                 DEFAULT now()           -- modification time ); -- tables containing data CREATE TABLE product (         prod_id int4,                   -- id of product         name text,                      -- name of product         price numeric(9,2)             -- price of product ) INHERITS (parent); CREATE TABLE sales (         prod_id int4,                   -- id of product         amount int4,                    -- number of products sold         price numeric(9,2),             -- price         customer text,                  -- customer who has bought product         sold timestamp with time zone   -- time when product was sold ) INHERITS (parent);

The parent table consists of four columns that contain logging information. In the first column, an id will be used to number every record added to the table. Every record will have a unique id, so it is possible to find out in which order the records have been added to the table even in the case of concurrent transactions. The second column, labeled transid, will contain the transaction id. The default value is set to the current transaction id. However, the transaction id can also be inserted explicitly. In this example a column is used to store the transaction id. This is not the only way to solve the problem, but it is an obvious one. All tables contain a hidden column called xmin. This column contains the transaction id and can be retrieved like the object id or another hidden column (for example, SELECT xmin FROM table). However, if you use the onboard transaction id, you will not be as flexible as when working with your "own" transaction id. In this example we will use a separate column to do the job.

The third column shows whether a record is still valid. Finally, the fourth column tells us the time when a record has been added to the table. Keep in mind that now() returns the current transaction time and not the time the INSERT statement has been sent to the server. This is essential because it is the most important information.

Let's take a look at the table containing the data of the shop. The first table contains the unique id of a product as well as the name and the price of a product. The second table contains a list of all sales. It stores the product id, the number of units bought, the price, which customer has bought the product, and when it was sold. Both tables, product and sales, inherit all columns from the parent table:

 phpbook=# \d product                                      Table "product"  Attribute |           Type           |                     Modifier -----------+--------------------------+----------------------------- ----------------------  id        | bigint                   | not null default nextval ('"parent_id_seq"'::text)  transid   | integer                  | default getxid()  valid     | boolean                  | default 't'  tstamp    | timestamp with time zone | default now()  prod_id   | integer                  | not null  name      | text                     |  price     | numeric(9,2)             | phpbook=# \d sales                                       Table "sales"  Attribute |           Type           |                     Modifier -----------+--------------------------+----------------------------- ----------------------  id        | bigint                   | not null default nextval ('"parent_id_seq"'::text)  transid   | integer                  | default getxid()  valid     | boolean                  | default 't'  tstamp    | timestamp with time zone | default now()  prod_id   | integer                  |  amount    | integer                  |  price     | numeric(9,2)             |  customer  | text                     |  sold      | timestamp with time zone |

As you can see in the listing, the columns in the parent table have been added to the two tables automatically.

In the next step you can insert some products into the table:

 INSERT INTO product (prod_id, name, price) VALUES         (504532, 'cell phone', '45.39'); BEGIN; INSERT INTO product (prod_id, name, price)         VALUES (0815, 'MySQL tutorial', '1.15'); INSERT INTO product (prod_id, name, price)         VALUES (0816, 'Hamburger', '2.09'); INSERT INTO product (prod_id, name, price)         VALUES (866787, 'CPU', '199.79'); COMMIT;

Two transactions are processed. The first one inserted exactly one record into the table. The second transaction added three records. Let's see how to send the data to the server:

 [hs@duron code_buch]$ psql phpbook < add_prods.sql INSERT 49546 1 BEGIN INSERT 49547 1 INSERT 49548 1 INSERT 49549 1 COMMIT

No errors occurred. The table can easily be queried. Let's see what was inserted:

 phpbook=# SELECT id, transid, valid, tstamp::time with time zone, prod_id, name, price FROM product;  id | transid | valid |       tstamp       | prod_id |      name      | price ----+---------+-------+--------------------+---------+----------------+--------   1 |    2018 | t     | 12:47:46.713663+01 |  504532 | cell phone     |  45.39   2 |    2019 | t     | 12:47:46.836869+01 |     815 | MySQL tutorial |   1.15   3 |    2019 | t     | 12:47:46.836869+01 |     816 | Hamburger      |   2.09   4 |    2019 | t     | 12:47:46.836869+01 |  866787 | CPU            | 199.79 (4 rows)

The first few columns have been set automatically. As you can see, the last three records have the same timestamp and the same transaction id. This way you can easily identify what has happened inside your application.

For monitoring the users running applications on your system, it can also be useful to add a column for storing the name of the user to the parent table:

 phpbook=# ALTER TABLE parent ADD COLUMN dbuser text; ALTER phpbook=# \d parent                                       Table "parent"  Attribute |           Type           |                     Modifier -----------+--------------------------+----------------------------- ----------------------  id        | bigint                   | not null default nextval ('"parent_id_seq"'::text)  transid   | integer                  | default getxid()  valid     | boolean                  | default 't'  tstamp    | timestamp with time zone | default now()  dbuser    | text                     | Index: parent_id_key

Now that you have seen how to prepare the database, it is time to say an additional word about high availability.

Some of you might already have asked what all these things have to do with high availability. Well, things are more complicated than they might seem. I could tell you that you should use RAID systems or that several administrators should be around all the time. These things are essential, but they are just one part of the story. Experience has told us that most errors occur outside your software and hardware. Just think of somebody who deleted data accidentally. Think of people who want to harm your applications by adding wrong data to the database. These things can cause corrupt data and will lead to even more downtime than crashing hard disks. If you have RAID systems, these things are no problem, but problems concerning the people working on your application can only be solved by extensive monitoring and a clever design of your database. In real-world applications, this is essential and you should keep an eye on it.

23.2.2 Implementing Total Monitoring

As we have already mentioned, monitoring is the key to success. It is possible to track every customer or employee working with your applications, and you will soon find out where errors occur. In addition, it will help you to increase the efficiency of your applications. In this section we will take a closer look at monitoring. In addition to monitoring, we will try to build some sort of journaling database.

The first thing we want to do is to define a data structure in which records will never be deleted so that you can easily find out which changes have been made by the users working on the database. To do this, you must think about UPDATE and DELETE operations. If somebody performs an UPDATE operation, the old data is deleted. This is definitely not the desired behavior. In the case of DELETE, the old records should be declared invalid. To get the job done, it is necessary to implement a set of rules:

 CREATE RULE rule_product_delete         AS ON DELETE TO product         DO INSTEAD (                 UPDATE parent SET valid='f'                         WHERE id=OLD.id );

The rule you have just seen redefines the behavior of a DELETE statement. Instead of deleting the record, it is set to invalid. Notice the table we use inside the rule. The value in the parent table is set to invalid. Because product inherits all columns from the parent table, the changes will also affect the table called product. Please do not update the original table here because otherwise it would not be possible to redefine UPDATE.

Let's see what happens when running DELETE. In this example the product number 815 is deleted:

 phpbook=# DELETE FROM product WHERE prod_id=815; UPDATE 1

As you can see, an UPDATE operation has been defined. Let's see if any changes have been made inside the table:

 phpbook=# SELECT id, valid, prod_id FROM product;  id | valid | prod_id ----+-------+---------   1 | t     |  504532   3 | t     |     816   4 | t     |  866787   2 | f     |     815 (4 rows)

The product number 815 has been set to invalid. It has not been deleted, and this is the most important thing.

In the next step you can write a rule for redefining UPDATE statements:

 CREATE RULE rule_product_update         AS ON UPDATE TO product         DO INSTEAD (                 UPDATE parent SET valid='f'                         WHERE id=OLD.id;                 INSERT INTO product (prod_id, name, price)                          VALUES (NEW.prod_id, NEW.name, NEW.price) );

This rule is slightly more complex. First an UPDATE is performed to set the old version of the record to invalid. Now the new version is added to the table. This way the old data is saved and the new version is added to the table. To access the new data, we have to select all valid records. If you want to access previous versions of the data, that is also possible. With the help of this algorithm, no employee will ever be able to delete your data accidentally or on purpose. With the help of the parent table, the administrator can remove or modify data from the table to reduce the amount of data in the database or to restore data. No rule has been defined on the parent table, so you can interact with the table without having to think about redefined versions of UPDATE or DELETE. We want to point out one crucial point again. If the UPDATE statement in the second rule was using the product table instead of the parent table, the rule would be called recursively or at least twice, and this wouldn't be the desired behavior.

Before testing the rule, take a closer look at the code of the rule. This is essential because it is important to know how the code works. The two SQL statements in parentheses are executed instead of the original UPDATE query. OLD contains the values before the UPDATE operation. NEW contains the new values of the rows. These values will be set by the UPDATE query. In this case all records are set to false where the id before the UPDATE was the id in the column. The INSERT statement inserts the new values into the tables. The columns can easily be accessed by using dots.

Now that you have seen how the rule works, you can start using it. Let's use it in combination with a transaction:

 phpbook=# BEGIN; BEGIN phpbook=# UPDATE product SET price='1.99' WHERE name='Hamburger'; INSERT 49599 1 phpbook=# SELECT id, valid, prod_id, name, price FROM product;  id | valid | prod_id |      name      | price ----+-------+---------+----------------+--------   1 | t     |  504532 | cell phone     |  45.39   2 | t     |     815 | MySQL tutorial |   1.15   4 | t     |  866787 | CPU            | 199.79   3 | f     |     816 | Hamburger      |   2.09   5 | t     |     816 | Hamburger      |   1.99 (5 rows) phpbook=# COMMIT; COMMIT

The UPDATE query performs an INSERT operation. The result of the SELECT statement shows exactly what has happened inside your database. Two records containing Hamburger are in the listing. One of these is invalid, so the new record is the only valid record. The next query shows the recent version of the data:

 phpbook=# SELECT id, valid, prod_id, name, price FROM product WHERE valid='t';  id | valid | prod_id |      name      | price ----+-------+---------+----------------+--------   1 | t     |  504532 | cell phone     |  45.39   2 | t     |     815 | MySQL tutorial |   1.15   4 | t     |  866787 | CPU            | 199.79   5 | t     |     816 | Hamburger      |   1.99 (4 rows)

Of course, it is possible to define a view that only returns the recent version of the data, but this can also be overkill. It depends on whether you'd like to have a view or not.

In this section you have seen one approach to increasing the reliability of your systems, and you have seen how your data can be protected without investing any money. With the help of redundant design, many common errors can be solved, which is as important as redundant hardware.