Using PostgreSQL with Microsoft Office | PostgreSQL Developer's Handbook2001

For many years , Microsoft Office has played a major role in desktops. Millions of people around the globe have been using Microsoft's Office package for word processing and table calculations, as well as for building small database applications. Microsoft Office is the commonly used standard, and nearly all other vendors of office suites have included some sort of bridges to MS Office, such as data converters.

Because MS Office is so important in today's IT environments, it seems logical to have a connection from MS Office to host running a PostgreSQL database. In Windows environments, remote databases are usually accessed by using ODBC, similar to the one in the previous section. ODBC is a standard protocol and is supported by all major database platforms.

This chapter is based on MS Office 2000.

An Example Database

Let's get to some real action. We have compiled an example database that we will use in this section. Here is the SQL code:

 CREATE TABLE supplier(      "id" int4,      "name" text,      "contact" text ); CREATE TABLE products(      "productid" int8,      "name" text,      "unit" int4,      "amount" numeric(9,2),      "supplier" int4,      "price" numeric (9,2) CHECK (price > 0),      "currency" char(3) NOT NULL ); CREATE TABLE unit(      "id" serial,      "description" text ); CREATE TABLE sales(      "id" serial,      "when" timestamp DEFAULT now(),      "prodid" int8,      "who" text NOT NULL,      "amount" numeric(9,2) CHECK (amount > 0),      "price" numeric(9,2) CHECK (price > 0) ); COPY "supplier" FROM stdin; 1    Sams    Indianapolis 2    Cybertec        Vienna 3    Acompany        AnywhereCity \ . COPY "products" FROM stdin; 453465       PostgreSQL developers handbook 1      3243   1      33.99   USD 324324       Python developers handbook     1      2892   1      33.99   USD 324323       Commercial PostgreSQL support  2      0      2      80      EUR \ . COPY "unit" FROM stdin; 1    piece 2    hour \ .

Table supplier contains information about the vendors of our products, such as the name and the contact information. In table products , we store information about our products. The column called unit defines which category a product belongs to. Books are usually sold in pieces; in our example, support activities are sold in hours. products also includes the name of the vendor of a product. We store the id of the vendor so that the table supplier can be referenced easily. As you can see in the definition of the table, we make sure that the price of a product is higher than zero. This CHECK condition helps us make sure that people cannot insert trash into the database. Another important restriction is the constraint set for the currency column. We make sure that the user inserts a currency into the table, because the price column would be of no use when there is no currency defined.

The sales table will be used to store information about the products we sold. If no correct date and time is passed to the database, the result of the now() function will be used as the default value. We also make sure that who , amount , and price will contain a suitable value, because we want to know which person has bought how many products and how much these products have cost. If you are planning to build a shop database, make sure that a field that stores the price can be found in the sales table; otherwise , you might easily run into trouble when you want to change the price of a product. Don't take the price stored in the products table!

After defining the required data structure, we insert a few records into our tables.

Working with MS Access

MS Access is a database system distributed with Microsoft Office and is often used to build small databases that are not accessed by remote machines. Marketing people often say that MS Access is capable of handling network connections, but this is not 100% true, because MS Access is not a real database server like PostgreSQL. It can also be quite painful to torture MS Access with gigabytes of data, because there are some significant restrictions in MS Access that don't allow the user to insert such huge amounts of data into the database. If you have to process huge amounts of data, you might have to split a database into multiple sub-databases to handle the data. An array of sub-databases won't be easy to handle, but with the help of VBA macros, it would be possible to compute a reasonable result. In general, we do not recommend using MS Access for storing huge amounts of data, because there is much better software available for such purposes.

Now we want to connect MS Access to our PostgreSQL database. Therefore, we start MS Access and open a new database (see Figure 19.8).

Figure 19.8. Adding a new database.

graphics/19fig08.gif

To access an ODBC data source, we have already defined it. Now we have to define a link to all tables that can be found in our data source. Linking to a table means that we include it in our application, and every change we make is directly sent to the database. To create a link to a table, click File at the top of your MS Access window and select Get External Data. There you can click Link Table to include a table into your application (see Figure 19.9).

Figure 19.9. Select a data source.

graphics/19fig09.gif

First we have to select the data source that contains the table we want to include. You can see in the figure that quite of lot of data sources are available, because they are predefined. We select PostgreSQL and click the OK button (see Figure 19.10).

Figure 19.10. Selecting the tables for linking.

graphics/19fig10.gif

Now a list of all tables available in the data source will be displayed. We select all of them and click OK (see Figure 19.11).

Figure 19.11. Linking table by table.

graphics/19fig11.gif

Every table included in our system must have a unique identifier. This is necessary so that Access can distinguish the records. In PostgreSQL, every record has an internal identifier, which is not displayed in the table automatically. The situation is different in Access, so we recommend adding a field called id that contains a serial, which is always unique and can therefore be used as a primary key by access. For table products, we use productid as the primary key, because a product id must always be unique. After defining the primary key of table products, Access asks for the primary key of the next table and so on. Now that we have included all tables, we can see a complete list of all links available on the screen (see Figure 19.12).

Figure 19.12. Our database contains four tables.

graphics/19fig12.gif

In the next step we want to define the relation model for our tables (see Figure 19.13). Therefore, MS Access offers a powerful and easy-to-use interface. In general, we have nothing else to do than connect the tables with the help of a simple mouse click. Some questions have to be answered about the way the tables have to be joined, but that's all we have to do. Let's have a look at the definition of the relation first.

Figure 19.13. Defining the relation.

graphics/19fig13.gif

As you can see in the figure, we want to connect the column productid in table products with column prodid in table sales. At the bottom of the window, you can see that the default type of relation is One-To-Many, which means that one record in table products can have multiple counterparts in table sales. In other words, one product can be sold more than once.

In addition to the relation, we have to define the way the two tables have to be joined. Therefore, we click Join Type, which connects us to a window where we can select one out of three ways to join the table (see Figure 19.14).

Figure 19.14. Select how you want to join the tables.

graphics/19fig14.gif

We define all relations we need for our data model.

Note

We have already defined all tables and all integrity constraints on PostgreSQL, but these settings are not imported into MS Access. Integrity constraints on PostgreSQL will only have an effect on the whole matter when inserting or deleting data.

We recommend you define the relations in MS Access as they have been defined in the PostgreSQL database (see Figure 19.15).

Figure 19.15. The data model.

graphics/19fig15.gif

After we link all tables and define all relations, we have a look at the table products. Three records are already in the table. Now we want to insert a record: We insert an additional record for the book Advanced Unix programming to the table. If the price is higher than zero, the records can be added without any problems.

Let's have a look at the content of the table after inserting the record. We use psql to see whether the record has been added to the table:

 myodbc=#  SELECT * FROM products;  productid               name              unit  amount   supplier  price  currency ----------+--------------------------------+------+---------+----------+------- +----------    453465  PostgreSQL developers handbook    1   3243.00        1   33.99  USD    324324  Python developers handbook        1   2892.00        1   33.99  USD    324323  Commercial PostgreSQL support     2      0.00        2   80.00  EUR    678373  Advanced Unix programming         1   2678.00        1   49.99  USD (4 rows)

Four ”not three ”records can be found in the table now. The new record is added to the table immediately after we press Enter. There are no open transactions in the database system.

Let's look at table sales, which is currently empty. In the definition of the data structure, column id is defined as serial. Serial means that if no value is inserted into the table, the value of the serial is used as the value of the column. Inserting into a column that has been defined as serial can be dangerous, because the values in the column might not be unique anymore. The second column in our example has the result of the now() function as the default value.

Now we want to add two records to the table, but we leave the first two fields empty. Surprisingly, MS Access completes the records automatically. In the first column, the next value in the sequence is inserted. If the second column is left blank, the result of the now() function will be added to the records automatically. This is a very comfortable feature, because it enables you to set server-side default values.

As you can see from the following, the values are also inserted into the database automatically:

 myodbc=#  SELECT * FROM sales;  id           when           prodid    who     amount    price ----+------------------------+--------+----------+---------+----------   1  2001-07-13 10:21:50+02  453465  Reseller  1000.00  33990.00   2  2001-07-13 10:26:26+02  799999  reseller  1000.00  57890.00 (2 rows)

If many people are working on one table simultaneously , it might be necessary to refresh the content of the table from time to time. MS Access does not offer a button for that purpose, so we have to open and close the table explicitly to get a refreshed version of the content.

Another step, which is commonly taken when working with databases, is to drop and create tables. Removing a table that is stored in a PostgreSQL database means removing the link; This does not mean that the table is removed from the PostgreSQL database.

It is possible to create and drop tables directly in the PostgreSQL database and not via MS Access.

Working with MS Excel

Data stored in a PostgreSQL can also be included in MS Excel. As you learned in the last section, the data is imported via an ODBC connection.

To import data into MS Excel, select Data Get External Data New Database Query (see Figure 19.16).

Figure 19.16. Importing data into MS Excel.

graphics/19fig16.gif

In the next step, we define the columns we want to import from the PostgreSQL database.

After selecting the data, we get to the final window, where we have to define what to do with the data. We select Return Data to Microsoft Excel and click Finish. Now the data has been imported into MS Excel.

Note

We have not created a link. We have imported the data, which means that no changes in the data will affect the data stored in PostgreSQL.

If the data has been imported successfully, the data can be found in the spreadsheet (see Figure 19.17).

Figure 19.17. The data has been imported.

graphics/19fig17.gif

If we want to edit the data, we have to use Microsoft Query, which is distributed with the MS Office package (see Figure 19.18).

Figure 19.18. Editing data with MS Query.

graphics/19fig18.gif

MS Query offers an easy way to view and modify data. Inserting data that can be modified into Excel is usually done by writing a macro package and a user interface.

In this section you have learned that it is possible to connect MS Excel to PostgreSQL, but it is not the best way to interact with the database.

Working with MS Word

One of the most widespread applications in Microsoft environments is MS Word, a powerful tool for word processing.

In many cases, it is necessary to generate database-driven documents. Like all other office components , MS Word enables you to import data via ODBC. If we want to modify the data in our PostgreSQL database instead of making a simple import, we have to use MS Query or VBA macros instead ”as you learned in the last section.

Let's create a simple, database-driven document.

To import the data, we can use the taskbar dedicated for working with databases (see Figure 19.19).

Figure 19.19. Using the taskbar to import the data.

graphics/19fig19.gif

We have to select the data source and the columns as we did in the last section.

After importing the data, we use AutoFormat, as shown in Figure 19.20.

Figure 19.20. Formatting the table.

graphics/19fig20.gif

We click through the AutoFormat menu and define all settings for the table. Finally the table is ready (see Figure 19.21).

Figure 19.21. A formatted table.

graphics/19fig21.gif

Using PostgreSQL and MS Office with Huge Amounts of Data

As you learned in Chapter 1, "About PostgreSQL," PostgreSQL is capable of storing gigabytes of data reliably. You can also link huge tables with MS Access or import it to any other office product as well. In our tests, we created a random table with three columns containing 10 million records (about a 200MB ASCII file). In spite of various attempts to link or import the data into an MS Office 2000 product, we did not manage to achieve our target, because all Office products collapse.

Interaction with the database worked well with very small amounts of data, but it was not possible to link or import a "real" database.