Exploring Early Database Models

Before the advent of databases, the only way to store data was in unrelated files. Programmers had to go to great lengths to extract the data, and their programs had to perform complex parsing and relating of the data themselves.

Languages such as Perl, whose powerful regular expressions are ideal for processing text, have made the job much easier than before; however, accessing data from files is still a challenging task. Without a standard way to access data, systems are more prone to errors, slower to develop, and more difficult to maintain. Data redundancy (where data is duplicated unnecessarily) and poor data integrity (where data is not changed in every location where it is stored, so that wrong or outdated data is supplied) are frequent consequences of storing data in files. For these reasons, database management systems (DBMSs) were developed to provide a standard and reliable way to access and update data. They provide an intermediary layer between the application and the data, so the programmer can concentrate on developing the application rather than worrying about data access issues.
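The redundancy and integrity problems just described can be illustrated with a minimal Python sketch. It assumes a hypothetical flat-file layout in which every order record repeats the customer's address; the records and the helper names (`update_first_address`, `addresses_for`) are invented for illustration and do not come from any real system. Because the same fact is stored twice, a program that updates only one copy leaves the data contradicting itself.

```python
# Hypothetical flat-file records: each order repeats the customer's address,
# as a file-based system without a DBMS might store it.
orders = [
    {"order_id": 1, "customer": "Alice", "address": "12 Oak St"},
    {"order_id": 2, "customer": "Alice", "address": "12 Oak St"},
    {"order_id": 3, "customer": "Bob", "address": "9 Elm Rd"},
]


def update_first_address(records, customer, new_address):
    """Update only the first matching record, as a careless program might."""
    for record in records:
        if record["customer"] == customer:
            record["address"] = new_address
            return


def addresses_for(records, customer):
    """Return the set of distinct addresses stored for one customer."""
    return {r["address"] for r in records if r["customer"] == customer}


update_first_address(orders, "Alice", "3 Pine Ave")
# Alice now has two conflicting addresses: a data integrity failure.
print(sorted(addresses_for(orders, "Alice")))  # ['12 Oak St', '3 Pine Ave']
```

A DBMS-backed design would typically store the address once and have each order refer to it, so a single update keeps every order consistent.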

A database model is a logical model concerned with how the data is represented. Instead of database designers worrying about the physical storage of the data, the database model allows them to look at a higher, or more conceptual, level, reducing the gap between the real-world problem for which the application is being developed and the technical implementation.

There are a number of database models. First you'll learn about two early models, the hierarchical database model and the network database model. Then you'll investigate the one that MySQL (along with most modern DBMSs) uses, the relational model.

Understanding the Hierarchical Database Model

The earliest model was the hierarchical database model, which resembles an upside-down tree. Files are related in a parent-child manner, with each parent capable of relating to more than one child, but each child related to only one parent. Most of you will be familiar with this kind of structure, because it's the way most filesystems work. There is usually a root, or top-level, directory that contains various other directories and files. Each subdirectory can then contain more files and directories, and so on. Each file or directory can exist in only one directory; in other words, it has only one parent. As you can see in Figure 7.1, A1 is the root directory, and its children are B1 and B2. B1 is a parent to C1, C2, and C3, and C3 in turn has children of its own.

Figure 7.1: The hierarchical database model
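The single-parent rule can be sketched in Python as a tree whose nodes each hold at most one parent reference. This is a minimal illustration, not how any particular hierarchical DBMS stored its records; the `Node` class and its `add_child` method are hypothetical names, and only the nodes of Figure 7.1 named in the text are built.

```python
from __future__ import annotations

from typing import Optional


class Node:
    """A record in a hierarchical model: at most one parent, any number of children."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Optional[Node] = None
        self.children: list[Node] = []

    def add_child(self, child: Node) -> Node:
        # Enforce the hierarchical rule: a child may belong to only one parent.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has parent {child.parent.name}")
        child.parent = self
        self.children.append(child)
        return child


# Only the nodes named in the text are built; the rest of Figure 7.1 is omitted.
a1 = Node("A1")
b1 = a1.add_child(Node("B1"))
b2 = a1.add_child(Node("B2"))
c1 = b1.add_child(Node("C1"))
c2 = b1.add_child(Node("C2"))
c3 = b1.add_child(Node("C3"))

print([child.name for child in b1.children])  # ['C1', 'C2', 'C3']

try:
    b2.add_child(c3)  # attempt to give C3 a second parent
except ValueError as err:
    print(err)  # C3 already has parent B1
```

The check in `add_child` runs before anything is modified, so a rejected attachment leaves the tree unchanged.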

This model, although a vast improvement over dealing with unrelated files, has some serious disadvantages. It represents one-to-many relationships well (one parent has many children; for example, one company branch has many employees), but it has problems with many-to-many relationships. Relationships such as the one between a products file and an orders file are difficult to implement in a hierarchical model, because an order can contain many products, and a product can appear in many orders. The hierarchical model is also inflexible, because adding new relationships can result in wholesale changes to the existing structure, which in turn means that all existing applications need to change as well. This is not fun when someone has forgotten a file type and wants to add it to the structure shortly before the project is due to launch!
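The many-to-many difficulty can be sketched in Python, assuming a hypothetical hierarchy in which each order owns its product records. The order IDs, SKUs, names, and prices are invented for illustration. Because a product record can sit under only one parent, a product that appears in two orders has to be stored twice, and finding every order that contains it means scanning the whole tree.

```python
# Hypothetical hierarchy: each order "owns" its product records, so a product
# that appears in several orders must be stored once per order.
orders = {
    "order-1": [{"sku": "W1", "name": "Widget", "price": 2.50}],
    "order-2": [
        {"sku": "W1", "name": "Widget", "price": 2.50},
        {"sku": "G7", "name": "Gadget", "price": 9.99},
    ],
}


def orders_containing(tree, sku):
    """Answer 'which orders include this product?' by scanning every order."""
    return [
        order_id
        for order_id, products in tree.items()
        if any(p["sku"] == sku for p in products)
    ]


def copies_of(tree, sku):
    """Count how many separate copies of one product record the hierarchy holds."""
    return sum(1 for products in tree.values() for p in products if p["sku"] == sku)


print(orders_containing(orders, "W1"))  # ['order-1', 'order-2']
print(copies_of(orders, "W1"))          # 2
```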

Developing applications is also complex, because the programmer needs to know the data structure well in order to traverse the model to reach the needed data. As you've seen in earlier chapters, when accessing data from two related tables, you only need to know the fields you require from those two tables. In the hierarchical model, you'd need to know the entire chain between the two. For example, to relate data from A1 and D4, you'd need to take the route A1, B1, C3, D4.
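The route-finding burden can be sketched as follows, assuming each record simply stores a link to its single parent (a hypothetical representation chosen for brevity). The `PARENT` map and the `path_from_root` function are illustrative names, and only the nodes mentioned in the text are included.

```python
# Hypothetical parent links: each child records its single parent,
# so the links form a tree. Only nodes named in the text are included.
PARENT = {"B1": "A1", "B2": "A1", "C1": "B1", "C2": "B1", "C3": "B1", "D4": "C3"}


def path_from_root(node):
    """Walk parent links up to the root, then reverse to get the root-to-node route."""
    route = [node]
    while route[-1] in PARENT:
        route.append(PARENT[route[-1]])
    return list(reversed(route))


print(path_from_root("D4"))  # ['A1', 'B1', 'C3', 'D4']
```

The program can only reach D4 by following every intermediate link, which is why the programmer must understand the whole chain.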

Understanding the Network Database Model

The network database model was a progression from the hierarchical database model and was designed to solve some of that model's problems, specifically the lack of flexibility. Instead of allowing each child only one parent, this model allows each child to have multiple parents (it calls the children members and the parents owners). It addresses the need to model more complex relationships, such as the many-to-many relationship between orders and products mentioned earlier. As you can see in Figure 7.2, A1 has two members, B1 and B2. B1 is the owner of C1, C2, C3, and C4. However, in this model, C4 has two owners, B1 and B2.

Figure 7.2: The network database model
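The owner/member relationships of Figure 7.2 can be sketched in Python as a map from each owner to its members; inverting that map shows which records have more than one owner. This is a hypothetical in-memory illustration, not the storage format of any real network DBMS, and only the records named in the text are included.

```python
from collections import defaultdict

# Owner -> members, following the relationships named for Figure 7.2.
# Hypothetical illustration: any other members of B2 are not shown.
members = {
    "A1": ["B1", "B2"],
    "B1": ["C1", "C2", "C3", "C4"],
    "B2": ["C4"],
}

# Invert the owner -> member links to find every owner of each record.
owners = defaultdict(list)
for owner, member_list in members.items():
    for member in member_list:
        owners[member].append(owner)

print(owners["C4"])  # ['B1', 'B2']
print(owners["C1"])  # ['B1']
```

Here C4 reports two owners, something the hierarchical model's single-parent rule does not allow.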

Of course, this model has its problems, or everyone would still be using it. It is more difficult to implement and maintain, and, although more flexible than the hierarchical model, it still has flexibility problems. Not all relations can be satisfied by assigning another owner, and the programmer still has to understand the data structures well in order to make the model efficient.