The Database Lifecycle

Like everything else, databases have a finite lifespan. They are born in a flush of optimism and make their way through life achieving fame, fortune, and peaceful anonymity, or notoriety as the case may be, before fading out once more. Even the most successful database at some time is replaced by another, more flexible, and up-to-date structure, and so begins its life anew. Although exact definitions may differ, there are generally six stages of the database lifecycle:

Analysis The Analysis phase is where the stakeholders are interviewed and any existing system is examined to identify problems, possibilities, and constraints. The objectives and scope of the new system are determined.

Design The Design phase is where a conceptual design is created from the previously determined requirements, and a logical and physical design are created that will ready the database for implementation.

Implementation The Implementation phase is where the database management system (DBMS) is installed, the databases are created, and the data are loaded or imported.

Testing The Testing phase is where the database is tested and fine-tuned, usually in conjunction with the associated applications.

Operation The Operation phase is where the database is working normally, producing information for its users.

Maintenance The Maintenance phase is where changes are made to the database in response to new requirements or changed operating conditions (such as heavier load).

Database development is not independent of systems development, often being one component of the greater systems development process. The stages of systems development basically mirror the stages of a database lifecycle but are a superset. Whereas database design deals with designing the system to store the data, systems design is also concerned with the processes that will impact on the data.

Phase 1: Analysis

Your existing system can no longer cope. It's time to move on. Perhaps the existing paper system is generating too many errors, or the old Perl script based on flat files can no longer handle the load. Or perhaps an existing news database for a website is struggling under its own popularity and needs an upgrade. This is the stage where the existing system is reviewed.

Depending on the size of the project, the designer may be an individual, responsible for the database implementation and coding, or may be a whole team of analysts. For now, the term designer will represent all these possibilities.

The following are the steps in the Analysis phase:

  1. Analyze the organization.

  2. Define any problems, possibilities, and constraints.

  3. Define the objectives.

  4. Agree on the scope.

When reviewing a system, the designer needs to look at the bigger picture—not just the hardware or existing table structures, but the whole situation of the organization calling for the redesign. For example, a large bank with centralized management would have a different structure and a different way of operating from a decentralized media organization, where anyone can post news onto a website. This may seem trivial, but understanding the organization you're building the database for is vital to designing a good database for it. The same demands in the bank and media organizations should lead to different designs because the organizations are different. In other words, a solution that was constructed for the bank cannot be unthinkingly implemented for the media organization, even when the situation seems similar. A culture of central control at the bank may mean that news posted on the bank website has to be moderated and authorized by central management, or may require the designer to keep detailed audit trails of who modified what and when. On the flip side, the media organization may be more laissez-faire and will be happy with news being modified by any authorized editor. Understanding an organization's culture helps the designer to ask the right questions. The bank may not ask for the audit trail, it may simply expect it; and when the time comes to roll out the implementation, the audit trail would need to be patched on, requiring more time and resources.

Once you understand the organization structure, you can question the users of any existing system as to what their problems and needs are, what constraints exist currently, and what the objectives of the new database system are, as well as what constraints will exist then. You need to question different role players, as each can add a new understanding as to what the database may need. For example, the media organization's marketing department may want to track movements from one news article to another on its website, but the editorial department may want detailed statistics about the times of day certain articles are read. You may also be alerted to possible future requirements. Perhaps the editorial department is planning to expand the website, which will give them the staff to cross-link web articles. Keeping this future requirement in mind could make it easier to add the cross-linking feature when the time comes.

Constraints can include hardware ("We have to use our existing database server, an AMD Duron 900MHz") or people ("We only have one data capturer on shift at any one time"). Constraints also refer to the limitations on values. For example, a student's grade in a university database may not be able to go beyond 100 percent, or the three categories of seats in a theatre database are small, medium, and large.

It is rarely sufficient to rely on one level of management, or an individual, to supply objectives and current problems, except in the smallest of organizations. Top management may be paying for the database design, but lower levels will need to use it, and their input is probably even more important for a successful design.

Of course, although anything is possible given infinite time and money, this is almost never forthcoming. Determining scope, and formalizing it, is an important part of the project. If the budget is for one month's work but the ideal solution requires three, the designer must make clear these constraints and agree with the project owners on which facets are not going to be implemented.

Phase 2: Design

The Design phase is where the requirements identified in the previous phase are used as the basis to develop the new system. Another way of putting it is that the business understanding of the data structures is converted into a technical understanding. The what questions ("What data are required? What are the problems to be solved?") are replaced by how questions ("How will the data be structured? How is the data to be accessed?").

This phase consists of three parts: the conceptual design, the logical design, and the physical design. Some methodologies merge the logical design phase into the other two phases. Note that this chapter is not aimed at being a definitive discussion of database design methodologies (there are whole books written on that!); rather it aims to introduce you to the topic.

Conceptual Design

The purpose of the conceptual design phase is to build a conceptual model based upon the previously identified requirements but closer to the final physical model. The most useful and common conceptual model is called an entity-relationship model.

Entities and Attributes

Entities are basically people, places, or things you want to keep information about. For example, a library system may have book, library, and customer entities. Learning to identify what should be an entity, what should be a number of entities, and what should be an attribute of an entity takes practice, but there are some good rules of thumb. The following questions can help to identify whether something is an entity:

  • Can it vary in number independently of other entities? For example, person height is probably not an entity, as it cannot vary in number independently of person. It is not fundamental, so it cannot be an entity in this case.

  • Is it important enough to warrant the effort of maintaining? For example, customer may not be important for a grocery store and will not be an entity in that case, but it will be important for a video store and will be an entity in that case.

  • Is it its own thing that cannot be separated into subcategories? For example, a car-rental agency may have different criteria and storage requirements for different kinds of vehicles. Vehicle may not be an entity as it can be broken up into car and boat, which are the entities.

  • Does it list a type of thing, not an instance? The video game blow-em-up 6 is not an entity, rather an instance of the game entity.

  • Does it have many associated facts? If it only contains one attribute, it is unlikely to be an entity. For example, city may be an entity in some cases, but if it contains only one attribute, city name, it is more likely to be an attribute of another entity, such as customer.

The following are examples of entities involving a university with the possible attributes in parentheses:

  • Course (name, code, course prerequisites)

  • Student (first name, surname, address, age)

  • Book (title, ISBN, price, quantity in stock)

An instance of an entity is one particular occurrence of that entity. For example, the student Rudolf Sono is one instance of the student entity. There will probably be many instances. If there is only one instance, consider whether the entity is warranted. The top level usually does not warrant an entity. For example, if the system is being developed for a particular university, university will not be an entity because the whole system is for that one university. However, if the system was developed to track registration at all universities in the country, then university would be a valid entity.

Relationships

Entities are related in certain ways. For example, a customer can belong to a library and can take out books. A book can be found in a particular library. Understanding what you are storing data about, and how the data relate, leads you a large part of the way to a physical implementation in the database.

There are a number of possible relationships:

Mandatory For each instance of entity A, there must exist one or more instances of entity B. This does not necessarily mean that for each instance of entity B, there must exist one or more instances of entity A. Relationships are optional or mandatory in one direction only, so the
A-to-B relationship can be optional while B-to-A is mandatory.

Optional For each instance of entity A, there may or may not exist instances of entity B.

One-to-one relationship (1:1) This is where for each instance of entity A, there exists one instance of entity B, and vice versa. If the relationship is optional, there can exist zero or one instances, and if the relationship is mandatory, there exists one and only one instance of the associated entity.

One-to-many relationship (1:M)  For each instance of entity A, many instances of entity B exist, while for each instance of entity B, only one instance of entity A exists. Again, these can be either optional or mandatory relationships.

Many-to-many relationship (M:N) For each instance of entity A, many instances of entity B exist, and vice versa. These can be either optional or mandatory relationships.

There are numerous ways of showing these relationships. Figure 9.1 shows student and course entities. In this case, each student must have registered for at least one course, but a course does not necessarily have to have any students registered. The student-to-course relationship is mandatory, and the course-to-student relationship is optional.

click to expand
Figure 9.1: A many-to-many relationship

Figure 9.2 shows invoice line and product entities. Each invoice line must have at least one product (but no more than one); however, each product can appear on many invoice lines or none at all. The invoice line-to-product relationship is mandatory, while the product to-invoice-line relationship is optional.

click to expand
Figure 9.2: A one-to-many relationship

Figure 9.3 shows husband and wife entities. Each husband must have one and only one wife, and each wife must have one, and only one, husband. Both relationships are mandatory.

click to expand
Figure 9.3: A one-to-one relationship

An entity can also have a relationship with itself. Such an entity is called a recursive entity. Take a person entity: If you're interested in storing data about which people are brothers, you will have an "is a brother to" relationship. In this case, the relationship is an M:N relationship.

Conversely, a weak entity is an entity that cannot exist without another entity. For example, in a school, the scholar entity is related to the weak entity "parent/guardian." Without the scholar, the parent or guardian cannot exist in the system. Weak entities usually derive their primary key, in part or in totality, from the associated entity. Parent/guardian could take the primary key from the scholar table as part of its primary key (or the entire key if the system only stored one parent/guardian per scholar).

The term connectivity refers to the relationship classification (1:1, 1:M, or M:N).

The term cardinality refers to the specific number of instances possible for a relationship. Cardinality limits list the minimum and maximum possible occurrences of the associated entity. In the husband and wife example, the cardinality limit is (1,1), and in the case of a student who can take between one and eight courses, the cardinality limits would be represented as (1,8).

Developing an Entity-Relationship Diagram

An entity-relationship diagram models how the entities relate to each other. It's made up of multiple relationships, the kind that you've seen in Figures 9.1, 9.2, and 9.3. In general, these entities go on to become to the database tables.

The first step in developing the diagram is to identify all the entities in the system. In the initial stage, it is not necessary to identify the attributes, but this may help to clarify matters if the designer is unsure about some of the entities. Once the entities are listed, relationships between these entities are identified and modeled according to their type: one-to-many, optional, and so on. There are many software packages that can assist in drawing an entity-relationship diagram, but any graphical package should suffice.

Once the initial entity-relationship diagram has been drawn, it is often shown to the stakeholders. Entity-relationship diagrams are easy for nontechnical people to understand, especially if they are guided through the process. This can help identify any errors that have crept in. Part of the reason for modeling is that models are much easier to understand than pages of text, and they are much more likely to be viewed by stakeholders, which reduces the chances of errors slipping through to the next stages, when they may be more difficult to fix.

Tip 

It's important to remember that there is no one right or wrong answer. The more complex the situation, the more possible designs that will work. Database design is an acquired skill, though, and more experienced designers will have a good idea of what works and of possible problems at a later stage, having gone through the process before.

Once the diagram has been approved, the next stage is to replace many-to-many relationships with two one-to-many relationships. A DBMS cannot directly implement many-to-many relationships, so they are decomposed into two smaller relationships. To achieve this, you have to create an intersection, or composite entity type. Because intersection entities are less "real-world" than ordinary entities, they are sometimes difficult to name. In this case, you can name them according to the two entities being intersected. For example, you can intersect the many-to-many relationship between student and course by a student-course entity (see Figure 9.4).

click to expand
Figure 9.4: Creating the student-course intersection entity

The same applies even if the entity is recursive. The person entity that has an M:N relationship "is brother to" also needs an intersection entity. You can come up with a good name for the intersection entity in this case: brother. This entity would contain two fields, one for each person of the brother relationship—in other words, the primary key of the first brother and the primary key of the other brother (see Figure 9.5).

click to expand
Figure 9.5: Creating the brother intersection entity

Logical and Physical Design

Once the conceptual design is finalized, it's time to convert this to the logical and physical design. Usually, the DBMS is chosen at this stage, depending on the requirements and complexity of the data structures. Strictly speaking, the logical design and the physical design are two separate stages, but are often merged into one. They overlap because most current DBMSs (including MySQL) match logical records to physical records on disk on a 1:1 basis.

Each entity will become a database table, and each attribute will become a field of this table. Foreign keys can be created if the DBMS supports them and the designer decides to implement them. If the relationship is mandatory, the foreign key must be defined as NOT NULL, and if it is optional, the foreign key can allow nulls. For example, because of the invoice line-to-product relationship in the previous example, the product code field is a foreign key in the invoice line table. Because the invoice line must contain a product, the field must be defined as NOT NULL. Currently, InnoDB tables do support foreign key constraints, and MyISAM tables do not support foreign keys in version 4, but they probably will in version 4.1. A DBMS that does support foreign keys uses ON DELETE CASCADE and ON DELETE RESTRICT clauses in their definitions. ON DELETE RESTRICT means that records cannot be deleted unless all records associated with that foreign key are deleted. In the invoice line-to-product case, ON DELETE RESTRICT in the invoice line table means that if a product is deleted, the deletion will not take place unless all associated invoice lines with that product are deleted as well. This avoids the possibility of an invoice line existing that points to a nonexistent product. ON DELETE CASCADE achieves a similar effect but more automatically (and more dangerously!). If the foreign key was declared with ON DELETE CASCADE, associated invoice lines would automatically be deleted if a product was deleted. ON UPDATE CASCADE is similar to ON DELETE CASCADE, in that all foreign key references to a primary key are updated when the primary key is updated.

Normalizing your tables is an important step when designing the database (see Chapter 8, "Database Normalization"). This process helps avoid data redundancy and improves your data integrity.

Novice database designers usually make a number of common errors. If you've carefully identified entities and attributes and you've normalized your data, you'll probably avoid these errors. However, designers who rush through the design process often end up with large tables of unrelated data. Implementing the following tips will help you to avoid some of the more frequently made errors:

  • Keep unrelated data in different tables. People who are used to using spreadsheets often make this mistake because they are used to seeing all their data in one two-dimensional table. A relational database is much more powerful; don't "hamstring" it in this way.

  • Don't store values you can calculate. Let's say you're interested in three numbers: A, B, and the product of A and B (A * B). Don't store the product. It wastes space and can easily be calculated if you need it. And it makes your database more difficult to maintain: If you change A, you also have to change all of the products as well. Why waste your database's efforts on something you can calculate when you need it?

  • Does your design cater to all the conditions you've analyzed? In the heady rush of creating an entity-relationship diagram, you can easily overlook a condition. Entity-relationship diagrams are usually better at getting stakeholders to spot an incorrect rule than spot a missing one. The business logic is as important as the database logic and is more likely to be overlooked. For example, it's easy to spot that you cannot have a sale without an associated customer, but have you built in that a customer cannot be approved for a sale of less than $500 if another approved customer has not recommended them?

  • Are your attributes, which are about to become your field names, well chosen? Fields should be clearly named. For example, if you use f1 and f2 instead of surname and first_name, the time saved in less typing will be lost in looking up the correct spelling of the field or in mistakes where a developer thought f1 was the first name, and f2 the surname. Similarly, try to avoid the same names for different fields. If six tables have a primary key of code, you're making life unnecessarily difficult. Rather, use more descriptive terms, such as sales_code or customer_code.

  • Don't create too many relationships. Almost every table in a system can be related by some stretch of the imagination, but there's no need to do this. For example, a tennis player belongs to a sports club. A sports club belongs to a region. The tennis players then also belong to a region, but this relationship can be derived through the sports club, so there's no need to add another foreign key (except to achieve performance benefits for certain kinds of queries). Normalizing can help you avoid this sort of problem (and even when you're trying to optimize for speed, it's usually better to normalize and then consciously denormalize rather than not normalize at all).

  • Conversely, have you catered to all relations? Do all relations from your entity-relationship diagram appear as common fields in your table structures? Have you covered all relations? Are all many-to-many relationships broken up into two one-to-many relationships, with an intersection entity?

  • Have you listed all the constraints? Constraints include a gender that can only be m or f, ages of schoolchildren that cannot exceed 20, or e-mail addresses that need to have an at sign (@) and at least one period (.); don't take these limits for granted. At some stage the system will need to implement them, and you're going to either forget to do so, or have to go back to gather more data if you don't list these up front.

  • Are you planning to store too much data? Should a customer be asked to supply their eye color, favorite kind of fish, and names of their grandparents if they are simply trying to register for an online newsletter? Sometimes stakeholders want too much information from their customers. If the user is outside the organization, they may not have a voice in the design process, but they should always be thought of foremost. Consider also the difficulty and time taken to capture all the data. If a telephone operator needs to take all this information down before making a sale, imagine how much slower they will be. Also consider the impact data has on database speed. Larger tables are generally slower to access, and unnecessary BLOB, TEXT, and VARCHAR fields lead to record and table fragmentation.

  • Have you combined fields that should be separate? Combining first name and surname into one name field is a common mistake. Later you'll realize that sorting names alphabetically is tricky if you've stored them as John Ellis and Alfred Ntombela. Keep distinct data discrete.

  • Has every table got at least a primary key? There had better be a good reason for leaving out a primary key. How else are you going to identify a unique record quickly? Consider that an index speeds up access time tremendously, and when kept small it adds very little overhead. Also, it's usually better to create a new field for the primary key rather than take existing fields. First name and surname may be unique in your current dataset, but they may not always be. Creating a system-defined primary key ensures that it will always be unique.

  • Give some thought to your other indexes. What fields are likely to be used in the condition to access the table? You can always create more fields later when you test the system, but add any you think you need at this stage.

  • Are your foreign keys correctly placed? In a one-to-many relationship, the foreign key appears in the "many" table, and the associated primary key in the "one" table. Mixing these up can cause errors.

  • Do you ensure referential integrity? Foreign keys should not relate to a primary key in another table that no longer exists.

  • Have you covered all character sets you may need? German letters, for example, have an expanded character set, and if the database is to cater to German users it will have to take this into account. Similarly, dates and currency formats should be carefully considered if the system is to be international.

  • Is your security sufficient? Remember to assign the minimum permissions you can. Do not allow anyone to view a table if they do not need to do so. Allowing malicious users to view data, even if they cannot change it, is often the first step in for an attacker.

Phase 3: Implementation

The Implementation phase is where you install the DBMS on the required hardware, optimize the database to run best on that hardware and software platform, and create the database and load the data. The initial data could be either new data captured directly or existing data imported from a MySQL database or other DBMS. You also establish database security in this phase and give the various users that you've identified access applicable to their requirements. Finally, you also initiate backup plans in this phase.

The following are steps in the Implementation phase:

  1. Install the DBMS.

  2. Tune the setup variables according to the hardware, software, and usage conditions.

  3. Create the databases and tables.

  4. Load the data.

  5. Set up the users and security.

  6. Implement the backup regime.

Phase 4: Testing

The Testing phase is where the performance, security, and integrity of the data are tested. Usually this will occur in conjunction with the applications that have been developed. You test the performance under various load conditions to see how the database handles multiple concurrent connections or high volumes of updating and reading. Are the reports generated quickly enough? For example, an application designed with MyISAM tables may prove too slow because the impact of the updates was underestimated. The table type may have to be changed to InnoDB in response.

Data integrity also needs to be tested, as the application may have logical flaws that result in transactions being lost or other inaccuracies. Further, security needs to be tested to ensure that users can access and change only the data they should.

The logical or physical designs may have to be modified. Perhaps new indexes are required (which the tester may discover after careful use of MySQL's EXPLAIN statement, explained in Chapter 4, "Indexes and Query Optimization"), or certain tables may need to be denormalized for performance reasons (see Chapter 8).

The testing and fine-tuning process is an iterative one, with multiple tests performed and changes implemented.

The following are the steps in the Testing phase:

  1. Test the performance.

  2. Test the security.

  3. Test the data integrity.

  4. Fine-tune the parameters or modify the logical or physical designs in response to the tests.

Phase 5: Operation

The Operation phase takes place when the testing is complete and the database is ready to be rolled out for everyday use. The users of the system begin to operate the system, load data, read reports, and so on. Inevitably, problems come to light. The designer needs to manage the database's scope carefully at this stage, as users may expect all their desires to be pandered to. Poor database designers may find themselves extending the project well beyond their initial time estimate, and the situation may also become unpleasant if the scope has not been clearly defined and agreed upon. Project owners will feel wronged if their needs are not met, and the database designers will feel overworked and underpaid. Even when scope has been well managed, there will always be new requirements. These then lead into the next stage.

There are numerous strategies for implementing a rollout. The low-key approach often works well, where the relatively low number of users in the early stage make bug fixing easy. Hugely publicized rollouts often end with egg on the stakeholders' faces, as the best testers of all, the users, invariably find an unforeseen bug, which is best done away from the spotlight. Alternatively, rollouts can occur in a distributed manner, where a pilot branch or office is selected, and when the system has proven its stability, it's rolled out to the remaining branches.

The following are the steps in the Operation phase:

  1. Hand over operation of the database to the users.

  2. Make any final changes based on problems discovered by users.

Phase 6: Maintenance

The database Maintenance phase incorporates general maintenance, such as maintaining the indexes, optimizing the tables, adding and removing users, and changing passwords, as well as backups and restoration of backups in case of a failure. (See Chapter 10, "Basic Administration," for more information about maintenance.) New requirements also start to be requested, and this may result in new fields, or new tables, being created.

As the system and organization changes, the existing database becomes less and less sufficient to meet the organization's needs. For example, the media organization may be amalgamated with media bodies from other countries, requiring integration of many data sources, or the volumes and staff may expand (or reduce) dramatically. Eventually, there comes a time, whether it's 10 months after completion or 10 years, when the database system needs to be replaced. The maintenance of the existing database begins to drain more and more resources, and the effort to create a new design is matched by the current maintenance effort. At this point, the database is coming to the end of its life, and a new project begins its life in the Analysis phase.

The following are the steps in the Maintenance phase:

  1. Maintain the indexes (for example, with MySQL's ANALYZE).

  2. Maintain the tables (MySQL's OPTIMIZE).

  3. Maintain the users (MySQL's GRANT and REVOKE).

  4. Change passwords.

  5. Back up.

  6. Restore backups.

  7. Change the design based upon new requirements.



Mastering MySQL 4
Mastering MySQL 4
ISBN: 0782141625
EAN: 2147483647
Year: 2003
Pages: 230
Authors: Ian Gilfillan

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net