Reverse Engineering the Database | Professional UML with Visual Studio. NET. Unmasking Visio for Enterprise Architects

Chapter 8 - Database Modeling with Visio for Enterprise Architects

byAndrew Filevet al.?
Wrox Press ©2002


	Team FLY

Reverse engineering is the process of mapping an already existing physical database and automatically developing a model so that it can be more easily understood - a viewable model is always more intuitive. There aren't many who would claim that a command-line interface is more intuitive and easier to understand than a Windows interface. Obviously, if you were to learn the command-line and get used to it, it can be a faster way to interact with the system. In order to understand what is really going on, nothing beats a viewable diagram or model. This is what documentation is for and why it is so important.

Unless it's combined with updating/generation of the database, which we'll look at later, the creation and maintenance of documentation is reverse engineering's biggest purpose. Many databases are designed from the outset with little or no regard to documentation. Even if solid documentation of the database exists, it is often so grossly outdated that it does not really represent the database in its current implementation. While the database is being constructed, design decisions may be made on the fly. This should be expected and occurs for many reasons that are beyond the control of the system designer. The design can prove to be unfeasible given the choice of the database system used. The implementation of the database can include optimizations that the designers could not foresee the need for.

The list of objects that can be reverse-engineered and depicted by Visio depends on two things:

The model to which you choose to reverse engineer
The features supported by the driver that Visio uses to communicate with the database

Visio uses an algorithm to pull metadata from the underlying database system. The different types of objects you can reverse engineer depend on what the driver you have installed supports.

Reverse Engineering an ER diagram

Reverse engineering a database in Visio is an extremely simple process. How it actually does it is much more complex. Let's start by going through the process with the SQL Server Northwind database and examining what comes out on the other side.

The Reverse Engineer Wizard

We begin by creating the diagram we want to reverse engineer to. We will start with an ER diagram. We can have either a Database Model Diagram or an ER Source Diagram. Either way will provide us with the capability to reverse engineer an ER diagram. Selecting Reverse Engineer from the Database menu brings up the Reverse Engineer Wizard. This wizard will guide us through the process.

click to expand

We begin by selecting the database we are going to connect to and the driver we are going to use. Visio shows you a combination of the OLEDB drivers and ODBC data sources you have on your system. Selecting the SQL Server driver, which is the default, will filter the list of available data sources to only those that are configured to point to SQL server databases. Any of the drivers in the list may be used to reverse engineer the database; however, the features available will change because some of the drivers or the DBMS you are using don't support all of them.

click to expand

Notice that there is also a Generic ODBC driver. This driver is the ODBC driver for OLEDB providers and allows you to select any database for which you have an OLEDB provider. If you already have a data source set up, you may select it, or you can create a new one to point to your database.

In the next step of the wizard you are asked to choose which object types you would like to reverse engineer. This depends upon the database system you are trying to reverse engineer as well as the driver you have selected to connect to that database. The object types supported are checked by default, while the unsupported ones are grayed out. You can choose to filter which objects are displayed in subsequent steps by unchecking unwanted types. The object types indented below the tables objects describe the different properties/attributes of a table that Visio can map to the model.

click to expand

Clicking the Next button will cause Visio to connect to the source database and retrieve a list of the tables and views from the source database. You can then select which tables and views you would like mapped to your diagram by putting a check in the box next to the tables you want. Clicking Next will then provide you with a list of the stored procedures. You can again select which objects you want by putting a check mark in the box next to the desired selections.

click to expand

The next step in the wizard asks you if you would like Visio to add the resulting shapes directly to the diagram. Clicking Yes will cause Visio to attempt to add the shapes to the current page of the diagram, while clicking No will simply add the generated shapes to the underlying model allowing you to add them later. At this point, you can choose to click Finish and have Visio begin the reverse engineering process, or clicking Next will allow you to view a summary of all the objects and catalog information that you have selected. Finishing the wizard will begin the reverse engineering process. As you can see, the Output window is brought into focus.

click to expand

Once the reverse engineering process has begun, the Output window will come up. Visio begins by extracting the columns of all the tables, then it moves on to constraints beginning with checked constraints, primary key constraints, indexes, foreign keys, and triggers. For any stored procedures you have selected the code will be extracted and placed in the underlying model. Visio then performs an error check on the model to alert you to check consistency and integrity of the system. Finally, it provides a summary indicating how many objects were engineered and how long it took for each group. If you had chosen to have Visio automatically add the shapes to the drawing, they would now be laid out on the surface. Otherwise, you can view all of the objects in the Tables and Views window. Dragging a table or view object from the Tables and Views window to the drawing surface results in that shape being placed in the drawing.

click to expand

If this window isn't visible, it can be accessed by Database | View, and clicking Tables and views. Any code that was reverse engineered will show up in the code tab of the tables and views window. These shapes may not be added to the drawing and are merely there to provide completeness in the model and allow you to make edits to the code if need be.

As soon as the reverse engineering process is complete, Visio stores the data in an underlying model using its own format. This underlying model stores everything that it reverse engineered, not only what appears on the drawing surface.

From its visual depiction on the drawing surface one can see a number of properties that Visio has reverse engineered for that object:

click to expand

The table name is displayed in the gray section at the top of the object. As with standard entity relationship diagrams, the objects columns are displayed on the right-hand side while constraints and attributes are shown to the left of the column name. Primary keys are shown underlined and with a PK identifier on their left. Required columns are shown in bold. Indexes are marked on their left with an I. Uniqueness constraints are depicted with a U. Foreign key constraints are depicted with an FK next to the columns they are defined for.

This notation is not the only way objects can be depicted in Visio. On the Database menu under Options, you can select the document choice to bring up the database document options window.

click to expand

Here you can decide the symbol set for graphically modeling the system. You can choose either relational, which is the default, or the IDEF1X symbol set. You can also choose whether to display conceptual names, physical names, or both. Conceptual names are only useful for reverse engineering if they have been defined in the underlying database or if you plan to define them later in the model; otherwise they will be the same as the physical name.

click to expand

The Table tab allows you to choose the level of detail you would like to depict in your drawing. For example, you can choose not to display keys and indexes and you can choose to display both portable and physical data types for each column:

click to expand

The Relationship tab provides options to display the attributes of the relationships defined in the underlying database in many different ways. For example, you can choose to display the physical name or the verb phrase of the underlying key relationship (not shown in the diagram below).

click to expand

You can choose to display either cardinality notation or crow's feet, and you can use referential action to display the triggers involved in cascading additions and deletions. The verb phrase is defined by the conceptual predicate role defined for the object in the relationship.

The Database Properties Window for ER diagrams

Right-clicking on a shape and selecting Database Properties brings up the Database Properties window. This window usually appears as a tab at the bottom of the drawing.

click to expand

This window is extremely useful for viewing all of the underlying properties of a database object, many of which cannot be depicted visually. On the left are the categories of properties that are defined for each object, the first of which, Definition, includes all of the properties that define that object including the physical name, owner, and source database. There is also a property here that lets you define a conceptual name for this object.

click to expand

The Columns category depicts all of the columns and their attributes that are defined for this table and allows you to set mandatory constraints, primary key constraints, and the physical or portable data type for each column. Selecting a column and pressing the Edit button to the right brings up the Column Properties window.

Column Properties

click to expand

The Definition tab allows you to specify a default value as well as a conceptual name. Conceptual names provide more documentation and can help to map back to the conceptual model. The Sync names when typing checkbox ensures that the names are the same in order to ease the creation of conceptual names. Creating a default value for a field in the database can be very useful to use as a boilerplate for new data to be entered into the table.

click to expand

The Data Type tab gives greater freedom to select and control data types, both portable (a general data type that may be easily mapped to any physical database) and physical (a native data type based upon the default driver you have specified). The Collection tab allows you to specify a collection type, which is used if your database supports object-relational models. Let's move onto the Check tab:

click to expand

The Check tab allows you to define check constraints placed upon this column. You can define values and/or ranges of values that the column can accept. The Show check clause code radio button allows you to customize code to meet more specific check clause needs. The Extended tab will allow you specify any extended properties defined by the driver of your database system. Finally, the Notes tab allows you to enter free-form annotations that you would like in order to more fully document this column.

Now let's return to the Database Properties window. The Primary ID category allows you to define the column or columns that will uniquely identify each row in your table. Add columns by selecting them from the available columns list box and clicking the Add button. They will then be depicted in the primary ID column list box. You can also choose how you would like the primary key to be enforced, the physical name of the constraint, and whether you want to create an index on that column or columns.

The Indexes category is used to identify the columns you would like to create indexes for. Click New to create a new index, select the index type from the drop-down menu then add columns to it from the available columns list box. The Options button allows you define any driver-specific options you would like. The trigger category allows you to define code triggers that will be executed based upon certain actions on your tables. Click Add to bring up the code editor window. The code editor window is used to type DDL scripts that will create or alter objects in your database.

Back on the Database Properties window, the Check category will allow you to add check clauses to your table in much the same way as you defined triggers. The Extended category will allow you to define extended attributes defined by your database driver. Once again the notes column allows you to annotate your table.

Remember, that at this point all the tables are in the underlying model. If your drawing surface gets too cluttered, you can select them and press the Delete key. Be sure not to remove the tables from the underlying model (unless of course this is really your intent). A useful tip for retrieving many of the shapes back onto the drawing surface is to select any object already on the surface and either right-click and select Show Related Tables, or select Show Related Tables from the Database menu. This will add any shapes to the drawing that are related directly to that table. This will also add to the drawing all the relational objects that define how all these table relate to other.

click to expand

Reverse Engineering an ORM Diagram

The reverse engineering process for an ORM source model is almost exactly the same as that for an ER diagram. The same steps are involved. This makes sense because the way Visio internally stores all of the metadata that it pulls from the underlying physical database is the same for both diagrams. The only difference is the notation and the way that this information is graphically depicted for you. Once you have gone through the wizard, chosen the objects you wish to reverse engineer and completed the reverse engineering process you will be presented with all of the object types in the underlying database. All of the objects are listed alphabetically in the object types window. A better way (because the objects are easier to drag from it) is to view the results of the reverse engineering process by using the Business Rules window. Choosing the Object Types tab displays all of the object types, both entity and value, while the Fact Types tab displays all of the facts or predicates that define the roles the objects play. Objects can be dragged directly onto the drawing surface from either of these tabs. As with ER diagrams, an extremely useful way to add shapes to the drawing and have Visio lay them out for you is by right-clicking on any type on the drawing surface and selecting Show Relationships:

click to expand

As can be seen from the diagram the Territories object plays three roles. The foreign key constraints in the underlying database have been mapped to a system of predicates, roles, and constraints. As can be seen from the Territories is of Region fact type, the physical name of the foreign key defined in the database (FK_Territory_Region) has been used to create the inverse text of the role. The uniqueness constraints are shown as the familiar arrow-tipped bars over the role they constrain. We can also see the primary uniqueness constraint (a circled P) that denotes that Territories and Employees provide a primary composite reference scheme for the EmployeeTerritories object. Although the ORM modeling notation is very good at graphically depicting many of the objects in a database in an intuitive manner, there are still many properties that are left out of the visual representation. Just as in ER diagrams, the Database Properties window can be extremely useful in viewing these properties.

The Database Properties Window for ORM Diagrams

The Database Properties window also outlines the properties of the currently selected object in categories listed on the left:

click to expand

Selecting different objects results in a multitude of different properties being displayed that can be defined for each particular object type. We will go over the different types of properties that can be defined for both fact types and object types.

Selecting an object in the Object Types tab of the Business Rules window and then looking at its database properties, let's have a look at the first category of Definition.

click to expand

We see that you can define the name of the object, and specify whether the object is an entity, value or external type. The External checkbox states whether this object is defined here or if the object is merely being used as a proxy for an external object defined elsewhere. We can specify the name space it belongs to if our project is large enough that we want to define namespaces so that name collision does not occur. The Independent checkbox allow us to determine whether we define this object as independent or not with regard to whether it has to play a role in a predicate to exist. Finally, we can give Visio's Verbalizer some help in the verbiage of our model by telling it whether we want it to use his/her instead of its when constructing fact types.

The Ref Mode category is used to define the identification scheme for your object. Value types may not have a reference mode since, as you'll remember, they represent strings and numeric values or primitive types. For entities, we can define the name of the reference and the type of reference it is. If a reference mode is specified, it can be identification, formatting, or measurement. Identification means that the reference is an unique identifier for each instance of your object. An example of formatting identification type would be a table of a certain screw type where the screw size uniquely identifies each screw. A measurement identification type would be a type whose value is measured by/as its primary identification scheme. The Options section will decide how you want the primary reference scheme to be used for naming. We'll keep Use Document's setting for now, and we'll take a look at the choices and how to change the default document's setting later.

The Data Type category allows the detailed definition of the data type of the object in order to achieve a more precise mapping to the physical database. We can choose any data type that is specified in the driver we are using for this model. If the Show portable data type radio button is checked, the portable data types that aren't specific to the particular DBMS in question are used. If the Physical data type is checked, then Visio will update and tailor the screen so that each of the options reflects the specific driver you have chosen for your model. For SQL server, you can define whether the column is an identity column by clicking the Edit button.

click to expand

The Composite Type category allows you to define whether this type is a composite type and if so, what kind of composite type the object is.

click to expand

The Subtype category is for use when the object is defined as a subtype of another type. The only selection that is not grayed out by default is the Map to a separate table option.

click to expand

Although this is very useful when the type is a subtype, it can also be useful if you want to make sure that your object is represented in the logical diagram as a separate entity or table. Adding a subtype relationship to the diagram and relating it to the object and its supertype will allow the other options to be entered. You can define the supertype, instruct Visio that you want to use table inheritance, and enter notes documenting the subtype/supertype relationship.

The Value category allows you to define and restrict the value that your object can take. This is tied up with the reference mode or identification scheme that you have defined for your object. Taking our example of the wood screw above, screw size can only be so big or small, so you would want to restrict its value here.

click to expand

We've already explained what a nested role (objectified) predicate is. When the object you are looking at is a nested role, this category presents you with readings that you can change to help verbalize the relationship you have now created.

As you can see, the reverse engineering process is the simple part. It's the understanding of what's going on that can be difficult.


	Team FLY