Data Model | Practical Business Intelligence with SQL Server 2005

Scorecards are usually built on mature data warehouses with data models that span different areas of the business, and the police service scorecard is no exception. To deliver a comprehensive understanding of their police service and support their strategy, the business processes that need to be included in the data model are call handling, crime occurrences, training attendance, and public complaints.

How Good Are We at Handling Calls?

The Call fact table contains a single record for every call received. Because we are interested in what times of day calls are received as well as the date, we use two separate dimensions as described in Chapter 3, "Building a Data Warehouse": a Date dimension that has daily records; and a Time Of Day dimension that has a record for every minute of the day, or 1,440 records in total. When the call record is loaded into the data warehouse from the source system, we can extract the date portion of the call time stamp as the key for Date, and use the time portion of the time stamp to look up the Time Of Day key.

Measures of Elapsed Time

Because we are trying to measure aspects such as how quickly we can dispatch someone after receiving a call, the most interesting measures on this fact table are the ones measuring the gaps between dates and times, such as the time elapsed between dispatching a vehicle and arrival on the scene. We can take the same approach that we used for the manufacturing shipments and store the received, dispatched, arrived, and cleared values as datetime columns in the fact table. Then, we can add calculated columns to the fact view to return the information as useful measures in the form of a number of seconds elapsed between each status.

We could also have modeled this differently and inserted a separate fact record for every change in status, along with a dimension key that describes the event: for example, one fact record for the Call Received event, another record for the Call Dispatched event, and so on. This approach makes a lot of sense for some scenarios, but in our case it's much easier to have a set of additive numeric columns on a single fact record to work with.

One final note on the Time Of Day dimension: As you can see from the fact table in Figure 9-2, we received four different datetime columns from the source system. The modeling question is this: Do we need to add four different Time Of Day keys to allow the user to analyze the facts by any of the four times? The answer you choose depends as usual on the business you are working with. In this case, fairly short gaps occur between the four time periods, so we can just use the time that the call was received as the single Time Of Day key.

Figure 9-2. Call data model

Dealing with Address Information

Address information always presents interesting challenges in a data warehouse, not the least of which is the data quality issue. The possibilities for interesting analyses almost always outweigh the difficulty of dealing with addresses though, so sooner or later you will need to deal with this kind of information.

The major issue with addresses is that even after applying data cleansing and standardization techniques, the best you will end up with is a set of text attributes that hopefully roll up along some hierarchies such as Country, Province/State, and City. The most interesting analyses you can do with address information, such as looking at which areas in the city are experiencing the most crime, really require more than just street or suburb name matchingwe often need to know where the address is physically located.

You have some options for finding the corresponding latitude and longitude from a textual address, such as using third-party geographical information systems or a Web service such as the Microsoft MapPoint service. The MapPoint Web service enables you to submit a text address during the data loading (even an untidy human-entered address) and returns the coordinates and standardized address information. However, geographical information never goes down to the level of detail that some addresses include, such as "Apartment 21B" or "corner of King and Bay St."

In the Call data model shown in Figure 9-2, we can create a GeoLocation dimension table that contains the physical geographical coordinates and description of a location, and an Address dimension table that has the extra information such as apartment numbers. Geographical locations can also be grouped into different patrol zones, each of which belongs to a division within the police service.

How Are Our Crime Reduction Initiatives Going?

Every time police officers are called to a scene, whether a crime has actually been committed or there is just a disturbance, this is known as an "occurrence." The police service is interested in a wide range of information about occurrences, including aspects such as the physical location, the kind of occurrence, whether weapons were involved, and what charges (if any) were laid as a result of the occurrence.

Fact Tables with No Measures

Unlike many of the fact tables we have worked with so far that related to some kind of "transaction," such as a product shipment or a call received, the Occurrence fact table is really modeled around an event. We are interested in what type of event occurred, when it happened, and who was involved; but there is no real "measure" that we can put our hands on. The kind of fact table that we need here really just has a set of dimension keys and no numeric measures, so the analyses that we can perform are mostly about the count of records for a particular selection such as the number of burglaries in the past two months.

Of course, we need to take great care with the concept of conformed dimensions here and make sure that common dimensions such as geographic locations are the same for all fact tables in the data warehouse to allow comparisons across different business areas.

Handling Multiple Charge Records

One interesting problem that we need to solve with this business area relates to the charges that can be laid as a result of the occurrence. Each charge has a number of different dimensions (such as Person and Charge Type) that don't belong with the occurrence as a whole. This means that Charge facts actually have a different granularity to Occurrence facts, and we need to model them with a separate table, as shown in Figure 9-3.

Figure 9-3. Occurrence data model

We could also include on the Charge fact all the dimension keys from the Occurrence fact, so that, for example, we could find out what types of charges were laid as a result of occurrences in a specific geographical area. However, this approach is a bit unwieldy to manage, so we will model a one-to-many (1:M) relationship between the two fact tables and then take advantage of some Analysis Services 2005 features described in the "Technical Solution" section to enable the user to analyze charges using occurrence dimensions.

How Are We Improving Professional Standards?

Every time an officer attends a training course, we add a record to the Training Attendance fact table, including the date, course, result (such as Pass or Fail) dimension keys, and the numeric score the officer received. The Employee dimension has a similar structure to the example in earlier chapters; every employee (except the chief of police) reports to somebody higher up in the chain. We can model this as shown in Figure 9-4 by adding a ParentEmployeeKey column to the dimension table, which contains the surrogate key of another employee record (or null, in the case of the chief). This is an example of a self-referencing dimension, called a parent-child dimension in Analysis Services.

Figure 9-4. Training Attendance data model

What Does the Public Think?

When people call the police service to complain or send in a letter of complaint, we can create a new Complaint fact record with the date and time, as well as the complaintant's address so that we can match it up to the geography dimension and see which communities are having the most complaints. Of course, it would also be really useful to be able to analyze the reasons for the complaints they are receiving. However, the issue that we need to deal with is that complaints don't always neatly match up to a single record in a list of standard complaint reasons.

Some complaints will have more than one reason, so a possible approach is to create multiple fact records for each single complaint, one for each separate reason. This doesn't really match up to the business process, however, because one complex complaint that happens to match up to three different reasons in our dimension table isn't counted as three separate complaints.

We can resolve this dilemma by adding an additional fact table between the Complaint fact table and the Complaint Reason dimension, as shown in Figure 9-5. This is a common technique for resolving M:M (many-to-many) relationships in OLTP databases and works well in data warehouses, too, as long as it can be made transparent to the user of the information. Analysis Services does include features to handle M:M relationships, as you will see in the next section.

Figure 9-5. Complaints data model