As described in previous chapters, we will approach the data modeling for this project by first looking for dimensions. There is nothing really unusual about the Time dimension for the services business, so we will be using the standard structure and start by asking some questions about the business.
Who Does the Work?
The obvious place to start for this model is with the people doing the workconsultants and project managers. In the professional services business, the reality is a little more complex (different types of consultants, engagement managers as well as project managers, people managers, sales executives, administrative staff), but these are just extensions to the simpler model we are looking at.
We could take the approach of having separate Project Manager and Consultant dimensions, but because people can change positions over time, it seems more natural to model this as an Employee dimension, with each employee having a position such as consultant. On a related note, because consultants can change to project managers over time, we need to be careful here that we don't lose the capability to analyze history by overwriting an employee's positionthis is a complex topic usually handed by a technique called slowly changing dimensions (SCDs), which is so common that we have written a whole chapter on itsee Chapter 8, "Managing Changing Data."
The other attributes for the Employee dimension are the usual ones such as Name (see Figure 6-2), but we also want to include the employee's network user ID so that report content can be tailored to the individual user (described in "Giving Users What They Want" in the "Technical Solution" section). We can also include their e-mail address so that reports can be e-mailed to them. The other interesting aspect is the Reports To relationshipemployees have managers who are also employees. This is an example of a self-referencing dimension (called a Parent/Child dimension in Analysis Services).
Figure 6-2. Employee dimension
Tip: Avoid Object-Oriented Designs in a Data Warehouse
Those of us with a strong object-oriented design (OOD) background may be tempted to model the Employee structure using an inheritance-like structure rather than a star or snowflake design; that is, with an Employee base table and two specialized Project Manager and Consultant tables. Although there are occasions when this type of design is appropriate (usually in OLTP systems where it matches up well with the application code), this approach is usually not appropriate for a DW because of the complex ETL that results, as well as the performance impact.
What Are They Working On?
We know that there is ultimately a client paying the bills, so this is an obvious candidate dimension, but because there may be multiple projects for a client each with its own attributes such as start dates, project manager, and budget, we will also include a separate Project dimension, as shown in Figure 6-3. We will also keep a list of the major Project Tasks, which is useful for project analysis to understand which areas of a project are going over budget.
Figure 6-3. Project dimension
One area of complexity is billing rates. Consultants will have different standard billing rates depending on their position, and may even be allocated to projects at different positions than their employee record suggests (such as assigning a senior consultant to a project at a lower consultant-level rate). The current rate information needs to be stored in the timesheet system, but it's debatable whether this information makes it into the data warehouse in that format. What we are really interested in is how much money we actually charged the customer for the work and the average billing rates over time, not what the current standard rates are. One exception to this is when services businesses are interested in finding out which employees are not billing at their standard ratesthis could be modeled by either adding a standard hourly and overtime rate to the Employee table based on their position, or by creating a separate Resource table that links a specific employee to a specific project along with the agreed rates.
How Do We Measure What Work They Are Doing?
The professional services business is built on a simple principlehours billed means money earned. The central fact table to measure the business' performance has a record with the hours billed and the revenue earned for every piece of work that a consultant performs on a project, with a key of Time, Employee, and Project Task, as shown in Figure 6-4.
Figure 6-4. Work completed fact table
Because hours can be translated directly to revenue by applying the appropriate rate, OLTP systems would usually only store the hours and calculate the revenue on demand; in a data warehouse, however, this would miss the opportunity to have well-performing fully additive measures that are stored directly in the database. We can also extend the columns to enable interesting analysis on all kinds of measures, such as overtime revenue versus standard revenue, or billable hours versus nonbillable (such as administrative time that consultants spend on tasks like filling in timesheets).
The business requirements for this solution make frequent reference to utilization, which shows how much billable time an employee has spent as a percentage of the total time available for the month. In practice, the "available time" is often not as simple as multiplying the working days for the month by eight hourswhat about vacation, for example? If the employee's vacation is not included in the total available time for the year, a consultant can achieve 100 percent utilization for the month just by taking a month off. On the other hand, if vacation is included, an incentive exists for employees to manage their vacation time so that they don't lower their utilization for an important period such as the last month before year-end. A simple approach may be to use 52 standard 40-hour weeks for all the available time and adjust the consultants' percentage utilization targets to match.
Also, utilization is a percentage, so we won't store this in the fact table because it is not additive. Because this business only looks at utilization at a month level, the total hours available are stored on each Month record, and utilization can be computed easily by summing the billable hours from the fact table and dividing by the total for the selected periods.
How Much Work Will They Be Doing in the Future?
Although the Work Completed fact table is crucial for understanding the business, trying to manage using only measures of the past is like trying to navigate using your rear-view mirror: At some point, forward-looking measures will be required. In the services business, these measures are typically called something like delivery backlog and sales pipeline.
The delivery backlog is the work that has been sold but that hasn't been delivered yet, and gives us measures of how busy our consultants will be in the near future, as well as how much money we will be making. The sales pipeline is information on potential future sales. Because we can't be sure that these sales will actually get signed (and become part of the backlog), there is a probability percentage assigned to each possible sale, which is increased according to business rules as the sales process progresses and is used to adjust the expected revenue measure.
Both of the preceding are "moving targets" in the sense that the business will need to see how the numbers progress over time. ("What was our sales pipeline at the end of this month compared with the same period last year?") The fact tables shown in Figure 6-5 are both snapshots, where we load all the records at the end of a period, which represents the current view. We can see historical views by selecting prior periods, but because these are snapshots, we cannot add the numbers across time periods. (For example, there is no sense in looking at the total of sales pipeline for January, February, and March because we will triple-count any potential sale that was in all three periods.) This is a clear example of a fact table with "semi-additive" measures, which add up across any dimension except Time in this case.
Figure 6-5. Backlog and Sales Pipeline fact tables
Now that we know what kind of information we are using, we can move on to building the solution.