The Business Dimensional Lifecycle

Weve all felt the empty pit of panic in our stomach when, deep into a project, we realize the scope and scale of the effort before us will take much more work than we imagined at the outset. Many BI/DW projects begin with the notion that youll just move some data to a new machine, clean it up a little, and develop some reports . Doesnt sound so badsix weeks of effort, two months at the most. You wade into the river and are waist deep before realizing you should have built a bridge.

The best way to avoid this sense of panicand the resulting disasteris to figure out where youre going before you jump in. It helps to have a roadmap and directions to lead you safely through unfamiliar territoryone that will tell you the places you have to visit and point out the danger zones on the trip ahead. This book is that roadmap for the Microsoft SQL Server BI/DW system project. This book follows the basic flow of the Business Dimensional Lifecycle first described in the book The Data Warehouse Lifecycle Toolkit (Wiley, 1998). The steps, tasks , and dependencies of the Lifecycle were crafted based on our collective experience of what works. The Lifecycle is an iterative approach based on four primary principles:

  • Focus on the business: Concentrate on identifying business requirements and their associated value. Use these efforts to develop solid relationships with the business side and sharpen your business sense and consultative skills.

  • Build an information infrastructure: Design a single, integrated, easy-to-use, high-performing information foundation that will meet the broad range of business requirements youve identified across the enterprise.

  • Deliver in meaningful increments : Build the data warehouse in increments that can be delivered in 6 to 12 month timeframes. Use clearly identified business value to determine the implementation order of the increments.

  • Deliver the entire solution: Provide all the elements necessary to deliver value to the business users. This means a solid, well-designed, quality- tested , accessible data warehouse database is only the start. You must also deliver ad hoc query tools, reporting applications and advanced analytics, training, support, web site, and documentation.

This book helps you follow these four principles by using the Business Dimensional Lifecycle to build your DW/BI system. These four principles are woven into the fabric of the Lifecycle. The secret to understanding the Business Dimensional Lifecycle is cleverly hidden in its name : It is business-based, it takes a dimensional approach to designing data models for end- user presentation, and it is a true lifecycle.

Lifecycle Tracks and Task Areas

The BI/DW system is a complex entity, and the methodology to build that system must help simplify that complexity. Figure 1 outlines the Lifecycle. The 13 boxes show the major task areas involved in building a successful data warehouse and the primary dependencies among those tasks.

image from book
Figure 1: The Business Dimensional Lifecycle
image from book
BETTER LATE THAN NEVER

Delivering the entire solution has always been one of our fundamental principles. We wouldnt even consider building a data warehouse without delivering what is now known as the Business Intelligence layer. The rest of the industry came to this understanding in the late 1990s after years of failed projects that focused on creating the data warehouse database. Their general approach was build it and they will come. This almost never works.

image from book
 

There are several observations to make about the Lifecycle at this level. First, notice the central role of the Business Requirements Definition box. Business requirements provide the foundation for the three tracks that follow. They also influence the project plan, hence the arrow pointing back to the Project Planning box. You usually end up modifying the plan based on a more detailed understanding of the business requirements and priorities.

Second, the three tracks in the middle of the lifecycle concentrate on three separate areas:

  • The top track is about technology. These tasks are primarily about planning which pieces of Microsoft technology youll use, and how youll install and configure them.

  • The middle track is about data. In the data track youll design and instantiate the dimensional model, and develop the Extract, Transformation, and Load (ETL) system to populate it. You could think of the data track as building the data warehouse databases, although your data warehouse will not succeed unless you surround it with the rest of the Lifecycle tasks.

  • The bottom track is about business intelligence applications. In these tasks you design and develop BI applications for the business users.

The tracks combine when its time to deploy the system. This is a particularly delicate time because theres only one chance to make a good first impression . Maintaining the DW/BI system doesnt begin after deployment. You need to design your system with the ability and tools for maintaining it. The growth phase of the project links to the arrow heading back to the beginning. This simple arrow has major implications. The Lifecycles incremental approach is a fundamental element of delivering business value.

Underlying the entire Lifecycle is the Project Management box. The most important thing to remember here is that you need a leader, and that person needs access to senior management. The team leader is ideally one of those difficult-to-find people who can communicate effectively with both technologists and business people, including the most senior executives in the company.

Key Terminology and the Microsoft Toolset

The business intelligence industry is plagued with terminology thats used imprecisely, or in contradictory ways. Some of the most long-standing debates in the industry derive as much from misunderstandings about what others mean by a term , as from true differences in philosophy. Keeping that in mind, well try to be clear and consistent even if we dont settle all the historical debates. We highlight some of the key terms here.

As we define each term, we are highlighting the associated Microsoft technologies, most of which are components of SQL Server 2005.

  • The data warehouse is the platform for business intelligence. In the Kimball Method, the data warehouse includes everything from the original data extracts to the software and applications that users see. We disagree with other authors who insist that the data warehouse is merely a centralized and highly normalized store of data in the back room, far from the end users. To reduce confusion, in this book we consistently use the phrase data warehouse/business intelligence system (DW/BI system) to mean the entire end-to-end system. When were talking specifically and exclusively about the atomic-level, user-queryable data store, we call it the data warehouse database .

  • The business process dimensional model is a specific discipline for modeling data that is an alternative to normalized modeling. A dimensional model contains the same information as a normalized model but packages the data in a symmetrical format whose design goals are user understandability, business intelligence query performance, and resilience to change. Normalized models, sometimes called third normal form models, were designed to support the high-volume, single-row inserts and updates that define transaction systems, and generally fail at being understandable, fast, and resilient to change.

    We use the term business process dimensional model to refer both to the logical dimensional model that supports a business process and the corresponding physical tables in the database. In other words, dimensional models are both logical and physical.

  • The relational database is a general-purpose technology for storing, managing, and querying data. The SQL Server 2005 database engine is Microsofts relational database engine. The business process dimensional model can be stored in a relational database. Normalized data models that support transaction processing can also be stored in a relational database.

  • The online analytic processing (OLAP) database is a technology for storing, managing, and querying data specifically designed to support business intelligence uses. SQL Server 2005 Analysis Services is Microsofts OLAP database engine. The business process dimensional model can be stored in an OLAP database, but a transactional database cannot, unless it first undergoes transformation to cast it in an explicitly dimensional form.

  • An Extract, Transformation, and Load (ETL) system is a set of processes that clean, transform, combine, de-duplicate, household, archive, conform, and structure data for use in the data warehouse. These terms are described in this book. Early ETL systems were built using a combination of SQL and other scripts. While this is still true for most smaller ETL systems, larger and more serious systems use a specialized ETL tool. Moving forward, almost every DW/BI system will use an ETL tool such as SQL Server 2005 Integration Services because the benefits are significant and the incremental dollar cost is low or zero.

  • Business intelligence (BI) applications are predefined applications that query, analyze, and present information to support a business need. There is a spectrum of BI applications, ranging in complexity from a set of predefined static reports, all the way to an analytic application that directly affects transaction systems and the day-to-day operation of the organization. You can use SQL Server Reporting Services to build a reporting application, and a wide range of Microsoft and third-party technologies to build complex, analytic applications.

  • A data mining model is a statistical model, often used to predict future behavior based on data about past behavior. Data mining is a term for a loose (and ever-changing) collection of statistical techniques or algorithms that serve different purposes. The major categories are clustering, decision trees, neural networks, and prediction. Analysis Services Data Mining is an example of a data mining tool.

  • Ad hoc queries are formulated by the user on the spur of the moment. The dimensional modeling approach is widely recognized as the best technique to support ad hoc queries because the simple database structure is easy to understand. Microsoft Office, notably Excel pivot tables, is the most popular ad hoc query tool on the market. You can use Reporting Services Report Builder to perform ad hoc querying and simple report definition. Nonetheless, many systems supplement Excel and Report Builder with a third-party ad hoc query tool for their power users.

  • Once again, the data warehouse/business intelligence (DW/BI) system is the whole thing: source system extracts, ETL, dimensional database in both relational and OLAP, BI applications, and an ad hoc query tool. The DW/BI system also includes management tools and practices, useroriented documentation and training, a security system, and all the other components that we discuss in this book.

Roles and Responsibilities

The DW/BI system requires a number of different roles and skills, from both the business and technical communities, during its lifecycle. In this section, we review the major roles involved in creating a DW/BI system. There is seldom a one-to-one relationship between roles and people. Weve worked with teams as small as one person, and as large as forty (and heard of much larger teams). The vast majority of DW/BI teams fall between three and seven full-time members , with access to others as required.

Its common for a single DW/BI team to take on both development and operational duties . This is different from most technology project teams, and is related to the highly iterative nature of the DW/BI project development cycle. The following roles are associated with design and development activities:

  • The DW/BI manager is responsible for overall leadership and direction of the project. The DW/BI manager must be able to communicate effectively with both senior business and IT management. The manager must also be able to work with the team to formulate the overall architecture of the DW/BI system.

  • The project manager is responsible for day-to-day management of project tasks and activities during system development.

  • The business project lead is a member of the business community and works closely with the project manager.

  • The business systems analyst (or business analyst) is responsible for leading the business requirements definition activities, and often participates in the development of the business process dimensional model. The business systems analyst needs to be able to bridge the gap between business and technology.

  • The data modeler is responsible for performing detailed data analysis including data profiling, and developing the detailed dimensional model.

  • The system architect(s) design the various components of the DW/BI system. These include the ETL system, security system, auditing system, and maintenance systems.

  • The development database administrator (DBA) creates the relational data warehouse database(s) and is responsible for the overall physical design including disk layout, partitioning, and initial indexing plan.

  • The OLAP database designer creates the OLAP databases.

  • The ETL system developer creates Integration Services packages, scripts, and other elements to move data from the source databases into the data warehouse.

  • The DW/BI management tools developer writes any custom tools that are necessary for the ongoing management of the DW/BI system. Examples of such tools include a simple UI for entering metadata, scripts or Integration Services packages to perform system backups and restores , and a simple UI for maintaining dimension hierarchies.

  • The BI application developer is responsible for building the BI applications, including the standard reports and any advanced analytic applications required by the business. This role is also responsible for developing any custom components in the BI portal and integrating data mining models into business operations.

Most of the rest of the roles play a part in the latter stages of the DW/BI project development cycle, as the team moves toward deploying and operating the system. A few of the roles are strictly operational.

  • The data steward is responsible for ensuring the data in the data warehouse is accurate.

  • The security manager specifies new user access roles that the business users need, and adds users to existing roles. The security manager also determines the security procedures in the ETL back room of the DW/BI system.

  • The BI portal content manager manages the BI portal. She determines the content thats on the portal and how its laid out, and keeps it fresh.

  • The DW/BI educator creates and delivers the training materials for the BI/DW system.

  • The relational database administrator (DBA) is responsible for managing the performance and operations of the relational data warehouse database.

  • The OLAP DBA is responsible for managing the performance and operations of the OLAP data warehouse database.

  • The compliance manager is responsible for ensuring that the DW/BI policies and operations comply with corporate and regulatory directives such as privacy policies, HIPAA, and Sarbanes-Oxley. The compliance manager works closely with the security manager and Internal Audit.

  • The metadata manager has the final word on what metadata is collected, where it is kept, and how its published to the business community. As we discuss in Chapter 13, metadata tends not to be managed unless theres a person identified to lead the charge.

  • The data mining analyst is deeply familiar with the business and usually has some background in statistics. The data mining analyst develops data mining models and works with the BI application developers to design operational applications that use the data mining models.

  • User support personnel within the DW/BI team must be available to help business users, especially with ad hoc access. Corporate-wide help desks tend not to have the specialized expertise necessary to do more than assist with minor connectivity issues.



Microsoft Data Warehouse Toolkit. With SQL Server 2005 and the Microsoft Business Intelligence Toolset
The MicrosoftВ Data Warehouse Toolkit: With SQL ServerВ 2005 and the MicrosoftВ Business Intelligence Toolset
ISBN: B000YIVXC2
EAN: N/A
Year: 2006
Pages: 125

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net