SQL Server 2005 Metadata

The good news is that the SQL Server toolset is mostly metadata driven. The relational engine has a slew of system tables that define and describe the data structures, activity monitoring, security, and other functions along with a set of stored procedures to manage it. Other components , like Analysis Services and Integration Services are based on similar metadata, but its kept in an object-oriented structure in XML files. Much, if not all, of the property-based metadata in SQL Server 2005 can be accessed through the various object models.

The bad news is that every major component of SQL Server 2005 keeps its metadata in its own independent structures, from database tables to XML files, which have their own access methods , from SQL Management Objects (SMO) and Analysis Management Objects (AMO) to stored procedures to APIs. Not only do the tools manage their own metadata, but the metadata they use is not integrated across the tools. The flexibility you get from the programmable nature of the tools and the Visual Studio development environment makes it particularly difficult to identify which packages pulled a particular set of data and applied which transformations and loaded it into what relational database, and what cubes and what reports .

As we describe in the next section, the first step in every metadata strategy is to assess the situation. You need to conduct a detailed inventory of what metadata structures are available, which ones are actually being used, what tools you have to view the metadata, and what tools you have to manage it. Table 13.1 provides a convenient summary of the various metadata sources and stores across the SQL Server 2005 BI platform, and identifies various tools for accessing and viewing them. The remainder of this section describes the major components in Table 13.1.

Table 13.1: Metadata Sources and Stores in the SQL Server 2005 BI Platform

TOOL/COMPONENT

TYPE

CONTENT

ACCESS AND MAINTENANCE METHODS

EXAMPLES

RESOURCE LOCATION

Cross-Tool Components

SQL Agent

All

Job and schedule definition and execution

SQL Server Studio; stored procedures; SQL Management Objects (SMO).

sp_help_job; dbo.sysjobs_view

BOL search: automating administrative tasks or Programming SMO

SQL Server Profiler

Process

Relational and Analysis Services engine activity

SQL Server Profiler; SQL Trace.

See Chapter 15.

BOL search: Introducing SQL Server Profiler

The SQL Server BI Documentation Center

All

Full range of metadata from relational DB, AS, and SSIS packages

Stand-alone tool creates XML files for web browsing.

Microsoft Developers Network (MSDN) http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx

 

Microsoft Metadata Whitepaper

All

Microsofts recommended approach for managing SQL Server 2005 metadata

   

MSDN: http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx

Metadata Reporting Pack for SQL Server Integration Services

Technical,Process

SSIS package definitions and reports for impact and lineage analysis across the relational DB, AS, SSIS, and Reporting Services

Stand-alone tool to load a metadata database and a viewer, along with SSIS packages and Reporting Services reports.

 

MSDN: http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx

Relational Engine

System tables

Technical, Business, Process

Object descriptions, definitions, parameters, security, relationships, and settings

Catalog views (CVs); Information Schema views; SMO.

sys.objects; sys. columns

BOL search: catalog views; information schema views

System stored procedures

Technical, Business, Process

Integrated system table queries, database management functions

System stored procedures (SPs); SMO.

sp_who (current database processes

BOL search: system stored procedures

Extended properties

Business

Business metadata

Design spreadsheet, SQL Studio, Stored Procedures, Catalog views, and SMO.

See reports in this chapter.

BOL search: extended properties

Analysis Services

AS Object model

Technical, Business, Process

Object descriptions, definitions, parameters, security, relationships, hierarchies, and settings

SQL Server BI -Development Studio; SQL Server Management Studio; Analysis Management Objects (AMO).

See Figure 13.4.

BOL search: analysis management objects Default sample location: Program Files\Microsoft SQL Server\90\Samples\Analysis Services\ - Programmability\AMO\AmoBrowser - Administrator\ActivityViewer

Integration Services

Integration Services Object model

Technical, Business, Process

Object descriptions, definitions, parameters, security, relationships, hierarchies, and settings

SQL Server Management Studio; Integration Services object model; SQL Server BI Development Studio.

 

BOL search: integration services programming Programming examples: Microsoft SQL Server\90\ Samples\Integration Services\Programmin g Samples

Integration Services Package logging, and Integration Services Logging Report Pack

Process

SSIS process metrics

Data written to various log providers. The Logging Report Pack is a set of reports based on SSIS logging data.

See Chapter 15.

BOL search: Logging Package Execution Logging Report Pack: Microsoft Developers Network (MSDN) http://msdn.microsoft .com/SQL/ sqlwarehouse/SSIS/default.aspx

Data Warehouse Audit system

Process

ETL process metrics, data quality flags and metrics

SQL queries; Reports.

See Chapters 6 and 15.

 

Reporting Services

Report Manager

Technical, Business, Process

Report definitions, schedules, folders, security, and so on

SQL Server Management Studio; Report Manager web page; web service (ADO.NET).

See Chapter 9.

 

Execution Log

Technical, Business, Process

Server, user and report activity

SQL Script, Report Server reports.

See Chapter 15.

C:\Program Files\ Microsoft SQL Server\ 90\Samples\ Reporting Services\ Report Samples\ Server Management Sample Reports

Report Builder

Technical, Business Models

Semantic layer between user interface and data sources

SQL Server BI Development Studio; Reporting Services object model.

See Figures 13.2 and 13.3.

BOL search: Reporting Services object model (yields several topics)

External Components

System Monitor Performance tool

Process

System level performance (reads, writes , buffers, and so on)

System Monitor, also known as perfmon.

See Chapter 15.

Monitor tool: Administrative Tools/ Performance BOL search: monitoring performance (yields several topics)

Active Directory

Technical, Business

Security, organizational info, employee info

System.DirectoryServices namespace in .NET Framework; Query through OLE DB Provider.

 

BOL search: Query Active Directory, OLE DB Directory

Cross-Tool Components

Several metadata components in SQL Server support more than one tool in the system. We decided to list them once rather than repeating them under each heading:

  • SQL Server Agent is the job scheduler of the SQL Server world. It contains information about jobs, job steps, and schedules. SQL Server Agent metadata can be accessed through a set of stored procedures, system tables, SQL Management Objects (SMO), and SQL Server Management Studio. From a metadata perspective, it is important to monitor SQL Server Agents process metadata to make sure jobs are running and completing successfully. Beyond that, even folks in the user community are interested in knowing what jobs are scheduled and when they are supposed to run.

  • SQL Server Profiler is SQL Servers activity monitoring tool. You define a trace to track the occurrences of specific events, like Audit Logon, and have those occurrences written out to a table or file. SQL Server Profiler can be used interactively from the Profiler tool, or initiated programmatically using stored procedures. Chapter 15 describes using SQL Server Profiler to create an audit log of DW/BI system usage.

  • SQL Server BI Documentation Center (DocCenter) is a reporting utility designed to create a metadata catalog of sorts. It was developed by the SQL Server Team as an extra to help customers gather and explore metadata from three major sources: the relational engine, Analysis Services, and Integration Services packages. When you run DocCenter, it reads through the system tables or the object models, depending on the subsystem you picked, and creates a set of XML documents along with an XSLT reader. This allows the viewer to navigate the various sets of metadata through a set of standard, linked reports. DocCenter serves a useful role in the survey step of creating a metadata strategy. It can also be used as a basic metadata browser for these major subsystems. DocCenter can be found at the Microsoft Development Network SQL developer center: http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx.

  • Metadata Reporting Pack for SQL Server Integration Services (Metadata Pack) is another utility designed to create a metadata repository. In this case, a program parses through the Integration Services packages to identify the sources, targets, transformations, and dependencies involved in the data flows. The results of this parsing process are stored in a relational database. Because these sources and targets are often relational, analysis services, and reporting objects, this data can be used to provide impact and lineage analysis from the relational database all the way out to the standard reports. The initial version of the Metadata Pack does not track all the components of an Integration Services package, but the Integration Services team continues to enhance it. The Metadata Pack can also be found at the MSDN SQL developer center.

Relational Engine Metadata

The SQL Server relational engine itself has system tables that can be accessed through a set of system views (called catalog views) or system stored procedures designed to report on the system table contents and, in some cases, to edit the contents. Alternatively, they can be accessed programmatically through SQL Server Management Objects (SMO), which itself supersedes SQL Distributed Management Objects (SQL-DMO).

Business metadata can be stored along with the table and column definitions by using the extended properties function. This allows the developer to append any number of additional descriptive metadata elements onto the database objects. The dimensional model design spreadsheet described in Chapter 2 makes use of extended properties to store several metadata columns like description, comments, example values, and ETL rules. The names of these extended properties must be part of your naming convention so programs and queries can find them.

Note 

Description is not a stock extended property name , but Visual Studio will populate an extended property called description through the user interface (see the Books Online topic, Description Property Dialog Box for more information).

Process metadata for the relational engine can be captured through the SQL Server Profiler component described earlier in this section. Chapter 15 describes a performance-monitoring log for the relational engine.

Analysis Services

Analysis Services object model has all the same kinds of metadata that the relational engine has and more. From the definitions of the databases, cubes, dimensions, facts, and attributes to the KPIs, calculated columns, and hierarchy structures, not to mention the data mining modelsall are available through the object model. The first and obvious access tools for the developer are the BI Development Studio and the SQL Server Management Studio. Beyond these, you can build custom .NET-based applications that use Analysis Management Objects (AMO) to access the Analysis Services object model.

Analysis Services process metadata can be captured through the SQL Server Profiler component described earlier in this section. Chapter 15 describes a performance-monitoring log for Analysis Services.

The Data Source Views (DSVs) used by Analysis Services, Report Builder, and Integration Services provide an intriguing metadata layer. A DSV is essentially an abstraction layer between the source systems and the processes that reference them. DSVs have a sub-model layer called a diagram that allows the developer to define even simpler sub-views of the DSV, and that supports the idea of role-playing tables. For example, you could create a DSV for the entire data warehouse, with multiple fact tables and all associated dimensions. Then, you could create diagrams in the DSV that include tables for only a given subject area, like Orders. Additional diagrams might cover Shipments, Returns, and Customer Care calls. The entities shown in a DSV dont even have to be tables. You can create a named query in the DSV that looks and acts like a table to any tool that uses the DSV. DSVs also provide an additional descriptive property called FriendlyName. Unfortunately, DSVs are not directly referenced as metadata anywhere outside of the BI Development Studio in the initial release of SQL Server 2005. The Report Builder ad hoc query tool uses a DSV to create its own abstraction layer, called a model. Analysis Services Enterprise Edition offers a capability called Perspectives. Perspectives are similar to diagrams and provide for the logical sub-setting of a cube. These perspectives are used in the creation of Report Builder models based on Analysis Services. Analysis Services also has its own security metadata to provide the ability to limit access to subsets of the data.

Integration Services

As one might expect, SQL Server Integration Services has its own object model. The real difficulty with Integration Services from a metadata perspective is that its essentially a visual programming environment as opposed to a structured database environment, like the relational engine or Analysis Services. SSIS packages can be incredibly complex and do not follow a common structure. In fact, much of what might be considered Integration Services metadata will actually be the result of defining and using standard naming conventions for packages and tasks. Another difficulty with Integration Services is its time-dependent nature. At any point in time, it is difficult to tell which package was used to execute which load. One day, a certain ETL process could be run from a package saved in the file system. The next day, the ETL developer changes the SQL Agent task to point to another package stored in the database that uses different logic and business rules. While the packages will have distinct GUIDs, the names will be the same. Keeping track of this obviously requires a clear set of development process rules, naming standards, discipline, and monitoring.

Regardless of where a package comes from, it can and should be set up to create its own process metadata. As we describe in Chapter 15, you should turn on logging at the package level, and select from a long list of events to log. There are several options for determining where the information is logged, including to a database table. Table 13.1 references some example Reporting Services reports on MSDN that demonstrate how to access the log data and use it to track the execution and performance of your SSIS packages.

In addition to this base process-level logging, the data warehouse audit system described in Chapter 6 ties the process metadata back to the actual data that was loaded in a given ETL package. This metadata surfaces in the user interface in the form of an Audit dimension and associated audit tables that allow the user to get a sense for where the data came from and how it was loaded. Chapter 6 also describes an extended version of the audit system that can be used to monitor data quality and flag data quality issues at the row level if necessary.

Reporting Services

Reporting Services is entirely metadata driven. The contents, operation, usage, and security are all described in a set of metadata tables in the ReportServer database. It is possible to query these tables directly to see how Reporting Services works. However, rather than build reports on top of the production database, which could impact performance and potentially break when Microsoft changes the database, it makes sense to extract the process metadata into a separate reporting and analysis schema. Microsoft has included a couple of SQL scripts on the samples that create this schema and update it on a scheduled basis, along with a few example reports written against the schema. Figure 13.1 shows the Reporting Services process metadata schema created by the sample package. Not surprisingly, it is a solid dimensional schema (with the exception of the unnecessary snowflaking of the ReportTypes table from Reports dimension). The Reporting Services database is also surfaced through an object model that itself is accessible through a web service.

image from book
Figure 13.1: Microsofts Reporting Services process metadata reporting schema

Like many front-end tools, Report Builder uses metadata to define the objects, attributes, join paths, and calculations that it needs to formulate a query. This metadata set is called a Report Builder model and is created using the BI Development Studio. The relational version is built on top of its own Data Source View and is kept in the Reporting Services database. Report Builder models used to access Analysis Services are built directly from the Analysis Services cube. Well show an example of a Report Builder model in the next section.

External Metadata Sources

There are several useful metadata sources that exist in the broader computing environment. Chief among these are the System Monitor tool and Active Directory. If you use a source control tool like Visual Source Safe, you may also consider that tools data as a metadata source.

System Monitor

The Windows System Monitor performance tool is familiar to anyone who has done system performance troubleshooting in Windows. It works at the operating system level much like SQL Server Profiler does at the SQL Server level. You can define traces for a whole range of system activities and measures across the BI toolset and in the operating system itself. These traces can be defined to run in the background and to write to log files. If necessary, these logs can be tied back to SQL Profiler logs to help understand the possible causes of any problems. Chapter 15 describes which System Performance indicators are most important to log during regular operations.

Active Directory

Active Directory is Microsofts network-based user logon management facility. It is a directory structure that can hold user, group , security, and other organizational information. Depending on how your organization is using Active Directory, it can be a source for security- related information. It can also be the system of record for some of the descriptive attributes of the Employee and Organization dimensions. This is another example of where the line between metadata and data can get fuzzy.

Looking Forward on SQL Server Metadata

Clearly the SQL Server 2005 development team understands the importance of metadata from a development point of view. However, even though SQL Server 2005 has lots of metadata, that metadata is not very well integrated. Careful metadata integration and active metadata management did not appear to be Microsofts top priority for the core SQL Server 2005 product. Fortunately, the development team has been working on a lot of add-on utilities to help fill this gap, many of which are referenced in this chapter. In the past, Microsoft has been a leader on metadata issues, so we would be surprised if upcoming releases of SQL Server did not address the problem of managing and integrating metadata as part of the core product.



Microsoft Data Warehouse Toolkit. With SQL Server 2005 and the Microsoft Business Intelligence Toolset
The MicrosoftВ Data Warehouse Toolkit: With SQL ServerВ 2005 and the MicrosoftВ Business Intelligence Toolset
ISBN: B000YIVXC2
EAN: N/A
Year: 2006
Pages: 125

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net