| ||
The good news is that the SQL Server toolset is mostly metadata driven. The relational engine has a slew of system tables that define and describe the data structures, activity monitoring, security, and other functions along with a set of stored procedures to manage it. Other components , like Analysis Services and Integration Services are based on similar metadata, but its kept in an object-oriented structure in XML files. Much, if not all, of the property-based metadata in SQL Server 2005 can be accessed through the various object models.
The bad news is that every major component of SQL Server 2005 keeps its metadata in its own independent structures, from database tables to XML files, which have their own access methods , from SQL Management Objects (SMO) and Analysis Management Objects (AMO) to stored procedures to APIs. Not only do the tools manage their own metadata, but the metadata they use is not integrated across the tools. The flexibility you get from the programmable nature of the tools and the Visual Studio development environment makes it particularly difficult to identify which packages pulled a particular set of data and applied which transformations and loaded it into what relational database, and what cubes and what reports .
As we describe in the next section, the first step in every metadata strategy is to assess the situation. You need to conduct a detailed inventory of what metadata structures are available, which ones are actually being used, what tools you have to view the metadata, and what tools you have to manage it. Table 13.1 provides a convenient summary of the various metadata sources and stores across the SQL Server 2005 BI platform, and identifies various tools for accessing and viewing them. The remainder of this section describes the major components in Table 13.1.
TOOL/COMPONENT | TYPE | CONTENT | ACCESS AND MAINTENANCE METHODS | EXAMPLES | RESOURCE LOCATION |
---|---|---|---|---|---|
Cross-Tool Components | |||||
SQL Agent | All | Job and schedule definition and execution | SQL Server Studio; stored procedures; SQL Management Objects (SMO). | sp_help_job; dbo.sysjobs_view | BOL search: automating administrative tasks or Programming SMO |
SQL Server Profiler | Process | Relational and Analysis Services engine activity | SQL Server Profiler; SQL Trace. | See Chapter 15. | BOL search: Introducing SQL Server Profiler |
The SQL Server BI Documentation Center | All | Full range of metadata from relational DB, AS, and SSIS packages | Stand-alone tool creates XML files for web browsing. | Microsoft Developers Network (MSDN) http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx | |
Microsoft Metadata Whitepaper | All | Microsofts recommended approach for managing SQL Server 2005 metadata | MSDN: http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx | ||
Metadata Reporting Pack for SQL Server Integration Services | Technical,Process | SSIS package definitions and reports for impact and lineage analysis across the relational DB, AS, SSIS, and Reporting Services | Stand-alone tool to load a metadata database and a viewer, along with SSIS packages and Reporting Services reports. | MSDN: http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx | |
Relational Engine | |||||
System tables | Technical, Business, Process | Object descriptions, definitions, parameters, security, relationships, and settings | Catalog views (CVs); Information Schema views; SMO. | sys.objects; sys. columns | BOL search: catalog views; information schema views |
System stored procedures | Technical, Business, Process | Integrated system table queries, database management functions | System stored procedures (SPs); SMO. | sp_who (current database processes | BOL search: system stored procedures |
Extended properties | Business | Business metadata | Design spreadsheet, SQL Studio, Stored Procedures, Catalog views, and SMO. | See reports in this chapter. | BOL search: extended properties |
Analysis Services | |||||
AS Object model | Technical, Business, Process | Object descriptions, definitions, parameters, security, relationships, hierarchies, and settings | SQL Server BI -Development Studio; SQL Server Management Studio; Analysis Management Objects (AMO). | See Figure 13.4. | BOL search: analysis management objects Default sample location: Program Files\Microsoft SQL Server\90\Samples\Analysis Services\ - Programmability\AMO\AmoBrowser - Administrator\ActivityViewer |
Integration Services | |||||
Integration Services Object model | Technical, Business, Process | Object descriptions, definitions, parameters, security, relationships, hierarchies, and settings | SQL Server Management Studio; Integration Services object model; SQL Server BI Development Studio. | BOL search: integration services programming Programming examples: Microsoft SQL Server\90\ Samples\Integration Services\Programmin g Samples | |
Integration Services Package logging, and Integration Services Logging Report Pack | Process | SSIS process metrics | Data written to various log providers. The Logging Report Pack is a set of reports based on SSIS logging data. | See Chapter 15. | BOL search: Logging Package Execution Logging Report Pack: Microsoft Developers Network (MSDN) http://msdn.microsoft .com/SQL/ sqlwarehouse/SSIS/default.aspx |
Data Warehouse Audit system | Process | ETL process metrics, data quality flags and metrics | SQL queries; Reports. | See Chapters 6 and 15. | |
Reporting Services | |||||
Report Manager | Technical, Business, Process | Report definitions, schedules, folders, security, and so on | SQL Server Management Studio; Report Manager web page; web service (ADO.NET). | See Chapter 9. | |
Execution Log | Technical, Business, Process | Server, user and report activity | SQL Script, Report Server reports. | See Chapter 15. | C:\Program Files\ Microsoft SQL Server\ 90\Samples\ Reporting Services\ Report Samples\ Server Management Sample Reports |
Report Builder | Technical, Business Models | Semantic layer between user interface and data sources | SQL Server BI Development Studio; Reporting Services object model. | See Figures 13.2 and 13.3. | BOL search: Reporting Services object model (yields several topics) |
External Components | |||||
System Monitor Performance tool | Process | System level performance (reads, writes , buffers, and so on) | System Monitor, also known as perfmon. | See Chapter 15. | Monitor tool: Administrative Tools/ Performance BOL search: monitoring performance (yields several topics) |
Active Directory | Technical, Business | Security, organizational info, employee info | System.DirectoryServices namespace in .NET Framework; Query through OLE DB Provider. | BOL search: Query Active Directory, OLE DB Directory |
Several metadata components in SQL Server support more than one tool in the system. We decided to list them once rather than repeating them under each heading:
SQL Server Agent is the job scheduler of the SQL Server world. It contains information about jobs, job steps, and schedules. SQL Server Agent metadata can be accessed through a set of stored procedures, system tables, SQL Management Objects (SMO), and SQL Server Management Studio. From a metadata perspective, it is important to monitor SQL Server Agents process metadata to make sure jobs are running and completing successfully. Beyond that, even folks in the user community are interested in knowing what jobs are scheduled and when they are supposed to run.
SQL Server Profiler is SQL Servers activity monitoring tool. You define a trace to track the occurrences of specific events, like Audit Logon, and have those occurrences written out to a table or file. SQL Server Profiler can be used interactively from the Profiler tool, or initiated programmatically using stored procedures. Chapter 15 describes using SQL Server Profiler to create an audit log of DW/BI system usage.
SQL Server BI Documentation Center (DocCenter) is a reporting utility designed to create a metadata catalog of sorts. It was developed by the SQL Server Team as an extra to help customers gather and explore metadata from three major sources: the relational engine, Analysis Services, and Integration Services packages. When you run DocCenter, it reads through the system tables or the object models, depending on the subsystem you picked, and creates a set of XML documents along with an XSLT reader. This allows the viewer to navigate the various sets of metadata through a set of standard, linked reports. DocCenter serves a useful role in the survey step of creating a metadata strategy. It can also be used as a basic metadata browser for these major subsystems. DocCenter can be found at the Microsoft Development Network SQL developer center: http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx.
Metadata Reporting Pack for SQL Server Integration Services (Metadata Pack) is another utility designed to create a metadata repository. In this case, a program parses through the Integration Services packages to identify the sources, targets, transformations, and dependencies involved in the data flows. The results of this parsing process are stored in a relational database. Because these sources and targets are often relational, analysis services, and reporting objects, this data can be used to provide impact and lineage analysis from the relational database all the way out to the standard reports. The initial version of the Metadata Pack does not track all the components of an Integration Services package, but the Integration Services team continues to enhance it. The Metadata Pack can also be found at the MSDN SQL developer center.
The SQL Server relational engine itself has system tables that can be accessed through a set of system views (called catalog views) or system stored procedures designed to report on the system table contents and, in some cases, to edit the contents. Alternatively, they can be accessed programmatically through SQL Server Management Objects (SMO), which itself supersedes SQL Distributed Management Objects (SQL-DMO).
Business metadata can be stored along with the table and column definitions by using the extended properties function. This allows the developer to append any number of additional descriptive metadata elements onto the database objects. The dimensional model design spreadsheet described in Chapter 2 makes use of extended properties to store several metadata columns like description, comments, example values, and ETL rules. The names of these extended properties must be part of your naming convention so programs and queries can find them.
Note | Description is not a stock extended property name , but Visual Studio will populate an extended property called description through the user interface (see the Books Online topic, Description Property Dialog Box for more information). |
Process metadata for the relational engine can be captured through the SQL Server Profiler component described earlier in this section. Chapter 15 describes a performance-monitoring log for the relational engine.
Analysis Services object model has all the same kinds of metadata that the relational engine has and more. From the definitions of the databases, cubes, dimensions, facts, and attributes to the KPIs, calculated columns, and hierarchy structures, not to mention the data mining modelsall are available through the object model. The first and obvious access tools for the developer are the BI Development Studio and the SQL Server Management Studio. Beyond these, you can build custom .NET-based applications that use Analysis Management Objects (AMO) to access the Analysis Services object model.
Analysis Services process metadata can be captured through the SQL Server Profiler component described earlier in this section. Chapter 15 describes a performance-monitoring log for Analysis Services.
The Data Source Views (DSVs) used by Analysis Services, Report Builder, and Integration Services provide an intriguing metadata layer. A DSV is essentially an abstraction layer between the source systems and the processes that reference them. DSVs have a sub-model layer called a diagram that allows the developer to define even simpler sub-views of the DSV, and that supports the idea of role-playing tables. For example, you could create a DSV for the entire data warehouse, with multiple fact tables and all associated dimensions. Then, you could create diagrams in the DSV that include tables for only a given subject area, like Orders. Additional diagrams might cover Shipments, Returns, and Customer Care calls. The entities shown in a DSV dont even have to be tables. You can create a named query in the DSV that looks and acts like a table to any tool that uses the DSV. DSVs also provide an additional descriptive property called FriendlyName. Unfortunately, DSVs are not directly referenced as metadata anywhere outside of the BI Development Studio in the initial release of SQL Server 2005. The Report Builder ad hoc query tool uses a DSV to create its own abstraction layer, called a model. Analysis Services Enterprise Edition offers a capability called Perspectives. Perspectives are similar to diagrams and provide for the logical sub-setting of a cube. These perspectives are used in the creation of Report Builder models based on Analysis Services. Analysis Services also has its own security metadata to provide the ability to limit access to subsets of the data.
As one might expect, SQL Server Integration Services has its own object model. The real difficulty with Integration Services from a metadata perspective is that its essentially a visual programming environment as opposed to a structured database environment, like the relational engine or Analysis Services. SSIS packages can be incredibly complex and do not follow a common structure. In fact, much of what might be considered Integration Services metadata will actually be the result of defining and using standard naming conventions for packages and tasks. Another difficulty with Integration Services is its time-dependent nature. At any point in time, it is difficult to tell which package was used to execute which load. One day, a certain ETL process could be run from a package saved in the file system. The next day, the ETL developer changes the SQL Agent task to point to another package stored in the database that uses different logic and business rules. While the packages will have distinct GUIDs, the names will be the same. Keeping track of this obviously requires a clear set of development process rules, naming standards, discipline, and monitoring.
Regardless of where a package comes from, it can and should be set up to create its own process metadata. As we describe in Chapter 15, you should turn on logging at the package level, and select from a long list of events to log. There are several options for determining where the information is logged, including to a database table. Table 13.1 references some example Reporting Services reports on MSDN that demonstrate how to access the log data and use it to track the execution and performance of your SSIS packages.
In addition to this base process-level logging, the data warehouse audit system described in Chapter 6 ties the process metadata back to the actual data that was loaded in a given ETL package. This metadata surfaces in the user interface in the form of an Audit dimension and associated audit tables that allow the user to get a sense for where the data came from and how it was loaded. Chapter 6 also describes an extended version of the audit system that can be used to monitor data quality and flag data quality issues at the row level if necessary.
Reporting Services is entirely metadata driven. The contents, operation, usage, and security are all described in a set of metadata tables in the ReportServer database. It is possible to query these tables directly to see how Reporting Services works. However, rather than build reports on top of the production database, which could impact performance and potentially break when Microsoft changes the database, it makes sense to extract the process metadata into a separate reporting and analysis schema. Microsoft has included a couple of SQL scripts on the samples that create this schema and update it on a scheduled basis, along with a few example reports written against the schema. Figure 13.1 shows the Reporting Services process metadata schema created by the sample package. Not surprisingly, it is a solid dimensional schema (with the exception of the unnecessary snowflaking of the ReportTypes table from Reports dimension). The Reporting Services database is also surfaced through an object model that itself is accessible through a web service.
Like many front-end tools, Report Builder uses metadata to define the objects, attributes, join paths, and calculations that it needs to formulate a query. This metadata set is called a Report Builder model and is created using the BI Development Studio. The relational version is built on top of its own Data Source View and is kept in the Reporting Services database. Report Builder models used to access Analysis Services are built directly from the Analysis Services cube. Well show an example of a Report Builder model in the next section.
There are several useful metadata sources that exist in the broader computing environment. Chief among these are the System Monitor tool and Active Directory. If you use a source control tool like Visual Source Safe, you may also consider that tools data as a metadata source.
The Windows System Monitor performance tool is familiar to anyone who has done system performance troubleshooting in Windows. It works at the operating system level much like SQL Server Profiler does at the SQL Server level. You can define traces for a whole range of system activities and measures across the BI toolset and in the operating system itself. These traces can be defined to run in the background and to write to log files. If necessary, these logs can be tied back to SQL Profiler logs to help understand the possible causes of any problems. Chapter 15 describes which System Performance indicators are most important to log during regular operations.
Active Directory is Microsofts network-based user logon management facility. It is a directory structure that can hold user, group , security, and other organizational information. Depending on how your organization is using Active Directory, it can be a source for security- related information. It can also be the system of record for some of the descriptive attributes of the Employee and Organization dimensions. This is another example of where the line between metadata and data can get fuzzy.
Clearly the SQL Server 2005 development team understands the importance of metadata from a development point of view. However, even though SQL Server 2005 has lots of metadata, that metadata is not very well integrated. Careful metadata integration and active metadata management did not appear to be Microsofts top priority for the core SQL Server 2005 product. Fortunately, the development team has been working on a lot of add-on utilities to help fill this gap, many of which are referenced in this chapter. In the past, Microsoft has been a leader on metadata issues, so we would be surprised if upcoming releases of SQL Server did not address the problem of managing and integrating metadata as part of the core product.
| ||