System Deployment

Before you actually start deploying your system, you should perform extensive testing. Throughout the development process, you should have been conducting unit tests to confirm that components have been written properly. As we discuss in the next section, you need to perform extensive end-to-end testing, too.

The less time you spend on testing, the harder the deployment process will be. If you don't perform end-to-end testing before you deploy to production, you're guaranteed to find problems in production. If you don't rigorously check data quality, your business users will do it for you, and lose confidence in the DW/BI system at the same time. If you don't check performance and tune the system in advance, you'll have to do it while trying to avoid disrupting business users.

If you test, tune, and adjust the system before you begin deployment, the move to production will be fairly straightforward. This is especially true if this is a new system. Moving incremental changes into production while minimizing end-user impact can be a delicate dance.

Pre-Deployment Testing

To successfully deploy a new DW/BI system, or changes to an existing system, plan ahead and test repeatedly until you've verified:

  • System testing procedures: Your system testing procedures are sufficiently rigorous that you can prove to users or auditors that the system has been thoroughly tested.

  • Operations process testing: The databases load and process correctly.

  • Data quality assurance testing: The data is accurate and complete, both for the historical load and for ongoing incremental loads.

  • Performance testing: The system performs well both for loads and for queries and reports, with live data rather than a smaller development dataset.

  • Usability testing: Business users can find what they need and accomplish necessary tasks.

  • Platform and edition testing: The system runs correctly on the class of hardware and editions of software of the production system.

  • Deployment testing: Deployment scripts are solid and have been rehearsed.

System Testing Procedures

Your DW/BI system is a software project, in addition to being an exercise in business politics. As part of the deployment process, test that the system works in whole and in part. The testing that we discuss in this section should be formalized to the greatest extent you and your organization can tolerate. The exact methodology you use to keep track of tests and results is much less important than having some methodology. If your organization already uses a testing methodology, you should be able to make that system work for your DW/BI system. If you don't already use a testing methodology, you should adopt or develop one for this project.

Even though your DW/BI system isn't shrink-wrapped software, it's a complex custom system that requires a professional approach to testing. Too often we see projects devote inadequate time and resources to testing.

Any testing methodology will contain at least four major steps: defining your tests, developing a primary test dataset, developing datasets to test unusual conditions, and running the tests and logging the results.

Define Tests

Make a list of the tests you plan to run, and what the expected outcome is. Figure 14.2 illustrates the kind of information that's useful to collect about each test and serves as a convenient example from which to discuss testing issues. We don't present these spreadsheets as a best practice, although there are worse approaches than using Excel, like the common one of not keeping track at all!

| Task# | Short name | Description | Procedure | Test Script | Expected Outcome |
| --- | --- | --- | --- | --- | --- |
| 1.00 | ETL Scheduling and Operations | | | | |
| 1.01 | Job kickoff | Launch the ETL job DWLoadMaster at a specific time | Set the start time in SQL Agent for DWLoadMaster to a specific time (15 min from now) | | Log table JobLogs receives a row signalling job kickoff within 10s of launch datetime. |
| 1.02 | Peoplesoft dependency (negative) | Verify DWLoadMaster terminates if the Peoplesoft job hasn't finished yet | Run DWLoadMaster using configuration file Test01.dtsconfig. Note: Can we create an auto-restart to try again 3 times before failure? | | Log table JobLogs gets a CheckPrereq Failure row. Operator gets an email. |
| 1.03 | Peoplesoft dependency (positive) | DWLoadMaster continues if the Peoplesoft job has finished correctly. | Run DWLoadMaster using configuration file Test02.dtsconfig. Note: Automate | | Log table JobLogs gets a row signalling we're moving past the CheckPrereq step |
| | etc. | | | | |
| 2.00 | Data Accuracy | | | | |
| 2.01 | Customer dim, incremental row count | Do all new customers from the source system end up in the customer dim? | Run DWLoadMaster using configuration file Test02.dtsconfig. | CheckCustCnt.sql | Result query from script counts rows from Ops and new rows from the customer dim. These numbers should be the same (511) |
| 2.02 | Customer dim, Type 2 counts | Do customer changes to Type 2 columns result in a new row in the customer dim? | same | CheckCustType2.sql | Query result set returns same rowcount (7) |
| | etc. | | | | |

Figure 14.2: Example test matrix information

The first thing to notice in Figure 14.2 is that the individual tests are tiny. A test like "Does the ETL process run correctly from start to finish?" is too big to be tested. Keep breaking down tests until you have something simple that's easily verified.

The documentation for each test includes descriptions of how the test should be run and the results verified. Ideally, write scripts to run tests and check results. It's often as easy to automate a test as it is to document the steps you want a person to follow, and it's far easier to run the second or third (or twentieth) time. At the very least, stick the SQL or MDX used to verify a result into the spreadsheet so the tester can find it and copy/paste.
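For example, the CheckCustCnt.sql script from Figure 14.2 might be little more than a pair of counts the tester can compare at a glance. This is a minimal sketch; the database, table, and column names are illustrative assumptions, not a prescribed schema:

```sql
-- Hypothetical contents of CheckCustCnt.sql: compare new customers in the
-- source system against the rows this load added to the customer dimension.
SELECT
    (SELECT COUNT(*)
     FROM OpsSource.dbo.Customer
     WHERE CreatedDate >= '20060516') AS SourceNewCustomers,
    (SELECT COUNT(*)
     FROM DW.dbo.DimCustomer
     WHERE AuditBatchKey = 42) AS DimNewCustomers;
-- The test passes when the two counts match (511 in Figure 14.2).
```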

Develop a Primary Test Dataset

Develop a basic test dataset to simulate an incremental load. Later in the deployment process you'll test against live data, but early on it's important to have a known set of test data so you can easily check results. Follow this test dataset all the way through the process, from ETL to the relational and OLAP databases, and into the predefined reports. More than half of the system's tests are usually performed by one normal run through the basic test dataset.

You can include imperfect data in the primary test dataset, as long as your processes have been designed to identify and handle that data during normal operations. For example, perhaps you expect one fact table to have early-arriving facts and you've automated the process of creating a dummy dimension member to ensure referential integrity. Include test data for this normal (but imperfect) data condition within your primary test dataset. The test cases that we described earlier in this chapter should verify that this data is properly handled.

You'll run tests over and over, so automate the process of restoring the test databases to the pre-test state.
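One simple way to automate the reset is to restore a baseline backup taken before the first test run. A sketch, assuming a dedicated test database; the name and path are examples:

```sql
-- Reset the test warehouse to its pre-test state from a baseline backup.
USE master;
ALTER DATABASE DW_Test SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
RESTORE DATABASE DW_Test
FROM DISK = N'E:\Backups\DW_Test_PreTest.bak'
WITH REPLACE;
ALTER DATABASE DW_Test SET MULTI_USER;
```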

USING INTEGRATION SERVICES CONFIGURATIONS

Integration Services configurations are a powerful tool for modifying the way a package behaves at runtime, without editing the package. This capability is very important for controlling package contents and quality: Once a package's quality has been assured, it should be locked down and not edited again. But some things about a package may have to change. This is particularly true of connectivity information, which almost always changes as you move a package from development to test to production.

You can pull out almost any characteristic of a task or connection into a configuration file. At the time the package is executed, Integration Services reads the configuration files, overwriting the default value of a parameter or variable with its value from the configuration file. You can have many configuration files, and even overwrite the same parameter multiple times. The last configuration file in order will win.

We've been talking about configuration files, but they're not always files. Configurations are often stored in XML files, but you can also put the value of a parameter into an environment variable, registry entry, or a configuration table in SQL Server.
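To illustrate the SQL Server option: the Package Configuration Wizard generates a configuration table much like the following (the default name is [SSIS Configurations]). The filter name and package path in the sample row are made-up examples:

```sql
-- Default shape of the SQL Server configuration table.
CREATE TABLE dbo.[SSIS Configurations] (
    ConfigurationFilter NVARCHAR(255) NOT NULL,
    ConfiguredValue     NVARCHAR(255) NULL,
    PackagePath         NVARCHAR(255) NOT NULL,
    ConfiguredValueType NVARCHAR(20)  NOT NULL
);

-- Illustrative row: override the ServerName property of a connection manager.
INSERT INTO dbo.[SSIS Configurations]
VALUES (N'TestEnvironment',
        N'TestServer01',
        N'\Package.Connections[DataWarehouse].Properties[ServerName]',
        N'String');
```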

Some applications, especially those built as shrink-wrapped systems, will make extensive use of configurations. Even the simplest application will probably use them at least twice:

  • To pass parameter values from a master package to its child packages

  • To modify package connection information as packages move from development to test and production

The first step is to set up your package to use configurations. Edit the package in BI Studio, and choose Package Configurations from the SSIS menu. Add a new configuration; for now have it go to an XML configuration file.

The Package Configuration Wizard shows you all the Variables, Properties, Connection Managers, and Executables in your package in a tree structure. You can navigate down to select any property from those objects. Within Connection Managers, you might navigate to your target database connection, and choose to put the Server Name in the configuration file. If you're using Windows Integrated Security to execute your packages, and use the same database names on development, test, and production systems, this may be the only property you need to configure.

Note that the Executables whose properties you can configure include any Data Flow tasks in your package, but only at the task (container) level. There aren't any properties available for the steps (transforms) within a Data Flow task. As far as a configuration is concerned, a Data Flow task is a black box.

It's trivial to change the default value of a package variable in the configuration file. For now, just add a variable to the configuration.

As we mentioned, connection information and variable values are by far the most common properties to change in configurations. Another handy trick is to change the Disable property for a task. Perhaps you have a task that you want to run when the package is in test mode (like truncate the target table), but you don't want to run that task in production. Pull the Disable property for that task into a configuration, and toggle its value as appropriate when you run the package. Remember, you're trying to avoid editing a tested package, which would be necessary if you simply wanted to delete that task before you rolled the package to production.

You then specify where the configuration file is to be saved. As soon as you exit the Package Configuration Wizard, you can bring up that file in your favorite XML editor (or Notepad). Try modifying one of the values you configured, save the file, and execute the package. The configurations are applied whether you execute the package from within the debug environment in BI Studio, from the command line, or launched as a SQL Agent job. Integration Services will get unhappy, raising errors and warnings, if you've specified a configuration file and it can't find it at runtime.

Develop Multiple Test Datasets

During Operations testing, as we discuss later in this chapter, you're checking that the control flow of the system operates as expected. It's as important to test with bad data as it is to test with good data, especially if you've defined data quality conditions that should cause processing to stop. Plan to create alternative datasets to test error conditions that affect the control flow.

Plan to run tests with at least three sets of data: normal static test data, one or potentially many control flow error datasets, and at least one live source. You could put the test data in whatever data management system suits your fancy, but we find it easiest to put each test dataset in its own SQL Server database on a test server. Name each database to reflect its contents, like RI_Violations. If your real-world ETL process combines data from many sources, you may still want to combine the test datasets into a single SQL Server database for the bulk of your testing. This is very easy to do if all your sources use OLE DB or .NET providers. If you use several types of sources, like flat files and relational sources, develop a system and naming conventions for bundling the different pieces of data that make up one test dataset.
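For the common case where each dataset lives in its own SQL Server database, the convention might look like this sketch; the names are examples only:

```sql
-- One SQL Server database per test dataset, named for its contents.
CREATE DATABASE TestData_Normal;         -- the primary, known-good dataset
CREATE DATABASE TestData_RI_Violations;  -- rows that violate referential integrity
CREATE DATABASE TestData_BadRowCounts;   -- simulates an implausibly small extract
```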

Elsewhere in the testing process, as you get closer to deploying the system, you'll need to test with live data. For now, isolate the control flow error condition testing from the tests of connectivity and integration across multiple systems.

Warning 

You absolutely don't want to continually edit Integration Services packages to point to these alternative sources: once a package is open for edit, you have no control over what's being changed. Instead, use the Integration Services configurations feature to change the properties of the source connections; we walk through an example in the related sidebar.

Run Tests and Log Results

When you're actually running tests, you must keep track of which systems and data you used and what the test results were. There's nothing particularly unusual about this part of testing a DW/BI system. As with any other software and system testing, the important thing is to run the tests and log the results.

Figure 14.3 illustrates the absolute minimum information to collect about each test run. Undoubtedly the most important information is the status of the test and any specific comments about what the tester observed.

RunDate: 5/16/2006
Who: Tester's Name
System: Dev (Apollo, Zeus, Minerva)

| Task# | Name | Config | Result | Comment |
| --- | --- | --- | --- | --- |
| 1.01 | Job kickoff | Test02.dtsconfig | 1 | |
| 1.02 | Peoplesoft dependency (negative) | Test02.dtsconfig | | Job stopped correctly, but email not sent |
| 1.03 | Peoplesoft dependency (positive) | Test03.dtsconfig | 1 | |
| etc. | | | | |

Figure 14.3: Example test run log

Historical Load Data Quality Testing

We've been discussing the system testing process. Now it's time to look at the kinds of tests you need to develop, and the issues to watch out for.

Many DW/BI projects begin by loading a relatively large volume of historical data into the system. You may have built a separate ETL process for loading historical data, different from the incremental processing. This is common if you need to go to heroic efforts to clean and integrate historic data. The historical load must be tested, but it often makes sense to test the data rather than the load process. In other words, perform quality assurance checks on the data before you start with incremental updates, and don't worry about testing the historical load process. It's a lot easier to verify the data than to verify the process. If you verify the starting point and the incremental procedures, and have reliable backup and recovery procedures, you may never run the historical load again. Data quality assurance testing is discussed later in this chapter.

The incremental data loading system, by contrast, requires extensive testing of all the processes, procedures, and control flow paths.

Operations Process Testing

During development, people have been working on one component of the system at a time. Even if your team is very small, you usually isolate subsystems during development. Experienced teams begin small-scale operations and end-to-end testing from the outset. As soon as two components are ready, check how they fit together. Every time you develop another piece of the puzzle, step back and make sure everything still works.

Now it's time to pull things together and confirm the entire system operates as designed. First, verify end-to-end operations using the primary set of static data that we described earlier in this chapter. The primary testing dataset should run from start (extracting from source systems) to finish (running and delivering reports) without triggering any errors that stop processing.

The operations process testing should verify that ETL jobs start when they should and run correctly under normal operation. Confirm that Analysis Services databases are processed as expected. Verify that standard reports are built on schedule, only after the underlying data has been correctly updated. Use team members' email addresses to confirm that report distribution via email works as expected.

Check all the branches in your incremental ETL processing. Perhaps your scheduling system kicks off the master package at 1:30 a.m., and the first thing the master package does is check to see if the ERP system has finished its nightly processing. Verify that the communication between the ERP and ETL systems works as expected. Does your package run to completion if it starts when the ERP system has finished its work? Does it do the right thing when the ERP process has failed or hasn't yet completed?

As we discuss in Chapter 6, your ETL system should check row counts and checksums. Ideally, you've built processes for handling unusual events, like an unbelievably low or high row count. Test these procedures, using alternative datasets and configurations.
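As a sketch of what such a check can look like, the following query flags any load whose extract count strays more than 50 percent from the historical average, assuming the ETL system writes its row counts to an audit table. The table, columns, and threshold are illustrative assumptions:

```sql
-- Flag suspicious extract volumes against the trailing average per table.
SELECT a.LoadDate, a.TableName, a.RowsExtracted
FROM dbo.AuditRowCounts a
JOIN (SELECT TableName, AVG(RowsExtracted * 1.0) AS AvgRows
      FROM dbo.AuditRowCounts
      GROUP BY TableName) h
  ON a.TableName = h.TableName
WHERE a.LoadDate = '20060516'
  AND (a.RowsExtracted > h.AvgRows * 1.5
    OR a.RowsExtracted < h.AvgRows * 0.5);
```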

While there are potentially dozens of infrastructure-related problems that surface during operations testing, you'll almost inevitably stumble over security issues. As we described in Chapter 12, system operations should use system accounts that are independent of any individual user ID. The system needs to continue running even if the ETL developer quits and her ID is cancelled.

Depending on your organization, it can take a long time just to get a system account set up, not to mention getting the necessary authorizations into the various source systems, servers, and other resources. In a large organization, you should plan on it taking a few weeks to iron this out (depending on whom you know).

Data Quality Assurance Testing

We can't overemphasize the importance of testing the quality of the data in your DW/BI system. The data needs to be checked very carefully before the system goes live. Develop subsystems and procedures for continuing to check data accuracy after deployment. Delivering bad data is much worse than delivering no data at all.

Data quality testing, at its heart, consists of running a query or report from the source system or systems, running the corresponding query or report from the DW/BI system, and comparing results. The magic comes in knowing what the corresponding query or report is. You may need significant business knowledge to match up the multiple systems. For this reason, and for buy-in from the business community, you must include the business users in the data quality assurance process.

Make sure you can reproduce existing reports to the penny. Sometimes we find during DW/BI system development that existing reports have long been in error. This makes it harder to verify the new reports, but you absolutely must audit them and document any discrepancies. If you have an Internal Audit group, enlist their assistance in this process.

We've often been asked to recommend tools to help test data quality. We don't know of any tools that help with the hard part of testing: determining what data needs to be tested and how to do so. At the very least, the data quality testing reports should include row counts, grand totals and subtotals along major dimensions and hierarchies, and by time.
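A typical comparison query computes those counts and subtotals on the DW/BI side; the equivalent query is run against the source system and the result sets are diffed. The schema below is an illustrative assumption:

```sql
-- Row counts and subtotals along one dimension hierarchy and by month.
SELECT p.Category,
       d.CalendarYear,
       d.MonthOfYear,
       COUNT(*)           AS FactRowCount,
       SUM(f.SalesAmount) AS SalesTotal
FROM dbo.FactSales f
JOIN dbo.DimProduct p ON f.ProductKey = p.ProductKey
JOIN dbo.DimDate d    ON f.DateKey = d.DateKey
GROUP BY p.Category, d.CalendarYear, d.MonthOfYear
ORDER BY p.Category, d.CalendarYear, d.MonthOfYear;
```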

Your Analysis Services database may include some complex calculations and KPIs. These too are difficult to validate. You should have someone test the calculations externally, usually in Excel, to confirm that the MDX expressions are correct.

Report definitions sometimes include calculations as well. Check everything: in a budget variance report that displays budgets, actuals, and variance, confirm that the variance is truly the difference between the other columns. Even in this trivial case you may see a penny difference due to rounding. Discuss that rounding with your business users and get a decision from them on how to handle it.
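A check like this one is easy to automate. The query below, a sketch against a hypothetical report dataset, surfaces any row where the displayed variance differs from the recomputed difference by more than half a cent:

```sql
-- Find variance rows that don't tie out to (actual - budget), beyond rounding.
SELECT BudgetLineID, ActualAmount, BudgetAmount, VarianceAmount
FROM dbo.BudgetVarianceData
WHERE ABS(VarianceAmount - (ActualAmount - BudgetAmount)) > 0.005;
```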

As with other kinds of tests, it's useful to automate data quality tests as much as possible. During testing and deployment, you'll typically run the data quality tests at least three times:

  • Test the historical load: As we discussed earlier in this chapter, historical load testing is primarily a data quality assurance process.

  • Test the outcome of running the primary test dataset: This static dataset is usually small, and is easiest to check thoroughly.

  • Test the real data: Once you start running live data through your test system, you should test the validity of that data, as we describe in the next section.

Any automated data quality tests that you developed for the deployment phase should eventually be folded into the ETL process. Develop processes for logging the results of the ongoing data quality tests, and publish data quality reports to the business community.

Live Testing

Once you're confident that your historical data is accurate and the operational processes are clean, you can begin testing with real data. Before you begin live testing, back up the database after the historical load's data has been tested. Hope for smooth sailing, but plan for rough seas.

Set up the test system so that it points to the same operational sources that you'll use in production. You need to run the live testing for long enough to see real-world patterns in the data and operational environment. Nightly loads need at least a week, perhaps several weeks, of exercising with live data before the system launches. If there's an operational cycle that may affect your system, like a monthly closing process, ensure your tests span a complete cycle. If you load data on a monthly cycle, it's not realistic to keep the system in live testing for many load cycles; two iterations is a lot better than one, if you can manage it.

There are several reasons to perform live testing:

  • System integration: No matter how thorough and realistic your test datasets and test systems have been, you probably haven't been connecting to the production systems on a regular basis. All sorts of things can go wrong, beginning with the permissions on the test system's account and server.

  • Data quality testing: Run your data quality tests against the data after incremental processing.

  • Performance testing: Performance testing works best against a live load, at least as a final confirmation that performance tuning has been effective.

Performance Testing

The larger and more complex your system is, and the more users you have (especially ad hoc users), the more important it is for you to conduct rigorous performance testing before you go into production. You want to launch your system with the best performance possible, and you certainly want to be confident that you can perform all processing within the necessary load windows.

You may have several goals for conducting performance tests. The most common are:

  • System tuning: How can you tweak the system to deliver the best possible performance?

  • System sizing and configuration: What hardware do you need?

  • Confirmation of service levels: Will you meet your downtime and query performance requirements?

  • Headroom analysis: How long will today's system and hardware meet your requirements, as the DW/BI system continues to grow?

As with everything else associated with systems, the performance testing process is best begun with some planning. Specify the goals of your testing process, and develop tests to address those goals.

Rather than thinking about performance testing as associated with each of the components of SQL Server (RDBMS, Analysis Services, and so on), we prefer a more integrated approach. Test processing performance (extract, transformation, load, cube incremental processing, and standard report generation) and query performance (ad hoc queries to Analysis Services and the relational database, on-demand reports to Reporting Services).

System Tuning

Before you run your first test, figure out how you'll evaluate the results of your tests. What will you measure? You need the obvious measure of timing, but if you want any hope of improving performance, you need a lot more information than that. In Chapter 15 we discuss the kinds of information that should be logged during system operation. These counters and event traces are invaluable for diagnosing performance problems, both during the testing period and during operations.

Run a baseline test that measures the performance of your best guess at an optimally configured system. Imagine that you were going to go into production without doing any performance testing at all. (This thought exercise may strain your imagination, but give it a try.) This baseline test should measure the performance and operating characteristics of that best-guess system.

Next, test one change at a time. Change an index. Test. Change a parameter. Test. If you change more than one thing at a time, you'll never know which change was helpful. Use a spreadsheet to track tests and results. If you're trying to wring the last ounce of performance from the system, you may need several iterations. Use the one-change-at-a-time technique to develop an improved optimal configuration. Try a second set of one-at-a-time changes against that new configuration until you're satisfied with the performance.

Note 

We would apologize for providing you with such obvious guidance, except that we've hardly seen anyone conduct performance tests correctly.

It's vital to conduct system tuning performance tests on the nearly completed system as a whole. You can and should unit-test individual components of the system, like cube processing. But unless the production system will run each unit in isolation, this testing is far from conclusive. The biggest problems you'll encounter arise from competition for scarce resources, usually memory, across multiple units of the system.

System Sizing

It's really hard to figure out exactly what system to purchase, and even harder to figure out whether you're better off centralizing your system or distributing processing onto multiple servers. Hardware decisions are often made very early in the project, and hence are typically made for political reasons rather than for technical ones.

In Chapter 4 we provided some very broad guidance on system sizing. To get more specific advice than this, you'll need to turn to one of the following:

  • Hardware vendors' sizing tools (if they exist)

  • Published performance studies on similar systems (if they exist)

  • A knowledgeable consultant who's familiar with both your system and a wide range of similar implementations (if you can find one)

  • System sizing performance tests

You can and should request that the hardware vendors provide system sizing tools for your DW/BI system. We hope they'll have such tools, if not at the time of SQL Server 2005's launch, then soon after. But the sad truth is that DW/BI systems tend to be customized, variable, and nearly unique. No tool is going to do a great job of helping you size your system, much less determine whether you're better off centralizing or distributing. The only way to really figure it out is to perform testing with your system, data, data volumes, and usage patterns. It's seldom feasible to do this long before your initial launch. You'd have to complete the application, then test performance on several alternative hardware configurations, then order and install the chosen system, all before deploying the system in production. Most organizations use a much less exact method of choosing hardware for the initial deployment, then, if performance is poor and tuning doesn't fix it, perform a full-blown system sizing test.

Tip 

Buy hardware that's bigger than what you think you'll need. Buy expandable systems, especially systems that can hold more memory than you're initially purchasing. 64-bit is very appealing, even if you don't think you need all that memory at the outset. But if you buy a system that lets you upgrade to 8 or 16GB of memory, or more, it'll be much easier for you to adjust to a system that's used more heavily than you initially predicted.

Service Level Confirmation

Increasingly, DW/BI teams are entering into Service Level Agreements with the user community. These agreements cover data latency, user downtime, and often user query performance.

If you have such an agreement in place, then you surely must test that your system is likely to conform to the agreement. This is often a first step that leads to more extensive performance testing for tuning work, or even alternative system sizing and configuration efforts. But if you've cleverly made an agreement with pretty low minimum standards, you may simply need to confirm you're above those standards.

Tip 

Service Level Agreements (SLAs) are a valuable tool for focusing management attention on important issues. But don't let your SLA drive you to deliver mediocrity by striving only to meet the stated requirements. Under-promise and over-deliver. Never promise more than your clients are requesting, but always try to deliver more than they've imagined possible.

Be very careful in negotiating Service Level Agreements that include metrics for ad hoc query performance. Don't let yourself agree to an absolute ceiling for ad hoc query times, like "all queries complete in 10 seconds." You'd be much better off agreeing that 90 percent of queries will complete in 5 seconds. In a system of any size and complexity, it's always possible to write an ad hoc query that exceeds any reasonable maximum.
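A percentage-style SLA is also easy to monitor, assuming query durations are captured in a log table like those discussed in Chapter 15. The table and column names here are illustrative:

```sql
-- What fraction of last week's queries met the 5-second target?
SELECT 100.0 * SUM(CASE WHEN DurationSeconds <= 5 THEN 1 ELSE 0 END)
             / COUNT(*) AS PctWithinFiveSeconds
FROM dbo.QueryLog
WHERE LogDate >= DATEADD(day, -7, GETDATE());
```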

Clearly specify in the SLA what you mean by important terms, like "query completion." Does this mean on the server side, or does it also include the transport (over a potentially low-bandwidth WAN) to the client? The SLA is, basically, a contract between you and your users. You probably don't need to include Legal on this contract, but you should take it seriously. Your management will take it seriously if you don't maintain service levels.

Processing Performance: Getting Data In

Performance testing for the data processing side of the problem is fairly straightforward. The live testing that we described earlier in this chapter is the basis for the processing performance tests.

The simplest approach to building a processing system is to serialize the major components. All ETL work to the RDBMS finishes before you begin cube processing, and that completes before you start generating reports. Such a serialized system is easy to performance test and diagnose, because the units of work are isolated. You can test and tune each unit separately. Unless your load window is very small or your latency requirements approach real time, you'll probably start off with serialized processing.

You may be able to design your processing system so that work is parallelized. You need to process shared dimensions first, but you should be able to start the work on one fact table while a second fact table is still loading. You can make significant improvements in the overall loading time by parallelizing some activities, but this is a much harder system to design and tune. Your performance tests must run on the integrated processing system. All parts of the DW/BI system compete for resources. You can't test each component separately and sum their processing times. This is true even if the different components are distributed across multiple servers, because there's always some burden placed on the upstream servers.

Another issue to consider is confirming that changes made to improve the performance of one part of the system don't negatively impact another part of the system. The classic problem is index and aggregation design. You may want lots of indexes and aggregations for queries to run quickly. But these structures must be maintained, which can place an intolerable burden on the processing performance. Every time a change is considered, evaluate the effects on a complete test system before deploying to production.

Query Performance: Getting Data Out

Testing query performance, especially ad hoc query performance, is a much harder problem than testing processing performance. The fundamental problem is that you don't know what your users are going to want to do. You can ask them and get some ideas, but those ideas are going to bear, at best, only a resemblance to reality.

Standard reports are either pre-run and cached, or run on-demand. Typically pre-run reports are executed at the end of the ETL processing, and we consider the work more like a processing workload than a query and reporting workload. You can set up Reporting Services to email the results of a pre-run report to the users; doing so shifts the entire burden of report generation to a scheduled time. Alternatively, users might access the pre-run report from the BI portal, in which case there's a modest on-demand element associated with displaying the report. A solid performance test of pre-run standard reports uses estimated usage patterns for accessing the pre-run reports. For example, 500 people will access the report at random times between 8 a.m. and 9:30 a.m. The relational database where the Reporting Services catalog is stored, the Reporting Services engine, and the web servers are all involved with serving pre-stored reports.

A standard report that's executed on-demand involves more work. The first demand is on the database upon which the report is defined, to serve up the basic data for the report. Then, Reporting Services works on that result set to render the report. Finally, the report is distributed, usually across the web to the user's browser window. On-demand reports are often used for parameterized reports, for infrequently accessed reports that don't warrant pre-executing and storing the results, and for reports with low latency. A good performance test for on-demand reports includes a realistic estimate of who is running the reports, when, and with what parameters. Reports can be cached in memory, which is great for performance. In the absence of real-world data about how users are running reports, it's very difficult to accurately estimate the use of the report cache.

Finally, laboratory testing of ad hoc query performance is fiendishly difficult. The first problem is to know what users are going to want to do. You know your predefined reports and other BI applications, but ad hoc is, well, ad hoc. You have to return to your business requirements document to extract information about analyses. Watch (by collecting query text) what the early business users and testers are doing with the system. Of course, if you're testing a system that's already in production, you should collect a broad range of queries from the system logs that we discuss in Chapter 15.

One of the biggest challenges in performance testing ad hoc query loads is how to capture the typical analytic experience. The business user issues a query, either an ad hoc query or a standard report, and then explores around that data by drilling up, drilling down, and launching new related lines of inquiry. The user will look at a screen of information for a while before deciding where to go next. A good ad hoc query performance and scalability test will consist of sets of related queries separated by user think times.

Analysis Services, Reporting Services, and the database engine all have caches from which similar queries can be served. These caches are valuable because they greatly improve performance for the common behavior we just discussed: a chain of queries on a related topic. It's really difficult to design query performance and scale tests that take appropriate advantage of such a cache, without going too far and unrealistically assuming all queries will be resolved from cache.

We have not seen a query-testing tool on the market that adequately addresses this pattern of behavior. This is the main reason you see very few published performance and scalability studies for Analysis Services. To do it well, the authors of a study would need to develop a testing tool. We don't know of anyone who's gone to the considerable trouble of packaging such a home-built tool for you to conduct performance tests at your site.

Reference 

Check the web site for the internal Microsoft project Project Real, at www.microsoft.com/sql/bi/ProjectReal. It's possible they'll have information, or even a tool, to assist in performance testing. At the time of this writing, no such tool is available.

SQL Server Standard Edition Testing

Your DW/BI team will probably use SQL Server Developer Edition during development. Developer Edition contains all the functionality of Enterprise Edition, which means it has more functionality than is available in Standard Edition.

If you're using Enterprise Edition in production, you may put Developer Edition on your test system (assuming the test system is devoted to testing and isn't used for any production purposes). But if you're using Standard Edition in production, you need to develop a plan to test that the system developed using Developer Edition will actually work on Standard Edition. By far the best way to do this is to run an end-to-end test on a test system with Standard Edition installed.

SQL Server 2005 doesn't throttle Standard Edition's performance. Instead, the difference between Standard and Enterprise Editions is based on features. Thus, you could probably get away with a stand-alone test of Standard Edition functionality on a small test system. A different test system, used for scale and performance testing, may use the much less expensive Developer Edition.

Of all the DW/BI services in SQL Server, Analysis Services has the most features excluded from Standard Edition. The most important differentiating feature is Analysis Services partitioning. You can't define a partitioning strategy for Analysis Services measure groups in Standard Edition. Perspectives are also excluded from Standard Edition.

Reference 

For a complete list of the SQL Server features that are excluded from Standard Edition, see the Books Online topic Features Supported by the Editions of SQL Server 2005.

As we describe in Chapter 7, Analysis Services does provide assistance to you during the development cycle, to prevent you from using features that aren't supported in Standard Edition. A wise person would still test.

64-bit Platform Testing

If, as we often recommend, your production hardware is 64-bit, you need to test with 64-bit hardware as well. Serious performance testing must be conducted on 64-bit hardware, as the runtime characteristics of the 32-bit and 64-bit platforms vary widely. But failing that, you do at least need to perform functional testing on a 64-bit box.

So much is the same between the 32-bit and 64-bit platforms that you may be tempted to skip this functional test. Unwise! A particular element to watch out for is an Integration Services Script task, which can be precompiled. A task that was precompiled on Win32 is not going to work on Win64.

Usability Testing

Unless you've developed custom user-oriented software as part of your DW/BI solution, usability testing will not be a huge burden. In large part this is because, with shrink-wrapped front-end tools, there are relatively few things you can change. You can typically change the names of things (columns, tables, and reports), and the way they are organized.

Nonetheless, perform some usability testing with actual business users. As with all usability tests, you need to find fresh minds: people who have not been intimately associated with the project. Walk a few people through the BI portal and the reports, and see what trips them up.

Later in this chapter we discuss user training. If you have a large user community, you may conduct a dry run of the training with a friendly audience. Plan to gather feedback from the users not only about the class itself, but also about the usability of the system. Although this usability feedback is coming late in the process, there's enough time to change the names and organization of the report folders.

Earlier in the system development process, when you start working on the Analysis Services database and defining reports, you should show a draft of the object names to business users. Rely on them to tell you what objects should be called in the interfaces they see. You tried to get object names correct when you were designing the relational database, but business users often change their minds about names when the system gets closer to reality. We let business user names diverge from the physical names if it helps the business users understand the system. Because there's always at least one layer between the physical relational database and the business user, be it relational views, Analysis Services cubes, Reporting Services Report Models, or all of these layers, you can change the names that business users see without messing up your ETL system development. But you do need to get the names right for Analysis Services and the reporting metadata layers.

Tip 

Another issue to think about early on is the hierarchies within dimensions. Dimension hierarchies often become dropdown lists for ad hoc queries and parameterized reports. During the database design sessions, you should think about the user experience associated with large, flat hierarchies. What will the user experience be if you populate a dropdown list with 100,000 items? This is a common problem, and not one that's easily fixed at the last minute, just before rollout.

Before your new system goes live, you must have implemented security at the user, role, report, and database levels, as we described in Chapter 12. If users don't have the correct security permissions, the system is, from their point of view, completely unusable.

Deployment

Your deployment process will depend on the system architecture you set up for your DW/BI system. In Chapter 4, we described typical small, medium, and large configurations. Each of these will follow the same general deployment process, but will have different specific steps.

If you're implementing a new system, on new hardware, your deployment procedures can be somewhat casual. It's common to use the production system for testing and even user training, before the new system goes live. You can think of the deployment process in this simple case as deployment via email: send a message when the system has passed all tests and is ready to go live.

After that first free deployment, it gets a lot harder. Any modifications to the system, and there are always modifications, should be accomplished with minimal disruption to the business user community. The only way to do this is to:

  • Perform testing on a test system thats as identical to the production system as possible.

  • Use scripts rather than clicking through a user interface. Any time you need to open a tool, and especially to click through a wizard, you open the possibility of doing the wrong thing.

  • Develop a deployment process playbook that describes exactly what to do and in what order. This is especially vital if you can't run a script but instead must do something within a tool.

  • Test the playbook on the test system before trying it on production.

Relational Database Deployment

There are several ways to deploy a relational database from test to production. The easiest, of course, is to declare the test system to be production (email deployment). But assuming a more realistic world, SQL Server still provides you with several tools for deployment.

If you want to deploy the entire database, for example for a brand new system, you could use backup and restore. You'd need to modify the restore script to move the data files to the correct location on the production server, if your test server is configured differently.
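A sketch of a restore-based deployment follows. The logical file names and paths are examples; check yours first with RESTORE FILELISTONLY:

```sql
-- List the logical file names inside the backup.
RESTORE FILELISTONLY FROM DISK = N'E:\Deploy\DW.bak';

-- Restore onto production, relocating the data and log files.
RESTORE DATABASE DW
FROM DISK = N'E:\Deploy\DW.bak'
WITH MOVE N'DW_Data' TO N'G:\SQLData\DW.mdf',
     MOVE N'DW_Log'  TO N'H:\SQLLogs\DW.ldf',
     REPLACE;
```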

It's more common to script the creation or modification of the target database. The easiest way to do this is to let SQL Server generate the script for you. In Management Studio's Object Explorer window, right-click on an object like a database or table and choose Script (object type) as CREATE. This menu option scripts only the specific object you've chosen. You'll probably need to script several objects.

After you generate the CREATE scripts, you may need to modify the location of data files. Ideally, your test system is configured like your production system, so running the scripts on the test system is an excellent test. If the systems are configured differently, test the scripts as much as possible, and document what is different and what to verify during the production run. Check the scripts into source control.

If you're modifying an existing relational data warehouse by adding a new dimensional model, the process is much the same. Generate CREATE scripts for the new tables and associated objects like indexes and views. If these are truly additions, the impact of creating new empty objects in the production system should be zero (but test it first!).

Over time, you'll surely modify existing tables somewhat. This is a trickier proposition, because if you ALTER a table you run a risk of breaking the existing ETL processes. Certainly you can address this risk by thorough testing.

Tip 

Recall from Chapter 4 that we recommended you always build a layer of views to abstract access to the relational tables. This view layer can provide valuable flexibility for modifying the production system in steps, keeping the old table and view around until you're confident that you can safely switch over to the new structure.
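A minimal sketch of the idea, with illustrative names: user-facing queries hit the view, so the table behind it can be swapped in a single, low-risk step.

```sql
-- Reports, cubes, and ad hoc users query dbo.Customer, not the table itself.
CREATE VIEW dbo.Customer
AS
SELECT CustomerKey, CustomerName, CustomerSegment
FROM dbo.DimCustomer_New;  -- repoint from DimCustomer_Old once testing passes
```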

You may want to bring data from the test system to production without taking the draconian step of using a full database restore. In this case, you can build simple Integration Services packages, or use BULK INSERT, to copy the data from test to production for the subset of tables.
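For example, a targeted copy over a linked server might look like the following sketch. The linked server name and table are assumptions; a dimension with an IDENTITY surrogate key also needs IDENTITY_INSERT and an explicit column list:

```sql
-- Copy one table's rows from the test server to production.
SET IDENTITY_INSERT DW.dbo.DimPromotion ON;
INSERT INTO DW.dbo.DimPromotion (PromotionKey, PromotionName, DiscountPct)
SELECT PromotionKey, PromotionName, DiscountPct
FROM TESTSRV.DW_Test.dbo.DimPromotion;   -- TESTSRV is a linked server
SET IDENTITY_INSERT DW.dbo.DimPromotion OFF;
```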

Tip 

The Deployment Playbook for deploying a new relational database, or modifications to an existing database, should include the following:

  • Any edits to make to SQL scripts (like editing file locations or database names). It's far safer to parameterize these changes, but writing the scripts is a lot more complicated if you do so.

  • Any static data in the new database. This data may include configuration data, like a table containing metadata, Integration Services configurations, or global variables. And don't forget to populate any new static dimensions.

  • Some mechanism for verifying that the edits were done correctly if edits are made directly in scripts. At the very least, advise the operator to search for specific phrases that should have been changed during the editing process.

  • The run command for any scripts, like SQL scripts, including any parameters used in the script execution.

  • Integration Services packages to run, which will load data into some or all tables.

  • A script or instructions to verify that all completed correctly. There are third-party metadata differencing tools that can compare structures between two databases. If you can't get access to such a tool, at the very least you can generate CREATE scripts for the two systems and use windiff to compare them.

Integration Services Package Deployment

The process of deploying a package from development to test to production is straightforward. The hard part is testing that the package works correctly in its new destination, and there's no magical tool to help with testing.

There are tools to help with package deployment, however. The first is Integration Services configurations, which we discussed earlier in this chapter. Configurations let you change the characteristics of an Integration Services package at runtime, like the connection strings to sources, the locations of file folders, or parameter and variable values. Changing values at runtime is valuable because it lets you modify the way a package executes without opening and editing the package.

The second feature of Integration Services that simplifies deployment is the aptly named Deployment Utility. The Deployment Utility bundles into a deployment folder all the components associated with a set of packages, including any configuration files, code libraries, or other files that you included in your project. You can copy the deployment folder from one server to another, and then launch the Package Installation Wizard to install the packages.

Tip 

During the development process, you probably discovered that in most cases you don't need the Deployment Utility and Package Installation Wizard. You can easily copy Integration Services package and configuration files, and they work just fine on the new server. But why rely on a person to remember and document which files to move? The utility and wizard do the right thing for you, reducing the risk of error.

Launch the Deployment Utility from within BI Studio. Open the project that contains the packages you want to deploy. On the Deployment Utility page of the Properties pane, set AllowConfigurationChanges and CreateDeploymentUtility to True. Then build the deployment folder by right-clicking the project in Solution Explorer and choosing Build.

During the Build process, BI Studio effectively compiles your DTS packages. It performs error checking and spits out errors and warnings for any problems it sees, like potential data type conversion errors. By default, the deployment folder is created in the bin folder of the solution. If your solution is in c:\SQLProjects\MDWT Test Project SSIS, then you'd find the deployment folder along that path in bin\Deployment.

After you copy the deployment folder to the target machine, launch the Package Installation Wizard by double-clicking on the DtsDeploymentManifest.xml file within the deployment folder. You can choose whether to deploy the packages to the file system or the SQL Server relational database.

Some organizations are deeply opposed to using a wizard for deploying to production. Scripts can be fully tested and automated in advance. If you use a wizard like the Package Installation Wizard, you run the risk of someone clicking the wrong box or making a typo during the deployment process. If your organization is adamant about not using wizards, you can write a batch script that calls dtutil. Dtutil is a command-line utility, fully documented in Books Online, that will copy, delete, encrypt, or deploy a package into SQL Server or a file folder. If your deployment folder includes multiple packages, you'd need to call dtutil multiple times to install them one by one. And if your deployment folder contains configuration or other files, your script would also need to move them to the appropriate places.

Tip 

The Deployment Playbook for deploying a package should include the following:

  • The location of the deployment folder to be copied to the production server.

  • Where to copy the deployment folder to.

  • Instructions for running the Package Installation Wizard, including all choices to make in the dialog boxes. Alternatively, the command script to run that calls dtutil to install the packages and copies other necessary files to the appropriate destinations. Don't forget the configuration files and code libraries.

  • Instructions for creating any Windows global variables that the package uses.

  • A script or instructions to verify that all completed correctly.

Analysis Services Database Deployment

As with the other software components of the Microsoft DW/BI solution, there are several ways to deploy a new Analysis Services database, or modifications to an existing database. The best methods for deploying from test to production are to use the Deployment Wizard or the Synchronize Database Wizard. Less appealing is to generate a CREATE database script from Management Studio, just as you would when deploying a relational database. You may think you could simply copy data and metadata files from one system to another, but this final approach is not as simple as it sounds, and is decidedly not recommended.

Note 

Deploying an Analysis Services database into production is very different, and much improved, in Analysis Services 2005.

By the time you're working on plans for deploying the Analysis Services database to production, you've built and deployed it many times, as we describe in Chapter 7. During the development process, you deploy the database onto a development server, and typically process a subset of data. You've been working with the database as a project in BI Studio.

When you deploy the database from development to test, you should use the Deployment Wizard. Make sure you've built, processed, and unit-tested the development database. The basic scripts for deploying to test exist and are ready for you in the project's bin folder. If your project is in c:\SQLProjects\MDWT test Project SSAS, you'd find the deployment scripts along that path in the bin subfolder.

There are four deployment script files in the bin folder. All are in XML format:

  • <ProjectName>.asdatabase: Contains the main script for all the database objects including dimensions, measure groups, partitions, and so on. Some of the settings in this script are overwritten by the other deployment files. You can launch the Deployment Wizard by double-clicking on this file.

  • <ProjectName>.deploymenttargets: Contains the target server and database names. By default, these are set from the Deployment page of the Analysis Services project properties in BI Studio at the time the project was built. But you can change these properties at the time you run the Deployment Wizard to actually deploy the database.

  • <ProjectName>.configsettings: Contains settings about the target environment, like data source connection information and object storage locations. The settings in this file will override the corresponding settings in the asdatabase file.

  • <ProjectName>.deploymentoptions: Contains settings about the deployment itself, including whether and how to process the database after deployment, and whether to replace any existing partitions.

There are several ways to change the settings in these files. The Deployment Wizard contains a user interface for changing the settings. Alternatively, you can edit the deploymenttargets, configsettings, and deploymentoptions files using any text editor. We recommend that you use the Deployment Wizard UI for the first deployment, from development to test. Later, when you're developing the Playbook for deployment from test to production, edit only the deploymenttargets file directly. The other settings should be fixed and tested so that, for the final deployment, the only thing you're changing is the server name. You shouldn't edit the asdatabase file directly. Any changes should be made in BI Studio, or within the other three deployment files.

In addition to launching the Deployment Wizard by double-clicking on the asdatabase file, you can navigate to the wizard from the Windows Start menu, or from the command line.

Reference 

See the Books Online topic Running the Analysis Services Deployment Wizard for information, including command line options.

If the target database exists already, the Deployment Wizard, by default, will deploy only incremental changes. Over time, you'll modify the structure of your Analysis Services database, adding new calculations and so on. As with the initial database design process, use BI Studio on a development server, deploy the modifications to the test system, develop and test the deployment script, and then deploy changes to production. It would be foolish to skip the testing steps even for relatively minor incremental changes.

The Deployment Wizard deploys a database project from BI Studio onto a database server. It builds the empty database structure. New measure groups and partitions must be populated by processing the partitions. What if you want to copy all or part of a database from one server to another? You could restore from backup, but a better approach is to use the Synchronize Database Wizard, which copies both metadata and data.

The Synchronize Database Wizard can also increase the availability and scalability of your Analysis Services database. One option is to perform all processing on a staging server, and then synchronize the data to the production server. Analysis Services keeps the old database active until all the data is synchronized, at which time users are switched over to the new structure and the old data is dropped. As with the Deployment Wizard, Synchronization is smart enough to work on only changed objects. You can choose to exclude security roles from the synchronization. Synchronization is a useful approach if you find cube processing is interfering with query performance, because synchronizing is often less costly to the target server than processing is. Synchronization is also useful for an Analysis Services clustered implementation, as the clustered copies of the database must, obviously, be kept in synch.

BE EXTRA CAREFUL WITH ANALYSIS SERVICES DEPLOYMENTS

You need to be really careful about deploying changes to the Analysis Services database, to make sure you don't inadvertently wipe out important characteristics of your production database. Imagine the following scenario:

  • You have an Analysis Services database in production.

  • You've added security roles to that database.

  • Your ETL system automatically adds new partitions every month.

  • Now it's time to add a new calculated measure or set.

Because you're being careful, you'll create a new development project by reading the metadata of the production database. This puts your development environment perfectly in synch with production. You develop the new calculated measure.

Maybe you wouldn't mind reprocessing the entire cube, so you figure you'll just build, deploy, and process the cube from development into production (because you've ignored our advice about always testing first!). What happens if someone defined a new role or modified membership in a role, or a new partition came online, in the time between starting your development project and deploying it? The two systems were in synch at the starting point, but no longer.

Please don't do this. The safest approach is to script your changes and test them very carefully on your test server, as we recommend.


You can launch the Synchronize Database Wizard from within Management Studio, by right-clicking on a database in the Object Explorer pane and choosing Synchronize. You can execute the synchronization immediately or, more professionally, save the script for execution later.
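The saved script is simply an XMLA Synchronize command that you can inspect, check into source control, and run later from Management Studio or a SQL Server Agent job. A minimal sketch, assuming a hypothetical staging server named StageAS and the hypothetical MDWT_AS database:

    <Synchronize xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <Source>
        <!-- The server and database to copy from -->
        <ConnectionString>Provider=MSOLAP.3;Data Source=StageAS;Integrated Security=SSPI</ConnectionString>
        <Object>
          <DatabaseID>MDWT_AS</DatabaseID>
        </Object>
      </Source>
      <!-- SkipMembership copies role definitions but leaves the target's role members alone -->
      <SynchronizeSecurity>SkipMembership</SynchronizeSecurity>
      <ApplyCompression>true</ApplyCompression>
    </Synchronize>

You execute this command against the target (production) server, which pulls the changed objects from the source while user queries continue against the old version of the database.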

Note 

The Deployment Wizard deploys an Analysis Services project's metadata, and typically launches data processing from the relational data warehouse. The Synchronize Database Wizard copies a database, including data, from one server to another. A common technique is to deploy a database from development to test, conduct the appropriate tests, and then synchronize the database from test to production.

Reference 

For a complete description of the Synchronize Database Wizard, including how to save the resulting script to a file or execute it immediately, see the set of Books Online topics that begin with "Synchronize Database Wizard F1 Help (SSAS)".

An alternative method of deploying a database to production is to generate a CREATE script in Management Studio, the same method we recommend for the relational database. Right-click the Analysis Services database in the Object Explorer pane and choose the option to script the database. This method generates an XML script that creates the entire database, but it can't be used to create a piece of the database, like a dimension, measure group, or role. Because this script generation isn't as flexible as the wizards we've already discussed in this chapter, we recommend you use it primarily to generate documentation.
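Under the covers, the generated script is an XMLA Create command wrapping the complete database definition, along the lines of the outline below (shown for the hypothetical MDWT_AS database; the full object tree is elided here):

    <Create xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
      <ObjectDefinition>
        <Database>
          <ID>MDWT_AS</ID>
          <Name>MDWT_AS</Name>
          <!-- ...data sources, dimensions, cubes, measure groups, roles... -->
        </Database>
      </ObjectDefinition>
    </Create>

Because the script is all or nothing, it documents the database nicely but gives you no way to deploy an individual changed object.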

Reporting Services Report Deployment

Deploying a new report is generally a lot easier and less exciting than deploying or changing a database or package. When you launch a new DW/BI system or add a business process dimensional model, you will probably begin developing the initial suite of reports on a test server, ideally against a complete set of data. As soon as the production server is populated, migrate any existing reports, and continue report development on the production server. If you use shared data sources for Reporting Services reports, it's a simple task to point the reports to the production databases.

You will modify and create reports far more often than you'll modify the underlying databases. All reports should be tested before they're released to the user community. The most important tests ensure that the report definitions are accurate. Complex reports that access a lot of data, especially reports that are run by many people, should also be tested for performance. You may find that you need to write a stored procedure to generate the report's dataset as efficiently as possible.
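If you do move a report's query into a stored procedure, the change to the report definition itself is small. The RDL fragment below is a simplified sketch, assuming a hypothetical shared data source named DWSource and a hypothetical procedure dbo.usp_SalesSummary:

    <DataSet Name="SalesSummary">
      <Fields>
        <!-- Field definitions map the procedure's result columns to report fields -->
        <Field Name="SalesAmount">
          <DataField>SalesAmount</DataField>
        </Field>
      </Fields>
      <Query>
        <DataSourceName>DWSource</DataSourceName>
        <!-- StoredProcedure tells Reporting Services to call the proc rather than run ad hoc SQL -->
        <CommandType>StoredProcedure</CommandType>
        <CommandText>dbo.usp_SalesSummary</CommandText>
      </Query>
    </DataSet>

The report layout is untouched; only the dataset definition changes, which keeps this kind of performance fix relatively low risk.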

REPORT DEPLOYMENT PROCESS

Complete the following steps to safely deploy a new or changed report on the production system:

  • Identify the business users who will test and verify the new or changed report.

  • Create a Reporting Services role, ReportTest, that includes the DW/BI team and the business users who'll test the report. You may need several report testing roles if there are many report development projects going on at once.

  • Set up a TestFolder folder structure in the BI portal that's accessible only to the ReportTest role.

  • Develop the new report in BI Studio, and deploy it to TestFolder.

  • Notify the testers that the report is available, and tell them when you expect to hear back from them about it. Your organization may have formal user acceptance procedures for you to rely on here.

  • When the relevant people have signed off on the report, re-deploy it to its appropriate place in the BI portal, with the appropriate security.


Most companies will develop and test new reports in a private area of the production report server, rather than set up a completely separate test instance of Reporting Services. Standard reports don't change data, so you don't need to worry about damaging the databases. All you need is to insulate most users from the test area, which is easy to do with the Reporting Services security settings discussed in Chapter 12.

Warning 

If you develop new reports on the production server, you need to be careful about inexperienced report writers running expensive queries, like joining two fact tables. All right, this warning isn't limited to the inexperienced. Experts can do stupid things, too.

As you may expect, the hardest part of deploying reports isn't technical but political. The greatest challenge is to create policies and procedures that enable your business community to contribute new reports and analyses, while maintaining the appropriate level of control over published reports. You should develop a quality assurance process, and procedures for publishing reports to a broad audience. This is particularly important for highly regulated companies.

Note 

Sometimes, changes to the underlying databases will require that existing reports be modified. Earlier in this chapter we stressed the importance of end-to-end testing for any significant modifications to the DW/BI system. It's really important that the standard report suite be tested before database changes are moved into production. It's usually easy to fix up reports in response to a schema change, but if you forget this step, the user experience is the same as if you'd messed up the underlying data. From the users' point of view, the DW/BI system is broken. Also, any change that breaks a standard report will likely break user reports as well. If you're implementing these kinds of changes, notify your users early on and discuss what they'll need to do to deal with the changes. You may want to set up a user report migration project to help rewrite some of the key user reports.


