Software Installation and Configuration

The basic installation of the SQL Server and other Microsoft components is straightforward and well documented in Books Online. We wont discuss the installation experience here. Instead, well describe which pieces of software need to be installed on different developers workstations, depending on what part of the DW/BI Project theyre working on. Next , well describe some best practices for the initial configuration of the various SQL Server components, including the relational database, Analysis Services, Integration Services, and Reporting Services.

Most multi-person development teams share one or two database servers, and install only the development tools on their personal machines. A common development team configuration is illustrated in Figure 4.6.

image from book
Figure 4.6: A common development team configuration

Depending on their roles, different members of the development team will need to install different components of SQL Server, and some will need to install other Microsoft and third-party software. These requirements are outlined in the next section.

Development Environment Software Requirements

The following section outlines the software to be installed on the development database server, and the workstations for different common development roles.

image from book
HOW POWERFUL SHOULD DEVELOPMENT AND TEST SYSTEMS BE?

In a perfect world, the test system will be physically identical to the production system. The test system plays two key roles. First, its the system on which modifications are tested . In this first role, its as important to test the scripts that deploy the system changes as it is to test the changes themselves . For testing the deployment process, the test system doesnt need to be identical to the production system. The second major role of the test system is to serve as a place to experiment with performance optimizations, such as indexes and aggregates. For performance tests to be valid, the test system should have similar physical characteristics as production. Many hardware vendors have Technology Centers, and may make those resources available to help validate system sizing prior to production deployment.

Development often takes place on a subset of data. If so, the developers systems, including shared development database servers, can be much less powerful than the test and production servers. Memory is important. Install at least 2GB of RAM on any computer thats running one of the database services.

The ETL system developer should work on the development server by way of a Remote Desktop session because Integration Services packages run in debugging mode on the machine where BI Studio is running. With significant data volumes , package execution can overwhelm a typical developers desktop.

Finally, the screenshots in this book should convince you that team members who use BI Studio will need big monitors . The screenshots in this book were taken at 1024 — 768, and thats really not big enough.

image from book
 

Development Database Server

As we described in the previous sidebar, most DW/BI teams share a database server for development purposes. The SQL Server components to install on the development database server are:

  • Relational engine

  • Integration Services

  • Analysis Services

  • Reporting Services

  • BI Studio for remote use by ETL developers

Most development occurs with small data volumes, so co-hosting all the server components on a single machine is usually fine from a technical point of view. If you have access to plenty of servers, distribute the components in the same way for development as is planned for production.

The development database server requires the Internet Information Service (IIS) to be running in order to install and operate Reporting Services.

Database Designer

BI Studio is the main design tool for Analysis Services databases. The relational data warehouse database is primarily developed in Management Studio. Relational database designers may choose to install the relational database server on their local workstation. The SQL Server components to install on the database designers workstation are:

  • Management Studio

  • BI Studio

  • Analysis Services

  • Relational database engine (optional)

In addition, the database designer should install Visual SourceSafe, or any other source control product that can be integrated with Visual Studio.

The relational database designer may want a data-modeling tool such as ERWin or Microsofts Visio Enterprise Architect Edition. These tools support visual modeling of the database, with excellent forward engineering capabilities.

Note 

At this time, the version of Visio that most of us buy packaged with Office can reverse engineer a database, but cannot forward engineer it. In other words, you can slurp a database definition into Visio and make a pretty picture. You cant edit that database and instantiate the changes. The only way to do the forward engineering piece is to purchase Visual Studio Enterprise Architect. Confused? Perplexed? Talk to Microsoft.

Development Database Administrator

Development databases are usually not managed very wellthey are for development, after all. However, someone needs to perform some basic DBA tasks like ensuring the database has enough space. Backups are a really good idea, too. Management Studio is the tool for operating and maintaining databases (relational and OLAP), and managing the operation of Integration Services packages. The only SQL Server component required on the development DBAs workstation is Management Studio.

The development DBA should install and use the teams source control system for the management of any database maintenance scripts.

ETL System Developer

The ETL system developer will create Integration Services packages. BI Studio is the tool for developing and debugging Integration Services, and the Integration Services components also need to be installed on the developers workstation. The SQL Server components to install on the ETL developers workstation are:

  • Management Studio

  • BI Studio

  • Integration Services

In addition, the ETL system developer should install source control such as Visual SourceSafe. Depending on how complex and unusual your ETL problems are, the ETL system developer may need to install the full Visual Studio 2005 product in order to develop custom objects in C# or VB. This is relatively unusual; the vast majority of ETL systems can be developed without any need for custom coding in a Visual Studio .NET language.

The ETL system developer often uses the development server to work on packages because packages run in debug mode on the same machine where BI Studio is running. Some teams start developing packages on the development server from the outset. Others may start on the developers workstation but move packages to the development server when its time to test with a reasonable volume of data. Still other teams find that its easiest to simply buy a bigger workstation for the ETL developers.

Report Designer

The DW/BI team members who develop reports need the following software on their workstations:

  • SQL Server BI Studio.

  • Source control software such as Visual SourceSafe.

  • Microsoft Office, especially Office Web Components and Excel. Your business users live in Office; you need to confirm that the reports you create look good when rendered to Excel.

  • Optional: A non-Microsoft relational ad hoc query tool. Its a matter of taste, but we dislike the Microsoft-standard query generator thats part of Visual Studio. We recommend that you use your favorite third-party tool to formulate the query and then paste the SQL into the report designer. Microsofts Report Builder can also be used to create datasets in the Report Designer.

  • Optional: A nonMicrosoft Analysis Services query tool. As we have discussed elsewhere in this book, many Microsoft-based DW/BI systems use a third-party Analysis Services query tool to circumvent the limitations of the Office suite. Look for future versions of Microsoft Office (after Office 2003) to fill in this hole in Microsofts product offering.

Reporting Portal Developer

As we discuss in Chapter 9, many DW/BI systems will embed Reporting Services into a reporting portal built using SharePoint Services or some other portal software. The reporting portal developer needs the following software:

  • SQL Server BI Studio

  • Source control software such as Visual SourceSafe

  • Microsoft Office Excel and Office Web Components

  • Microsoft Office SharePoint Services

It would be unusual for the reporting portal developer to need Visual Studio.NET to implement functionality not included with SharePoint Services.

Data Mining Model Developer

As we discuss in Chapter 10, the first major step in building a data mining application is to develop and train data mining models. This is an activity that requires knowledge of statistics, of the business problems, and of the data, but it does not require actual coding skills. The software required includes:

  • SQL Server BI Studio for developing the data mining models

  • Analysis Services

  • Relational and Analysis Services query tools for investigating the data

  • Source control software

Analytic Application Developer

Analytic applications embed domain expertise and best practices into a guided analytic activity. Although some analytic applications are just a collection of predefined reports, sometimes theyre more structured than that. The more structured the analytic application, the more likely the analytic application developer is writing code, perhaps integrating the data mining model described previously with the operational systems. The analytic application developer needs the following software:

  • SQL Server Client and Developer tools. The application developers typically work with databases and data mining models that others have developed. You will need the SQL Server object models, which are installed by default with the Client Components.

  • Source control software.

  • Visual Studio 2005, with one or more .NET languages such as C# or VB.

Test and Production Software Requirements

In a perfect world, your test and production machines will have the same physical configuration. We recognize thats not always realistic, but it is realistic and mandatory that they have the same software configuration. They must have the same operating system with the same configuration. They must have the same components of SQL Server, with the same versions including service packs .

If youre using your test system only for testing, and not for any production use, you can use Developer Edition on the test systems. This is extremely appealing for components that use Enterprise Edition in production, not only because Enterprise Edition is expensive but also because the feature set of Developer Edition is the same as that of Enterprise Edition. But its problematic to use Developer Edition on test machines where the production machines are running Standard Edition. Thats because the Developer Edition feature set is richer, and you might not discover a dependency on an Enterprise Edition feature until too late. This issue is discussed at greater length in Chapter 14.

Earlier in this chapter we discussed how a small system might build an all-in-one server, with all SQL Server DW/BI components on a single machine. A large system will distribute the components across multiple machines.

Database Server

The database servers, unsurprisingly, require the SQL Server components:

  • Relational database and/or

  • Integration Services and/or

  • Analysis Services and/or

  • Reporting Services. Reporting Services requires Internet Information Services (IIS) in order to install and operate.

To manage the system effectively, you should install the following Windows Server components:

  • Performance Monitor (also known as System Monitor) is installed automatically with Windows. It is the tool that exposes the performance counters that are published by the database servers, and is an invaluable tool for monitoring system performance.

  • Microsoft Operations Manager (MOM) is an add-on to the Windows Server System that provides system monitoring and management services. Large or complex systems should evaluate whether they should purchase MOM or use alternative systems operations software.

Database Administrator

The DBAs for the test and production machines will use SQL Server Management Studio (Management Studio) to operate and maintain the DW/BI system, and manage the operation of Integration Services packages.

Operating Systems

SQL Server requires a Windows Server operating system in production: Windows Server 2000 (Service Pack 4), Windows Server 2003, or later. You can purchase a 64-bit version of SQL Server to install on a 64-bit operating system. You can use a wider variety of operating systems on your development machines. Most developers use Windows XP Professional. The SQL Server setup program will not let you install on a machine that is not configured appropriately.

Server operating systems and software are usually installed on a RAID-1 array, for data redundancy and failover. Internet Information Services (IIS) must be running in order to install and operate Reporting Services.

SQL Server Relational Database Setup

Use the following guidelines when installing, configuring, and securing SQL Server.

Install the relational database component of SQL Server Enterprise Edition or Standard Edition. You may choose to install the Management Studio tools on a server; often the server is managed remotely from a DBAs workstation. Go to the Microsoft SQL Server web site at www.microsoft.com/sql to learn if any service packs are available. In the past, service packs have been made available without charge. Upgrade the instance to the latest service pack. You can install the relational database as either a default instance or a named instance. The only compelling reason weve come up with for installing multiple instances on a single server is to test multiple scenarios on a single machine.

Note 

Multiple SQL Server components can be installed on the same machine. Most organizations use per-processor licensing for DW/BI systems rather than licensing based on the number of users. Your licensing cost is the same if you install and use all SQL Server components on a machine, or only one component such as the relational engine.

The SQL Server relational database is resource intensive and youd seldom choose to share the physical server that holds the data warehouse database with another data application. In production, you usually create the data warehouse databases in the default instance. In the development and test environments, multiple named instances may be very useful.

The SQL Server relational database is a Windows service just like any other Windows service. Use the Management Studio tool to start, stop, administer, and manage a SQL Server relational database instance. The SQL Server database engine service has no dependencies on any other component of SQL Server. As we describe in Chapter 11, you can choose to install and use the relational database as the only component from the SQL Server product suite.

Security Options During Installation

Install and run the SQL Server relational database in Windows Authentication mode. The mixed authentication mode is inherently less secure and should be avoided if possible. During installation, you are asked to provide a Service Account to run each service. Its easiest to manage if you use the same account for all the services, but you are not required to do so. You can modify this choice later by using the SQL Server Configuration Manager utility.

Warning 

Use the SQL Server Configuration Manager utility to change the characteristics of the service account, rather than the Administrative Tools Services tool from the Control Panel.

Restrict access to SQL Server data and log files to system administrators and the SQL Server and SQL Server Agent service accounts. The system databases are secured by default by the SQL Server setup program, but we still like to check.

After youve installed the SQL Server database engine, you may need to run the Surface Area Configuration tool, available from the Start menu, to turn off and on some features and services. If youve found that the . or LocalHost connections to the local machine dont work as expected, you need to go into this utility to enable local and remote connections.

Security issues are discussed in detail in Chapter 12.

Files, Filegroups, and RAID

SQL Server data is stored in files, which can be grouped into filegroups . One of the main reasons to use filegroups is to make it easier to balance I/O across multiple disks. If you place database files on RAID drives , this I/O balancing is handled for you. Unless youre partitioning your fact tables, simply place database files in the default filegroup on a RAID drive. (Later in this chapter we discuss partitioning in detail.)

We strongly recommend that all relational database files be placed on RAID arrays, either RAID-1 or RAID-5, preferably with hardware controllers rather than managed by the operating system. RAID-1 is significantly better than RAID-5, although predictably more expensive, because its faster both for writes and for recovery after a disk failure.

The SQL Server system databases master , model , and msdb can be placed on the RAID-1 array that holds the operating system. These databases are typically very small, and theres usually plenty of room for them on that array. Alternatively, place them on their own small RAID-1 array. A third alternative is to place them on the same RAID array as user databases. It is vital that these databases, especially master , be placed on a fault-tolerant array.

The fourth system database, tempdb , could grow significantly as the data warehouse database is being used. You should pre-allocate tempdb to a large size to avoid auto-growth during query operations. Dont place tempdb on the system RAID-1 array which is usually small. Use at least one file per CPU for high-performance tempdb operations. Be sure to spread tempdb out over many drives to maximize I/O performance. If the total number of drives on your system is limited, you can place tempdb on the same RAID array as the user databases, especially if you use RAID-1 or RAID-0+1 for the user databases. However, if you set up the user databases on a RAID-5 array, you should consider separating out tempdb onto its own RAID-1 array to minimize potential bottlenecks.

Youll likely have at least three user databases: the data warehouse database, a staging database, and a metadata database. Reporting Services has a separate database, which in some circumstances can grow to be quite large. You can put all these databases together on a single RAID-1, RAID-0+1, or RAID-5 array. Because of the size of the data warehouse database and possibly the Reporting Services catalog, the size of this array will dwarf the other databases arrays.

Assuming youre using RAID, you could put all of the data warehouses data into a single file. But long before you reach 2TB, the maximum drive size that SQL Server can address, youll want to break the database into multiple files in order to simplify management and backup. You can use filegroups to ensure data is spread evenly across all the drives in use. In the absence of partitioning, discussed later in this chapter, consider creating a file all for dimensions and a file for each fact table. A single file should be adequate for the staging database and the metadata database, unless your DBAs have strong feelings about an alternative configuration.

Note 

The file layout discussion in this section was directed primarily at small- to medium- sized systems. If you have huge data volumes, in the terabyte range and above, you should work with your storage vendor to lay out your files and disks very carefully .

Database Recovery Model

SQL Server provides three recovery modes: Full, Bulk Logged, and Simple. A database that is operated with Simple recovery mode can be recovered only to the point of the last backup. The Simple model may be appropriate for the staging database, which is often designed to be emptied of data at the start of each load.

Note 

Even if you empty out the staging database at the beginning of each load, keep a copy of the extracted data somewhere. We often keep a copy of the extracts in the file system, in a file whose name includes the date and time of the extract. These extracts should be backed up, as should any permanent data in the staging database. More often than not, the kinds of backups you can do with the Simple recovery model meet the needs of backing up the staging database.

The Bulk Logged recovery model sounds very appealing for a data warehouse database, as we know were going to use bulk inserts as much as possible. But the sad reality is that, other than the initial historical load of data, your ability to perform bulk inserts into the data warehouse database is limited. Bulk inserts work fast only on a table empty of data or on a table without indexesan unlikely state of affairs after the initial loading is complete. A possible exception occurs when youre using partitioned tables, as we discuss later in this chapter. Unless youre using partitioned tables, you should set the data warehouse database to the Full recovery model once the historical load is complete.

Database Initial Size

Set up your SQL Server database with an initial size adequate to hold the initial historical load. You might have permitted automatic growth up to a certain maximum, but you should monitor the database size carefully and increase the databases file size during a period of slow usage. Thats because the initial allocation and subsequent increases are resource-intensive. Its better to pay this file initialization price at a time managed by the DBAs than during the DW/BI systems load processing window.

Windows Server 2003 includes a feature called Instant File Initialization that improves the performance of database allocation. However, instant is a bit of a misnomer; its still a resource-intensive process. For best performance, especially for very large databases, avoid auto-grow. Instead, set up an automated process to check for needed file space and programmatically increase file sizes if necessary.

As we discuss in Chapter 15, you should set up an automated process to check for disk space on a weekly, daily, or load-by-load basis.

Reference 

See the Books Online topic Database File Initialization for more information.

After the database tables physical design has been finalized, as we discuss later in this chapter, you can accurately assess the storage requirements for the initial database setup and storage layout.

Analysis Services Setup

The Analysis Services server machine must have at least the Analysis Services component of SQL Server installed on it. Other components, including the relational database engine and the Studio tools, are not required on that server. Often, especially in locked down production environments, the server is always managed remotely from the DBAs workstation. Make sure you keep up to date with service packs.

You can run Analysis Services alone, if you wish, without using any other SQL Server technology. You dont need even a small instance of the relational database engine to run an Analysis Services database. See Chapter 11 for a discussion of running Analysis Services by itself.

Note 

Remember that if you install the Relational database on one machine and Analysis Services on a second machine, you have to pay for two licenses.

Like the relational database, Analysis Services supports multiple instances on a single server machine. In a standard production environment, we dont see a compelling argument for using multiple instances rather than multiple databases within the same instance. Multiple instances may be useful during development and testing. If youre building a solution for external parties such as vendors, you may find that multiple instances provide an extra level of security or comfort to your customers.

Analysis Services File Locations and Storage Requirements

The main configuration choice to make at or soon after installation time is where the program, data, and log files are located. These choices are made for an instance; all cubes and databases within that instance use the same default location. A RAID array, either RAID-1, RAID-0+1, or RAID-5, is the best choice for all file locations. The program files for a production system should be installed on a RAID-1 array, often the operating system array.

The best place for the log files is their default location, near the SQL Server program files. We recommend using RAID-1 or RAID-0+1 for these files, as you would probably use for the program files.

Analysis Services data files are by far the largest kind of files. It is nearly impossible at design time to estimate with any degree of accuracy how big the Analysis Services data files will be. Lets start with a simple rule of thumb: an Analysis Services database thats built at the same grain as a relational fact table will take approximately 25 percent of the space of that fact table (atomic data only, no indexes). This 25 percent rule includes Analysis Services data and indexes, before aggregations are added, and again its worth emphasizing that this is at the same grain as the fact table. We have seen Analysis Services atomic data at 15-40 percent of its corresponding relational data, but 25 percent is a reasonable midpoint .

When you add well-designed aggregations, the total data size is usually 35100 percent of the relational data. You will be at the high end of that range if you use distinct count measures. In our experience, Analysis Services databases including indexes and aggregations typically take 3550 percent of the data of the corresponding relational table at the same grain, data only, no indexes. This factor was included in the very high-level storage space guesstimate that we discussed at the beginning of this chapter.

For small- and medium-sized installations, a 50 percent factor should suffice for disk planning. For very large installations, you should partition your Analysis Services database. Build a test database with several partitions, and then scale that storage requirement by the number of partitions.

We recommend that you use RAID-0+1 to store the data files. Use a SAN for large installations. To maximize processing speed, use a different RAID array, with a different physical controller, than the location of the relational data warehouse database that feeds the Analysis Services database.

If youre too cost conscious to use RAID-0+1 for the data files, but dont want to take the write performance hit of RAID-5, its not as important to use redundant storage for the Analysis Services database as for the relational database. After all, you can always reprocess the Analysis Services database from the relational data warehouse database, or restore it from backup. Be warned , however, that it could take many hours to process an Analysis Services database that covers multiple terabytes of relational data. We strongly recommend using some level of redundant storage.

When you create a partition for an Analysis Services cube, you can place that partition anywhere in your storage system. Assuming youre using RAID, we see no compelling reason for placing data files anywhere but in the default location.

image from book
SQL SERVER 2000/2005 CHANGE: ANALYSIS SERVICES METADATA

Readers who are familiar with Analysis Services 2000 may be asking when and where they should migrate the metadata repository. There is no formal repository for Analysis Services 2005. Instead, the metadata consists of the XML files throughout the OLAP Data directory, such as DatabaseName .db.xml, CubeName .cub.xml, DimensionName. dim.xml, and so on.

This change may sound dramatic, but the main implication for Analysis Services administrators is improved manageability, particularly for backups and restores , as described in Chapter 15 .

image from book
 

Analysis Services and Memory

Analysis Services loves memory. Analysis Services 2005 has been redesigned to solve the most intractable memory problems associated with Analysis Services 2000. Nonetheless, the more data you can cache in physical memory, the happier you and Analysis Services will be.

Those familiar with Analysis Services 2000 probably know that Analysis Services required that all of a servers dimensions always be resident in memory. If the server didnt have enough memory, Analysis Services 2000 wouldnt run. This is no longer true; in Analysis Services 2005, dimensions do not need to be memory resident. Information about dimension members will move in and out of memory cache as needed. This is great, for certainly a server should handle memory contention gracefully. Nonetheless, for excellent query performance you want plenty of memory for dimension members, a result set cache, the computation engines cache, and other uses. Dont skimp on memory, and seriously consider using 64-bit hardware for the Analysis Services component of your DW/BI system infrastructure.

Integration Services Setup

Integration Services has two major components: a design environment, which is part of the BI Studio; and a runtime environment, which is what you install on your production servers. The design environment is where you create and edit packages. You can see a visual representation of the packages tasks, and run the package in debug mode on the development machine, and only on the development machine. The only way to remotely execute an Integration Services in development/debugging mode is to use a remote desktop connection to the remote machine.

On the production server, install the Integration Services component of SQL Server Enterprise Edition or Standard Edition. You may install the Management Studio tools on your production server, although often production instances of SQL Server are managed remotely from an administrators workstation.

You can use Management Studio to interactively execute a package thats been deployed to test or production. But for the ETL system, you will use SQL Agent to schedule the execution of the DTExecUI or DTExec utility. Using these utilities, you can run on one server a package that is stored on a second server. In production, Integration Services packages can be located anywhere, and can be run on any server that has the Integration Services runtime executables.

As we describe in Chapter 5, you can design your ETL system to run multiple Integration Services packages on multiple servers. If you choose this architecture for your high scale ETL problem, install Integration Services executables (and pay SQL Server licenses) on all of the servers on which the packages are running.

Integration Services has no dependency on any other component of the SQL Server product suite. It could be used as the ETL tool for an otherwise nonSQL Server DW/BI system.

Integration Services presents the option of storing package definitions in the SQL Server. This is not something you need to decide at installation time.

As we described earlier in this chapter, most DW/BI systems will run Integration Services on the same server as the relational data warehouse database. It is easy to change package locations as your warehouse matures and your requirements change.

Integration Services File Locations and Storage Requirements

You may use a relational database to stage data during ETL processing. As we describe in Chapter 5, you will probably use both a relational staging area and a file system staging area. You may use the file system staging area to rest data after it has been extracted from the source systems but before the heavy duty ETL processing begins; you may also use this staging area for intermediate storage, and for a kind of backup of changed data before launching an update. Many people hold on to the source system extracts for days, weeks, or months before deleting them or moving them to offline storage. The volume of disk space youll need for the file-based and relational staging areas depends completely on the design of your ETL system.

Many people use non-redundant storage for staging areas. RAID-1 or RAID-0+1, as always, is recommended, but its the least important data to store redundantly.

Reporting Services Setup

Like all the other components of SQL Server, Reporting Services can be installed on a standalone reporting server, or it can share a server with one or more other components of SQL Server.

When you install Reporting Services, you must supply several pieces of configuration information:

  • The location of the report server catalog database: The report server catalog is a relational database that Reporting Services needs in order to run. It is where report definitions, metadata, histories, and snapshots are stored. The installation program will create the report server database for you. The report server database can be located on a different machine, but you must have the appropriate privileges. This catalog database must be a SQL Server relational database.

  • Configuration options for email delivery of reports: You will probably want to run a subset of standard reports, and use email to deliver either the report or a link to the report. The email account information and other configuration options are well documented in Books Online.

You may choose to install the client-side report authoring tool, Report Designer, on the server. Report Designer is integrated into BI Studio, and most developers use it on their workstations rather than on the server.

Reporting Services is a Windows service just like the SQL Server relational database or any other Windows service. It is also implemented as an ASP.NET Web service that runs on Internet Information Services (IIS). Both the Windows service and the Web service are implemented on the report server. Use the Management Studio tool to start, stop, administer, and manage a Reporting Services instance.

Reporting Services doesnt require any significant file storage other than the report catalog database. Issues around the potential size and placement of the report server catalog were discussed earlier in this chapter, in the section on the SQL Server relational database setup.

Note 

Like the other components of SQL Server, Reporting Services can be installed in isolation. However, it does need access to a SQL Server relational database server for the report catalog.



Microsoft Data Warehouse Toolkit. With SQL Server 2005 and the Microsoft Business Intelligence Toolset
The MicrosoftВ Data Warehouse Toolkit: With SQL ServerВ 2005 and the MicrosoftВ Business Intelligence Toolset
ISBN: B000YIVXC2
EAN: N/A
Year: 2006
Pages: 125

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net