3 4
The data storage and analysis needs of a modern corporation or government organization are very complex. SQL Server 2000 provides a set of components that work together to meet the needs of the largest data processing systems and commercial Web sites while providing easy-to-use data storage services to an individual or small business. This lesson introduces you to the SQL Server 2000 architecture and describes how the various components work together to manage data effectively.
SQL Server 2000 data is stored in databases. The data in a database is organized into the logical components that are visible to users, while the database itself is physically implemented as two or more files on disk.
When using a database, you work primarily with the logical components (such as tables, views, procedures, and users). The physical implementation of files is largely transparent. Typically, only the database administrator needs to work with the physical implementation. Figure 1.2 illustrates the difference between the user view and the physical implementation of a database.
Each instance of SQL Server has four system databases (master, tempdb, msdb, and model) and one or more user databases. Some organizations have only one user database that contains all of their data; some organizations have different databases for each group in their organization. They might also have a database used by a single application. For example, an organization could have one database for sales, one for payroll, one for a document management application, and so on. Some applications use only one database; other applications might access several databases. Figure 1.3 shows the SQL Server system databases and several user databases.
You do not need to run multiple copies of the SQL Server database engine in order for multiple users to access the databases on a server. An instance of the SQL Server Standard Edition or Enterprise Edition is capable of handling thousands of users who are working in multiple databases at the same time. Each instance of SQL Server makes all databases in the instance available to all users who connect to the instance (subject to the defined security permissions).
Figure 1.2 User view and physical implementation of a database.
Figure 1.3 System databases and user databases.
If you connect to an instance of SQL Server, your connection is associated with a particular database on the server. This database is called the current database. You are usually connected to a database defined as your default database by the system administrator, although you can use connection options in the database APIs to specify another database. You can switch from one database to another by using either the Transact-SQL USE <database_name> statement or by using an API function that changes your current database context.
SQL Server 2000 enables you to detach a database from an instance of SQL Server, then reattach it to another instance or even attach the database back to the same instance. If you have a SQL Server database file, you can tell SQL Server when you connect to attach that database file using a specific database name.
The logical database components include objects, collations, logins, users, roles, and groups.
The data in a SQL Server 2000 database is organized into several different objects, which users see when they connect to the database. The following table provides a brief description of the main objects in a database. These objects are discussed in more detail in subsequent chapters.
Object | Description |
---|---|
Table | A two-dimensional object consisting of rows and columns that is used to store data in a relational database. Each table stores information about one of the types of objects modeled by the database. For example, an education database might have one table for teachers, a second for students, and a third for classes. |
Data type | An attribute that specifies what type of information can be stored in a column, parameter, or variable. SQL Server provides system-supplied data types; you can also create user-defined data types. |
View | A database object that can be referenced the same way as a table in SQL statements. Views are defined by using a SELECT statement and are analogous to an object that contains the result set of this statement. |
Stored procedure | A precompiled collection of Transact-SQL statements stored under a name and processed as a unit. SQL Server supplies stored procedures for managing SQL Server and for displaying information about databases and users. SQL Server-supplied stored procedures are called system stored procedures. |
Function | A piece of code that operates as a single logical unit. A function is called by name, accepts optional input parameters, and returns a status and optional output parameters. Many programming languages support functions, including C, Visual Basic, and Transact-SQL. Transact-SQL supplies built-in functions that cannot be modified and supports user-defined functions that users can create and modify. |
Index | In a relational database, a database object that provides fast access to data in the rows of a table, based on key values. Indexes can also enforce uniqueness on the rows in a table. SQL Server supports clustered and non-clustered indexes. The primary key of a table is automatically indexed. In full-text search, a full-text index stores information about significant words and their location within a given column. |
Constraint | A property assigned to a table column that prevents certain types of invalid data values from being placed in the column. For example, a UNIQUE or PRIMARY KEY constraint prevents you from inserting a value that is a duplicate of an existing value; a CHECK constraint prevents you from inserting a value that does not match a search condition; and NOT NULL prevents empty values. |
Rule | A database object that is bound to columns or user-defined data types and specifies which data values are acceptable in a column. CHECK constraints provide the same functionality and are preferred because they are in the SQL-92 standard. |
Default | A data value, option setting, collation, or name assigned automatically by the system if a user does not specify the value, setting, collation, or name; also known as an action that is taken automatically at certain events if a user has not specified the action to take. |
Trigger | A stored procedure that is executed when data in a specified table is modified. Triggers are often created to enforce referential integrity or consistency among logically related data in different tables. |
Collations control the physical storage of character strings in SQL Server 2000. A collation specifies the bit patterns that represent each character and the rules by which characters are sorted and compared.
An object in a SQL Server 2000 database can use a collation different from another object within that same database. Separate SQL Server 2000 collations can be specified down to the level of columns. Each column in a table can be assigned different collations. Earlier versions of SQL Server support only one collation for each instance of SQL Server. All databases and database objects that are created in an instance of SQL Server 7.0 or earlier have the same collation.
SQL Server 2000 supports several collations. A collation encodes the rules governing the proper use of characters for either a language, such as Macedonian or Polish, or an alphabet, such as Latin1_General (the Latin alphabet used by western European languages).
Each SQL Server collation specifies three properties:
NOTE
SQL Server 2000 collations can be specified at any level. When you install an instance of SQL Server 2000, you specify the default collation for that instance. When you create a database, you can specify its default collation; if you do not specify a collation, the default collation for the database is the default collation for the instance. Whenever you define a character column, variable, or parameter, you can specify the collation of the object; if you do not specify a collation, the object is created with the default collation of the database.
Logins, users, roles, and groups are the foundation for the security mechanisms of SQL Server 2000. Users who connect to SQL Server must identify themselves by using a Specific Login Identifier (ID). Users can then see only the tables and views that they are authorized to see and can execute only the stored procedures and administrative functions that they are authorized to execute. This system of security is based on the IDs used to identify users. The following table provides a description of each type of security mechanism:
Security Mechanism | Description |
---|---|
Logins | Login identifiers are associated with users when they connect to SQL Server 2000. Login IDs are the account names that control access to the SQL Server system. A user cannot connect to SQL Server without first specifying a valid login ID. Members of the sysadmin fixed server role define login IDs. |
Users | A user identifier identifies a user within a database. All permissions and ownership of objects in the database are controlled by the user account. User accounts are specific to a database; for example, the xyz user account in the sales database is different from the xyz user account in the inventory database, although both accounts have the same ID. User IDs are defined by members of the db_owner fixed database role. |
Roles | A role is like a user group in a Windows 2000 domain. It allows you to collect users into a single unit so that you can apply permissions against those users. Permissions that are granted to, denied to, or revoked from a role also apply to any members of the role. You can establish a role that represents a job performed by a class of workers in your organization and grant the appropriate permissions to that role. As workers rotate into the job, you simply add them as a member of the role. As they rotate out of the job, you can remove them from the role. You do not have to repeatedly grant, deny, and revoke permissions to or from each person as he or she accepts or leaves the job. The permissions are applied automatically when the users become members of the role. A role is similar to a Windows security group. |
Groups | There are no groups in SQL Server 2000 or SQL Server 7.0. You can, however, manage SQL Server security at the level of an entire Windows NT or Windows 2000 group. |
This section describes the way in which SQL Server 2000 files and databases are organized. The organization of SQL Server 2000 and SQL Server 7.0 is different from the organization of data in SQL Server 6.5 or earlier.
The fundamental unit of data storage in SQL Server is the page. In SQL Server 2000, the page size is 8 kilobytes (KB). In other words, SQL Server 2000 databases contain 128 pages per megabyte (MB).
The start of each page is a 96-byte header used to store system information, such as the type of page, the amount of free space on the page, and the object ID of the object owning the page.
Data pages contain all of the data in data rows (except text, ntext, and image data, which are stored in separate pages). Data rows are placed serially on the page (starting immediately after the header). A row offset table starts at the end of the page. The row offset table contains one entry for each row on the page, and each entry records how far the first byte of the row is from the start of the page. The entries in the row offset table are in reverse sequence from the sequence of the rows on the page, as shown in Figure 1.4.
Extents are the basic unit in which space is allocated to tables and indexes. An extent is eight contiguous pages, or 64 KB. In other words, SQL Server 2000 databases have 16 extents per megabyte.
Figure 1.4 Entries in the row offset and rows on the page.
To make its space allocation efficient, SQL Server 2000 does not allocate entire extents to tables that have small amounts of data. SQL Server 2000 has two types of extents:
A new table or index is usually allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it is switched to uniform extents. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents.
SQL Server 2000 maps a database over a set of operating-system files. Data and log information are never mixed on the same file, and individual files are used only by one database.
SQL Server 2000 databases have three types of files:
SQL Server 2000 is effective at quickly allocating pages to objects and reusing space freed up by deleted rows. These operations are internal to the system and use data structures not visible to users, yet these processes and structures are occasionally referenced in SQL Server messages.
SQL Server uses two types of allocation maps to record the allocation of extents:
SQL Server 2000 supports indexes on views. The first index allowed on a view is a clustered index. At the time a CREATE INDEX statement is executed on a view, the result set for the view materializes and is stored in the database with the same structure as a table that has a clustered index.
The data rows for each table or indexed view are stored in a collection of 8 KB data pages. Each data page has a 96-byte header containing system information, such as the identifier of the table that owns the page. The page header also includes pointers to the next and previous pages that are used if the pages are linked in a list. A row offset table is at the end of the page. Data rows fill the rest of the page, as shown in Figure 1.5.
SQL Server 2000 tables use one of two methods to organize their data pages—clustered tables and heaps:
Figure 1.5 Organization of data pages.
Indexed views have the same storage structure as clustered tables.
SQL Server also supports up to 249 non-clustered indexes on each table or indexed view. The non-clustered indexes also have a B-tree structure but utilize it differently than clustered indexes. The difference is that non-clustered indexes have no effect on the order of the data rows. Clustered tables and indexed views keep their data rows in order based on the clustered index key. The collection of data pages for a heap is not affected if non-clustered indexes are defined for the table. The data pages remain in a heap unless a clustered index is defined.
Every SQL Server 2000 database has a transaction log that records all transactions and the database modifications made by each transaction. This record of transactions and their modifications supports three operations:
The server components of SQL Server 2000 receive SQL statements from clients and process those SQL statements. Figure 1.6 shows the major components involved with processing a SQL statement that is received from a SQL Server client.
SQL statements are sent from clients by using an application-level protocol specific to SQL Server, called Tabular Data Stream (TDS). SQL Server 2000 accepts the following versions of TDS:
TDS packets are built by the Microsoft OLE DB Provider for SQL Server, the SQL Server Open Database Connectivity (ODBC) driver, or the DB-Library dynamic link library (DLL). The TDS packets are then passed to a SQL Server client Net-Library, which encapsulates them into network protocol packets. On the server, the network protocol packets are received by a server Net-Library that extracts the TDS packets and passes them to the relational database engine. This process is reversed when results are returned to the client.
Figure 1.6 Processing a SQL statement that is received from a client.
Each server can be listening simultaneously on several network protocols and will be running one server Net-Library for each protocol on which it is listening.
The database server processes all requests passed to it from the server Net-Libraries. The server then compiles all the SQL statements into execution plans and uses the plans to access the requested data and build the result set that is returned to the client.
The relational database engine of SQL Server 2000 has two main parts: the relational engine and the storage engine. One of the most important architectural changes made in SQL Server 7.0 (and carried over to SQL Server 2000) was to strictly separate the relational and storage engine components within the server and to have them use the OLE DB API to communicate with each other, as shown in Figure 1.7.
Figure 1.7 Relational engine components.
SQL statements are the only commands sent from applications to SQL Server 2000. All of the work done by an instance of SQL Server is the result of accepting, interpreting, and executing SQL statements. SQL Server 2000 executes SQL statements by using one of the following processes:
SQL Server 2000 dynamically acquires and frees memory as needed. Typically, it is not necessary for an administrator to specify how much memory should be allocated to SQL Server, although the option still exists and is required in some environments. When running multiple instances of SQL Server on a computer, each instance can dynamically acquire and free memory to adjust for changes in the workload of the instance.
SQL Server 2000 Enterprise Edition introduces support for using Windows 2000 Address Windowing Extensions (AWE) to address approximately 8 GB of memory for instances running on Windows 2000 Advanced Server and approximately 64 GB for instances running on Windows 2000 Data Center. Each instance using this additional memory, however, must statically allocate the memory it needs.
The primary purpose of a database is to store and retrieve data, so performing a lot of disk reads and writes is one of the inherent attributes of a database engine. Disk input/output (I/O) operations consume many resources and take a relatively long time to complete. Much of the logic in relational database software concerns making the pattern of I/O usage highly efficient.
SQL Server 2000 allocates much of its virtual memory to a buffer cache and uses the cache to reduce physical I/O. Each instance of SQL Server 2000 has its own buffer cache. Data is read from the database disk files into the buffer cache. Multiple logical reads of the data can be satisfied without requiring the data to be physically read again. The data remains in the cache until it has not been referenced for some time and the database needs the buffer area to read in more data. Data is written back to the disk only if it is modified. Data can be changed multiple times by logical writes before a physical write transfers the new data back to the disk.
The I/O from an instance of SQL Server is divided into logical and physical I/O. A logical read occurs every time the database engine requests a page from the buffer cache. If the page is not currently in the buffer cache, a physical read is then performed in order to read the page into the buffer cache. If the page is currently in the cache, no physical read is generated. The buffer cache simply uses the page that is already in memory. A logical write occurs when data is modified in a page in memory. A physical write occurs when the page is written to disk. A page can remain in memory long enough to have more than one logical write made before it is physically written to disk.
The SQL Server 2000 full-text query component supports sophisticated searches on character string columns. This capability is implemented by the Microsoft Search service, which has two roles:
The full-text catalogs and indexes are not stored in a SQL Server database. They are stored in separate files that are managed by the Microsoft Search service. The full-text catalog files are not recovered during a SQL Server recovery. They also cannot be backed up and restored by using the Transact-SQL BACKUP and RESTORE statements. The full-text catalogs must be resynchronized separately after a recovery or restore operation. The full-text catalog files are accessible only to the Microsoft Search service and to the Windows NT or Windows 2000 system administrator.
SQL Server 2000 maintains the consistency and integrity of each database, despite errors that occur in the system. Every application that updates data in a SQL Server database does so by using transactions. A transaction is a logical unit of work made up of a series of statements (selects, inserts, updates, or deletes). If no errors are encountered during a transaction, all of the modifications in the transaction become a permanent part of the database. If errors are encountered, none of the modifications are made to the database.
A transaction goes through several phases:
SQL Server applications can also run in autocommit mode. In autocommit mode, each individual Transact-SQL statement is committed automatically if it is successful and is rolled back automatically if it generates an error. There is no need for an application running in autocommit mode to issue statements that specifically start or end a transaction.
All Transact-SQL statements run in a transaction: an explicit transaction, an implicit transaction, or an autocommit transaction. All SQL Server transactions that include data modifications either reach a new point of consistency and are committed or are rolled back to the original point of consistency. Transactions are not left in an intermediate state in which the database is not consistent.
Each new version of SQL Server seeks to automate or eliminate some of the repetitive work performed by database administrators. Because database administrators are typically among the people most highly trained in database issues at a site, these improvements enable a valuable resource—the administrator—to spend more time working on database design and application data access issues.
Many components contribute to the effectiveness of SQL Server 2000 administration:
Transact-SQL is the language used for all commands sent to SQL Server 2000 from any application. Transact-SQL contains statements that support all administrative work done in SQL Server. These statements fall into two main categories:
The SQL Distributed Management Framework (SQL-DMF) is an integrated framework of objects, services, and components that are used to manage SQL Server 2000. SQL-DMF provides a flexible and scalable management framework that is adaptable to the requirements of an organization. This tool lessens the need for user-attended maintenance tasks (such as database backup and alert notification) by providing services that interact directly with SQL Server 2000.
The key components of SQL-DMF support the proactive management of the instances of SQL Server on your network by enabling you to define the following information:
Figure 1.8 shows the main components of SQL-DMF.
SQL Server 2000 includes many graphical utilities that help users, programmers, and administrators perform the following tasks:
In addition to these tools, SQL Server contains several wizards to walk administrators and programmers through the steps needed to perform more complex administrative tasks.
Figure 1.8 The components that make up SQL-DMF.
SQL Server 2000 provides a number of features that enable administrators to program the server to administer itself for many repetitive actions or exception conditions. This functionality enables the administrators to spend more time on activities such as designing databases and working with programmers on efficient database access coding techniques. Applications from any vendor can choose SQL Server as their data storage component and minimize the administrative requirements of customers by automating administrative tasks.
These automation features are not limited to database administration tasks such as scheduling backups. They can also be used to help automate the business practices that the database supports. Applications can be scheduled to run at specific times or intervals. Specific conditions detected in the system can be used to trigger these applications if they need to be executed before the next scheduled time.
The features that support the automation of administrative tasks are SQL Server Agent, jobs, events and alerts, operators, and triggers.
The backup and restore components of SQL Server 2000 allow you to create a copy of a database. This copy is stored in a location protected from the potential failures of the server running the instance of SQL Server. If the server running the instance of SQL Server fails or if the database is somehow damaged, the backup copy can be used to recreate or restore the database.
SQL Server 2000 includes several components that support importing and exporting data, including DTS, replication, bulk copying, and distributed queries.
Data Transformation Services can be used to import and export data between heterogeneous OLE DB and ODBC data sources. A DTS package is defined that specifies the source and target OLE DB data sources; the package can then be executed on an as-required basis or at scheduled times or intervals. A single DTS package can cover multiple tables. DTS packages are not limited to transferring data straight from one table to another, because the package can specify a query as the source of the data. This functionality enables packages to transform data, such as by running a query that returns aggregate summary values instead of the raw data.
Replication is used to create copies of data in separate databases and keep these copies synchronized by replicating modifications in one copy to all the others. If it is acceptable for each site to have data that might be a minute or so out of date, replication enables the distribution of data without the overhead of requiring distributed transactions to ensure all sites have an exact copy of the current data. Replication can therefore support the distribution of data for a relatively low cost in network and computing resources.
The bulk copy feature of SQL Server provides for the efficient transfer of large amounts of data. Bulk copying transfers data into or out of one table at a time.
Transact-SQL statements use distributed queries to reference data in an OLE DB data source. The OLE DB data sources can be another instance of SQL Server or a heterogeneous data source, such as a Microsoft Access database or Oracle database.
Transact-SQL uses a set of DBCC statements to verify the integrity of a database. The DBCC statements in SQL Server 2000 and SQL Server 7.0 contain several improvements to the DBCC statements used in SQL Server 6.5:
Replication is a set of technologies that allows you to keep copies of the same data on multiple sites, sometimes covering hundreds of sites. Replication uses a publish-subscribe model for distributing data:
SQL Server 2000 also supports replication to and from heterogeneous data sources. OLE DB or ODBC data sources can subscribe to SQL Server publications. SQL Server can also receive data replicated from a number of data sources, including Microsoft Exchange, Microsoft Access, Oracle, and DB2.
SQL Server 2000 provides components that can be used to build data warehouses or data marts. The data warehouses or data marts can be used for sophisticated enterprise intelligence systems that process the types of queries used to discover trends and analyze critical factors. These systems are called OLAP systems. The data in data warehouses and data marts is organized differently than in traditional transaction-processing databases.
Enterprise-level relational database management software, such as SQL Server 2000, was designed originally to centrally store the data generated by the daily transactions of large companies or government organizations. Over the decades, these databases have grown to be highly efficient systems for recording the data required to perform the daily operations of the enterprise. Because the system is based on computers and records the business transactions of the enterprise, these systems are known as Online Transaction-Processing (OLTP) systems.
The data in OLTP systems is organized primarily to support transactions such as the following:
Individual transactions are completed quickly and access relatively small amounts of data. OLTP systems are designed and tuned to process hundreds or thousands of transactions being entered at the same time.
Although OLTP systems excel at recording the data required to support daily operations, OLTP data is not organized in a manner that easily provides the information required by managers to plan the work of their organizations. Managers often need summary information from which they can analyze trends that affect their organization or team.
Systems designed to handle the types of queries used to discover trends and critical factors are called OLAP systems. OLAP queries typically require large amounts of data. For example, the head of a government motor vehicle licensing department could ask for a report that shows the number of each make and model of vehicle registered by the department each year for the past 20 years. Running this type of query against the original detail data in an OLTP system has two effects:
Another issue is that many large enterprises do not have only one OLTP system that records all the transaction data. Most large enterprises have multiple OLTP systems, many of which were developed at different times and use different software and hardware. In many cases, the codes and names used to identify items in one system are different from the codes and names used in another. Managers who are running OLAP queries generally need to be able to reference the data from several of these OLTP systems.
Online analytical processing systems operate on OLAP data in data warehouses or data marts. A data warehouse stores enterprise-level OLAP data, while a data mart is smaller and typically covers a single function in an organization.
Applications use two components to access a database: an API or a Uniform Resource Locator (URL) and a database language.
SQL Server supports a number of APIs for building general-purpose database applications. The supported APIs include open APIs with publicly defined specifications supported by several database vendors, such as the following:
Internet applications can also use URLs that specify IIS virtual roots referencing an instance of SQL Server. The URL can contain an XPath query, a Transact-SQL statement, or a template. In addition to using URLs, Internet applications can also use ADO or OLE DB to work with data in the form of XML documents.
A database language defines the syntax of the commands sent to the database. The commands sent through the API enable the application to access and modify data. They also enable the application to create and modify objects in the database. All commands are subject to the permissions granted to the user. SQL Server 2000 supports two languages: (1) Transact-SQL and (2) Internet applications running on IIS and using XPath queries with mapping schemas.
Transact-SQL is the database language supported by SQL Server 2000. Transact-SQL complies with the entry-level SQL-92 standard but also supports several features from the intermediate and full levels. Transact-SQL also supports some powerful extensions to the SQL-92 standard.
The ODBC specification defines extensions to the SQL defined in the SQL-92 standard. The ODBC SQL extensions are also supported by OLE DB. Transact-SQL supports the ODBC extensions from applications using the ADO, OLE DB, or ODBC APIs, or the APIs that layer over ODBC. The ODBC SQL extensions are not supported from applications that use the DB-Library or Embedded SQL APIs.
SQL Server 2000 supports a subset of the XPath language defined by the World Wide Web Consortium (W3C). XPath is a graph navigation language used to select nodes from XML documents. First, you use a mapping schema to define an XML-based view of the data in one or more SQL Server tables and views, then you can use XPath queries to retrieve data from that mapping schema.
You usually use XPath queries in either URLs or the ADO API. The OLE DB API also supports XPath queries.
The SQL Server 2000 architecture consists of many components. One type of component in SQL Server is the database, which is where data is actually stored. A database is made up of logical components and physical components. Another component of SQL Server is the relational database engine. The relational database engine processes queries and manages memory, thread, task, and I/O activity. This engine also processes full-text queries and transactions. SQL Server 2000 supports database administration through DDL and DML, stored procedures, SQL-DMF, graphical tools, automated administration, backup and restore processes, import and export processes, data validation, and replication. In addition, SQL Server 2000 provides components that can be used to build data warehouses or data marts. SQL Server supports OLAP systems and OLTP systems. Applications use two components to access a SQL Server database: an API or a URL and a database language.