If you want to be able to audit a database effectively, you need to have a basic understanding of how a database works. Here, we will cover a broad set of components that you as the auditor will need to understand to audit a database properly.
In the early 1990s, applications were written using the client-server model. This consisted of a desktop program connecting over a network directly to a database backend. This was referred to as a two-tier application.
In the late 1990s, three-tiered applications became the norm. This new model consisted of a web browser connecting to a middle-tier web application. The middle tier then connected to the database backend. Three-tiered applications were a great step forward. Custom software was not needed on every client workstation. Software updates only needed to be applied to the central server. Clients could run any operating system that supported a basic browser. Moreover, in the three-tiered model, securing the database was much simpler.
Of course, the infrastructure the database required to support two-tier applications still exists in database backends for three-tiered applications. The danger now exists that an attacker will circumvent the web application to attack the backend database.
Typically, an audit engagement will focus on one or two database vendors, such as Oracle or DB2. However, any medium-sized or large organization typically will be made up of a sampling of many different database platforms. Below is a summary of the most common databases and vendors, along with a short overview of each.
Oracle Corporation is the largest database vendor, having an entire series of databases. In addition, Oracle Corporation has grown beyond standard database software, having a variety of products including but not limited to web servers, development tools, identity-management software, a collaboration suite, and multiple enterprise resource planning (ERP) solutions.
In the database market, the Oracle database has one of the largest install bases and an impressive feature set. The database comes in multiple flavors, including Standard Edition, Enterprise Edition, OracleLite, Express Edition, and others. Most Oracle databases you audit will be either Standard Edition or Enterprise Edition. The features are fairly similar; however, the advanced features in Enterprise Edition are changing constantly, so you will need to access the Oracle website to check up on the exact feature sets included in the version you are auditing.
Oracle Corporation also has branched out into other databases, having purchased several other database vendors, including:
Sleepycat Software, which maintains Berkeley DB, an open-source, embedded database
The TimesTen in-memory database
InnoDB, a transaction engine for the MySQL database
IBM is another of the largest database vendors, although IBM's database software is a small piece of the company's business. IBM's main database is the DB2 product line. This line is composed of two main products:
DB2 Universal Database, providing database software for AIX, Linux, HP-UX, Sun, and Windows
DB2 Universal Database for z/OS, providing software for the mainframe
There is a lot of confusion around the nomenclature of these two products. Typically, people refer to UDB as the Linux, Unix, and Windows version and DB2 as the mainframe version. This is a misnomer because UDB is actually a term used for all of IBM's latest DB2 software. Understand what people mean when they use these terms, but try to use the correct terms to avoid confusion.
IBM acquired Informix Software in 2001 and now maintains the Informix Dynamic Server. Informix was, for a brief period of time, the second most popular database. Owing to some misgovernance issues, Informix fell out of favor and hit hard times. These days Informix is rarely used for new database installations, but there is a large installed base within many enterprises, and you should expect Informix to exist for quite some time into the future.
IBM also maintains one of the first commercially available database management systems, Information Management System (IMS). IMS dates back to 1969 and is not actually a relational database but rather a hierarchical database. IMS typically runs on the mainframe and does not usually work in a client-server model.
MySQL is an open-source database used extensively in small or medium-sized web applications. MySQL is owned and developed under the GNU Public License by MySQL AB, a privately held Swedish company. MySQL has a large and growing grassroots following and is the M in the LAMP (Linux, Apache, MySQL, and PHP) open-source web platform.
MySQL traditionally has been a bare-bones database, providing a small fraction of the functionality available from other database vendors. From the security perspective, this is good because MySQL does exactly what it was meant to do very well-and little else. Administration costs are relatively low, and MySQL provides adequate performance for all but the most demanding web applications.
MySQL AB is investing heavily in the MySQL database, and the results can be seen. MySQL 5.0 has added significant functionality, including stored procedures, views, and triggers. MySQL is one of the simplest databases to secure from hacking because of the small attack surface it exposes. In addition, MySQL development is done in the open, meaning that source code is available for anyone to see. This had led to a relatively secure and vulnerability-free code base. Vulnerabilities have been discovered in the MySQL source code, but security holes are discovered early in the life cycle of each release and are patched quickly.
MySQL AB also offers a second open-source database called MaxDB. MaxDB is designed specifically as a high-reliability backend for SAP systems.
Another leading database vendor is Sybase, Inc. Sybase produces several databases, including:
The flagship Sybase Adaptive Server Enterprise database, designed for enterprise databases
Sybase Adaptive Server Anywhere, designed as a lighter-weight database
Sybase originally partnered with Microsoft to develop the early versions of its database system, which was referred to at the time as Sybase SQL Server on Unix and Microsoft SQL Server on Windows. As of version 4.9, Microsoft and Sybase split the code line and went their separate ways.
Sybase has expanded beyond databases as well. The company has various developer tools and a web application server and currently is focused on the delivery of data to mobile devices. Although the company has lost significant market share to the competition in the database market, it continues to maintain a presence in many places, and its databases will continue to exist for a long time.
Microsoft SQL Server is one of the most popular databases owing to its low price tag and its simplistic administration model, as well as the sheer momentum of Microsoft. Microsoft SQL Server comes in several flavors listed here:
Microsoft SQL Server 7.0 is an older version of the product with a few legacy installations still in existence.
Microsoft SQL Server 2000 (aka SQL Server 8.0) was Microsoft's main database version for 5 years. As such, it is heavily entrenched in a large number of enterprises.
Microsoft SQL Server 2005 is the latest release of the database. It provides a rich new set of security features among other functionality.
The Microsoft Database Engine (MSDE) is a free version of SQL Server providing a backend for ISVs to embed databases in their applications. Because MSDE is free, it is embedded in a large number of applications and is very common. With the delivery of SQL Server 2005, MSDE has been renamed to SQL Server 2005 Express Edition.
Microsoft SQL Server is often referred to as SQL, Sql Server, MSSQL, and even MS Sql Server. Although it's best to stick to the proper nomenclature to avoid confusion, it's important to understand the common, although incorrect, lingo.
Because Microsoft SQL Server is so easy to install and administer, it is often used by people with relatively little knowledge about securing it properly. This can lead to problems not because Microsoft SQL Server is insecure but because many people using it haven't taken even the most basic steps to protect it.
Each database vendor has a slightly different implementation of the various database components. However, the theories and principles apply to all the different platforms fairly universally. We will cover enough of these basics to give you the 30,000-ft view. From there, you should have enough background to follow a technical guide on a specific database platform. Below are the major pieces of the database that you will need to understand from the auditor's perspective.
A database is implemented as a software system, and as such, it is made up of a core set of operating system files. These files include the executable files that will run the database management system. It also may contain other nonexecutable program files such as help files, source and include files, sample files, and installation files.
These files should be protected because the integrity of the database relies on these files not being tampered with. These files should be guarded from any form of modification-particularly any executable files. Access controls should be as restrictive as possible on the directory that holds these files. Ideally, only database administrators should have access to this directory.
Databases rely heavily on configuration settings to determine how the system operates. Protecting these settings is important because if the configuration can be manipulated, security can be subverted.
Configuration values reside in a variety of places, including:
In operating system text files
In the data files
On Windows, stored in the registry
In environment variables
Configuration values are used for a wide range of settings, such as:
Setting the type of authentication or trust model
Setting which groups are database administrators
Determining password management features
Verifying the integrity of configuration values is a critical component of any audit.
Databases need to store the data they hold in physical operating system files. Typically, they are comprised of a series of files. The format of the file is typically proprietary, and the data files contain such information as:
Data being stored
Pointers from one field to the next field or from one row to the next row
Index data, including pointers from the index to the physical data
Indices (the plural of index) contain a subset of the data to which they point. This means that if an attacker can get access to the index, he or she may not need access to the physical data themselves. Ensure that access to any index is protected to the same degree as the data themselves.
Usually, the database dictionary is stored in these data files, so again, any access to these files can be used to circumvent controls built into the database.
An important component of any database system is the client. Typically, the client is on a remote system from the database. The client also can connect from the local system. Connecting locally frequently is used with batch processes.
In order for a client to connect to the database, a client library or driver is required on the client's machine. This usually consists of a set of executables such as DLLs and shared objects, as well as an API that the client can use to connect to the database. The client libraries are hard to protect because they usually exist on remote systems where access controls are much more difficult to maintain. However, it is very important to maintain the integrity of the client drivers in locations from where administrators or even regular users will be connecting.
One weak point in the security model is the integrity of the client libraries. If the client drivers can be manipulated, credentials can be stolen fairly easily. Client drivers can be trojaned, or even something as simple as a keyboard logger on the client system can lead to a compromise of the database.
Communication over the network also requires network drivers on the database. These drivers are another point of focus for the auditor because they are the avenue that the attacker will use to access the database.
Backups are a very important piece of every database platform. Failure in some component of the database is not a question of if but when. Whether the problem is a hardware or a software failure, having a backup is critical. Backups contain a copy of the database. The backup can be to a separate file, to a tape, or to another storage facility.
A very common place that data are stolen from, lost, or leaked is through the backup facility. Backups often are secured by encrypting the data as they are written to a file or encrypting the entire file after it is written. Storing the encryption key then becomes important to properly securing the backup. Just as important is ensuring that you have properly backed up the encryption keys along with the data so that the backup can be restored properly. If you can't restore the files, then the backup becomes worthless. Backups that cannot be restored result in a loss of utility.
Structured Query Language (SQL) is used to access data in a relational database. Technically, SQL should be pronounced as three separate letters "S-Q-L," but the pronunciation "sequel" has become so commonplace it is also accepted as correct. SQL is a set-based language, meaning that it works on a set of data at a time. It is not a procedural language, meaning that it does not have any procedural components such as while loops, if statements, for loops, etc. Most database platforms do have extensions to SQL to provide procedural components. For instance, Oracle has PL/SQL, and Sybase and Microsoft SQL Server have Transact-SQL.
SQL statements are used to pull data from the database. SQL is built around four core statements, as illustrated in Table 9-1.
View a subset of data from a table
Add new data to a table
Modify existing data in a table
Remove a subset of data from a table
The statement you will need to understand best is SELECT. The basic syntax of the SELECT statement is
SELECT <COLUMN LIST> FROM <TABLE NAME> WHERE <CONDITION>
In this statement, <COLUMN LIST> is a comma-separated list of column names that will be displayed. As a shortcut, you can use an asterisk to display all columns in the output. <TABLE NAME> is replaced with the name of the table to be displayed. <CONDITION> and the word WHERE are optional. If you do not indicate a WHERE clause, all rows in the table are returned. Using the WHERE clause, you can SELECT only the rows you are interested in.
An example of selecting the first and last names of all employees who make more than $20,000 is shown here:
SELECT FIRST_NAME, LAST_NAME FROM EMPLOYEES WHERE SALARY > 20000
SELECT statements can get much more complex then this. Your audit typically does not need to go much deeper than this, though.
A database is composed of a variety of objects. Each object has a different task or serves a different purpose. Understanding each object is not necessary. However, you should have a grasp of the common object types.
Table 9-2 shows the most common types of database objects. Each database platform also has many proprietary object types, such as table spaces, schemas, rules, sequences, and synonyms. You should review the specific documentation for your database platform for more details.
Stores rows of data in one or more columns.
A SELECT statement on top of a table or another view that creates a virtual table. Views can change the number or order of columns, can call functions, and can manipulate data in a variety of ways.
Procedural code that can be called to execute complex functionality within the database. Functions return values. Procedures do not return values. Stored procedures are very efficient for data access.
Procedural code that is called when a table is modified. Can be used to perform any actions, including modifications to other tables, when data are changed.
Mechanism to provide fast lookup of data. Indices are complex objects, and their proper tuning is critical to database performance.
The database stores metadata about itself. These metadata are referred to as the data dictionary or also sometimes as the system tables. The metadata tell the database about its own configuration, setup, and objects. Note that the metadata do not say anything about the content of the information in the database, only about the format of the database. The format of the data dictionary is static. The data dictionary does contain metadata about its own structure, but the data dictionary format is not something that can be modified easily.
The metadata in the data dictionary are designed to be manipulated. Rarely is the data dictionary manipulated directly. Instead, special stored procedures with complex validation logic are used to manipulate the system tables. Direct access to the system tables is dangerous because even a small misstep could corrupt the data dictionary, leading to serious database problems.
The data dictionary defines the rest of the database, specifying objects such as users, groups, and permissions. The data dictionary defines the structure of the database, including specifying where physical files are stored on disk, the names of tables, column types and lengths, and the code for stored procedure, trigger, and views.