6.8 Minimizing Audit-Log Space Consumption6.8.1 ProblemYou are going to implement an audit log, but you need a way to minimize space consumption so that the resulting log uses disk space as efficiently as possible. It's still important to be able to construct an accurate snapshot of the base table as it appeared at any give point in time. 6.8.2 SolutionThe log recipe shown earlier in this chapter implemented a log that contained a complete snapshot of a row for every operation performed on that row. While simple to implement and to query, the result is a log that contains a great deal of redundant information where update and delete operations are concerned . To minimize the space used by a row-level log, you can design it such that a column value is recorded only when it is changed. Thus, a change to one column in a row does not cause an entirely new copy of that row to be written to the log. Instead, only the before and after versions of the modified column are recorded in the log. Similarly, with deletes, only the delete action needs to be recorded. For purposes of this recipe, assume that the Stock table to be audited can be created using the following statement: CREATE TABLE Stock( ProductId CHAR(40), Qty INTEGER, Price INTEGER, PRIMARY KEY(ProductId) ) Given this table, the solution requires that a log table is created and that triggers are created to record changes to the data in the Stock table's audit log. 6.8.2.1 Step 1: Create the audit tableCreate an audit table that stores the audit data for the base table. The following statement creates a table named StockLog that can be used to log changes made to the table named Stock: CREATE TABLE StockLog( ProductId CHAR(40) NOT NULL, Qty INTEGER NULL, Price INTEGER NULL, EventTime DATETIME NOT NULL, DeleteFlag CHAR(1) NULL, LogId INTEGER IDENTITY ) As you can see, the audit table has a similar structure to the Stock table's, with the following exceptions:
6.8.2.2 Step 2: Implement the audit triggersThis set of triggers records all insert, update, and delete activity in the StockLog table, but the approach is slightly different from the previous audit-log recipe in this chapter. This time, column values are not recorded for deletes, and only modified columns are recorded with update log entries. Pay special attention to the CASE statements in the updateStock trigger as you read the code. CREATE TRIGGER insertStock ON Stock FOR INSERT AS BEGIN INSERT INTO StockLog (ProductId, Qty, Price, DeleteFlag, EventTime) SELECT i.ProductId, i.Qty, i.Price, null, CURRENT_TIMESTAMP FROM inserted I END GO CREATE TRIGGER deleteStock ON Stock FOR DELETE AS BEGIN INSERT INTO StockLog (ProductId, Qty, Price, DeleteFlag, EventTime) SELECT d.ProductId, null, null, 'D', CURRENT_TIMESTAMP FROM deleted d END GO CREATE TRIGGER updateStock ON Stock FOR UPDATE AS BEGIN INSERT INTO stockLog (ProductId, Qty, Price, DeleteFlag, EventTime) SELECT d.productId, (CASE WHEN d.Qty!=i.Qty THEN i.Qty ELSE NULL END), (CASE WHEN d.Price!=i.Price THEN i.Price ELSE NULL END), null, CURRENT_TIMESTAMP FROM deleted d,inserted I WHERE d.ProductId=i.ProductId END GO 6.8.3 DiscussionSpace can often become an issue if all details are to be stored in a log. This technique demonstrates how to store the information with minimum space consumption. The concept underlying the solution shown here is that column values are only recorded in the log table when they change. An INSERT operation is the only operation where the log record will contain a complete copy of a row. For UPDATE operations, the log will only contain the new values for the columns that were changed and the value of the row's primary key. All columns not changed by an UPDATE statement are left set to NULL in the log record to avoid storing data redundantly. In this way, space consumption is minimized. Deletes are a bit different. Each log record has a delete flag in its descriptor. When a row is deleted, the delete flag is set in the corresponding log's record. Consider, for example, the following DML statements: INSERT INTO Stock(ProductID, Qty, Price) VALUES ('Bananas', 10, 112) INSERT INTO Stock(ProductID, Qty, Price) VALUES ('Apples', 20, 223) UPDATE Stock SET Qty=25 WHERE ProductID='Apples' UPDATE Stock SET Qty=30 WHERE ProductId='Apples' DELETE FROM Stock WHERE ProductId='Bananas' The audit-log entries generated by these statements would be as follows : SELECT LogId, ProductId, Qty, Price, DeleteFlag, EventTime FROM stockLog GO LogId ProductId Qty Price DeleteFlag EventTime ------ ---------- ----- ------ ---------- ------------------------ 1 Bananas 10 112 NULL 2000-06-27 23:01:25.943 2 Apples 20 223 NULL 2000-06-27 23:01:25.950 3 Apples 25 NULL NULL 2000-06-27 23:01:25.970 4 Apples 30 NULL NULL 2000-06-27 23:01:25.970 5 Bananas NULL NULL D 2000-06-27 23:01:25.993 The first two DML statements were INSERT statements, and, consequently, the log entries each contain a complete copy of the data being inserted. The next two DML statements, however, were UPDATE statements. Both of these updated only one column, and the log entries reflect that fact by setting the other columns to NULL. The final log entry represents a deletion of the Bananas record. The D in the DeleteFlag column indicates that the record was deleted. No other information need be recorded; consequently, both the Qty and Price columns have been set to NULL. Most database systems optimize the storage of NULLS such that the amount of space used for that purpose is negligible. 6.8.3.1 Snapshot generationGenerating a snapshot of the Stock table is more difficult when the log records changes to a single column than it is when each log record contains a complete copy of the row being changed. In the case of the audit log described in this recipe, you can use the following SELECTstatement to generate a snapshot of the base table as of a specific time. The @TIME variable in the example represents the "as of" time used for the snapshot generation. For testing purposes, you can replace it with CURRENT_TIMESTAMP to return the current snapshot. SELECT t.ProductId, Qty=(SELECT TOP 1 tQ.Qty FROM StockLog tQ WHERE tQ.Qty IS NOT NULL AND tQ.ProductId=t.ProductId AND tQ.EventTime <= @TIME ORDER BY tQ.LogId DESC), Price=(SELECT TOP 1 tP.Price FROM StockLog tP WHERE tP.Price IS NOT NULL AND tP.ProductId=t.ProductId AND tP.EventTime <= @TIME ORDER BY tP.LogId DESC) FROM StockLog t WHERE t.EventTime<=@TIME GROUP BY t.ProductId HAVING NOT EXISTS(SELECT * FROM StockLog WHERE productID=t.productID AND logID=MAX(t.logID) AND DeleteFlag='D') This SELECT statement contains a number of subqueries: one for each column in the Stock table to retrieve the most recent values for those columns and one at the end to eliminate all deleted rows from the report. Each subquery retrieves the most recent value of a column with respect to the date specified by the @TIME variable. The one exception is that the primary key column does not require a subquery because we are assuming that the primary key does not change. To get the most recent value for a column, we execute a subquery. The subquery ignores rows where the column is NULL. The subquery sorts the rows in descending order by LogId and returns the topmost value, which is the most recent value of the column. The subquery respects the time you pass (@TIME) and will ignore values set later than that time. The following example shows the subquery used to return the most recent value for column Qty: SELECT TOP 1 tQ.Qty FROM StockLog tQ WHERE tQ.Qty IS NOT NULL AND tQ.ProductId=t.ProductId AND tQ.EventTime <= @TIME ORDER BY tQ.LogId DESC This same subquery is used for each column in the table, except for the columns in the record descriptor. The WHERE clause in the main query limits the scope of the selection to the rows that were inserted before or on the date and at the time represented by @TIME. The ProductId column is then used to group the result set. Grouping is done because we are interested only in one instance (a snapshot) of each row in the original table. 6.8.3.2 Deleted recordsThe HAVING clause in the query eliminates Stock records that have been deleted, so that they don't appear in the snapshot. The HAVING clause looks like this: HAVING NOT EXISTS(SELECT * FROM StockLog WHERE productID=t.productID AND logID=MAX(t.logID) AND DeleteFlag='D') The subquery in the HAVING clause checks to see if the most recent log entry for a Stock record represents a delete. If it does, then that Stock record is not returned as part of the snapshot. 6.8.3.3 Dealing with NULL valuesIf the Stock table includes some columns that are allowed to be NULL, the solution shown in this recipe will not work properly. That's because the query to generate a snapshot has no way to know whether a NULL value in a particular column represents a NULL column value or if it just means that there has been no change for that column. One possible way to deal with NULL values is to use a second column as a flag to indicate whether the first is a null. This flag column can be a BIT type if the database implementation allows it, or it can be a character type such as a CHAR(1) . A BIT type allows for TRUE and FALSE values. The StockLog table is extended with the columns representing the NULL values: CREATE TABLE StockLog( ProductId char(40) NOT NULL, Qty INTEGER NULL, Price INTEGER NULL, IsEmptyPrice BIT, IsEmptyQty BIT, EventTime DATETIME NOT NULL, DeleteFlag CHAR(1) NULL, LogId INTEGER IDENTITY ) Assuming that you implemented a flag column named IsEmptyQty and that you used 1 to represent a NULL value, and 0 otherwise , the subquery in the snapshot query would need to use a WHERE clause, such as the one shown here: SELECT t.ProductId, Qty=(SELECT TOP 1 tQ.Qty FROM StockLog tQ WHERE tQ.IsEmptyQty=0 AND tQ.ProductId=t.ProductId AND tQ.EventTime <= @TIME ORDER BY tQ.LogId DESC ), Price=(SELECT TOP 1 tP.Price FROM StockLog tP WHERE tP.IsEmptyPrice=0 AND tP.ProductId=t.ProductId AND tP.EventTime <= @TIME ORDER BY tp.LogId DESC) FROM StockLog t WHERE EventTime<=@TIME GROUP BY t.ProductId HAVING NOT EXISTS(SELECT * FROM StockLog WHERE productID=t.productID AND logID=MAX(t.logID) AND DeleteFlag='D') This solution requires only minimal changes to the SELECT statement used to generate snapshots. It also doesn't consume much space in an implementation where the DBMS supports the BIT datatype. Sybase and MS SQL Server both support BIT types. The storage of such BIT types is optimized by combining all the BIT columns in a row together into as few bytes as possible. If there are 8 bit columns, they will consume just 1 byte per row. |