Vacuuming Old Data


Up to this point, we have not really considered what happens to old event and notification data in a SQL-NS application's database after it has been processed. Generally, old data remains in the tables until it is vacuumed. Vacuuming is a process the SQL-NS engine runs periodically to delete old data that isn't needed anymore. By getting rid of unnecessary, old data, the SQL-NS engine can reclaim disk space and maintain the performance of the application.

In the ADF, you can configure a vacuuming schedule that specifies when the engine should clean up the application's old data. You can also specify the retention age for the old data. At the times specified in the vacuuming schedule, the engine runs the vacuum process, removing any items older than the specified retention age. Newer items are preserved (even if they have been completely processed).

Advantages Of The SQL-NS Vacuuming Model

The vacuum in the SQL-NS engine performs cleanup on a periodic schedule. You might wonder why this approach was chosen, instead of simply cleaning up event and notification data immediately after it is processed.

There are two reasons for the design choice. First, immediately removing processed data would make debugging, tracking, and auditing the behavior of a notification application difficult. Leaving old data in the tables, for some amount of time after it has been processed, provides a convenient history of the application's processing that can be queried by developers and administrators.

The other motivation for the chosen vacuuming strategy is that deleting data can adversely affect performance. Delete operations usually cause writes to disk and involve significant logging (depending on the recovery model used). If data was deleted as part of the normal processing pipeline, the throughput of the application would be reduced. Deleting data asynchronously, on a periodic schedule, has two performance advantages: The fixed costs of the delete operations can be amortized over larger sets of data, and the operations can be done at times of the day when the application is otherwise mostly inactive (during the middle of the night, for example).


This section describes the ADF elements used to configure vacuuming and provides some general guidelines for using vacuuming in SQL-NS applications. Although vacuuming falls into the category of maintenance tasks (covered in Chapter 14, "Administering a SQL-NS Instance"), it's important to think about vacuuming at the deployment stage because an application's vacuuming policy affects the amount of space needed on the database server's disks, as well as the appropriate size and growth settings for its database files.

Configuring Vacuuming in the ADF

A SQL-NS application's vacuuming configuration is specified in the <Vacuum> element of its ADF. Listing 13.3 shows a sample <Vacuum> declaration.

Listing 13.3. Configuring the Vacuum in the ADF

 <Application>   ...   <ApplicationExecutionSettings>     ...     <Vacuum>       <RetentionAge>P3DT00H00M00S</RetentionAge>       <VacuumSchedule>         <Schedule>           <StartTime>23:00:00</StartTime>           <Duration>P0DT02H00M00S</Duration>         </Schedule>         <Schedule>           <StartTime>03:00:00</StartTime>           <Duration>P0DT02H00M00S</Duration>         </Schedule>       </VacuumSchedule>     </Vacuum>     ...   </ApplicationExecutionSettings> </Application> 

The <Vacuum> element's two top-level subelements, <RetentionAge> and <VacuumSchedule>, specify the retention age for old data and the times at which the vacuum should run, respectively.

The <RetentionAge> element defines the minimum age for items to be vacuumed. The <RetentionAge> value is specified using the XSD duration syntax (described in detail in the sidebar "The XSD duration Data Type," p. 157, in Chapter 5, "Designing and Prototyping an Application"). In the example shown in Listing 13.3, the retention age is set to three days.

Note

The vacuum will not delete data that has not been completely processed, even if such data is older than the specified retention age. An example of old data that still requires processing is a set of notifications awaiting delivery retry (after having experienced delivery failures one or more times). The data associated with these notifications will be removed only after all delivery attempts have been completed.


The <VacuumSchedule> element specifies when the vacuum will run. The schedule consists of a series of <Schedule> elements, each of which specifies a start time and a duration.

Start times are specified in UTC, and durations are specified in the XSD duration syntax. For each vacuum schedule specified, the vacuum starts running at the given start time and continues either until the specified duration elapses or until all items eligible for vacuuming have been removed (whichever happens first). The <Duration> element is optional, and, if omitted, the duration is assumed to run from the given start time until the start of the next vacuum schedule. You can specify as many <Schedule> elements as you need to implement an appropriate vacuuming policy for your application.

Vacuum schedules have an implied recurrence of one day. In other words, the vacuum starts operating at the specified start times every day. If the <Duration> element is omitted from the last vacuum schedule declared, the assumed duration for that schedule runs from the given start time to the start time of the first vacuuming schedule on the next day.

Caution

Don't forget that schedule start times are specified in UTC. To ensure that the vacuum runs when you intend it to, covert local times to UTC before specifying them in the ADF.


Note

The SQL-NS engine runs only one vacuum process at a time. If you specify overlapping vacuum schedulesthat is, one schedule duration that runs past another schedule's start timethe vacuum adjusts the durations to make the intervals nonoverlapping.


Choosing Appropriate Vacuuming Settings

In any application, the appropriate retention age for old items is a trade-off between the need to retain historical data and the disk space and performance implications of storing that data. The appropriate retention age for your applications depends on their auditing and tracking requirements. You need to retain data for as long as you expect to have to answer detailed questions about notifications after they are generated. Some applications have relaxed auditing and tracking requirements, and, in these applications, a short retention age is appropriate. Other applications need to support customer queries about their notifications and therefore might need to keep several days' worth of old data.

The larger the retention age you use, the more disk space you will need for the application's database. You should ensure that your database system has sufficient capacity to store the data and that the file sizes are set so that file expansion does not have to be performed too frequently (see the "Defining the Physical Storage of SQL-NS Database Objects" section, p. 459, for details on specifying file size attributes). Also, be aware that as the tables in the database grow larger, queries against them become increasingly inefficient. This inefficiency can be mitigated, to some extent, by proper indexing.

Because the actual vacuuming operations can have a negative performance impact, it's important to plan the vacuum schedule carefully. During a given day, most SQL-NS applications have some periods of busy activity (around the times that events arrive and most scheduled subscriptions are set to be evaluated) and other periods of relative inactivity. Vacuuming should always be scheduled at the times that your application is expected to be least busy. You need to determine what these times are for your applications, based on predictions of the expected usage patterns, or, more reliably, based on observation of the application in production.

It's important not to underestimate the time required for vacuuming to complete. If, at the end of one run, the vacuum determines that it did not complete its jobthat is, it had to stop because the specified schedule duration had run out, but there were still remaining items to vacuumit will write a warning message to the Application Event Log. (For details on viewing messages in the Application Event Log, see the "Using the Application Event Log" section, p. 484, in Chapter 14.) If you notice this happening frequently, you should adjust the vacuuming schedule, either by adding more schedule elements or by increasing the durations of the existing ones.




Microsoft SQL Server 2005 Notification Services
Microsoft SQL Server 2005 Notification Services
ISBN: 0672327791
EAN: 2147483647
Year: 2006
Pages: 166
Authors: Shyam Pather

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net