Cacheability Rules and Invalidation

Establishing detailed cacheability rules and invalidation policies is generally more of a developer's task than an administrator's, although it's the administrator who implements the policy. The objective is to determine which content within the application can be cached and how long it should remain in the cache. Not all content should be cached, and rules are needed to define when a piece of content must be refreshed from the origin server. WC provides the following methods for caching and invalidating content:

Default cache settings. Some types of content are set for automatic caching without any special configuration by the administrator. This makes WC useful without any special, application-specific configuration.
Regular expressions. Writing caching rules that parse URLs using regular expressions allows you to identify specific content to be cached.
Full and partial page caching. It isn't necessary to cache an entire HTML page. Many pages have both static and dynamic content. WC can cache and serve the static content while retrieving the dynamic, uncacheable content.
Compression. WC can store both compressed and uncompressed content. The idea behind serving compressed content is that it's faster to download and uncompress a document inside the browser than to try to download (over a potentially slow network) an uncompressed document. This also reduces the load on the network.
Expiration settings. Rules for how long a document resides in the cache can be established. For example, there's a default rule of 300 seconds and after that time period the cached document will be flushed from the cache.
ESI and JESI. Edge Side Includes and Java Edge Side Includes are ways to break documents into different components depending on their cacheability. For example, all users may have the same basic HTML template when they log in to a portal page. However, you'll need to fetch different components such as personalized greetings and messages or content retrieved from a web service. ESI and JESI allow pages to be designed and cached in this manner.
Multiversion caching. Multiple, slightly different versions of the same web page can be cached. This is useful when all users can see a web page but, based on their status and permissions (implemented via cookies), different users see different data.
Multiple invalidation methods. Content can expire after a set time, or it can be invalidated automatically from within the application or manually by the administrator.
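
To make the multiversion idea concrete, here is a minimal Python sketch. It is not WC's implementation; it only illustrates how a cache can keep separate versions of one URL by including a cookie value in the lookup key. The cookie name user_role and the function cache_key are hypothetical.

```python
from __future__ import annotations

from typing import Mapping


def cache_key(url: str, cookies: Mapping[str, str],
              version_cookie: str = "user_role") -> tuple[str, str]:
    """Build a lookup key that separates versions of one URL by cookie value.

    `version_cookie` is a hypothetical cookie name chosen for illustration.
    Requests that lack the cookie share a single default version.
    """
    return (url, cookies.get(version_cookie, ""))


# A plain dict stands in for the cache.
cache: dict[tuple[str, str], str] = {}
cache[cache_key("/portal/home", {"user_role": "manager"})] = "<html>manager view</html>"
cache[cache_key("/portal/home", {"user_role": "staff"})] = "<html>staff view</html>"

# Same URL, two cached versions.
print(len(cache))
```

Both entries share the URL /portal/home, yet each role receives its own cached copy.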

Within the WC Manager tool, these features are implemented as rules and controlled by the pages under the Rule Association and Rules for Caching, Personalization, and Compression sections. In the following sections we'll explain these features and show how you can use them within WC.

Default Cache Settings

When WC is installed, it's configured by default to cache the basic, static content that you would expect. This makes it useful "out of the box," without having to engage in an exhaustive effort to identify all of the application-specific content to cache. Additionally, WC lays down a basic framework of rules to begin with rather than forcing you to start from scratch. Note that cacheable content is not loaded into the cache when WC is started because there's no "preload" feature; it's loaded the first time it's requested.

By default, WC is configured to cache objects with the following file extensions:

PDF. Adobe PDF files
JS. JavaScript files
CSS. Cascading Style Sheets
HTML. Static HTML files
Image Files. GIFs, JPEGs, JPGs, PNGs, and BMPs
SWF. Shockwave Flash files

If a file with any of these extensions is returned from the origin server and it's not larger than the maximum object size, WC will, by default, cache it for future requests.
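
The default decision described above can be sketched in a few lines of Python. This is an illustration only, not WC's code: the function name is hypothetical, the size limit is a placeholder rather than WC's actual setting, and the check looks only at the path portion of the URL, which is a simplification.

```python
import os
from urllib.parse import urlsplit

# Extensions taken from the default list above.
DEFAULT_CACHEABLE_EXTENSIONS = {
    ".pdf", ".js", ".css", ".html", ".htm",
    ".gif", ".jpeg", ".jpg", ".png", ".bmp", ".swf",
}

# Hypothetical limit for illustration; check your own WC configuration.
MAX_OBJECT_SIZE_BYTES = 100 * 1024


def is_cacheable_by_default(url: str, size_bytes: int) -> bool:
    """Return True if the extension is on the default list and the object fits."""
    path = urlsplit(url).path                # drop any query string
    ext = os.path.splitext(path)[1].lower()  # e.g. ".GIF" becomes ".gif"
    return ext in DEFAULT_CACHEABLE_EXTENSIONS and size_bytes <= MAX_OBJECT_SIZE_BYTES


print(is_cacheable_by_default("/img/logo.GIF", 2048))          # small image
print(is_cacheable_by_default("/docs/manual.pdf", 5_000_000))  # exceeds the limit
print(is_cacheable_by_default("/servlet/order", 2048))         # no listed extension
```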

In Figure 19-21 you can see how the default caching rules are defined with the Caching, Personalization, and Compression Rules page.

Figure 19-21: Caching, Personalization, and Compression Rules page

Figure 19-21 shows the default rules for caching. From this page you can create, delete, or edit any caching rule. Figure 19-19 showed these rules at work, with multiple JPG and GIF files being cached.

Notice the Compression and Detailed Settings headings. Compression determines whether or not the content will be compressed during download and in WC. Some objects are already stored in a compressed format, so it doesn't make sense to compress them again. The Detailed Settings link provides additional information for each caching rule, comments, and the ability to change settings. It also lists the expiration policy for the content. In Figure 19-22 you can see the details page for HTML documents.

Figure 19-22: Caching, Personalization, and Compression Rule Details page

In Figure 19-22 you can see that a regular expression, \.html?$, is being used to cache files ending in .htm and .html. Any file with either extension will be cached in a compressed format. After 300 seconds in the cache, it will be removed immediately. To change this rule (to increase the time frame, for example), simply click the Edit button to make modifications.
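
If you want to check what an expression such as \.html?$ will match before entering it as a rule, you can try it outside WC first. The following sketch uses Python's re module; WC's regular expression engine may differ in details, so treat the results as a guide rather than a guarantee. The sample URLs are made up.

```python
import re

# The expression from the rule: a literal dot, "htm", an optional "l", end of string.
HTML_RULE = re.compile(r"\.html?$")


def matches_html_rule(url: str) -> bool:
    """Return True if the URL ends in .htm or .html."""
    return HTML_RULE.search(url) is not None


for url in ["/index.html",             # matches
            "/about.htm",              # matches
            "/index.html?lang=en",     # no match: the query string follows the extension
            "/docs/html/readme.txt",   # no match: "html" is not the extension
            "/page.htmlx"]:            # no match: extra character after "html"
    print(url, matches_html_rule(url))
```

Note that the $ anchor ties the match to the end of the string, so in this sketch a URL with a query string appended does not match.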

Expiration Policies

You define the rules for content expiration on the Expiration Policy Definitions page. This page identifies what expiration policies currently exist and which object types they apply to. WC comes with these three default expiration policies for content management:

Immediately removes the object from the cache as directed by the HTTP Expires header.
Immediately removes the object from the cache after 300 seconds (5 minutes) of being in the cache.
Immediately removes the object from the cache after 3,600 seconds (60 minutes) of being in the cache.

Each cacheability rule has one of these expiration policies assigned to it. You can add, delete, or modify expiration policies as needed. An object's lifetime can be measured either from the time it entered the cache or from the time the object was created. Once an object has passed its expiration time, it can either be removed immediately or be refreshed from the origin server later, when the system is less busy.
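
The difference between the two ways of measuring an object's lifetime is easier to see with numbers. The sketch below is a simplified model, not WC's internals; the class names, field names, and the "cache_entry" and "object_creation" labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExpirationPolicy:
    max_age_seconds: int
    basis: str = "cache_entry"  # hypothetical labels: "cache_entry" or "object_creation"


@dataclass
class CachedObject:
    cached_at: float   # when the object entered the cache (seconds)
    created_at: float  # when the origin server created the object (seconds)


def is_expired(obj: CachedObject, policy: ExpirationPolicy, now: float) -> bool:
    """Return True once the object has outlived the policy's maximum age."""
    start = obj.cached_at if policy.basis == "cache_entry" else obj.created_at
    return now - start >= policy.max_age_seconds


# Object created at t=900, cached at t=1000, checked at t=1250.
obj = CachedObject(cached_at=1000.0, created_at=900.0)
print(is_expired(obj, ExpirationPolicy(300, "cache_entry"), now=1250.0))      # 250 s in cache
print(is_expired(obj, ExpirationPolicy(300, "object_creation"), now=1250.0))  # 350 s since creation
```

With the same 300-second limit, the object is still valid when measured from cache entry but already expired when measured from its creation time.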

In Figure 19-23 you can see the editing options available for the 300-second expiration policy.

Figure 19-23: Edit Expiration Policy page

In Figure 19-23 you can see that any file with a .pdf, .js, .css, .html, or .htm extension will remain in the cache for a maximum of 300 seconds. After that 300-second period, the object will immediately be removed.

Invalidation Methods

Before implementing WC, it's necessary to determine what content is static enough to be served in accordance with the expiration policies and what content needs to be removed more often. The concern is that users might receive stale data from the cache and base their business decisions on it. Content can be invalidated manually by the administrator, or it can be invalidated automatically by WC or the application.

Manual Invalidation

To manually invalidate content, use either the Basic or Advanced Content Invalidation pages located under the Operations section of the WC Manager. Both screens allow you to review the content to be invalidated before actually invalidating it. They also allow you to either remove it immediately or specify a time frame within which you can remove the documents.

The Basic page allows you to remove all cached content or specify an exact URL.

The Advanced page provides greater control over which content is invalidated. Content to be invalidated can be identified based on the following:

HTTP methods of GET or POST
URL expressions using either substrings or regular expressions
URL parameters and values using either substrings or regular expressions
Post body expressions using either substrings or regular expressions
Cookie or header names and values
Search keys

The Advanced page provides some intricate search methods for content, but it would be wise to preview the documents before you invalidate them so you can avoid any mistakes with regular expressions.
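
The value of previewing is easy to demonstrate. The following Python sketch is a stand-in for the preview step, not the WC Manager's logic: it lists which cached entries a method and URL expression would select, without removing anything. The function name and the sample cache contents are hypothetical.

```python
import re


def preview_invalidation(cached_entries, url_pattern, method="GET"):
    """Return the (method, url) entries a selector would hit, removing nothing."""
    regex = re.compile(url_pattern)
    return [(m, u) for (m, u) in cached_entries
            if m == method and regex.search(u)]


cached = [
    ("GET", "/catalog/item1.html"),
    ("GET", "/catalog/item2.html"),
    ("GET", "/images/logo.gif"),
    ("POST", "/catalog/search"),
]

# Intended selector: only the catalog's HTML pages.
intended = preview_invalidation(cached, r"^/catalog/.*\.html$")

# Careless selector: ".*" selects every cached GET object.
too_broad = preview_invalidation(cached, r".*")

print(len(intended), len(too_broad))
```

Comparing the two result lists before committing shows immediately that the second expression would flush far more than intended.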

Automatic Invalidation

No administrator can or should be tasked with invalidating content except in emergency situations; it simply isn't possible to keep up with all the application requests. A far better method is for the developers (who should know their content better than anyone else) to write application triggers that will invalidate content when it changes. These triggers come in the following forms:

Database triggers executed when data is inserted, updated, or deleted
Invalidation requests called within application code
Shell scripts called after content loads

Oracle provides templates for this code in $ORACLE_HOME/webcache/examples and $ORACLE_HOME/webcache/toolkit.

By using the templates and documentation provided in $ORACLE_HOME/webcache/docs, you can develop a logical invalidation policy to ensure that critical, updated content is being provided to the user.
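
The general pattern behind an invalidation request issued from application code is "change the data, then invalidate the pages that show it." The Python sketch below illustrates only that ordering. The invalidate callable is a hypothetical placeholder for whatever mechanism actually sends the request to WC; the real request format is defined by Oracle's templates and is not reproduced here. The URL scheme and function name are likewise made up.

```python
from typing import Callable, Dict


def update_price(prices: Dict[str, float], item_id: str, new_price: float,
                 invalidate: Callable[[str], None]) -> None:
    """Change the data first, then invalidate the page that displays it."""
    prices[item_id] = new_price
    # Hypothetical URL scheme; `invalidate` stands in for the real WC request.
    invalidate(f"/catalog/item/{item_id}")


# For demonstration, record the requested URLs instead of contacting a cache.
sent_requests = []
prices = {"42": 9.99}
update_price(prices, "42", 12.50, sent_requests.append)

print(prices["42"], sent_requests)
```

Passing the invalidation mechanism in as a parameter keeps the application logic testable without a running cache.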

Be sure to work with the developers to determine which components of the application should be cached. These rules can become complex, especially with ESI, partial pages, and application-specific invalidation policies. Establish performance baselines before implementing WC, and measure again afterward in order to quantify the benefits. Finally, be sure to take advantage of the logging and monitoring utilities within WC so you can measure the performance.