Verity Collections


A Verity collection is a read-optimized, logical database made up of a number of physical files stored on a Web server's hard drive. When a collection is created, the physical files and directories that make up the Verity collection are written to the server.

ColdFusion associates the collection's logical name with its physical file structure on disk. You then refer to the collection by that logical name in the collection attribute of the <cfindex> and <cfsearch> tags.
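The following is a minimal sketch of searching a collection by its logical name. It uses the SnailsAndPuppyDogTails collection created later in this section. The search term "escargot" and the query name results are arbitrary examples.

```cfml
<!--- Search by logical name; the disk path is never referenced --->
<cfsearch name="results"
          collection="SnailsAndPuppyDogTails"
          criteria="escargot">

<!--- Each matching record exposes its key and title --->
<cfoutput query="results">
  #results.title# (#results.key#)<br>
</cfoutput>
```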

Creating and Indexing Collections

A Verity collection can be created in several ways: through ColdFusion Administrator, with the <cfcollection> tag, or with a third-party Verity tool. Creating a collection sets up its directory structure and registers a logical name for it. The following code snippet builds a collection programmatically using the <cfcollection> tag:

 <cfcollection action="create"
               collection="SnailsAndPuppyDogTails"
               path="c:\cfusionmx7\verity\collections"
               language="English">

The default directory location for all collections is verity\collections under the ColdFusion root. The contents of these directories are managed and maintained by ColdFusion and Verity, and for the most part can be ignored. Once a collection is created, ColdFusion accesses it by its logical name, in much the same way that data sources are used to access databases.
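To see which logical names are registered and where their files live, you can list the collections. This is a sketch that assumes the query returned by action="list" exposes NAME and PATH columns, as it does in ColdFusion MX 7. The query name allCollections is an arbitrary choice.

```cfml
<!--- Return every registered collection as a query object --->
<cfcollection action="list" name="allCollections">

<!--- Show each logical name next to its physical location --->
<cfoutput query="allCollections">
  #allCollections.name#: #allCollections.path#<br>
</cfoutput>
```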

<cfcollection> also can be used to map an alias to an existing Verity collection that was created by a tool other than ColdFusion. The action, collection, and path attributes are required. The path must point to a valid Verity collection; mapping does not validate the path.

NOTE

Deleting a mapped collection unregisters the alias; the base collection is not deleted.
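A mapping and its later removal might look like the sketch below. The alias ExternalDocs and the path d:\verity\collections\externaldocs are hypothetical. The path must point to a collection that already exists on disk, because the map action does not check it.

```cfml
<!--- Register a ColdFusion alias for a collection built by another tool --->
<cfcollection action="map"
              collection="ExternalDocs"
              path="d:\verity\collections\externaldocs">

<!--- Later: unregister the alias; the underlying collection files remain --->
<cfcollection action="delete"
              collection="ExternalDocs">
```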


Maintaining Collections

A collection starts its life empty and must be populated with <cfindex>, which can build both file-based and query-based indexes. The collection simply needs to know where to find the body of text the engine will search, and which value to return as the result key on a successful match. Additional attributes are available for filtering and result summaries, among other features.

A file library containing a mixture of file types could be indexed by using the following:

 <cfindex action="update"
          collection="SnailsAndPuppyDogTails"
          key="c:\filestore\whatboyslike"
          type="PATH"
          extensions=".doc, .xls, .ppt, .pdf"
          recurse="Yes">

For type="PATH", the key attribute is a directory on the server, and each record in the index uses the file's name as its key value. The extensions attribute restricts indexing to the listed file types within that directory. recurse="Yes" instructs ColdFusion to work through every subdirectory below the nominated root directory.

A database query with large text fields could be indexed efficiently using the following:

 <cfquery name="GirlsLike" datasource="dsn">
   SELECT * FROM AllThingsNice
 </cfquery>

 <cfindex action="update"
          collection="SugarAndSpice"
          key="Things_ID"
          type="CUSTOM"
          title="Sugar"
          query="GirlsLike"
          body="Spice"
          custom1="AllThings"
          custom2="">

In this example, key, title, body, and custom1 are fields in the collection index, and each is mapped to a specific column in the query object. key is the unique identifier for the record (here, the primary key). title is a descriptive name that need not be unique. body names the column whose text is searched. custom1 and custom2 are developer-definable fields that are returned with the search results (custom2 is left empty here). In effect, any query object can be used to populate a custom index, including queries generated by <cfpop>, <cfldap>, and <cfquery>.
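Because custom1 travels with each result, a search against this custom index can display it directly. The sketch below searches the SugarAndSpice collection built above. The criteria value "cinnamon", the maxrows limit, and the query name niceThings are arbitrary examples.

```cfml
<!--- Search the query-based collection --->
<cfsearch name="niceThings"
          collection="SugarAndSpice"
          criteria="cinnamon"
          maxrows="20">

<!--- key holds Things_ID; custom1 holds the AllThings column value --->
<cfoutput query="niceThings">
  #niceThings.key# - #niceThings.title# [#niceThings.custom1#]<br>
</cfoutput>
```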

As application data changes, the Verity collection must be updated to stay synchronized with the information in the database or file store. <cfindex> with the update and delete actions can modify a collection one record at a time. You might couple these calls with data changes in the application so that the Verity collection is always up to date:

 <!--- updating a document in a file store index --->
 <cfindex action="update"
          collection="SnailsAndPuppyDogTails"
          key="c:\filestore\whatboyslike\escargot.doc"
          type="file">

 <!--- deleting a single record from a query-based index --->
 <cfindex action="delete"
          collection="SugarAndSpice"
          key="1234">
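Coupling a database change with an index update might look like the following sketch. It assumes hypothetical form fields form.things_id and form.spice, and it assumes that Things_ID is an integer column. Only the changed row is re-read, and that row alone is pushed into the collection.

```cfml
<!--- Hypothetical edit: save new Spice text for one record --->
<cfquery datasource="dsn">
  UPDATE AllThingsNice
  SET    Spice = <cfqueryparam value="#form.spice#" cfsqltype="cf_sql_longvarchar">
  WHERE  Things_ID = <cfqueryparam value="#form.things_id#" cfsqltype="cf_sql_integer">
</cfquery>

<!--- Re-read just the changed row --->
<cfquery name="ChangedThing" datasource="dsn">
  SELECT * FROM AllThingsNice
  WHERE  Things_ID = <cfqueryparam value="#form.things_id#" cfsqltype="cf_sql_integer">
</cfquery>

<!--- Replace that record in the collection, keyed on Things_ID --->
<cfindex action="update"
         collection="SugarAndSpice"
         type="custom"
         query="ChangedThing"
         key="Things_ID"
         title="Sugar"
         body="Spice"
         custom1="AllThings">
```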

Alternatively, you can empty the entire index with the purge action and then repopulate it, or clear and repopulate it in one step with the refresh action. However, these options are not always suitable for regular updates and can be very time-consuming on large data stores.
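Both approaches are sketched below for the SugarAndSpice collection. The sketch assumes that the GirlsLike query shown earlier has already been run on the same page.

```cfml
<!--- Option 1: empty the collection, then repopulate it --->
<cfindex action="purge" collection="SugarAndSpice">
<cfindex action="update"
         collection="SugarAndSpice"
         type="custom"
         query="GirlsLike"
         key="Things_ID"
         title="Sugar"
         body="Spice">

<!--- Option 2: clear and repopulate in a single call --->
<cfindex action="refresh"
         collection="SugarAndSpice"
         type="custom"
         query="GirlsLike"
         key="Things_ID"
         title="Sugar"
         body="Spice">
```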

TIP

On a high-traffic site, you might need to schedule downtime for the search interface while the collection is maintained. If you choose to maintain the collection after every data change in the application, the collection might not be available for searching during frequent or prolonged update periods.


From time to time, the Verity collection can become corrupted. If this happens, you can repair the collection by using <cfcollection>:

 <cfcollection action="repair"
               collection="SugarAndSpice">

In some instances, you might need to delete the collection entirely and reindex.
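Rebuilding from scratch combines the tags covered so far. This is a sketch only. The path repeats the example location used earlier, and the index attributes mirror the SugarAndSpice example.

```cfml
<!--- Drop the damaged collection and its files --->
<cfcollection action="delete" collection="SugarAndSpice">

<!--- Recreate it under the same logical name --->
<cfcollection action="create"
              collection="SugarAndSpice"
              path="c:\cfusionmx7\verity\collections"
              language="English">

<!--- Repopulate it from the source table --->
<cfquery name="GirlsLike" datasource="dsn">
  SELECT * FROM AllThingsNice
</cfquery>
<cfindex action="update"
         collection="SugarAndSpice"
         type="custom"
         query="GirlsLike"
         key="Things_ID"
         title="Sugar"
         body="Spice"
         custom1="AllThings">
```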

Optimizing Collections

Verity collections require regular optimization; how often depends on how frequently the collection is updated. Rather than performing a complete reindex each time the collection is updated, Verity adds new files to the collection. This makes updates faster, but it eventually fragments the collection. Because every file in the collection is checked for a match during a search, the more files (and the more fragmentation) an index has, the slower the search. Optimization compacts and aggregates the Verity metadata files, which significantly improves the overall performance of the search engine.

Collections can be optimized through ColdFusion Administrator, or programmatically by using the <cfcollection> tag:

 <cfcollection action="optimize"
               collection="SugarAndSpice">

TIP

Every update leads to further fragmentation of the Verity collection. Fragmented collections take up significantly more disk space and can eventually slow the collection to the point of being unsearchable.

One method of minimizing fragmentation is to reduce the number of transactions being performed on the collection. Rather than updating the collection every time you update your data, you should consider periodic updates that bulk all your data changes into a single submission.

In any event, be sure to optimize regularly!
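A periodic batch job along these lines might be run as a ColdFusion scheduled task. It is a sketch that assumes a hypothetical LastModified datetime column on AllThingsNice and a job that runs once a day. All changed rows go to Verity in one <cfindex> call, and the collection is optimized afterward.

```cfml
<!--- Hypothetical daily job: pick up rows changed in the last 24 hours --->
<cfset since = dateAdd("d", -1, now())>

<cfquery name="ChangedThings" datasource="dsn">
  SELECT * FROM AllThingsNice
  WHERE  LastModified >= <cfqueryparam value="#since#" cfsqltype="cf_sql_timestamp">
</cfquery>

<!--- Submit every changed row in a single update --->
<cfif ChangedThings.recordCount GT 0>
  <cfindex action="update"
           collection="SugarAndSpice"
           type="custom"
           query="ChangedThings"
           key="Things_ID"
           title="Sugar"
           body="Spice"
           custom1="AllThings">
</cfif>

<!--- Compact the fragments left behind by the update --->
<cfcollection action="optimize" collection="SugarAndSpice">
```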


Indexing XML Documents

Verity can index XML documents. The documents must have the .xml extension for Verity's universal XML filter to process them. XML documents with any extension can be indexed into an XML-only collection by modifying Verity's Style files, which can also be used to adapt the XML filter to specific XML schemas. Style files contain configuration parameters and can be edited with a standard text editor.
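With the default filter, indexing XML is a matter of restricting a path index to the .xml extension. In this sketch, the directory c:\filestore\xmlfeeds is hypothetical.

```cfml
<!--- Index only .xml files so the universal XML filter handles them --->
<cfindex action="update"
         collection="SnailsAndPuppyDogTails"
         type="path"
         key="c:\filestore\xmlfeeds"
         extensions=".xml"
         recurse="yes">
```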



Macromedia ColdFusion MX 7 Certified Developer Study Guide
ISBN: 0321330110
Year: 2004
Pages: 389
Authors: Ben Forta
