Managing and Populating Catalogs | Beginning Transact-SQL With SQL Server 2000 and 2005

Even though the task of managing full-text indexes belongs to the Microsoft Search Service, rather than SQL Server, all of the management work can be performed within the SQL Server management tools. These tasks are similar in SQL Server 2000 and SQL Server 2005. I'll start with the SQL Server 2000 Enterprise Manager and then move on to the SQL Server 2005 Management Studio. I'll briefly show you how to create and populate a full-text catalog. For detailed information on managing full-text indexes and catalogs, please refer to Professional SQL Server 2000 Programming and Professional SQL Server 2005 from Wrox Press.

SQL Server 2000

A few different methods are used to create a catalog and indexes. If you are using Enterprise Manager, you can use one of these methods to create a new full-text catalog:

Right-click a database and choose New Full-Text Catalog to open the New Full-Text Catalog dialog window (see Figure 11-2).

Figure 11-2:
From the Tools menu, choose Full-Text Indexing to open the Full-Text Index Wizard.
Right-click a table and, from the pop-up menu, select Full-Text Index Table Define Full-Text Indexing. If a catalog doesn't currently exist, you will be prompted to create one and then to define an index for the table.

This launches the Full-Text Indexing Wizard dialog shown in Figure 11-3.

image from book
Figure 11-3:

Enter a name for the new catalog and finish the wizard, accepting all other default settings, as shown in Figure 11-4.

image from book
Figure 11-4:

After a full-text catalog has been created, the console tree will have an icon added enabling you to manage full-text indexing for the database. Indexes can be created in a number of ways. One simple technique is to right-click a table icon in Enterprise Manager and then select Full-Text Index Table from the menu, choosing the Define Full-Text Indexing on a Table. . . option, as shown in Figure 11-5.

image from book
Figure 11-5:

SQL Server 2005

Probably the most significant enhancement to full-text indexing in SQL Server 2005 is that the database engine can update indexes as data changes. This can make full-text indexing behave more like standard indexing and greatly reduce data latency. Just keep in mind that this feature can have a significant impact on overall server performance. This may not be a wise option in a busy transactional database environment unless you have a very capable server.

Implementing catalogs and indexes in the SQL Server Management Studio is very similar to using Enterprise Manager. The Full-Text Indexing Wizard contains several pages, and I'm not going to show them all. Begin by defining a new catalog for the AdventureWorks2000 database. Under the database, expand the Storage node and right-click Full-Text Catalogs. From the menu, select New Full-Text Catalog. . ., as shown in Figure 11-6.

image from book
Figure 11-6:

The New Full-Text Catalog dialog is used to name, specify a file location, and set options for the new catalog file, as shown in Figure 11-7.

image from book
Figure 11-7:

After creating a catalog, indexes can be created for tables. To create a new index, right-click a table icon and choose Full-Text Index Define Full-Text Index. . . , as shown in Figure 11-8.

image from book
Figure 11-8:

When the Full-Text Indexing Wizard opens, navigate past the opening page. Because the wizard was launched from the Product table, it polls the table for a list of indexes. A unique index must exist in order to build a full-text index. Accept the default selection, which is the primary key for this table, as shown in Figure 11-9.

image from book
Figure 11-9:

Next, a list of columns is displayed. These are candidates for full-text indexing. Check any columns that you would like to have included in the full-text index, as demonstrated in Figure 11-10.

image from book
Figure 11-10:

If you would like to have SQL Server track and automatically update this full-text index as data is modified, leave the Change Tracking option set to Automatically, as shown in Figure 11-11.

image from book
Figure 11-11:

The page gives you the option to select an existing catalog or to create a new catalog. Because I created a catalog to store full-text indexes for this database, select the existing catalog, as shown in Figure 11-12.

image from book
Figure 11-12:

The next page allows you to manage full-text catalog population schedules. Creating a schedule invokes the SQL Server Agent service, which must be running for this option to function properly. Click Next to skip this page.

The following page displays summary information (see Figure 11-13). Using this dialog, you can review your selections and options. Because no settings have yet been applied, you can use the Next and Previous buttons to navigate to any page to make changes. Click the Finish button to apply your selections and build the next full-text index.

image from book
Figure 11-13:

The final wizard page, shown in Figure 11-14, displays the progress of each step, as it is applied by the wizard.

image from book
Figure 11-14:

Later in this chapter, I'm going to use some examples of full-text queries on the ProductReview table. Rather than using the wizard, create this index using the following script:

 CREATE FULLTEXT INDEX  ON ProductReview(Comments, ReviewerName)  KEY Index PK_ProductReview_ProductReviewID

An optional statement can be used to explicitly control whether SQL Server tracks changes and whether it automatically populates and updates the index. You may want to disable change tracking to conserve server resources or to give yourself more control over this process.

To explicitly populate the index using change tracking, add this line to the end of the prior script:

 WITH Change_Tracking Auto

To explicitly turn off change tracking, use this option:

 WITH Change_Tracking OFF

In case you want to manually populate a full-text index that has not been set up for automatic population, use this option of the sp_fulltext_table system stored procedure:

 sp_fulltext_table 'ProductReview', 'start_full'

This stored procedure can be used in place of the Create FullText Index expression used previously and includes several related maintenance options.

Full-Text Query Expressions

Full-text indexing in SQL Server extends the Transact-SQL feature set by adding four languages predicates:

CONTAINS
FREETEXT
CONTAINSTABLE
FREETEXTTABLE

You'll recall that a predicate is simply a functional statement that yields a Boolean result. Predicates always return a true or false value. Functionally, these are really only two predicate statements with each having an alternate implementation that returns a SQL Server table object — rather than a standard result set — from the query. A predicate is simply an extension to the SQL language, used in a WHERE clause, that provides a conduit from SQL Server to the Microsoft Search Service. As far as you are concerned, you are working with SQL and communicating to the database engine. The reality is that these statements take your request outside of SQL Server and make requests against the search service. The only real evidence of this is in the way you must pass string values.

Quotes in Quotes

This is an interesting idiosyncrasy of the full-text query syntax. As you know, when passing text string values to Transact-SQL, these values are encapsulated in single quotes. This is still the case when using full-text predicates, however, these string values are then passed from SQL to the search service, which requires that values are passed within double quotes. This means that when you need to pass values to a full-text query expression (if the value contains spaces) you must pass a double-quoted value within single quotes, like this:

'"My Value"'

This may seem a little strange but let me explain why this is necessary. Transact-SQL requires literal string values to be passed in single quotes. Before SQL Server reroutes the statement to the search service, it strips off the single quotes, passing values in the proper format for the search service, which requires literal values that include spaces to be enclosed in double quotes. When using logical operators within a full-text predicate call, you may need to pass multiple quoted values between operators, all of which are enclosed within single quotes for SQL to handle them, and each value in double quotes, as follows:

'"My Value" OR "Your Value"'

Examples of the entire call syntax follow shortly, but I want to make sure you're comfortable with this requirement to pass double-quoted values (required by the Microsoft Search Service) within single quotes (required by SQL Server).

The CONTAINS Predicate

The CONTAINS predicate lets you find and return rows where one or any combination of indexed column values contains a specified value, or optionally a form of a specified word. The features of this predicate are as follows:

Search criterion can apply to values in one or any number of specified column(s) contained in the full-text index.
Search criterion can apply to values in all columns contained in the full-text index.
The columns' text includes a word or string of characters located anywhere within the text. Matching text can include wildcards indicating that a word starts with, ends with, or contains a string of characters.
Match may be based on a form of a specified word. For example, the text may include a plural, singular, different gender form, or different tense of the word.

The full-text indexing engine includes a vast thesaurus of words in different forms and inflections. This supports multiple languages if different language packs have been installed. To be able to apply language rules to the text, the engine needs to know what language to use. The language parameter for all predicates will accept either the language alias (friendly name) or the LCID, an integer value used internally. Full-text indexing recognizes the languages listed in the following table.

Alias	LCID
Arabic	1025
Brazilian	1046
British English	2057
Bulgarian	1026
Croatian	1050
Czech	1029
Danish	1030
Dutch	1043
English	1033
Estonian	1061
Finnish	1035
French	1036
German	1031
Greek	1032
Hungarian	1038
Italian	1040
Japanese	1041
Korean	1042
Latvian	1062
Lithuanian	1063
Norwegian	2068
Polish	1045
Portuguese	2070
Romanian	1048
Russian	1049
Simplified Chinese	2052
Slovak	1051
Slovenian	1060
Spanish	3082
Swedish	1053
Thai	1054
Traditional Chinese	1028
Turkish	1055

If the language parameter is omitted, the language will be derived from the column, table, or database.

I'll start with a simple example. I'm interested in returning all Product records where any indexed column contains the value "Black." The first parameter to this function-like statement indicates the indexed columns I want to include in the search. The asterisk (*) represents all available columns. The second parameter is my search criteria:

 SELECT * FROM Product  WHERE CONTAINS(*, '"Black"')

The results are shown in Figure 11-15.

image from book
Figure 11-15:

As you can see, rows are returned where the word "Black" is contained in both the Name and Color columns. However, you may be wondering why the word "Black" was found in the middle of a field value when I didn't use any wildcard characters. Something to get used to when using full-text queries are the differences in behavior from this and the SQL LIKE operator. Full-text queries match whole words anywhere within a field without using wildcards. Wildcard matching is performed to match a substring, or part of a word. For example, I'll look for any rows that contain text beginning with the letters "crank":

 SELECT * FROM Product  WHERE CONTAINS(*, '"crank*"')

Note that the wildcard character isn't the percent symbol, %, as it is in Transact-SQL. It's the asterisk, *. The results are shown in Figure 11-16.

image from book
Figure 11-16:

You can also specify a list of columns you want to include in the search by specifying a comma-delimited column list within parentheses:

 SELECT * FROM Product  WHERE CONTAINS((ProductNumber, Name, Color), '"Black"')

The full-text indexing engine includes an internal thesaurus of words and their variations. This enables the CONTAINS predicate to match different forms of a word. This might include past-, future-, or present-tense, or different gender inflections. For example, performing a full-text search on the Product table for the word "tour" returns records containing the word "touring," as shown in Figure 11-17.

image from book
Figure 11-17:

 SELECT * FROM Product  WHERE CONTAINS(*, 'FORMSOF(Inflectional, "Tour")')

Weighting Values

You can affect the outcome of word matching, and relative ranking of rows, by designating relative weight values for different words. A weight value is a numeric value between 0.0 and 1.0, accurate to one decimal position. Because these values are actually passed as a text string along with the rest of the search criteria, SQL Server doesn't really see this as a numerical type. These values are used only for relative comparison, so it's not necessary to make them add up to anything in particular. A weighted-value word list is passed to the ISABOUT() function, within the CONTAINS predicate expression:

ISABOUT (<word> weight (.75), <word> weight (.25))

The result of this weighting will affect whether or not some rows are included in the result set but may not otherwise be apparent when using the CONTAINS predicate. This is apparent, however, in the value of the calculated Rank column returned by the CONTAINSTABLE and FREETEXTTABLE predicates.

Ranked Results

Internally, the CONTAINS predicate calculates a qualifying ranking value for each row, based on exact and approximate word matching, logical operators, and explicit weighting value factors. Because the CONTAINS and FREETEXT predicates are only used to qualify selected rows returned in the result set, these techniques can't expose the ranking of each row. The CONTAINSTABLE and FREETEXTTABLE predicates do create a new result set, returned as a SQL table object. A new column, called Rank, is added to the result with the relative ranking value of each row.

The CONTAINSTABLE Predicate

Functionally, this is the CONTAINS predicate, wrapped by functionality that returns a SQL table object. Two additional columns are added to the result. The Key column is just a duplicate of the full-text index key column, which was specified when the full-text index was created. The Rank column appears, as I mentioned previously.

 SELECT ProductID, Name, ProductNumber, Color, Rank FROM Product INNER JOIN CONTAINSTABLE(Product, * , 'ISABOUT (Black weight (.2), Blue weight (.8))') AS ConTbl ON Product.ProductID = ConTbl.[Key] ORDER BY Rank DESC

Take a look at another example. Full-text queries are ideal for searching large volumes of text. The first thing I'll do is create a full-text index on the ProductReview table. This table contains a Comments column used to hold verbose text. After populating the index, the following query can be executed. Note the weight values for the two words:

 SELECT Comments, Rank FROM ProductReview INNER JOIN CONTAINSTABLE(ProductReview, Comments , 'ISABOUT (terrible weight(.9), advertised weight(.1))') AS ConTbl  ON ProductReview.ProductReviewID = ConTbl.[Key] ORDER BY Rank DESC

When the query is executed, a rank value is calculated based on these words found in the Comments column and the relative weight values. Note the values in the Rank column shown in Figure 11-18.

image from book
Figure 11-18:

Now I'll change the weight values (reversing .9 and .1) and execute the query again:

 SELECT Comments, Rank FROM ProductReview INNER JOIN CONTAINSTABLE(ProductReview, Comments,  'ISABOUT (terrible weight(.1), advertised weight(.9))') AS ConTbl  ON ProductReview.ProductReviewID = ConTbl.[Key] ORDER BY Rank DESC

The FREETEXT Predicate

Can a computer really understand what you want rather than simply give you exactly what you asked it for? The FREETEXT predicate attempts to do just that — to understand the meaning of a phrase or sentence. It does this by breaking a phrase down into individual words and then using the full-text indexing thesaurus to match all forms of these words, applying language rules. It may choose to return text that only contains forms of some of these words. As each row is considered for selection, an algorithm calculates a relative ranking value, used to qualify each record against the matching phrase.

The FREETEXT predicate takes few parameters, and the only optional parameter is the language. As with the CONTAINS predicate, if omitted, the language will be derived from the database. The ranking is not exposed in the result, and the order of records is unaffected by the ranking.

 SELECT * FROM Product WHERE FREETEXT (*, 'Yellow road bike')

Not only are records returned where indexed columns contain the words "yellow," "road," and "bike," but those records that contain any one of these words or forms of these words are also returned, as shown in Figure 11-19.

image from book
Figure 11-19:

Logical Operators

Multiple words or text strings can be specified applying three different forms of logic, as explained in the following table.

Operator	Logic
AND	Criteria on both sides of the operator must match. If two values were provided with the AND operator, a single column value in each qualifying row must match both of the values.
OR	Criteria on either side of the operator must match. If two values were provided with the OR operator, a single column value in each qualifying row must match any provided value.
NEAR	Like the AND operator, both values must match text in a single column value for qualifying rows.

The FREETEXTTABLE Predicate

Like the CONTAINSTABLE predicate, FREETEXTTABLE is functionally the same as the FREETEXT predicate, but it returns a table with ranking values. Using the same technique as before, this table can be joined with the base table to return matching rows and the ranking values.

In this example, I've used a phrase that doesn't match any text exactly but several of the words may be found in the column text:

 SELECT Comments, Rank FROM ProductReview INNER JOIN  FREETEXTTABLE(ProductReview, Comments , 'mountain biking is new for me') AS FtTbl ON ProductReview.ProductReviewID = FtTbl.[Key]  ORDER BY Rank DESC

The result returns two rows with one row ranked significantly higher than the other, as shown in Figure 11-20.

Figure 11-20:

The goal of free-text matching is to loosen the matching rules and provide some level of flexibility. Inevitably, this will return some rows that are simply not all that similar to the search text. To make the FREETEXT or FREETEXTTABLE predicate behave in a more predictable manner, you can force it to match the text exactly as it is presented by encapsulating the entire search text in double quotes:

 SELECT Comments FROM ProductReview  WHERE FREETEXT (Comments, '"new to mountain biking"')

Only one row matches this text exactly, as shown in Figure 11-21.

image from book
Figure 11-21: