Full-Text Searches


SQL Server and other relational database management systems (RDBMS) have always had the capability to search data stored as text using basic search criteria, but have never really had the capability to search data using more sophisticated forms of search criteria.

Full-Text Search is a completely separate program that runs as a service, namely Microsoft Search Service or MSSearch. Full-Text Search can be used in conjunction with all sorts of information from all the various MS BackOffice products. The full-text catalogs and indexes are not stored in a SQL Server database. They are stored in separate files managed by the Microsoft Search service.

Full-Text Indexes

Full-text indexes are special indexes that efficiently track the words you're looking for in a table. They help in enabling special searching functions that differ from regular indexes. Full-text indexes are not automatically updated, and they reside in a storage space called the full-text catalog.

When a full-text index is created, you can perform wildcard searches (full-text search) that locate words in close proximity. All full-text indexes are by default placed in a single full-text catalog. Each SQL Server at its apex can store 256 full-text catalogs.

The full-text catalog files are not recovered during a SQL Server recovery. They also cannot be backed up and restored using the T-SQL BACKUP and RESTORE statements. The full-text catalogs must be resynchronized separately after a recovery or restore operation. The full-text catalog files are accessible only to the Microsoft Search service and the Windows NT or Windows 2000 system administrator.

To enable full-text searches, you can run the Full-Text Indexing Wizard, which enables you to manage and create full-text indexes. Note that full-text indexes may be created only on columns that contain only text. Full-text indexes are not automatically updated, thereby bringing up the need to automate the process of updating by setting a job or performing a manual administrative task.

Administering Full-Text Features

SQL Server enables you to easily manage full-text features with the Enterprise Manager and many stored procedures. These are summarized in Table 10.2.

Table 10.2. Full-Text Stored Procedures

Stored Procedure

Brief Description

sp_fulltext_database

Initializes full-text indexing or removes all full-text catalogs from the current database.

sp_fulltext_table

Marks or unmarks a table for full-text indexing.

sp_fulltext_column

Specifies whether a particular column of a table participates in full-text indexing.

sp_fulltext_catalog

Creates or drops a full-text catalog.

sp_fulltext_service

Changes Microsoft Search Service properties.

To find syntaxes for these stored procedures, look through the index in Books Online. If you noticed, each one of the stored procedures' syntaxes has an Action argument that determines what action the procedure performs .

Creating Full-Text Indexes

An easy way to add full-text indexes is to use the Full-Text Indexing Wizard. Step by Step 10.10 shows you how to add full-text index functionality on the pub_info table in the Pubs database. In Step by Step 10.10, you are going to create a full-text index and full-text catalog, and then populate that catalog.

STEP BY STEP

10.10 Using the Full-Text Wizard

  1. Initiate the Full-Text Wizard by selecting the Pubs database on the left pane and then selecting Full-Text Indexing from the Tools menu.

  2. The first screen describes the aims of the wizard. Having read them, click Next to enter the Database Selection screen. In this, you select the table you want to index. The table that you select can be among the tables found in the Pubs database. In this case, select the dbo.pub_info table and click Next.

  3. This screen advises you that there must already be a unique index associated with the table if you are going to continue. The wizard locates one for you, UPKC_pubinfo . This index is the unique Primary Key for the pub_info table.

  4. After clicking Next, you can choose the columns on which you want full-text indexing to be enabled. In this case, select the check box next to pr_info (because it is a column that contains text data) and click Next.

  5. Before using full-text searches, you must make a full-text catalog or choose from an existing one listed. You don't have any existing catalogs, so type Catalog Pubs in the New Catalog Name box as shown in Figure 10.10, and click Next to proceed.

    Figure 10.10. Creating a new catalog.

    graphics/10fig10.gif

  6. This screen enables you to choose the population schedule type. You can schedule repopulation for a whole catalog at once or a table at a time. Choose New Catalog Schedule to repopulate this catalog.

  7. The next screen enables you to specify options to populate the new catalog. You can choose the time when SQL Server repopulates the new catalog you have created: either when the Server agent starts or at a time specified. Type Populate Pubs as the name and leave all other settings with their defaults as shown in Figure 10.11. Click OK and then Next.

    Figure 10.11. Scheduling population of Catalog Pubs.

    graphics/10fig11.jpg

  8. The final screen shows you a summary of the decisions you made. Click Finish to end the wizard.

After completing Step by Step 10.10, you have to populate your full-text catalog. To do this, open SQL Server Enterprise Manager from the Start menu.

Expand Server Group , expand your server, expand databases, and then expand Pubs. Click Full-Text Catalogs, and then right-click Catalog Pubs. Select Start Full Population.

You can now search for words using the FREETEXT , CONTAINS , CONSTAINSTABLE , and FREETEXTTABLE statements, which unleash search power beyond that of the LIKE predicate. To search for the word Book in the pr_info column using FREETEXT , you would enter code as shown in Figure 10.12.

Figure 10.12. Making sure full-text indexing succeeded.

graphics/10fig12.gif

Executing Full-Text Searches

After full-text indexes have been created and a catalog is populated , you can proceed with actually performing a search. A full-text search is nothing more than a SELECT statement, in the WHERE clause, with the optional addition of four useful operators:

  • FREETEXT

  • FREETEXTTABLE

  • CONTAINS

  • CONTAINSTABLE

These are each discussed in detail in the following sections.

FREETEXT Searches

FREETEXT searches columns containing values that match the words given. It is somewhat similar to the LIKE keyword. A practical example is given in Step by Step 10.11. The syntax for FREETEXT is as follows :

 FREETEXT ( { column  * } , 'freetext_string' ) 

Where

  • Column is a full-text index-enabled column, to which the search is directed.

  • * specifies that all columns are enabled for full-text search and that the search is directed to these columns.

  • 'freetext_string' specifies the search criteria. This has to be data of type char , varchar , nchar , nvarchar , or any other character data. This argument is required.

STEP BY STEP

10.11 Executing Full-Text Searches Using FREETEXT

  1. Open the Query Analyzer from the Start menu in Programs, SQL Server 2000.

  2. To search for all records that contain the word publishing in the pr_info column, regardless of case, execute the following:

     USE Pubs GO SELECT pr_info FROM pub_info WHERE FREETEXT (pr_info, 'PuBLiShInG') 
FREETEXTTABLE Searches

FREETEXTTABLE searches columns based on a search condition and returns a resultset. Unlike the FREETEXT search statement, FREETEXTTABLE can be specified in the FROM clause of a SELECT statement. Additionally, if rows with highest occurrences of related words are to be chosen , the FREETEXTTABLE supplies an additional argument top_N_by_Rank . The syntax for FREETEXTTABLE is as follows.

 FREETEXTTABLE ( table , { column  * } , 'freetext_string' [ , top_n_by_rank ] ) 

Where

  • Table is the name of the table enabled for full-text searching.

  • Column is a full-text index-enabled column, to which the search is directed.

  • * specifies that all columns are enabled for full-text search and that the search is directed to these columns.

  • 'Freetext_string' specifies the search condition, a character string, phrase, or even sentence .

  • Top_n_by_rank specifies that only the n number of rows having the most occurrences of words from the search condition are to be returned.

  • Word values in searches may not be variables .

CONTAINS Searches

CONTAINS is a statement that is similar to both the LIKE and FREETEXT predicates. Its purpose is to search for strings, phrases, or even sentences that occur throughout records. It provides extreme flexibility for accurate searching, hence its favor over the LIKE and FREETEXT keywords. The full syntax of CONTAINS predicate is as follows:

 ( { column  * } , '< contains_search_condition >' ) 

Where

  • Column is a full-text index-enabled column, to which the search is directed.

  • * specifies that all columns are enabled for full-text search and that the search is directed to these columns.

  • '< contains_search_condition >' specifies how to filter the data, and is similar to the 'freetext_string' of the FREETEXT predicate. This argument can further be subdivided into smaller syntaxes, listed in the syntax that follows.

The preceding explanation is the general outline of the CONTAINS predicate. The Contains_Search_Condition argument has its own querying structure, as follows:

 < contains_search_condition > ::=         { < simple_term >          < prefix_term >          < generation_term >          < proximity_term >          < weighted_term >         }          { ( < contains_search_condition > )         { AND  AND NOT  OR } < contains_search_condition > [ ...n ]         } 

And each one of these sub-arguments can be broken down into their most basic form:

 < simple_term > ::=      word  " phrase " < prefix term > ::=      { "word * "  "phrase * " } < generation_term > ::=      FORMSOF ( INFLECTIONAL , < simple_term > [ ,...n ] ) < proximity_term > ::=     { < simple_term >  < prefix_term > }     { { NEAR  ~ } { < simple_term >  < prefix_term > } } [ ...n ] < weighted_term > ::=     ISABOUT         ( { {                 < simple_term >                  < prefix_term >                  < generation_term >                  < proximity_term >                 }             [ WEIGHT ( weight_value ) ]             } [ ,...n ]         ) 

Where

  • Simple_term searches for an exact match for a word or phrase. If a word is given with the exclusion of spaces (database, for example), it can be written without quotation marks. A word specified with the inclusion of spaces (as in "SQL Server Database") needs to be written with the inclusion of double quotation marks (" ").

  • Prefix term is used to specify a root worda prefixthat returns all words beginning with it. To signify this, use an asterisk (*), which acts as zero, one, or more characters . If a phrase is specified, it considers each word a root declaration. Therefore, executing the following query lists titles containing "computer", "complete" and "comparison", and so on.

     SELECT * FROM titles      WHERE CONTAINS (title, '"comp*"') 
  • Generation_term is used when forms of simple_term may vary. INFLECTIONAL is used alongside generation_term when inflectional forms of words are to be returned. To distinguish that you are specifying an argument for generation_form , use the FORMSOF keyword.

    NOTE

    Use the Correct Quotes When Searching If a phrase or word that includes an asterisk is used as a search condition without being bound by two double quotes (" "), SQL Server assumes that the asterisk acts as a character itself rather than as a wildcard. For instance, SELECT * FROM titles WHERE CONTAINS (title, 'comp*') would return any word having a prefix comp* .


  • Proximity_term specifies that words in proximity to each other should be returned. You accomplish this with the help of the NEAR keyword, using NEAR as an operator similar to AND . For example, the following query searches and returns results only when the words IBM and Aptiva are close to one another.

     SELECT * FROM computer_names  WHERE CONTAINS (computer_names, 'IBM NEAR Aptiva') 
  • 'Weighted_term' specifies different words or phrases that are to be matched to return a resultset. Each of these can be referred by a weighted value that gives a sort of "rank" to each word.

     WEIGHT ( weight_value ) 

    This optionally gives each term mentioned in the weighted_term argument a decimal value between 0 and 1 inclusive to show the importance of each word. To distinguish WEIGHT from other arguments, use the ISABOUT keyword.

    Therefore, to return all rows in the ProductName column (in the Products table of the Northwind database) that have Hot (first priority), Sauce (second priority), and Cranberry (third priority), execute the following:

    [View full width]
     
    [View full width]
    SELECT * FROM Products WHERE CONTAINS (ProductName, 'ISABOUT (Hot WEIGHT (0.9), Sauce WEIGHT (0.5), CRANBERRY graphics/ccc.gif WEIGHT (0.2))'

Step By Step 10.12 illustrates the ability to search through text with the CONTAINS option.

STEP BY STEP

10.12 Searching Using CONTAINS

  1. Open the Query Analyzer from the Start menu.

  2. To search for different forms of the word Publish in the pr_info column, execute the following:

     USE Pubs GO SELECT pr_info FROM pub_info WHERE CONTAINS(pr_info , ' FORMSOF (INFLECTIONAL, publish) ') 
CONTAINSTABLE Searches

CONTAINSTABLE works much the same as FREETEXTTABLE works and has similar arguments to that of the CONTAINS statement. The syntax for CONTAINSTABLE is as follows:

 CONTAINSTABLE ( table, { column  * }, ' < contains_search_condition > '     [ , top_n_by_rank ] ) 

Where

  • 'table' is the name of the table that has been set for full-text searching.

  • Column is a full-text index-enabled column, to which the search is directed.

  • * Specifies that all columns are enabled for full-text search and that the search is directed to these columns.

  • ' < contains_search_condition > ' is what you want to search for. The ' < contains_search_condition > ' argument regulates the same rules as the ' < contains_search_condition > ' argument in the CONTAINS statement.

  • top_n_by_rank specifies that the n number of matches, in descending order, are to be returned.

Internally, SQL Server sends the search condition to the Microsoft Search service. The Microsoft Search service finds all the keys that match the full-text search condition and returns them to SQL Server. SQL Server, and then uses the list of keys to determine which table rows are to be processed .



MCSE Training Guide (70-229). Designing and Implementing Databases with MicrosoftR SQL ServerT 2000 Enterprise Edition
MCSE Training Guide (70-229). Designing and Implementing Databases with MicrosoftR SQL ServerT 2000 Enterprise Edition
ISBN: N/A
EAN: N/A
Year: 2003
Pages: 228

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net