Lesson 2: Full-Text Indexing and Searching

[Previous] [Next]

SQL Server 7 allows users to issue full-text queries against plain character-based data in SQL Server tables, including searching for words and phrases and multiple forms of a word or phrase. This lesson describes how to use the full-text search capabilities of SQL Server.

After this lesson, you will be able to

  • Describe the Microsoft Search service
  • Set up full-text indexing on tables in a database
  • Populate a full-text catalog
  • Perform basic full-text searches

Estimated lesson time: 75 minutes

Introduction to Full-Text Searching

In earlier versions of SQL Server, the ability to query text data was very limited. Beginning with SQL Server 7, you can query character-based data in tables by using full-text searches. The component of SQL Server 7 that performs full-text querying is the Microsoft Search service.

Microsoft Search Service

The Microsoft Search service is a full-text indexing and search engine. The same engine is used with Internet Information Server, where it is called Index Server. The service can be installed only when SQL Server is installed on Windows NT Server. (It cannot be installed on Windows NT Workstation, Windows 95, or Windows 98.)

Installation

You can install Full-Text Search during the initial install of SQL Server if you perform a Custom Install. If you did not install it during your SQL Server installation, you can install Full-Text Search by running SQL Server Setup. Full-Text Search is not installed when the default installation is chosen. In the Select Components dialog box, clear all components and then check Full-Text Search. If you do not clear all components (including Server Components) before you check Full-Text Search, other components may be installed.

Indexing

The Microsoft Search service operates separately from SQL Server. It doesn't store information in SQL Server databases, tables, or indexes. It simply communicates with SQL Server to perform indexing and searching.

A full-text index and a standard SQL Server index are very different. Full-text indexes are stored in files in a folder structure in the file system, called a catalog. By default, these folders are created under C:\Mssql7\Ftdata. A single catalog can be used for one or more indexes in the same database, but databases cannot share catalogs. You can create many catalogs for a single database.

The Microsoft Search service accepts a request to populate the full-text index of a given table into a full-text catalog after the table is set up for full-text indexing. It processes some or all of the rows in a table and extracts key words from the columns specified for full-text searching. These words are stored in an index in the catalog.

Querying

The Microsoft Search service processes full-text search queries received from the MSSQLServer service. It provides advanced search capabilities such as proximity and linguistic searches in multiple languages.

Full-Text Search with SQL Server 7

Earlier versions of SQL Server support only basic character searches against columns in a database:

  • Searches for a character value equal to, less than, or greater than a character constant
  • Searches for a character value containing a specific string or a wildcard string

In SQL Server 7, the Microsoft Search service enables SQL Server to support complex searches on character or text data in a database or in files outside any database.

Storing Text Data

You can store text data inside or outside the database. The bulk of an organization's text data is usually stored in files of various types, external to any database. You can store from 4 KB to 2 GB of text in a single SQL Server database field with the char, varchar, text, ntext, nchar, or nvarchar data type. This text data, as well as the text data in word processor documents, spreadsheets, and other documents, can all be indexed and searched with the Microsoft Search service.

Querying Text Data

The principal design requirement for full-text indexing and querying is the presence of a single-column unique index on all tables that are registered for full-text search. A full-text index associates words with the unique index key value of each row in SQL Server tables.

When SQL Server receives a query that requires a full-text search, it passes the search criteria to Microsoft Search, which processes the search and returns the key value and a ranking value for each row that contains matches. The MSSQLServer service uses this information to construct the query result set.

Eliminating Noise Words

To prevent the full-text index from becoming bloated with words that do not help the search, extraneous words (known as noise words) such as "a," "and," "is," and "the" are ignored. For example, specifying the phrase "the products ordered during these summer months" is the same as specifying the phrase "products ordered during summer months."

Noise-word lists for many languages are provided and are available in the C:\Mssql7\Ftdata\Sqlserver\Config folder, which is created when you install Full-Text Search on your server.

Administrators can use any text editor to modify the contents of these lists. For example, system administrators at high-tech companies might add the word "computer" to their noise-word list. Modifications to the list have an effect only when the full-text indexes are next populated.

Introduction to Creating Full-Text Indexes

Full-text support for SQL Server 7 data involves two features: the ability to issue queries against character data, and the creation and maintenance of the underlying indexes that facilitate these queries.

When you work with full-text indexes, you must remember the following:

  • Full-text indexes are stored in the file system but are administered through the database.
  • There is only one full-text index per table.
  • The addition of data to full-text indexes, known as population, must be requested through either a schedule or a specific request.
  • One or more full-text indexes within the same database are gathered together into a full-text catalog.
  • It can be beneficial to create separate catalogs for large tables in a database. When you populate full-text indexes, a complete catalog is populated, so creating separate catalogs reduces the amount of data that must be populated at one time.

Maintaining Full-Text Indexes

Before you can implement a full-text search in a given database, you must ensure that the full-text search indexes are populated regularly.

Populating Full-Text Indexes

You can update a full-text index by using one of the following methods:

Full Population

A full population refreshes the full-text catalog indexes for all rows in a table regardless of whether the index entries have changed since the last population. This is typically used when a catalog is first populated.

Incremental Population

An incremental population refreshes the full-text catalog indexes for rows that have changed since the last population. An incremental population automatically results in a full population in the following cases:

  • A table without a timestamp column is enabled for full-text indexing
  • New columns have been enabled for full-text processing since the last population
  • The table schema have been modified in some way since the last population

Updating Full-Text Indexes

Unlike standard relational database indexes, full-text indexes are not modified instantly when values in full-text-enabled columns are updated, when rows are added to full-text-registered tables, or when rows are deleted from full-text-enabled tables. You must start the population process manually or schedule it to occur at regular intervals. Full-text indexes are populated asynchronously because

  • It typically requires significantly more time to update a full-text index than it does to update a standard index.
  • Full-text searches are usually less precise than standard searches, so the need for a dynamically synchronized index is not as great.

Deactivating Full-Text Indexes

You can deactivate the full-text index for a table so that it no longer participates in the population of the full-text catalog. The full-text index metadata remains, and you can reactivate the table.

After you deactivate a full-text index for a particular table, the existing full-text index remains in place until the next full population. This index is not used because SQL Server blocks queries on deactivated tables.

If you reactivate the table and do not repopulate the index, the old index is still available for queries against any remaining (but not new) full-text-enabled columns. Data from deleted columns is matched in queries that specify a search of all full-text columns (*).

You must use the sp_fulltext_table system stored procedure to deactivate the full-text index for a table.

Setting Up Full-Text Search

Before you can issue full-text queries, you must make sure that the Microsoft Search service is running, and you must create full-text indexes on the tables that will be queried.

Starting Microsoft Search Service

You can start and stop the Microsoft Search service in one of the following ways:

  • Use the context menu of the Full-Text Search object in SQL Server Enterprise Manager.
  • Use SQL Server Service Manager and select Microsoft Search.
  • Execute net start mssearch (or net stop mssearch) from a command prompt.

Creating Full-Text Indexes

You can set up and administer full-text features in SQL Server by using the Full-Text Indexing Wizard and the context menus in SQL Server Enterprise Manager, or by using system stored procedures.

IMPORTANT
If you have a database selected in the console tree when you run the wizard, you will be able to work only with that database in the wizard. If you want to be able to select any database while in the wizard, you must run the wizard with your server selected in the console tree.

The following table lists the system stored procedures that are used to set up full-text indexing.

Stored procedure Function
sp_fulltext_database Initializes full-text indexing or removes all full-text catalogs from the current database.
sp_fulltext_catalog Creates or drops a full-text catalog and starts or stops the indexing action for a catalog.

You can create multiple full-text catalogs for each database.

sp_fulltext_table Marks or unmarks a table for full-text indexing.
sp_fulltext_column Specifies whether a particular column of a table participates in full-text indexing.
sp_fulltext_service Changes Microsoft Search service (full-text search) properties and cleans up a full-text catalog on a server.

Getting Information About Full-Text Search

You can get information about full-text search indexes and catalogs by using full-text search system stored procedures and SQL Server Enterprise Manager.

Using System Stored Procedures to Get Information About Indexes

You can use the system stored procedures in the following table to obtain information about full-text indexes.

Stored procedure Function
sp_help_fulltext_catalogs Returns the ID, name, root directory, status, and number of full-text indexed tables for a specified full-text catalog
sp_help_fulltext_tables Returns a list of tables that are enabled for full-text indexing
sp_help_fulltext_columns Returns a list of columns that are enabled for full-text indexing

Using SQL Server Enterprise Manager to Get Information About Catalogs

You can get information about full-text search catalogs in a database by clicking the Full-Text Catalogs object in any database in the console tree and then double-clicking on a catalog in the details pane. The Full-Text Catalog Properties dialog box indicates the following information about the catalog:

  • Name, location, and physical name
  • Status—the current population status of the catalog
  • Item count—the total number of full-text indexed items in the catalog
  • Catalog size, in MB
  • Unique word count—the total number of unique words in the catalog (this excludes noise words, as they are removed before the index is created)
  • Last population date—the date and time that the catalog was last populated

Exercise: Adding Full-Text Search to a Table

In this exercise, you will add full-text search capability to the Employees table in the StudyNwind database, using the Full-Text Indexing Wizard.

  • To set up full-text search using the Full-Text Indexing Wizard

  1. Open SQL Server Enterprise Manager.
  2. In the console tree, expand Databases, expand StudyNwind, and then click Tables.
  3. In the details pane, right-click the Employees table, point to Full-Text Index Table, and then click Define Full-Text Indexing On A Table.
  4. Use the information in the following table to complete the wizard. Accept the defaults for any options that are not specified.
  5. Option Value
    Select a unique index PK_Employees
    Added columns Notes
    Create a new catalog? Checked
    New catalog _ Name Emp_catalog
    Select or Create Population Schedules (Optional) No

    The final dialog box of the wizard confirms that the full-text index for the Employees table has been defined but not populated. You will populate the full-text index in the next exercise.

  6. Open SQL Server Query Analyzer, and log on to the (local) server with Microsoft Windows NT authentication.
  7. Execute the following system stored procedure and confirm that full-text indexing is enabled for the Employees table:
  8.  USE StudyNwind EXEC sp_help_fulltext_tables 

    You will need to scroll the output in the results pane to the right in order to see all of the information about full-text indexing for the Employees table.

Exercise: Populating a Full-Text Index

In this exercise, you will view catalog information and populate the full-text index for the Emp_Catalog catalog, enabling it for full-text searching.

  • To create a full-text index

  1. Switch to SQL Server Enterprise Manager.
  2. In the console tree, click Full-Text Catalogs below the StudyNwind database.
  3. In the details pane, right-click Emp_Catalog, and then click Properties. View the information about the catalog. Note that the population status is currently idle and that the item count is currently 0. Click Cancel to close the Properties dialog box.
  4. Right-click Emp_Catalog, point to Start Population, and then click Full Population to populate the full-text index.
  5. Right-click Emp_Catalog, and then click Properties. View the information about the catalog. If you open the Properties dialog box quickly enough, the population status may show that a full population is in progress. Click Refresh until the status is idle. The item count is now 10, and there are 178 unique words in the catalog (your values may vary). Click Cancel to close the Properties dialog box.

Writing Full-Text Queries

With a full-text query, you can perform advanced searches of text data in tables enabled for full-text searches. Unlike the LIKE operator, which is used to search for character patterns, full-text searches operate on combinations of words and phrases. Full-text searches also weigh query terms and report how well a match scored or ranked against the original search term.

Using Transact-SQL Predicates and Functions

You can use the following Transact-SQL predicates and row-set value functions to write full-text queries:

  • Use the CONTAINS and FREETEXT predicates in any search condition (including a WHERE clause) of a SELECT statement.
  • Use the CONTAINSTABLE and FREETEXTTABLE functions in the FROM clause of a SELECT statement.

Although the Transact-SQL statement used to specify the full-text search condition is the same for both the predicates and the functions, there are major differences in the way they are used. When you work with these Transact-SQL components, consider the following facts and guidelines:

  • CONTAINS and FREETEXT both return a TRUE or FALSE value, so they are typically specified in the WHERE clause of a SELECT statement.
  • CONTAINS and FREETEXT can be used only to specify selection criteria, which SQL Server uses to determine the membership of the result set.
  • CONTAINSTABLE and FREETEXTTABLE both return a table of zero, one, or more rows based on the selection criteria, so they must always be specified in the FROM clause.
  • The table returned by CONTAINSTABLE and FREETEXTTABLE has a column named KEY that contains full-text key values and a column named RANK that contains values between 0 and 1000.
  • The values in the KEY column are the unique key values of the rows that match the selection criteria specified in the full-text search condition.

    The values in the RANK column are used to rank the rows returned according to how well they meet the selection criteria.

  • CONTAINS and CONTAINSTABLE are used to search for precise or "fuzzy" (less precise) matches to single words and phrases, words within a certain distance of one another, or weighted matches.
  • FREETEXT and FREETEXTTABLE match the meaning but not the exact wording of the text in the specified free-text string.

CONTAINS({column | *}, '<contains_search_condition>' ) FREETEXT({column | * }, 'freetext_string') CONTAINSTABLE(table, {column | *}, '<contains_search_condition>') FREETEXTTABLE (table, {column | *}, 'freetext_string') 

Replace column with the name of the full-text indexed column to search, or specify * to indicate that all full-text indexed columns should be searched. Replace table with the name of the table to be searched.

The following query returns the plant_id, common_name, and price for all rows in which the phrase "English Thyme" is present in any of the full-text-enabled columns.

 SELECT plant_id, common_name, price FROM plants WHERE CONTAINS( *, ' "English Thyme" ' ) 

The following query returns rows in which the full-text indexed description column contains text such as "Jean LeDuc has always loved ice hockey" or "Jean Leduc on Ice—Hockey at Its Best."

 SELECT article_id FROM hockey_articles WHERE CONTAINS (description, ' "Jean LeDuc"  AND "ice hockey" ' ) 

The following example uses a FREETEXT predicate against a column named description.

 SELECT * FROM news_table WHERE FREETEXT (description, ' "The Fulton County Grand Jury said Friday  an investigation of Atlanta's recent primary election produced no  evidence that any irregularities took place." ') 

Here the rows containing text in the description column that matches words, phrases, and meaning within the specified free text will be returned.

Exercise: Writing and Executing Full-Text Queries

In this exercise, you will write and execute full-text queries with SELECT statements that use the CONTAINS and FREETEXT predicates.

  • To write and execute full-text queries

  1. Open or switch to SQL Server Query Analyzer.
  2. Type and execute three SELECT statements that select the lastname, title, hiredate, and notes columns from the employees table. Use three different search terms on the employees.notes column, as given in the following SELECT statements:
  3.  USE StudyNwind SELECT lastname, title, hiredate, notes FROM employees WHERE CONTAINS(notes, '"sales management"') USE StudyNwind SELECT lastname, title, hiredate, notes FROM employees WHERE CONTAINS(notes, '"sales" AND "management"') USE StudyNwind SELECT lastname, title, hiredate, notes FROM employees WHERE CONTAINS(notes, '"sales" NEAR "management"') 

  4. Write a SELECT statement with the CONTAINS predicate that selects the lastname, title, hiredate, and notes columns from the employees table when the employees.notes column contains any form of the word graduate.
  5.  USE StudyNwind SELECT lastname, title, hiredate, notes FROM employees WHERE CONTAINS(notes, 'FORMSOF(INFLECTIONAL, "graduate")') 

  6. Write a SELECT statement with the FREETEXT predicate that selects the lastname, title, hiredate, and notes columns from the employees table when the employees.notes column contains the words cold and toast.
  7.  USE StudyNwind SELECT lastname, title, hiredate, notes FROM Employees WHERE FREETEXT (notes, 'cold toast') 

Lesson Summary

Earlier versions of SQL Server support only basic character searches against columns in a database. SQL Server 7, however, supports full-text queries.

Full-text queries can perform advanced searches of text data in tables enabled for full-text searches. With a full-text query, you can search on combinations of words and phrases. Full-text searches also weigh query terms and report how well a match scored or ranked against the original search term.



Microsoft Press - Microsoft SQL Server 7. 0 System Administration Training Kit
Microsoft SQL Server 7.0 System Administration Training Kit
ISBN: 1572318279
EAN: 2147483647
Year: 1999
Pages: 100

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net