Full-Text Searches


You hook into the Microsoft Search Server by using four special SQL commands: CONTAINS , CONTAINSTABLE , FREETEXT , and FREETEXTTABLE .

To learn how to use these functions, you need to go back to the r sum example you looked at earlier in this chapter. Suppose you wanted to find out how many r sum s were in your table that had the word "SQL" somewhere.

CONTAINS()

If you didn't have full-text search working, you would write a query like this:

 select count(*)  from resume where res_text like '%SQL%' 

If your table has many r sum s in it, this is going to take a while to complete because the entire table (including all text pages) will be scanned. To perform the same search with Search Server, use CONTAINS() :

 select count(*)  from resume where contains(*, 'SQL') 

The first argument to CONTAINS() is the name of the full-text indexed column to search. Using the wildcard searches all columns in all tables in the query that are enabled for full-text indexing.

Here is the full syntax for CONTAINS() :

 CONTAINS      ( {  column  * } , '< contains_search_condition >'     ) < contains_search_condition > ::=         { < simple_term >          < prefix_term >          < generation_term >          < proximity_term >          < weighted_term >         }          { ( < contains_search_condition > )         { AND  AND NOT  OR } < contains_search_condition > [ ...  n  ]         } < simple_term > ::=  word  "  phrase  " < prefix term > ::=     { "  word  * "  "  phrase  * " } < generation_term > ::=     FORMSOF ( INFLECTIONAL  THESAURUS , < simple_term > [ ,...  n  ] ) < proximity_term > ::=     { < simple_term >  < prefix_term > }     { { NEAR  ~ } { < simple_term >  < prefix_term > } } [ ...  n  ] < weighted_term > ::=     ISABOUT         ( { {                 < simple_term >                  < prefix_term >                  < generation_term >                  < proximity_term >                 }             [ WEIGHT (  weight_value  ) ]             } [ ,...  n  ]         ) 

CONTAINS() works a lot like a Web search engine, so if you're comfortable with Boolean expressions in Web search engines, you already have most of what you need to get CONTAINS() running. Just as with search engines on the Internet, you can use quotes around exact phrases. For example, to find someone with SQL Server experience, but exclude people with Oracle experience, you could use this query against the r sum table:

 select *  from resume where contains(res_text, '"SQL Server" and not Oracle') 

You can use FORMSOF() with the inflectional argument to match different word forms of the terms in your query. This will match singular and plural word forms and different verb tenses. In this example, I would get anyone who included the word "query" or "queries" in their r sum text:

 select *  from resume where contains(*, 'formsof(inflectional, "query")') 

The Thesaurus argument in formsof() is largely undocumented. As of Service Pack 2, the Thesaurus option was implemented, but disabled. Microsoft did not supply the actual Thesaurus data. The list of words used by the Thesaurus, like the noise words, is stored in c:\Program Files\Microsoft SQL Server\FTDATA\SQLServer\Config. The Thesaurus is stored in XML format. Without real data, Thesaurus works just like a simple expression in CONTAINS .

FREETEXT()

FREETEXT() is a newer and looser ("fuzzy") version of CONTAINS() . Use FREETEXT() when you want to match rows that are near the meaning of the phrases you suggest, but do not necessarily contain specific words or word forms in your expression. FREETEXT() can provide you with more utility, matching plurals, and other verb forms of a root word.

You can pass sentences or even entire paragraphs to FREETEXT() for it to find similar words and phrases.

This following example would match anything containing the word SQL, server, admin, basic, or visual. FREETEXT() does not use phrase matching or Boolean expression evaluation. The query shown next would be broken down into a query that looked for each of the non-noise words appearing in the sentence , and their varying inflectional word forms. It would identify nouns and multiword noun phrases and search for these, as well.

 select *  from   resume where  freetext(*, 'Our company requires someone with experience in SQL Server, Visual Basic, and a desire to be part of a strong team.') 

CONTAINSTABLE() and FREETEXTTABLE()

These special forms of CONTAINS and FREETEXT are used in a SQL query as part of the FROM expression. They operate like their basic functions, but they return a table object that lists two columns: KEY and RANK .

The KEY column can be joined to the queries table to retrieve the actual text values; these functions do not return text. The RANK column returns a numerical value describing how closely the row matched the search expression, from 0 to 1,000.

NOTE

Because KEY is an ANSI SQL reserved word, you must enclose it in brackets or double quotes when you reference it as the column name returned by CONTAINSTABLE() or FREETEXTTABLE() .

The next example searches for r sum s in the r sum table that have SQL Server, Oracle, or Sybase experience. A weighting value is assigned to each of these expressions, with SQL Server experience worth 5 times as much as Sybase, and 10 times as much as Oracle. A r sum that has more than one of these will have a higher rank. The query will then order the results by rank, with the most impressive r sum at the top:

 select RANK, res_text  from containstable(         resume, *,        'isabout(              "SQL Server" weight (1.0),                Oracle weight (0.1),                Sybase weight (0.2)                )'         ) a        join resume b               on a.[KEY] = b.resume_id order by RANK desc RANK        res_text ----------- --------------------------------------------------------- 120         SQL Server Sybase Oracle 113         SQL Server Sybase 106         SQL Server Oracle 99          SQL Server, ASP, Perl, leadership, grace under pressure 99          SQL Server 18          Sybase Oracle 12          Sybase 6           Oracle 

Limiting Maximum Hits

When using the containstable() and freetexttable() functions, you can optionally specify the maximum number of rows to return from a query. Just as with any large resultset, it can take a long time to get back all the data for a general query.

Here is the previous example, limited to the first three entries:

 select RANK, res_text  from containstable(         resume, res_text,        'isabout(              "SQL Server" weight (1.0),                Oracle weight (0.1),                Sybase weight (0.2)                )',         3         ) a        join resume b                on a.[KEY] = b.resume_id order by RANK desc RANK        res_text ----------- --------------------------------------------------------- 120         SQL Server Sybase Oracle 113         SQL Server Sybase 106         SQL Server Oracle 

You could get the same answer by specifying SELECT TOP 3 in the SQL statement, but this would be much less efficient. Using the optional fourth parameter will instruct the Search service to return immediately upon finding the maximum number of rows specified.



Microsoft SQL Server 2000 Unleashed
Microsoft SQL Server 2000 Unleashed (2nd Edition)
ISBN: 0672324679
EAN: 2147483647
Year: 2002
Pages: 503

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net