Using Full-Text Searching


In order to perform full-text searches, the columns to be searched must be indexed and constantly re-indexed as data changes. MySQL handles all indexing and re-indexing automatically after table columns have been appropriately designated.

After indexing, SELECT can be used with Match() and Against() to actually perform the searches.

Enabling Full-Text Searching Support

Generally, full-text searching is enabled when a table is created. The CREATE TABLE statement (which will be introduced in Chapter 21) accepts a FULLTEXT clause, which is a comma-delimited list of the columns to be indexed.

The following CREATE statement demonstrates the use of the FULLTEXT clause:

Input

CREATE TABLE productnotes (   note_id    int           NOT NULL AUTO_INCREMENT,   prod_id    char(10)      NOT NULL,   note_date datetime       NOT NULL,   note_text  text          NULL ,   PRIMARY KEY(note_id),   FULLTEXT(note_text) ) ENGINE=MyISAM;

Analysis

We'll look at the CREATE TABLE statement in detail in Chapter 21. For now, just note that this CREATE TABLE statement defines table productnotes and lists the columns that it is to contain. One of those columns is named note_text, and it is indexed by MySQL for full-text searching as instructed by the clause FULLTEXT(note_text). Here FULLTEXT indexes a single column, but multiple columns may be specified if needed.

Once defined, MySQL automatically maintains the index. When rows are added, updated, or deleted, the index is automatically updated accordingly.

FULLTEXT may be specified at table creation time, or later on (in which case all existing data would have to be immediately indexed).

Tip

Don't Use FULLTEXT When Importing Data Updating indexes takes timenot a lot of time, but time nonetheless. If you are importing data into a new table, you should not enable FULLTEXT indexing at that time. Rather, first import all of the data, and then modify the table to define FULLTEXT. This makes for a much faster data import (and the total time needed to index all data will be less than the sum of the time needed to index each row individually).


Performing Full-Text Searches

After indexing, full-text searches are performed using two functions: Match() to specify the columns to be searched and Against() to specify the search expression to be used.

Here is a basic example:

Input

SELECT note_text FROM productnotes WHERE Match(note_text) Against('rabbit');

Output

+-------------------------------------------------------------------+ | note_text                                                         | +-------------------------------------------------------------------| | Customer complaint: rabbit has been able to detect trap, food     | | apparently less effective now.                                    | | Quantity varies, sold by the sack load. All guaranteed to be      | | bright and orange, and suitable for use as rabbit bait.           | +-------------------------------------------------------------------+

Analysis

The SELECT statement retrieves a single row, note_text. For the WHERE clause, a full-text search is performed. Match(note_text) instructs MySQL to perform the search against that named column, and Against('rabbit') specifies the word rabbit as the search text. As two rows contained the word rabbit, those two rows were returned.

Note

Use Full Match() Specification The value passed to Match() must be the same as the one used in the FULLTEXT() definition. If multiple columns are specified, all of them must be listed (and in the correct order).


Note

Searches Are Not Case Sensitive Full-text searches are not case sensitive, unless BINARY mode (not covered in this chapter) is used.


The truth is that the search just performed could just as easily have used a LIKE clause, as seen here:

Input

SELECT note_text FROM productnotes WHERE note_text LIKE '%rabbit%';

Output

+-------------------------------------------------------------------+ | note_text                                                         | +-------------------------------------------------------------------| | Quantity varies, sold by the sack load. All guaranteed to be      | | bright and orange, and suitable for use as rabbit bait.           | | Customer complaint: rabbit has been able to detect trap, food     | | apparently less effective now.                                    | +-------------------------------------------------------------------+

Analysis

This SELECT retrieves the same two rows, but the order is different (although that may not always be the case).

Neither of the two SELECT statements contained an ORDER BY clause. The latter (using LIKE) returns data in no particularly useful order. But the former (using full-text searching) returns data ordered by how well the text matched. Both rows contained the word rabbit, but the row that contained the word rabbit as the third word ranked higher than the row that contained it as the twentieth word. This is important. An important part of full-text searching is the ranking of results. Rows with a higher rank are returned first (as there is a higher degree of likelihood that those are the ones you really wanted).

To demonstrate how ranking works, look at this example:

Input

SELECT note_text,        Match(note_text) Against('rabbit') AS rank FROM productnotes;

Output

+-------------------------------------------------+-----------------+ | note_text                                       | rank            | +-------------------------------------------------+-----------------+ | Customer complaint: Sticks not individually     |               0 | | wrapped, too easy to mistakenly detonate all    |                 | | at once. Recommend individual wrapping.         |                 | | Can shipped full, refills not available. Need   |               0 | | to order new can if refill needed.              |                 | | Safe is combination locked, combination not     |               0 | | provided with safe. This is rarely a problem    |                 | | as safes are typically blown up or dropped by   |                 | | customers.                                      |                 | | Quantity varies, sold by the sack load. All     | 1.5905543170914 | | guaranteed to be bright and orange, and         |                 | | suitable for as rabbit bait.                    |                 | | Included fuses are short and have been known to |               0 | | detonate too quickly for some customers. Longer |                 | | fuses are available (item FU1) and should be    |                 | | recommended.                                    |                 | | Matches not included, recommend purchase of     |               0 | | matches or detonator (item DTNTR).              |                 | | Please note that no returns will be accepted if |               0 | | safe opened using explosives.                   |                 | | Multiple customer returns, anvils failing to    |               0 | | drop fast enough or falling backwards on        |                 | | purchaser. Recommend that customer considers    |                 | | using heavier anvils.                           |                 | | Item is extremely heavy. Designed for dropping, |               0 | | not recommended for use with slings, ropes,     |                 | | pulleys, or tightropes.                         |                 | | Customer complaint: rabbit has been able to     | 1.6408053837485 | | detect trap, food apparently less effective     |                 | | now.                                            |                 | | Shipped unassembled, requires common tools      |               0 | | (including oversized hammer).                   |                 | | Customer complaint: Circular hole in safe floor |               0 | | can apparently be easily cut with handsaw.      |                 | | Customer complaint: Not heavy enough to         |               0 | | generate flying stars around head of victim.    |                 | | If being purchased for dropping, recommend      |                 | | ANV02 or ANV03 instead.                         |                 | | Call from individual trapped in safe plummeting |               0 | | to the ground, suggests an escape hatch be      |                 | | added. Comment forwarded to vendor.             |                 | +-------------------------------------------------+-----------------+

Analysis

Here Match() and Against() are used in the SELECT instead of the WHERE clause. This causes all rows to be returned (as there is no WHERE clause). Match() and Against() are used to create a calculated column (with the alias rank) which contains the ranking value calculated by the full-text search. The ranking is calculated by MySQL based on the number of words in the row, the number of unique words, the total number of words in the entire index, and the number of rows that contain the word. As you can see, the rows that do not contain the word rabbit have a rank of 0 (and were therefore not selected by the WHERE clause in the previous example). The two rows that do contain the word rabbit each have a rank value, and the one with the word earlier in the text has a higher rank value than the one in which the word appeared later.

This helps demonstrate how full-text searching eliminates rows (those with a rank of 0), and how it sorts results (by rank in descending order).

Note

Ranking Multiple Search Terms If multiple search terms are specified, those that contain the most matching words will be ranked higher than those with less (or just a single match).


As you can see, full-text searching offers functionality not available with simple LIKE searches. And as data is indexed, full-text searches are considerably faster, too.

Using Query Expansion

Query expansion is used to try to widen the range of returned full-text search results. Consider the following scenario. You want to find all notes with references to anvils in them. Only one note contains the word anvils, but you also want any other rows that may be related to your search, even if the specific word anvils is not contained within them.

This is a job for query expansion. When query expansion is used, MySQL makes two passes through the data and indexes to perform your search:

  • First, a basic full-text search is performed to find all rows that match the search criteria.

  • Next, MySQL examines those matched rows and selects all useful words (we'll explain how MySQL figures out what is useful and what is not shortly).

  • Then, MySQL performs the full-text search again, this time using not just the original criteria, but also all of the useful words.

Using query expansion you can therefore find results that might be relevant, even if they don't contain the exact words for which you were looking.

Note

MySQL Version 4.1.1 or Later Only Query expansion functionality was introduced in MySQL 4.1.1, and can therefore not be used in prior versions.


Here is an example. First, a simple full-text search, without query expansion:

Input

SELECT note_text FROM productnotes WHERE Match(note_text) Against('anvils');

Output

+-------------------------------------------------------------------+ | note_text                                                         | +-------------------------------------------------------------------+ | Multiple customer returns, anvils failing to drop fast enough or  | | falling backwards on purchaser. Recommend that customer considers | | using heavier anvils.                                             | +-------------------------------------------------------------------+

Analysis

Only one row contains the word anvils, so only one row is returned.

Here is the same search, this time using query expansion:

Input

SELECT note_text FROM productnotes WHERE Match(note_text) Against('anvils' WITH QUERY EXPANSION);

Output

+-------------------------------------------------------------------+ | note_text                                                         | +-------------------------------------------------------------------+ | Multiple customer returns, anvils failing to drop fast enough or  | | falling backwards on purchaser. Recommend that customer considers | | using heavier anvils.                                             | | Customer complaint: Sticks not individually wrapped, too easy to  | | mistakenly detonate all at once. Recommend individual wrapping.   | | Customer complaint: Not heavy enough to generate flying stars     | | around head of victim. If being purchased for dropping, recommend | | ANV02 or ANV03 instead.                                           | | Please note that no returns will be accepted if safe opened using | | explosives.                                                       | | Customer complaint: rabbit has been able to detect trap, food     | | apparently less effective now.                                    | | Customer complaint: Circular hole in safe floor can apparently be | | easily cut with handsaw.                                          | | Matches not included, recommend purchase of matches or detonator  | | (item DTNTR).                                                     | +-------------------------------------------------------------------+

Analysis

This time seven rows were returned. The first contains the word anvils and is thus ranked highest. The second row has nothing to do with anvils, but as it contains two words that are also in the first row (customer and recommend) it was retrieved, too. The third row also contains those same two words, but they are further into the text and further apart, and so it was included, but ranked third. And this third row does indeed refer to anvils (by their product name).

As you can see, query expansion greatly increases the number of rows returned, but in doing so also increases the number of returns that you might not actually want.

Tip

The More Rows the Better The more rows in your table (and the more text within those rows), the better the results returned when using query expansion.


Boolean Text Searches

MySQL supports an additional form of full-text searching called boolean mode. In Boolean mode you may provide specifics as to

  • Words to be matched

  • Words to be excluded (if a row contained this word it would not be returned, even though other specified words were matched)

  • Ranking hints (specifying which words are more important than others so they can be ranked higher)

  • Expression grouping

  • And more

Tip

Useable Even Without a FULLTEXT Index Boolean mode differs from the full-text search syntax used thus far in that it may be used even if no FULLTEXT index is defined. However, this would be a very slow operation (and the performance would degrade further as data volume increased).


To demonstrate what IN BOOLEAN MODE does, here is a simple example:

Input

SELECT note_text FROM productnotes WHERE Match(note_text) Against('heavy' IN BOOLEAN MODE);

Output

+-------------------------------------------------------------------+ | note_text                                                         | +-------------------------------------------------------------------+ | Item is extremely heavy. Designed for dropping, not recommended   | | for use with slings, ropes, pulleys, or tightropes.               | | Customer complaint: Not heavy enough to generate flying stars     | | around head of victim. If being purchased for dropping, recommend | | ANV02 or ANV03 instead.                                           | +-------------------------------------------------------------------+

Analysis

This full-text search retrieves all rows containing the word heavy (there are two of them). The keywords IN BOOLEAN MODE are specified, but no boolean operators are actually specified and so the results are just as if boolean mode had not been specified.

Note

IN BOOLEAN MODE Behaves Differently Although the results in this example are the same as they would be without IN BOOLEAN MODE, there is an important difference in behavior (even if it did not manifest itself in this particular example). I'll point these out in the use notes later in this chapter.


To match the rows that contain heavy but not any word beginning with rope, the following can be used:

Input

SELECT note_text FROM productnotes WHERE Match(note_text) Against('heavy rope*' IN BOOLEAN MODE);

Output

+-------------------------------------------------------------------+ | note_text                                                         | +-------------------------------------------------------------------+ | Customer complaint: Not heavy enough to generate flying stars     | | around head of victim. If being purchased for dropping, recommend | | ANV02 or ANV03 instead.                                           | +-------------------------------------------------------------------+

Analysis

This time only one row is returned. Again, the word heavy is matched, but this time rope* instructs MySQL to explicitly exclude any row that contains rope* (any word beginning with rope, including ropes, which is why one of the rows was excluded).

Note

Code Change Needed in MySQL 4.x If you are using MySQL 4.x, the previous example might not have returned any rows at all. This is the result of a bug in the processing of the * operator. To use this example in MySQL 4.x, use -ropes instead of -rope* (exclude ropes instead of any word beginning with rope).


You have now seen two full-text search boolean operators: - excludes a word and * is the truncation operator (think of it as a wildcard used at the end of a word). Table 18.1 lists all of the supported boolean operators.

Table 18.1. Full-Text Boolean Operators

Privilege

Description

+

Include, word must be present.

-

Exclude, word must not be present.

>

Include, and increase ranking value.

<

Include, and decrease ranking value.

()

Group words into subexpressions (allowing them to be included, excluded, ranked, and so forth as a group).

~

Negate a word's ranking value.

*

Wildcard at end of word.

""

Defines a phrase (as opposed to a list of individual words, the entire phrase is matched for inclusion or exclusion).


Here are some more examples to demonstrate the use of some of these operators:

Input

SELECT note_text FROM productnotes WHERE Match(note_text) Against('+rabbit +bait"' IN BOOLEAN MODE);

Analysis

This search matches rows that contain both the words rabbit and bait.

Input

SELECT note_text FROM productnotes WHERE Match(note_text) Against('rabbit bait' IN BOOLEAN MODE);

Analysis

Without operators specified, this search matches rows that contain at least one of rabbit or bait.

Input

SELECT note_text FROM productnotes WHERE Match(note_text) Against('"rabbit bait"' IN BOOLEAN MODE);

Analysis

This search matches the phrase rabbit bait instead of the two words rabbit and bait.

Input

SELECT note_text FROM productnotes WHERE Match(note_text) Against('>rabbit <carrot' IN BOOLEAN MODE);

Analysis

Match both rabbit and carrot, increasing the rank of the former and decreasing the rank of the latter.

Input

SELECT note_text FROM productnotes WHERE Match(note_text) Against('+safe +(<combination)' IN BOOLEAN MODE);

Analysis

This search matches the words safe and combination, lowering the ranking of the latter.

Note

Ranked, but Not Sorted In boolean mode, rows will not be returned sorted descending by ranking score.


Full-Text Search Usage Notes

Before finishing this chapter, here are some important notes pertaining to the use of full-text searching:

  • When indexing full-text data, short words are ignored and are excluded from the index. Short words are defined as those having three or fewer characters (this number can be changed if needed).

  • MySQL comes with a built-in list of stopwords, words that are always ignored when indexing full-text data. This list can be overridden if needed. (Refer to the MySQL documentation to learn how to accomplish this.)

  • Many words appear so frequently that searching on them would be useless (too many results would be returned). As such, MySQL honors a 50% ruleif a word appears in 50% or more rows, it is treated as a stopword and is effectively ignored. (The 50% rule is not used for IN BOOLEAN MODE).

  • Full-text searching never returns any results if there are fewer than three rows in a table (because every word is always in at least 50% of the rows).

  • Single quote characters in words are ignored. For example, don't is indexed as dont.

  • Languages that don't have word delimiters (including Japanese and Chinese) will not return full-text results properly.

  • As already noted, full-text searching is only supported in the MyISAM database engine.

Note

No Proximity Operators One feature supported by many full-text search engines is proximity searching, the ability to search for words that are near each other (in the same sentence, in the same paragraph, or no more than a specific number of words apart, and so on). Proximity operators are not yet supported by MySQL full-text searching, although this is planned for a future release.





MySQL Crash Course
MySQL Crash Course
ISBN: 0672327120
EAN: 2147483647
Year: 2004
Pages: 214
Authors: Ben Forta

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net