Full-Text Searching


Often, a database query contains a search where you're not exactly sure what you're looking for but you know it might be like this or that. You may have already seen this while using LIKE in a WHERE conditional. An alternative is the full-text search, which has two benefits over LIKE:

  • You can search using multiple words.

  • Full-text searches are much, much faster.

There are four gotchas when it comes to full-text searches:

  • Full-text searches are case-insensitive.

  • If you search using a term (or terms) that appears in more than half of the records, no rows will be returned, because that search will not be specific enough. One workaround is to use Boolean mode, discussed later in this chapter.

  • There is a minimum word length for search terms, by default four characters. Shorter terms will never be found, even when you know they are present.

  • Very popular words, called stopwords, are ignored.

Full-text searching is most important in situations where people will be entering keywords that must be searched against specific fields in a table, such as search engines. I'll first show you how to create the proper FULLTEXT index. Then I'll run through some simple full-text searches. Finally, you'll learn about the full-text Boolean mode, which allows for much more precise searching.

Tips

  • MySQL comes with several hundred stopwords already defined. These are part of the application's source code. You can also specify your own custom stopword file to use as of MySQL 4.0.10.

  • The minimum keyword lengthfour characters by defaultis a configuration setting you can change in MySQL.


Creating a FULLTEXT index

Before performing a full-text search, you need to prepare the database by creating an index of type FULLTEXT on the column or columns to be used in the searches. Currently, you can only create FULLTEXT searches on MyISAM tables.

To create a FULLTEXT index when defining a table, you would do something like this:

CREATE TABLE tablename ( col1 TEXT, col2 TEXT, FULLTEXT (col1, col2) )


If the table already exists, you can add a FULLTEXT index using an ALTER query:

ALTER TABLE tablename ADD FULLTEXT (columns)


I'll modify the expenses table (in the accounting database) to add a FULLTEXT index on the expense_description column.

To create a FULLTEXT index:

1.

Log in to the mysql client as a user that can access and modify the accounting database.

2.

Select the accounting database.

USE accounting;

3.

Make sure the expenses table is well populated (Figure 10.9).

Figure 10.9. You'll need to have quite a bit of data in your table in order to best use full-text searches.


SELECT expense_description FROM    expenses;


It doesn't really matter what SQL you use during these steps as long as you add records with good descriptions to the expenses table. The more records a table has, the more useful and accurate full-text indexing and searching becomes.

4.

Create a full-text index (Figure 10.10).

Figure 10.10. A FULLTEXT index is added to the expenses table.


ALTER TABLE expenses

ADD FULLTEXT (expense_description);

Before I run any full-text searches on a table, I must create a full-text index on the column or columns involved.

5.

Confirm the table's structure by viewing its CREATE syntax (Figure 10.11).

Figure 10.11. The SHOW CREATE TABLE query indicates what command would be used to make the table.


SHOW CREATE TABLE expenses \G

You've already seen a couple of SHOW commands by now, so hopefully this one isn't too surprising. The SHOW CREATE TABLE tablename command asks MySQL to print out the SQL command used to create (or re-create) the named table. Using the \G trick makes the results easier to view.

Tip

  • Inserting records into tables with FULLTEXT indexes can be much slower because of the complex index that's required. Similarly, adding a FULLTEXT index to an existing table can take some time, depending upon how much data is already there.


Performing basic FULLTEXT searches

Once you've established a FULLTEXT index on a column, you can start querying against it, using the MATCH and AGAINST keywords. The syntax for using full-text searching is

SELECT * FROM tablename WHERE MATCH (column) AGAINST ('string')


The result of the query will be the rows returned in order of relevance from most to least. In other words, the rows in which string most matches the values in column will be listed first.

If you want to match against multiple words, you can do so, separating the words by spaces:

SELECT * FROM tablename WHERE MATCH    (column) AGAINST ('word1 word2')


With a query like this, rows that contain both words will rank higher than those that contain only one or the other (which would still qualify as a match). This behavior can be tweaked using the Boolean mode, discussed next in the chapter.

To view the relevance of a row returned by a match, you would select it:

SELECT somecolumn, MATCH (column) AGAINST ('string') FROM tablename


FULLTEXT Indexes, Revisited

Your FULLTEXT index must be created on the same column or combination of columns that you use in your SELECT queries. If you create the index on one column, you can use only that one column in your query. If you create the index on two columns, you must use exactly both those columns in your queries:

ALTER TABLE tablename ADD FULLTEXT (col1, col2) SELECT * FROM tablename WHERE MATCH (col1, col2) AGAINST ('string')



To use full-text searching:

1.

Log in to the mysql client and select the accounting database, if you have not already.

USE accounting;

2.

Make sure the expenses table is well populated.

If you are paying close attention, you'll note that this step was included in the preceding sequence. It's important, though. The more populated a table is, the more useful a full-text search is. In fact, a (non-Boolean mode) full-text search won't even work if you have fewer than three records in the table!

3.

Do a full-text search using the word visual (Figure 10.12).

Figure 10.12. A simple full-text search quickly returns all records that contain the word visual within the expense description.


SELECT expense_id, expense_description FROM expenses WHERE MATCH (expense_description) AGAINST ('visual');


This query will return the expense_id and expense_description columns wherever visual has a positive relevance in the expense_description values. In other words, any record that contains the word visual will be returned.

If you have different data in your table, you'll want to change the searched-for word accordingly.

4.

Run a full-text search using the words visual and guide (Figure 10.13).

Figure 10.13. This search returns every record that contains either visual or guide but ranks records with both higher.


SELECT expense_id, expense_description, MATCH (expense_description) AGAINST ('visual guide') AS rel FROM expenses WHERE MATCH (expense_description) AGAINST ('visual guide') \G


This query differs from that in Step 3 in two ways. First, I'm also selecting the MATCH...AGAINST value so that the calculated relevance number is displayed. Second, I've included a second term to match. Those records with both terms will score higher than those with just one.

Tips

  • You can weed out some results by using the MATCH...AGAINST() value in a WHERE clause:

    SELECT expense_id, expense_description, MATCH (expense_description) AGAINST ('visual guide') FROM expenses WHERE MATCH    (expense_description) AGAINST    ('visual guide') > .001

  • Remember that if a FULLTEXT search returns no records, this means either that no matches were made or that over half of the records match (when not using Boolean mode).

  • For sake of simplicity, I wrote all of the queries in this section as simple SELECT statements. You can certainly use FULLTEXT searches within joins or more complex queries.


Performing Boolean FULLTEXT searches

You can take your full-text searching one step further by using its Boolean mode (as of MySQL version 4.0.1). In Boolean mode the search terms can be preceded by special characters (Table 10.1) to indicate how their presence should be weighted with regard to relevancy.

Table 10.1. These characters are specifically used to affect the importance of terms in full-text searches.

Special Boolean Mode Characters

Character

Meaning

Example

Matches

+

Word is required

+punk rock

punk is required and rock is optional

-

Word must not be present

+punk -rock

punk is required and rock cannot be present

""

A literal phrase

"punk rock"

Occurrences of the phrase punk rock are weighted

<

Less important

<punk +rock

rock is required, and punk is less significant

>

More important

>punk +rock

rock is required, but punk is more significant

()

Creates groups

(>punk roll) +rock

rock is required, both punk and roll are optional, but punk is weighted more

~

Detracts from relevance

+punk ~rock

punk is required, and the presence of rock devalues the relevance (but rock is not excluded)

*

Allows for wildcards

+punk +rock*

punk and rock are required, but rocks, rocker, rocking, etc., are counted


SELECT columns FROM tablename WHERE MATCH (column) AGAINST ('+word1 word2' IN BOOLEAN MODE) SELECT * FROM tablename WHERE MATCH (column) AGAINST ('+database -mysql' IN BOOLEAN MODE)


In the second example, a match will be made if the word database is found and mysql is not present. Alternatively, the tilde (~) is used as a milder form of the minus sign, meaning that the keyword can be present in a match, but such matches should be considered less relevant.

The wildcard character (*) matches variations on a word, so cata* matches catalog, catalina, and so on. Two operators explicitly state what keywords are more (>) or less (<) important. Finally, you can use double quotation marks to hunt for exact phrases. Parentheses can be used to make subexpressions.

The following query would look for records with the phrase Web develop with the word html being required and the word JavaScript detracting from a match's relevance:

SELECT * FROM tablename WHERE MATCH(column) AGAINST('>"Web develop" +html ~JavaScript' IN BOOLEAN MODE)


When using Boolean mode, there are several differences as to how FULLTEXT searches work:

  • If a keyword is not preceded by an operator, the word is optional but a match will be ranked higher if it is present.

  • Results will be returned even if more than 50 percent of the records match the search.

  • The results are not automatically sorted by relevance.

Because of this last fact, you'll also want to sort the returned records by their relevance, as I'll demonstrate in the next sequence of steps. One important rule that remains with Boolean searches is that the minimum word length (four characters by default) still applies. So trying to require a shorter word using a plus sign (+SQL) still won't work.

Control Flow Functions

One advanced topic that I don't cover in this book is the various control flow functions MySQL has to offer.

The basic function is an IF conditional, which acts like a ternary operator:

IF (condition, return_this_if_true, return_this_if_false)


If the condition is true, the second argument is returned. Otherwise, the third argument is returned. This next example returns either odd or even, depending upon whether or not the remainder of dividing a column's value (e.g., 3) by 2 is equal to 1:

SELECT IF( (some_num_col % 2) = 1, 'odd', 'even');


Along with the IF conditional, there are an IFNULL, a NULLIF, and a CASE. Each is really just a specific type of conditional. See the MySQL manual for syntax and examples of these.


To use Boolean mode:

1.

Log in to the mysql client and select the accounting database, if you have not already.

USE accounting;

2.

Do a full-text search requiring both the words visual and guide (Figure 10.14).

Figure 10.14. Both keywords are required, resulting in one less match (compare with Figure 10.13).


SELECT expense_id, expense_description, MATCH (expense_description) AGAINST ('+visual +guide' IN BOOLEAN MODE) AS rel FROM expenses WHERE MATCH (expense_description) AGAINST ('+visual +guide' IN BOOLEAN MODE) ORDER BY rel DESC \G


This query adds two new features. First, the IN BOOLEAN MODE text is added, as well as the plus signs, indicating required words. Second, the results are ordered by their relevance, using an alias.

3.

Do a full-text search requiring both the words visual and guide while stressing the word quickpro (Figure 10.15).

Figure 10.15. The relevance of the results was altered by placing extra importance on quickpro (compare with Figure 10.14).


[View full width]

SELECT expense_id, expense_description, MATCH (expense_description) AGAINST ('+visual +guide >quickpro' IN BOOLEAN MODE) AS rel FROM expenses WHERE MATCH (expense_description) AGAINST ('+visual +guide >quickpro' IN BOOLEAN MODE) ORDER BY rel DESC \G


I add >quickpro to my search terms to give preference to records containing this word.

Tip

  • When you select the MATCH...AGAINST value, make sure that its terms are exactly the same as those in the WHERE clause.





MySQL Visual QuickStart Guide Serie  .Covers My SQL 4 and 5
MySQL, Second Edition
ISBN: 0321375734
EAN: 2147483647
Year: 2006
Pages: 162
Authors: Larry Ullman

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net