ProblemFULLTEXT searches for short words return no rows. SolutionChange the indexing engine's minimum word length parameter. DiscussionIn a text like the KJV, certain words have special significance, such as "God" and "sin." However, if you perform FULLTEXT searches on the kjv table for those words, you'll observe a curious phenomenonboth words appear to be missing from the text entirely: mysql> SELECT COUNT(*) FROM kjv WHERE MATCH(vtext) AGAINST('God'); +----------+ | COUNT(*) | +----------+ | 0 | +----------+ mysql> SELECT COUNT(*) FROM kjv WHERE MATCH(vtext) AGAINST('sin'); +----------+ | COUNT(*) | +----------+ | 0 | +----------+ One property of the indexing engine is that it ignores words that are "too common" (that is, words that occur in more than half the rows). This eliminates words such as "the" or "and" from the index, but that's not what is going on here. You can verify that by counting the total number of rows, and by using SQL pattern matches to count the number of rows containing each word:[*]
mysql> SELECT COUNT(*) AS 'total verses', -> COUNT(IF(vtext LIKE '%God%',1,NULL)) AS 'verses containing "God"', -> COUNT(IF(vtext LIKE '%sin%',1,NULL)) AS 'verses containing "sin"' -> FROM kjv; +--------------+-------------------------+-------------------------+ | total verses | verses containing "God" | verses containing "sin" | +--------------+-------------------------+-------------------------+ | 31102 | 4118 | 1292 | +--------------+-------------------------+-------------------------+ Neither word is present in more than half the verses, so sheer frequency of occurrence doesn't account for the failure of a FULLTEXT search to find them. What's really happening is that by default, the indexing engine doesn't include words less than four characters long. The minimum word length is a configurable parameter, which you can change by setting the ft_min_word_len server variable. For example, to tell the indexing engine to include words as short as three characters, add a line to the [mysqld] group of the /etc/my.cnf file (or whatever option file you put server settings in): [mysqld] ft_min_word_len=3 After making this change, restart the server. Next, rebuild the FULLTEXT index to take advantage of the new setting:[
mysql> REPAIR TABLE kjv QUICK; Finally, try the new index to verify that it includes shorter words: mysql> SELECT COUNT(*) FROM kjv WHERE MATCH(vtext) AGAINST('God'); +----------+ | COUNT(*) | +----------+ | 3878 | +----------+ mysql> SELECT COUNT(*) FROM kjv WHERE MATCH(vtext) AGAINST('sin'); +----------+ | COUNT(*) | +----------+ | 389 | +----------+ That's better! But why do the MATCH( ) queries find 3,878 and 389 rows, whereas the earlier LIKE queries find 4,118 and 1,292 rows? That's because the LIKE patterns match substrings and the FULLTEXT search performed by MATCH( ) matches whole words. |