MyISAM Tables


Many people use MySQL for years without discovering that it offers different table types. These people are using MyISAM tables because this has been the default in all recent versions.

MyISAM tables offer very fast but not transaction-safe storage. They provide high performance in most situations, even if the designer makes mistakes, and in the hands of a skilled administrator, they can handle massive and/or busy databases.

The following code will create a MyISAM table:

 
 create table article (   articleID int not null auto_increment primary key,   title varchar(255),   body text ); 

The final line could optionally have been

 
 ) type=MyISAM; 

but would produce the same result.

MyISAM tables can be one of three types: dynamic, static, or compressed. A table automatically becomes dynamic or static depending on the definition of its columns . Compressed tables must be deliberately created with the myisampack tool.

Tables with fixed-length rows will be created as static tables, and tables with variable-length rows will be created as dynamic tables. How can we tell whether a table has fixed- or variable-length rows?

The char and numeric types all have a fixed size. The size of varchar , text , and blob columns can vary with the size of their contents. A table with only char and numeric columns will be created as a static table, but a table containing any varchar , text , or blob columns will be dynamic.

In the section "Full-Text Searching on MyISAM Tables," we will create an example table called article . This will be created as a dynamic table because it contains a varchar column and a text column. The storage requirement for each row in the table will therefore vary depending on the amount of data in each of these fields.

There are a number of advantages to a static table. It is faster to search than a dynamic table or a compressed table. It is very easy for the database to retrieve a particular record based on an index when each record is at a particular offset from the start of the file. It is very easy to cache. It is less likely to suffer serious corruption in the event of a crash ”the repair facility can usually recover all rows except the damaged one.

The disadvantage to static tables is that forcing real data to fit into fixed-size columns nearly always wastes disk space. This may be a price you are willing to pay for data that varies only a little in size, such as people's names , but are unwilling to pay for data that varies a great deal in size. If you decide that nearly all employee surnames will be less than 80 characters , you may or may not be willing to waste 75 bytes each time you store a Smith.

Dynamic tables need more complex management within MySQL. It is not as straightforward a task for the engine to cache, find, or repair records. This is partly just because they vary in size, but it is also because they can become fragmented . If a row is modified and becomes larger, part of its data will remain at the original location, and part will be stored as a new fragment elsewhere in the file. This means that a segment of a file that has been cached by the operating system cannot be guaranteed to contain all parts of a row. Corruption may also be harder to fix because if fragments or links become lost, it will not be obvious which parts belong to which rows.

To repair or defragment a MyISAM table, you can use the command-line tool myisamchk or the MySQL command REPAIR TABLE . (This is covered in more detail in Chapter 13, "Administering Your Database.") To defragment but not repair, you can use the MySQL command OPTIMIZE TABLE . (This is covered in more detail in Chapter 18, "Optimizing Your Database.")

Compressing MyISAM Tables

Although tables become static or dynamic without your specific request (but in response to your design decisions), tables are not automatically compressed. To compress a table, you need to use the command-line program myisampack. (There is a version of this for pure ISAM tables, should you be using them, which is called pack_isam.)

Compression sounds like a positive thing, but it makes sense only for some applications because compressed tables become read-only. If you need to alter, update, or insert data in the table, you need to uncompress the entire table, make your changes, and then recompress the table.

The compression performed by myisampack includes a mixture of true compression (Huffman coding) and a set of optimizations aimed at shrinking columns, such as converting types to smaller types and converting columns to enums. Because each record is compressed separately, there is only a small overhead to be paid when decompressing a record. This may even be counterbalanced on slow devices by the reduction in data that needs to be read from disk.

Full-Text Searching on MyISAM Tables

One feature that currently comes only with MyISAM tables is full-text searching and indexing. Normal indexes are very good at finding rows where a value in the table matches a given value, but it is common to want to search for words or strings within a block of text. This is where full-text searching comes in handy.

The following SQL will create a MyISAM table with a full-text index:

 
 create table article (   articleID int not null auto_increment primary key,   title varchar(255),   body text,   fulltext (title,body) ); 

The following query will retrieve any records containing the word 'merger' :

 
 select title from article where match (title,body) against ('merger'); 

More complicated searches are supported. The following query will retrieve records containing any of the words 'merge' , 'acquisition' , 'acquire ', or 'takeover' .

 
 select title from article where match (title,body) against ('merge acquisition acquire takeover'); 

Note that we are matching any record that contains at least one of the words. We are not searching for the string or for a record containing every word listed. We can do both of these types of searches using the IN BOOLEAN MODE modifier, which we will come to later.

We needed to search for 'acquire' and 'acquisitions' separately because MySQL does not currently support stemming. Stemming is a technique implemented in many other full-text search systems that recognizes sets of words as having a common stem word. For example, 'acquire' is the stem of many words such as 'acquires' , ' acquired ' , and 'acquisition' .

Each match found is assigned a relevance value, and the results are automatically sorted into relevance order. You may want to see the relevance scores for records. The following query will retrieve an unsorted list of scores for all records. Any records with a score of zero have no similarity and will not be retrieved.

 
 select title, match (title,body) against ('merge acquisition acquire takeover') as relevance from article; 

Note that relevance is not a keyword. It is just an alias for match(title,body) against ('merge acquisition acquire takeover') . We have included it so that the output will be tidier.

More usefully, the following query will retrieve article titles and scores for matched documents. Because we have the MATCH condition in the WHERE clause, the results will be sorted, and unrelated rows will be ignored.

 
 select title, match (title,body) against ('merge acquisition acquire takeover') as relevance from article where match (title,body) against ('merge acquisition acquire takeover'); 

There are certain words you cannot search for. To improve performance, some words are excluded from indexes or are ignored when searching.

Short words are not indexed. By default, words with fewer than four characters are ignored. For some installations, most famously Slashdot.org, this is a problem because three-letter acronyms are often an important part of the content in technical material. You can change this limit by altering the variable ft_min_word_len , but you will need to regenerate your indexes.

Stop words are used by full-text indexes. A stop word is a word with no semantic value. Generally, these are common words that are important for sentence construction, but are unlikely to be important parts of the content. Words like 'the' , 'and' , 'then' , and 'soon' are necessary, but are not usually useful to search for. As well as providing standard lists, MySQL allows you to specify your own list of stop words for each human language you are indexing.

One thing to note about full-text indexing is that because it is a complex task, it has some very real performance limits. When your MySQL tables get large (say more than 1,000,000 rows), full-text search performance slows down. For small applications this should not be a problem, but for larger applications you should keep this fact in mind.

Words that are common in your data are not used when searching. If your table contains company newsletter articles for Acme PTY Ltd., it is likely that many articles would contain the word 'Acme' . Searching for this would generate many results, which is not usually a good thing. If 50% or more of your records contain a word, that word is taken to have no value when calculating relevance.

Boolean Full-Text Search

You can exert greater control over the search by using the IN BOOLEAN MODE modifier.

The following query will match only records than contain the word 'linux' and the string "Open Source" , but not the word 'desktop' . The words 'Java' and 'Oracle' are optional, but when deciding on relevance, finding 'Java' in a record will improve its ranking, whereas finding 'Oracle' will degrade the ranking. The order of words in the search string or the record is not important.

 
 select title from article where match (title,body)       against ('+linux +"Open Source" -desktop Java ~Oracle' IN BOOLEAN MODE); 

The full set of operators is shown in Table 9.1.

Table 9.1. Boolean Mode Search Operators

Operator

Meaning

+

This word is compulsory.

-

This word must not appear.

<

This word is less important.

>

This word is more important.

( )

Group words together as a subexpression.

~

This word may appear, but it has a negative effect on ranking.

*

Wildcard suffix. For example, merge will not match merger , but merge* will match both merge and merger . May be used only at the end of a word.

" "

This is a phrase. Matches only exactly the same content in the same order.

It is not required that you have full-text indexes to do Boolean mode searches. You can search unindexed tables this way, but it will be very slow.

Another small difference between full-text searches and Boolean searches is that when the search is done in Boolean mode, words that are common in your data are not ignored. The 50% rule does not apply. If we were searching Acme PTY Ltd. newsletter articles, the next query would probably return nearly all rows, whereas the one following that would result in an empty result set.

 
 select title from article where match (title,body) against ('Acme' IN BOOLEAN MODE); select title from article where match (title,body) against ('Acme'); 


MySQL Tutorial
MySQL Tutorial
ISBN: 0672325845
EAN: 2147483647
Year: 2003
Pages: 261

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net