Understanding Character Sets and Collation Sequences


Database tables are used to store and retrieve data. Different languages and character sets need to be stored and retrieved differently. As such, MySQL needs to accommodate different character sets (different alphabets and characters) as well as different ways to sort and retrieve data.

When discussing multiple languages and character sets, you will run into the following important terms:

  • Character sets are collections of letters and symbols.

  • Encodings are the internal representations of the members of a character set.

  • Collations are the instructions that dictate how characters are to be compared.
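The three terms are not specific to MySQL, so they can be tried out in any language. The following is a minimal sketch in Python, used here purely for illustration. The codec names (latin-1, utf-8) are Python's and are only assumed to be rough stand-ins for MySQL character sets, and the helper can_encode is a hypothetical name introduced for this example.

```python
# Illustration only: Python codecs stand in for MySQL character sets.

def can_encode(text, encoding):
    """Return True if every character in text belongs to the given character set."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Character set: a collection of letters and symbols.
# Latin-1 contains the German o-umlaut but not the Hebrew letter shin.
in_latin1 = can_encode("ö", "latin-1")      # True
not_in_latin1 = can_encode("ש", "latin-1")  # False
in_utf8 = can_encode("ש", "utf-8")          # True

# Encoding: the internal representation of a member of a character set.
# The same character is stored as different bytes under different encodings.
latin1_bytes = "à".encode("latin-1")  # b'\xe0' (one byte)
utf8_bytes = "à".encode("utf-8")      # b'\xc3\xa0' (two bytes)

# Collation: the rule used to compare characters.
# An exact comparison treats "a" and "A" as different; a case-folding one does not.
exact_match = "a" == "A"                            # False
caseless_match = "a".casefold() == "A".casefold()   # True

print(in_latin1, not_in_latin1, in_utf8)
print(latin1_bytes, utf8_bytes)
print(exact_match, caseless_match)
```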

Note

Why Collations Are Important

Sorting text in English is easy, right? Well, maybe not. Consider the words APE, apex, and Apple. Are they in the correct sorted order? That depends on whether you want a case-sensitive or a case-insensitive sort. The words would be sorted one way using a case-sensitive collation, and another way using a case-insensitive collation. And this affects more than just sorting (as in data sorted using ORDER BY); it also affects searches (whether or not a WHERE clause looking for apple finds APPLE, for example). The situation gets more complex when characters such as the French à or German ö are used, and more complex still when non-Latin-based character sets are used (Japanese, Hebrew, Russian, and so on).
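The effect described in this note can be reproduced with a short Python sketch. It is an approximation under stated assumptions, not MySQL's behavior: sorting by code point stands in for a case-sensitive collation, and sorting by str.casefold stands in for a case-insensitive one. The variable names are hypothetical.

```python
words = ["apex", "Apple", "APE"]

# Stand-in for a case-sensitive collation: compare by code point,
# so every uppercase letter sorts before every lowercase letter.
case_sensitive = sorted(words)

# Stand-in for a case-insensitive collation: compare case-folded forms.
case_insensitive = sorted(words, key=str.casefold)

print(case_sensitive)    # ['APE', 'Apple', 'apex']
print(case_insensitive)  # ['APE', 'apex', 'Apple']

# The same choice affects searches, much like a WHERE clause looking for apple.
rows = ["APPLE", "apple", "Apple", "apex"]
exact_hits = [r for r in rows if r == "apple"]
caseless_hits = [r for r in rows if r.casefold() == "apple".casefold()]

print(exact_hits)     # ['apple']
print(caseless_hits)  # ['APPLE', 'apple', 'Apple']
```

Only the case-folded comparison keeps APE, apex, and Apple in the order given in the note, and only the case-folded search finds APPLE.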


In MySQL, character sets and collations rarely need attention during regular database activity (SELECT, INSERT, and so forth). Rather, the decision as to which character set and collation to use is made at the server, database, and table level.




MySQL Crash Course
ISBN: 0672327120
Year: 2004
Pages: 214
Authors: Ben Forta
