Among the guidelines that you should follow are to always use Unicode data types-if you use multilingual data-and to standardize the site collation where possible. You should also try to deploy a Unicode server and client if you can, consider storage space and performance, and delimit identifiers for objects in system tables.
If you must store character data that reflects multiple languages, you should always use Unicode nchar, nvarchar, and ntext data types instead of the non-Unicode char, varchar, and text data types. There are significant limitations associated with char, varchar, and text data types, since only information from a single code page can be stored.
The other benefit of using Unicode data types is performance gain, which is due to fewer code-page conversions occurring. Here are some representative scenarios:
Note
If you always choose collations with the same case and accent sensitivity, the user will experience consistent behavior across all systems. Most SQL Server 2000 sites choose to be case-insensitive and accent-sensitive. Case sensitivity also applies to the names of SQL Server objects; therefore, if you specify case- sensitive collations, all users must specify the correct case when querying the database.
Always keep in mind that standardizing the site collation helps eliminate the need for explicitly specifying a collation in a character or Unicode expression, because the default collation in an expression can vary depending on the current database setting for the connection. This means that the result of a Transact-SQL statement can have different collations when executed in the context of different databases. If you inevitably have to work with objects that have different collations and code pages, you must code your queries to comply with the rules of collation coercion.
In a very small number of cases, everything you do with a SQL Server will happen on the same computer on which the server resides, and only SQL Server tools such as SQL Query Analyzer or SQL Server Enterprise Manager will be used. In most cases, however, the server will be interacting with other servers or clients, and might be using one or more data-access standards. SQL Server has basically two types of clients:
The possible scenarios of server and client connections, along with certain areas you will need to consider, are presented in the sections that follow.
This is the ideal type of configuration. By keeping data in Unicode throughout the process, you can guarantee the best performance and protection from corruption of retrieved data. This is the case with Microsoft ActiveX Data Objects (ADO) and OLE DB.
In this type of configuration, you might not have any problems storing data, but there is obviously a serious limitation when it comes to bringing the data to the client and using it. The client code page must be used to convert the Unicode data at some point.
This is not an ideal configuration for multilingual data, because you will not be able to keep such data on the server. At the very least, however, you can be sure the data will show up properly. This configuration has all the limitations of the previous case, without the risk of corrupting received data due to invalid conversions. An example is when a SQL Server 2000 database defines a linked server to a Microsoft SQL Server 6.5 database. All information that is received from the server running SQL Server 6.5 will be valid, but do not try to insert any data that is outside of the code page!
This is the most limiting configuration, because you are basically restricted to a single code page at all times.
Ideally, every column is defined with one of the Unicode data types; however, doing so when you do not need to support multilingual data can create issues related to storage space and speed.
The actual amount of space required for Unicode data types is 2 bytes per character; the amount of space for non-Unicode data types is 1 byte for all non-DBCS text and 2 bytes for Asian languages that use DBCS. Therefore, unless your data is on one of the Asian code pages, you will be using twice as much space to store the data. This must be considered when you upgrade existing databases or when you are deciding on the proper data types of new projects. If you are storing data only in a column that is on a single (non-Asian) code page, you might prefer not to use Unicode so you can save disk space and memory.
In addition to storage space, you will also need to consider how performance is affected when you use Unicode data types unnecessarily. The speed issue is a complicated one. Here are some of the issues:
Important
The system tables in SQL Server 2000 store all the data they contain as Unicode. This minimizes the problems that can occur with different collations between databases and columns. There is no other way to deal with the fact that different databases on the same server can have any mix of Unicode and non-Unicode column names. Even if you are supporting only a single language now, SQL Server must be ready to support any language you might choose to support in the future.
When you are converting databases and servers from SQL Server 6.5 or earlier, it is easy to become concerned about the metadata that is being converted; however, there is no need to worry. The conversion to Unicode is straightforward because these earlier versions of SQL Server are only in one code page or collation, each of which is defined at the server level.
One important issue involves using identifiers for objects in the system tables. SQL Server 2000 uses the Unicode 2.0 character property definition to create the list of valid characters in identifiers. (Unicode 3.0 was not released when SQL Server 2000 development was completed.) To avoid running into issues with international characters that are not defined in the Unicode 2.0 character property definition, you should delimit your identifiers with brackets ([]) or double quotation marks ("). This prevents the server from checking for valid characters.