National Character Set

National character sets are alternative character sets that allow you to store Unicode character data in a database that does not have a Unicode database character set. That is only one reason to choose a national character set that differs from the database character set. Another is that the national character set may use a character-encoding scheme whose properties suit extensive processing operations better, or make programming easier.

Oracle stores data of the NCHAR, NVARCHAR2, and NCLOB data types in the Unicode encoding of the national character set.

NCHAR, NVARCHAR2, and NCLOB are the only Oracle 9i data types that are guaranteed to store Unicode data regardless of the database character set.

Table 7.1 provides information on the differences between database character sets and national character sets.

Table 7.1. Database and National Character Set Characteristics
Database Character Sets	National Character Sets
Defined at database creation time	Defined at database creation time
May not be changed without re-creation of the database, with limited exceptions	May not be changed without re-creation of the database, with limited exceptions
Stores data in columns of type LONG, CHAR, VARCHAR2, and CLOB	Stores data in columns of type NCHAR, NVARCHAR2, and NCLOB
Can store varying-width character sets	Can store Unicode using either the AL16UTF16 or UTF8 character sets

There are only two choices for the national character set: UTF8 and AL16UTF16. When deciding between them, you need to determine whether space or performance is more likely to be an issue. You may not be able to answer that with certainty at this point, but you should make the best estimate you can.

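AL16UTF16 stores characters in a fixed width of 2 bytes (the rare supplementary characters take 4 bytes, as a surrogate pair). UTF8 is a variable-width character set that stores each character in 1, 2, or 3 bytes (supplementary characters take 6).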

Neither choice is better in every case. UTF8 is more compact for data that is mostly ASCII or Western European, where most characters take 1 byte instead of the 2 that AL16UTF16 uses, but it needs 3 bytes for most Asian characters, which AL16UTF16 stores in 2. AL16UTF16 also tends to perform better, because its fixed 2-byte width makes string processing simpler than UTF8's variable width does.
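To make the space trade-off concrete, here is a minimal sketch that estimates how many bytes a sample of your data would occupy in each national character set. It assumes that Python's standard `utf-8` codec stands in for Oracle's UTF8 and `utf-16-be` (no byte-order mark) for AL16UTF16. The two agree on characters in the Basic Multilingual Plane, but Oracle's UTF8 encodes supplementary characters differently, so the sample should avoid them. The function names `estimate_bytes` and `smaller_choice` are hypothetical.

```python
def estimate_bytes(samples):
    """Total encoded size of the sample strings under each national character set.

    Assumption: Python's utf-8 codec approximates Oracle's UTF8 and
    utf-16-be approximates AL16UTF16 (valid for BMP characters only).
    """
    utf8 = sum(len(s.encode("utf-8")) for s in samples)
    al16utf16 = sum(len(s.encode("utf-16-be")) for s in samples)
    return {"UTF8": utf8, "AL16UTF16": al16utf16}


def smaller_choice(samples):
    """Name of the character set that needs fewer bytes for the sample."""
    sizes = estimate_bytes(samples)
    return min(sizes, key=sizes.get)


if __name__ == "__main__":
    western = ["Customer", "Müller", "Café"]
    asian = ["東京", "日本語", "서울"]
    print(estimate_bytes(western), smaller_choice(western))
    print(estimate_bytes(asian), smaller_choice(asian))
```

For the Western sample, UTF8 needs 20 bytes against 36 for AL16UTF16; for the Asian sample, UTF8 needs 21 bytes against 14. A representative sample of your own data gives a better basis for the guess than either extreme.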

Should You Use Unicode?

Okay, so Unicode character sets are a great thing, but when should you use them? What are the advantages of implementing a Unicode character set? The following sections discuss some of the advantages of using Unicode, either as the database character set or as the national character set.

Ease of Migration

With a Unicode database character set, Java and PL/SQL code, as well as ASCII-based data, are easier to migrate. Minimal changes are necessary when you implement multiple languages in a multinational database: because the existing SQL CHAR data types can then store multilingual data, you do not need to recode your Java and PL/SQL programs to use the NCHAR data types. This means less recoding and more reuse. Also, if your current data is strictly US7ASCII, the database can be migrated to a Unicode character set with a simple ALTER DATABASE statement.
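The ALTER DATABASE shortcut is possible because US7ASCII is a strict subset of Unicode character sets such as UTF8: every 7-bit ASCII character has the same single-byte encoding in both, so existing data needs no conversion. The sketch below checks that property with Python's standard codecs, assuming the `ascii` and `utf-8` codecs model US7ASCII and UTF8 for this range. Both helper names are hypothetical, and `needs_conversion` is only an illustration of a pre-migration check, not an Oracle utility.

```python
def ascii_is_byte_identical_in_utf8() -> bool:
    """True if every 7-bit ASCII code point encodes to the same single byte
    in UTF-8 as in ASCII (Python codecs stand in for US7ASCII and UTF8)."""
    for code_point in range(128):
        ch = chr(code_point)
        if ch.encode("ascii") != ch.encode("utf-8"):
            return False
    return True


def needs_conversion(data: bytes) -> bool:
    """Hypothetical check: any byte >= 0x80 lies outside US7ASCII, so the
    data could not simply be relabeled as Unicode."""
    return any(b >= 0x80 for b in data)


if __name__ == "__main__":
    print(ascii_is_byte_identical_in_utf8())               # True
    print(needs_conversion(b"ORDER 42"))                   # False
    print(needs_conversion("Müller".encode("latin-1")))    # True: 0xFC
```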

Data Distribution

A Unicode database character set also suits multilingual data that is distributed evenly throughout the database. If multilingual data can appear in many columns, choosing Unicode as the database character set means that you do not have to identify which columns might store it.

InterMedia Text Search

Unicode character sets also facilitate InterMedia Text search. If you store multilingual data in BLOBs accessed by Oracle Text, your only option for running InterMedia Text searches on that data is a Unicode database character set.

Incremental Addition of Multilingual Support

If you already have a database whose database character set limits the number and kind of languages you can easily store, you can use the Unicode data types to add support for those languages without migrating the database. You can add NCHAR, NVARCHAR2, and NCLOB columns to new tables, and to existing tables, as the need arises.

Creation of Packaged Applications

The NCHAR data types are well suited to packaged applications because they are reliable Unicode data types: data is always stored in Unicode, and the length of the data is always specified in UTF-16 code units. As a result, you need to test an application only once, on a database that uses Unicode data types, to be assured that it will run on a customer's database regardless of which database character set the customer has implemented.
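Measuring length in UTF-16 code units means a declared column length has the same meaning on every customer's database, whereas a byte count would depend on the encoding. The sketch below contrasts the two measures for a few strings. The helpers `utf16_code_units` and `utf8_bytes` are hypothetical, and Python's `utf-16-be` and `utf-8` codecs stand in for AL16UTF16 and UTF8 (they differ from Oracle's UTF8 only for supplementary characters).

```python
def utf16_code_units(s: str) -> int:
    # Each UTF-16 code unit is 2 bytes; utf-16-be adds no byte-order mark.
    return len(s.encode("utf-16-be")) // 2


def utf8_bytes(s: str) -> int:
    # Variable-width: 1 to 4 bytes per character in standard UTF-8.
    return len(s.encode("utf-8"))


if __name__ == "__main__":
    for text in ["abc", "Café", "日本語", "😀"]:
        print(f"{text!r}: {len(text)} characters, "
              f"{utf16_code_units(text)} UTF-16 code units, "
              f"{utf8_bytes(text)} UTF-8 bytes")
```

For "日本語" the length is 3 code units but 9 UTF-8 bytes. A supplementary character such as the emoji counts as 2 code units even though it is a single character.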

Performance

Remember that, for performance, you might choose a single-byte character set as the database character set. If you also need to support multiple languages, you can store the multilingual data in the Unicode data types with the AL16UTF16 national character set and still get the best performance. A UTF8 encoding carries a performance overhead that comes from its variable-width format: because the number of bytes per character is not constant, string operations must examine the data to find where each character begins.

There are performance gains to be seen in fixed-width, single-byte or fixed-width, multibyte character sets. The fixed nature means that Oracle always knows exactly how many bytes a character will consume, and fetches will be more efficient.
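The benefit of a fixed width is easiest to see when locating the nth character of a string. In a fixed 2-byte encoding the byte offset is simply twice n, while in variable-width UTF-8 every byte before it must be scanned. The sketch below illustrates the idea in Python. It is not a description of how Oracle implements its string functions, it assumes Basic Multilingual Plane characters only (so that AL16UTF16 really is fixed width), and the function names are hypothetical.

```python
def offset_fixed_width(n: int, width: int = 2) -> int:
    """Byte offset of character n in a fixed-width encoding: one multiplication."""
    return n * width


def offset_utf8(data: bytes, n: int) -> int:
    """Byte offset of character n in UTF-8 data: a scan from the start.

    Bytes of the form 10xxxxxx are continuation bytes; every other byte
    begins a new character.
    """
    seen = -1
    for i, b in enumerate(data):
        if b & 0xC0 != 0x80:  # this byte starts a character
            seen += 1
            if seen == n:
                return i
    raise IndexError("character index out of range")


if __name__ == "__main__":
    text = "Café日本"
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-be")
    print(offset_utf8(utf8, 4))            # 5, found by scanning
    print(offset_fixed_width(4))           # 8, computed directly
    print(utf16[8:10].decode("utf-16-be")) # 日
```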

Many data behaviors are language dependent. The following sections look at the language-dependent behavior you can expect to see in Oracle 9i and at how to control the way your database handles it.