Lesson 3: Globalization Issues
The .NET Framework handles many globalization issues automatically, based on the setting in the CurrentCulture property. However, globalization issues also affect some basic programming techniques. In this lesson, you'll learn about some general programming issues you need to think about when developing global applications. You'll also learn about issues that arise when storing and presenting the extended character sets used by some languages.
After this lesson, you will be able to
Discuss some of the general programming issues affected by globalization
Save Web forms containing special characters from other languages
Use Web forms stored in different character encodings
Detect the user's requested character encoding
Create responses using different character encodings
Estimated lesson time: 10 minutes
General Programming Issues
The Best Practices for Developing World-Ready Applications topic in the Visual Studio .NET online Help provides a helpful list of items that you should consider when developing Web applications that will be translated into different languages. The following sections provide more detail on points from that topic that are relevant to Web applications.
Sorting and Comparing Strings
Sorting and string comparisons that you perform using the .NET Framework classes give the correct result for the current culture. You need to worry about globalization issues only when implementing your own techniques for those tasks, such as custom sorting procedures. In those cases, use the System.Globalization namespace's CompareInfo class to perform culturally aware string comparisons.
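The following console program is a minimal sketch of a culturally aware comparison using the CompareInfo class. The class name CultureCompareSketch, the helper method CompareFor, and the sample strings are assumptions made for illustration. In a Web application, you would more likely use the CompareInfo of the current culture than create one by name.

```csharp
using System;
using System.Globalization;

public static class CultureCompareSketch
{
    // Hypothetical helper: compares two strings with the rules of the named culture.
    // The result is negative, zero, or positive, in the same way as String.Compare.
    public static int CompareFor(string cultureName, string left, string right,
                                 CompareOptions options)
    {
        CompareInfo comparer = new CultureInfo(cultureName).CompareInfo;
        return comparer.Compare(left, right, options);
    }

    public static void Main()
    {
        // Culture-aware, case-sensitive comparison.
        Console.WriteLine(CompareFor("en-US", "apple", "banana", CompareOptions.None));

        // Culture-aware comparison that ignores case.
        Console.WriteLine(CompareFor("en-US", "apple", "APPLE", CompareOptions.IgnoreCase));

        // The same pair of strings can sort differently under another culture's rules.
        Console.WriteLine(CompareFor("sv-SE", "z", "\u00E4", CompareOptions.None));
        Console.WriteLine(CompareFor("de-DE", "z", "\u00E4", CompareOptions.None));
    }
}
```

A custom sorting procedure can call a helper like this one wherever it would otherwise compare character values directly.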
Custom Validation Controls
ASP.NET server controls acquire specific properties based on the current culture. This includes the validation controls. You do need to consider globalization issues when creating custom validation controls, however.
For example, if you are writing a custom validation procedure that verifies that input is numeric, don't rely on the numeric value of the input characters. Instead, use a culturally aware test, such as calling a Convert class method within an exception-handling structure and treating a conversion exception as non-numeric input.
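The following sketch shows one way to write such a test. The class name NumericCheckSketch and the method IsNumeric are assumptions made for illustration, and the German culture is chosen only because it uses a comma as the decimal separator.

```csharp
using System;
using System.Globalization;

public static class NumericCheckSketch
{
    // Hypothetical helper: returns true if the input converts to a number
    // under the formatting rules of the given culture.
    public static bool IsNumeric(string input, CultureInfo culture)
    {
        // Convert.ToDouble returns 0 for a null string, so reject
        // null and blank input before attempting the conversion.
        if (input == null || input.Trim().Length == 0)
        {
            return false;
        }

        try
        {
            Convert.ToDouble(input, culture);
            return true;
        }
        catch (FormatException)
        {
            // The input is not a number in this culture's format.
            return false;
        }
        catch (OverflowException)
        {
            // Some runtime versions throw when the value is out of range.
            return false;
        }
    }

    public static void Main()
    {
        CultureInfo german = new CultureInfo("de-DE");
        Console.WriteLine(IsNumeric("3,14", german)); // True
        Console.WriteLine(IsNumeric("abc", german));  // False
    }
}
```

In a custom validation control, you would typically pass CultureInfo.CurrentCulture so that the test follows the culture of the current request.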
Building Strings
Because some languages read from right to left, and because some words change meaning when in proximity to other words, it's difficult to build strings using concatenation in a way that works for all cultures. Instead, store complete strings in a resource file and retrieve them as needed by using the ResourceManager class.
Getting Substrings
In some languages, a single displayed symbol (called a glyph) can be composed of more than one character in the string. In those cases, you might need to adjust how you get substrings. For example, getting the first character of a user's response would be meaningless in those cases.
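One way to handle this, sketched here, is to use the System.Globalization namespace's StringInfo class, which works with text elements (the units that are displayed as a single symbol) rather than with individual characters. The class name TextElementSketch, the method FirstTextElement, and the sample string are assumptions made for illustration.

```csharp
using System;
using System.Globalization;

public static class TextElementSketch
{
    // Hypothetical helper: returns the first displayed symbol of the text,
    // including any combining characters that belong to it.
    public static string FirstTextElement(string text)
    {
        if (text == null || text.Length == 0)
        {
            return String.Empty;
        }
        return StringInfo.GetNextTextElement(text, 0);
    }

    public static void Main()
    {
        // "e" followed by U+0301 (combining acute accent) is displayed as one symbol.
        string response = "e\u0301cole";

        Console.WriteLine(response.Substring(0, 1).Length);   // 1: the bare "e", accent lost
        Console.WriteLine(FirstTextElement(response).Length); // 2: "e" plus its accent
    }
}
```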
Character Encoding
Characters within strings can be represented many different ways within a computer. These different ways of representing characters are called character encodings because they encode characters into numbers that the computer can understand.
The common language runtime (CLR) uses the UTF-16 character encoding, which maps characters to 16-bit numbers. A single 16-bit number provides enough room for the characters of almost all written languages; the remaining characters are represented by a pair of 16-bit numbers. Unfortunately, for text that is mostly ASCII, that's about double the size that other character encoding schemes use, which makes the UTF-16 character encoding less than optimal for transmitting text over the Internet.
Therefore, ASP.NET uses the UTF-8 character encoding by default when interpreting requests and composing responses. UTF-8 is optimized for the 128 ASCII characters (values 0 through 127), each of which it encodes in a single byte, which means it provides an efficient way to encode languages that use the Latin alphabet. UTF-8 is also backward-compatible with the ASCII character encoding, meaning that UTF-8 readers can interpret ASCII files.
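To see the size difference for yourself, you can compare the number of bytes each encoding needs for the same string. The following sketch uses the System.Text namespace's Encoding class; the class name EncodingSizeSketch, its helper methods, and the sample strings are assumptions made for illustration.

```csharp
using System;
using System.Text;

public static class EncodingSizeSketch
{
    // Number of bytes needed to store the string as UTF-8.
    public static int Utf8Bytes(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    // Number of bytes needed to store the string as UTF-16
    // (Encoding.Unicode is the little-endian UTF-16 encoding).
    public static int Utf16Bytes(string text)
    {
        return Encoding.Unicode.GetByteCount(text);
    }

    public static void Main()
    {
        string english = "Hello";
        string japanese = "\u3053\u3093\u306B\u3061\u306F"; // five hiragana characters

        // ASCII text: one byte per character in UTF-8, two in UTF-16.
        Console.WriteLine("{0} / {1}", Utf8Bytes(english), Utf16Bytes(english));   // 5 / 10

        // Hiragana: three bytes per character in UTF-8, two in UTF-16.
        Console.WriteLine("{0} / {1}", Utf8Bytes(japanese), Utf16Bytes(japanese)); // 15 / 10
    }
}
```

The second result shows the trade-off: UTF-8 is compact for ASCII text but can be larger than UTF-16 for other scripts.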
The CLR handles the conversion between encodings by using classes from the System.Text namespace. You usually don't need to be aware that any of this is going on, with the following exceptions:
If you save a Web form or an HTML file that includes characters outside the ASCII range
If you need to support specific encodings other than UTF-8
Saving Encoded Files
When you create a Web form that contains characters outside the ASCII range (the first 128 characters), Visual Studio .NET prompts you to save the file using an encoding, as shown in Figure 15-9.
Figure 15-9. Saving files with character encoding
To save a file with a specific encoding, follow these steps:
From the File menu, choose Save As, click the Save drop-down arrow, and then select Save With Encoding. Visual Studio .NET displays the Advanced Save Options dialog box, as shown in Figure 15-10.
Figure 15-10. Specifying encoding while saving a file
Choose the encoding you want to use for saving the file, and then click OK.
Saving the file using the UTF-8 character encoding with signature allows ASP.NET to automatically detect the encoding of the file. If you don't choose to include the signature, or if you use another encoding, you must specify the fileEncoding attribute in the globalization element of the Web.config file, as shown here:
<globalization requestEncoding="utf-8" responseEncoding="utf-8" fileEncoding="utf-8" />
Using Other Encodings
By default, Web applications interpret requests and compose responses using the UTF-8 character encoding. These defaults are set in the globalization element of the Web.config file, as shown in the preceding section.
If a Web application receives a request in an encoding other than UTF-8, ASP.NET interprets that request using the request's encoding. However, ASP.NET does not automatically generate the response in that encoding.
To detect when a request has a specific encoding, get the Request object's ContentEncoding property. For example, the following code displays the encoding of a request on a Web form:
Visual Basic .NET
Response.Write(Request.ContentEncoding.WebName)
Visual C#
Response.Write(Request.ContentEncoding.WebName);
To specify an encoding for a response, set the Response object's ContentEncoding property using the System.Text namespace's Encoding class, or use the value returned from the Request object's ContentEncoding property. For example, the following code creates a response using the shift_JIS character encoding:
Visual Basic .NET
Imports System.Text

Private Sub Page_Load(ByVal sender As System.Object, _
  ByVal e As System.EventArgs) Handles MyBase.Load
    Response.ContentEncoding = Encoding.GetEncoding("shift_JIS")
    Response.Write("This response uses the character encoding: ")
    Response.Write(Response.ContentEncoding.WebName)
End Sub
Visual C#
using System.Text;

private void Page_Load(object sender, System.EventArgs e)
{
    Response.ContentEncoding = Encoding.GetEncoding("shift_JIS");
    Response.Write("This response uses the character encoding: ");
    Response.Write(Response.ContentEncoding.WebName);
}
For more information about character encoding, see the following online Help topics:
Unicode in the .NET Framework
Encoding Base Types
System.Text Namespace
You can also find information about character encoding on the following Web sites:
http://www.unicode.org
http://www.w3.org/TR/REC-html40/charset.html