International Domain Name Mapping


The Domain Name System (DNS), on which the World Wide Web depends, was designed before the Web and is rooted firmly in the ASCII character set. As a result, all domain names had to conform to 7-bit ASCII (and, in practice, host names are restricted further to letters, digits, and the hyphen). This tiny range (U+0000 to U+007F) covers the English language and very few others. The problem faced by the rest of the world was how to create domain names that used characters outside this range yet still worked with the antiquated ASCII-based DNS.

Before we look at the solution, let's be clearer about what the problem is from the developer's point of view. Open Internet Explorer 6 or earlier and navigate to www.i18ncafé.com. Internet Explorer cannot navigate to this page because the domain name cannot be resolved: it contains an "e" with an acute accent (é), which is outside the 7-bit ASCII range. If you use Internet Explorer 7, Firefox, Mozilla, Opera, or Safari, you will be able to navigate to this page because all of these browsers support international domain names (IDNs). Developers need to care about this problem because if your application must navigate to such a page using Internet Explorer 6 or earlier, send an e-mail to an address on such a domain, or interact with such a domain in any other way that uses DNS, it must first convert the name to its ASCII equivalent.

In 2003, the IETF published the "Internationalizing Domain Names in Applications (IDNA)" standard (RFC 3490) as an interim solution until DNS fully supports Unicode. Remember that the Internet is the world's largest legacy system, so upgrading it is not a fast process. IDNA is an encoding mechanism that converts Unicode domain names into ASCII domain names that DNS servers everywhere can resolve. The .NET Framework 2.0 includes the IdnMapping class (in the System.Globalization namespace), which encapsulates the IDNA encoding mechanism. Figure 6.9 shows a Windows Forms application that illustrates the IDN problem and its solution.

Figure 6.9. IDN Mapping Problem


The "Go" button next to the International Domain Name TextBox contains the following code:

 webBrowser1.Navigate(textBoxIDN.Text); 


When the button is clicked, the WebBrowser control at the bottom of the form fails to navigate to the domain because the name cannot be resolved. The "IDN To ASCII" button contains the following code:

 IdnMapping idnMapping = new IdnMapping();
 textBoxASCII.Text = idnMapping.GetAscii(textBoxIDN.Text);


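When the button is clicked, the ASCII TextBox is filled with the encoded domain name "http://www.xn--i18ncaf-hya.com". The "xn--" prefix marks a label that has been encoded with Punycode (RFC 3492), the encoding that IDNA uses; only the label containing the non-ASCII character is changed. The second "Go" button contains the following code: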

 webBrowser1.Navigate(textBoxASCII.Text); 


The browser can successfully navigate to the ASCII domain name. To convert in the other direction, from ASCII to Unicode, you use the IdnMapping.GetUnicode method:

 IdnMapping idnMapping = new IdnMapping();
 textBoxIDN.Text = idnMapping.GetUnicode(textBoxASCII.Text);
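The same two calls can be tried outside Windows Forms. The following is a minimal console sketch, assuming a modern .NET SDK (.NET 5 or later, for C# top-level statements) rather than the .NET Framework 2.0 used for Figure 6.9. It passes only the host name to IdnMapping, because IDNA is defined for domain names rather than whole URLs; the variable names are illustrative.

```csharp
using System;
using System.Globalization;

// IdnMapping wraps the IDNA ToASCII and ToUnicode operations.
IdnMapping idnMapping = new IdnMapping();

// The host name as a user would type it; \u00E9 is "é", written as an
// escape so the example does not depend on how the editor saves accents.
string unicodeHost = "www.i18ncaf\u00E9.com";

// Unicode to ASCII: only the label with the non-ASCII character is encoded.
string asciiHost = idnMapping.GetAscii(unicodeHost);

// ASCII to Unicode: decoding the "xn--" label restores the original name.
string roundTrip = idnMapping.GetUnicode(asciiHost);

Console.WriteLine(asciiHost);   // www.xn--i18ncaf-hya.com
Console.WriteLine(roundTrip);   // www.i18ncafé.com
```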


The strategy, then, is to use the IdnMapping class so that your application shows Unicode domain names in the user interface but uses the ASCII domain names in every programmatic operation (such as navigating to a page or sending an e-mail).
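Sending e-mail is a good example of this split. Only the part of an address after the @ is a domain name; IDNA does not encode the local part before it. The helper below, ToAsciiAddress, is hypothetical (it is not part of the .NET Framework), and the sketch assumes .NET 5 or later for C# top-level statements. It keeps the Unicode address for display and produces the ASCII form for the mail system; the address itself is made up.

```csharp
using System;
using System.Globalization;

IdnMapping idn = new IdnMapping();

// Hypothetical helper: encode only the domain part of an e-mail address.
// The local part (before the last '@') is returned unchanged.
string ToAsciiAddress(string address)
{
    int at = address.LastIndexOf('@');
    if (at <= 0 || at == address.Length - 1)
        throw new ArgumentException("Expected local@domain.", nameof(address));

    string localPart = address.Substring(0, at);
    string domain = address.Substring(at + 1);
    return localPart + "@" + idn.GetAscii(domain);
}

// Show this form in the user interface (\u00F1 is "ñ")...
string displayAddress = "info@espa\u00F1a.com";
// ...and hand this form to SMTP, DNS lookups, and so on.
string wireAddress = ToAsciiAddress(displayAddress);

Console.WriteLine(wireAddress);   // info@xn--espaa-rta.com
```

Throwing on a malformed address is a design choice for the sketch; an application might instead validate the address before calling the helper.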

Table 6.18 shows several Unicode domain names and their ASCII-encoded equivalents. The list should make clear why Anglicizing all domain names is an unacceptable solution for people who do not use English as a primary language. Note also that names that need no conversion are left unchanged (www.microsoft.com comes back as www.microsoft.com), so it is safe to use the IdnMapping class everywhere without fear of breaking existing code.

Table 6.18. International Domain Name Examples

Unicode Domain Name | Non-ASCII Characters | ASCII Domain Name
www.microsoft.com | None | www.microsoft.com
www.Bodenschätze.de | German | www.xn--bodenschtze-s8a.de
www.münchhausen.at | German | www.xn--mnchhausen-9db.at
www.金銭.com | Japanese | www.xn--5m4ayq.com
www.부산항.com | Korean | www.xn--or3bi2dc4z.com
www.gulvmiljø.no | Norwegian | www.xn--gulvmilj-d5a.no
www.españa.com | Spanish | www.xn--espaa-rta.com
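The ASCII column of Table 6.18 can be checked directly against IdnMapping.GetAscii. The console sketch below assumes .NET 5 or later (for C# top-level statements) and uses a handful of rows from the table, including the pure-ASCII www.microsoft.com row, which should pass through unchanged.

```csharp
using System;
using System.Globalization;

IdnMapping idn = new IdnMapping();

// Rows taken from Table 6.18: (Unicode name, expected ASCII name).
(string Unicode, string Ascii)[] rows =
{
    ("www.microsoft.com", "www.microsoft.com"),   // nothing to convert
    ("www.münchhausen.at", "www.xn--mnchhausen-9db.at"),
    ("www.gulvmiljø.no", "www.xn--gulvmilj-d5a.no"),
    ("www.españa.com", "www.xn--espaa-rta.com"),
};

// Encode each Unicode name and count any result that differs from the table.
int mismatches = 0;
foreach (var (unicode, expected) in rows)
{
    string actual = idn.GetAscii(unicode);
    if (actual != expected)
        mismatches++;
    Console.WriteLine($"{unicode,-20} -> {actual}");
}

Console.WriteLine($"Mismatches: {mismatches}");
```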


International Domain Names and Visual Spoofing

One of the concerns that you will often see raised with regard to international domain names is visual spoofing. Table 6.19 illustrates the problem.

Table 6.19. Visual Spoofing Using International Domain Names

Original Domain Name

Spoofed Domain Name

ASCII Spoofed Domain Name

www.microsoft.com

www.microsoft.com

www.xn--mirsft-k0eb08i.xn--m-rmb30b

www.gotdotnet.com

www.gotdotnet.com

www.xn--gtdtnt-i0ec87h.xn--m-rmb30b

www.cnn.com

www.cnn.com

www.xn--nn-nmc.xn--m-rmb30b

www.addisonwesley.com

www.addisonwesley.com

www.xn--ddisnwsly-t1g25j6bc.xn--m-rmb30b


Can you see a difference between the original domain name and the spoofed domain name? No, neither can I, but they are different, as the ASCII form of each spoofed name shows. Certain letters have been replaced with letters from other scripts that look identical in many fonts. The Latin Small Letter O (U+006F), for example, has been replaced with the Greek Small Letter Omicron (U+03BF). In general, spoof characters are drawn from the Cherokee, Cyrillic, and Greek scripts. The danger is that people see links to Web sites and e-mail addresses (often in spam e-mails) and trust them because they look genuine. Visual spoofing itself isn't new, but international domain names greatly widen its scope. It doesn't change the steps you need to take when internationalizing your applications, but you should be aware of this security issue (see http://www.unicode.org/reports/tr36/ for more details).
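One partial defence is to make the substitution visible. The helper below, NonAsciiCodePoints, is a hypothetical sketch (not a framework method) that assumes .NET 5 or later for C# top-level statements and System.Text.Rune; it lists the non-ASCII code points in a name alongside the name's ASCII encoding. It is not a spoof detector: a legitimate Spanish or Japanese name is flagged too, and proper confusable detection is described in Unicode Technical Standard #39.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

IdnMapping idn = new IdnMapping();

// Hypothetical helper: list every non-ASCII code point as "U+XXXX".
// Rune walks whole code points, so characters outside the Basic
// Multilingual Plane are not split into surrogate halves.
List<string> NonAsciiCodePoints(string name)
{
    var found = new List<string>();
    foreach (Rune rune in name.EnumerateRunes())
    {
        if (rune.Value > 0x7F)
            found.Add($"U+{rune.Value:X4}");
    }
    return found;
}

// "microsoft" with one Latin "o" (U+006F) swapped for Greek omicron (U+03BF).
// The escape keeps the substitution visible in the source code.
string spoofed = "www.micr\u03BFsoft.com";

List<string> suspicious = NonAsciiCodePoints(spoofed);
string encoded = idn.GetAscii(spoofed);

Console.WriteLine($"Displayed as: {spoofed}");
Console.WriteLine($"Encoded as:   {encoded}");
Console.WriteLine($"Non-ASCII:    {string.Join(", ", suspicious)}");
```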




.NET Internationalization: The Developer's Guide to Building Global Windows and Web Applications
ISBN: 0321341384
Year: 2006
Pages: 213
