IIS and its related technologies, such as ASP, are key to developing a non-English Web site. The international features associated with IIS include its code-page settings, request collections, Uniform Resource Identifier (URI) processing, and logging, among other features discussed in the sections that follow.
When the server receives a request for an ASP file, it processes the server-side scripts in the file to build the Web page that is sent to the browser. In IIS 6, ASP and the script engines it supports use Unicode internally. If you author all of your pages in the default code page of the Web server, ASP automatically converts strings to Unicode. (Again, this is true only for IIS 6.) If a script was not created in the default code page of the Web server or the browser, you need to specify its code page. Doing so allows strings to be converted correctly as they pass between the script and the ASP engine, between the ASP engine and the browser, and between the ASP engine and COM components. To specify the code page for an ASP page, you can use the Response.CodePage property, the Session.CodePage property, the AspCodePage metabase property, the @CODEPAGE directive, locale ID (LCID) settings, and the HTTP Charset attribute. (For more information on Response.CodePage and Session.CodePage, see Chapter 3, "Unicode.") Because IIS does not offer its own code-page or locale support, such as National Language Support (NLS) features, it relies on the support available within Windows. Consequently, when you select a code page (charset) or locale for an ASP page, session, or response, the corresponding Windows support must be installed on the server computer.
When a script is executed, Response.CodePage determines how characters are encoded in the response. If your ASP page runs on IIS 5.1 or later, it is always better to set Response.CodePage explicitly. This eliminates implicit behavior that can cause unexpected text transformations. If Response.CodePage is not set explicitly in a Web page, it is set implicitly according to the hierarchy described next.
Response.CodePage allows applications that don't enable session state to specify a code page dynamically. For run-time execution of ASP, the following logic for initializing Response.CodePage is used.
If session state is disabled:
if (@CODEPAGE defined)
    Response.CodePage = @CODEPAGE value
else if (AspCodePage property present)
    Response.CodePage = value at AspCodePage
else
    Response.CodePage = CP_ACP
If session state is enabled:
if (Session.CodePage set explicitly (via script execution))
    Response.CodePage = Session.CodePage
else if (@CODEPAGE defined)
    Response.CodePage = @CODEPAGE value
else if (AspCodePage property present (implicitly sets Session.CodePage))
    Response.CodePage = value at AspCodePage
else
    Response.CodePage = CP_ACP
If session state is enabled and the value of Response.CodePage is not set explicitly, the value of Session.CodePage is used during script execution to dynamically specify how the strings coming back from the script engine are to be sent to the client.
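The two initialization orders above can be modeled as a single function. This is a minimal sketch for reasoning about which setting wins. It is not IIS code. The function name, its parameters, and the CP_ACP string placeholder are all hypothetical.

```python
from typing import Optional, Union

# Placeholder for the server's active Windows "ANSI" code page.
CP_ACP = "CP_ACP"


def initial_response_codepage(
    session_enabled: bool,
    session_codepage_from_script: Optional[int] = None,
    codepage_directive: Optional[int] = None,
    asp_codepage_property: Optional[int] = None,
) -> Union[int, str]:
    """Model the value Response.CodePage starts with when the page never sets it."""
    # A Session.CodePage set by script counts only when session state is enabled.
    if session_enabled and session_codepage_from_script is not None:
        return session_codepage_from_script
    # Next comes the page's @CODEPAGE directive.
    if codepage_directive is not None:
        return codepage_directive
    # Then the AspCodePage metabase property (with sessions enabled, this
    # value also implicitly initializes Session.CodePage).
    if asp_codepage_property is not None:
        return asp_codepage_property
    # Finally, fall back to the server's active code page.
    return CP_ACP


# Session state disabled: a script-set Session.CodePage plays no part.
print(initial_response_codepage(False, 932, 65001))         # 65001
# Session state enabled: an explicit Session.CodePage wins over @CODEPAGE.
print(initial_response_codepage(True, 932, 65001))          # 932
print(initial_response_codepage(True, None, None, 1252))    # 1252
print(initial_response_codepage(False))                     # CP_ACP
```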
AspCodePage resides at the application level in the metabase. It supplies the default initial value for Session.CodePage, Response.CodePage, and @CODEPAGE. When an ASP page is compiled, however, an @CODEPAGE directive in the page overrides this default. The compile-time code page for Global.asa is specified by AspCodePage.
The @CODEPAGE value is used to compile the ASP file, for example to convert the contents of the file to Unicode as the script-engine interface requires. Specifying a code page in this way ensures that literal strings are converted correctly. The value is also used during run-time execution, but it does not persist beyond the executing script. When an ASP page is compiled, the code page is chosen in this order:
if (@CODEPAGE defined)
    use @CODEPAGE value
else if (AspCodePage property present)
    use value at AspCodePage
else
    use CP_ACP
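The compile-time order matters because it decides how the bytes of literal strings in the .asp file become Unicode. The following sketch makes that concrete. It assumes Python's "cp932" and "cp1252" codecs behave like the corresponding Windows code pages. All function names are hypothetical.

```python
from typing import Optional


def compile_time_codepage(codepage_directive: Optional[int],
                          asp_codepage_property: Optional[int],
                          active_codepage: int) -> int:
    """Pick the compile code page: @CODEPAGE, then AspCodePage, then CP_ACP."""
    if codepage_directive is not None:
        return codepage_directive
    if asp_codepage_property is not None:
        return asp_codepage_property
    return active_codepage


def python_codec(codepage: int) -> str:
    # Python names Windows code-page codecs "cp932", "cp1252", and so on;
    # code page 65001 (UTF-8) is mapped to "utf-8" explicitly.
    return "utf-8" if codepage == 65001 else f"cp{codepage}"


def compile_literal(source_bytes: bytes, codepage: int) -> str:
    """Convert a literal from the file's bytes to Unicode, as compilation does."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return source_bytes.decode(python_codec(codepage), errors="replace")


# A literal saved on disk in Shift-JIS (code page 932).
literal = "日本".encode("cp932")
right = compile_literal(literal, compile_time_codepage(932, 1252, 1252))
wrong = compile_literal(literal, compile_time_codepage(None, None, 1252))
print(right == "日本", wrong == "日本")  # True False
```

Without @CODEPAGE or AspCodePage, the Japanese literal is read through the Western European code page. It silently becomes unrelated Latin characters instead of failing.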
Be aware that Server.Execute can change the caller's encoding. Suppose a page run through Server.Execute declares an @CODEPAGE value different from that of the calling ASP page. When the child page returns, Response.CodePage is not reset to the caller's value. Instead, the encoding remains in the child's code page.
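A toy model shows why this catches callers off guard: Response.CodePage is shared state that the child page overwrites. The workaround below saves the caller's value and restores it after the call. It assumes the equivalent ASP pattern works, namely storing Response.CodePage in a variable before Server.Execute and assigning it back afterward. The Response class and both functions are hypothetical stand-ins.

```python
class Response:
    """Hypothetical stand-in holding only the CodePage setting."""

    def __init__(self, codepage: int) -> None:
        self.codepage = codepage


def server_execute(response: Response, child_codepage_directive: int) -> None:
    # Modeled behavior from the text: the child's @CODEPAGE re-initializes
    # the shared Response.CodePage, and nothing restores it on return.
    response.codepage = child_codepage_directive


def server_execute_preserving_codepage(response: Response,
                                       child_codepage_directive: int) -> None:
    # Workaround sketch: remember the caller's code page and put it back.
    saved = response.codepage
    try:
        server_execute(response, child_codepage_directive)
    finally:
        response.codepage = saved


r = Response(65001)
server_execute(r, 932)
print(r.codepage)  # 932: the caller now writes in the child's code page

r = Response(65001)
server_execute_preserving_codepage(r, 932)
print(r.codepage)  # 65001
```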
The same basic logic that Response.CodePage, Session.CodePage, and @CODEPAGE use applies to the LCID settings: Response.LCID, Session.LCID, the AspLCID metabase property, and @LCID. If Session.LCID is not explicitly set in a page, it is implicitly set by the AspLCID metabase property. If AspLCID is not set, or is set to zero, Session.LCID is set from the default system locale. Session.LCID can be set multiple times in one Web page and used to format data each time. (For more information on LCIDs, see Chapter 4, "Locale and Cultural Awareness.")
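Here is a minimal sketch of the Session.LCID fallback described above. The function and parameter names are hypothetical. A value of None or 0 for AspLCID is treated as "not set," as the text specifies.

```python
from typing import Optional


def initial_session_lcid(explicit_lcid: Optional[int],
                         asp_lcid: Optional[int],
                         system_default_lcid: int) -> int:
    """Model how Session.LCID is initialized when a page does not set it."""
    if explicit_lcid is not None:
        return explicit_lcid
    # AspLCID counts only when it is present and nonzero.
    if asp_lcid:
        return asp_lcid
    return system_default_lcid


# 1033 = English (United States), 1041 = Japanese, 1036 = French (France)
print(initial_session_lcid(None, 0, 1033))      # 1033
print(initial_session_lcid(None, 1041, 1033))   # 1041
print(initial_session_lcid(1036, 1041, 1033))   # 1036
```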
The Charset attribute, which specifies the character set used in an HTML page, is unchanged in IIS 6. The Indexing Service uses the character set specified by Charset to read the text in the HTML page correctly. It is recommended that you set Response.CodePage, Response.CharSet, and the Charset attribute to matching values (for example, code page 932 with the charset name shift_jis). For a file written in Japanese, you would set the Charset attribute to shift_jis, as shown in the following example:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=shift_jis">
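Keeping the code page and the declared charset in step is easiest when both come from a single setting. This sketch derives the encoding and the META charset label from one code-page number. The lookup tables are hypothetical and cover only three code pages. Python's "cp932" codec is Microsoft's extension of Shift-JIS, which is what code page 932 denotes.

```python
from typing import Tuple

# Hypothetical lookup tables covering only three code pages.
CODEPAGE_TO_CHARSET = {932: "shift_jis", 1252: "windows-1252", 65001: "utf-8"}
CODEPAGE_TO_CODEC = {932: "cp932", 1252: "cp1252", 65001: "utf-8"}


def render_page(body: str, codepage: int) -> Tuple[str, bytes]:
    """Encode a page with one code page and label it with the matching charset."""
    charset = CODEPAGE_TO_CHARSET[codepage]
    meta = f'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset={charset}">'
    html = f"<HTML><HEAD>{meta}</HEAD><BODY>{body}</BODY></HTML>"
    # The bytes and the declared charset come from the same code page, so they agree.
    return charset, html.encode(CODEPAGE_TO_CODEC[codepage])


charset, data = render_page("日本語", 932)
print(charset)                        # shift_jis
print(b"charset=shift_jis" in data)   # True
```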
Another international feature of IIS involves request collections. The following section discusses how these collections work.
Support for Windows code pages is still necessary for two reasons. On IIS 4 and IIS 5, customers had to build content with non-Unicode character sets rather than UTF-8, and ASP execution on those versions did not fully support UTF-8. The code page that the request collections use for processing is set with the Response.CodePage property. In addition, UTF-8 support has been added for the five request collections in ASP: ClientCertificate, Cookies, Form, QueryString, and ServerVariables.
Windows XP security fully supports ClientCertificate encoded in UTF-8.
IIS 6 converts cookie parameters to and from Unicode using the current Response.CodePage. This affects the NAME, VALUE, PATH, and DOMAIN elements of the cookie. If an element contains nonalphanumeric characters, those characters are URL-encoded (% followed by hexadecimal codes) after the WideCharToMultiByte conversion.
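The two cookie steps are a code-page conversion followed by %-escaping of nonalphanumeric bytes. The sketch below reproduces them as this text describes. Python's str.encode stands in for WideCharToMultiByte. The exact set of characters IIS leaves unescaped is assumed here to be ASCII letters and digits only. Both function names are hypothetical.

```python
import string
from urllib.parse import unquote_to_bytes

# Byte values of ASCII letters and digits; every other byte gets escaped.
_ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))


def encode_cookie_element(value: str, codec: str) -> str:
    """Sketch: Unicode -> code-page bytes, then %XX-escape nonalphanumeric bytes."""
    raw = value.encode(codec)  # plays the role of WideCharToMultiByte
    return "".join(chr(b) if b in _ALNUM else f"%{b:02X}" for b in raw)


def decode_cookie_element(encoded: str, codec: str) -> str:
    """Reverse the sketch: unescape %XX, then convert the bytes back to Unicode."""
    return unquote_to_bytes(encoded).decode(codec)


print(encode_cookie_element("a b", "cp1252"))   # a%20b
print(encode_cookie_element("日本", "utf-8"))    # %E6%97%A5%E6%9C%AC
print(decode_cookie_element(encode_cookie_element("日本", "cp932"), "cp932"))  # 日本
```

The same cookie value produces different escaped bytes under different code pages. This is why Response.CodePage must match on the requests that write a cookie and the requests that read it back.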
All properties of the Form collection support Windows code pages and UTF-8.
All properties of the QueryString collection support Windows code pages and UTF-8.
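Both Form (for URL-encoded bodies) and QueryString carry percent-encoded bytes, so the processing code page decides which characters come out. This sketch uses Python's urllib.parse.parse_qs, whose encoding argument plays the role of Response.CodePage here. The helper name read_collection is hypothetical.

```python
from urllib.parse import parse_qs, quote


def read_collection(encoded: str, codec: str) -> dict:
    """Decode a URL-encoded query string or form body using the given code page."""
    return parse_qs(encoded, encoding=codec, errors="replace")


# The browser sent the value in Shift-JIS (code page 932).
query = "name=" + quote("日本", encoding="cp932")
print(read_collection(query, "cp932")["name"])              # ['日本']
print(read_collection(query, "utf-8")["name"] == ["日本"])   # False
```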
In IIS 6, all file names and metabase paths are in Unicode. This is a change from IIS 4 and IIS 5, where file names and paths depended on CP_ACP. The core Web server also maintains them as Unicode, and this Unicode data now flows through ASP into the scripting engines. To accommodate these changes, the ServerVariables in Table 15-2 now also support Unicode. (Go to http://msdn.microsoft.com to learn more about each variable.)
Table 15-2 Unicode-supported ServerVariables.
APP_POOL_ID | LOGON_USER
APPL_MD_PATH | PATH_INFO
APPL_PHYSICAL_PATH | PATH_TRANSLATED
AUTH_USER | REMOTE_HOST
CERT_ISSUER | REMOTE_USER
CERT_SERVER_ISSUER | SCRIPT_NAME
CERT_SERVER_SUBJECT | SCRIPT_TRANSLATED
CERT_SUBJECT | SERVER_NAME
HTTPS_SERVER_ISSUER | UNMAPPED_REMOTE_USER
HTTPS_SERVER_SUBJECT | URL
INSTANCE_META_PATH
Another international feature of IIS 6 is URI processing. ASP can now receive code-page-based or UTF-8 URIs from the client. When a client sends a URI that contains double-byte character set (DBCS) characters, IIS first tries to resolve the URI as UTF-8. If the URI is valid UTF-8, it is simply converted from UTF-8 to Unicode. If the URI is not a valid UTF-8 string, or if IIS cannot resolve it as UTF-8, IIS assumes the URI is encoded in the server's active Windows code page and calls MultiByteToWideChar to convert it to Unicode.
On servers where the active code page is a DBCS encoding, the algorithm first tries to resolve the URI as a DBCS string in the current system locale, and not as a UTF-8 string. The rest of the algorithm is not changed.
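This sketch models the decode order from the two paragraphs above: UTF-8 first, or the active code page first on DBCS servers. "Resolving" is modeled as finding the decoded path in a hypothetical set of existing files. A strict Python decode stands in for the validity check and for MultiByteToWideChar. The function name and parameters are hypothetical.

```python
from typing import Iterable, Optional
from urllib.parse import quote, unquote_to_bytes


def resolve_uri(raw_uri: str, existing_paths: Iterable[str],
                acp_codec: str, acp_is_dbcs: bool) -> Optional[str]:
    """Sketch of the decode order: UTF-8 first, unless the server ACP is DBCS."""
    paths = set(existing_paths)
    raw = unquote_to_bytes(raw_uri)
    order = [acp_codec, "utf-8"] if acp_is_dbcs else ["utf-8", acp_codec]
    for codec in order:
        try:
            candidate = raw.decode(codec)  # strict: invalid sequences raise
        except UnicodeDecodeError:
            continue
        # "Resolving" is modeled as finding the decoded path among known files.
        if candidate in paths:
            return candidate
    return None


site = {"/日本.htm"}
utf8_uri = quote("/日本.htm")                     # UTF-8 percent-encoding
sjis_uri = quote("/日本.htm", encoding="cp932")   # code-page percent-encoding
print(resolve_uri(utf8_uri, site, "cp1252", acp_is_dbcs=False))  # /日本.htm
print(resolve_uri(sjis_uri, site, "cp932", acp_is_dbcs=False))   # /日本.htm
print(resolve_uri(utf8_uri, site, "cp932", acp_is_dbcs=True))    # /日本.htm
```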
IIS 6 now supports Windows "ANSI" code-page or UTF-8 parameters for the methods and components shown in Table 15-3. (Go to http://msdn.microsoft.com to learn more about these methods and components.)
Table 15-3 Methods and components that support UTF-8.
Server.Execute | ContRot.dll
Server.MapPath | NextLink.dll
Server.Transfer | FileSystemObject
Server.URLEncode | Logging Utility
Server-Side Includes | Page Counter Component
Global.asa | Permission Checker Component
AdRot.dll | Tools Component
IIS 6 also supports many new Unicode server-support functions. The UNICODE_ server variables, however, are not exposed to ASP. They are intended for programs that work directly with Internet Server Application Programming Interface (ISAPI) extensions, a topic beyond the scope of this book. Using these variables in an ASP page returns an error. ASP does not need them, because ASP developers already work with Unicode.
Logging has also been updated to handle increasingly international content. The following section explains the support that logging offers.
To be more internationally flexible, IIS 6 can write log files in UTF-8. This feature is off by default, in which case log files are encoded in the active Windows "ANSI" code page (ACP). A server-level check box determines whether UTF-8 is used for logging. This setting cannot be changed while the Web or FTP services are running; to modify it, stop these services first. In the list of Windows services, the Web and FTP services appear as "World Wide Web Publishing Service" and "FTP Publishing Service," respectively.
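Here is why the UTF-8 option matters for a site with non-English URLs. A Western European ACP simply cannot represent Japanese path names. The sketch encodes one log line both ways. The space-separated field layout is hypothetical, not the real W3C log format. The "?" substitution comes from Python's errors="replace", which is assumed to approximate what an ACP log would lose.

```python
from typing import Sequence


def format_log_line(fields: Sequence[str], utf8_logging: bool,
                    acp_codec: str = "cp1252") -> bytes:
    """Encode one space-separated log line in UTF-8 or in the server's ACP."""
    line = " ".join(fields) + "\r\n"
    codec = "utf-8" if utf8_logging else acp_codec
    # errors="replace" writes "?" for characters the ACP cannot represent.
    return line.encode(codec, errors="replace")


entry = ["GET", "/日本.htm", "200"]
print(format_log_line(entry, utf8_logging=True).decode("utf-8").rstrip())  # GET /日本.htm 200
print(format_log_line(entry, utf8_logging=False))  # b'GET /??.htm 200\r\n'
```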
IIS 4 and IIS 5 have a few limitations in their international support. The following discussion describes these issues and how to handle them.