Recipe 12.8. Supporting Character Sets

Problem

Your Struts application needs to display characters from any language correctly.

Solution

Use the SetCharacterEncodingFilter example filter that ships with Tomcat, shown in Example 12-7.

Example 12-7. Using a filter to set the character encoding
/*
 * Copyright 2004 The Apache Software Foundation
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package filters;

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.UnavailableException;

/**
 * <p>Example filter that sets the character encoding to be used in parsing
 * the incoming request, either unconditionally or only if the client did not
 * specify a character encoding.  Configuration of this filter is based on
 * the following initialization parameters:</p>
 * <ul>
 * <li><strong>encoding</strong> - The character encoding to be configured
 *     for this request, either conditionally or unconditionally based on
 *     the <code>ignore</code> initialization parameter.  This parameter
 *     is required, so there is no default.</li>
 * <li><strong>ignore</strong> - If set to "true", any character encoding
 *     specified by the client is ignored, and the value returned by the
 *     <code>selectEncoding()</code> method is set.  If set to "false",
 *     <code>selectEncoding()</code> is called <strong>only</strong> if the
 *     client has not already specified an encoding.  By default, this
 *     parameter is set to "true".</li>
 * </ul>
 *
 * <p>Although this filter can be used unchanged, it is also easy to
 * subclass it and make the <code>selectEncoding()</code> method more
 * intelligent about what encoding to choose, based on characteristics of
 * the incoming request (such as the values of the <code>Accept-Language</code>
 * and <code>User-Agent</code> headers, or a value stashed
 * in the current user's session).</p>
 *
 * @author Craig McClanahan
 * @version $Revision: 1.5 $ $Date: 2005/03/21 18:08:09 $
 */
public class SetCharacterEncodingFilter implements Filter {

    // ------------------------------------------------- Instance Variables

    /**
     * The default character encoding to set for requests that pass through
     * this filter.
     */
    protected String encoding = null;

    /**
     * The filter configuration object we are associated with.  If this value
     * is null, this filter instance is not currently configured.
     */
    protected FilterConfig filterConfig = null;

    /**
     * Should a character encoding specified by the client be ignored?
     */
    protected boolean ignore = true;

    // ----------------------------------------------------- Public Methods

    /**
     * Take this filter out of service.
     */
    public void destroy() {
        this.encoding = null;
        this.filterConfig = null;
    }

    /**
     * Select and set (if specified) the character encoding to be used to
     * interpret request parameters for this request.
     *
     * @param request The servlet request we are processing
     * @param response The servlet response we are creating
     * @param chain The filter chain we are processing
     *
     * @exception IOException if an input/output error occurs
     * @exception ServletException if a servlet error occurs
     */
    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain)
        throws IOException, ServletException {

        // Conditionally select and set the character encoding to be used
        if (ignore || (request.getCharacterEncoding() == null)) {
            String encoding = selectEncoding(request);
            if (encoding != null)
                request.setCharacterEncoding(encoding);
        }

        // Pass control on to the next filter
        chain.doFilter(request, response);
    }

    /**
     * Place this filter into service.
     *
     * @param filterConfig The filter configuration object
     */
    public void init(FilterConfig filterConfig) throws ServletException {
        this.filterConfig = filterConfig;
        this.encoding = filterConfig.getInitParameter("encoding");
        String value = filterConfig.getInitParameter("ignore");
        if (value == null)
            this.ignore = true;
        else if (value.equalsIgnoreCase("true"))
            this.ignore = true;
        else if (value.equalsIgnoreCase("yes"))
            this.ignore = true;
        else
            this.ignore = false;
    }

    // -------------------------------------------------- Protected Methods

    /**
     * Select an appropriate character encoding to be used, based on the
     * characteristics of the current request and/or filter initialization
     * parameters.  If no character encoding should be set, return
     * <code>null</code>.
     * <p>
     * The default implementation unconditionally returns the value configured
     * by the <strong>encoding</strong> initialization parameter for this
     * filter.
     *
     * @param request The servlet request we are processing
     */
    protected String selectEncoding(ServletRequest request) {
        return (this.encoding);
    }
}

Then declare the filter in your web.xml file, setting the filter's encoding parameter to UTF-8 and mapping the filter to all URLs:

<filter>
    <filter-name>SetCharacterEncodingFilter</filter-name>
    <filter-class>
        filters.SetCharacterEncodingFilter
    </filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>ignore</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>SetCharacterEncodingFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
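
To see how the encoding and ignore parameters interact, the decision made in init() and doFilter() can be restated without the servlet API. The following is only a sketch: the EncodingChoice class and its two methods are hypothetical names introduced for illustration, and a plain String stands in for the value that request.getCharacterEncoding() would return.

```java
public class EncodingChoice {

    /**
     * Mirrors the parsing in init(): a missing value, "true", or "yes"
     * (in any letter case) means the client's encoding is ignored.
     */
    public static boolean parseIgnore(String value) {
        return value == null
                || value.equalsIgnoreCase("true")
                || value.equalsIgnoreCase("yes");
    }

    /**
     * Mirrors the condition in doFilter(): returns the encoding the request
     * ends up with. clientEncoding is hypothetical shorthand for the result
     * of request.getCharacterEncoding(); it may be null.
     */
    public static String effectiveEncoding(String clientEncoding,
                                           String configured,
                                           boolean ignore) {
        if (ignore || clientEncoding == null) {
            // The filter only overrides when an encoding was configured
            return configured != null ? configured : clientEncoding;
        }
        return clientEncoding;
    }

    public static void main(String[] args) {
        // ignore=true: the configured value always wins
        System.out.println(effectiveEncoding("ISO-8859-1", "UTF-8", true));  // UTF-8
        // ignore=false: the client's declared encoding is respected
        System.out.println(effectiveEncoding("ISO-8859-1", "UTF-8", false)); // ISO-8859-1
        // ignore=false, nothing declared: fall back to the configured value
        System.out.println(effectiveEncoding(null, "UTF-8", false));         // UTF-8
    }
}
```

With the web.xml settings shown above (encoding set to UTF-8, ignore set to true), every request is treated as UTF-8 regardless of what the client declares.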

Discussion

A servlet filter can ensure that your application decodes request data consistently, whatever language the user types in. The Tomcat distribution includes an example servlet filter that sets the servlet request character encoding to any desired value. Specifying UTF-8, a widely supported encoding of Unicode, ensures that characters from virtually any script can be handled.
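
The reason UTF-8 is a safe choice can be checked with the JDK alone. The sketch below is illustrative only: CharsetCoverage and its methods are hypothetical names, and the sample strings are written as Unicode escapes so the source file compiles under any source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCoverage {

    /** Reports whether every character of the text can be encoded in the charset. */
    public static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    /** Encodes the text to bytes and decodes it again with the same charset. */
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String russian = "\u041F\u0440\u0438\u0432\u0435\u0442"; // Cyrillic sample
        String japanese = "\u65E5\u672C\u8A9E";                   // Japanese sample

        // ISO-8859-1 has no code points for Cyrillic letters
        System.out.println(canRepresent(russian, StandardCharsets.ISO_8859_1)); // false

        // UTF-8 can encode both samples
        System.out.println(canRepresent(russian, StandardCharsets.UTF_8));      // true
        System.out.println(canRepresent(japanese, StandardCharsets.UTF_8));     // true

        // A UTF-8 round trip preserves the text exactly
        System.out.println(roundTrip(russian, StandardCharsets.UTF_8).equals(russian)); // true
    }
}
```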

For web applications, character encoding problems typically occur with forms. The user enters text containing non-Western characters, such as the Russian (Cyrillic) text shown in Figure 12-3, and submits the form.

Figure 12-3. Form fields containing Russian (Cyrillic) characters


But when the submitted data is displayed on a subsequent page, the characters appear as gibberish, as in Figure 12-4. When the server received the data, it didn't know which encoding to use to translate the byte sequence into the correct Cyrillic characters. If a request doesn't declare an encoding, a servlet container falls back on ISO-8859-1.

Figure 12-4. Incorrectly encoded characters
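
The garbling can be reproduced outside a servlet container. In the sketch below, the JDK's URLEncoder and URLDecoder stand in for the browser and the container respectively; that substitution is an assumption made for illustration, and FormDecodingDemo, submit, and receive are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class FormDecodingDemo {

    /** Encodes a value roughly as a browser would when submitting a form from a UTF-8 page. */
    public static String submit(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Decodes a submitted value using the given request encoding. */
    public static String receive(String encoded, String requestEncoding) {
        try {
            return URLDecoder.decode(encoded, requestEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown encoding: " + requestEncoding, e);
        }
    }

    public static void main(String[] args) {
        String typed = "\u041F\u0440\u0438\u0432\u0435\u0442"; // Cyrillic text typed by the user
        String wire = submit(typed);
        System.out.println(wire); // %D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82

        // Decoding the UTF-8 bytes as ISO-8859-1 yields one wrong character per byte
        System.out.println(receive(wire, "ISO-8859-1").equals(typed)); // false

        // Decoding with the encoding the filter would set restores the text
        System.out.println(receive(wire, "UTF-8").equals(typed));      // true
    }
}
```

Each Cyrillic letter occupies two bytes in UTF-8, so the six letters arrive as twelve bytes; read as ISO-8859-1, they become twelve unrelated characters.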


However, if you use the SetCharacterEncodingFilter, configured to set the request character encoding to UTF-8, the page displays correctly, as in Figure 12-5.

Figure 12-5. Correctly encoded Cyrillic characters


Browser and operating system support for non-Western character encodings varies by vendor and version. Most modern browsers allow you to set the default character encoding to UTF-8. Likewise, most operating systems allow you to input text using non-Western characters. It can be a challenge to keep it all straight, but for your application, this servlet filter solution eliminates a lot of the frustration.

See Also

I18nGurus.com (The Open Internationalization Resources Directory) has a wealth of information on internationalization topics. The links on using character sets and character encoding can be found at http://www.i18ngurus.com/docs/984813247.html.



    Jakarta Struts Cookbook
    ISBN: 059600771X
    Year: 2005
    Pages: 200
