Section 4.11. Using UTF-8 with APIs


An API has two vectors over which you're going to need to enforce a character set and encoding: input and output. (Throughout this book, the term API refers to external web services APIs, unless otherwise noted. We're not talking about language tools or classes.)

As far as the output goes, you probably already have it covered. If API responses are XML based, then you can use the same HTTP and XML headers as we previously discussed. If your output is HTML based, the HTTP header and <meta> tag combination will work fine.
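As a rough illustration of the XML case, the sketch below builds a response whose HTTP header and XML declaration both state UTF-8. The function name `build_xml_response` and the `<rsp>` envelope are made up for this example, and it assumes your framework accepts a list of header tuples plus a byte-string body.

```python
from xml.sax.saxutils import escape


def build_xml_response(message):
    """Return (headers, body) for a hypothetical XML API response."""
    # The XML declaration names the encoding inside the document itself.
    document = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<rsp stat="ok"><message>' + escape(message) + '</message></rsp>'
    )
    # Encode once, at the edge, so the bytes match what the headers claim.
    body = document.encode("utf-8")
    headers = [
        ("Content-Type", "text/xml; charset=utf-8"),
        # Content-Length counts bytes, not characters.
        ("Content-Length", str(len(body))),
    ]
    return headers, body
```

Computing the length from the encoded bytes rather than the string matters as soon as the message contains anything outside ASCII.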

For other custom outputs, using a BOM can be a good idea if you have some way to determine the start of a stream. If you can't or don't want to use a BOM, nothing beats just documenting what you're sending. Making your output character set and encoding explicit early on will guard against people developing applications that work at first but crash when they finally encounter some foreign text.
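For a custom output format, the BOM approach might look something like the following sketch. The helper names `write_with_bom` and `read_stream_start` are hypothetical, and the example assumes the consumer sees the stream from its first byte.

```python
import codecs


def write_with_bom(text):
    """Encode text as UTF-8 with a leading byte order mark."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


def read_stream_start(data):
    """Decode bytes as UTF-8, reporting whether a BOM was present."""
    had_bom = data.startswith(codecs.BOM_UTF8)
    # "utf-8-sig" strips a leading BOM if there is one and otherwise
    # behaves like plain UTF-8.
    return had_bom, data.decode("utf-8-sig")
```

A consumer written this way copes with both kinds of producer, which is useful when you cannot control whether the BOM survives intermediate tools.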

Input to APIs can be a bigger problem. As the saying goes, the only things less intelligent than computers are their users. If you expose a public API to your application, you can't guarantee that the text sent will be in the correct character set. As with all input vectors, it's extremely important to verify that all input is both valid and good, something we're going to look at in detail in the next chapter.
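The "valid" half of that check can be sketched in a few lines. The function name `decode_api_input` is an assumption for illustration, and the example assumes the raw request bytes are available before any framework has decoded them.

```python
def decode_api_input(raw):
    """Return the decoded text, or None if raw is not valid UTF-8."""
    try:
        # Strict decoding rejects truncated and overlong sequences
        # as well as bytes that can never appear in UTF-8.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return None
```

A caller would typically reject the request with an error when this returns `None`, rather than guessing at the sender's encoding. Passing this check says nothing about whether the text is "good" in the filtering sense.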



Building Scalable Web Sites: Building, Scaling, and Optimizing the Next Generation of Web Applications
ISBN: 0596102356
Year: 2006
Pages: 119
Authors: Cal Henderson
