20.2. Dealing with Unicode in TurboGears

A lot of effort has gone into making TurboGears play nicely with Unicode. Rather than make you responsible for handling everything yourself, TurboGears transparently encodes and decodes strings for you, most of the time.

For example, whenever you use turbogears.flash(), you pass it a Unicode string, which is encoded as UTF-8, stored in a cookie, and sent to the browser. When TurboGears receives the cookie back from the browser on a later request, it turns the cookie back into a Unicode string so that it can be processed in Python. Then, when the message is returned to the browser for display, it is re-encoded as UTF-8 so that it displays correctly to the user.
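The round trip described above can be sketched with the standard library alone. This is a minimal illustration, not TurboGears' actual flash() implementation: the helper names encode_flash and decode_flash are hypothetical, percent-escaping is only one plausible way to make the bytes cookie-safe, and the code is written for Python 3 rather than the Python 2 used elsewhere in this chapter.

```python
from urllib.parse import quote, unquote


def encode_flash(message: str) -> str:
    """Encode a Unicode message as UTF-8, then percent-escape the bytes
    so the result is plain ASCII and safe to use as a cookie value.
    (Hypothetical helper, for illustration only.)"""
    return quote(message.encode("utf-8"))


def decode_flash(cookie_value: str) -> str:
    """Reverse encode_flash: undo the escaping and decode the UTF-8 bytes."""
    return unquote(cookie_value, encoding="utf-8")


message = "Saved \u03a9"
cookie_value = encode_flash(message)   # ASCII-only text for the cookie
restored = decode_flash(cookie_value)  # Unicode string again
```

The point of the sketch is that the application only ever sees Unicode strings; the byte-level form exists only while the value travels in the cookie.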

20.2.1. SQLObject and Unicode

SQLObject contains _UnicodeCol_, which provides transparent encoding/decoding of Unicode values. It's just like _StringCol_ (in fact, it's a subclass of it) except that SQLObject encodes Python Unicode strings on the way to the database and decodes them on the way back. By default, UTF-8 encoding in the database table is assumed, which usually "just works."
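To make the "encode on the way in, decode on the way out" idea concrete, here is a small sketch that does by hand what the paragraph above attributes to UnicodeCol. It is not SQLObject code: it uses the standard sqlite3 module, a hypothetical table t1, and hypothetical helpers save_name and load_name, and it is written for Python 3.

```python
import sqlite3

ENCODING = "utf-8"  # assumption: mirrors the default encoding described above

conn = sqlite3.connect(":memory:")
# Store the name as raw bytes so the encoding step is explicit.
conn.execute("CREATE TABLE t1 (id INTEGER PRIMARY KEY, name BLOB)")


def save_name(name: str) -> int:
    """Encode the Unicode string before it goes to the database."""
    cur = conn.execute("INSERT INTO t1 (name) VALUES (?)",
                       (name.encode(ENCODING),))
    return cur.lastrowid


def load_name(row_id: int) -> str:
    """Decode the stored bytes back into a Unicode string."""
    (raw,) = conn.execute("SELECT name FROM t1 WHERE id = ?",
                          (row_id,)).fetchone()
    return raw.decode(ENCODING)


row_id = save_name("Greek \u03a9")
name = load_name(row_id)
```

A column type that performs both conversions for you removes the chance of forgetting one of them, which is the convenience UnicodeCol is described as providing.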

If you have the following SQLObject declaration, you'll get a database column that holds Unicode strings in UTF-8 form:

from sqlobject import SQLObject, UnicodeCol

class T1(SQLObject):
    name = UnicodeCol(length=40, alternateID=True)


The one caveat here is that your database of choice, and your database driver of choice, have to handle UTF-8. SQLObject helps you out where it can; but it can't get water from a stone or Unicode strings from an ASCII-only database!

Here's a sample usage:

>>> omega = T1(name=u'Greek \u03a9')
>>> omega
<T1 1 name=u'Greek \u03a9'>
>>> omega = T1.byName(u'Greek \u03a9')
>>> omega.name
u'Greek \u03a9'


There is a second important caveat to remember: transparent encoding/decoding breaks when you use custom SQL queries that you have built either with sqlbuilder or by hand. For example, if you submit the following query, you will get a UnicodeEncodeError.

>>> list(T1.select(T1.q.name==u'Greek \u03a9'))
Traceback (most recent call last):
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03a9' in position 54: ordinal not in range(128)


What happened? Well, the ASCII codec used in the database driver can't handle all the Unicode characters you threw at it, and it blew up.

You can use Unicode in TurboGears with all of the DB-API drivers that support Unicode queries, but Unicode Columns break if you use a database driver that doesn't support Unicode. You will likely get a different error if your database doesn't support Unicode strings, so you might have to watch out for that, too.

The lack of helpful error messages when Unicode support is missing from your database driver, or from your database itself, is a known problem and might be fixed by the time you read this; however, you'd better double-check. Right now you're likely to have trouble with Unicode and SQLite or the psycopg driver for postgres, but if you are using postgres, a simple switch to psycopg2 will likely resolve your Unicode-related problems.

To work around this problem, you can encode your Unicode strings manually:

>>> list(T1.select(T1.q.name==u'Greek \u03a9'.encode('utf8')))
[<T1 1 name=u'Greek \u03a9'>]
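The failure and the workaround can both be reproduced without a database. The sketch below is plain Python 3 and involves neither SQLObject nor a driver; it only shows why an implicit ASCII encode fails on the omega character while an explicit UTF-8 encode succeeds.

```python
text = "Greek \u03a9"

# An ASCII-only encode cannot represent the omega, which is what
# happens implicitly inside a driver that assumes ASCII.
try:
    text.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding explicitly to UTF-8 yields bytes that any byte-oriented
# layer can pass along unchanged.
utf8_bytes = text.encode("utf-8")
```

Here the error position is the index of the omega within the short string; in the traceback above it is reported relative to the full SQL statement, which is why the number is larger.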


20.2.2. Kid Templates

When you use Unicode strings with Kid, everything usually "just works."

TurboGears uses the _kid.encoding_ configuration option, which defaults to UTF-8, to automatically encode Kid HTML/XML output. The same option is used to set the "Content-Type" HTTP header so that the browser will display the output correctly.

One pitfall to watch for is Kid file encoding. Just like any other XML file, the encoding is typically declared in the XML header line of your Kid file. Make sure that the declared encoding matches the actual file content. If the header line and file content don't match, you might get strange "input file not well-formed" error messages from Kid even though the file is okay, just saved with the wrong encoding.
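The mismatch is easy to reproduce with any XML parser. The following sketch uses Python 3's xml.etree.ElementTree rather than Kid itself, so the exact error text Kid prints may differ; the tiny document is a made-up example.

```python
import xml.etree.ElementTree as ET

# A tiny document whose XML declaration claims UTF-8.
template = '<?xml version="1.0" encoding="utf-8"?>\n<p>caf\u00e9</p>'

good_bytes = template.encode("utf-8")       # content matches the declaration
bad_bytes = template.encode("iso-8859-1")   # saved as Latin-1 by mistake

parsed_text = ET.fromstring(good_bytes).text

# The Latin-1 byte for the accented letter is not valid UTF-8, so the
# parser rejects the file even though it looks fine in an editor.
try:
    ET.fromstring(bad_bytes)
    mismatch_error = None
except ET.ParseError as exc:
    mismatch_error = exc
```

Re-saving the file in the declared encoding, or changing the declaration to match the file, resolves the error.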

Here's an example of a Kid template and usage:

Controller:

  @expose(template="ch18.templates.welcome")
  def index(self):
      names = [
          ('Cyrillic A', u'\u0410'),
          ('Cyrillic BE', u'\u0411'),
          ('Cyrillic VE', u'\u0412'),
      ]
      return dict(names=names)


Kid:

<p py:for="(title,s) in names"><em>${s}</em> &mdash; ${title}</p>


Browser displays:

А — Cyrillic A

Б — Cyrillic BE

В — Cyrillic VE

You will also want to check out the kid.encoding configuration option, which serves several different purposes.

First, if you want to compose your Kid templates using an encoding other than ASCII or UTF-8 (say, latin1), you can set kid.encoding to specify that. Do not forget to put the corresponding XML declaration in the file itself.

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:py="http://purl.org/kid/ns#"
    py:extends="'master.kid'">
<body>
  ... Latin1 text goes here ...
</body>
</html>


The configuration option kid.encoding is also used to determine the proper Content-Type to set when rendering the Kid template to the browser.

20.2.3. Unicode in CherryPy's Request/Response Cycle

In the default configuration, TurboGears converts all incoming parameters that are passed to your controller into Unicode strings. Here is a small test case:

  @expose()
  def process(self, name=None):
      return "Name is %r" % name
      # -> yields something like "Name is u'Smarty'"


Usually you would just keep it that way. If for some reason you disable this with the decoding_filter option, you might want to use turbogears.validators.UnicodeString to decode the selected parameters by hand:

  @expose()
  @turbogears.validate(validators={'name': validators.UnicodeString()})
  def process(self, name=None, name2=None):
      # Both values must be passed as a tuple to fill the two %r slots.
      return "Name is %r while name2 is %r" % (name, name2)
      # -> yields something like "Name is u'Smarty' while name2 is 'Smarty'"
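What "decoding the selected parameters by hand" amounts to can be sketched in a few lines of plain Python 3. This is not how turbogears.validators.UnicodeString is implemented; decode_params is a hypothetical helper, and the incoming parameters are assumed to arrive as UTF-8 byte strings.

```python
def decode_params(params, names, encoding="utf-8"):
    """Return a copy of params in which only the named byte-string
    values have been decoded to Unicode; everything else is untouched.
    (Hypothetical helper, for illustration only.)"""
    decoded = dict(params)
    for name in names:
        value = decoded.get(name)
        if isinstance(value, bytes):
            decoded[name] = value.decode(encoding)
    return decoded


# Raw request parameters as they might arrive with decoding disabled.
raw = {"name": b"\xce\xa9mega", "name2": b"Smarty"}
result = decode_params(raw, ["name"])
```

As in the controller above, only name comes out as a Unicode string, while name2 stays a byte string because it was not selected.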


TurboGears also takes care of encoding Unicode on output:

  @expose()
  def process(self, name=None):
      return u'Smarty'
      # -> browser receives the string "Smarty" encoded as UTF-8


The same is true if you use Kid or some other templating engine: the output will be correctly encoded and the Content-Type header will be set correctly.
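The output side can be sketched the same way. The function below is a hypothetical stand-in for what the framework does on your behalf, written in Python 3; the name build_response, its parameters, and the header format are illustrative assumptions, not TurboGears or CherryPy APIs.

```python
def build_response(body, encoding="utf-8", mime_type="text/html"):
    """Encode a Unicode body and describe that encoding in the headers.
    (Hypothetical helper, for illustration only.)"""
    payload = body.encode(encoding)
    headers = {
        # The charset must name the encoding actually used for payload.
        "Content-Type": "%s; charset=%s" % (mime_type, encoding),
        "Content-Length": str(len(payload)),
    }
    return headers, payload


headers, payload = build_response("Greek \u03a9")
```

Keeping the encode step and the charset declaration in one place is what guarantees they agree, which is the same reason a single kid.encoding option drives both.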




Rapid Web Applications with TurboGears: Using Python to Create Ajax-Powered Sites
ISBN: 0132433885
Year: 2006
Pages: 202