Recipe 1.22. Printing Unicode Characters to Standard Output

Credit: David Ascher

Problem

You want to print Unicode strings to standard output (e.g., for debugging), but they don't fit in the default encoding.

Solution

Wrap the sys.stdout stream with a converter, using the codecs module of Python's standard library. For example, if you know your output is going to a terminal that displays characters according to the ISO-8859-1 encoding, you can code:

import codecs, sys
sys.stdout = codecs.lookup('iso8859-1')[-1](sys.stdout)
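The last item of the tuple returned by codecs.lookup is the codec's stream-writer class, which codecs.getwriter also returns by name. The following is a minimal sketch of what the wrapper does, not part of the recipe itself: it assumes an in-memory byte buffer (the name buf is purely illustrative) as a stand-in for the terminal, so that the bytes produced can be inspected regardless of how your real sys.stdout is set up:

```python
import codecs
import io

# In-memory byte stream standing in for a Latin-1 terminal (illustrative only).
buf = io.BytesIO()

# codecs.getwriter fetches the stream-writer class by encoding name; it is
# the same class object as the last item returned by codecs.lookup.
latin1_writer = codecs.getwriter('iso8859-1')
same_class = latin1_writer is codecs.lookup('iso8859-1')[-1]

# Wrap the byte stream: Unicode text goes in, Latin-1 bytes come out.
out = latin1_writer(buf)
out.write(u'\N{LATIN SMALL LETTER A WITH DIAERESIS}')
latin1_bytes = buf.getvalue()
```

Here latin1_bytes holds the single byte 0xE4, which is how ISO-8859-1 represents the character.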

Discussion

Unicode strings live in a large space, big enough for all of the characters in every language worldwide, but thankfully the internal representation of Unicode strings is irrelevant for users of Unicode. Alas, a file stream, such as sys.stdout, deals with bytes and has an encoding associated with it. You can change the default encoding that is used for new files by modifying the site module. That, however, requires changing your entire Python installation, which is likely to confuse other applications that may expect the encoding you originally configured Python to use (typically the Python standard encoding, which is ASCII). Therefore, this kind of modification is not to be recommended.
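To make the distinction between Unicode strings and encoded bytes concrete, here is a small sketch (the variable names are illustrative assumptions) that encodes one non-ASCII character explicitly. ASCII cannot represent it, while ISO-8859-1 and UTF-8 can, each with a different byte sequence:

```python
char = u'\N{LATIN SMALL LETTER A WITH DIAERESIS}'

# ASCII covers only code points below 128, so this encoding attempt fails.
try:
    char.encode('ascii')
    ascii_ok = True
except UnicodeError:
    ascii_ok = False

# The same character as bytes in two richer encodings.
latin1 = char.encode('iso8859-1')   # one byte: 0xE4
utf8 = char.encode('utf-8')         # two bytes: 0xC3 0xA4
```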

This recipe takes a sounder approach: it rebinds sys.stdout as a stream that expects Unicode input and outputs it in ISO-8859-1 (also known as "Latin-1"). This approach doesn't change the encoding of any previous references to sys.stdout, as illustrated here. First, we keep a reference to the original, ASCII-encoded sys.stdout:

>>> old = sys.stdout

Then, we create a Unicode string that wouldn't normally be able to go through sys.stdout:

>>> char = u"\N{LATIN SMALL LETTER A WITH DIAERESIS}"
>>> print char
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)

If you don't get an error from this operation, it's because Python thinks it knows which encoding your "terminal" is using (in particular, Python is likely to use the right encoding if your "terminal" is IDLE, the free development environment that comes with Python). But, suppose you do get this error, or get no error but the output is not the character you expected, because your "terminal" uses UTF-8 encoding and Python does not know about it. When that is the case, we can just wrap sys.stdout in the codecs stream writer for UTF-8, which is a much richer encoding, then rebind sys.stdout to it and try again:

>>> sys.stdout = codecs.lookup('utf-8')[-1](sys.stdout)
>>> print char
ä

This approach works only if your "terminal", terminal emulator, or other window in which you're running the interactive Python interpreter supports the UTF-8 encoding, with a font rich enough to display all the characters you need to output. If you don't have such a program or device available, you may be able to find a suitable one for your platform in the form of a free program downloadable from the Internet.
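If no UTF-8 capable terminal is at hand, the wrapping step can still be checked in isolation. The sketch below is an illustration under stated assumptions rather than the recipe's own code: raw is a hypothetical in-memory byte stream playing the role of the original sys.stdout, and old is the reference kept to it. It suggests that wrapping leaves the earlier reference untouched, since the writer merely holds the original stream in its stream attribute:

```python
import codecs
import io

char = u'\N{LATIN SMALL LETTER A WITH DIAERESIS}'

raw = io.BytesIO()   # hypothetical stand-in for the original byte stream
old = raw            # reference kept before wrapping, as in the session above

# Wrap the stream in the UTF-8 stream writer and write Unicode through it.
wrapped = codecs.getwriter('utf-8')(raw)
wrapped.write(char)

# The writer keeps the original stream and forwards encoded bytes to it.
still_original = wrapped.stream is old
utf8_bytes = old.getvalue()
```

Rebinding the name to old afterward restores the unwrapped stream, which is the reason for keeping that reference around.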

Python tries to determine which encoding your "terminal" is using and sets that encoding's name as attribute sys.stdout.encoding. Sometimes (alas, not always) it even manages to get it right. IDLE already wraps your sys.stdout, as suggested in this recipe, so, within the environment's interactive Python shell, you can directly print Unicode strings.
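Because the encoding attribute may be missing or empty on some streams, a cautious program can read it defensively before choosing a codec. The helper below is a hypothetical sketch: the name stream_encoding, its 'ascii' fallback, and the two small test classes are assumptions made for illustration, not part of Python's library:

```python
import sys

def stream_encoding(stream, fallback='ascii'):
    """Return the encoding a stream reports, or fallback if it reports none."""
    return getattr(stream, 'encoding', None) or fallback

class NoEncodingStream(object):
    """Hypothetical stream-like object with no encoding attribute."""

class Utf8Stream(object):
    """Hypothetical stream-like object that reports UTF-8."""
    encoding = 'utf-8'

# What Python detected for the real standard output, if anything.
detected = stream_encoding(sys.stdout)
```

The value of detected depends on your environment, and, as noted above, it is only Python's best guess about what the terminal really displays.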

See Also

Documentation for the codecs and site modules, and setdefaultencoding in module sys, in the Library Reference and Python in a Nutshell; Recipe 1.20 and Recipe 1.21.



Python Cookbook
ISBN: 0596007973
Year: 2004
Pages: 420
