C.5 Finding Codepoints

Each Unicode character is identified by a unique codepoint. The official Unicode Web sites document character codepoints, but a quick way to see the visual forms of characters is to generate an HTML page containing charts of Unicode characters. The script below does this:

    # Create an HTML chart of Unicode characters by codepoint
    import sys
    head = '<html><head><title>Unicode Code Points</title>\n' +\
           '<META HTTP-EQUIV="Content-Type" ' +\
                 'CONTENT="text/html; charset=UTF-8">\n' +\
           '</head><body>\n<h1>Unicode Code Points</h1>'
    foot = '</body></html>'
    fp = sys.stdout
    fp.write(head)
    num_blocks = 32 # Up to 256 in theory, but IE5.5 is flaky
    for block in range(0,256*num_blocks,256):
        fp.write('\n\n<h2>Range %5d-%5d</h2>' % (block,block+256))
        start = unichr(block).encode('utf-16')
        fp.write('\n<pre>     ')
        for col in range(16): fp.write(str(col).ljust(3))
        fp.write('</pre>')
        for offset in range(0,256,16):
            fp.write('\n<pre>')
            fp.write('+'+str(offset).rjust(3)+' ')
            line = '  '.join([unichr(n+block+offset) for n in range(16)])
            fp.write(line.encode('UTF-8'))
            fp.write('</pre>')
    fp.write(foot)
    fp.close()
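The script above targets Python 2 (unichr, and byte strings written to sys.stdout). As a rough sketch only, assuming Python 3 where chr replaces unichr, one row of the chart might be built as shown here. The helper name chart_row is hypothetical, and both the HTML escaping and the substitution of a space for non-printable characters are additions that the original script does not make.

```python
import html

def chart_row(block, offset, width=16):
    """Return one <pre> row covering `width` codepoints from block+offset."""
    cells = []
    for n in range(width):
        ch = chr(block + offset + n)
        # Assumption: show a blank for anything str.isprintable() rejects
        # (control characters, surrogates, unassigned codepoints).
        if not ch.isprintable():
            ch = ' '
        # Escape '<', '>' and '&' so the first block does not break the markup.
        cells.append(html.escape(ch))
    return '<pre>+' + str(offset).rjust(3) + ' ' + '  '.join(cells) + '</pre>'

# Row +48 of block 0 holds the digits and some punctuation, including '<' and '>'.
print(chart_row(0, 48))
```

Returning a string rather than writing to a file object keeps the helper easy to check in isolation; a full port would still need the header, footer, and block loop from the original script.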

Exactly what you see in the generated HTML page depends on the Web browser and OS platform used to view it, as well as on installed fonts and other factors. Generally, any character that the current browser cannot render appears as some sort of square, dot, or question mark. Anything that is rendered is generally accurate. Once a character is visually identified, further information can be obtained with the unicodedata module:

    >>> import unicodedata
    >>> unicodedata.name(unichr(1488))
    'HEBREW LETTER ALEF'
    >>> unicodedata.category(unichr(1488))
    'Lo'
    >>> unicodedata.bidirectional(unichr(1488))
    'R'
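The three lookups in the session above can be gathered into one small helper. This is a minimal sketch, assuming Python 3 (chr instead of unichr); the function name describe and the '<unnamed>' placeholder are hypothetical choices made for illustration.

```python
import unicodedata

def describe(codepoint):
    """Return (name, category, bidirectional class) for an integer codepoint."""
    ch = chr(codepoint)
    # unicodedata.name() raises ValueError for characters that have no name
    # (control characters, for example) unless a default is supplied.
    name = unicodedata.name(ch, '<unnamed>')
    return name, unicodedata.category(ch), unicodedata.bidirectional(ch)

print(describe(1488))   # the Hebrew letter looked up in the session above
print(describe(0))      # a control character, which has no name
```

Supplying a default to unicodedata.name matters when scanning whole ranges, since a single unnamed codepoint would otherwise stop the scan with an exception.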

A variant here would be to include the information provided by unicodedata within a generated HTML chart, although such a listing would be far more verbose than the example above.
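One possible shape for such a verbose listing is sketched here, assuming Python 3. The generator name verbose_rows and the table layout are hypothetical, and writing each character as a numeric character reference (rather than as UTF-8 text, as the chart script does) is a choice made for this illustration so that the output stays plain ASCII.

```python
import unicodedata

def verbose_rows(start, stop):
    """Yield one HTML table row per named codepoint in range(start, stop)."""
    for cp in range(start, stop):
        ch = chr(cp)
        name = unicodedata.name(ch, None)
        if name is None:
            continue  # skip codepoints that have no name
        # Columns: codepoint, character (as a numeric reference), name,
        # general category, bidirectional class.
        yield '<tr><td>%d</td><td>&#%d;</td><td>%s</td><td>%s</td><td>%s</td></tr>' % (
            cp, cp, name, unicodedata.category(ch), unicodedata.bidirectional(ch))

print('<table>')
for row in verbose_rows(1488, 1492):
    print(row)
print('</table>')
```

Each character takes a full table row here, against sixteen characters per line in the chart, which illustrates how much longer the verbose form becomes.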

Text Processing in Python
ISBN: 0321112547
Year: 2005
Pages: 59
Authors: David Mertz
