The htmlentitydefs Module

The htmlentitydefs module contains a dictionary with many ISO Latin-1 character entities used by HTML. Its use is demonstrated in Example 5-10.

Example 5-10. Using the htmlentitydefs Module


import htmlentitydefs

entities = htmlentitydefs.entitydefs

for entity in "amp", "quot", "copy", "yen":
    print entity, "=", entities[entity]

amp = &
quot = "
copy = ©
yen = ¥
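The example above targets Python 2. As a minimal sketch, assuming Python 3 (where this module was renamed html.entities), the same lookup might be written as follows. The helper name describe is hypothetical and only serves the illustration; it also uses the module's name2codepoint and codepoint2name tables.

```python
from html.entities import codepoint2name, entitydefs, name2codepoint

def describe(name):
    """Return (replacement text, code point) for a named entity.

    Hypothetical helper for illustration, not part of the module.
    """
    return entitydefs[name], name2codepoint[name]

for name in ("amp", "quot", "copy", "yen"):
    text, codepoint = describe(name)
    # print the code point rather than the character itself, so the
    # output stays ASCII regardless of the terminal encoding
    print("%s = U+%04X" % (name, codepoint))

# the reverse table maps a code point back to its entity name
print(codepoint2name[0xA9])
```

Printing code points instead of the characters sidesteps the terminal encoding problems that make output such as the copyright sign hard to reproduce reliably.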

Example 5-11 shows how to combine regular expressions with this dictionary to translate entities in a string (the opposite of cgi.escape).

Example 5-11. Using the htmlentitydefs Module to Translate Entities


import htmlentitydefs
import re
import cgi

pattern = re.compile(r"&(\w+?);")

def descape_entity(m, defs=htmlentitydefs.entitydefs):
    # callback: translate one entity to its ISO Latin value
    try:
        return defs[m.group(1)]
    except KeyError:
        return m.group(0) # use as is

def descape(string):
    return pattern.sub(descape_entity, string)

print descape("&lt;spam&amp;eggs&gt;")
print descape(cgi.escape("<spam&eggs>"))

<spam&eggs>
<spam&eggs>
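For readers on Python 3, the same callback technique can be sketched as below. This is an illustration under stated assumptions, not the book's code: it assumes the html.entities module (the Python 3 name for htmlentitydefs) and uses html.escape in place of cgi.escape. It looks names up in name2codepoint so that every known entity maps to a single character.

```python
import html
import re
from html.entities import name2codepoint

pattern = re.compile(r"&(\w+?);")

def descape_entity(m, defs=name2codepoint):
    # callback: translate one named entity to its character
    try:
        return chr(defs[m.group(1)])
    except KeyError:
        return m.group(0)  # unknown name: use the text as is

def descape(text):
    return pattern.sub(descape_entity, text)

print(descape("&lt;spam&amp;eggs&gt;"))
print(descape(html.escape("<spam&eggs>")))
print(descape("&nosuchentity;"))
```

Because the substitution makes a single pass over the string, text such as "&amp;lt;" is decoded once, to "&lt;", and not a second time. Python 3 also provides html.unescape, which additionally handles numeric character references.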

Finally, Example 5-12 shows how to translate reserved XML characters and ISO Latin-1 characters in a string into entities. This is similar to cgi.escape, but it also replaces non-ASCII characters.

Example 5-12. Escaping ISO Latin-1 Entities


import htmlentitydefs
import re, string

# this pattern matches substrings of reserved and non-ASCII characters
pattern = re.compile(r"[&<>\"\x80-\xff]+")

# create character map: numeric character references by default
entity_map = {}

for i in range(256):
    entity_map[chr(i)] = "&#%d;" % i

# use the named entity wherever one is defined for the character
for entity, char in htmlentitydefs.entitydefs.items():
    if entity_map.has_key(char):
        entity_map[char] = "&%s;" % entity

def escape_entity(m, get=entity_map.get):
    # callback: replace each character in the matched run
    return string.join(map(get, m.group()), "")

def escape(string):
    return pattern.sub(escape_entity, string)

print escape("<spam&eggs>")
# the characters below are given as ISO Latin-1 escapes
print escape("\xe5 i \xe5a \xe4 e \xf6")

&lt;spam&amp;eggs&gt;
&aring; i &aring;a &auml; e &ouml;
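The escaping direction can be sketched for Python 3 in the same spirit. The code below is an assumption-laden illustration, not the book's implementation: it relies on html.entities.codepoint2name instead of a prebuilt 256-entry map, and the helper name escape_char is hypothetical.

```python
import re
from html.entities import codepoint2name

# matches runs of reserved XML characters and non-ASCII Latin-1 characters
pattern = re.compile('[&<>"\x80-\xff]+')

def escape_char(char):
    # prefer a named entity, fall back to a numeric character reference
    codepoint = ord(char)
    name = codepoint2name.get(codepoint)
    if name is not None:
        return "&%s;" % name
    return "&#%d;" % codepoint

def escape_entity(m):
    # callback: replace each character in the matched run
    return "".join(map(escape_char, m.group()))

def escape(text):
    return pattern.sub(escape_entity, text)

print(escape("<spam&eggs>"))
print(escape("\xe5 i \xe5a \xe4 e \xf6"))
```

Characters in the matched range that have no named entity, such as "\x80", come out as numeric references ("&#128;"), which mirrors the default entries of the character map in Example 5-12.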

Python Standard Library (Nutshell Handbooks)
ISBN: 0596000960
Year: 2000
Pages: 252
Authors: Fredrik Lundh
