The htmlentitydefs module contains a dictionary with many ISO Latin-1 character entities used by HTML. Its use is demonstrated in Example 5-10.
Example 5-10. Using the htmlentitydefs Module
File: htmlentitydefs-example-1.py
import htmlentitydefs
entities = htmlentitydefs.entitydefs
for entity in "amp", "quot", "copy", "yen":
    print entity, "=", entities[entity]

amp = &
quot = "
copy = ©
yen = ¥
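For readers on Python 3, where the module was renamed to html.entities, the same lookup can be sketched roughly as follows. This is an illustration under that assumption, not part of the original example; the helper name show_entities is hypothetical.

```python
# Python 3 sketch (assumption: html.entities replaces htmlentitydefs)
from html.entities import entitydefs, name2codepoint

def show_entities(names):
    # hypothetical helper: return (name, character, code point) tuples
    return [(name, entitydefs[name], name2codepoint[name]) for name in names]

for name, char, code in show_entities(["amp", "quot", "copy", "yen"]):
    # ascii() keeps the output printable on terminals without Latin-1 support
    print(name, "=", ascii(char), "(code point %d)" % code)
```

The name2codepoint dictionary gives the numeric value directly, which is convenient when the character itself cannot be displayed.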
Example 5-11 shows how to combine regular expressions with this dictionary to translate entities in a string (the opposite of cgi.escape).
Example 5-11. Using the htmlentitydefs Module to Translate Entities
File: htmlentitydefs-example-2.py
import htmlentitydefs
import re
import cgi
pattern = re.compile(r"&(\w+?);")
def descape_entity(m, defs=htmlentitydefs.entitydefs):
    # callback: translate one entity to its ISO Latin value
    try:
        return defs[m.group(1)]
    except KeyError:
        return m.group(0) # use as is
def descape(string):
    return pattern.sub(descape_entity, string)
print descape("&lt;spam&amp;eggs&gt;")
print descape(cgi.escape("<spam&eggs>"))

<spam&eggs>
<spam&eggs>
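The same callback technique can be sketched for Python 3, assuming the html and html.entities modules stand in for cgi and htmlentitydefs. The structure mirrors the example above; only the lookup table (name2codepoint) and the escaping function (html.escape) differ.

```python
# Python 3 sketch (assumption: html / html.entities replace cgi / htmlentitydefs)
import html
import re
from html.entities import name2codepoint

pattern = re.compile(r"&(\w+?);")

def descape_entity(m):
    # callback: translate one named entity to its character
    try:
        return chr(name2codepoint[m.group(1)])
    except KeyError:
        return m.group(0)  # unknown name: use as is

def descape(text):
    # each match is replaced exactly once, so "&amp;lt;" becomes "&lt;", not "<"
    return pattern.sub(descape_entity, text)

print(descape("&lt;spam&amp;eggs&gt;"))
print(descape(html.escape("<spam&eggs>")))
```

Python 3 also ships html.unescape, which additionally handles numeric character references; the hand-written version is mainly useful when you want control over which entities are translated.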
Finally, Example 5-12 shows how to translate reserved XML characters and ISO Latin-1 characters into entities, producing a string that is safe to embed in XML. This is similar to cgi.escape, but it also replaces non-ASCII characters.
Example 5-12. Escaping ISO Latin-1 Entities
File: htmlentitydefs-example-3.py
import htmlentitydefs
import re, string
# this pattern matches substrings of reserved and non-ASCII characters
pattern = re.compile(r"[&<>\"\x80-\xff]+")
# create character map
entity_map = {}
for i in range(256):
    # default: numeric character reference
    entity_map[chr(i)] = "&#%d;" % i
for entity, char in htmlentitydefs.entitydefs.items():
    # use the named entity where one exists
    if entity_map.has_key(char):
        entity_map[char] = "&%s;" % entity
def escape_entity(m, get=entity_map.get):
    return string.join(map(get, m.group()), "")
def escape(string):
    return pattern.sub(escape_entity, string)
print escape("<spam&eggs>")
print escape("\345 i \345a \344 e \366")

&lt;spam&amp;eggs&gt;
&aring; i &aring;a &auml; e &ouml;
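A comparable escape function can be sketched for Python 3, assuming html.entities.codepoint2name as the lookup table. Instead of precomputing a 256-entry map, this version looks up each character as it is matched and falls back to a numeric character reference when no named entity exists. The helper name escape_char is hypothetical.

```python
# Python 3 sketch (assumption: html.entities replaces htmlentitydefs)
import re
from html.entities import codepoint2name

# matches runs of reserved and non-ASCII Latin-1 characters
pattern = re.compile('[&<>"\x80-\xff]+')

def escape_char(char):
    # hypothetical helper: named entity if known, else numeric reference
    code = ord(char)
    name = codepoint2name.get(code)
    if name is not None:
        return "&%s;" % name
    return "&#%d;" % code

def escape(text):
    return pattern.sub(lambda m: "".join(map(escape_char, m.group())), text)

print(escape("<spam&eggs>"))
print(escape("\xe5 i \xe5a \xe4 e \xf6"))
```

Characters outside the Latin-1 range are left untouched by this pattern; widening the character class would be needed to escape them as well.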