Recipe 12.6. Removing Whitespace-only Text Nodes from an XML DOM Node's Subtree
Credit: Brian Quinlan, David Wilson

Problem
You want to remove, from the DOM representation of an XML document, all the text nodes within a subtree that contain only whitespace.

Solution
XML parsers consider several complex conditions when deciding which whitespace-only text nodes to preserve during DOM construction. Unfortunately, the result is often not what you want, so it's helpful to have a function that removes all whitespace-only text nodes from among a given node's descendants:

    from xml import dom

    def remove_whitespace_nodes(node):
        """ Removes all of the whitespace-only text descendants of a DOM node. """
        # prepare the list of text nodes to remove (and recurse when needed)
        remove_list = []
        for child in node.childNodes:
            if child.nodeType == dom.Node.TEXT_NODE and not child.data.strip():
                # add this text node to the to-be-removed list
                remove_list.append(child)
            elif child.hasChildNodes():
                # recurse, it's the simplest way to deal with the subtree
                remove_whitespace_nodes(child)
        # perform the removals, now that the loop over node.childNodes is done
        for child in remove_list:
            child.parentNode.removeChild(child)
            # release the removed node's references to the rest of the DOM
            child.unlink()

Discussion
This recipe's code works with any correctly implemented Python XML DOM, including the xml.dom.minidom that is part of the Python Standard Library and the more complete DOM implementation that comes with PyXML. The implementation of function remove_whitespace_nodes is quite simple but rather instructive: in the first for loop we build a list of all child nodes to remove, and then in a second, separate loop we do the removal. This precaution is a good example of a general rule in Python: do not alter the very container you're looping on; sometimes you can get away with it, but it is unwise to count on it in the general case. For instance, in xml.dom.minidom, where childNodes is a Python list, removing the current child during the loop shifts every later sibling down one position, so the loop silently skips the child that immediately followed the removed one.
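The following is a minimal sketch, runnable with the standard library's xml.dom.minidom, of what the two-pass structure guards against. The one-pass function naive_remove_whitespace_nodes and the sample document SAMPLE are hypothetical, written only for this illustration; remove_whitespace_nodes repeats the recipe's logic so the block is self-contained.

```python
from xml.dom import minidom, Node

# Hypothetical sample: the root <a> has children [whitespace, <b>, whitespace]
SAMPLE = "<a> <b> <c/> </b> </a>"

def remove_whitespace_nodes(node):
    """Two-pass version, as in the recipe: collect first, remove afterwards."""
    remove_list = []
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            remove_list.append(child)
        elif child.hasChildNodes():
            remove_whitespace_nodes(child)
    for text_node in remove_list:
        text_node.parentNode.removeChild(text_node)
        text_node.unlink()

def naive_remove_whitespace_nodes(node):
    """Hypothetical one-pass variant that mutates node.childNodes while
    iterating over it, the pattern the recipe warns against."""
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)  # shifts later siblings down by one slot
        elif child.hasChildNodes():
            naive_remove_whitespace_nodes(child)

good = minidom.parseString(SAMPLE)
remove_whitespace_nodes(good.documentElement)

bad = minidom.parseString(SAMPLE)
naive_remove_whitespace_nodes(bad.documentElement)

print(good.documentElement.toxml())  # <a><b><c/></b></a>
print(bad.documentElement.toxml())   # <a><b> <c/> </b></a>
```

In the naive version, removing the leading whitespace node slides <b> into the slot the loop has just visited, so the loop moves straight on to the trailing whitespace and never recurses into <b>; the whitespace inside <b> survives. The two-pass version visits every child before changing anything.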
On the other hand, the function can perfectly well call itself recursively within its first for loop because such a call does not alter the very list node.childNodes on which the loop is iterating (it may alter some items in that list, but it does not alter the list object itself).

See Also
Library Reference and Python in a Nutshell document the built-in XML support in the Python Standard Library.