Accessing child nodes in a parsed DOM tree can be managed in several different ways. This phrase discusses three of them: using a direct reference, looking up nodes by tag name, and walking the DOM tree.

The first step is to parse the XML document with the minidom.parse(file) function, which creates a DOM tree object. The child nodes of the DOM tree can be accessed directly through its childNodes attribute, a list of the nodes at the root of the tree. Because childNodes is a list, individual nodes can be accessed by index using the syntax childNodes[index].

Note: When the document contains a DOCTYPE declaration, as emails.xml does, the first node in the childNodes list of the DOM tree object is the DTD node.

To search for nodes by their tag name, use the getElementsByTagName(tag) method of the node object. It accepts the tag name as a string and returns a list of all descendant elements with that tag, not only the direct children.

You can also walk the DOM tree recursively. Define a function that accepts a node list and, for each node in the list, does its work on that node (the sample phrase prints the node's name and value) and then calls itself again with that node's childNodes attribute. Start the walk by passing the childNodes attribute of the DOM tree object, as shown in the sample phrase.
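The following is a self-contained sketch of direct access and tag-name lookup. Instead of reading emails.xml, it calls minidom.parseString() on a small inline document, so the SAMPLE string and its addresses are hypothetical stand-ins; the DOM features used (childNodes, nodeType, getElementsByTagName(), getAttribute()) are standard minidom.

```python
from xml.dom import minidom

# Hypothetical sample document (a stand-in for emails.xml): a DOCTYPE
# followed by a root element whose <addr> elements are nested two levels deep.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE emails [
<!ELEMENT emails (email+)>
]>
<emails>
  <email>
    <to><addr type="TO">a@example.com</addr></to>
    <from><addr type="FROM">b@example.com</addr></from>
  </email>
</emails>"""

xmldoc = minidom.parseString(SAMPLE)
cNodes = xmldoc.childNodes

# Direct access by index: because a DOCTYPE is present, index 0 is the
# DocumentType node and index 1 is the root element.
dtd = cNodes[0]
root = cNodes[1]
print(dtd.nodeType == dtd.DOCUMENT_TYPE_NODE)   # True
print(root.tagName)                             # emails

# Lookup by tag name searches all descendants, not only direct children:
# the <addr> elements sit below <to> and <from>, yet both are found.
addrs = root.getElementsByTagName("addr")
print([a.getAttribute("type") for a in addrs])  # ['TO', 'FROM']
print([a.firstChild.data for a in addrs])       # ['a@example.com', 'b@example.com']
```

Indexing childNodes ties the code to the document's exact layout; getElementsByTagName() is less brittle when you only care about elements with a particular name.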
from xml.dom import minidom

# Parse the XML file into a DOM tree
xmldoc = minidom.parse('emails.xml')

# Get the nodes at the root of the tree
cNodes = xmldoc.childNodes

# Direct node access
print("DTD Node\n=================")
print(cNodes[0].toxml())

# Find nodes by tag name
print("\nTo Addresses\n===================")
nList = cNodes[1].getElementsByTagName("to")
for node in nList:
    eList = node.getElementsByTagName("addr")
    for e in eList:
        print(e.toxml())

print("\nFrom Addresses\n===================")
nList = cNodes[1].getElementsByTagName("from")
for node in nList:
    eList = node.getElementsByTagName("addr")
    for e in eList:
        print(e.toxml())

# Walk the node tree recursively, indenting one space per level
def printNodes(nList, level):
    for node in nList:
        print(" " * level, node.nodeName, node.nodeValue)
        printNodes(node.childNodes, level + 1)

print("\nNodes\n===================")
printNodes(xmldoc.childNodes, 0)

xml_child.py

DTD Node
=================
<!DOCTYPE emails [
<!ELEMENT email (to, from, subject, date, body)>
<!ELEMENT to (addr+)>
<!ELEMENT from (addr)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ELEMENT addr (#PCDATA)>
<!ATTLIST addr type (FROM | TO | CC | BC) "none">
]>

To Addresses
===================
<addr type="TO">bwdayley@novell.com</addr>
<addr type="CC">bwdayley@sfcn.org</addr>
<addr type="TO">bwdayley@novell.com</addr>
<addr type="BC">bwdayley@sfcn.org</addr>

From Addresses
===================
<addr type="FROM">ddayley@sfcn.org</addr>
<addr type="FROM">cdayley@sfcn.org</addr>

Nodes
===================
emails None
emails None
 #text
 email None
  #text
  to None
   #text
   addr None
    #text bwdayley@novell.com
   #text
   addr None
    #text bwdayley@sfcn.org
   #text
  #text
  from None
   #text
   addr None
    #text ddayley@sfcn.org
   #text
  #text
  subject None
   #text Update List
  #text
  body None
   #text Please add me to the list.
  #text
 #text
. . .

Output from xml_child.py code.
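The #text entries without a visible value in the node listing above are whitespace-only text nodes that minidom keeps from the line breaks and indentation between tags. A minimal variant of the recursive walk that skips them might look like the following. The walk() helper and the inline SAMPLE document are hypothetical, and this sketch collects lines into a list instead of printing them directly, so the result is easy to inspect or test.

```python
from xml.dom import minidom

# Hypothetical inline sample; its indentation produces whitespace-only #text nodes.
SAMPLE = """<emails>
  <email>
    <to><addr type="TO">a@example.com</addr></to>
    <subject>Update List</subject>
  </email>
</emails>"""

def walk(nodes, level=0, out=None):
    """Hypothetical helper: recursively collect 'name value' lines for a
    node list, indented two spaces per level, skipping whitespace-only
    text nodes."""
    if out is None:
        out = []
    for node in nodes:
        # Skip text nodes that hold nothing but newlines and indentation
        if node.nodeType == node.TEXT_NODE and not node.data.strip():
            continue
        out.append("  " * level + f"{node.nodeName} {node.nodeValue}")
        # Recurse into this node's children one level deeper
        walk(node.childNodes, level + 1, out)
    return out

xmldoc = minidom.parseString(SAMPLE)
for line in walk(xmldoc.childNodes):
    print(line)
# emails None
#   email None
#     to None
#       addr None
#         #text a@example.com
#     subject None
#       #text Update List
```

Skipping whitespace-only nodes is a readability choice; if whitespace is significant in your documents, keep the unfiltered walk from the sample phrase.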