The structure of an XML document is based on its elements, so it should come as no surprise that the Element class is one of the larger and more important classes in JDOM. Because JDOM has no generic Node class or interface, the Element class is the primary means by which a program navigates the tree to find particular content. Each Element object has the following seven basic properties:
There are additional properties that are not independent of the above seven. For example, the prefix, namespace URI, and fully qualified name are separately readable through the getNamespaceURI(), getNamespacePrefix(), and getQualifiedName() convenience methods:

    public String getNamespaceURI()
    public String getNamespacePrefix()
    public String getQualifiedName()

These simply return the relevant parts of the element's namespace and name.

All of these getter methods behave pretty much like any other getter methods. That is, they return an object of the relevant type, generally a String, and do not throw any exceptions. The setter methods are more unusual, however. This is one of the few areas in which JDOM does not follow standard Java conventions. Instead of returning void, these methods all return the Element object that invoked the method. That is, a.setFoo(b) returns a. Many other methods you naturally would expect to return void also do this. The purpose is to allow setters to be chained. For example, the following code fragment can build up an entire channel element in just a couple of statements:

    Element channel = (new Element("channel"))
      .addContent((new Element("title")).setText("Cafe con Leche"))
      .addContent((new Element("link"))
        .setText("http://www.cafeconleche.org/"))
      .addContent((new Element("description"))
        .setText("XML News"));

Caution

I must say that I personally don't find this style of code easier to write or read than the multi-statement approach. However, this is why the adder and setter methods all return the object that did the adding or setting, so I felt compelled to show it to you. But I really recommend strongly that you don't use it.

Constructors

The four public Element constructors all require you to specify a local name as a String. If the element is in a namespace, then you also need to specify the namespace URI as a String or a Namespace object.
Alternatively, you can specify the prefix as a String or a piece of a Namespace object.

    public Element(String localName) throws IllegalNameException
    public Element(String localName, Namespace namespace) throws IllegalNameException
    public Element(String localName, String namespaceURI) throws IllegalNameException
    public Element(String localName, String prefix, String namespaceURI) throws IllegalNameException

For example, this code fragment creates four Element objects using the various constructors:

    Element xmlRPCRoot = new Element("methodCall");
    Element xhtmlRoot = new Element("html", "http://www.w3.org/1999/xhtml");
    Element soapRoot = new Element("Envelope", "SOAP-ENV",
      "http://schemas.xmlsoap.org/soap/envelope/");
    Namespace xsd = Namespace.getNamespace("xsd",
      "http://www.w3.org/2001/XMLSchema");
    Element schemaRoot = new Element("schema", xsd);

Navigation and Search

As you learned in Chapter 14, the getContent() method is the fundamental means of navigating through an XML document with JDOM. This method returns a live List that includes all the children of an element, including comments, processing instructions, text nodes, and elements. To search deeper, you apply getContent() to the child elements of the current element, normally through recursion.

Example 15.2 is a simple program that walks the XML document tree, starting at the root element, and prints out the content of the various properties of each element. This is not the most interesting program in the book, but it does demonstrate all of the major getter methods and basic navigation. Pay special attention to the process() method, as you will need to write a method very much like this for any JDOM program that needs to search an entire XML document. It begins with an Element (normally the root element) and recursively applies itself to all of the child elements of the root element.
The instanceof operator tests each object in the content list of the Element to determine its type and dispatch it to the right method. Here, TreePrinter dispatches Element objects to the process() method recursively, ignoring all other objects.

Example 15.2 Inspecting Elements

    import org.jdom.*;
    import org.jdom.input.SAXBuilder;
    import java.io.IOException;
    import java.util.*;

    public class TreePrinter {

      // Recursively descend the tree
      public static void process(Element element) {
        inspect(element);
        List content = element.getContent();
        Iterator iterator = content.iterator();
        while (iterator.hasNext()) {
          Object o = iterator.next();
          if (o instanceof Element) {
            Element child = (Element) o;
            process(child);
          }
        }
      }

      // Print the properties of each element
      public static void inspect(Element element) {
        if (!element.isRootElement()) {
          // Print a blank line to separate it from the previous
          // element.
          System.out.println();
        }

        String qualifiedName = element.getQualifiedName();
        System.out.println(qualifiedName + ":");

        Namespace namespace = element.getNamespace();
        if (namespace != Namespace.NO_NAMESPACE) {
          String localName = element.getName();
          String uri = element.getNamespaceURI();
          String prefix = element.getNamespacePrefix();
          System.out.println(" Local name: " + localName);
          System.out.println(" Namespace URI: " + uri);
          if (!"".equals(prefix)) {
            System.out.println(" Namespace prefix: " + prefix);
          }
        }

        List attributes = element.getAttributes();
        if (!attributes.isEmpty()) {
          Iterator iterator = attributes.iterator();
          while (iterator.hasNext()) {
            Attribute attribute = (Attribute) iterator.next();
            String name = attribute.getName();
            String value = attribute.getValue();
            Namespace attributeNamespace = attribute.getNamespace();
            if (attributeNamespace == Namespace.NO_NAMESPACE) {
              System.out.println(" " + name + "=\"" + value + "\"");
            }
            else {
              String prefix = attributeNamespace.getPrefix();
              System.out.println(
               " " + prefix + ":" + name + "=\"" + value + "\"");
            }
          }
        }

        List namespaces = element.getAdditionalNamespaces();
        if (!namespaces.isEmpty()) {
          Iterator iterator = namespaces.iterator();
          while (iterator.hasNext()) {
            Namespace additional = (Namespace) iterator.next();
            String uri = additional.getURI();
            String prefix = additional.getPrefix();
            System.out.println(
             " xmlns:" + prefix + "=\"" + uri + "\"");
          }
        }
      }

      public static void main(String[] args) {
        if (args.length <= 0) {
          System.out.println("Usage: java TreePrinter URL");
          return;
        }

        String url = args[0];

        try {
          SAXBuilder parser = new SAXBuilder();
          // Parse the document
          Document document = parser.build(url);
          // Process the root element
          process(document.getRootElement());
        }
        catch (JDOMException e) {
          System.out.println(url + " is not well-formed.");
        }
        catch (IOException e) {
          System.out.println(
           "Due to an IOException, the parser could not read " + url
          );
        }
      } // end main

    }

Following is the beginning of output when I fed this chapter's XML source code into TreePrinter. DocBook doesn't use namespaces, but the XInclude elements do. The root element has some attributes, but most of the structure is based on element name alone.

    D:\books\XMLJAVA> java TreePrinter jdom_model.xml
    chapter:
     revision="20020430"
     status="rough"
     id="ch_jdom_model"
     xmlns:xinclude="http://www.w3.org/2001/XInclude"

    title:

    para:

    para:

    itemizedlist:

    listitem:

    para:

    classname:
    ...

While in theory you could navigate and query a document using only the List objects returned by getContent(), JDOM provides many methods to simplify the process for special cases, some of which include methods that
Child Elements

The Element class has two methods (five total when you count overloaded variants separately) that operate only on the child elements of an element, not on other content such as processing instructions and text nodes. These are getChildren() and removeChildren():

    public List getChildren()
    public List getChildren(String name)
    public List getChildren(String name, Namespace namespace)
    public List removeChildren(String name)
    public List removeChildren(String name, Namespace namespace)

These methods are similar to getContent() and removeContent() except that the lists returned only contain child elements, never other kinds of children such as comments and processing instructions. [1] The getChildren() methods simply ignore nonelements. For example, the earlier ElementLister in Example 14.9 only considered elements. Consequently, it could use the getChildren() method instead of getContent():
    public static void process(Element element) {
      inspect(element);
      List content = element.getChildren();
      Iterator iterator = content.iterator();
      while (iterator.hasNext()) {
        Object o = iterator.next();
        Element child = (Element) o;
        process(child);
      }
    }

This eliminates one instanceof check and one if block. This is not a huge savings, I admit; but the code is marginally more readable. However, because JDOM uses Java's Object-based List class, you still have to cast all of the items in the list that getChildren() returns to Element.

The removeChildren() methods remove all of the elements that match the specified name and namespace URI. If no namespace URI is given, then it removes elements with the given name in no namespace. Other content (comments, processing instructions, text, and so on) is not touched. For example, the following method recursively descends through an element, cutting out all of the note elements:

    public static void cutNotes(Element element) {
      // removeChildren() takes the element name, not a List
      element.removeChildren("note");
      // Fetch the children that remain after the removal
      List children = element.getChildren();
      Iterator iterator = children.iterator();
      while (iterator.hasNext()) {
        Object o = iterator.next();
        Element child = (Element) o;
        cutNotes(child);
      }
    }

It's important to remember that when an element is removed, the entire element is removed, not just its start-tags and end-tags. Any content inside the element is lost, including, in this case, elements that aren't named note.

Single Children

Often you want to follow a very specific path through a document. Consider the XML-RPC request document in Example 15.3. A program that reads this is probably primarily concerned with the content of the string element.
Example 15.3 An XML-RPC Request Document

    <?xml version="1.0"?>
    <methodCall>
      <methodName>getQuote</methodName>
      <params>
        <param>
          <value><string>RHAT</string></value>
        </param>
      </params>
    </methodCall>

To get the string element, you'll ask for the string child element of the value child element, of the param child element, of the params child element, of the root element. Rather than iterating through a list of all the child elements when there's only one of each of these, you can ask for the one you want directly using one of the getChild() methods:

    public Element getChild(String name)
    public Element getChild(String name, Namespace namespace)

For example,

    Element root = document.getRootElement();
    Element params = root.getChild("params");
    Element param = params.getChild("param");
    Element value = param.getChild("value");
    Element symbol = value.getChild("string");

Or, more concisely,

    Element symbol = document.getRootElement()
                             .getChild("params")
                             .getChild("param")
                             .getChild("value")
                             .getChild("string");

This method has one nasty problem. It returns only the first such child. If there's more than one child element with the specified name and namespace, you still only get the first one. This is a very real possibility in many applications, including XML-RPC; therefore, you should normally prefer getChildren() unless you've used some form of schema or DTD to verify that there's exactly one of each child you address with these methods.

Similarly, you can remove a single named child element with one of the two removeChild() methods, each of which returns the removed Element, in case you want to save it for later use:

    public Element removeChild(String name)
    public Element removeChild(String name, Namespace namespace)

The removeChild() method shares with getChild() the problem of operating on only the first such element. However, after you've removed the first child, the second child is now the first.
After you've removed that one, the original third child is now the first, and so on. Thus, there is one option that doesn't work with getChild(). You can simply call removeChild() repeatedly until it returns null, indicating that there was no further such child. For example, the following code fragment removes all of the immediate note children of the Element named element:

    while (element.removeChild("note") != null) ;

However, unlike the earlier example with removeChildren(), this is not recursive and therefore will not find note elements deeper in the tree.

Getting and Setting the Text of an Element

Sometimes what you want is the text of an element. For this purpose, JDOM provides these four methods:

    public String getText()
    public String getTextTrim()
    public String getTextNormalize()
    public Element setText(String text)

The getText() method returns the PCDATA content of the element. The getTextTrim() method returns pretty much the same content, except that all leading and trailing white space has been removed. The getTextNormalize() method not only strips all leading and trailing white space, but also converts each run of white space inside the text to a single space. For example, consider this street element:

    <street> 135 Airline Highway </street>

For this element, getText() returns " 135 Airline Highway " with the white space unchanged. However, getTextTrim() returns "135 Airline Highway" with the leading and trailing white space removed (any runs of white space between the words would be kept), and getTextNormalize() returns "135 Airline Highway" with every interior run collapsed to a single space. You will need to decide at the application level which one you want.

This is trickier than you might think at first glance. For example, consider this street element:

    <street>135<!-- The building doesn't actually have a number.
    It's next door to 133 -->Airline Highway</street>

getText() returns "135Airline Highway". It ignores comments and processing instructions as if they weren't there. For the most part, that seems reasonable.
Now consider this street element:

    <street>135 Airline Highway <apartment>2B</apartment></street>

getText() returns "135 Airline Highway ". The content in the child apartment element is lost completely. This is really not a good thing. (I argued about this in the JDOM group, but I lost.)

Before you can reliably use any of the getText(), getTextTrim(), or getTextNormalize() methods, you need to be very sure that the element does not have any child elements. One way to do this is to test if the number of child elements is zero before invoking the text getter. For example,

    if (element.getChildren().size() == 0) {
      String result = element.getText();
      // work with result ...
    }
    else {
      // do something more complex ...
    }

An alternative is to write your own method that recursively descends through the element, accumulating all of its text. I'll demonstrate this in the section on the Text Class later in this chapter.

Do not use any of these getter methods unless you have first validated the document against a DTD or schema that explicitly requires the element only to contain #PCDATA. Do not assume that you "know" that this is true in your domain without individually testing each document. Invariably, sooner or later, you will encounter a document that purports to adhere to the implicit schema, and indeed is very close to it, but does not quite match what you were assuming. Explicit validation is necessary.

The setText() method is a little less fraught with pitfalls. You can set the text content of any element to whatever text you desire. For example, the following code fragment sets the text of the street element to the string "3520 Airline Drive":

    street.setText("3520 Airline Drive");

This completely wipes out any existing content the element has: child elements, descendants, comments, processing instructions, other text, and so on. If you just want to append the string to the existing text, then use the addContent() method instead.
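The recursive alternative mentioned above (descending through an element and accumulating all of its text) can be sketched without JDOM on the classpath. The MiniElement class below is a hypothetical stand-in invented for this illustration, not a JDOM class: its content list mixes String objects for text with MiniElement objects for children, and its directText() method only imitates the getText() behavior described in this section. With real JDOM objects the same recursion would run over the List returned by getContent().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a JDOM Element, NOT a JDOM class. Its content
// list mixes String objects (text) and MiniElement objects (children).
class MiniElement {
    final String name;
    final List<Object> content = new ArrayList<>();

    MiniElement(String name) {
        this.name = name;
    }

    void add(Object item) {
        content.add(item);
    }

    // Imitates the getText() behavior described above: only text that is
    // a direct child is returned; child elements are skipped.
    String directText() {
        StringBuilder result = new StringBuilder();
        for (Object o : content) {
            if (o instanceof String) {
                result.append((String) o);
            }
        }
        return result.toString();
    }

    // Recursively accumulates the text of this element and all of its
    // descendants in document order.
    String fullText() {
        StringBuilder result = new StringBuilder();
        for (Object o : content) {
            if (o instanceof String) {
                result.append((String) o);
            } else if (o instanceof MiniElement) {
                result.append(((MiniElement) o).fullText());
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        MiniElement apartment = new MiniElement("apartment");
        apartment.add("2B");
        MiniElement street = new MiniElement("street");
        street.add("135 Airline Highway ");
        street.add(apartment);

        System.out.println("[" + street.directText() + "]"); // [135 Airline Highway ]
        System.out.println("[" + street.fullText() + "]");   // [135 Airline Highway 2B]
    }
}
```

The brackets in the printed output only make the trailing space visible. Whether accumulated text is what an application wants is a separate question; the sketch shows only that nothing is silently dropped.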
Getting Child Text

One common pattern in XML documents is an element that contains only other elements, all of which contain only PCDATA, such as this channel element from Slashdot's RSS file:

    <channel>
      <title>Slashdot: News for nerds, stuff that matters</title>
      <link>http://slashdot.org/</link>
      <description>News for nerds, stuff that matters</description>
    </channel>

Given such an element, JDOM provides six convenience methods for extracting the text, the trimmed text, and the normalized text from these child elements:

    public String getChildText(String name)
    public String getChildText(String name, Namespace namespace)
    public String getChildTextTrim(String name)
    public String getChildTextTrim(String name, Namespace namespace)
    public String getChildTextNormalize(String name)
    public String getChildTextNormalize(String name, Namespace namespace)

For example, assuming that the Element object channel represents the just-mentioned channel element, this code fragment retrieves the content of the title, link, and description elements:

    String title = channel.getChildText("title");
    String description = channel.getChildText("description");
    String link = channel.getChildText("link");

There are two things I really don't like about these methods. First, like the getText(), getTextTrim(), and getTextNormalize() methods, they all fail unexpectedly and silently if any of the child elements unexpectedly contain child elements. For example, the preceding code fragment fails massively if Slashdot changes its format and begins distributing content like this instead:

    <channel>
      <title>
        <trademark>Slashdot</trademark>
        <trademark>News for nerds, stuff that matters</trademark>
      </title>
      <link>http://slashdot.org/</link>
      <description>
        <trademark>News for nerds, stuff that matters</trademark>
      </description>
    </channel>

Second, these methods fail unexpectedly and silently if any of the child elements are repeated.
For example, suppose instead that the channel element has three link children, like this:

    <channel>
      <title>Slashdot: News for nerds, stuff that matters</title>
      <link>http://slashdot.org/</link>
      <link>http://www.slashdot.org/</link>
      <link>http://slashdot.com/</link>
      <description>News for nerds, stuff that matters</description>
    </channel>

All three methods return the text from the first link element, and neglect to inform the client program that there are more it may be interested in. As with getText(), getTextTrim(), and getTextNormalize(), do not use any of these methods without first validating the document against a DTD or schema that explicitly requires the child elements only to contain #PCDATA and to occur exactly once each in each parent element.

Filters

You can pass an org.jdom.filter.Filter object to the getContent() method to limit the content returned by the method. This interface, shown in Example 15.4, determines whether an object can be added to, removed from, or included in a particular list. For the purposes of navigation and search, only the matches() method really matters. It determines whether or not any particular object is included in the List returned by getContent(). The canAdd() and canRemove() methods test whether a particular object can be added to or removed from the list. However, in the two default implementations of this interface, ElementFilter and ContentFilter, both of these methods just call matches().

Example 15.4 The JDOM Filter Interface

    package org.jdom.filter;

    public interface Filter {

      public boolean canAdd(Object o);
      public boolean canRemove(Object o);
      public boolean matches(Object o);

    }

The org.jdom.filter package includes two implementations of this interface, ContentFilter (Example 15.5) and ElementFilter (Example 15.6). The ContentFilter class allows you to specify the visibility of different JDOM node types such as ProcessingInstruction and Text.
The ElementFilter class allows you to select elements with certain names or namespaces. Finally, you can write your own custom implementations that filter according to application-specific criteria.

Example 15.5 The JDOM ContentFilter Class

    package org.jdom.filter;

    public class ContentFilter implements Filter {

      public static final int ELEMENT   = 1;
      public static final int CDATA     = 2;
      public static final int TEXT      = 4;
      public static final int COMMENT   = 8;
      public static final int PI        = 16;
      public static final int ENTITYREF = 32;
      public static final int DOCUMENT  = 64;

      protected int filterMask;

      public ContentFilter();
      public ContentFilter(boolean allVisible);
      public ContentFilter(int mask);

      public int  getFilterMask();
      public void setFilterMask(int mask);
      public void setDefaultMask();
      public void setDocumentContent();
      public void setElementContent();
      public void setElementVisible(boolean visible);
      public void setCDATAVisible(boolean visible);
      public void setTextVisible(boolean visible);
      public void setCommentVisible(boolean visible);
      public void setPIVisible(boolean visible);
      public void setEntityRefVisible(boolean visible);

      public boolean canAdd(Object o);
      public boolean canRemove(Object o);
      public boolean matches(Object o);
      public boolean equals(Object o);

    }

For example, suppose your application only needs to concern itself with elements and text, but can completely skip all comments and processing instructions. You can simplify the code by using an appropriately configured ContentFilter.
The most convenient approach is to construct a filter that filters out all nodes by passing false to the constructor, and then turn on only the types you want to let through, as follows:

    // Filter out everything by default. The variable is declared as
    // ContentFilter because the set...Visible() methods are not part
    // of the Filter interface.
    ContentFilter filter = new ContentFilter(false);
    // Allow elements through the filter
    filter.setElementVisible(true);
    // Allow text nodes through the filter
    filter.setTextVisible(true);

You'll need to pass filter to getContent() every time you call it, like so:

    Filter filter; // set up in constructor

    public void process(Element element) {
      List children = element.getContent(filter);
      Iterator iterator = children.iterator();
      while (iterator.hasNext()) {
        Object o = iterator.next();
        if (o instanceof Element) {
          Element child = (Element) o;
          // Recurse into the child, not the current element
          process(child);
        }
        else { // Due to filter, the only other possibility is Text
          Text text = (Text) o;
          handleText(text);
        }
      }
    }

Generally, you will want to allow elements to pass the filter, even if you're only looking at other things like Text. In JDOM, recursing through the Element objects is the only way to search a complete tree. If you filter out the Elements, then you won't be able to go more than one level deep from where you start.

If you only want to select elements, you can use an ElementFilter instead. This can be set up to select all elements, elements with a certain name, elements in a certain namespace, or elements with a certain name in a certain namespace.
Example 15.6 The JDOM ElementFilter Class

    package org.jdom.filter;

    public class ElementFilter implements Filter {

      protected String name;
      protected Namespace namespace;

      public ElementFilter();
      public ElementFilter(String name);
      public ElementFilter(Namespace namespace);
      public ElementFilter(String name, Namespace namespace);

      public boolean canAdd(Object o);
      public boolean canRemove(Object o);
      public boolean matches(Object o);
      public boolean equals(Object o);

    }

For example, the following code fragment uses an ElementFilter to create a List named content that only contains XSLT elements:

    Namespace xslt = Namespace.getNamespace(
     "http://www.w3.org/1999/XSL/Transform");
    Filter filter = new ElementFilter(xslt);
    List content = element.getContent(filter);

Once again, however, this method proves to be less generally useful than the DOM equivalents, because the getContent() method only returns children, not all descendants. For example, you couldn't really use this to select the XSLT elements or the non-XSLT elements in a stylesheet, because each type can appear as children of the other type.

Filters also work in the Document class, pretty much the same way as they work in the Element class. For example, suppose you want to find all the processing instructions in the Document object doc outside the root element. The following code fragment creates a List containing those:

    // Filter out everything by default. Declared as ContentFilter
    // so that setPIVisible() is available.
    ContentFilter pisOnly = new ContentFilter(false);
    // Allow processing instructions through the filter
    pisOnly.setPIVisible(true);
    // Get the content
    List pis = doc.getContent(pisOnly);

If you want something a little more useful, such as a filter that selects all xml-stylesheet processing instructions in the prolog only, then you need to write a custom implementation of Filter. Example 15.7 demonstrates.
Example 15.7 A Filter for xml-stylesheet Processing Instructions in the Prolog

    import org.jdom.filter.Filter;
    import org.jdom.*;
    import java.util.List;

    public class StylesheetFilter implements Filter {

      // This filter is read-only. Nothing can be added or removed.
      public boolean canAdd(Object o) {
        return false;
      }

      public boolean canRemove(Object o) {
        return false;
      }

      public boolean matches(Object o) {
        if (o instanceof ProcessingInstruction) {
          ProcessingInstruction pi = (ProcessingInstruction) o;
          if (pi.getTarget().equals("xml-stylesheet")) {
            // Test to see if we're outside the root element
            if (pi.getParent() == null) {
              Document doc = pi.getDocument();
              Element root = doc.getRootElement();
              List content = doc.getContent();
              if (content.indexOf(pi) < content.indexOf(root)) {
                // In prolog
                return true;
              }
            }
          }
        }
        return false;
      }

    }

Adding and Removing Children

You can append any legal node to an Element using the six-way overloaded addContent() methods:

    public Element addContent(String s)
    public Element addContent(Text text) throws IllegalAddException
    public Element addContent(Element element) throws IllegalAddException
    public Element addContent(ProcessingInstruction instruction) throws IllegalAddException
    public Element addContent(EntityRef ref) throws IllegalAddException
    public Element addContent(Comment comment) throws IllegalAddException

These methods append their argument to the child list of the Element. Except for addContent(String), they all throw an IllegalAddException if the argument already has a parent element. (The addContent(String) method is just a convenience that creates a new Text node behind the scenes. It does not actually add a String object to the content list.) All return the same Element object that invoked them, which allows for convenient chaining.

Each of these methods adds the new node to the end of the Element's content list. To insert a node in a different position, you'll have to retrieve the List object itself.
For example, the following code fragment creates the same channel element by inserting all the child nodes in reverse order at the beginning of the list using the add(int index, Object o) method:

    Element channel = new Element("channel");
    Element link = new Element("link");
    Element description = new Element("description");
    Element title = new Element("title");
    title.setText("Slashdot");
    link.setText("http://slashdot.org/");
    description.setText("News for nerds");
    List content = channel.getContent();
    content.add(0, description);
    content.add(0, link);
    content.add(0, title);

There are six removeContent() methods that remove a node from the list, wherever it resides:

    public Element removeContent(Text text)
    public Element removeContent(CDATA cdata)
    public Element removeContent(Element element)
    public Element removeContent(ProcessingInstruction instruction)
    public Element removeContent(EntityRef ref)
    public Element removeContent(Comment comment)

Alternatively, you can retrieve the List from the Element with getContent() and remove elements by position using the list's remove() and removeAll() methods, although doing so is relatively rare. In most cases, you have or can easily obtain a reference to the specific node you want to remove. For example, the following code fragment deletes the first link child element of the channel element. Because removeChild() takes a name rather than a node, the node is passed to removeContent():

    channel.removeContent(channel.getChild("link"));

There currently is no method to remove all of the content from an Element. Instead, just pass null to setContent(). That is,

    element.setContent(null);

Parents and Ancestors

So far we've focused on moving down the tree using methods that return children and recursion, but JDOM can move up the tree as well. [2] As with the child-returning methods, you can only jump one level at a time; that is, you can only get the parent directly.
To get other ancestor elements, you need to ask for the parent's parent, the parent of the parent's parent, and so forth, until eventually you find an element whose parent is null, which is of course the root of the tree.
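The climb described above is a short loop. The sketch below uses a hypothetical Node class as a stand-in for a JDOM Element so that it runs without JDOM; all it assumes is a getParent() method that returns null at the root of the tree. With real JDOM objects, the same loop would call getParent() on Element.

```java
import java.util.ArrayList;
import java.util.List;

class AncestorWalk {

    // Hypothetical stand-in for a JDOM Element that knows only its
    // name and its parent. It is not part of JDOM.
    static class Node {
        final String name;
        final Node parent;

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }

        Node getParent() {
            return parent;
        }
    }

    // Climb one level at a time until the parent is null, collecting
    // the names of the ancestors, nearest ancestor first.
    static List<String> ancestorNames(Node start) {
        List<String> names = new ArrayList<>();
        for (Node p = start.getParent(); p != null; p = p.getParent()) {
            names.add(p.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Mirrors the nesting of the XML-RPC request in Example 15.3
        Node root = new Node("methodCall", null);
        Node params = new Node("params", root);
        Node param = new Node("param", params);
        Node value = new Node("value", param);

        System.out.println(ancestorNames(value)); // [param, params, methodCall]
    }
}
```

The last name collected always belongs to the root of the tree, because that is the only node whose parent is null.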
Each Element object has zero or one parents. If the Element is the root element of the document (or at least the root of the tree, in the event that the Element is not currently part of a Document), then this parent is null. Otherwise, it is another Element object. JDOM does not consider the owner document to be the parent of the root element. The following three methods enable you to determine whether or not an Element object represents a root element, and what its parent and owner document are:

    public Document getDocument()
    public boolean isRootElement()
    public Element getParent()

Unlike DOM Elements, JDOM Elements are not irrevocably tied to their owner document. An Element may be in no document at all (in which case getDocument() returns null); and it may be moved from one document to another. However, a JDOM Element cannot have more than one parent at a time. Before you can move an element to a different Document or to a different position in the same Document, you must first detach it from its current parent by invoking the detach() method:

    public Element detach()

After you've called detach(), you are free to add the Element to any other Element or Document. Example 15.8 loads the XML document at http://www.slashdot.org/slashdot.rdf, detaches all the link elements from that document, and inserts them in a new linkset element, which it then outputs. Without the call to detach(), this would fail with an IllegalAddException.
Example 15.8 Moving Elements between Documents

    import org.jdom.*;
    import org.jdom.input.SAXBuilder;
    import org.jdom.output.XMLOutputter;
    import java.io.IOException;
    import java.util.*;

    public class Linkset {

      public static void main(String[] args) {

        String url = "http://www.slashdot.org/slashdot.rdf";

        try {
          SAXBuilder parser = new SAXBuilder();

          // Parse the document
          Document document = parser.build(url);
          Element oldRoot = document.getRootElement();
          Element newRoot = new Element("linkset");
          List content = oldRoot.getChildren();
          Iterator iterator = content.iterator();
          while (iterator.hasNext()) {
            Object next = iterator.next();
            Element element = (Element) next;
            Element link = element.getChild("link",
             Namespace.getNamespace(
              "http://my.netscape.com/rdf/simple/0.9/"));
            link.detach();
            newRoot.addContent(link);
          }

          XMLOutputter outputter = new XMLOutputter(" ", true);
          outputter.output(newRoot, System.out);
        }
        catch (JDOMException e) {
          System.out.println(url + " is not well-formed.");
        }
        catch (IOException e) {
          System.out.println(
           "Due to an IOException, the parser could not read " + url
          );
        }

      } // end main

    }

As usual, this only affects the JDOM Document object in memory. It has no effect on the original document read from the remote URL.

Another natural limitation is that an element cannot be its own parent or ancestor, directly or indirectly. Trying to add an element where it would violate this restriction throws an IllegalAddException. You can test whether one element is an ancestor of another using the isAncestor() method:

    public boolean isAncestor(Element element)

Attributes

The Element class has 13 methods that read and write the values of the various attributes of the element. Except for certain unusual cases (mostly involving attribute types), these 13 methods are all you need to handle attributes. You rarely need to concern yourself with the Attribute class directly.
public Attribute getAttribute(String name)
public Attribute getAttribute(String name, Namespace namespace)
public String getAttributeValue(String name)
public String getAttributeValue(String name, Namespace namespace)
public String getAttributeValue(String name, String default)
public String getAttributeValue(String name, Namespace namespace, String default)
public Element setAttributes(List attributes) throws IllegalAddException
public Element setAttribute(String name, String value) throws IllegalNameException, IllegalDataException
public Element setAttribute(String name, String value, Namespace namespace) throws IllegalNameException, IllegalDataException
public Element setAttribute(Attribute attribute) throws IllegalAddException
public boolean removeAttribute(String name)
public boolean removeAttribute(String name, Namespace namespace)
public boolean removeAttribute(Attribute attribute)

These methods all follow the same basic rules. If an attribute is in a namespace, specify the local name and namespace to access it. If the attribute is not in a namespace, then use only the name. The setters must also specify the value to set the attribute to. The getters optionally may specify a default value that is returned if the attribute is not found. Alternatively, you can use an Attribute object in place of the separate name, namespace, and value arguments. Most of the time, however, strings are more convenient.

The getAttributeValue() methods all return the String value of the attribute. If the attribute was read by a parser, then the value will be normalized according to its type. However, attributes added in memory with setAttribute() and its ilk will not be normalized. The setter methods all return the Element object itself so that the objects can be used in a chain. The remove methods all return a boolean: true if the attribute was removed, false if it wasn't.
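The rules for namespaced and unqualified attributes, default values, and the boolean returned by the remove methods can be tried out on a small element built in memory. This is only a sketch, assuming the JDOM jar (org.jdom package) is on the classpath; the class name AttributeSketch, the method buildLink(), the link element, and its attribute values are all hypothetical.

```java
import org.jdom.Element;
import org.jdom.Namespace;

public class AttributeSketch {

  public static final Namespace XLINK =
   Namespace.getNamespace("xlink", "http://www.w3.org/1999/xlink");

  // Builds a hypothetical element with one unqualified attribute
  // and one attribute in the XLink namespace
  public static Element buildLink() {
    Element link = new Element("link");
    link.setAttribute("id", "home");
    link.setAttribute("href", "http://www.example.com/", XLINK);
    return link;
  }

  public static void main(String[] args) {
    Element link = buildLink();

    // Unqualified attribute: the name alone is enough
    System.out.println(link.getAttributeValue("id"));

    // Namespaced attribute: local name plus Namespace
    System.out.println(link.getAttributeValue("href", XLINK));

    // There is no href attribute outside a namespace, so this is null
    System.out.println(link.getAttributeValue("href"));

    // A missing attribute falls back to the supplied default
    System.out.println(link.getAttributeValue("title", "untitled"));

    // The first removal succeeds; the second finds nothing to remove
    System.out.println(link.removeAttribute("href", XLINK));
    System.out.println(link.removeAttribute("href", XLINK));
  }

}
```

Note that the prefix passed to Namespace.getNamespace() plays no part in the lookup. Attributes are matched by local name and namespace URI, which is why a program is free to pick its own prefix for a namespace it reads.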
As with most other constructs, JDOM checks all of the attributes you set for well-formedness and throws an exception if anything looks amiss. In particular, it verifies the following:
For example, suppose you want to process a RDDL document to find resources related to a particular namespace URI. Each of these is enclosed in a rddl:resource element like this one from the RDDL specification itself:

<rddl:resource xlink:type="simple"
  xlink:title="RDDL Natures"
  xlink:role="http://www.rddl.org/"
  xlink:arcrole="http://www.rddl.org/purposes#directory"
  xlink:href="http://www.rddl.org/natures"
  >
  <div class="resource">
    <p>It is anticipated that many related-resource natures will be
    well known. A list of well-known natures may be found in the
    RDDL directory <a href=
    "http://www.rddl.org/natures">http://www.rddl.org/natures</a>.
    </p>
  </div>
</rddl:resource>

All of the information required to locate the resources is included in the attributes of the rddl:resource elements. The rest of the content in the document is relevant only to a browser showing the document to a human reader. Most software will want to read the rddl:resource elements and ignore the rest of the document.

Example 15.9 is such a program. It searches a document for related resources and outputs an HTML table containing their information. The xlink:href attribute becomes an HTML hyperlink. The other URLs in the xlink:role and xlink:arcrole attributes are purely descriptive (like namespace URLs) and not intended to be resolved, so they're merely output as plain text.
Example 15.9 Searching for RDDL Resources

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
import java.util.*;
import java.io.IOException;


public class RDDLLister {

  public final static Namespace XLINK_NAMESPACE =
   Namespace.getNamespace("xl", "http://www.w3.org/1999/xlink");
  public final static String RDDL_NAMESPACE
   = "http://www.rddl.org/";

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java RDDLLister url");
      return;
    }

    SAXBuilder builder = new SAXBuilder();

    try {
      // Prepare the output document
      Element html = new Element("html");
      Element body = new Element("body");
      Element table = new Element("table");
      html.addContent(body);
      body.addContent(table);
      Document output = new Document(html);

      // Read the entire document into memory
      Document doc = builder.build(args[0]);
      Element root = doc.getRootElement();
      processElement(root, table);

      // Serialize the output document
      XMLOutputter outputter = new XMLOutputter(" ", true);
      outputter.output(output, System.out);
    }
    catch (JDOMException e) {
      System.err.println(e);
    }
    catch (IOException e) {
      System.err.println(e);
    }

  } // end main

  public static void processElement(Element input, Element output) {

    if (input.getName().equals("resource")
     && input.getNamespaceURI().equals(RDDL_NAMESPACE)) {

      // Read the four XLink attributes of the rddl:resource element
      String href = input.getAttributeValue("href", XLINK_NAMESPACE);
      String title = input.getAttributeValue("title", XLINK_NAMESPACE);
      String role = input.getAttributeValue("role", XLINK_NAMESPACE);
      String arcrole
       = input.getAttributeValue("arcrole", XLINK_NAMESPACE);

      // Wrap this up in a table row
      Element tr = new Element("tr");

      Element titleCell = new Element("td");
      titleCell.setText(title);
      tr.addContent(titleCell);

      Element hrefCell = new Element("td");
      Element a = new Element("a");
      a.setAttribute("href", href);
      a.setText(href);
      hrefCell.addContent(a);
      tr.addContent(hrefCell);

      Element roleCell = new Element("td");
      roleCell.setText(role);
      tr.addContent(roleCell);

      Element arcroleCell = new Element("td");
      arcroleCell.setText(arcrole);
      tr.addContent(arcroleCell);

      output.addContent(tr);

    }

    // Recurse
    List content = input.getContent();
    Iterator iterator = content.iterator();
    while (iterator.hasNext()) {
      Object o = iterator.next();
      if (o instanceof Element) {
        processElement((Element) o, output);
      }
    } // end while

  }

}

The main() method builds the general outline of a well-formed HTML document and then parses the input RDDL document in the usual fashion. It retrieves the root element with getRootElement() and then passes this root element and the table element to the processElement() method.

First processElement() checks to see if the element is a rddl:resource element. If it is, then processElement() extracts the four XLink attributes using getAttributeValue(). Each of these is then inserted in a td element, which is appended to a tr element, which is added to the table element. The setAttribute() method attaches an href attribute to the a element that defines the HTML link. Finally, the processElement() method is invoked on all child elements of the current element to find any rddl:resource elements that are deeper down the tree.

Following is the beginning of the output when I ran this program against the RDDL specification itself:

D:\books\XMLJAVA> java RDDLLister http://www.rddl.org
<?xml version="1.0" encoding="UTF-8"?>
<html>
  <body>
    <table>
      <tr>
        <td>RDDL Natures</td>
        <td>
          <a href="http://www.rddl.org/natures">
           http://www.rddl.org/natures</a>
        </td>
        <td>http://www.rddl.org/</td>
        <td>http://www.rddl.org/purposes#directory</td>
      </tr>
      <tr>
        <td>RDDL Purposes</td>
        <td>
          <a href="http://www.rddl.org/purposes">
           http://www.rddl.org/purposes</a>
        </td>
        <td>http://www.rddl.org/</td>
        <td>http://www.rddl.org/purposes#directory</td>
      </tr>
...