The Element Class | Processing XML with Java™: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

The structure of an XML document is based on its elements, so it should come as no surprise that the Element class is one of the larger and more important classes in JDOM. Because JDOM has no generic Node class or interface, the Element class is the primary means by which a program navigates the tree to find particular content.

Each Element object has the following seven basic properties:

Local Name

A String that is initialized when the Element is constructed, and which can never be null or the empty string. It is accessible through the setName() and getName() methods:
 public Element setName(String name) throws IllegalNameException
 public String getName()
Namespace

A Namespace object that encapsulates both the namespace URI and an optional prefix. This can be the named constant Namespace.NO_NAMESPACE if the element does not have a namespace. A namespace is always set when the Element is constructed, but it can be changed by setNamespace(). It can be read by the getNamespace() method.
 public Element setNamespace(Namespace namespace)
 public Namespace getNamespace()
Content

A List with no duplicates that contains all of the element's children in order. This is accessible through the getContent() and setContent() methods. The list is live; therefore, you can change the contents of the Element using the methods of the List class.
 public Element setContent(List list) throws IllegalAddException
 public List getContent()
In addition, you can add or remove individual nodes from the list via the addContent() and removeContent() methods.

Parent

The parent Element that contains this Element . It will be null if this is the root element, and may be null if this Element is not currently part of a Document . This is accessible through the getParent() method:
 public Element  getParent  () 
You can change the parent only by adding the Element to a new parent using the parent's addContent() method. This is possible only if the Element does not already have a parent. Before a parent can adopt a child Element , the child's detach() method must be invoked to remove it from its current parent:
 public Element  detach  () 
Owner Document

The Document that contains this Element . It will be null if this Element is not currently part of a Document . It is possible to read it through the getDocument() method:
 public Document  getDocument  () 
You can change it by adding the Element to a new document after first detaching it from its previous parent with the detach() method.

Attributes

A List containing Attribute objects, one for each of the element's attributes. Although JDOM stores attributes in a list for convenience, order is not significant, and is not likely to be the same as the order in which the attributes appeared in the original document. The list is accessible through the getAttributes() and setAttributes() methods:
 public Element setAttributes(List attributes) throws IllegalAddException
 public List getAttributes()
You can read and modify the items in this list via the getAttribute() , getAttributeValue() , and setAttribute() methods. Attributes that declare namespaces are not included in this list.

Additional Namespaces

A List that contains Namespace objects, one for each additional namespace prefix declared by the element (that is, other than those that declare the namespace of the element and the namespaces of its attributes). As with the list of attributes, order is not significant. The entire list is accessible through the getAdditionalNamespaces() method:
 public List  getAdditionalNamespaces  () 
You can add and remove namespaces from the list using the addNamespaceDeclaration() and removeNamespaceDeclaration() methods:
 public Element addNamespaceDeclaration(Namespace namespace)
 public Element removeNamespaceDeclaration(Namespace namespace)

There are additional properties that are not independent of the above seven. For example, the namespace URI, prefix, and fully qualified name are separately readable through the getNamespaceURI(), getNamespacePrefix(), and getQualifiedName() convenience methods:

 public String getNamespaceURI()
 public String getNamespacePrefix()
 public String getQualifiedName()

These simply return the relevant parts of the element's namespace and name.

All of these getter methods behave pretty much like any other getter methods. That is, they return an object of the relevant type, generally a String, and do not throw any exceptions. The setter methods are more unusual, however. This is one of the few areas in which JDOM does not follow standard Java conventions. Instead of returning void, these methods all return the Element object that invoked the method. That is, a.setFoo(b) returns a. Many other methods you naturally would expect to return void also do this. The purpose is to allow setters to be chained. For example, the following code fragment can build up an entire channel element in a single statement:

 Element channel = (new Element("channel"))
   .addContent((new Element("title")).setText("Cafe con Leche"))
   .addContent((new Element("link"))
     .setText("http://www.cafeconleche.org/"))
   .addContent((new Element("description"))
     .setText("XML News"));

Caution

I must say that I personally don't find this style of code easier to write or read than the multi-statement approach. However, this is why the adder and setter methods all return the object that did the adding or setting, so I felt compelled to show it to you. But I really recommend strongly that you don't use it.
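To see why returning the invoking object makes chaining possible, here is a minimal sketch that does not use JDOM at all. MiniElement is a hypothetical stand-in written only for this illustration; its setText() and addContent() methods mimic the return-this convention described above and nothing else about the real Element class.

```java
import java.util.ArrayList;
import java.util.List;

class ChainingSketch {

    // Hypothetical stand-in for illustration; this is NOT org.jdom.Element
    static class MiniElement {
        private final String name;
        private String text = "";
        private final List<MiniElement> children = new ArrayList<>();

        MiniElement(String name) {
            this.name = name;
        }

        // Returns this instead of void so that further calls can be chained
        MiniElement setText(String text) {
            this.text = text;
            return this;
        }

        // Returns the element that did the adding, not the child
        MiniElement addContent(MiniElement child) {
            children.add(child);
            return this;
        }

        // Naive serialization with no escaping; just enough to inspect the result
        String toXML() {
            StringBuilder sb = new StringBuilder();
            sb.append('<').append(name).append('>').append(text);
            for (MiniElement child : children) {
                sb.append(child.toXML());
            }
            return sb.append("</").append(name).append('>').toString();
        }
    }

    // Builds a small tree in one chained expression
    static String build() {
        MiniElement channel = new MiniElement("channel")
            .addContent(new MiniElement("title").setText("Cafe con Leche"))
            .addContent(new MiniElement("description").setText("XML News"));
        return channel.toXML();
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

Because addContent() returns the parent rather than the child, each call in the chain adds to the same channel element.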

Constructors

The four public Element constructors all require you to specify a local name as a String. If the element is in a namespace, then you also need to specify the namespace URI, either as a String or as part of a Namespace object. Optionally, you can also specify the prefix, either as a String or as part of a Namespace object.

 public Element(String localName) throws IllegalNameException
 public Element(String localName, Namespace namespace) throws IllegalNameException
 public Element(String localName, String namespaceURI) throws IllegalNameException
 public Element(String localName, String prefix, String namespaceURI) throws IllegalNameException

For example, this code fragment creates four Element objects using the various constructors:

 Element xmlRPCRoot = new Element("methodCall");
 Element xhtmlRoot = new Element("html",
  "http://www.w3.org/1999/xhtml");
 Element soapRoot = new Element("Envelope", "SOAP-ENV",
  "http://schemas.xmlsoap.org/soap/envelope/");
 Namespace xsd = Namespace.getNamespace("xsd",
  "http://www.w3.org/2001/XMLSchema");
 Element schemaRoot = new Element("schema", xsd);

Navigation and Search

As you learned in Chapter 14, the getContent() method is the fundamental means of navigating through an XML document with JDOM. This method returns a live List that includes all the children of an element, including comments, processing instructions, text nodes, and elements. To search deeper, you apply getContent() to the child elements of the current element, normally through recursion.

Example 15.2 is a simple program that walks the XML document tree, starting at the root element, and prints out the content of the various properties of each element. This is not the most interesting program in the book, but it does demonstrate all of the major getter methods and basic navigation. Pay special attention to the process() method, as you will need to write a method very much like this for any JDOM program that needs to search an entire XML document. It begins with an Element (normally the root element) and recursively applies itself to all of the child elements of the root element. The instanceof operator tests each object in the content list of the Element to determine its type and dispatch it to the right method. Here, TreePrinter dispatches Element objects to the process() method recursively, ignoring all other objects.

Example 15.2 Inspecting Elements

 import org.jdom.*;
 import org.jdom.input.SAXBuilder;
 import java.io.IOException;
 import java.util.*;

 public class TreePrinter {

   // Recursively descend the tree
   public static void process(Element element) {
     inspect(element);
     List content = element.getContent();
     Iterator iterator = content.iterator();
     while (iterator.hasNext()) {
       Object o = iterator.next();
       if (o instanceof Element) {
         Element child = (Element) o;
         process(child);
       }
     }
   }

   // Print the properties of each element
   public static void inspect(Element element) {
     if (!element.isRootElement()) {
       // Print a blank line to separate it from the previous
       // element.
       System.out.println();
     }
     String qualifiedName = element.getQualifiedName();
     System.out.println(qualifiedName + ":");

     Namespace namespace = element.getNamespace();
     if (namespace != Namespace.NO_NAMESPACE) {
       String localName = element.getName();
       String uri = element.getNamespaceURI();
       String prefix = element.getNamespacePrefix();
       System.out.println("  Local name: " + localName);
       System.out.println("  Namespace URI: " + uri);
       if (!"".equals(prefix)) {
         System.out.println("  Namespace prefix: " + prefix);
       }
     }

     List attributes = element.getAttributes();
     if (!attributes.isEmpty()) {
       Iterator iterator = attributes.iterator();
       while (iterator.hasNext()) {
         Attribute attribute = (Attribute) iterator.next();
         String name = attribute.getName();
         String value = attribute.getValue();
         Namespace attributeNamespace = attribute.getNamespace();
         if (attributeNamespace == Namespace.NO_NAMESPACE) {
           System.out.println("  " + name + "=\"" + value + "\"");
         }
         else {
           String prefix = attributeNamespace.getPrefix();
           System.out.println(
            "  " + prefix + ":" + name + "=\"" + value + "\"");
         }
       }
     }

     List namespaces = element.getAdditionalNamespaces();
     if (!namespaces.isEmpty()) {
       Iterator iterator = namespaces.iterator();
       while (iterator.hasNext()) {
         Namespace additional = (Namespace) iterator.next();
         String uri = additional.getURI();
         String prefix = additional.getPrefix();
         System.out.println(
          "  xmlns:" + prefix + "=\"" + uri + "\"");
       }
     }
   }

   public static void main(String[] args) {
     if (args.length <= 0) {
       System.out.println("Usage: java TreePrinter URL");
       return;
     }
     String url = args[0];
     try {
       SAXBuilder parser = new SAXBuilder();
       // Parse the document
       Document document = parser.build(url);
       // Process the root element
       process(document.getRootElement());
     }
     catch (JDOMException e) {
       System.out.println(url + " is not well-formed.");
     }
     catch (IOException e) {
       System.out.println(
        "Due to an IOException, the parser could not read " + url
       );
     }
   } // end main

 }

Following is the beginning of output when I fed this chapter's XML source code into TreePrinter . DocBook doesn't use namespaces, but the XInclude elements do. The root element has some attributes, but most of the structure is based on element name alone.

 D:\books\XMLJAVA> java TreePrinter jdom_model.xml
 chapter:
   revision="20020430"
   status="rough"
   id="ch_jdom_model"
   xmlns:xinclude="http://www.w3.org/2001/XInclude"

 title:

 para:

 para:

 itemizedlist:

 listitem:

 para:

 classname:
 ...

While in theory you could navigate and query a document using only the List objects returned by getContent(), JDOM provides many methods to simplify the process for special cases. These include methods that:

Return lists containing child elements only
Return particular named child elements
Return the complete text of an element
Return the text of a child element
Remove children identified by name and reference
Return the first child of an element

Child Elements

The Element class has two methods (five total when you count overloaded variants separately) that operate only on the child elements of an element, not on other content such as processing instructions and text nodes. These are getChildren() and removeChildren():

 public List getChildren()
 public List getChildren(String name)
 public List getChildren(String name, Namespace namespace)
 public List removeChildren(String name)
 public List removeChildren(String name, Namespace namespace)

These methods are similar to getContent() and removeContent() except that the lists returned only contain child elements, never other kinds of children such as comments and processing instructions. ^[1] The getChildren() methods simply ignore nonelements. For example, the earlier ElementLister in Example 14.9 only considered elements. Consequently, it could use the getChildren() method instead of getContent() :

^[1] The name is a little misleading. An earlier beta version called these methods getChildElements() and removeChildElements() , much better names in my opinion.

 public static void process(Element element) {
   inspect(element);
   List content = element.getChildren();
   Iterator iterator = content.iterator();
   while (iterator.hasNext()) {
     Object o = iterator.next();
     Element child = (Element) o;
     process(child);
   }
 }

This eliminates one instanceof check and one if block. This is not a huge savings, I admit; but the code is marginally more readable. However, because JDOM uses Java's Object-based List interface, you still have to cast all of the items in the list that getChildren() returns to Element.

The removeChildren() methods remove all of the elements that match the specified name and namespace URI. If no namespace URI is given, then it removes elements with the given name in no namespace. Other content (comments, processing instructions, text, and so on) is not touched.

For example, the following method recursively descends through an element, cutting out all of the note elements:

 public static void cutNotes(Element element) {
   // removeChildren() takes the name of the elements to remove
   element.removeChildren("note");
   // The element's children have changed, so we call
   // getChildren() only after the notes are gone
   List children = element.getChildren();
   Iterator iterator = children.iterator();
   while (iterator.hasNext()) {
     Object o = iterator.next();
     Element child = (Element) o;
     cutNotes(child);
   }
 }

It's important to remember that when an element is removed, the entire element is removed, not just its start-tags and end-tags. Any content inside the element is lost, including, in this case, elements that aren't named note .

Single Children

Often you want to follow a very specific path through a document. Consider the XML-RPC request document in Example 15.3. A program that reads this is probably primarily concerned with the content of the string element.

Example 15.3 An XML-RPC Request Document

 <?xml version="1.0"?>
 <methodCall>
   <methodName>getQuote</methodName>
   <params>
     <param>
       <value><string>RHAT</string></value>
     </param>
   </params>
 </methodCall>

To get the string element, you'll ask for the string child element of the value child element, of the param child element, of the params child element, of the root element. Rather than iterating through a list of all the child elements when there's only one of each of these, you can ask for the one you want directly using one of the getChild() methods:

 public Element getChild(String name)
 public Element getChild(String name, Namespace namespace)

For example,

 Element root   = document.getRootElement();
 Element params = root.getChild("params");
 Element param  = params.getChild("param");
 Element value  = param.getChild("value");
 Element symbol = value.getChild("string");

Or, more concisely,

 Element symbol = document.getRootElement()
                   .getChild("params")
                   .getChild("param")
                   .getChild("value")
                   .getChild("string");

This method has one nasty problem. It returns only the first such child. If there's more than one child element with the specified name and namespace, you still only get the first one. This is a very real possibility in many applications, including XML-RPC; therefore, you should normally prefer getChildren() unless you've used some form of schema or DTD to verify that there's exactly one of each child you address with these methods.

Similarly, you can remove a single named child element with one of the two removeChild() methods, each of which returns the removed Element , in case you want to save it for later use:

 public Element removeChild(String name)
 public Element removeChild(String name, Namespace namespace)

The removeChild() method shares with getChild() the problem of operating on only the first such element. However, after you've removed the first child, the second child is now the first. After you've removed that one, the original third child is now the first, and so on. Thus, there is one option that doesn't work with getChild() . You can simply call removeChild() repeatedly until it returns null, indicating that there was no further such child. For example, the following code fragment removes all of the immediate note children of the Element named element :

 while (element.removeChild("note") != null) ;

However, unlike the earlier example with removeChildren() , this is not recursive and therefore will not find note elements deeper in the tree.
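The remove-until-null idiom can be tried out without JDOM. The following sketch models an element's children as a plain list of names; removeFirst() is a hypothetical helper that follows the contract described above for removeChild(): it removes only the first match and returns null when there is none.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveFirstSketch {

    // Hypothetical helper: removes the first occurrence of name and
    // returns it, or returns null if the list contains no such name
    static String removeFirst(List<String> children, String name) {
        return children.remove(name) ? name : null;
    }

    // Repeatedly removes the first "note" until none remain
    static List<String> withoutNotes() {
        List<String> children = new ArrayList<>(
            Arrays.asList("para", "note", "para", "note"));
        while (removeFirst(children, "note") != null) ;
        return children;
    }

    public static void main(String[] args) {
        System.out.println(withoutNotes());
    }
}
```

Each pass makes the next matching item the new first match, which is why the loop terminates with every note removed.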

Getting and Setting the Text of an Element

Sometimes what you want is the text of an element. For this purpose, JDOM provides these four methods:

 public String getText()
 public String getTextTrim()
 public String getTextNormalize()
 public Element setText(String text)

The getText() method returns the PCDATA content of the element. The getTextTrim() method returns pretty much the same content, except that all leading and trailing white space has been removed. The getTextNormalize() method not only strips all leading and trailing white space, but also converts all internal runs of white space to a single space. For example, consider this street element:

 <street> 135  Airline  Highway </street>

For this element, getText() returns " 135  Airline  Highway " with the white space unchanged. However, getTextTrim() returns "135  Airline  Highway", keeping the doubled spaces between the words, and getTextNormalize() returns "135 Airline Highway". You will need to decide at the application level which one you want.
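The difference between the three results is easy to reproduce with plain String operations. This is only a sketch of the behavior described here, not JDOM's implementation; the class and method names are made up for the illustration, and String.trim() treats a few more control characters as white space than XML does.

```java
class WhitespaceSketch {

    // Approximates getTextTrim(): drop leading and trailing white space only
    static String trim(String text) {
        return text.trim();
    }

    // Approximates getTextNormalize(): trim, then collapse every
    // internal run of white space into a single space
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String raw = " 135  Airline  Highway ";
        System.out.println("[" + raw + "]");
        System.out.println("[" + trim(raw) + "]");
        System.out.println("[" + normalize(raw) + "]");
    }
}
```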

This is trickier than you might think at first glance. For example, consider this street element:

 <street>135<!-- The building doesn't actually have a number.
                 It's next door to 133 -->Airline Highway</street>

getText() returns "135Airline Highway". It ignores comments and processing instructions as if they weren't there. For the most part, that seems reasonable.

Now consider this street element:

 <street>135 Airline Highway <apartment>2B</apartment></street>

getText() returns "135 Airline Highway " (note the trailing space). The content in the child apartment element is lost completely. This is really not a good thing. (I argued about this in the JDOM group, but I lost.) Before you can reliably use any of the getText(), getTextTrim(), or getTextNormalize() methods, you need to be very sure that the element does not have any child elements. One way to do this is to test if the number of child elements is zero before invoking the text getter. For example,

 if (element.getChildren().size() == 0) {
   String result = element.getText();
   // work with result ...
 }
 else {
   // do something more complex ...
 }

An alternative is to write your own method that recursively descends through the element, accumulating all of its text. I'll demonstrate this in the section on the Text Class later in this chapter.

Do not use any of these getter methods unless you have first validated the document against a DTD or schema that explicitly requires the element only to contain #PCDATA. Do not assume that you "know" that this is true in your domain without individually testing each document. Invariably, sooner or later, you will encounter a document that purports to adhere to the implicit schema, and indeed is very close to it, but does not quite match what you were assuming. Explicit validation is necessary.

The setText() method is a little less fraught with pitfalls. You can set the text content of any element to whatever text you desire. For example, the following code fragment sets the text of the street element to the string "3520 Airline Drive":

 street.setText("3520 Airline Drive");

This completely wipes out any existing content the element has: child elements, descendants, comments, processing instructions, other text, and so on. If you just want to append the string to the existing text, then use the addContent() method instead.

Getting Child Text

One common pattern in XML documents is an element that contains only other elements, all of which contain only PCDATA, such as this channel element from Slashdot's RSS file:

 <channel>
   <title>Slashdot: News for nerds, stuff that matters</title>
   <link>http://slashdot.org/</link>
   <description>News for nerds, stuff that matters</description>
 </channel>

Given such an element, JDOM provides six convenience methods for extracting the text, the trimmed text, and the normalized text from these child elements:

 public String getChildText(String name)
 public String getChildText(String name, Namespace namespace)
 public String getChildTextTrim(String name)
 public String getChildTextTrim(String name, Namespace namespace)
 public String getChildTextNormalize(String name)
 public String getChildTextNormalize(String name, Namespace namespace)

For example, assuming that the Element object channel represents the just-mentioned channel element, this code fragment retrieves the content of the title , link , and description elements:

 String title = channel.getChildText("title");
 String description = channel.getChildText("description");
 String link = channel.getChildText("link");

There are two things I really don't like about these methods. First, like the getText(), getTextTrim(), and getTextNormalize() methods, they all fail silently if any of the child elements unexpectedly contain child elements of their own. For example, the preceding code fragment fails massively if Slashdot changes its format and begins distributing content like this instead:

 <channel>
   <title>
     <trademark>Slashdot</trademark>
     <trademark>News for nerds, stuff that matters</trademark>
   </title>
   <link>http://slashdot.org/</link>
   <description>
     <trademark>News for nerds, stuff that matters</trademark>
   </description>
 </channel>

Second, these methods fail unexpectedly and silently if any of the child elements are repeated. For example, suppose instead that the channel element has three link children, like this:

 <channel>
   <title>Slashdot: News for nerds, stuff that matters</title>
   <link>http://slashdot.org/</link>
   <link>http://www.slashdot.org/</link>
   <link>http://slashdot.com/</link>
   <description>News for nerds, stuff that matters</description>
 </channel>

All three methods return the text from the first link element, and neglect to inform the client program that there are more it may be interested in.

As with getText() , getTextTrim() , and getTextNormalize() , do not use any of these methods without first validating the document against a DTD or schema that explicitly requires the child elements only to contain #PCDATA and to occur exactly once each in each parent element.

Filters

You can pass an org.jdom.filter.Filter object to the getContent() method to limit the content returned by the method. This interface, shown in Example 15.4, determines whether an object can be added to, removed from, or included in a particular list. For the purposes of navigation and search, only the matches() method really matters. It determines whether or not any particular object is included in the List returned by getContent(). The canAdd() and canRemove() methods test whether a particular object can be added to or removed from the list. However, in the two default implementations of this interface, ElementFilter and ContentFilter, both of these methods just call matches().

Example 15.4 The JDOM Filter Interface

 package org.jdom.filter;

 public interface Filter {

   public boolean canAdd(Object o);
   public boolean canRemove(Object o);
   public boolean matches(Object o);

 }
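The role of matches() can be sketched without JDOM. In the following illustration, SimpleFilter is a hypothetical one-method cousin of the Filter interface and select() is a made-up helper that keeps only the items the filter matches. Unlike the list JDOM returns from getContent(), the list built here is a plain copy, not a live view.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FilterSketch {

    // Hypothetical stand-in for the matches() part of a filter
    interface SimpleFilter {
        boolean matches(Object o);
    }

    // Copies into a new list only those items the filter matches
    static List<Object> select(List<?> content, SimpleFilter filter) {
        List<Object> result = new ArrayList<>();
        for (Object o : content) {
            if (filter.matches(o)) {
                result.add(o);
            }
        }
        return result;
    }

    // Mixed content stands in for an element's children of various types
    static List<Object> stringsOnly() {
        List<Object> mixed = Arrays.<Object>asList("text", 42, "more text", 3.14);
        return select(mixed, o -> o instanceof String);
    }

    public static void main(String[] args) {
        System.out.println(stringsOnly());
    }
}
```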

The org.jdom.filter package includes two implementations of this interface, ContentFilter (Example 15.5) and ElementFilter (Example 15.6). The ContentFilter class allows you to specify the visibility of different JDOM node types such as ProcessingInstruction and Text . The ElementFilter class allows you to select elements with certain names or namespaces. Finally, you can write your own custom implementations that filter according to application-specific criteria.

Example 15.5 The JDOM ContentFilter Class

 package org.jdom.filter;

 public class ContentFilter implements Filter {

   public static final int ELEMENT   = 1;
   public static final int CDATA     = 2;
   public static final int TEXT      = 4;
   public static final int COMMENT   = 8;
   public static final int PI        = 16;
   public static final int ENTITYREF = 32;
   public static final int DOCUMENT  = 64;

   protected int filterMask;

   public ContentFilter();
   public ContentFilter(boolean allVisible);
   public ContentFilter(int mask);

   public int  getFilterMask();
   public void setFilterMask(int mask);
   public void setDefaultMask();
   public void setDocumentContent();
   public void setElementContent();
   public void setElementVisible(boolean visible);
   public void setCDATAVisible(boolean visible);
   public void setTextVisible(boolean visible);
   public void setCommentVisible(boolean visible);
   public void setPIVisible(boolean visible);
   public void setEntityRefVisible(boolean visible);

   public boolean canAdd(Object o);
   public boolean canRemove(Object o);
   public boolean matches(Object o);
   public boolean equals(Object o);

 }

For example, suppose your application only needs to concern itself with elements and text, but can completely skip all comments and processing instructions. You can simplify the code by using an appropriately configured ContentFilter. The most convenient approach is to construct a filter that filters out all nodes by passing false to the constructor, and then turn on only the types you want to let through. The variable must be declared as ContentFilter rather than Filter, because the setter methods are not part of the Filter interface:

 // Filter out everything by default
 ContentFilter filter = new ContentFilter(false);
 // Allow elements through the filter
 filter.setElementVisible(true);
 // Allow text nodes through the filter
 filter.setTextVisible(true);

You'll need to pass filter to getContent() every time you call it, like so:

 static Filter filter; // must be set up before process() is called

 public static void process(Element element) {
   List children = element.getContent(filter);
   Iterator iterator = children.iterator();
   while (iterator.hasNext()) {
     Object o = iterator.next();
     if (o instanceof Element) {
       Element child = (Element) o;
       // Recurse into the child, not the current element
       process(child);
     }
     else { // Due to filter, the only other possibility is Text
       Text text = (Text) o;
       handleText(text);
     }
   }
 }

Generally, you will want to allow elements to pass the filter, even if you're only looking at other things like Text. In JDOM, recursing through the Element objects is the only way to search a complete tree. If you filter out the Elements, then you won't be able to go more than one level deep from where you start.
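The constants listed in Example 15.5 are all distinct powers of two, which suggests that filterMask is a bit mask with one bit per node type. The following sketch shows how such a mask could be manipulated. It is an assumption about the mechanism, not a copy of JDOM's code; the class and method names are hypothetical, and only the constant values are taken from the listing.

```java
class MaskSketch {

    // Values copied from the constants listed in Example 15.5
    static final int ELEMENT = 1;
    static final int TEXT    = 4;
    static final int COMMENT = 8;

    // Assumed effect of a setXXXVisible(boolean) call: set or clear one bit
    static int setVisible(int mask, int type, boolean visible) {
        return visible ? (mask | type) : (mask & ~type);
    }

    // A node type is visible when its bit is set in the mask
    static boolean isVisible(int mask, int type) {
        return (mask & type) != 0;
    }

    // Start with nothing visible, then let elements and text through
    static int elementsAndText() {
        int mask = 0;
        mask = setVisible(mask, ELEMENT, true);
        mask = setVisible(mask, TEXT, true);
        return mask;
    }

    public static void main(String[] args) {
        int mask = elementsAndText();
        System.out.println("mask = " + mask);
        System.out.println("comments visible? " + isVisible(mask, COMMENT));
    }
}
```

Under this assumption, the same configuration could also be requested in one step by passing the combined mask ELEMENT | TEXT to the ContentFilter(int mask) constructor shown in the listing.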

If you only want to select elements, you can use an ElementFilter instead. This can be set up to select all elements, elements with a certain name, elements in a certain namespace, or elements with a certain name in a certain namespace.

Example 15.6 The JDOM ElementFilter Class

 package org.jdom.filter;

 public class ElementFilter implements Filter {

   protected String    name;
   protected Namespace namespace;

   public ElementFilter();
   public ElementFilter(String name);
   public ElementFilter(Namespace namespace);
   public ElementFilter(String name, Namespace namespace);

   public boolean canAdd(Object o);
   public boolean canRemove(Object o);
   public boolean matches(Object o);
   public boolean equals(Object o);

 }

For example, the following code fragment uses an ElementFilter to create a List named content that only contains XSLT elements:

 Namespace xslt = Namespace.getNamespace(
  "http://www.w3.org/1999/XSL/Transform");
 Filter filter = new ElementFilter(xslt);
 List content = element.getContent(filter);

Once again, however, this method proves to be less generally useful than the DOM equivalents, because the getContent() method only returns children, not all descendants. For example, you couldn't really use this to select the XSLT elements or the non-XSLT elements in a stylesheet, because each type can appear as children of the other type.

Filters also work in the Document class, pretty much the same way as they work in the Element class. For example, suppose you want to find all the processing instructions in the Document object doc outside the root element. The following code fragment creates a List containing those:

 // Filter out everything by default
 ContentFilter pisOnly = new ContentFilter(false);
 // Allow processing instructions through the filter
 pisOnly.setPIVisible(true);
 // Get the content
 List pis = doc.getContent(pisOnly);

If you want something a little more useful, such as a filter that selects all xml-stylesheet processing instructions in the prolog only, then you need to write a custom implementation of Filter. Example 15.7 demonstrates.

Example 15.7 A Filter for xml-stylesheet Processing Instructions in the Prolog

 import org.jdom.filter.Filter;
 import org.jdom.*;
 import java.util.List;

 public class StylesheetFilter implements Filter {

   // This filter is read-only. Nothing can be added or removed.
   public boolean canAdd(Object o) {
     return false;
   }

   public boolean canRemove(Object o) {
     return false;
   }

   public boolean matches(Object o) {
     if (o instanceof ProcessingInstruction) {
       ProcessingInstruction pi = (ProcessingInstruction) o;
       if (pi.getTarget().equals("xml-stylesheet")) {
         // Test to see if we're outside the root element
         if (pi.getParent() == null) {
           Document doc = pi.getDocument();
           Element root = doc.getRootElement();
           List content = doc.getContent();
           if (content.indexOf(pi) < content.indexOf(root)) {
             // In prolog
             return true;
           }
         }
       }
     }
     return false;
   }

 }

Adding and Removing Children

You can append any legal node to an Element using the six-way overloaded addContent() methods:

 public Element addContent(String s)
 public Element addContent(Text text) throws IllegalAddException
 public Element addContent(Element element) throws IllegalAddException
 public Element addContent(ProcessingInstruction instruction) throws IllegalAddException
 public Element addContent(EntityRef ref) throws IllegalAddException
 public Element addContent(Comment comment) throws IllegalAddException

These methods append their argument to the child list of Element . Except for addContent(String) , they all throw an IllegalAddException if the argument already has a parent element. (The addContent(String) method is just a convenience that creates a new Text node behind the scenes. It does not actually add a String object to the content list.) All return the same Element object that invoked them, which allows for convenient chaining.

Each of these methods adds the new node to the end of the Element list. To insert a node in a different position, you'll have to retrieve the List object itself. For example, the following code fragment creates the same channel element by inserting all the child nodes in reverse order at the beginning of the list using the add(int index, Object o) method:

    Element channel     = new Element("channel");
    Element link        = new Element("link");
    Element description = new Element("description");
    Element title       = new Element("title");

    title.setText("Slashdot");
    link.setText("http://slashdot.org/");
    description.setText("News for nerds");

    List content = channel.getContent();
    content.add(0, description);
    content.add(0, link);
    content.add(0, title);
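The ordering effect of repeated add(0, ...) calls can be checked without JDOM at all. The following is a minimal sketch that uses a plain java.util.ArrayList of strings as a stand-in for the content list; the class name HeadInsertOrder and its helper method are hypothetical and exist only for this illustration. JDOM's own list additionally enforces its parent checks, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.List;

public class HeadInsertOrder {

    // Inserts every item at index 0, so the item inserted last ends up first.
    // The strings are stand-ins for JDOM nodes; this is not the JDOM list.
    static List<String> insertAllAtHead(String... items) {
        List<String> content = new ArrayList<>();
        for (String item : items) {
            content.add(0, item);
        }
        return content;
    }

    public static void main(String[] args) {
        // Same call order as the fragment above: description, link, title
        List<String> content = insertAllAtHead("description", "link", "title");
        System.out.println(content); // prints [title, link, description]
    }
}
```

Because each call pushes the earlier entries one position to the right, inserting in reverse order at index 0 yields the children in document order: title, link, description.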

There are six removeContent() methods that remove a node from the list, wherever it resides:

    public Element removeContent(Text text)
    public Element removeContent(CDATA cdata)
    public Element removeContent(Element element)
    public Element removeContent(ProcessingInstruction instruction)
    public Element removeContent(EntityRef ref)
    public Element removeContent(Comment comment)

Alternatively, you can retrieve the List from the Element with getContent() and remove elements by position using the list's remove() and removeAll() methods, although doing so is relatively rare. In most cases, you have or can easily obtain a reference to the specific node you want to remove. For example, the following code fragment deletes the first link child element of the channel element:

    channel.removeContent(channel.getChild("link"));

There is currently no method to remove all of the content from an Element. Instead, just pass null to setContent(). That is,

 element.setContent(null);

Parents and Ancestors

So far we've focused on moving down the tree using methods that return children and recursion, but JDOM can move up the tree as well. ^[2] As with the child-returning methods, you can only jump one level at a time; that is, you can only get the parent directly. To get other ancestor elements, you need to ask for the parent's parent, the parent of the parent's parent, and so forth, until eventually you find an element whose parent is null, which is of course the root of the tree.

^[2] Sideways movement, for example, getting the previous or next sibling, is noticeably lacking. For this, you normally use List and Iterator.

Each Element object has zero or one parent. If the Element is the root element of the document (or at least the root of the tree, in the event that the Element is not currently part of a Document), then this parent is null. Otherwise, it is another Element object. JDOM does not consider the owner document to be the parent of the root element. The following three methods enable you to determine whether or not an Element object represents a root element, and what its parent and owner document are:

    public Document getDocument()
    public boolean isRootElement()
    public Element getParent()

Unlike DOM Elements, JDOM Elements are not irrevocably tied to their owner document. An Element may be in no document at all (in which case getDocument() returns null), and it may be moved from one document to another. However, a JDOM Element cannot have more than one parent at a time. Before you can move an element to a different Document or to a different position in the same Document, you must first detach it from its current parent by invoking the detach() method:

    public Element detach()

After you've called detach(), you are free to add the Element to any other Element or Document. Example 15.8 loads the XML document at http://www.slashdot.org/slashdot.rdf, detaches all the link elements from that document, and inserts them in a new linkset element, which it then outputs. Without the call to detach(), this would fail with an IllegalAddException.

Example 15.8 Moving Elements between Documents

    import org.jdom.*;
    import org.jdom.input.SAXBuilder;
    import org.jdom.output.XMLOutputter;
    import java.io.IOException;
    import java.util.*;

    public class Linkset {

      public static void main(String[] args) {

        String url = "http://www.slashdot.org/slashdot.rdf";

        try {
          SAXBuilder parser = new SAXBuilder();

          // Parse the document
          Document document = parser.build(url);
          Element oldRoot = document.getRootElement();
          Element newRoot = new Element("linkset");
          List content = oldRoot.getChildren();
          Iterator iterator = content.iterator();
          while (iterator.hasNext()) {
            Object next = iterator.next();
            Element element = (Element) next;
            Element link = element.getChild("link",
             Namespace.getNamespace(
             "http://my.netscape.com/rdf/simple/0.9/"));
            // getChild() returns null if there is no such child,
            // so skip elements that have no link
            if (link != null) {
              link.detach();
              newRoot.addContent(link);
            }
          }

          XMLOutputter outputter = new XMLOutputter("  ", true);
          outputter.output(newRoot, System.out);
        }
        catch (JDOMException e) {
          System.out.println(url + " is not well-formed.");
        }
        catch (IOException e) {
          System.out.println(
           "Due to an IOException, the parser could not read " + url
          );
        }

      } // end main

    }

As usual, this only affects the JDOM Document object in memory. It has no effect on the original document read from the remote URL.
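For comparison with the DOM behavior mentioned earlier in this section, the following minimal sketch uses only the DOM classes bundled with the JDK (not JDOM). It shows that a DOM node stays tied to the document that created it: appending it directly to another document's tree is rejected, and it has to be copied in with importNode() first. The class name DomOwnerDemo and its two helper methods are hypothetical and exist only for this illustration.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class DomOwnerDemo {

    private static DocumentBuilder newBuilder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if DOM refuses to append a node owned by another document.
    static boolean directAppendRejected() {
        DocumentBuilder builder = newBuilder();
        Document source = builder.newDocument();
        Document target = builder.newDocument();
        Element link = source.createElement("link");
        source.appendChild(link);
        Element linkset = target.createElement("linkset");
        target.appendChild(linkset);
        try {
            linkset.appendChild(link); // link still belongs to source
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.WRONG_DOCUMENT_ERR;
        }
    }

    // Copies the node into the target document, then appends the copy.
    // Returns the number of children linkset has afterward.
    static int importThenAppend() {
        DocumentBuilder builder = newBuilder();
        Document source = builder.newDocument();
        Document target = builder.newDocument();
        Element link = source.createElement("link");
        source.appendChild(link);
        Element linkset = target.createElement("linkset");
        target.appendChild(linkset);
        Node copy = target.importNode(link, true); // deep copy owned by target
        linkset.appendChild(copy);
        return linkset.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println("Direct append rejected: " + directAppendRejected());
        System.out.println("Children after import: " + importThenAppend());
    }
}
```

In JDOM the equivalent move needs no copy: detach() frees the original Element, and addContent() then attaches that same object to its new parent.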

Another natural limitation is that an element cannot be its own parent or ancestor, directly or indirectly. Trying to add an element where it would violate this restriction throws an IllegalAddException. You can test whether one element is an ancestor of another using the isAncestor() method:

    public boolean isAncestor(Element element)

Attributes

The Element class has 13 methods that read and write the values of the various attributes of the element. Except for certain unusual cases (mostly involving attribute types), these 13 methods are all you need to handle attributes. You rarely need to concern yourself with the Attribute class directly.

    public Attribute getAttribute(String name)
    public Attribute getAttribute(String name, Namespace namespace)
    public String getAttributeValue(String name)
    public String getAttributeValue(String name, Namespace namespace)
    public String getAttributeValue(String name, String default)
    public String getAttributeValue(String name, Namespace namespace,
     String default)
    public Element setAttributes(List attributes)
     throws IllegalAddException
    public Element setAttribute(String name, String value)
     throws IllegalNameException, IllegalDataException
    public Element setAttribute(String name, String value,
     Namespace namespace)
     throws IllegalNameException, IllegalDataException
    public Element setAttribute(Attribute attribute)
     throws IllegalAddException
    public boolean removeAttribute(String name)
    public boolean removeAttribute(String name, Namespace namespace)
    public boolean removeAttribute(Attribute attribute)

These methods all follow the same basic rules. If an attribute is in a namespace, specify both the local name and the namespace to access it. If the attribute is not in a namespace, use only the name. The setters must also specify the value to set the attribute to. The getters may optionally specify a default value that is returned if the attribute is not found. Alternatively, you can pass an Attribute object in place of these separate arguments. Most of the time, however, strings are more convenient.

The getAttributeValue() methods all return the String value of the attribute. If the attribute was read by a parser, then the value will be normalized according to its type. However, attributes added in memory with setAttribute() and its ilk will not be normalized. The setter methods all return the Element object itself so that the objects can be used in a chain. The remove methods all return a boolean: true if the attribute was removed, false if it wasn't.
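The difference between parsed and in-memory values comes from the XML parser, which replaces each literal tab, carriage return, and line feed in an attribute value with a space. The following minimal sketch shows that effect using only the JDK's bundled parser and DOM classes, so that it runs without the JDOM jar. It is an assumption, not something verified here, that the same input built with SAXBuilder and set with setAttribute() behaves analogously. The class name AttributeNormalizationDemo and its helper methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttributeNormalizationDemo {

    // An attribute value that contains a literal line break
    static final String RAW_VALUE = "News\nfor nerds";

    private static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Value as reported after the document text has gone through a parser
    static String parsedValue() {
        String xml = "<item title='" + RAW_VALUE + "'/>";
        try {
            Document doc = newBuilder().parse(
                new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("title");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Value as reported when the attribute is set directly in memory
    static String inMemoryValue() {
        try {
            Document doc = newBuilder().newDocument();
            Element item = doc.createElement("item");
            item.setAttribute("title", RAW_VALUE);
            return item.getAttribute("title");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The parser turns the line break into a space; the setter keeps it.
        System.out.println(parsedValue().equals("News for nerds"));
        System.out.println(inMemoryValue().equals(RAW_VALUE));
    }
}
```

If a program compares attribute values that may come from either source, it may need to normalize the in-memory values itself before comparing.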

As with most other constructs, JDOM checks all of the attributes you set for well-formedness and throws an exception if anything looks amiss. In particular, it verifies the following:

The local name must be a noncolonized name.
The value must not contain any illegal characters such as null or the byte order mark.
The attribute must not be a namespace declaration such as xmlns or xmlns:prefix (JDOM stores these separately).

For example, suppose you want to process a RDDL document to find resources related to a particular namespace URI. Each of these is enclosed in a rddl:resource element like this one from the RDDL specification itself:

    <rddl:resource xlink:type="simple"
            xlink:title="RDDL Natures"
            xlink:role="http://www.rddl.org/"
            xlink:arcrole="http://www.rddl.org/purposes#directory"
            xlink:href="http://www.rddl.org/natures"
            >
    <div class="resource">
    <p>It is anticipated that many related-resource natures will be
       well known. A list of well-known natures may be found in the
       RDDL directory <a href=
       "http://www.rddl.org/natures">http://www.rddl.org/natures</a>.
    </p>
    </div>
    </rddl:resource>

All of the information required to locate the resources is included in the attributes of the rddl:resource elements. The rest of the content in the document is relevant only to a browser showing the document to a human reader. Most software will want to read the rddl:resource elements and ignore the rest of the document. Example 15.9 is such a program. It searches a document for related resources and outputs an HTML table containing their information. The xlink:href attribute becomes an HTML hyperlink. The other URLs in the xlink:role and xlink:arcrole attributes are purely descriptive (like namespace URLs) and not intended to be resolved, so they're merely output as plain text.

Example 15.9 Searching for RDDL Resources

    import org.jdom.*;
    import org.jdom.input.SAXBuilder;
    import org.jdom.output.XMLOutputter;
    import java.util.*;
    import java.io.IOException;

    public class RDDLLister {

      public final static Namespace XLINK_NAMESPACE =
       Namespace.getNamespace("xl", "http://www.w3.org/1999/xlink");
      public final static String RDDL_NAMESPACE
       = "http://www.rddl.org/";

      public static void main(String[] args) {

        if (args.length <= 0) {
          System.out.println("Usage: java RDDLLister url");
          return;
        }

        SAXBuilder builder = new SAXBuilder();

        try {
          // Prepare the output document
          Element html = new Element("html");
          Element body = new Element("body");
          Element table = new Element("table");
          html.addContent(body);
          body.addContent(table);
          Document output = new Document(html);

          // Read the entire document into memory
          Document doc = builder.build(args[0]);
          Element root = doc.getRootElement();
          processElement(root, table);

          // Serialize the output document
          XMLOutputter outputter = new XMLOutputter("  ", true);
          outputter.output(output, System.out);
        }
        catch (JDOMException e) {
          System.err.println(e);
        }
        catch (IOException e) {
          System.err.println(e);
        }

      } // end main

      public static void processElement(Element input, Element output) {

        if (input.getName().equals("resource")
         && input.getNamespaceURI().equals(RDDL_NAMESPACE)) {

           String href = input.getAttributeValue("href",
            XLINK_NAMESPACE);
           String title = input.getAttributeValue("title",
            XLINK_NAMESPACE);
           String role  = input.getAttributeValue("role",
            XLINK_NAMESPACE);
           String arcrole = input.getAttributeValue("arcrole",
            XLINK_NAMESPACE);

           // Wrap this up in a table row
           Element tr = new Element("tr");

           Element titleCell = new Element("td");
           titleCell.setText(title);
           tr.addContent(titleCell);

           Element hrefCell = new Element("td");
           Element a = new Element("a");
           a.setAttribute("href", href);
           a.setText(href);
           hrefCell.addContent(a);
           tr.addContent(hrefCell);

           Element roleCell = new Element("td");
           roleCell.setText(role);
           tr.addContent(roleCell);

           Element arcroleCell = new Element("td");
           arcroleCell.setText(arcrole);
           tr.addContent(arcroleCell);

           output.addContent(tr);
        }

        // Recurse
        List content = input.getContent();
        Iterator iterator = content.iterator();
        while (iterator.hasNext()) {
          Object o = iterator.next();
          if (o instanceof Element) {
            processElement((Element) o, output);
          }
        } // end while

      }

    }

The main() method builds the general outline of a well-formed HTML document and then parses the input RDDL document in the usual fashion. It retrieves the root element with getRootElement() and then passes this root element and the table element to the processElement() method.

First, processElement() checks to see if the element is a rddl:resource element. If it is, then processElement() extracts the four XLink attributes using getAttributeValue(). Each of these is then inserted in a td element, which is appended to a tr element, which is added to the table element. The setAttribute() method attaches an href attribute to the a element that defines the HTML link. Finally, the processElement() method is invoked on all child elements of the current element to find any rddl:resource elements that are deeper down the tree.

Following is the beginning of the output from when I ran this program against the RDDL specification itself:

    D:\books\XMLJAVA> java RDDLLister http://www.rddl.org
    <?xml version="1.0" encoding="UTF-8"?>
    <html>
      <body>
        <table>
          <tr>
            <td>RDDL Natures</td>
            <td>
              <a href="http://www.rddl.org/natures">
               http://www.rddl.org/natures</a>
            </td>
            <td>http://www.rddl.org/</td>
            <td>http://www.rddl.org/purposes#directory</td>
          </tr>
          <tr>
            <td>RDDL Purposes</td>
            <td>
              <a href="http://www.rddl.org/purposes">
               http://www.rddl.org/purposes</a>
            </td>
            <td>http://www.rddl.org/</td>
            <td>http://www.rddl.org/purposes#directory</td>
          </tr>
    ...