Building Hierarchical Structures from Flat Data


Sometimes you want to convert only a subset of the available data. For example, you might want to store each year in a separate XML document. With a few tweaks to the basic code, this is not hard to do. Another possibility is to arrange the data hierarchically, with Agency elements containing Bureau elements, which contain Account elements, which contain Subfunction elements. This is a little more complex, but far from impossible . Or perhaps you want to massage the data while converting it, for example by changing the amounts of thousands of dollars to amounts of single dollars. Or you could do all three, in which case the hierarchy would be arranged something like in Example 4.5.

Example 4.5 A Hierarchical Arrangement of the Budget Data
 <?xml version="1.0"?> <Budget year="2001">  <Agency>   <Name>Farm Credit System Financial Assistance Corporation</Name>   <Code>354</Code>    <Bureau>    <Name>Farm Credit System Financial Assistance Corporation</Name>       <Code>00</Code>       <Account>         <Name>Financial assistance corporation trust fund</Name>         <Code>8202</Code>         <BEACategory>Mandatory</BEACategory>         <Subfunction>           <Title>Farm income stabilization</Title>           <Code>351</Code>           <Amount>-1000</Amount>         </Subfunction>       </Account>     </Bureau>   </Agency>   <Agency>     <Name>Department of Education</Name>     <Code>018</Code>     <Bureau>       <Name>Office of Elementary and Secondary Education</Name>       <Code>10</Code>       <Account>         <Name>Reading excellence</Name>         <Code>0011</Code>         <BEACategory>Discretionary</BEACategory>         <Subfunction>           <Title>             Elementary, secondary, and vocational education           </Title>           <Code>501</Code>           <Amount>286000000</Amount>         </Subfunction>       </Account>       <Account>         <Name>Indian education</Name>         <Code>0101</Code>         <BEACategory>Discretionary</BEACategory>         <Subfunction>           <Title>             Elementary, secondary, and vocational education           </Title>           <Code>501</Code>           <Amount>116000000</Amount>         </Subfunction>       </Account>       <!-- more accounts...-->     </Bureau>     <Bureau>       <Name>       Office of Bilingual Education and Minority Languages Affairs       </Name>       <Code>15</Code>       <Account>         <Name>Bilingual and immigrant education</Name>         <Code>1300</Code>         <BEACategory>Discretionary</BEACategory>         <Subfunction>           <Title>             Elementary, secondary, and vocational education           </Title>           <Code>501</Code>           <Amount>460000000</Amount>         </Subfunction>       </Account>     </Bureau>     <!-- more bureaus...-->   </Agency>   <!-- many more agencies... --> </Budget> 

There are two basic approaches to imposing hierarchy on the flat structures we're starting with. One is to build the hierarchy into the Java data structures in memory, enabling these data structures to be output in XML quite simply. The alternative is to leave the data structures flat in memory, but to try to insert the hierarchy as the output code is being written. The latter approach is not too difficult when the data can be sorted easily, but most of the time I find the former approach to be simpler.

Although all the data could still be stored in lists of lists, and lists of maps, and maps of lists, and so forth, I really do find that at this point the data structures are clearer if we name them. And in Java, there isn't any real difference between a named data structure and a class. Thus we can describe the structure as classes and objects. Informally, the structure looks like this:

 class Budget {   String year;   List   agencies; } class Agency {   String code;   String treasuryCode;   String name;   List   bureaus; } class Bureau {   String code;   String name;   List   accounts; } class Account {   String code;   String name;   String BEACategory;   List   subfunctions; } class Subfunction {   String code;   String title;   long   amount; } 

Of course, we'll also need a number of methods in these classes. For the most part, we will use factory methods instead of constructors to guarantee that 19 IRS Bureau objects aren't created just because the IRS has 19 line items in the budget. (One IRS is bad enough. :-) ) Each class will have a getXML() method that returns a String containing the XML form of the object. And each class will have a method to add a child node of the appropriate type. Figure 4.2 illustrates a UML static structure diagram for the complete group of classes.

Figure 4.2. A UML Diagram for the Budget Class Hierarchy

graphics/04fig02.gif

Note

I must admit that the lack of generic types (templates to C++ programmers) hurts a little here. For example, I want to say that an Agency contains a list of Bureau s. However, all I can really say is that an Agency contains a list of objects, and this list happens to be named bureaus . It's not a huge problemif it were, I could always define my own custom list classes with the appropriate typesbut it does lead to a less natural solution than generic types do. I thought seriously about using generics for this example, but that would limit this code to Java 1.4. Maybe for the second edition.


The Budget class, show in Example 4.6, is at the top of the tree. It contains a list of agencies and a year. Line items encoded as maps by BudgetData are fed into the tree through the add() method. This method breaks up that map into the relevant parts , encodes each as an object, adds each of those objects to the tree in the right place, and stores the agency in the list of agencies. The various add() methods in both this and the other classes are responsible for putting each piece in the right place.

Example 4.6 The Budget Class
 import java.util.*; public class Budget {   private List   agencies = new ArrayList();   private String year;   public Budget(String year) {     this.year = year;   }   public void add(Agency agency) {     if (!agencies.contains(agency)) agencies.add(agency);   }   public void add(Map lineItem) {     String agencyName = (String) lineItem.get("AgencyName");     agencyName = escapeText(agencyName);     String agencyCode = (String) lineItem.get("AgencyCode");     String treasuryAgencyCode      = (String) lineItem.get("TreasuryAgencyCode");     Agency agency = Agency.getInstance(agencyName, agencyCode,      treasuryAgencyCode, year);     this.add(agency);     String bureauName = (String) lineItem.get("BureauName");     bureauName = escapeText(bureauName);     String bureauCode = (String) lineItem.get("BureauCode");     Bureau bureau = Bureau.getInstance(bureauName, bureauCode,      agencyCode, year);     agency.add(bureau);     // Names and codes of two accounts in different bureaus     // can be the same     String accountName = (String) lineItem.get("AccountName");     accountName = escapeText(accountName);     String accountCode = (String) lineItem.get("AccountCode");     String category    = (String) lineItem.get("BEACategory");     Account account = Account.getInstance(accountName,      accountCode, category, bureauCode, agencyCode, year);     bureau.add(account);     // Names and codes of two subfunctions in different accounts     // can be the same     String subfunctionTitle      = escapeText((String) lineItem.get("SubfunctionTitle"));     String subfunctionCode      = (String) lineItem.get("SubfunctionCode");     String yearKey = year;     if (!yearKey.equals("TransitionQuarter")) {       yearKey = "FY" + year;     }     long amount      = 1000L * Long.parseLong((String) lineItem.get(yearKey));     Subfunction subfunction = new Subfunction(subfunctionTitle,      subfunctionCode, amount);     account.add(subfunction);   }   public String getXML() {     StringBuffer result = new StringBuffer("<Budget year='"      + this.year +"'>\r\n");     Iterator iterator = agencies.iterator();     while (iterator.hasNext()) {       Agency agency = (Agency) iterator.next();       result.append(agency.getXML());     }     result.append("</Budget>\r\n");     return result.toString();   }   public static String escapeText(String s) {     if (s.indexOf('&') != -1  s.indexOf('<') != -1       s.indexOf('>') != -1  s.indexOf('"') != -1       s.indexOf('\'') != -1 ) {       StringBuffer result = new StringBuffer(s.length() + 6);       for (int i = 0; i < s.length(); i++) {         char c = s.charAt(i);         if (c == '&') result.append("&amp;");         else if (c == '<') result.append("&lt;");         else if (c == '"') result.append("&quot;");         else if (c == '\'') result.append("&apos;");         else if (c == '>') result.append("&gt;");         else result.append(c);       }       return result.toString();     }     else {       return s;     }   } } 

Most of the other classes are structurally the same but with subtle variations. Each has a list of its children, a private constructor, a public factory method, an add() method to add a child of the appropriate type, and a getXML() method that returns the XML representation of the object as a String . I'll begin with the Agency class in Example 4.7.

Example 4.7 The Agency Class
 import java.util.*; public class Agency {   private String code;   private String name;   private String treasuryCode;   private String year;   private List   bureaus = new ArrayList();   private static Map instances = new HashMap();   // A private constructor so instantiators   // have to use the factory method   private Agency(String name, String code, String treasuryCode,        String year) {     this.name = name;     this.code = code;     this.treasuryCode = treasuryCode;     this.year = year;   }   public static Agency getInstance(String name, String code,     String treasuryCode, String year) {     // Agencies can be uniquely identified by code alone     String key = code+" "+year;     Agency agency = (Agency) instances.get(key);     if (agency == null) {       agency = new Agency(name, code, treasuryCode, year);       instances.put(key, agency);     }     return agency;   }   public void add(Bureau b) {     if (!bureaus.contains(b)) {         bureaus.add(b);     }   }   public String getXML() {     StringBuffer result = new StringBuffer("  <Agency>\r\n");     result.append("    <Name>" + name + "</Name>\r\n");     result.append("    <Code>" + code + "</Code>\r\n");     result.append("    <TreasuryAgencyCode>" + treasuryCode      + "</TreasuryAgencyCode>\r\n");     Iterator iterator = bureaus.iterator();     while (iterator.hasNext()) {       Bureau bureau = (Bureau) iterator.next();       result.append(bureau.getXML());     }     result.append("  </Agency>\r\n");     return result.toString();   } } 

This is not a general-purpose class. The public interface is limited to a factory method, a method to add a Bureau to the Agency , and a method to get the XML representation of the object as a String . There aren't even any getter methods. I've deliberately chosen not to burden this and the other classes with a lot of error checking. I'm assuming that will be done both in the database before the data is input here, and later through validation after the XML document has been produced. Doing it here too seemed a tad superfluous.

Clients retrieve instances of the class using the getInstance() factory method. The first time that this class is asked to get a particular agency, it constructs it using the private constructor, stores it in the static instances map, and returns a reference to it. Subsequent requests will return the formerly created object from the map. The key for the map is the concatenation of the agency's code and year.

The add() method simply stores Bureau objects in the bureaus list. Since bureaus is private and accessed only through this add() method, we can be certain that no non- Bureau objects will ever get added to this list. We can further guarantee that the list doesn't contain any duplicates.

The getXML() method returns an XML representation of this Agency object. It loops through all of the bureaus the agency contains, and uses their getXML() methods to obtain the XML representations for each of them. The XML is returned as a String . An alternative approach that might be more memory efficient and perhaps a little faster, especially in streaming applications, would be to pass the OutputStream or Writer onto which the XML should be written to the getXML() method. In essence, this would be a writeXML() method rather than a getXML() method.

The code here is far from the most efficient possible. A lot of optimizations could be done, including many that would build far fewer intermediate string representations of each element. For example, you could pass in the StringBuffer to append to to each getXML() method. However, because this is intended primarily as a one-off solution, the optimization didn't seem to be worth the investment of time or the added complexity.

The Bureau class, shown in Example 4.8, is much the same, except that it contains a list of accounts. Furthermore, both an agency code and a bureau code are required to uniquely identify bureaus. The bureau code alone is not sufficient, so the keys for the instances map are formed by concatenating the agency code, the bureau code, and the year.

Example 4.8 The Bureau Class
 import java.util.*; public class Bureau {   // Agency code plus bureau code uniquely identify a bureau.   // Bureau code alone is definitely not sufficient.   private String code;   private String name;   private String year;   private String agencyCode;   private List   accounts = new ArrayList();   private static Map instances = new HashMap();   // Use a private constructor so instantiators   // have to use the factory method   private Bureau(String name, String code, String agencyCode,     String year) {     this.name = name;     this.code = code;     this.agencyCode = agencyCode;     this.year = year;   }   public static Bureau getInstance(String name, String code,    String agencyCode, String year) {     String key = agencyCode+" "+code+" "+year;     Bureau bureau = (Bureau) instances.get(key);     if (bureau == null) {       bureau = new Bureau(name, code, agencyCode, year);       instances.put(key, bureau);     }     return bureau;   }   public void add(Account account) {     if (!accounts.contains(account)) accounts.add(account);   }   public String getXML() {     StringBuffer result = new StringBuffer("    <Bureau>\r\n");     result.append("      <Name>" + name + "</Name>\r\n");     result.append("      <Code>" + code + "</Code>\r\n");     Iterator iterator = accounts.iterator();     while (iterator.hasNext()) {       Account account = (Account) iterator.next();       result.append(account.getXML());     }     result.append("    </Bureau>\r\n");     return result.toString();   } } 

The Account class, shown in Example 4.9, is similar to Bureau and Agency . Accounts are uniquely identified by an account code, a bureau code, an agency code, and a BEA code. Otherwise, this class is structured the same as the preceding two.

Example 4.9 The Account Class
 import java.util.*; public class Account {   // An account is uniquely identified by account code,   // bureau code, agency code and BEA category   private String code;   private String name;   private String BEACategory;   private String bureauCode;   private String agencyCode;   private String year;   private List   subfunctions = new ArrayList();   private static Map instances = new HashMap();   // Use a private constructor so clients   // have to use the factory method   private Account(String name, String code, String BEACategory,    String bureauCode, String agencyCode, String year) {     this.name = name;     this.code = code;     this.BEACategory = BEACategory;     this.bureauCode = bureauCode;     this.agencyCode = agencyCode;     this.year = year;   }   public static Account getInstance(String name, String code,    String BEACategory, String bureauCode, String agencyCode,    String year) {     String key = code + " " + BEACategory + " " + bureauCode      + " " + agencyCode + " " + year;     Account account = (Account) instances.get(key);     if (account == null) {       account = new Account(name, code, BEACategory, bureauCode,        agencyCode, year);       instances.put(key, account);     }     return account;   }   public void add(Subfunction sfx) {     if (!subfunctions.contains(sfx)) subfunctions.add(sfx);   }   public String getXML() {     StringBuffer result = new StringBuffer();     result.append("      <Account>\r\n");     result.append("        <Name>" + name + "</Name>\r\n");     result.append("        <Code>" + code + "</Code>\r\n");     result.append("        <BEACategory>" + BEACategory      + "</BEACategory>\r\n");     Iterator iterator = subfunctions.iterator();     while (iterator.hasNext()) {       Subfunction subfunction = (Subfunction) iterator.next();       result.append(subfunction.getXML());     }     result.append("      </Account>\r\n");     return result.toString();   } } 

Finally, the Subfunction class, shown in Example 4.10, contains a title, a code, and an amount of money for a given year. Unlike agencies, bureaus, and accounts, subfunctions are not necessarily unique. For example, the "Department of Defense-Military" subfunction appears in 222 different accounts. Therefore, this class does not use a factory method, just a regular constructor.

Example 4.10 The Subfunction Class
 import java.util.*; public class Subfunction {   private String code;   private String title;   private long   amount;   public Subfunction(String title, String code, long amount) {     this.title  = title;     this.code   = code;     this.amount = amount;   }   public String getXML() {     StringBuffer result       = new StringBuffer("        <Subfunction>\r\n");     result.append("          <Name>" + title + "</Name>\r\n");     result.append("          <Code>" + code + "</Code>\r\n");     result.append("          <Amount>");     result.append(amount + "</Amount>\r\n");     result.append("        </Subfunction>\r\n");     return result.toString();   } } 

The ultimate plan is to read all the records of the budget data, and then use that information to create objects of these classes. As each object is created, it is added to its parent object's list. When all records have been processed , the getXML() method of the Budget object is invoked to retrieve the XML representation of the budget. This XML can be written onto an output stream. Example 4.11 accomplishes this. It has a main() method to provide a simple user interface. This simply reads the year and the input and output file names from the user , and calls convert() . The convert() method delegates reading the input to the BudgetData class, and the data-structure building and most of the output to the Budget class.

Example 4.11 The Driver Class That Builds the Data Structure and Writes It Out Again
 import java.io.*; import java.util.*; public class HierarchicalXMLBudget {   public static void convert(List budgetData, String year,    OutputStream out) throws IOException {     Budget budget = new Budget(year);     Iterator records = budgetData.iterator();     while (records.hasNext()) {       Map lineItem = (Map) records.next();       budget.add(lineItem);     }     Writer wout = new OutputStreamWriter(out, "UTF8");     wout.write("<?xml version=\"1.0\"?>\r\n");     wout.write(budget.getXML());     wout.flush();   }   public static void main(String[] args) {     try {       if (args.length < 2) {         System.out.println(          "Usage: HierarchicalXMLBudget year infile outfile");         return;       }       // simple error checking on the year value       try {         if (!args[0].equals("TransitionQuarter")) {           Integer.parseInt(args[0]);         }       }       catch (NumberFormatException e) {         System.out.println(          "Usage: HierarchicalXMLBudget year infile outfile");         return;       }       InputStream in = new FileInputStream(args[1]);       OutputStream out;       if (args.length < 3) {         out = System.out;       }       else {         out = new FileOutputStream(args[2]);       }       List results = BudgetData.parse(in);       convert(results, args[0], out);     }     catch (IOException e) {       System.err.println(e);     }   } } 

Here's the start of a sample run:

 C:\XMLJAVA>  java HierarchicalXMLBudget 2002 budauth.txt  <?xml version="1.0"?> <Budget year='2002'>    <Agency>      <Name>Legislative Branch</Name>      <Code>001</Code>      <TreasuryAgencyCode></TreasuryAgencyCode>      <Bureau>        <Name>Legislative Branch</Name>        <Code>00</Code>        <Account>          <Name>Receipts, Central fiscal operations</Name>          <Code></Code>          <BEACategory>Mandatory</BEACategory>          <Subfunction>            <Name>Central fiscal operations</Name>            <Code>803</Code>            <Amount>0</Amount>          </Subfunction>        </Account>        <Account>          <Name>Receipts, Central fiscal operations</Name>          <Code></Code>          <BEACategory>Net interest</BEACategory>          <Subfunction> ... 

The technique used here is extremely powerful. The data is read one item at a time. Each item is stored in a data structure, which automatically moves each item into its appropriate slot. The final result is generated by a single method call to the root of the data structure.

Note

A similar approach would be to define classes that represent the XML elements and attributes rather than the input data. Indeed this is the approach taken by the Document Object Model (DOM). I'll explore this alternative in later chapters.




Processing XML with Java. A Guide to SAX, DOM, JDOM, JAXP, and TrAX
Processing XML with Javaв„ў: A Guide to SAX, DOM, JDOM, JAXP, and TrAX
ISBN: 0201771861
EAN: 2147483647
Year: 2001
Pages: 191

Similar book on Amazon

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net