Aggregation and Conglomeration | Professional Java Servlets 2.3


If I wanted to purchase a book, I could visit several web sites and buy it from the store that offered the best price. Better yet, I could create a program that visits many web sites for me and retrieves the price of the book from each one. I could then compare all of these prices and decide where to buy my book.

Consider another example, in which I'm interested in monitoring the performance of the companies in my stock portfolio. There are, of course, sites that provide a wealth of information about publicly traded companies. I can get the stock ticker price, the opening and closing prices, the company's financials, and lists of news items from several different sites. This time, instead of visiting several different sites to make sure that I have the information I need, I could create a site that combines all of that information in one convenient place.

In each case, we could use a servlet agent to visit a number of other sites and retrieve information. The agent could then combine the information in a way that is more valuable to us. This gathering and combining of information is known as aggregation and conglomeration:

Aggregation is the combination of similar types of information collected from different sources. An example of aggregation is comparison shopping by collecting the prices of similar items from several different vendors.
Conglomeration is the combination of different types of information. A portal showing news headlines and stock prices is an example of conglomeration.
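To make the distinction concrete, here is a minimal Java sketch. The class AggregationDemo, its methods, and the store names and prices are hypothetical placeholders rather than data from any real service: cheapestVendor() aggregates one kind of value (a price) from several sources, while portalView() conglomerates different kinds of information about one subject.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: vendor names, prices and field names are illustrative only.
class AggregationDemo {

    // Aggregation: the same kind of value (a price) from several sources,
    // reduced to a single answer (the cheapest vendor).
    static String cheapestVendor(Map<String, Double> pricesByVendor) {
        String best = null;
        double bestPrice = Double.MAX_VALUE;
        for (Map.Entry<String, Double> e : pricesByVendor.entrySet()) {
            if (e.getValue() < bestPrice) {
                bestPrice = e.getValue();
                best = e.getKey();
            }
        }
        return best;  // null when no prices were collected
    }

    // Conglomeration: different kinds of information about one subject,
    // combined side by side into one view (insertion order is preserved).
    static Map<String, String> portalView(String ticker, String quote, String headline) {
        Map<String, String> view = new LinkedHashMap<>();
        view.put("ticker", ticker);
        view.put("quote", quote);
        view.put("headline", headline);
        return view;
    }

    public static void main(String[] args) {
        Map<String, Double> prices = new HashMap<>();
        prices.put("StoreA", 24.99);
        prices.put("StoreB", 21.50);
        prices.put("StoreC", 23.00);
        System.out.println(cheapestVendor(prices));  // StoreB
        System.out.println(portalView("XYZ", "42.10", "XYZ announces results"));
    }
}
```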

Next we'll look at how we can create an aggregating servlet agent.

Aggregation Example

This aggregating agent builds and displays a table comparing exchange rates:

    import java.io.*;
    import java.net.*;
    import java.util.*;
    import javax.servlet.*;
    import javax.servlet.http.*;

    public class AggregateServlet extends HttpServlet {

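The doPost() method calls buildTable(), which builds the table for the response. Because navigating to the servlet in a browser sends a GET request, doGet() simply delegates to doPost():

      public void doGet(HttpServletRequest request,
                        HttpServletResponse response)
          throws ServletException, IOException {
        doPost(request, response);
      }

      public void doPost(HttpServletRequest request,
                         HttpServletResponse response)
          throws ServletException, IOException {
        // Set the content type before obtaining the writer
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html>");
        out.println("<head>");
        out.println("</head>");
        out.println("<body>");
        buildTable(out);
        out.println("</body>");
        out.println("</html>");
        out.close();
      }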

The buildTable() method writes the header row and then calls buildRow() once for each currency:

      private void buildTable(PrintWriter out) {
        out.println("<table border=\"1\">");
        out.print("<tr><td>&nbsp;&nbsp;</td>");
        out.println("<th>CA</th><th>EU</th><th>JP</th>" +
                    "<th>MX</th><th>UK</th><th>US</th></tr>");
        buildRow(out, "CA");
        buildRow(out, "EU");
        buildRow(out, "JP");
        buildRow(out, "MX");
        buildRow(out, "UK");
        buildRow(out, "US");
        out.println("</table>");
      }

The buildRow() method handles the building of each row of the table by calling buildCell() for each column:

      private void buildRow(PrintWriter out, String country) {
        out.println("<tr>");
        out.print("<th>" + country + "</th>");
        buildCell(out, "CA", country);
        buildCell(out, "EU", country);
        buildCell(out, "JP", country);
        buildCell(out, "MX", country);
        buildCell(out, "UK", country);
        buildCell(out, "US", country);
        out.println("</tr>");
      }

The buildCell() method gets the exchange rate for a particular cell, based on the countries for its row and column, by calling a static method in a helper class (similar to the caching example):

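      // Writes one cell: units of the column currency (country1)
      // per unit of the row currency (country2)
      private void buildCell(PrintWriter out, String country1,
                             String country2) {
        out.print("<td>");
        out.print(RateHelper.getExchangeRate(country1, country2));
        out.println("</td>");
      }
    }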
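The six hard-coded buildRow() and buildCell() calls could also be driven by a single array of currency codes, so that adding a currency means changing one line. The following is a sketch under that assumption, not the book's code: the class RateTable is hypothetical, and the rate lookup is passed in as a BiFunction standing in for RateHelper.getExchangeRate() so the sketch runs without the servlet API or SOAP.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.function.BiFunction;

// Hypothetical data-driven version of buildTable()/buildRow()/buildCell().
// rate.apply(column, row) stands in for RateHelper.getExchangeRate(column, row).
class RateTable {
    static final String[] CODES = {"CA", "EU", "JP", "MX", "UK", "US"};

    static void buildTable(PrintWriter out, BiFunction<String, String, String> rate) {
        out.println("<table border=\"1\">");
        // Header row: an empty corner cell, then one column per currency
        out.print("<tr><td>&nbsp;&nbsp;</td>");
        for (String col : CODES) {
            out.print("<th>" + col + "</th>");
        }
        out.println("</tr>");
        // One row per currency; each cell mirrors buildCell(out, column, row)
        for (String row : CODES) {
            out.print("<tr><th>" + row + "</th>");
            for (String col : CODES) {
                out.print("<td>" + rate.apply(col, row) + "</td>");
            }
            out.println("</tr>");
        }
        out.println("</table>");
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        // Placeholder lookup that prints the currency pair instead of a rate
        buildTable(out, (col, row) -> col + "/" + row);
        out.flush();
        System.out.println(buffer);
    }
}
```

In the servlet, the response's PrintWriter would be passed in place of the StringWriter-backed one.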

The RateHelper class is based on the ExchangeRateCache class. To improve performance, RateHelper caches the rates so as to minimize the number of remote calls made:

    import java.io.*;
    import java.net.*;
    import java.util.*;
    import org.apache.soap.*;
    import org.apache.soap.rpc.*;

    public class RateHelper {
      private static Hashtable rate_table = new Hashtable(6);

We have provided two versions of the getRate() method. One version takes a single argument and returns the rate per US dollar. It first looks in the cache and, if the rate is not found, calls the callExchangeRateService() method to fill the cache:

      public static double getRate(String country) {
        Double rate = (Double)rate_table.get(country);
        if(rate == null) {
          try {
            rate = callExchangeRateService(country);
          } catch(Exception e) {
            e.printStackTrace();
          }
        }
        // A SOAP fault or an exception leaves rate null; report 0.0 then
        return (rate == null) ? 0.0 : rate.doubleValue();
      }

The second implementation uses the first to calculate the rate between any two supported currencies by first converting to dollars:

      public static double getRate(String country1, String country2) {
        double rval;
        double currency1_per_usdollar = getRate(country1);
        double currency2_per_usdollar = getRate(country2);
        if(currency2_per_usdollar > 0.0) {
          rval = currency1_per_usdollar/currency2_per_usdollar;
        } else {
          rval = 0.0;
        }
        return rval;
      }
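The arithmetic in the two-argument getRate() can be written out as a worked equation. Here R(c) is the value returned by getRate(c), read as units of currency c per US dollar; the figures 1.50 and 0.75 are hypothetical, chosen only to make the division easy to follow, and are not real exchange rates.

```latex
r(c_1, c_2) = \frac{R(c_1)}{R(c_2)},
\qquad R(c) = \text{units of currency } c \text{ per US dollar}

r(\mathrm{CA}, \mathrm{UK})
  = \frac{1.50\ \mathrm{CAD/USD}}{0.75\ \mathrm{GBP/USD}}
  = 2.0\ \mathrm{CAD/GBP}
```

The US dollar units cancel, leaving units of the first currency per unit of the second. Because buildCell(out, column, row) passes the column code first, each table cell reads as units of the column currency per one unit of the row currency; with these hypothetical figures, the UK row would show 2.0 in the CA column. If R(c_2) is zero (for example, because the lookup failed), the method returns 0.0 instead of dividing.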

This is the method that will actually be called by our aggregating client:

      public static String getExchangeRate(String country1, String country2) {
        double rate = getRate(country1, country2);
        return Double.toString(rate);
      }

Finally, we have modified the callExchangeRateService() method so that it now returns a Double. It also stores Double objects in the cache instead of strings, and passes exceptions through to the caller by re-throwing them:

      public static synchronized Double callExchangeRateService(String country)
          throws Exception {
        // Another thread may have filled the cache while this one waited
        Double rate = (Double)rate_table.get(country);
        if(rate != null) {
          return rate;
        }
        try {
          URL url = new URL("http://localhost:8088/soap/servlet/rpcrouter");
          String encodingStyleURI = Constants.NS_URI_SOAP_ENC;

          // Build the RPC call to the exchange rate service
          Call call = new Call();
          call.setTargetObjectURI("urn:SoapExchangeRate");
          call.setMethodName("getExchangeRate");
          call.setEncodingStyleURI(encodingStyleURI);
          Vector params = new Vector();
          params.addElement(new Parameter("country", String.class,
                                          country, null));
          call.setParams(params);

          Response resp = call.invoke(url, "");
          if (resp.generatedFault()) {
            // Log the fault; rate stays null and is returned below
            Fault fault = resp.getFault();
            String faultString = fault.getFaultString();
            System.out.println(faultString);
          } else {
            Parameter result = resp.getReturnValue();
            rate = (Double)result.getValue();
            rate_table.put(country, rate);
            return rate;
          }
        } catch(Exception e) {
          throw(e);
        }
        return rate;
      }
    }
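On Java 8 or later, the same cache-on-miss idea could be expressed without a synchronized method by using ConcurrentHashMap.computeIfAbsent(). This is a sketch of that alternative, not the book's code: the class RateCache is hypothetical, the remote lookup is a stand-in Function (in RateHelper it would be the SOAP call), and the counter exists only to show how many remote calls are made.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical alternative to RateHelper's Hashtable plus synchronized method.
class RateCache {
    private final Map<String, Double> rates = new ConcurrentHashMap<>();
    private final Function<String, Double> remoteLookup;   // stand-in for the SOAP call
    private final AtomicInteger remoteCalls = new AtomicInteger();

    RateCache(Function<String, Double> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    // Units of the given currency per US dollar, or 0.0 if the lookup failed.
    double getRate(String country) {
        // ConcurrentHashMap applies the function atomically per key, so
        // concurrent requests for one currency trigger a single remote call.
        // A null result is not stored, so a failed lookup is retried later.
        Double rate = rates.computeIfAbsent(country, c -> {
            remoteCalls.incrementAndGet();
            return remoteLookup.apply(c);
        });
        return rate == null ? 0.0 : rate;
    }

    int remoteCallCount() {
        return remoteCalls.get();
    }
}
```

One design difference from RateHelper: only callers waiting on the same currency block during a lookup, rather than every caller of callExchangeRateService().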

Navigate to http://localhost:8080/webservices/servlet/AggregateServlet to run this servlet:

[Figure: the exchange-rate table displayed by AggregateServlet]

Of course, our servlet could have gathered this information from a number of different web services and compared their exchange rates to form a comparison service.

Sampling Applications

A sampling application is one that samples several sites for similar information; it is a special case of aggregation. For example, a shopping bot that samples book prices for the same title at several sites is a sampling application. Whereas an aggregation agent would show the prices from all of the sites at once, a sampling agent might show just the top three. A sampling agent may have a list of 100 stores to sample but query only 25 of them at any one time. The decision about which stores to sample might be based on previous history, or might be made randomly.
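The random variant of that decision can be sketched in a few lines of Java. This assumes a hypothetical SamplingAgent class and a priceOf function standing in for the remote price query; the sketch shuffles a copy of the store list, keeps the first sampleSize entries, queries each sampled store once, and returns the cheapest topN.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.Function;

// Hypothetical sampling agent; priceOf stands in for a remote price lookup.
class SamplingAgent {

    // Random sample without replacement; the caller's list is left untouched.
    static List<String> sampleStores(List<String> allStores, int sampleSize, Random random) {
        List<String> copy = new ArrayList<>(allStores);
        Collections.shuffle(copy, random);
        return new ArrayList<>(copy.subList(0, Math.min(sampleSize, copy.size())));
    }

    // Query each sampled store once, then return the topN cheapest.
    static List<String> cheapest(List<String> stores, Function<String, Double> priceOf, int topN) {
        Map<String, Double> prices = new HashMap<>();
        for (String store : stores) {
            prices.put(store, priceOf.apply(store));   // one remote query per store
        }
        List<String> sorted = new ArrayList<>(stores);
        sorted.sort((a, b) -> Double.compare(prices.get(a), prices.get(b)));
        return new ArrayList<>(sorted.subList(0, Math.min(topN, sorted.size())));
    }
}
```

Fetching every price before sorting keeps the number of remote queries equal to the sample size; sorting with a comparator that called the remote lookup directly would query the same store many times.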

Design Considerations

There are a number of issues to consider when designing an agent that performs aggregation or conglomeration:

Is the information cacheable?
Does all information have to be refreshed each time?
Can the information be collected in advance?
Is the order that the information is retrieved important?
Can the information be retrieved in a reasonable amount of time?

For some agents, it is very likely that some information will be cacheable and some will not, or that some caches will need to be refreshed at a different rate from others. For example, if you are trying to match 15-minute stock quotes with daily news items, the stock quotes have a cache aging cycle of no more than 15 minutes, while the news items may only need to be refreshed once or twice a day.
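One way to give each kind of item its own aging cycle is to pass a maximum age with every lookup. The sketch below assumes a hypothetical AgingCache class; the 15-minute and 12-hour limits mirror the example above, and the clock is injectable (for instance System::currentTimeMillis in production) so the aging logic can be tested without waiting.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical cache in which each lookup states how old an entry may be.
class AgingCache {
    static final long QUOTE_MAX_AGE_MILLIS = 15L * 60 * 1000;       // 15 minutes
    static final long NEWS_MAX_AGE_MILLIS = 12L * 60 * 60 * 1000;   // 12 hours

    private static final class Entry {
        final Object value;
        final long loadedAtMillis;

        Entry(Object value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final LongSupplier clock;

    AgingCache(LongSupplier clock) {
        this.clock = clock;
    }

    // Returns the cached value if it is younger than maxAgeMillis;
    // otherwise reloads it through the (possibly remote) loader.
    synchronized Object get(String key, long maxAgeMillis, Supplier<?> loader) {
        long now = clock.getAsLong();
        Entry entry = entries.get(key);
        if (entry == null || now - entry.loadedAtMillis >= maxAgeMillis) {
            entry = new Entry(loader.get(), now);
            entries.put(key, entry);
        }
        return entry.value;
    }
}
```

For simplicity the loader runs while the cache is locked, so a slow remote call delays other lookups; a production cache would release the lock during the fetch.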

Collecting information in advance prevents delays for the first requesters, who would otherwise wait while an empty cache is filled. In this case, it makes sense to have a separate daemon or process, running outside the application server, that refreshes the data cache.
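A refresh process of that kind might look like the following sketch. The class Prefetcher is hypothetical, the in-memory Map stands in for whatever shared store (a database or file, for instance) the application server would actually read, and remoteLookup stands in for the remote query. refreshOnce() fetches every key; start() runs it immediately and then on a fixed schedule.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical prefetching daemon; store stands in for a shared data cache.
class Prefetcher {
    private final Map<String, Double> store;
    private final Function<String, Double> remoteLookup;
    private final String[] keys;

    Prefetcher(Map<String, Double> store, Function<String, Double> remoteLookup, String... keys) {
        this.store = store;
        this.remoteLookup = remoteLookup;
        this.keys = keys.clone();
    }

    // Fetch every key once; a failure leaves the previous value in place.
    void refreshOnce() {
        for (String key : keys) {
            try {
                Double value = remoteLookup.apply(key);
                if (value != null) {
                    store.put(key, value);
                }
            } catch (RuntimeException e) {
                // Catching here matters: scheduleAtFixedRate stops repeating
                // a task once an exception escapes from it.
                System.err.println("refresh failed for " + key + ": " + e);
            }
        }
    }

    // Load immediately, then every periodMinutes on one background thread.
    ScheduledExecutorService start(long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::refreshOnce, 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

The scheduler's thread is not a daemon thread, which keeps a standalone refresh process alive; call shutdown() on the returned executor to stop it.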

Dependencies may exist between data items. For example, if the agent is to quote the monthly lease price of a vehicle, it must get both lease rates and vehicle prices. Lease rates are often based on the value of what is being leased, so the vehicle price must be retrieved before the lease rate can be determined.
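Such a dependency can be expressed directly with CompletableFuture.thenCompose(), which starts the second request only after the first has produced its result. In this sketch the class LeaseQuote, both "services", the prices, and the monthly lease factors are hypothetical stand-ins; the simple price-times-factor calculation is for illustration and is not a real lease formula.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical dependent lookup: the lease factor depends on the vehicle price.
class LeaseQuote {

    // Stand-in for a remote vehicle price service.
    static CompletableFuture<Double> fetchVehiclePrice(String model) {
        return CompletableFuture.supplyAsync(() -> "compact".equals(model) ? 20000.0 : 30000.0);
    }

    // Stand-in for a remote lease service whose monthly factor depends on price.
    static CompletableFuture<Double> fetchLeaseFactor(double price) {
        return CompletableFuture.supplyAsync(() -> price < 25000.0 ? 0.020 : 0.018);
    }

    // thenCompose chains the second request onto the first result,
    // so the two calls necessarily run one after the other.
    static CompletableFuture<Double> monthlyLeasePrice(String model) {
        return fetchVehiclePrice(model)
                .thenCompose(price -> fetchLeaseFactor(price)
                        .thenApply(factor -> price * factor));
    }
}
```

Independent items, such as news headlines for the same page, could still be fetched concurrently with this chain; only the dependent pair has to be sequential.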

The amount of time it takes an agent to build its response is important. If it takes too long, the information may no longer be valid or useful. Dependencies between services and non-cacheable items can lead to long assembly times; where possible, these should be worked around or avoided.

Parallel vs. Sequential Processing

An agent that performs aggregation or conglomeration typically has several tasks that could be performed independently of each other. If the queries to the remote systems are independent, they can be run in parallel by allocating each query to a separate thread. Because each thread spends most of its time waiting for a remote response, this improves performance even on a single-CPU system.
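A minimal way to do this in Java is to submit each query to an ExecutorService and collect the results in order. The sketch assumes a hypothetical ParallelFetch class; slowQuery() simulates a remote call by sleeping, so three 300 ms queries finish in roughly 300 ms instead of the 900 ms a sequential loop would take.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: run independent queries on separate threads.
class ParallelFetch {

    static List<String> fetchAll(List<Callable<String>> queries)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, queries.size()));
        try {
            // invokeAll starts every task and waits until all have finished;
            // the futures come back in the same order as the queries.
            List<Future<String>> futures = pool.invokeAll(queries);
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());   // rethrows a task failure as ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    // Stand-in for a remote query that spends most of its time waiting.
    static Callable<String> slowQuery(String answer, long delayMillis) {
        return () -> {
            Thread.sleep(delayMillis);
            return answer;
        };
    }
}
```

With many sources, a bounded pool is safer than one thread per query, and the overload invokeAll(tasks, timeout, unit) cancels any query that exceeds the agent's time budget (its Future.get() then throws CancellationException).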
