A Distributed System in the Pub/Sub Domain

In the last example, the messages in the PTP domain corresponded to tasks, and it was (effectively) the responsibility of the JMS message server to ensure that each of these tasks was executed exactly once. In contrast, messages in the Pub/Sub domain often contain information that should be distributed to every interested client.

A classic example, and probably one of the most common applications of Pub/Sub messaging, is the distribution of financial data. In this section we will develop an example of a stock price distribution system for a financial trading room. This example draws on elements of several real financial information systems, but is simplified to serve as an architectural example without getting bogged down in too many details.

The initial system will be very skeletal in that it only distributes raw stock prices from market data feeds to traders' desktops, as well as to middle- and back-office systems that also require access to live data. We will then demonstrate how the architecture makes it easy to merge in additional services.

The skeletal system is shown in the figure below. The system is fed with price data from several real-time market data services. These may be available via dial-up modem, leased line, satellite feed, the Internet, or other means; how each feed is delivered is not important. We will assume that for each feed there is an interface program that is capable of publishing the data to our JMS-compliant message server. Let's assume that there are three feeds, each covering one of the three major financial regions: North America, Europe, and Asia.

In our trading room, activity is divided up by industry, so the feed interfaces publish each price to one of the topics dedicated to technology, automotive, or pharmaceutical stocks. The diagram also shows a number of subscribers that consume the price data. There are two trader desktop systems; each subscribes only to the topic containing prices for the industry in which that trader is interested. Other institutional systems also subscribe to the raw prices. These systems need data from all industries, so they subscribe to all of the topics:

[Figure: the skeletal stock price distribution system]
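The routing just described can be mimicked with a small in-memory stand-in for the message server. This is a sketch under stated assumptions, not the JMS API: the `PriceBus`, `Price`, and `FeedInterface` classes, topic names such as `prices.technology`, and the symbol-to-industry table are all hypothetical. It shows a feed interface publishing each price to its industry topic, a trader desktop subscribed to a single topic, and an institutional system subscribed to all three.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// A price update as a feed interface might publish it (hypothetical message shape).
final class Price {
    final String symbol;
    final double value;

    Price(String symbol, double value) {
        this.symbol = symbol;
        this.value = value;
    }
}

// In-memory stand-in for a Pub/Sub message server (not the JMS API): every
// subscriber of a topic receives every message published to that topic.
final class PriceBus {
    private final Map<String, List<Consumer<Price>>> subscribers = new HashMap<>();

    void subscribe(String topic, Consumer<Price> listener) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(listener);
    }

    void publish(String topic, Price price) {
        // The publisher does not know who, or how many, the subscribers are.
        for (Consumer<Price> listener : subscribers.getOrDefault(topic, Collections.emptyList())) {
            listener.accept(price);
        }
    }
}

// Hypothetical feed interface: routes each raw price to its industry topic.
final class FeedInterface {
    private final PriceBus bus;
    private final Map<String, String> industryOf;   // symbol -> "technology", "automotive", ...

    FeedInterface(PriceBus bus, Map<String, String> industryOf) {
        this.bus = bus;
        this.industryOf = industryOf;
    }

    void onRawPrice(String symbol, double value) {
        String industry = industryOf.get(symbol);
        if (industry != null) {
            bus.publish("prices." + industry, new Price(symbol, value));
        }
    }
}
```

`FeedInterface` never learns how many subscribers exist: adding another subscriber is one more `subscribe` call and leaves the feed and the existing subscribers untouched.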

Let us have a look at what we get from the basic system:

  • Any number of subscribers can have access to the live data.

  • Asynchronous subscribers get new prices as soon as they are available without wasting time polling the price source. Synchronous subscribers (that is, those that use the blocking receive() call) can also get updates when they occur, but they have to block a thread while waiting for the next one.

  • Subscribers can restrict the amount of information that is delivered to them by selecting a subset of topics, and additionally by using message selectors. This decreases the amount of unwanted data that is delivered to a particular subscriber.

  • It is easy to add new subscribers and new price feeds at any time. Adding such a component does not disturb any of the components that are already in place.

Most traders who are watching stock prices are also interested in the volatility of those prices, so we will integrate a volatility calculator into the system. Volatility is a measure of the degree to which a price fluctuates over time, and is the result of a simple statistical calculation on the past stock prices. Our volatility calculator thus needs to subscribe to stock prices as input to the calculation. After performing the calculation, it will publish the results to new topics dedicated to volatility data. When these functions are implemented, the volatility calculator is fully integrated into the system. Other applications can access the calculated volatility data by subscribing to one of the volatility topics.
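The text leaves the statistic open. A common choice, assumed here, is historical volatility: the sample standard deviation of the logarithmic returns over a sliding window of recent prices (not annualized). In the sketch below, the `VolatilityCalculator` class and the window size are hypothetical; in the system described, `onPrice` would be driven by a subscription to the price topics, and each present result would be published to a volatility topic.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical volatility engine. Volatility is taken here to be the sample standard
// deviation of log returns over the last `window` prices (an assumed definition).
final class VolatilityCalculator {
    private final int window;
    private final Map<String, Deque<Double>> history = new HashMap<>();

    VolatilityCalculator(int window) {
        if (window < 3) throw new IllegalArgumentException("window must hold at least 3 prices");
        this.window = window;
    }

    // Called for each message from a price topic; a present result would be
    // published to the corresponding volatility topic.
    OptionalDouble onPrice(String symbol, double price) {
        Deque<Double> prices = history.computeIfAbsent(symbol, s -> new ArrayDeque<>());
        prices.addLast(price);
        if (prices.size() > window) prices.removeFirst();
        if (prices.size() < window) return OptionalDouble.empty();   // not enough history yet
        return OptionalDouble.of(stdDevOfLogReturns(prices));
    }

    static double stdDevOfLogReturns(Deque<Double> prices) {
        double[] returns = new double[prices.size() - 1];
        Double previous = null;
        int i = 0;
        for (double p : prices) {
            if (previous != null) returns[i++] = Math.log(p / previous);
            previous = p;
        }
        double mean = 0;
        for (double r : returns) mean += r;
        mean /= returns.length;
        double sumSq = 0;
        for (double r : returns) sumSq += (r - mean) * (r - mean);
        return Math.sqrt(sumSq / (returns.length - 1));   // n - 1: sample standard deviation
    }
}
```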

Now that we have access to volatility data, we would like to calculate theoretical stock option prices. The option price calculation requires stock price and volatility as input. (It actually requires interest rates and a few other things as well. Providing this additional data requires a hefty bit of infrastructure, so for the purposes of the example we will just assume that it is available by some other means.) The option price calculator is integrated into the system in the same way as the volatility calculator. It subscribes to stock prices and volatilities, and publishes option prices on a new set of topics.

Next we would like to make the calculation engines (for volatility and option prices) redundant in order to increase the reliability of the system. There are a few different possible approaches to this. The most basic approach is just to start a second instance of each engine. Now each time a new stock price is published, two identical volatility updates are published. If one of the volatility calculators fails, then the other still provides a complete set of data. If the subscribers of the calculated data are not adversely affected by receiving redundant updates, then we have already achieved high availability, but not in a particularly efficient way.

Consider this: for each stock price update, there are two identical volatility updates. If the option price calculator calculates a new price each time it receives an update for one of its input values, then each one will calculate three redundant price updates (one new stock price and two redundant volatilities). Two redundant option price calculators will produce a total of six redundant updates for each new stock price. This effect will tend to snowball, so we need to take a more sophisticated approach.
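The counting can be made explicit. Write r for the number of redundant instances of each calculation engine (a symbol introduced here for illustration). One stock price tick yields r volatility updates, so each option calculator that recalculates on every input receives 1 + r updates and publishes 1 + r option prices; with r option calculators running, the number of option price updates per tick is:

```latex
U_{\text{option}} = r\,(1 + r), \qquad U_{\text{option}}\big|_{r=2} = 2 \cdot (1 + 2) = 6
```

Any further engine that consumes option prices and recalculates on every input multiplies this count again, which is what makes the effect snowball.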

Added Sophistication

One approach is to build a bit more intelligence into the update logic of the calculation engines. If the option calculator caches the input values of each calculation, then it can be programmed to calculate a new price only if a volatility update contains a different value from the one used in the last calculation. It might also delay the actual calculation so that several input updates arriving within a short time period trigger only one new price calculation. This curbs the snowball effect with minimal effort. The redundant calculators, however, will still produce redundant updates, so if this is not acceptable, we need to move on to the next technique.
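Both refinements can be sketched together. The following is a hypothetical input stage for a single option, not code from the original system: it remembers the inputs of its last calculation, ignores updates that merely repeat them (such as the second of two identical volatility updates), and waits for a quiet period (`quietMillis`, an assumed parameter) so that a burst of input changes triggers only one calculation. The pricing function is passed in and left abstract, and time is passed explicitly so the behavior is deterministic.

```java
import java.util.OptionalDouble;
import java.util.function.DoubleBinaryOperator;

// Hypothetical input stage of an option engine for one option: recalculates only when
// an input differs from the last calculation, and coalesces bursts into one calculation.
final class CoalescingOptionEngine {
    private final DoubleBinaryOperator pricer;   // (stock price, volatility) -> option price
    private final long quietMillis;
    private double price = Double.NaN, vol = Double.NaN;          // latest inputs
    private double usedPrice = Double.NaN, usedVol = Double.NaN;  // inputs of last calculation
    private long dueAt = -1;                                      // -1 means nothing pending
    private int calculations = 0;

    CoalescingOptionEngine(DoubleBinaryOperator pricer, long quietMillis) {
        this.pricer = pricer;
        this.quietMillis = quietMillis;
    }

    void onStockPrice(double p, long now) { price = p; schedule(now); }

    void onVolatility(double v, long now) { vol = v; schedule(now); }

    private boolean inputsChanged() {
        // Double.compare treats NaN as equal to NaN, unlike ==.
        return Double.compare(price, usedPrice) != 0 || Double.compare(vol, usedVol) != 0;
    }

    private void schedule(long now) {
        if (Double.isNaN(price) || Double.isNaN(vol)) return;   // not all inputs known yet
        if (inputsChanged() && dueAt < 0) dueAt = now + quietMillis;
    }

    // Driven by a timer: performs at most one calculation once the quiet period is over.
    OptionalDouble poll(long now) {
        if (dueAt < 0 || now < dueAt) return OptionalDouble.empty();
        dueAt = -1;
        if (!inputsChanged()) return OptionalDouble.empty();   // inputs reverted meanwhile
        usedPrice = price;
        usedVol = vol;
        calculations++;
        return OptionalDouble.of(pricer.applyAsDouble(price, vol));
    }

    int calculations() { return calculations; }
}
```

The trade-off is latency: the first input change in a burst waits at least `quietMillis` before a new option price is published.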

The next level of sophistication requires each redundant calculator to subscribe to the topic to which it publishes its results. In this way it has a means to detect the results published by other engines. Each instance of the calculation engine should pause for a small, random delay before actually performing the calculation. If, during this pause, it receives a message from another calculation engine, then it knows that the task has already been performed and it can abort the calculation.

When an engine aborts in this way, it is simply because the random delay of another engine was shorter. This scheme still permits redundant calculations, but they occur seldom. It is quite effective in providing both increased reliability and load balancing among the parallel instances of the calculation engines, although it is somewhat more difficult to implement than in the PTP case.
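The behavior of this scheme can be explored with a small simulation. It is a hypothetical model, not an implementation: each engine draws a uniform random delay up to `maxDelay`, a published result reaches the other engines after a fixed `latency`, and an engine aborts only if a peer's result arrives before its own delay expires. Under that model, a redundant result occurs when another engine's delay falls within `latency` of the shortest one.

```java
import java.util.Random;

// Hypothetical model of the random-delay scheme: each engine waits a random delay,
// then publishes unless a peer's result for the same input has already reached it.
final class RandomDelayElection {

    // Number of engines that publish, given each engine's delay and the time a
    // published result takes to reach the other engines.
    static int publishers(double[] delays, double latency) {
        double first = Double.POSITIVE_INFINITY;
        for (double d : delays) first = Math.min(first, d);
        int count = 0;
        for (double d : delays) {
            // An engine publishes unless the earliest result arrived before its delay expired.
            if (first + latency >= d) count++;
        }
        return count;
    }

    // Fraction of inputs for which more than one engine published a result.
    static double redundancyRate(int engines, double maxDelay, double latency, int trials, long seed) {
        Random random = new Random(seed);
        int redundant = 0;
        for (int t = 0; t < trials; t++) {
            double[] delays = new double[engines];
            for (int i = 0; i < engines; i++) delays[i] = random.nextDouble() * maxDelay;
            if (publishers(delays, latency) > 1) redundant++;
        }
        return (double) redundant / trials;
    }
}
```

With two engines, `maxDelay` = 100 and `latency` = 1, about 2% of inputs are calculated twice under this model. A longer maximum delay makes duplicates rarer but delays every result; an engine that has failed simply never publishes, so the surviving engine with the shortest delay takes over.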

When a new subscriber is started, it would like to have the most recent price for each stock immediately, rather than wait an unknown amount of time until the stock price changes before seeing the current state of the market. This could be solved with durable subscribers, but durable subscribers, as defined by JMS, will deliver all the price changes that transpired while the subscriber was offline. In most cases, only the current prices are relevant.

To solve this problem, we will add a new service into the system: the MRV (Most Recent Value) service. The MRV service subscribes to all topics and stores the most recent value of each unique item in a database. It then listens on a special destination for requests for the MRVs of specific items, and returns the appropriate value from its database to the destination specified in the JMSReplyTo header of the request. This service is actually best implemented with a queue, as each request is a single task that should be executed by exactly one consumer.
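The core of the MRV service is small. The sketch below is hypothetical: an in-memory map stands in for the database, and a callback stands in for the reply destination taken from the request's JMSReplyTo header. The subscriber side overwrites the stored value for each item, so only the most recent value survives, and the request side answers from that store.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// A request as it might arrive on the MRV request queue. `replyTo` stands in for the
// destination named in the request's JMSReplyTo header (hypothetical shape).
final class MrvRequest {
    final String item;
    final BiConsumer<String, Optional<String>> replyTo;

    MrvRequest(String item, BiConsumer<String, Optional<String>> replyTo) {
        this.item = item;
        this.replyTo = replyTo;
    }
}

// Hypothetical MRV service; an in-memory map stands in for the database.
final class MrvService {
    private final Map<String, String> latest = new ConcurrentHashMap<>();

    // Subscriber side: called for every message on every topic; older values are overwritten.
    void onTopicMessage(String item, String value) {
        latest.put(item, value);
    }

    // Request side: invoked for each request taken from the request queue; the reply
    // carries an empty value if the item has never been seen.
    void onRequest(MrvRequest request) {
        request.replyTo.accept(request.item, Optional.ofNullable(latest.get(request.item)));
    }
}
```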

We will add one more useful service to the system. It may be desirable to store the history of all prices and calculations in a database for future reference. This can be accomplished by adding a subscriber that listens to all topics and writes all messages into a database. If the JMS provider and the JDBC driver both support distributed transactions, these can be used to ensure that every message is correctly copied to the database in spite of failures.

The complete system is depicted in this second figure:

[Figure: the complete system]

Here are some of the noteworthy aspects of the final system:

  • Each new service could be added without disrupting any of the existing system components

  • Every component can be stopped at any time without triggering exceptions in other system components

  • Each additional subscriber adds load only to the message server, not to the corresponding message producers

High availability and load balancing of the individual services are more difficult to implement than in the PTP domain, but can nevertheless be achieved in the robust fashion described above. This is still quite advantageous when you consider what it takes to implement these features without leveraging JMS. Your application would need to include a process that knows about all of the calculation engines and can distribute tasks to them. This process must not become a single point of failure, so there needs to be a second one that listens to heartbeats from the first one, takes over if the first one dies, and steps back again if the heartbeats resume. This is complicated stuff if you have to implement it yourself. JMS can make your life easier if you use it right.
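To give a sense of what the hand-built alternative involves, here is a minimal sketch of only the standby half of it, under stated assumptions: the `StandbyMonitor` class and its timeout rule are hypothetical, and time is passed in explicitly rather than read from a clock.

```java
// Hypothetical standby side of a hand-built dispatcher pair: the backup takes over
// when the primary's heartbeats stop and steps back when they resume.
final class StandbyMonitor {
    private final long timeoutMillis;
    private long lastHeartbeat;

    StandbyMonitor(long timeoutMillis, long now) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeat = now;
    }

    // Called whenever a heartbeat from the primary arrives.
    void onHeartbeat(long now) {
        lastHeartbeat = now;
    }

    // Called periodically: true means the backup should be dispatching tasks itself.
    boolean shouldBeActive(long now) {
        return now - lastHeartbeat > timeoutMillis;
    }
}
```

Even this small piece hides the hard part: if heartbeats are merely delayed rather than lost, the primary and the backup can both believe they are active for a while, and the application must cope with that.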

JMS in Application Integration

When talking of integration, it is imperative that organizations understand both business processes and data. They must select which processes and data elements require integration. This may involve data-level integration, application interface-level integration, method-level integration, and user-interface-level integration. In any business organization, internal and external integration are related. Unless businesses have some kind of common integration infrastructure that creates and maintains the interfaces between different systems, they will find that Enterprise Application Integration (EAI) is very labor-intensive. Another issue to consider is web technologies in the context of enterprise integration, since the Internet drives market forces today.

Therefore, in addition to EAI, it is also imperative that we talk about Internet Application Integration (IAI). In the first place, many clients are seeking integration in the context of the Internet, because their e-commerce retail web sites have to be integrated with backend systems using a flexible integration infrastructure rather than hard-coded or even paper-based links. This could use EAI technology, as the web site server could reside within the enterprise. However, much e-business activity is not retail but business-to-business (B2B). This means that the operational systems of different corporations must be linked together, which creates a whole new dynamic in which XML, integration with application servers, and Java become more important than EAI. Accordingly, vendors like IBM and BEA Systems have strategies that focus on both the EAI and the IAI segments.

With EAI the focus is integrating a set of applications, whether built or bought, inside an enterprise in order to automate an overall business process for that enterprise. With IAI the focus is integrating applications, whether built or bought, across multiple enterprises in order to automate multi-enterprise business processes where the Internet provides the communications backbone. Specific examples are trading groups or associations, virtual companies (components and assembly of a final product are totally out-sourced, the virtual company handles distribution, marketing, and finance) and integrated supply chains.

If you are building an interface between two areas of the company that represent parts of the business you might outsource, externalize to partners, or even offer as part of your service to other companies, then it needs to incorporate open, Internet-based technology. If you don't, chances are you will have to rebuild the interface in the future when you need to open it up. It's always a trade-off, but the thought should be there.

As its name suggests, the distinguishing feature of IAI technology is that it integrates more tightly with Internet technology, particularly Java and application servers. Many vendors are working on versions of their message brokers that run as a task inside a Java application server. There are good reasons for this. Application servers are becoming the focus of a lot of investment because they are highly reliable, transactional, and scalable, which are exactly the capabilities you want when you build a message broker. You don't need to build these into your message broker when you can run it on top of an application server. This is important from the customer's point of view too: if they are building a distributed computing environment as well as an integration infrastructure, both environments need to be integrated with directory, security, and management infrastructure, all of which are complex problems from an installation and management perspective. Why not solve this problem once rather than twice?

Traditionally, there have been a lot of issues surrounding messaging APIs. Many organizations ended up building their own API on top of the original proprietary messaging system API. Clearly, there was some dissatisfaction with the APIs provided by the messaging vendors. Also, the International Middleware Association (IMWA) — formerly Message Oriented Middleware Association (MOMA) — the standards governing body for MOM, never sought to make it their mission to devise a single, industry-standard messaging API.

It is here that JMS could be used as such a messaging API standard. Similarly, with its integration of messaging and message brokers with Java application servers and other Internet-based technology, IAI is an enabler for multi-enterprise e-business processes and will form the majority of integration projects in the future.

JMS is very tightly integrated with Java, meaning that if you are a Java programmer looking at talking to IBM MQSeries, for example, it will be much easier and more productive to use JMS than the MQSeries API for Java. As messaging vendors implement JMS, a Java developer can use the same API with TIBCO Rendezvous, IBM MQSeries, or both.

JavaMail

When first exposed to JMS, many people do not see a big difference between JMS and e-mail. E-mail is, of course, a means for sending and receiving messages. E-mail also has a huge established infrastructure. This infrastructure is ubiquitous, reasonably reliable, and usually someone else's responsibility to maintain.

E-mail is intended for transmitting messages from humans to humans, but by using an API such as JavaMail, it is possible to use e-mail for inter-application messaging. Not only is it possible, it has certainly been done numerous times in the history of distributed programming. In some of these cases e-mail was the best tool for the job, but in others JMS may well have been a better choice, had it been available or better understood at the time.

Consider the following example. A company has established, but conventional, retail channels, business processes, and back-office IT systems. This company needs to add the Internet as a new retail channel as soon as possible (or in a matter of weeks, whichever is sooner). The company is not big enough to justify the expenses associated with maintaining a highly available (say 99.9% or better) web site on its own premises, so it outsources this to an application service provider (ASP). The backend systems must remain on premises for security reasons, but a failure of these systems should not impair the functionality of the web site (a backend failure is costly, but customers do not need to know about it). Thus, the web front end must be able to transmit merchandise orders to the backend system.

Both systems are behind restrictive firewalls, but with enough effort, a secure TCP connection could be tunneled from the web system to the backend. To fulfill the requirements, however, the web system would need to be able to queue orders and continue processing while the backend is unavailable. This is possible, but involves reinventing the (messaging) wheel (remember the time constraints). In this case, encrypted e-mail via JavaMail was used to solve the connectivity problem. The ASP provides access to an SMTP server at no extra cost, and there are almost no firewall issues.
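For comparison, the core of the wheel that would have to be reinvented is a store-and-forward queue. The sketch below is hypothetical (`BackendLink` and `OrderOutbox` are illustrative names): orders are accepted while the backend is down and forwarded in order once it is reachable again. It keeps orders only in memory; a usable version would also need durable storage, retry scheduling, and duplicate handling, which is the work a messaging system already does.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical link to the backend; throws if the backend cannot be reached.
interface BackendLink {
    void send(String order) throws Exception;
}

// Minimal store-and-forward outbox: orders are accepted even while the backend is
// down and forwarded in order once it is reachable again. In-memory only.
final class OrderOutbox {
    private final Deque<String> pending = new ArrayDeque<>();
    private final BackendLink link;

    OrderOutbox(BackendLink link) { this.link = link; }

    void accept(String order) { pending.addLast(order); }

    // Forwards as many queued orders as possible; returns how many were sent.
    int flush() {
        int sent = 0;
        while (!pending.isEmpty()) {
            try {
                link.send(pending.peekFirst());
            } catch (Exception e) {
                break;   // backend unavailable: keep the order and retry on the next flush
            }
            pending.removeFirst();
            sent++;
        }
        return sent;
    }

    int backlog() { return pending.size(); }
}
```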

Although e-mail proved to be a quick and effective solution in the example above, there are some shortcomings:

  • Two-Way Communication

    Although sending messages via an existing e-mail server is trivial, receiving mail requires more effort. A POP or IMAP mailbox, with corresponding address and possibly domain DNS entries, would have to be configured for the web application. In addition, a thread or a process would need to poll the mailbox for new messages. Since the web system had guaranteed uptime, communication in this direction was accomplished using a direct connection instead of e-mail.

  • Exception Handling

    If a message cannot be delivered for some reason, for example because the destination mail server is unavailable for an extended period, the mail system tries to notify the sender via e-mail. Since the web system is not set up to receive e-mail, it will never get these notifications. The workaround was to send a periodic mail containing an order summary. The backend system compares this summary to the orders it has received and generates an alert if messages were lost. It expects the summaries at regular intervals and also generates an alert if one does not arrive. This process effectively creates a reliable protocol on top of e-mail.
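The backend half of this workaround might look like the following hypothetical sketch (`SummaryReconciler` and its parameters are illustrative, not taken from the original system). It reports orders listed in a summary but never received, and flags a summary that is overdue.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical backend-side check for the periodic order summaries: detects lost
// orders and missing summaries.
final class SummaryReconciler {
    private final long expectedIntervalMillis;
    private long lastSummaryAt;

    SummaryReconciler(long expectedIntervalMillis, long now) {
        this.expectedIntervalMillis = expectedIntervalMillis;
        this.lastSummaryAt = now;
    }

    // Called when a summary arrives: returns order IDs listed in it but never received.
    Set<String> onSummary(Set<String> summarized, Set<String> received, long now) {
        lastSummaryAt = now;
        Set<String> missing = new TreeSet<>(summarized);
        missing.removeAll(received);
        return missing;   // non-empty means an alert should be raised
    }

    // Called periodically: true if a summary is overdue, which also warrants an alert.
    boolean summaryOverdue(long now) {
        return now - lastSummaryAt > expectedIntervalMillis;
    }
}
```

A real check would also need a grace period, since an order sent just before a summary may legitimately still be in transit when the summary arrives.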

  • Dependency on Third Parties

    This is the downside of using existing infrastructure: the system depends on the availability of the SMTP server. The ASP maintains the server, but most ordinary e-mail service packages do not provide explicit uptime guarantees, although if necessary one could negotiate a Service Level Agreement (SLA) specifying the maximum allowable downtime. E-mail is intended for inter-human messaging, not inter-application messaging, and humans are generally more robust to the failure of a mail server than computer programs are.

In addition to the points mentioned above, there are other shortcomings of using e-mail for inter-application messaging that were not relevant to the example:

  • No Transactions

    JMS providers support transactional messaging. This enables the provider to guarantee that messages produced and consumed within one transaction either all succeed or all fail as a single unit. This is important in mission-critical applications, since it means that the failure of the JMS server or of a client will never leave an operation partially completed. E-mail servers do not support transactions.
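The following in-memory sketch imitates those semantics without the JMS API (in JMS itself, a transacted session provides `commit()` and `rollback()`). The `TxQueuePair` class is hypothetical: messages sent inside the transaction stay invisible until commit, and messages received inside it go back to the queue on rollback, so a failure between receiving an order and sending the resulting message never leaves half the work done.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// In-memory imitation of transacted-session semantics (not the JMS API).
final class TxQueuePair {
    final Deque<String> input = new ArrayDeque<>();
    final Deque<String> output = new ArrayDeque<>();
    private final List<String> received = new ArrayList<>();   // consumed in this transaction
    private final List<String> toSend = new ArrayList<>();     // produced in this transaction

    String receive() {
        String message = input.pollFirst();
        if (message != null) received.add(message);
        return message;
    }

    void send(String message) { toSend.add(message); }   // not visible until commit

    void commit() {
        output.addAll(toSend);
        toSend.clear();
        received.clear();
    }

    void rollback() {
        // Put received messages back at the head of the queue, in their original order.
        for (int i = received.size() - 1; i >= 0; i--) input.addFirst(received.get(i));
        received.clear();
        toSend.clear();
    }
}
```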

  • Delivery Order

    E-mail systems do not guarantee that messages will be delivered in the order that they were sent. Applications that depend on in-order message delivery would need to implement a custom ordering protocol.
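A common form of such a protocol, assumed here, numbers messages at the sender and buffers out-of-order arrivals at the receiver. The `ReorderBuffer` class below is a hypothetical sketch of the receiving side.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical receiver side of an ordering protocol: the sender numbers messages
// 0, 1, 2, ... and the receiver holds back any message that arrives before its predecessors.
final class ReorderBuffer {
    private final Map<Long, String> held = new HashMap<>();
    private long next = 0;   // sequence number of the next message to deliver

    // Returns the messages that can now be delivered in sequence (possibly none).
    List<String> onMessage(long seq, String body) {
        List<String> ready = new ArrayList<>();
        if (seq < next) return ready;   // duplicate of an already delivered message
        held.put(seq, body);
        while (held.containsKey(next)) {
            ready.add(held.remove(next));
            next++;
        }
        return ready;
    }
}
```

A single lost message stalls delivery of everything after it, so in practice this would have to be combined with detection of lost messages, such as the summary mechanism described earlier.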

  • Binary Data

    E-mail is intended for transmitting human-readable text or HTML messages. Mail systems may apply minor reformatting, such as inserting line breaks. This is not critical for a human reading the message, but could render a message unusable by an application that expects a very rigid format. It is not a showstopper, since arbitrary binary data can be encoded as text that passes through e-mail systems without corruption. The big issue here is efficiency. Encoding binary data using the common base64 algorithm inflates the size of the data by more than 30% and adds the overhead of encoding and decoding each message. In high-volume applications this could be a serious issue.
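The overhead is easy to measure with the JDK's `java.util.Base64`. Base64 turns every 3 input bytes into 4 output characters, a 33% increase, and MIME-style encoding (`Base64.getMimeEncoder()`), which inserts a CRLF after every 76 characters as is usual for e-mail, adds a little more. The `Base64Overhead` helper below is just an illustration.

```java
import java.util.Base64;
import java.util.Random;

// Illustrative helper for measuring how much base64 inflates binary data.
final class Base64Overhead {

    // Encoded size divided by raw size.
    static double inflation(byte[] raw, Base64.Encoder encoder) {
        return (double) encoder.encode(raw).length / raw.length;
    }

    // Deterministic pseudo-random payload standing in for binary message data.
    static byte[] randomBytes(int n, long seed) {
        byte[] data = new byte[n];
        new Random(seed).nextBytes(data);
        return data;
    }
}
```

For 3,000 bytes the basic encoder produces 4,000 characters, and the MIME encoder produces 4,104 (the 4,000 characters plus 52 line separators of 2 bytes each), an increase of roughly 37%.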

  • Explicit Recipient Identity

    E-mail requires each message recipient to be named explicitly. It does not inherently support the ability to dynamically add subscribers to and remove subscribers from a topic as in JMS. Using aliases on the server can simulate this behavior, but this requires administrative access to the mail server and is not supported by the JavaMail API.

  • Load Balancing

    E-mail cannot simulate the ability of JMS to distribute Point-to-Point messages to multiple receivers, with the guarantee that each message is delivered to exactly one receiver.
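The guarantee in question is easy to state in code. In the sketch below, a `java.util.concurrent` queue stands in for a JMS queue (this is not the JMS API): several receiver threads compete for messages, and because taking a message removes it, each message is processed exactly once no matter how many receivers run.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// In-memory stand-in for a point-to-point queue with several receivers (not the JMS API).
final class CompetingConsumers {

    // Runs `receivers` threads against one queue of `messages` messages and returns
    // how many times each message was processed in total.
    static Map<Integer, AtomicInteger> run(int messages, int receivers) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) queue.add(i);
        Map<Integer, AtomicInteger> deliveries = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(receivers);
        for (int r = 0; r < receivers; r++) {
            pool.execute(() -> {
                Integer message;
                // poll() removes the head atomically and returns null once the queue is drained.
                while ((message = queue.poll()) != null) {
                    deliveries.computeIfAbsent(message, k -> new AtomicInteger()).incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("receivers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return deliveries;
    }
}
```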

  • Asynchronous Message Delivery

    E-mail systems push messages only as far as a mailbox. Consumers of e-mail messages are required to pull messages from the mailbox. JavaMail does define an event mechanism that will automatically invoke listeners when new mail arrives. However, the incoming part of the underlying mail system is pull-oriented, meaning that the client must explicitly query the mail server to find out if new messages are available. This means that the consumer is required to poll the mailbox in order to cause a "new mail" event to fire.
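Turning a pull-only mailbox into events therefore means polling. The sketch below is hypothetical: an `IntSupplier` stands in for asking the mail server how many messages are in the mailbox, and an `IntConsumer` plays the role of the listener. (In JavaMail, periodically calling `Folder.getMessageCount()` is the usual way to give the store a chance to fire message-count events; treat that mapping as approximate.)

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;
import java.util.function.IntSupplier;

// Turns a pull-only mailbox into "new mail" events by polling (hypothetical types).
final class MailboxPoller {
    private final IntSupplier messageCount;   // stand-in for asking the server for the count
    private final IntConsumer onNewMail;      // listener, called with the number of new messages
    private int lastCount;

    MailboxPoller(IntSupplier messageCount, IntConsumer onNewMail) {
        this.messageCount = messageCount;
        this.onNewMail = onNewMail;
        this.lastCount = messageCount.getAsInt();
    }

    // One poll: ask the server and fire the listener if the count grew. Comparing
    // counts is naive; it misses arrivals that coincide with deletions.
    synchronized void pollOnce() {
        int count = messageCount.getAsInt();
        if (count > lastCount) onNewMail.accept(count - lastCount);
        lastCount = count;
    }

    // Polls on a fixed delay; the caller must shut the returned executor down.
    ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleWithFixedDelay(this::pollOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

New mail is noticed up to one polling period late, a cost that an asynchronous JMS message listener does not have.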

In summary, despite the JavaMail API, e-mail is intended for the exchange of messages between humans. JMS was designed for inter-application messaging. The fact that both JavaMail and JMS are required components of the Java 2 Platform, Enterprise Edition (as of version 1.3) underscores the fact that they are meant to fulfill different needs. Although the difference between these may not seem dramatic at first glance, the devil is in the details.

JMS provides many features that support the use of messaging in automated, transactional applications. These features would need to be implemented at the application level if JavaMail were to be used in this context. On the other hand, the ability to use JavaMail to cheaply and quickly leverage a huge existing global infrastructure should not be overlooked. Perhaps one day we will see JMS to e-mail gateways, or JMS that uses e-mail as its underlying transport mechanism.



Professional JMS
ISBN: 1861004931
Year: 2000
Pages: 154
