Section 6.4. HTTP Streaming | Ajax Design Patterns

6.4. HTTP Streaming

Connection, Duplex, Live, Persistent, Publish, Push, RealTime, Refresh, Remoting, RemoteScripting, Stateful, ReverseAjax, Stream, TwoWay, Update

Figure 6-9. HTTP Streaming

6.4.1. Goal Story

Tracy's trading application contains a section showing the latest announcements. It's always up-to-date because the announcements are streamed directly from the server. Suddenly, an important announcement appears, prompting her to issue a new trade. The trade is sent with a separate XMLHttpRequest, so the announcement stream continues unaffected.

6.4.2. Problem

How can the server initiate communication with the browser?

6.4.3. Forces

The state of many Ajax Apps is inherently volatile. Changes can come from other users, external news and data, completion of complex calculations, and triggers based on the current time and date.
HTTP connections can only be created within the browser. When a state change occurs, there's no way for a server to create a physical connection to notify any interested client.

6.4.4. Solution

Stream server data in the response of a long-lived HTTP connection. Most web services do some processing, send back a response, and immediately exit. But in this pattern, they keep the connection open by running a long loop. The server script uses event registration or some other technique to detect any state changes. As soon as a state change occurs, it pushes new data to the outgoing stream and flushes it, but doesn't actually close it. Meanwhile, the browser must ensure the user interface reflects the new data. This pattern discusses two techniques for HTTP Streaming, which I refer to as "page streaming" and "service streaming."
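The book's server examples use PHP with a sleep loop. As a rough illustration of the "event registration" alternative, here is a minimal JavaScript sketch, assuming a Node-style response object with a write() method; createEventHub and streamStateChanges are hypothetical names, not part of any library. Each change is written to the open response as it happens, and nothing here ever ends the response, so the connection stays open.

```javascript
// Hypothetical publish/subscribe hub standing in for "event registration".
function createEventHub() {
  const listeners = new Set();
  return {
    // Registers a listener and returns a function that removes it again.
    subscribe(listener) {
      listeners.add(listener);
      return () => listeners.delete(listener);
    },
    // Notifies every registered listener of a state change.
    publish(data) {
      for (const listener of listeners) listener(data);
    },
  };
}

// Attaches a streaming response to the hub. Each state change is written
// to the response immediately; the response is deliberately never ended,
// so the connection stays open for further updates.
function streamStateChanges(hub, res) {
  return hub.subscribe((data) => {
    res.write(String(data) + "\n");
  });
}
```

In a Node http server, res would be the ServerResponse passed to the request handler, and you would call the returned unsubscribe function from res.on("close", ...) so that a dropped client stops receiving writes. Node's write() sends each chunk without waiting for the response to end, which plays roughly the role that flush( ) plays in the PHP examples.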

Is it Practical?

Note that this pattern, though it's used in production systems, remains somewhat experimental, especially the service-streaming variant. There are issues of feasibility, scalability, and browser portability. One particular gotcha is the effect of proxies. Sometimes, a proxy sitting somewhere between server and browser will buffer responses, an unfortunate optimization that prevents real-time data from flowing into the browser. Even worse, some proxies will, with the best of intentions, take it upon themselves to close the connection after an idle period. Another issue is server performance, given the number of persistent connections that must be handled.

HTTP Streaming works best when you can control factors such as the web server, the network, and, where possible, even the browser. Scalability is often overemphasized, so streaming might be practical on more systems than one might expect. Furthermore, some modern servers (e.g., Twisted, at http://twistedmatrix.com/) are being developed with a streaming-style architecture in mind and are able to maintain many long-lived connections.

Page streaming involves streaming the original page response (Figure 6-10). Here, the server immediately outputs an initial page and flushes the stream, but keeps it open. It then proceeds to alter it over time by outputting embedded scripts that manipulate the DOM. The browser's still officially writing the initial page out, so when it encounters a complete <script> tag, it will execute the script immediately. A simple demo is available at http://ajaxify.com/run/streaming/.
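Each page-streaming update has to arrive as a complete, syntactically valid <script> element. A small JavaScript sketch of a chunk generator follows, under the assumption that the target element is looked up with document.getElementById (the code in this section uses a $( ) helper instead); scriptUpdateChunk is a hypothetical name. It escapes the data so that text such as "</script>" inside an update can't terminate the element early.

```javascript
// Builds one page-streaming update: a complete <script> element that sets
// the innerHTML of the element with the given id.
function scriptUpdateChunk(elementId, html) {
  // JSON.stringify produces a valid JavaScript string literal. Replacing
  // "<" with the escape sequence \u003c keeps a "</script>" inside the
  // data from closing the script element prematurely.
  const literal = (value) => JSON.stringify(value).replace(/</g, "\\u003c");
  return (
    '<script type="text/javascript">' +
    "document.getElementById(" + literal(elementId) + ").innerHTML = " +
    literal(html) + ";" +
    "</script>\n"
  );
}
```

The server would write one such chunk per update and flush it; the browser executes each element as soon as the closing tag arrives.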

Figure 6-10. Page streaming

For example, the server can initially output a div that will always contain the latest news:

   print ("<div id='news'></div>");

But instead of exiting, it starts a loop to update the item every 10 seconds (ideally, there would be an interrupt mechanism instead of having to manually pause):

   <?php
     while (true) {
   ?>
       <script type="text/javascript">
         // json_encode( ) emits a quoted, escaped JavaScript string literal
         $('news').innerHTML = <?= json_encode(getLatestNews()) ?>;
       </script>
   <?php
       flush(); // Ensure the JavaScript tag is written out immediately
       sleep(10);
     }
   ?>

Don't Forget to Flush: A PHP Tip

Each language and environment will have its own idiosyncrasies in implementing this pattern. For PHP, there's fortunately some very useful advice in the online comments for flush( ) (http://php.net/flush). It turned out to be necessary to call ob_end_flush( ) before flush( ) could be called. There's also a max_execution_time parameter you might need to increase, and the web server will have its own timeout-related parameters to tweak.

That illustrates the basic technique, and there are some refinements discussed next and in the "Decisions" section, later in this chapter. One burning question you might have is how the browser initiates communication, since the connection is in a perpetual response state. The answer is to use a "back channel"; i.e., a parallel HTTP connection. This can easily be accomplished with an XMLHttpRequest Call or an IFrame Call. The streaming service will be able to effect a subsequent change to the user interface, as long as it has some means of detecting the call: for example, a session object, a global application object (such as the applicationContext in a Java Servlet container), or the database.
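To make the back-channel idea concrete, here is a minimal JavaScript sketch of the shared state both connections might meet at. The store stands in for the session object, application object, or database mentioned above; createSharedStore, update, and changesSince are hypothetical names. The back-channel request calls update( ), and the streaming loop calls changesSince( ) on each iteration, writing anything it gets back to its still-open response.

```javascript
// Hypothetical shared store standing in for a session object, an
// application-wide object, or a database row.
function createSharedStore() {
  let version = 0;
  let value = null;
  return {
    // Called by the back-channel request (e.g., an XMLHttpRequest POST).
    update(newValue) {
      value = newValue;
      version += 1;
    },
    // Called by the streaming loop: returns the latest change if it is
    // newer than the version the loop last saw, otherwise null.
    changesSince(lastSeenVersion) {
      return version > lastSeenVersion ? { version, value } : null;
    },
  };
}
```

The streaming loop remembers the version it last wrote out, so each change is pushed to the browser exactly once.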

Page streaming means the browser discovers server changes almost immediately. This opens up the possibility of real-time updates in the browser, and allows for bi-directional information flow. However, it's quite a departure from standard HTTP usage, which leads to several problems. First, there are unfortunate memory implications, because the JavaScript keeps accumulating and the browser must retain all of that in its page model. In a rich application with lots of updates, that model is going to grow quickly, and at some point a page refresh will be necessary in order to avoid hard drive swapping or worse. Second, long-lived connections will inevitably fail, so you have to prepare a recovery plan. Third, most servers can't deal with lots of simultaneous connections. Running multiple scripts is certainly going to hurt when each script runs in its own process, and even in more sophisticated multithreading environments, there will be limited resources.
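One possible recovery plan for failed connections, offered as an assumption rather than anything the demos do, is to reconnect automatically but back off exponentially, so that a server outage doesn't trigger a flood of reconnection attempts. A minimal sketch in JavaScript; reconnectDelayMs and its default values are hypothetical:

```javascript
// Returns how long to wait before reconnecting after the given number of
// consecutive failures: baseMs after the first failure, doubling after each
// further failure, and never more than maxMs.
function reconnectDelayMs(failures, baseMs = 1000, maxMs = 30000) {
  const delay = baseMs * Math.pow(2, Math.max(0, failures - 1));
  return Math.min(delay, maxMs);
}
```

Reset the failure count once a connection succeeds. Adding a small random jitter to each delay also helps stop many clients from reconnecting in lockstep after a shared outage.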

Another problem is that the server must output JavaScript, because that's the only way to alter page elements that have already been output; without it, the server could only communicate by appending to the page. Thus, browser and server are coupled closely, since the server must know the page's structure and scripts, making it difficult to write a rich Ajaxy browser application.

Service streaming is a step towards solving these problems (Figure 6-11). The technique relies on XMLHttpRequest Call (or a similar remoting technology like IFrame Call). This time, it's an XMLHttpRequest connection that's long-lived, instead of the initial page load. There's more flexibility regarding length and frequency of connections. You could load the page normally, then start streaming for 30 seconds when the user clicks a button. Or you could start streaming once the page is loaded, and keep resetting the connection every 30 seconds. Flexibility is valuable, given that HTTP Streaming is constrained by the capabilities of the server, the browsers, and the network.

Figure 6-11. Service streaming

As for the mechanics of service streaming, the server uses the same trick of looping indefinitely to keep the connection open, and periodically flushing the stream. The output can no longer be HTML script tags, because the web browser wouldn't automatically execute them, so how does the browser deal with the stream? The answer is that it polls for the latest response and uses it accordingly.

The responseText property of XMLHttpRequest always contains the content that's been flushed out of the server, even when the connection's still open. So the browser can run a periodic check; e.g., to see if its length has changed. One problem, though, is that, once flushed, the service can't undo anything it has output. For example, the responseText string arising from a timer service might look like this: "12:01:00 12:01:05 12:01:10," whereas it would ideally be just "12:01:00," then just "12:01:05," then just "12:01:10." The solution is to parse the response string and look only at the last value; to be more precise, the last complete value, since it's possible the text ends with a partial result. An example of this technique (http://www.ajaxify.com/run/streaming/xmlHttpRequest/countdown/) works in this way. To ease parsing, the service outputs each message delimited by a special token, @END@ (an XML tag would be an alternative approach). Then, a regular expression can be run to grab the latest message, which must be followed by that token to ensure it's complete:

   function periodicXHReqCheck() {
     var fullResponse = util.trim(xhReq.responseText);
     var responsePatt = /^(.*@END@)*(.*)@END@.*$/;
     if (fullResponse.match(responsePatt)) { // At least one full response so far
       var mostRecentDigit = fullResponse.replace(responsePatt, "$2");
       $("response").innerHTML = mostRecentDigit;
     }
   }
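The regular expression in periodicXHReqCheck( ) depends on ".", which in JavaScript doesn't match line breaks, so a message containing a newline would never be found. Here is a sketch of an alternative, assuming the same @END@ terminator, that locates the last complete message with lastIndexOf( ) instead; lastCompleteMessage is a hypothetical name, and unlike periodicXHReqCheck( ) it doesn't trim the response.

```javascript
// Returns the last complete message in a growing response, or null if no
// message has been terminated yet. Text after the final delimiter is a
// partial message and is ignored.
function lastCompleteMessage(responseText, delimiter = "@END@") {
  const end = responseText.lastIndexOf(delimiter);
  if (end === -1) return null;
  // Look for the previous delimiter, which must finish at or before `end`
  // so the two occurrences can't overlap.
  const previous = end >= delimiter.length
    ? responseText.lastIndexOf(delimiter, end - delimiter.length)
    : -1;
  const start = previous === -1 ? 0 : previous + delimiter.length;
  return responseText.substring(start, end);
}
```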

That's great if you only care about the last message, but what if the browser needs to log all messages that came in? Or process them in some way? With a polling frequency of 10 seconds, the previous sequence would lead to values being skipped; the browser would jump from 12:01:00 to 12:01:10, ignoring the second value. If you want to catch all messages, you need to keep track of the position you've read up to. Doing so lets you determine what's new since the previous poll, a technique used in "Code Refactoring: AjaxPatterns Streaming Wiki," later in this chapter.

In summary, service streaming makes streaming more flexible, because you can stream arbitrary content rather than JavaScript commands, and because you can control the connection's lifecycle. However, it combines two concepts that aren't consistent across browsers, XMLHttpRequest and HTTP Streaming, with predictable portability issues. Experiments suggest that the page-streaming technique does work on both IE and Firefox (http://ajaxify.com/run/streaming/), but service streaming only works properly on Firefox, whether XMLHttpRequest (http://ajaxify.com/run/streaming/xmlHttpRequest/) or IFrame (http://ajaxify.com/run/streaming/xmlHttpRequest/iframe/) is used. In both cases, IE suppresses the response until it's complete. You could claim that's either a bug or a feature, but either way, it works against HTTP Streaming. So for portable page updates, you have a few options:

Use a hybrid (http://ajaxlocal/run/streaming/xmlHttpRequest/iframe/scriptTags/) of page streaming and IFrame-based service streaming, in which the IFrame response outputs script tags, which include code to communicate with the parent document (e.g., window.parent.onNewData(data);). As with standard page streaming, the scripts will be executed immediately. It's not elegant, because it couples the remote service to the browser script's structure, but it's a fairly portable and reliable way to achieve HTTP Streaming.
Use a limited form of service streaming, where the server blocks until the first state change occurs. At that point, it outputs a message and exits. This is not ideal, but certainly feasible (see "Real-World Examples" later in this chapter).
Use page streaming.
Use Periodic Refresh (Chapter 10) instead of HTTP Streaming.

6.4.5. Decisions

6.4.5.1. How long will you keep the connection open?

It's impractical to keep a connection open forever. You need to decide on a reasonable period of time to keep the connection open, which will depend on:

The resources involved: server, network, browsers, and supporting software along the way.
How many clients will be connecting at any time. Not just averages, but peak periods.
How the system will be used: how much data will be output, and how the activity will change over time.
The consequences of too many connections at once. For example, will some users miss out on critical information?

It's difficult to give exact figures, but it seems fair to assume a small intranet application could tolerate a connection of minutes or maybe hours, whereas a public dotcom might only be able to offer this service for quick, specialized situations, if at all.
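A back-of-the-envelope estimate can support these decisions. By Little's law, the average number of open connections equals the rate at which clients open them multiplied by how long each stays open; applying it to peak-period rates gives a rough peak figure. A JavaScript sketch; the function names and the figures in the usage note are hypothetical:

```javascript
// Little's law: average concurrent connections = arrival rate x holding time.
function concurrentConnections(opensPerSecond, secondsOpen) {
  return opensPerSecond * secondsOpen;
}

// The same relationship solved for holding time: the longest each
// connection can stay open without exceeding a connection budget.
function maxSecondsOpen(connectionBudget, opensPerSecond) {
  return connectionBudget / opensPerSecond;
}
```

For instance, if 5 clients per second open a stream at peak and each connection is held for 300 seconds, expect around 1,500 simultaneous connections; conversely, a server that can sustain 2,000 connections under the same load could hold each one for at most about 400 seconds.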

6.4.5.2. How will you decide when to close the connection?

The web service has several ways to trigger the closing of a connection:

A time limit is reached.
The first message is output.
A particular event occurs. For example, the stream might indicate the progress of a complex calculation (see Progress Indicator [Chapter 14]) and conclude with the result itself.
Never. The client must terminate the connection.
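The web service's choice among these triggers can be expressed as a single check run after each iteration of its loop. A JavaScript sketch, in which the policy kinds and the state fields (elapsedMs, messagesSent, lastEvent) are hypothetical:

```javascript
// Decides whether the streaming service should close its connection now.
// policy.kind selects one of the options listed above.
function shouldCloseStream(policy, state) {
  switch (policy.kind) {
    case "timeLimit":
      return state.elapsedMs >= policy.limitMs;
    case "firstMessage":
      return state.messagesSent >= 1;
    case "event":
      return state.lastEvent === policy.closingEvent;
    case "never":
      return false; // the client is responsible for aborting
    default:
      throw new Error("Unknown close policy: " + policy.kind);
  }
}
```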

6.4.5.3. How will the browser distinguish between messages?

As the "Solution" mentions, the service can't erase what it's already output, so it often needs to output a succession of distinct messages. You'll need some protocol to delineate the messages; e.g., the messages fit a standard pattern, the messages are separated by a special token string, or the messages are accompanied by some sort of metadata, such as a header indicating message size.
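As an illustration of the metadata option, here is a JavaScript sketch of a parser for length-prefixed messages, where each message is preceded by its length and a colon (e.g., "5:hello3:abc"). The format and the parseLengthPrefixed name are assumptions for illustration. Like the position-tracking approach described in the "Solution," it returns the offset where it stopped, so a partially received message is left for the next poll.

```javascript
// Parses "<length>:<body>" messages starting at offset. Returns the
// complete messages found and the offset to resume from next time.
function parseLengthPrefixed(text, offset = 0) {
  const messages = [];
  while (true) {
    const colon = text.indexOf(":", offset);
    if (colon === -1) break; // length header hasn't fully arrived yet
    const header = text.substring(offset, colon);
    if (!/^\d+$/.test(header)) {
      throw new Error("Malformed length header at offset " + offset);
    }
    const length = Number(header);
    const start = colon + 1;
    if (start + length > text.length) break; // body hasn't fully arrived yet
    messages.push(text.substring(start, start + length));
    offset = start + length;
  }
  return { messages, offset };
}
```

One design point: JavaScript string lengths count UTF-16 code units, so a server computing the prefix in bytes would disagree with this parser on non-ASCII text; both sides need to agree on the unit.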

6.4.6. Real-World Examples

6.4.6.1. LivePage

LivePage (http://twisted.sourceforge.net/TwistedDocs-1.2.0/howto/livepage.html) is part of Donovan Preston's Nevow framework (http://nevow.com), a Python-based framework, built on the Twisted framework (http://twistedmatrix.com/). Events are pushed from the server using XMLHttpRequest-based service streaming. For compatibility reasons, Nevow uses the technique mentioned in the "Solution" in which the connection closes after first output. Donovan explained the technique to me:

When the main page loads, an XHR (XMLHttpRequest) makes an "output conduit" request. If the server has collected any events between the main page rendering and the output conduit request rendering, it sends them immediately. If it has not, it waits until an event arrives and sends it over the output conduit. Any event from the server to the client causes the server to close the output conduit request. Any time the server closes the output conduit request, the client immediately reopens a new one. If the server hasn't received an event for the client in 30 seconds, it sends a noop (the javascript "null") and closes the request.
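The client side of Donovan's description can be sketched in JavaScript as a callback-style loop. requestConduit is a hypothetical function that issues one output-conduit request (for example, with XMLHttpRequest) and passes the response body to its callback; the "null" noop check follows the quote above, and the remaining count exists only to bound the loop for illustration.

```javascript
// Repeatedly opens the output conduit. Each response is either an event,
// which is handed to handleEvent, or the server's "null" noop after an idle
// timeout. Either way, a new conduit request is opened immediately.
function runOutputConduit(requestConduit, handleEvent, remaining, onStop) {
  if (remaining <= 0) {
    if (onStop) onStop();
    return;
  }
  requestConduit(function (body) {
    if (body !== "null") handleEvent(body);
    runOutputConduit(requestConduit, handleEvent, remaining - 1, onStop);
  });
}
```

Pass Infinity as remaining to keep the conduit open for the life of the page. Because each new request starts from an asynchronous XMLHttpRequest callback in practice, the recursion doesn't accumulate stack frames.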

6.4.6.2. Jotspot Live

Jotspot Live (http://jotlive.com/) is a live, multiuser wiki environment that uses HTTP Streaming to update message content (Figure 6-12). In an interview with Ajaxian.com (http://www.ajaxian.com/archives/2005/09/jotspot_live_li.html), developer Abe Fettig explained the design is based on LivePage (see the previous example).

Figure 6-12. JotSpot Live

6.4.6.3. Realtime on Rails

Martin Scheffler's Realtime on Rails (http://www.uni-weimar.de/~scheffl2/wordpress/?p=19) is a real-time chat application that uses service streaming on Firefox and, because of the restrictions described in the earlier "Solution," Periodic Refresh on other browsers.

6.4.6.4. Lightstreamer engine

Lightstreamer (http://www.lightstreamer.com/) is a commercial "push engine" used by companies in finance and other sectors to perform large-scale HTTP Streaming. Because it works as a standalone server, there are many optimizations possible; the company states that 10,000 concurrent users can be supported on a standard 2.4GHz Pentium 4 (http://www.softwareas.com/http-streaming-an-alternative-to-polling-the-server#comment-3078).

6.4.6.5. Pushlets framework

Just van den Broecke's Pushlets framework (http://www.pushlets.com/doc/whitepaper-s4.html) is a Java servlet library based on HTTP Streaming that supports both page-streaming and service-streaming mechanisms.

6.4.7. Code Refactoring: AjaxPatterns Streaming Wiki

The Basic Wiki Demo (http://ajaxify.com/run/wiki) updates messages with Periodic Refresh, polling the server every five seconds. The present Demo (http://ajaxify.com/run/wiki/streaming) replaces that mechanism with service streaming. The AjaxCaller library continues to be used for uploading new messages, which effectively makes it a back-channel.

The Web Service remains generic, oblivious to the type of client that connects to it. All it has to do is output a new message each time it detects a change. Thus, it outputs a stream like this:

   <message>content at time zero</message>
   <message>more content a few seconds later</message>
   <message>even more content some time after that </message>
   ... etc. ...

To illustrate how the server can be completely independent of the client application, it's up to the client to terminate the service (in a production system, it would be cleaner for the service to exit normally; e.g., every 60 seconds). The server detects data changes by comparing old messages to new messages (it could also use an update timestamp if the message table contained such a field):

   while (true) {
     ...
     foreach ($allIds as $messageId) {
       ...
       if ($isNew || $isChanged) {
         print getMessageXML($newMessage); // prints "<message>...</message>"
         flush();
       }
     }
     sleep(1);
   }
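The comparison the PHP loop performs can be sketched in JavaScript as a pure function over two snapshots of the message table. The Map-based snapshot (message id to content) and the changedMessageIds name are assumptions; the service's elided code may work differently.

```javascript
// Returns the ids of messages that are new in `current` or whose content
// differs from `previous`. Both arguments map message id -> content.
function changedMessageIds(previous, current) {
  const changed = [];
  for (const [id, content] of current) {
    const isNew = !previous.has(id);
    const isChanged = !isNew && previous.get(id) !== content;
    if (isNew || isChanged) changed.push(id);
  }
  return changed;
}
```

Deleted messages aren't reported, which matches the $isNew || $isChanged condition in the loop.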

The browser sets up the request using the standard open( ) and send( ) methods. Interestingly, there's no onreadystatechange handler because we're going to use a timer to poll the text. (In Firefox, a handler might actually make sense, because onreadystatechange seems to be called whenever the response changes.)

   xhReq.open("GET", "content.phtml", true);
   xhReq.send(null);
   // Don't bother with onreadystatechange - it shouldn't close
   // and we're polling responseText anyway
   ...
   pollTimer = setInterval(pollLatestResponse, 2000);

pollLatestResponse( ) repeatedly reads the output text. It keeps track of the last complete message it detected using nextReadPos. For example, if it's processed two 500-character messages so far, nextReadPos will be 1000 (string positions start at zero). That's where the search for a new message will begin. Each complete message after that point in the response will be processed sequentially. Think of the response as a work queue. Note that the algorithm doesn't assume the response ends with </message>; if a message is half-complete, it will simply be ignored until a later poll finds it complete.

   function pollLatestResponse() {
     var allMessages = xhReq.responseText;
     ...
     do {
       var unprocessed = allMessages.substring(nextReadPos);
       var messageXMLEndIndex = unprocessed.indexOf("</message>");
       if (messageXMLEndIndex != -1) {
         var endOfFirstMessageIndex = messageXMLEndIndex + "</message>".length;
         var anUpdate = unprocessed.substring(0, endOfFirstMessageIndex);
         renderMessage(anUpdate);
         nextReadPos += endOfFirstMessageIndex; // resume after this message
       }
     } while (messageXMLEndIndex != -1);
   }
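Separating the parsing from XMLHttpRequest makes the position-tracking logic easy to check on its own. Here is a JavaScript sketch of the same algorithm as a pure function; readNewMessages is a hypothetical name, and it returns the new read position instead of updating a global nextReadPos.

```javascript
// Returns every complete <message>...</message> found after nextReadPos,
// plus the position to resume from on the next poll. A trailing partial
// message is left unread.
function readNewMessages(allMessages, nextReadPos) {
  const endTag = "</message>";
  const updates = [];
  let end;
  while ((end = allMessages.indexOf(endTag, nextReadPos)) !== -1) {
    const endOfMessage = end + endTag.length;
    updates.push(allMessages.substring(nextReadPos, endOfMessage));
    nextReadPos = endOfMessage;
  }
  return { updates, nextReadPos };
}
```

With two 500-character messages followed by a partial one, the first call returns both messages and a read position of 1000; once the partial message completes, the next call picks it up from there.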

After some time, the browser will call xhReq.abort( ). In this case, it's the browser that stops the connection, but as mentioned earlier, it could just as easily be the server.
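A sketch of how the browser might end a streaming session cleanly: stop the poll timer, process whatever arrived since the last timer tick, then abort the request. The xhr, poll, and timerId parameters stand for xhReq, pollLatestResponse, and pollTimer in the code above; passing them in, and the final poll, are assumptions made here for testability, not details of the demo.

```javascript
// Ends a streaming session from the browser side.
function stopStreaming(xhr, poll, timerId) {
  clearInterval(timerId); // no more scheduled polls
  poll(); // pick up messages flushed since the last timer tick
  xhr.abort(); // close the long-lived connection
}
```

For example, setTimeout(function () { stopStreaming(xhReq, pollLatestResponse, pollTimer); }, 60000) would end the stream after a minute.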

Finally, note that the uploading still uses the AjaxCaller library. So if the user uploads a new message, it will soon be streamed out from the server, which the browser will pick up and render.

6.4.8. Alternatives

6.4.8.1. Periodic Refresh

Periodic Refresh (Chapter 10) is the obvious alternative to HTTP Streaming. It fakes a long-lived connection by frequently polling the server. Generally, Periodic Refresh is more scalable and easier to implement in a portable, robust manner. However, consider HTTP Streaming for systems, such as intranets, where there are fewer simultaneous users, you have some control over the infrastructure, and each connection carries a relatively high value.

6.4.8.2. TCP connection

Using a Richer Plugin like Flash or Java, the browser can initiate a TCP connection. The major benefit is protocol flexibility: you can use a protocol that's suitable for long-lived connections, unlike HTTP, whose stateless nature means that servers don't handle long connections very well. One library that facilitates this pattern is Stream (http://www.stormtide.ca/Stream/), consisting of an invisible Flash component and a server, along with a JavaScript API to make use of them.

6.4.9. Related Patterns

6.4.9.1. Distributed Events

You can use Distributed Events (Chapter 10) to coordinate browser activity following a response.

6.4.10. Metaphor

Think of a "live blogger" at a conference, continuously restating what's happening each moment.

6.4.11. Want To Know More?

Donovan Preston on his creation of LivePage (http://ulaluma.com/pyx/archives/2005/05/multiuser_progr.html)
Alex Russell on Comet (http://alex.dojotoolkit.org/?p=545)

6.4.12. Acknowledgments

Thanks to Donovan Preston and Kevin Arthur for helping to clarify the pattern and its relationship to their respective projects, LivePage and Stream.