The practices outlined in this section focus on client performance. That is, we assume there is a client application, probably used by an end user, which talks to a server using RMI. The goal of these practices is to improve both the performance and the perceived performance of the client application. Perceived performance is a strange thing. The goal in improving perceived performance is improving application responsiveness: when the user clicks a button, the application does the right thing, and it does the right thing quickly. Most practices that improve performance also improve perceived performance. But the converse is not true: improving application responsiveness can actually degrade overall performance, or at least cause the server to work harder. That's often a price worth paying, though, if it means the end user is happier.
6.3.1 Cache Stubs to Remote Servers

One of the practices in Section 6.2 was entitled "Always Use a Naming Service." This is sound advice: it does make the application more robust. When followed naively, however, it also makes the application much slower. If you simply replace a remote method call (to a server) with two remote method calls (one to the naming service and then one to the server), you deserve whatever criticism comes your way. A much better solution is to implement a centralized cache for stubs to remote servers. This cache should meet the following requirements:

- It is accessible from anywhere in the client application.
- Stubs are stored and retrieved using the servers' logical (naming service) names.
- When a requested stub isn't in the cache, the cache fetches it from the naming service.
- Stubs that no longer work can be removed from the cache.
If you implement a stub cache that meets these four criteria, remote method call invocations will look like the following code snippet (assuming that your implementation of a cache has get and remove methods):

try {
    MyServer stubToRemoteServer = (MyServer) cache.get(LOGICAL_NAME_FOR_SERVER);
    stubToRemoteServer.performMethod( . . . );
}
catch (RemoteException remoteException) {
    cache.remove(LOGICAL_NAME_FOR_SERVER);
}

This has many benefits. For one, it's clear code. If the cache has the stub (e.g., if any part of the application has already talked to the server in question), the naming service is bypassed. And if the stub points to a server that is no longer running, the stub is immediately removed. The next step in using a stub cache is to integrate it with the retry loop discussed earlier. Here's a very simple integration to illustrate the idea:

public void wrapRemoteCallInRetryLoop( ) {
    int numberOfTries = 0;
    while (numberOfTries < MAXIMUM_NUMBER_OF_TRIES) {
        numberOfTries++;
        try {
            MyServer stubToRemoteServer = (MyServer) cache.get(LOGICAL_NAME_FOR_SERVER);
            doActualWork(stubToRemoteServer);
            break;
        }
        catch (RemoteException exceptionThrownByRMIInfrastructure) {
            cache.remove(LOGICAL_NAME_FOR_SERVER);
            reportRemoteException(exceptionThrownByRMIInfrastructure);
            try {
                Thread.sleep(REMOTE_CALL_RETRY_DELAY);
            }
            catch (InterruptedException ignored) {}
        }
    }
}

This attempts to get the stub and then makes the remote call. If the call fails, the stub is flushed and, on the second try, a new stub is fetched from the naming service. This is good. In most cases, the cache has the stub, and the overhead of the cache is strictly local. In return for the overhead of maintaining a local cache, you've eliminated most of the calls to the naming service. Using a stub cache inside the retry loop also lets you gracefully handle the cases when a server is restarted or migrated.
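A minimal stub cache meeting these requirements might look like the following sketch. The Lookup interface and the class name here are hypothetical; it exists so the cache can be exercised without a running RMI registry. In a real client, lookup( ) would delegate to java.rmi.Naming.lookup( ) and wrap its checked exceptions.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal stub cache. Lookup abstracts the naming service; a real
// implementation would call java.rmi.Naming.lookup() here.
public class StubCache {
    public interface Lookup {
        Object lookup(String logicalName);
    }

    private final Map<String, Object> _stubs = new HashMap<String, Object>();
    private final Lookup _namingService;

    public StubCache(Lookup namingService) {
        _namingService = namingService;
    }

    // Returns the cached stub, going to the naming service only on a miss.
    public synchronized Object get(String logicalName) {
        Object stub = _stubs.get(logicalName);
        if (stub == null) {
            stub = _namingService.lookup(logicalName);
            _stubs.put(logicalName, stub);
        }
        return stub;
    }

    // Flushes a stub that threw a RemoteException; the next get() will
    // fetch a fresh stub from the naming service.
    public synchronized void remove(String logicalName) {
        _stubs.remove(logicalName);
    }
}
```

Because get( ) and remove( ) are synchronized, any thread in the client can share one cache instance, and each logical name costs at most one trip to the naming service until its stub is flushed.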
When a server is restarted or migrated, the stubs in the cache won't work, and attempting to use them will cause instances of RemoteException to be thrown. Logically, the client performs the following sequence of operations:

- Get the (now stale) stub from the cache and attempt the remote call.
- Catch the resulting RemoteException and remove the stub from the cache.
- On the next pass through the retry loop, fetch a fresh stub from the naming service.
- Retry the call using the fresh stub.
Combining a stub cache with a retry loop is both efficient and robust!

6.3.2 Consider Caching Return Values from Distributed Calls

At one company for which I worked, my first task was described in the following way: "This call is too slow. See if you can make it faster." I looked at the call. It was slow. It involved 18 separate modules on the server and implicitly depended on about 40,000 lines of code, and all I could think was, "Oh yeah. I'll optimize this in my first week on the job." Because the task was impossible, I decided to cheat. I thought, "Hmmm. If I can't make the call faster, I can at least use a cache to try to make the call less often." This turned out to be good enough for the application in question, and is often good enough in distributed applications. The benefits of caching return values on the client side are:

- The average response time for the data improves dramatically; a local lookup replaces a remote call.
- Response times become more predictable, because a cache hit involves no network variability.
- The load on the server and the network is reduced.
On the other hand, caching does have some major drawbacks. The two most significant are:

- The cached data can become stale; the client may display, or act on, values that no longer match what's on the server.
- The client pays for cache maintenance in memory and CPU, a cost that can outweigh the savings if the cached data is rarely reused.
Cache maintenance is usually done in a background thread so that a main thread doesn't take the hit of maintaining the cache (otherwise, perceived performance would suffer, and the "more predictable" item listed as an advantage would be false). Nonetheless, if cached data isn't accessed very often, the client-side cache might very well be a net loss in performance.
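One simple client-side policy is time-based expiration. The following sketch is a hypothetical generic cache with a fixed time-to-live; the clock is injected so the expiration logic can be tested deterministically, and all the names here are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical client-side cache of return values with time-based
// expiration. Production code would pass System::currentTimeMillis
// as the clock.
public class ReturnValueCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long storedAt;
        Entry(V value, long storedAt) {
            this.value = value;
            this.storedAt = storedAt;
        }
    }

    private final Map<K, Entry<V>> _entries = new HashMap<>();
    private final long _ttlMillis;
    private final LongSupplier _clock;

    public ReturnValueCache(long ttlMillis, LongSupplier clock) {
        _ttlMillis = ttlMillis;
        _clock = clock;
    }

    // Returns the cached value, or null if absent or expired. On null,
    // the caller makes the remote call and put()s the fresh result.
    public synchronized V get(K key) {
        Entry<V> entry = _entries.get(key);
        if (entry == null) {
            return null;
        }
        if (_clock.getAsLong() - entry.storedAt >= _ttlMillis) {
            _entries.remove(key);   // expired: stale data is worse than a miss
            return null;
        }
        return entry.value;
    }

    public synchronized void put(K key, V value) {
        _entries.put(key, new Entry<V>(value, _clock.getAsLong()));
    }
}
```

The time-to-live is the knob that trades staleness against call frequency: a longer TTL means fewer remote calls but a larger window in which the client can act on out-of-date data.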
Caching is a subtle and tricky subject. One reference worth looking at is the series of data expiration articles I wrote for onjava.com. In addition, if you're interested in the subject, you might want to keep an eye on JSR-107, the "Java Temporary Caching API," at http://jcp.org/jsr/detail/107.jsp.

6.3.3 Use Batch Methods to Group Related Method Calls

I once had lunch with a group of programmers who were visiting my company from Germany. They were from a company that was partnering with the company I was working for. We were supposed to use CORBA to handle the method calls from their system to ours. But there was a problem. "CORBA," they confided to me, "doesn't work. It's much too expensive. It doesn't scale very well at all." I was a little surprised by this. I'd been using CORBA for a couple of years at that point, and I'd never run into scalability problems (at least, not at the scale we were talking about). It turned out that their idea of "building a distributed application" was to "build a single-process application using standard object-oriented techniques and then stick the network in." Their testing had revealed that if an instance of Person is on one machine, and the user interface displaying information about the person is on another machine, a sequence of fine-grained calls such as getFirstName( ), getLastName( ), and getSSN( ) to get the values to display on the screen has performance problems. Once I figured out what they were complaining about, I had a solution: "Don't do that." The object decompositions that make sense in a single-process application often don't make as much sense in a distributed application. Instead, you need to carefully look at the intent behind sequences of calls and see if you can encapsulate a sequence of calls in one bigger method call (which I refer to as a batch method). In the previous example, the sequence getFirstName( ), getLastName( ), getSSN( ), . . .
should really have been a call to a new method called getDisplayData( ). Of course, getDisplayData( ) shouldn't exist in a single-process implementation of the Person class (it completely breaks encapsulation). But it has to be there for the distributed application to perform well, and that's the point of this practice. How do you spot a potential batch method? There's no cut-and-dried way to do so (that I know of). But here are four rules of thumb for when batch methods are appropriate:
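A batch method such as getDisplayData( ) typically returns a small serializable value object, so all the display data crosses the wire in one round trip. The class below is a hypothetical sketch; the field set is simply whatever the user interface actually needs.

```java
import java.io.Serializable;

// Hypothetical value object returned by a batch method such as
// getDisplayData(). One remote call replaces three fine-grained calls
// (getFirstName, getLastName, getSSN); the whole object is serialized
// and sent by value to the client.
// On the remote interface, the batch method would be declared as:
//     PersonDisplayData getDisplayData() throws RemoteException;
public class PersonDisplayData implements Serializable {
    private final String _firstName;
    private final String _lastName;
    private final String _ssn;

    public PersonDisplayData(String firstName, String lastName, String ssn) {
        _firstName = firstName;
        _lastName = lastName;
        _ssn = ssn;
    }

    public String getFirstName() { return _firstName; }
    public String getLastName()  { return _lastName; }
    public String getSSN()       { return _ssn; }
}
```

The value object is immutable on purpose: once it arrives at the client, it's a snapshot of server state, and letting the client mutate it would only disguise the fact that the server hasn't changed.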
6.3.4 Use a Server-Side Proxy to Manage Transactions Across Servers

Sometimes a client will make a series of calls to a server, and these calls, while conceptually a transaction, aren't easily batched, or don't feel like a batch method. A classic example of this is transferring money between two bank accounts. If you have two distinct servers (one for each account), this involves the following four steps:
This sequence of calls should not be executed from a client application for two reasons. The first is that client applications are often deployed across a WAN, which is slow (especially compared to a datacenter managed by IT personnel). Making four consecutive remote method calls under those conditions could lead to significant performance problems. The second reason is that logic about transactions will most likely change. Even something as simple as "We've decided to log all money transfers from now on" would force you to redeploy a new version of the client that would call a logging method on a server somewhere, which is very bizarre in and of itself. The solution is to use a server-side proxy for the client. A server-side proxy is a server, running "near" the other servers, that is client-specific and whose sole role is to manage complex transactions for the client. If you're using a server-side proxy, the previous calling sequence turns into a single call to the proxy's transferMoney method. The proxy still has to perform the four remote method calls, but it does so on a LAN inside a well-managed environment. Figure 6-1 illustrates the trade-off.

Figure 6-1. Using a server-side proxy

6.3.5 Use Asynchronous Messaging Wherever Possible

Most remote method calls are synchronous: the calling thread stops and waits for a response from the server. An asynchronous method call is one in which the caller doesn't wait for a response, or even wait to know whether the call succeeded. Instead, the calling thread continues processing. RMI doesn't directly support asynchronous calls; instead, you have to use a background thread to make the call. In Example 6-2, the calling thread creates a command object and drops it off in a background queue for execution. This leaves the calling thread free to immediately resume processing, instead of waiting for a response, as in the following code snippet:

BackgroundCallQueue _backgroundQueue = new BackgroundCallQueueImpl( );
// . . .
_backgroundQueue.addCall(new SpecificCallToServer( . . . ));

Example 6-2 is a sample implementation of BackgroundCallQueue and BackgroundCallQueueImpl.

Example 6-2. BackgroundCallQueue.java and BackgroundCallQueueImpl.java

public interface BackgroundCallQueue {
    public void addCall(RemoteMethodCall callToAdd);
}

import java.util.LinkedList;

public class BackgroundCallQueueImpl implements BackgroundCallQueue, Runnable {
    private LinkedList _pendingCalls;
    private Thread _dispatchThread;
    private boolean _stopAcceptingRequests;

    public BackgroundCallQueueImpl( ) {
        _stopAcceptingRequests = false;
        _pendingCalls = new LinkedList( );
        _dispatchThread = new Thread(this, "Background Call Queue Dispatch Thread");
        _dispatchThread.start( );
    }

    public synchronized void addCall(RemoteMethodCall callToAdd) {
        _pendingCalls.add(callToAdd);
        notify( );
    }

    public void run( ) {
        while (true) {
            RemoteMethodCall call = waitForCall( );
            if (null != call) {
                executeCall(call);
            }
        }
    }

    private synchronized RemoteMethodCall waitForCall( ) {
        while (0 == _pendingCalls.size( )) {
            try {
                wait( );
            }
            catch (InterruptedException ignored) {}
        }
        return (RemoteMethodCall) _pendingCalls.removeFirst( );
    }

    private void executeCall(RemoteMethodCall call) {
        // . . .
    }
}

This isn't very complicated code, but it does offer two very significant advantages over synchronous method calls. The first is that it decreases the time a main thread spends sending remote messages. Imagine, for example, that a user clicks on a button, and, as a result of that click, the server needs to be told something. If you use a synchronous method call, the button processing time (and, hence, the perceived performance of the application) will include the time spent sending the remote method call (and the time the server spends processing the call). If you can make the call asynchronously, in a background thread, the application isn't any faster or more efficient, but the user thinks it is.
The second advantage is that, once you've moved to a model where requests are dropped off into a queue, you can tweak the queue and make performance improvements without altering most of the client code. If, for example, you are making a lot of calls to a single method on a server, you can group these calls and make them in a single call. Instead of 100 calls to:

server.patientGivenMedication(Patient patient, Medication medication, long time, HealthCareProvider medicationSource);

you might have 1 call to:

server.patientsGivenMedication(ArrayList medicationEvents);

Because each remote method call contains information about all the relevant classes involved, this will dramatically reduce both marshalling time and bandwidth. Instead of marshalling and sending information about the Patient class 100 times, you will send it only once. There are two major downsides to putting messages in a background queue. The first is that your code will be harder to debug. Decoupling the source of the remote method call from the time the remote call is made makes it harder to trace the source of logical errors. For example, if a command object has the wrong value for an argument, it's harder to track down the source of the error. The second problem is that it's harder to report failures (and harder to respond to them). If a user clicks a button and therefore thinks of an operation as "done," it can be disconcerting for him to find out later on that the operation failed. Given all this, the question is: when should you use asynchronous messaging? Here are three indicators that a method can safely be put into a background queue:
Using callbacks to let the user know the outcome of an event can incur some additional overhead. It replaces one bidirectional socket connection (the method call and then the return value) with two unidirectional connections (the method call is sent and then, in a later connection, the response is sent). Breaking apart messages like this is a useful technique for optimizing perceived performance, but it almost always incurs some extra overhead.
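The call-grouping optimization described in Section 6.3.5 can be sketched independently of RMI. In the following hypothetical sketch, individual events accumulate locally and flush( ) delivers everything queued so far in one coarse-grained call; the RecordSink interface and all names here are illustrative stand-ins for the remote server's stub.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical batching queue. In a real client, the sink would be the
// RMI stub for the server (e.g., its patientsGivenMedication method) and
// flush() would run on the dispatch thread of Example 6-2.
public class BatchingCallQueue<E> {
    public interface RecordSink<E> {
        void recordBatch(List<E> batch);
    }

    private final List<E> _pendingEvents = new ArrayList<>();
    private final RecordSink<E> _server;

    public BatchingCallQueue(RecordSink<E> server) {
        _server = server;
    }

    // Cheap and local; the calling thread returns immediately.
    public synchronized void addEvent(E event) {
        _pendingEvents.add(event);
    }

    // One "remote" call, no matter how many events have accumulated.
    public synchronized void flush( ) {
        if (_pendingEvents.isEmpty( )) {
            return;
        }
        _server.recordBatch(new ArrayList<>(_pendingEvents));
        _pendingEvents.clear( );
    }
}
```

The batch is copied before it's handed to the sink so the internal list can be cleared safely; how often flush( ) runs (per timer tick, per queue size threshold, or per user action) is exactly the kind of tuning the queue model lets you change without touching client code.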