10.1 Presentation Fortresses

Presentation fortresses receive HTTP requests from thin clients and deliver HTML pages in response. This simple-sounding task is complicated by numerous factors, most prominently these:

Determining exactly what the client did (push a button? choose a menu item?) is often a nontrivial programming task.
Managing the client's state (in other words, determining how the client's current request relates to previous requests from that same client) is often done in ways that severely restrict later scalability.
Creating the best possible client experience regardless of client device often requires device-specific code that is difficult to maintain.
Planning how the presentation fortress will fulfill the client request (e.g., make a purchase) requires an understanding of the overall enterprise fortress architecture.

10.1.1 J2EE versus the .NET Approach

For the presentation fortress, the technology battle is between J2EE's JavaServer Pages (JSP) and Microsoft's ASP.NET/Visual Studio.NET. JavaServer Pages is very similar to Microsoft's pre-.NET technology, known as ASP (Active Server Pages).

JSP and ASP both make the presentation fortress implementor responsible for determining what the client did, managing the client's state, and creating a device-specific experience: three difficult tasks. A typical flowchart for a JSP or ASP presentation fortress looks something like Figure 10.2.

Figure 10.2. JSP or ASP Presentation Fortress Flowchart

The new ASP.NET programming model is much simpler because most of the fortress responsibilities are managed automatically by the presentation fortress infrastructure. A typical ASP.NET flowchart is shown in Figure 10.3, which, as you can see, is a huge reduction in complexity over Figure 10.2.

Figure 10.3. ASP.NET Presentation Fortress Flowchart

The reason ASP.NET achieves such a huge simplification is not that issues like state management, browser interactions, and device dependencies have gone away. Rather, ASP.NET automatically handles these issues on your behalf.

The JSP technology does have some features in its favor, primarily the support for non-Windows platforms. ASP.NET works only if your presentation fortress is built on the Windows platform. If you must run your presentation fortress on a different platform (say, Linux or a mainframe), ASP.NET won't help you. In this case you either use a particular vendor's implementation of JSP or come up with a proprietary solution.

The other differentiator between ASP.NET and JSP is language. Like all the J2EE technologies, JSP is hardwired to Java. The entire .NET platform is much more accommodating as far as languages are concerned. This topic is more relevant to the business application fortress, so I will postpone this discussion until Chapter 11 (Business Application Fortresses).

As far as presentation fortress technologies go, then, .NET wins on the simplicity of its programming model and its broad-based language support. J2EE wins on its broad-based platform support and Java support. Clearly both technologies have advantages. These are the main issues that differentiate J2EE's JSP and .NET's ASP.NET. The remaining issues are similar for both platforms, so I will discuss them generically.

10.1.2 Scalability

As usual, my preferred approach to scalability as it applies to presentation fortresses is to use a cluster (scale-out) architecture. For presentation fortresses, clusters are based on HTTP routers that balance the workload across a cluster of machines.

As Figure 10.4 illustrates, when a request comes in from a browser, it goes through a sequence of events like this:

The request passes through the drawbridge and is picked up by an HTTP/IP router.
The router chooses one of several similarly configured machines to process the request. It can do this either randomly or on the basis of its own workload tracking.
The router sends the request to the chosen machine.
The request is passed to the guard on that machine.
The guard decides whether to accept the request, possibly on the basis of the previous history of this browser session.
The request is passed to a worker, who figures out what, exactly, the user sitting at the browser wants to do, possibly using the previous history of this browser session.
The worker coordinates through envoys (not shown) with other fortresses to fulfill the request.
The worker prepares the HTML response and delivers it back to the browser.
The worker may need to update the history of this browser session to reflect the new work.
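The router's part of this sequence (choosing a machine, either randomly or from its own workload tracking, and handing it the request) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular HTTP router is implemented: the `Machine` class, the `active_requests` counter standing in for "workload tracking," and all function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Machine:
    """One similarly configured machine in the presentation fortress cluster."""
    name: str
    up: bool = True
    active_requests: int = 0  # hypothetical stand-in for the router's workload tracking


def live_machines(machines):
    """Only machines the router believes are up are candidates."""
    live = [m for m in machines if m.up]
    if not live:
        raise RuntimeError("no machine in the cluster is available")
    return live


def choose_random(machines, rng=random):
    """Random selection among live machines."""
    return rng.choice(live_machines(machines))


def choose_least_loaded(machines):
    """Workload-based selection: fewest requests in flight wins (ties go to the first)."""
    return min(live_machines(machines), key=lambda m: m.active_requests)


def route(machines, strategy=choose_least_loaded):
    """Choose a machine and hand it the request."""
    machine = strategy(machines)
    machine.active_requests += 1
    return machine


def complete(machine):
    """Called once the worker on `machine` has delivered its HTML response."""
    machine.active_requests -= 1
```

With three idle machines, the least-loaded strategy sends successive requests to M1, M2, and M3 in turn; a machine marked as down is never chosen by either strategy.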

Figure 10.4. Routing of Presentation Fortress Requests

Notice that I haven't discussed how browser state, or session history, is maintained. The browser state can be stored in any of three places: on the browser, in the shared presentation fortress data strongbox, or on the guard/worker machine. I think of these options as more relevant to reliability than to scalability, so I will wait until the reliability discussion (Section 10.1.4) to compare these options.

10.1.3 Security

The presentation fortress is the most vulnerable part of your entire enterprise system. The only other fortress that is even close to it in terms of vulnerability is the Web service fortress, and Web service fortresses are not used widely. For a presentation fortress, security means two things. First, you must do everything possible to protect the fortress from malicious attack. Second, you must assume that you will ultimately fail and that, despite your best efforts, the horrible "they" will break in anyway.

Here are my major rules for protecting the presentation fortress. There are no guarantees in the world of security, but if you diligently follow these 11 rules, you will eliminate the vast majority of system compromises that occur today. The first seven rules have to do with trying to prevent fortress break-ins:

Rule 1: Put a firewall in front of the fortress. This firewall can be thought of as part of your fortress wall fortification. This is your first line of defense against them.

Rule 2: Put a firewall between the presentation fortress and the rest of the enterprise. When they do break into your presentation fortress, this additional firewall buys you a little more protection. Don't get too excited about this. If they can break through the first firewall, they can probably break through the second one as well.

The combination of Rule 1 and Rule 2 creates what is often referred to as a demilitarized zone (DMZ).
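To make the DMZ concrete, here is a toy model of the two firewalls as default-deny rule sets. The zone names and port numbers are hypothetical, and real firewalls are configured in their own rule languages rather than in Python; the point is only the shape: the Internet can reach the presentation fortress and nothing else, and only the presentation fortress can reach inward.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    src: str   # zone the connection starts in
    dst: str   # zone it is trying to reach
    port: int


# Hypothetical rule sets; anything not listed is denied.
OUTER_FIREWALL = {("internet", "presentation", 443)}    # Rule 1: only HTTPS reaches the fortress
INNER_FIREWALL = {("presentation", "business", 8443)}   # Rule 2: only the DMZ reaches inward


def allowed(flow: Flow) -> bool:
    """Default deny: a flow passes only if one of the firewalls explicitly admits it."""
    return tuple(flow) in OUTER_FIREWALL or tuple(flow) in INNER_FIREWALL
```

Note that a direct connection from the Internet to the business zone is refused simply because no rule admits it, which is the property the DMZ is meant to provide.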

Rule 3: Keep up with security patches from your vendor. Most enterprise systems are compromised through known vulnerabilities with available patches that were never applied.

Rule 4: Run the presentation fortress on a minimal system. Disable any capabilities you do not need. Every capability available within the fortress is a potential vulnerability for them to attack. If you don't need to allow FTP or remote login, disable these options.

Rule 5: Validate all user input. I discussed this topic in Chapter 7 (Guards and Walls). Check for both buffer overflows and illegal characters.
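A minimal sketch of both checks for a single text field follows. The 64-character limit and the allowed character set are assumptions chosen for illustration; each real field needs its own. In a language with fixed-size buffers, the length check is what prevents an overflow; in Python it simply rejects oversized input before anything else touches it.

```python
import re

MAX_FIELD_LENGTH = 64  # hypothetical limit; choose one per field
# Allow-list: letters, digits, space, and a few punctuation marks (hyphen last, so it is literal).
ALLOWED = re.compile(r"[A-Za-z0-9 .,'@-]*")


def validate_field(value: str, max_length: int = MAX_FIELD_LENGTH) -> str:
    """Return the value unchanged if it passes both checks; otherwise raise ValueError."""
    if len(value) > max_length:
        raise ValueError("field too long")
    if not ALLOWED.fullmatch(value):
        raise ValueError("illegal characters in field")
    return value
```

The sketch uses an allow-list (say what is permitted) rather than a deny-list (say what is forbidden), because a deny-list fails open whenever an attacker finds a character you forgot.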

Rule 6: Never assume that an HTTP request is coming from where you think it's coming from. They can create an HTTP request that looks exactly like one coming from your own form.
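One practical consequence is that nothing your own form "guarantees" survives a forged request: a hidden price field can be edited, and a drop-down can be bypassed entirely. The sketch below rebuilds an order from fortress-side data instead of trusting the request. The catalog, field names, and quantity limit are hypothetical.

```python
# Hypothetical catalog held by the fortress (or obtained from a business fortress).
CATALOG_PRICES = {"widget": 1999, "gadget": 4999}  # prices in cents


def build_order(form: dict) -> dict:
    """Treat every field as attacker-controlled, even fields our own form fills in."""
    item = form.get("item")
    if item not in CATALOG_PRICES:  # a drop-down on the page does not guarantee a valid choice
        raise ValueError("unknown item")
    quantity = int(form.get("quantity", "1"))  # raises ValueError for non-numeric input
    if not 1 <= quantity <= 100:
        raise ValueError("quantity out of range")
    # Any 'price' field in the request is ignored: a forged request can set it to anything.
    return {"item": item, "quantity": quantity, "total_cents": CATALOG_PRICES[item] * quantity}
```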

Rule 7: Guard the connection between the browser and your fortress. Even if the user at the browser is perfectly respectable, they (the attackers) may be eavesdropping on, and even changing, data as it moves back and forth between the browser and the fortress. Secure Sockets Layer (SSL) is a good candidate technology for this kind of duty.
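SSL has since been superseded by its successor, TLS (Transport Layer Security), and the same reasoning applies. With Python's standard `ssl` module, a server-side context that refuses protocol versions older than TLS 1.2 might be set up as follows; this is a sketch, and the certificate and key paths are placeholders you would supply.

```python
import ssl


def make_server_tls_context(certfile=None, keyfile=None) -> ssl.SSLContext:
    """Server-side TLS context for the browser-to-fortress connection."""
    # Purpose.CLIENT_AUTH produces a context for the server side of a connection.
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Refuse SSLv3, TLS 1.0, and TLS 1.1 handshakes.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        # The fortress's certificate chain and private key (placeholder paths).
        context.load_cert_chain(certfile, keyfile)
    return context
```

A listening socket is then wrapped with `context.wrap_socket(sock, server_side=True)`, so that every byte between browser and fortress is encrypted and integrity-checked.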

Rules 8 through 11 exist because we assume the first seven rules will fail. These "fallback" rules are about minimizing the damage they can do once they have succeeded in breaking into your precious presentation fortress.

Rule 8: Run the presentation fortress with minimal permissions. A common attack is one in which they hijack one of the presentation fortress processes (e.g., the guard/worker process). If the process is running with just enough permissions to do its job, then the hijackers are limited in terms of how much damage they can do.

Rule 9: Stage the presentation fortress on a system that is not connected to the Internet. This staging machine has the official version of your presentation fortress, and when they break into your online presentation fortress and corrupt your files (not if, but when), you can re-create them from your staging machine. If your staging machine is not connected to the Internet, then it can't easily be corrupted from outside your organization. Of course, you still have the disgruntled employees inside your organization to worry about.
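To decide what to re-create, you can compare the live files against the staging copy. The sketch below assumes the comparison runs on a trusted machine (a compromised server's own tools cannot be trusted to report on themselves) and uses SHA-256 digests; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to the SHA-256 digest of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def tampered_files(staging_root: Path, live_root: Path) -> list:
    """Live files that were altered or deleted, plus files that staging never had."""
    expected = manifest(staging_root)
    actual = manifest(live_root)
    changed_or_missing = [f for f, digest in expected.items() if actual.get(f) != digest]
    unexpected = [f for f in actual if f not in expected]
    return sorted(changed_or_missing + unexpected)
```

Altered or missing files are restored from staging; unexpected files (a planted script, for example) are removed.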

Rule 10: Don't store anything you care about anywhere in the presentation fortress, including the data strongbox. The data strongbox, after all, is part of the presentation fortress and therefore trusts any process that is a cohabitant of the fortress. The strongbox doesn't distinguish between hijacked and nonhijacked processes. Do not store credit cards, passwords, secret keys, sensitive customer information, or proprietary algorithms in the presentation fortress. This rule applies not only to data in the data strongbox, but to confidential information anyplace in the fortress, including files, executables, virtual memory, or system registries. Do not allow your presentation fortress to share a resource with any other fortress unless you can absolutely guarantee complete isolation (and you probably can't).

Rule 11: Use a software fortress architecture. This rule may seem odd. If you have read this far, you have presumably bought into the software fortress approach. This just seems like a good place to reiterate the importance of the trust boundary in maintaining overall enterprise security. Your business logic, living as it does in a business application fortress, does not trust your presentation logic, which lives in an alien and highly suspect presentation fortress. The software fortress architecture assumes that when you move from a presentation fortress to a business fortress, you are crossing a trust boundary. Even when they break into your presentation fortress, they still have a long way to go before they can compromise mission-critical systems and data.

Many of these rules are fortress adaptations of recommendations made by the CERT Coordination Center (formerly known as the Computer Emergency Response Team Coordination Center) for any Internet-connected enterprise system, whether or not it is built with fortresses. These rules, with a few adaptations, just happen to fit especially well with the software fortress approach.

10.1.4 Reliability

Reliability means that you can count on a fortress to be there when your browser clients look for it. There is true reliability and pseudoreliability, the latter based on asynchronous drawbridges and discussed in Chapter 6 (Asynchronous Drawbridges). The input to a presentation fortress is an HTTP request, which is synchronous. HTTP may not be exactly a poster child for synchronous messaging systems, but it is synchronous nevertheless. Pseudoreliability is therefore not an option for the presentation fortress; we must look for true reliability.

I have discussed scale-out versus scale-up several times in regard to both scalability and reliability. I have pointed out that scale-out is preferable where possible. As I mentioned earlier in this chapter, the presentation fortress is particularly fortunate to have good cluster algorithms in the form of HTTP/IP load balancing. To make the most of the cluster for reliability, however, we do need to build our presentation fortresses appropriately.

Imagine that a browser is making two HTTP requests, R1 and R2, to a presentation fortress. Assume that our fortress is implemented with a cluster of four machines, M1 through M4, and is controlled by an IP load balancer. The basic configuration is shown in Figure 10.5.

Figure 10.5. Reliability Configuration of a Presentation Fortress

Assume that both requests, R1 and R2, are "naturally" headed toward machine M1. I'll take you through the possibilities of M1 going down for the count. M1 could go down at any of five possible moments:

Before R1 is received
After R1 is received but before R1 is completely processed
After R1 is processed but before R2 is received
After R2 is received but before R2 is processed
After R2 is processed

Case 1 (failure before R1 is received) is dealt with by the IP router. In this case the router will recognize that M1 is down and reroute R1 to another machine. Overall, fortress reliability is maintained. Nobody will ever know that M1 went down.
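The router's Case 1 behavior amounts to "try another machine." In the sketch below, machines are modeled as hypothetical callables that raise `MachineDown` when unreachable. Retrying is safe here only because the dead machine never received the request; the later cases need the transactional and idempotency techniques discussed in this chapter.

```python
class MachineDown(Exception):
    """Raised when the router cannot reach the machine it picked."""


def dispatch(request, machines):
    """Offer the request to each machine in turn until one accepts it."""
    for machine in machines:
        try:
            return machine(request)
        except MachineDown:
            continue  # skip the dead machine; the browser never sees the failure
    raise RuntimeError("every machine in the cluster is down")
```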

Case 2 (failure after R1 is received but before it has been completed) is dealt with by the fortress architect, who makes sure that any updates done by the fortress as a result of R1 are transactionally protected. I discussed transaction algorithms back in Chapter 3 (Transactions). If all of the R1 updates are contained within a single transaction, then by the guarantee of transactional integrity, either all of those updates will succeed or none will. Therefore, in the worst case you will have to submit R1 again. If M1 is still down when you resubmit, the failure will be dealt with as a Case 1 failure.
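Here is a small illustration of "all or none," using SQLite from Python's standard library as a stand-in for the data strongbox. The table names are hypothetical, and an exception partway through plays the role of M1 failing mid-request: the uncommitted insert is rolled back together with the failed update.

```python
import sqlite3


def process_r1(conn: sqlite3.Connection, cart_id: int, item: str, qty: int) -> None:
    """All of R1's updates commit together, or none of them do."""
    with conn:  # commits if the block succeeds, rolls back if any statement raises
        conn.execute(
            "INSERT INTO cart_items (cart_id, item, qty) VALUES (?, ?, ?)",
            (cart_id, item, qty),
        )
        # Fails the CHECK constraint (on_hand >= 0) if stock would go negative.
        conn.execute(
            "UPDATE inventory SET on_hand = on_hand - ? WHERE item = ?",
            (qty, item),
        )
```

If the second statement fails, the cart row from the first statement disappears with it, so resubmitting R1 starts from a clean slate.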

Case 3 (failure between requests) is an interesting case. R1 has been processed, but R2 has not yet been received. Presumably the overall state of the browser session has been changed as a result of the success of R1. The issue is, where are you storing that browser session state (e.g., an updated shopping cart)? This is where the decisions made by the fortress architect become critical.

I said earlier that there are three places to store the changed browser state: on the browser machine, in memory on the fortress machine processing the browser request, or in the fortress data strongbox.

Storing the entire browser session state on the browser machine itself (either in a field or in a cookie) means that the entire session state must be passed into the presentation fortress with every request. This is not very efficient.

Storing the browser session state in memory (or even on a local disk) on the fortress machine causes reliability problems. If the machine goes down between requests (Case 3), you have lost the browser session state and the browser client must restart the session.

The right place to store the browser session state is the software fortress's data strongbox. Now if M1 goes down between requests, there is no problem. M2 can pick up where M1 left off. Remember, the data strongbox is not local to any single machine in the presentation fortress. It is a resource shared by all machines in the fortress.

You might ask what happens if the machine hosting the data strongbox goes down. This is a very different problem: either we decide not to worry about it, or we use techniques specialized for high database reliability. These techniques include Redundant Array of Independent Disks (RAID) algorithms to protect the disk drives and tightly coupled backup cluster algorithms to protect the machine itself. These are standard techniques in the database industry, and nothing about them is in any way specific to software fortresses.

Storing the browser session state in the data strongbox is good, but not enough. You still need to store the key to that session state someplace. The session state key is the information the fortress machine needs to find the specific session state being stored in the data strongbox. The session state key can't be stored in the data strongbox without a recursive problem being created (where is the key to the key to the session state stored?). It can't be stored on the machine processing the request, because if it is lost, so is the session state itself (the data may now be locked safely away, but we have lost the means of finding it!).

The session state key must be stored someplace on the browser machine. You can put it in a cookie. You can put it in a browser form field. You can try to convince the browser to let you open a data file (good luck!). How you store it is less important than where you store it. It must live someplace on the browser machine.

This means, of course, that the session state key must be passed in with every request from the browser to the presentation fortress. But wasn't this one of my objections to storing the session state itself on the browser?

It was, but session states represent a lot more data than session state keys. The key need be only large enough to uniquely identify the true data to the data strongbox. In the overall scheme of things, the overhead of passing in session state keys with each browser request isn't worth worrying about.
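Putting these pieces together: the state lives in a store shared by all machines, and only a short key travels with the browser. In this sketch an in-memory dictionary stands in for the strongbox (a real one would be a shared database), and the class and function names are hypothetical. Because the handler keeps nothing locally, M2 can process R2 after M1 has handled R1 and failed.

```python
import copy
import secrets


class Strongbox:
    """In-memory stand-in for the data strongbox shared by every machine in the fortress."""

    def __init__(self):
        self._sessions = {}

    def create_session(self) -> str:
        # The key must be unguessable: it is all anyone needs to find the session state.
        key = secrets.token_urlsafe(16)
        self._sessions[key] = {"cart": []}
        return key

    def load(self, key: str) -> dict:
        return copy.deepcopy(self._sessions[key])

    def save(self, key: str, state: dict) -> None:
        self._sessions[key] = copy.deepcopy(state)


def handle_add_to_cart(strongbox: Strongbox, session_key: str, item: str) -> str:
    """Runs on whichever machine the router picks; keeps nothing locally between requests."""
    state = strongbox.load(session_key)
    state["cart"].append(item)
    strongbox.save(session_key, state)
    return session_key  # returned to the browser, e.g. in a cookie or a hidden form field
```

However large the cart grows, the browser carries only the 22-character key that `secrets.token_urlsafe(16)` produces.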

The remaining two failure cases can be considered variants on problems that are already solved. Case 4 (M1 goes down while processing R2) is just a variant of Case 2 (M1 goes down while processing R1). Case 5 is a variant of Case 1 if R2 is the last request, or a variant of Case 3 if there will be an R3.

10.1.5 Integrity

The remaining architectural issues relating to presentation fortresses have to do with internal integrity. How easy is it to mess up the state of the fortress?

Part of this we have already dealt with. We don't have to worry about a partial update of fortress data, for example, if the fortress dies while processing a request. (Why? Because we have transactional protection, as discussed in the preceding section.) But other problems may occur.

Suppose, for example, that the browser sends a request R1 that is "naturally" headed for machine M1 in the fortress cluster, and that M1 appears to die just before it receives the request. The browser eventually gets bored waiting for a response and sends R1 again; we'll call this resubmission R1A. But now it turns out that M1 didn't die; it was just out getting coffee. M1 returns and receives both R1 and R1A. Now what happens?

What happens depends on exactly what R1 looks like. If R1 was a request to find out what time of day it is, it probably will do no harm to process both R1 and R1A. But suppose R1 is a request to withdraw 1,000 dollars from your savings account. You might have problems with that request being processed twice!

What is the difference between one request that asks for the time of day and another that withdraws 1,000 dollars from your savings account? This is an interesting theoretical problem that is dealt with most extensively by Pat Helland's fiefdom model for software architectures.

Helland classifies requests by whether or not they are what he calls idempotent. He defines an idempotent request as one that can be executed multiple times with the same result as one would get if it were executed once. In other words, as long as the request is processed at least once, it doesn't matter how many times it is processed.
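The definition can be checked mechanically for a given request: run it once, run it several times, and compare. The request kinds below are hypothetical and mirror the examples in this section; note that a spot check from one starting state illustrates the definition but does not prove idempotency.

```python
def apply(request, balance):
    """Process one request against an account balance and return the new balance."""
    kind, amount = request
    if kind == "query":        # like asking the time of day: reads, changes nothing
        return balance
    if kind == "set_balance":  # an update that is nonetheless idempotent
        return amount
    if kind == "withdraw":     # each repetition takes more money
        return balance - amount
    raise ValueError(f"unknown request kind: {kind}")


def process_repeatedly(request, balance, times):
    for _ in range(times):
        balance = apply(request, balance)
    return balance


def behaves_idempotently(request, balance, max_times=3):
    """Same result for 1, 2, ..., max_times executions (a spot check, not a proof)."""
    once = process_repeatedly(request, balance, 1)
    return all(process_repeatedly(request, balance, n) == once for n in range(2, max_times + 1))
```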

One of the tricks to designing integrity into a presentation fortress is designing all requests to be idempotent. This means recognizing, at design time, when a request is not idempotent and then changing it to an equivalent version that is idempotent. I suspect (but have not proved) that all nonidempotent requests have an idempotent equivalent. In fact, most probably have many idempotent equivalents.

Let me give you an example of a metamorphosis from nonidempotent to idempotent. Consider a request to withdraw money. A nonidempotent request simply asks for 1,000 dollars to be withdrawn from account 100. An equivalent idempotent version includes a request ID that is assigned by the browser. This request ID can then be checked by the presentation fortress to make sure the request hasn't already been processed. If the presentation fortress gets two requests with the same request ID, it discards the duplicates. With the idempotent version, you can send duplicates all night without doing any harm.
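A minimal sketch of the request-ID version follows. Here `new_request_id` stands in for whatever assigns the ID to the browser form, and the class and method names are hypothetical. One design choice differs slightly from simply discarding duplicates: the sketch replays the first result, so a browser that resent R1A still receives an answer. In a real fortress, the ID history and the balance update would be stored in the same transaction.

```python
import uuid


def new_request_id() -> str:
    """Generated once per logical request and carried in the form; reused on resubmission."""
    return str(uuid.uuid4())


class Account:
    def __init__(self, balance: int):
        self.balance = balance
        self._processed = {}  # request ID -> balance reported the first time

    def withdraw(self, request_id: str, amount: int) -> int:
        if request_id in self._processed:
            # Duplicate such as R1A: do not withdraw again, just repeat the first answer.
            return self._processed[request_id]
        self.balance -= amount
        self._processed[request_id] = self.balance
        return self.balance
```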

Note that the idempotent version is not free. You need to assign request IDs in your browser form. You have to design a system to assign request IDs, verify uniqueness, and store their histories. All of this needs to be designed by somebody. Guess who?