Designing the Environment

With the dossier of information gathered from your client, you should be able to come up with a solid proposal for your systems architecture.

Let's take a look at each of the building blocks of systems architecture we've identified, and see how one can best design an environment to fit these requirements.

Hosting and Connectivity

Unless you're designing a totally closed system, there will almost certainly be a requirement for Internet connectivity and a secure environment for the hosting of your client's servers.

Without a doubt, the best approach here is a data center. The sole purpose of such facilities is to provide a secure, air-conditioned, controlled environment for servers and other equipment and, frequently, provide some kind of network connectivity to them.

Because these facilities house servers designed solely for remote access, they are often rather inhospitable environments for human beings. The use of carcinogenic fire retardants has been made unlawful in most states and countries, but the intense air conditioning can provide even the fittest and healthiest software architect with a wheeze upon exit. However, this environment is perfect for servers.

There are two types of data center: carrier-specific and carrier-neutral. A carrier-specific data center is typically owned by the ISP providing bandwidth and connectivity there, so that ISP is the only one available for connectivity. A carrier-neutral center concerns itself only with housing equipment and lets you connect to any of the ISPs that provide connectivity there.

Both have their advantages and disadvantages. Carrier-specific centers tend to offer more integrated support and billing, which can certainly cut down on the hassle. With a carrier-neutral center, however, should one ISP prove unreliable or inappropriate, or you just fancy a change, you can switch to another ISP without physically moving your equipment at all. Either way, you will typically be offered a product that consists of a certain amount of rack space, a block of IP addresses (or, in some cases, a range of addresses plus a single IP on the ISP's network, which requires you to fit a router of your own), and a handoff: a physical Ethernet link into your ISP's network. Your ISP will, of course, give you a decent helping of support and assistance in getting up and running, but it's important that you order the right product in the first place.

Calculating CIR

Your ISP will sell you either a quantity of bandwidth, measured as a Committed Information Rate (CIR) in megabits per second (Mbps), or a quantity of monthly transfer, measured in gigabytes. A CIR will almost always be the more expensive option but is essential for bigger sites: it allows you to push as much traffic as you like, provided you stay within the committed rate. To calculate a CIR, consider the number of simultaneous sessions your application is likely to support.
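Because the two products are quoted in different units, it can help to convert a CIR into the most data it could possibly carry in a month. The following Python sketch is an illustration only: the function name is hypothetical, and it assumes a 30-day month and decimal units (1 Mbps = 1,000,000 bits per second, 1 GB = 1,000,000,000 bytes).

```python
def max_monthly_transfer_gb(cir_mbps: float, days: int = 30) -> float:
    """Most data (in GB) a link could move in a month running flat out at its CIR.

    Assumes decimal units: 1 Mbps = 1,000,000 bits/s, 1 GB = 1,000,000,000 bytes.
    """
    seconds = days * 24 * 60 * 60
    bits = cir_mbps * 1_000_000 * seconds
    return bits / 8 / 1_000_000_000  # bits -> bytes -> gigabytes


print(max_monthly_transfer_gb(1))   # 324.0
print(max_monthly_transfer_gb(10))  # 3240.0
```

The result is a ceiling, since real traffic rarely saturates a link around the clock, but it gives a rough sense of how the two kinds of product compare.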

Say that your Webmail application needs to support a total of one thousand simultaneous sessions. This represents one thousand people all using the application at a given point in time. It does not represent one thousand simultaneous connections, and this distinction is cataclysmically important. When someone uses an application, each session consists of a flurry of HTTP requests (page plus images), followed by a pause while content is read, followed by another flurry of HTTP requests, and so on. Each session is therefore transferring data only a fraction of the time.

It is also important to understand the likely dwell time on each page. This is something only user testing can determine accurately, but you could estimate it to be something like 30 seconds for a typical Web page on a consumer Web site, or 60 seconds on a Web application (such as a Webmail application).

If you assume that each page takes approximately 10 seconds to load, then a Web application will show a per-session pattern of 10 seconds of traffic, followed by 60 seconds of no traffic, followed by another 10 seconds of traffic, and so on.

One quick and slightly crude calculation later (10 seconds of traffic out of every 70, or about 14 percent, rounded up), this means that at any given moment there is roughly a 15 percent chance that a given session is transferring data. The number of simultaneous connections is therefore likely to be about 15 percent of your simultaneous sessions for a Web application, or 25 percent (10 seconds out of every 40) for a Web site.
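The crude calculation above amounts to dividing the load time by the length of one load-plus-dwell cycle, then multiplying by the number of sessions. A minimal Python sketch, assuming the 10-second load time and the 60- and 30-second dwell times used here (the function names are hypothetical):

```python
def active_fraction(load_seconds: float, dwell_seconds: float) -> float:
    """Share of time a single session is transferring data.

    Models each page view as a burst of traffic (the load time) followed
    by an idle period while the user reads (the dwell time).
    """
    return load_seconds / (load_seconds + dwell_seconds)


def simultaneous_connections(sessions: int, load_seconds: float,
                             dwell_seconds: float) -> float:
    """Expected number of sessions transferring data at any given moment."""
    return sessions * active_fraction(load_seconds, dwell_seconds)


web_app = active_fraction(10, 60)   # 10 s of traffic per 70 s cycle
web_site = active_fraction(10, 30)  # 10 s of traffic per 40 s cycle
print(f"Web application: {web_app:.1%}")  # Web application: 14.3%
print(f"Web site: {web_site:.1%}")        # Web site: 25.0%
print(round(simultaneous_connections(1000, 10, 60)))  # 143
```

Note that 10 seconds out of every 70 is nearer 14 percent; rounding it up to 15 percent gives 150 connections rather than 143 and builds in a little headroom.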

In the preceding example of 1,000 simultaneous sessions, we can tell that this represents 150 simultaneous connections. But with some users on T1 lines, and others on 56K dial-ups, how do we know what bandwidth requirement this represents?

This is where the data mined from your client comes in handy. The client may well have some figures on his or her projected demographic. People in offices, for example, are likely to have E1 lines, but each line is usually shared among everyone in the office. For home users, you can use national figures on broadband penetration to estimate the likely percentage on broadband and the likely percentage on dial-up.

Even better, if you as a PHP professional have access to some server logs for similar past projects, dig them out and do the math yourself.

Assume that this application targets home users only. Broadband penetration in the United States is approximately 30 percent of all home Internet connections at the time of writing, and almost all of it runs at 512 Kbps. We can assume the rest are on 56 Kbps dial-up modems, which in practice connect at 48 Kbps or less.

With this in mind, the math is simply 30 percent of 150 connections at 512 Kbps plus 70 percent of 150 at 48 Kbps: (45 × 512) + (105 × 48) = 23,040 + 5,040 = 28,080 Kbps. Total requirement? About 28 Mbps to ensure that everybody gets full speed; let's say 30 Mbps after leeway, which is not cheap.
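The same arithmetic can be written as a small Python function that weights each connection type by its share of users. This is a sketch: the function name and the dictionary layout are hypothetical, and the figures are the example's assumptions (150 simultaneous connections, 30 percent on 512 Kbps broadband, 70 percent on dial-up at 48 Kbps).

```python
def required_kbps(connections: float,
                  mix: dict[str, tuple[float, float]]) -> float:
    """Bandwidth needed for every active connection to get its full line speed.

    mix maps a label to (share_of_users, line_speed_kbps); shares should sum to 1.
    """
    return sum(connections * share * speed for share, speed in mix.values())


# Example assumptions: home users only, US broadband penetration of ~30 percent.
home_users = {
    "broadband": (0.30, 512),  # 512 Kbps broadband
    "dial-up": (0.70, 48),     # 56K modems connecting at about 48 Kbps
}

kbps = required_kbps(150, home_users)
print(f"{kbps:,.0f} Kbps, or about {kbps / 1000:.0f} Mbps")  # 28,080 Kbps, or about 28 Mbps
```

Leeway is deliberately left out of the function, since how much headroom to add on top of the result is a judgment call.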

Thankfully, 1,000 simultaneous sessions is very rare; that would be a huge Web application. To help your client calculate simultaneous sessions, ascertain the total number of regular users of the application. If, for example, your Web application has 5,000 regular users, each of whom uses the platform once a day for a total of 20 minutes, and that usage follows an approximately normal distribution centered on 6 p.m., college calculus shows that the likely maximum number of simultaneous sessions (at 6 p.m.) will be around 2 percent of your regular user base, depending on how widely usage is spread around that peak. That is just 100 simultaneous sessions, or about 15 simultaneous connections: a much more reasonable 3 Mbps or so.

Essentially, the area under a graph with simultaneous sessions on the Y axis and time on the X axis (the integral of that curve), divided by the average length of a session, equals the total number of user sessions per day. From this, you can determine the approximate formula that describes the curve and use that formula to calculate the peak number of sessions (the Y-value) at the known peak time (the X-value).
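To see where a figure like 2 percent comes from, the following Python sketch models each user as starting one 20-minute session at a normally distributed time centered on 6 p.m. All function names are hypothetical, and the standard deviation (how widely usage is spread across the day) is an assumption you would need to estimate from your client's data or server logs. The sketch also checks the integral relationship described above: the area under the concurrency curve, divided by the session length, recovers the number of daily sessions.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """Cumulative normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def concurrent_sessions(t: float, users: int, session_min: float,
                        peak_min: float, sd_min: float) -> float:
    """Sessions in progress at minute t.

    A session is still running at t if it started in the window
    (t - session_min, t]; start times are assumed to be normally
    distributed around peak_min with standard deviation sd_min.
    """
    started = (normal_cdf(t, peak_min, sd_min)
               - normal_cdf(t - session_min, peak_min, sd_min))
    return users * started


def peak_sessions(users: int, session_min: float,
                  peak_min: float, sd_min: float) -> float:
    """Concurrency peaks when the session window is centered on peak_min."""
    return concurrent_sessions(peak_min + session_min / 2, users,
                               session_min, peak_min, sd_min)


def total_sessions(users: int, session_min: float,
                   peak_min: float, sd_min: float) -> float:
    """Area under the concurrency curve (1-minute steps) / session length."""
    start = int(peak_min - 10 * sd_min)
    end = int(peak_min + 10 * sd_min + session_min)
    area = sum(concurrent_sessions(t, users, session_min, peak_min, sd_min)
               for t in range(start, end))
    return area / session_min


USERS, SESSION_MIN, PEAK_MIN = 5000, 20, 18 * 60  # 5,000 users, 20 min, 6 p.m.

for sd_hours in (2, 4, 6.7):  # hypothetical spreads of usage around 6 p.m.
    peak = peak_sessions(USERS, SESSION_MIN, PEAK_MIN, sd_hours * 60)
    print(f"sd {sd_hours} h: peak ~{peak:.0f} sessions "
          f"({peak / USERS:.1%} of regular users)")

daily = total_sessions(USERS, SESSION_MIN, PEAK_MIN, 6.7 * 60)
print(f"Sessions recovered from the area under the curve: {daily:.0f}")
```

In this model the peak scales roughly as users × session length ÷ (standard deviation × √(2π)), so the narrower the spread of usage around 6 p.m., the higher the peak: a 2-hour standard deviation gives a peak more than three times higher than a 6.7-hour one.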

For further information, pick up any college calculus or statistics textbook and look up the "normal distribution."