5.4 The Internet

The solution to the problem of distributing information throughout the organization while providing centralized control is the Internet. Not all Internet-enabled applications, however, are created equal. There are Internet applications, and then again, there are Internet applications. In Chapter 2, we discussed Tim Berners-Lee's proposal for a hypertext markup language and how that proposal served as the basis for the first browser, Mosaic, written by students at the NCSA at the University of Illinois at Urbana-Champaign.

Back then the delivery of information over the Web, while simple, was very limited. As shown in Figure 5.5, the browser communicated with the Web server using HTTP, the HyperText Transfer Protocol. The browser would make requests of the server, which in turn passed back HTML (HyperText Markup Language) pages. There wasn't much to an HTML page; it contained a mix of text, the specification for the display of that text, graphics, and links to other HTML pages. The client in this environment simply served as a means to display the data and communicate with the server. This, of course, limited the capabilities of an IEBI application.

Figure 5.5. Basic Web application.


The first challenge in delivering an IEBI application was to maintain state. HTTP is stateless. This means that each request received by the server from the client is treated as separate and unique. There is no association between one request and any previous or subsequent requests. When HTTP was created, the objective was to deliver data over the Internet. Establishing a state was much more complex and was unnecessary to meet the design objectives. If a file didn't make it to its destination, that was fine; a new request could be submitted. There was no long-term damage to an application because of a break in communications.

In a client/server environment, we must maintain state. If, for example, we wish to select a particular data item and drill down on that item, it is a simple matter for the client/server application to send a more specific request to the server. In a stateless environment, where the client is used merely as a means to display the data, it is difficult for the server to maintain state, or the context in which the client is making the request. This limitation of HTTP not only limits the ability of the decision maker to perform analyses, but as we shall see in later chapters, it also limits our ability to analyze activities on our Web site.

There is yet another complication to this scenario: The data must be in the form of an HTML page. Data could not be pulled directly to the browser from a database or an application to fulfill a request. If it wasn't in an HTML page, it was inaccessible. There was no way for the user to truly interact with an application. This, of course, brought us back to the days of the mainframe, when there was no such thing as ad hoc reporting. It didn't take developers long to realize this shortcoming; in fact, the interval between the Web's arrival and the recognition of this limitation was so short that one could almost consider the two simultaneous. To remedy this situation, an interface was developed between the Web server and the application. Figure 5.6 presents this solution.

Figure 5.6. Connecting the Web server to the application.


The Common Gateway Interface (CGI) defines a way for an external program to communicate with the Web server. As shown in the figure, the user makes a request via a browser; typically, the request comes from an HTML form. Upon receiving a request, the Web server initiates a CGI process. The Web server uses the information in the HTML form to build a packet, which ultimately is passed to the CGI program. This packet contains whatever information is necessary for the CGI program to process the request. For example, the request may require a query to a database, in which case the CGI program establishes a connection to the database. It then builds and issues the query. Upon receipt of a response from the application, the CGI program formats the data into an HTML page and passes the page back to the Web server. At that point, the work of the CGI program is completed and the process terminates. The Web server in turn passes the HTML page back to the requesting browser.
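
To make the flow concrete, here is a minimal sketch of such a CGI program, written in Python. The database file, table, and column names are illustrative assumptions, and reading the request from the QUERY_STRING environment variable stands in for however a particular Web server packages the form data.

    #!/usr/bin/env python3
    # Sketch of a CGI program: read the request, query a database, format the
    # result as an HTML page, and write it back to the Web server on stdout.
    # The database file, table, and column names are illustrative only.
    import os
    import sqlite3
    from urllib.parse import parse_qs

    params = parse_qs(os.environ.get("QUERY_STRING", ""))   # form data from the Web server
    title = params.get("title", [""])[0]                    # requested book title, if any

    conn = sqlite3.connect("bookstore.db")                  # a new connection on every request
    rows = conn.execute(
        "SELECT title, price FROM books WHERE title LIKE ?",
        ("%" + title + "%",),
    ).fetchall()
    conn.close()                                            # the connection dies with the process

    print("Content-Type: text/html")
    print()                                                 # blank line ends the HTTP headers
    print("<html><body><ul>")
    for book_title, price in rows:
        print("<li>%s - $%.2f</li>" % (book_title, price))
    print("</ul></body></html>")

Note that the script opens its own database connection and everything vanishes when the process exits, which is the overhead we return to later in this section.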

For IEBI applications, CGI was certainly a step in the right direction. As we said earlier, the Web server only understands HTTP, a stateless protocol. Within the Web server itself, there is nothing we would consider application logic. It receives requests for specific files and passes them back to the requesting browser. There is no continuity between one request and the next. Multiple requests made from one browser are treated no differently than multiple browsers each making a request. CGI, however, provides us with two methods of establishing a pseudo-stateful connection.

To establish state, we need to establish a bridge between one request and the next. Figure 5.7 demonstrates the first method of establishing such a connection. In this scenario, we use the fact that the CGI application constructs the Web page to our advantage. Let's say that we are an online bookstore. In this store, we have set up personalized pages for repeat customers. On these personalized pages, we have recommended reading lists. One day, user A at browser A enters our Web site from her personal page. She sees on the recommended reading list Object-Oriented Data Warehouse Design: Building the Star Schema by William A. Giovinazzo and decides to investigate this title further by clicking on the link to that page. The CGI application passes the page back to browser A via the Web server, but prior to doing so, embeds a tag in the HTML page. This tag identifies the session in which this request was made. The CGI program then terminates.
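
One way to embed such a tag, sketched below in Python, is as a hidden form field that the browser returns with its next request. The field name session_id, the purchase URL, and the ID scheme are illustrative assumptions, not the book's prescription.

    # Sketch: embed a session tag in the generated page as a hidden form field.
    # The field name "session_id" and the ID scheme are illustrative assumptions.
    import uuid

    def render_book_page(title, session_id=None):
        if session_id is None:                 # no tag yet: start a new session
            session_id = uuid.uuid4().hex
        return """<html><body>
          <h1>%s</h1>
          <form action="/cgi-bin/purchase.py" method="post">
            <input type="hidden" name="session_id" value="%s">
            <input type="submit" value="Buy this book">
          </form>
        </body></html>""" % (title, session_id)

    # When the purchase form is submitted, the next CGI process reads session_id
    # from the form data and ties the purchase back to user A's earlier browsing.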

Figure 5.7. Establishing state with CGI.


At the same time, user B receives an email from a friend recommending the same book. The friend included in the email the Uniform Resource Locator (URL) for the book. User B clicks on the link and immediately goes to the page for the book. The Web server starts another CGI process to generate a new page with a completely different tag. Meanwhile, user A reviews the Web page and decides that she must have this insightful, unique book to truly understand star schema design methodologies. She orders the book. When she submits her request to purchase the book, the CGI program receives the HTML page with the embedded tag and determines that the purchase is part of user A's session. User B, however, not recognizing the brilliance of the author, decides to terminate the session in order to watch championship wrestling.

While we can see that this method is useful in simulating a state for our sessions, there is a problem. Every time we process a request, we need to reconstruct our Web page. In addition, this is only good for one session; there is no persistence. If the user terminates the session and returns at a later time, there is no continuity between the two. To resolve these issues, Netscape developed a solution that is elegant in its simplicity: cookies. YUM! (Personally, I feel that chocolate-chip is overrated. Give me a good oatmeal raisin with a glass of milk any day.) Rather than modify the HTML pages in response to a request, the CGI application asks the browser to create a simple block of text. This text uniquely identifies the browser, distinguishing it from the other browsers accessing the Web site. Figure 5.8 shows how this works.

Figure 5.8. C is for cookies. That's good enough for CGI.


As we can see, the browser makes its initial request of the Web server. The server, as we described earlier, builds a packet of information and starts the CGI process. If the application does not find a cookie in the packet passed to it from the Web server, it assumes that this is the browser's first visit to the site. The application therefore adds a request for the browser to create a cookie on the client's local system (in some cases, the user may have set the browser to refuse cookies). If the browser accepts the cookie, the text string is saved in the browser's memory. This text string, the cookie, is passed to the Web site with any subsequent requests. When the CGI application receives the cookie, it knows that this is a returning user. When the browser terminates, the cookies in its memory are written to disk and retrieved later, when the browser is restarted.
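
A sketch of this exchange from the CGI program's point of view, using Python's standard http.cookies module, might look like the following. The cookie name visitor_id and its one-year lifetime are illustrative assumptions.

    # Sketch: identify a returning browser with a cookie in a CGI-style program.
    # The cookie name "visitor_id" and its one-year lifetime are assumptions.
    import os
    import uuid
    from http.cookies import SimpleCookie

    cookies = SimpleCookie(os.environ.get("HTTP_COOKIE", ""))  # cookies the browser sent

    if "visitor_id" in cookies:
        visitor_id = cookies["visitor_id"].value   # returning browser
    else:
        visitor_id = uuid.uuid4().hex              # first visit: ask the browser to keep a cookie
        new_cookie = SimpleCookie()
        new_cookie["visitor_id"] = visitor_id
        new_cookie["visitor_id"]["max-age"] = 60 * 60 * 24 * 365
        print(new_cookie.output())                 # emits a Set-Cookie header

    print("Content-Type: text/html")
    print()                                        # blank line ends the HTTP headers
    print("<html><body>Welcome, visitor %s</body></html>" % visitor_id)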

We can employ these cookies in the same way as we did the embedded HTML tags. The difference is that continuity can be maintained even beyond individual sessions. In the example presented in Figure 5.7, we noted that the user ended the session to go watch championship wrestling. Let's say that after a time, he comes to his senses and realizes his dire need for the book Object-Oriented Data Warehouse Design: Building the Star Schema. When he returns to the Web site using browser B, the cookie is passed along with the request, and the application identifies the browser as one that has visited the site before.

Establishing state provides IEBI systems with two very important capabilities. First, we are able to analyze how a user travels through our site. As we watch where a user goes on our site, we gain a better understanding of his or her behavior. Customer behavior becomes input to our IEBI system. Based on these behaviors, we are able to customize or personalize users' experiences on our site. Second, we can provide better support for the decision maker. By establishing continuity, the decision support system can maintain the context in which the decision maker is functioning. Since we now understand where they are and where they've been, we can more easily accommodate what they would like to do. For example, we can provide drill-down, rollup, and rotate capabilities when performing multidimensional analysis. So, establishing state provides IEBI with both better input and better output. In Chapter 13, we will discuss how to use cookies to better understand customer behavior.

CGI was not, however, a panacea for the data warehouse architect designing a client interface. The first challenge, of course, is that we are still working with a stateless connection. Although CGI could be used by a talented programmer to maintain the notion of a session, it wasn't a simple process. It is important to stress that while CGI allowed us to simulate state, the connection between the browser and the application remained stateless. Every request to the Web server generated a new CGI process. CGI did not provide for reentrant code or for sharing code between processes. Consequently, each request required that a new connection be established with the application. Consider what this means in terms of processing requirements. Each request means that an HTML page is sent to the Web server. A new process is initiated and a new connection is made to the database. As you can well gather, in addition to being slow, this consumed a great deal of resources. The resolution to this problem was the application server, presented in Figure 5.9.

Figure 5.9. The application server.


We could characterize an application server as stateful CGI, but such a characterization can be misleading. As we can see in the figure, an application server is a server process that sits between the Web server and the backend database. Recall the definition of a server process: It is a reactive process running virtually forever. The CGI program is similar to the server in that it responds to user requests. It differs in that there is a one-to-one correspondence between user requests and CGI processes, the program continually starting and stopping with each request. The application server is persistent. Where the CGI program is designed to service one request for its entire processing life, the application server is designed to process many simultaneous requests, over and over again. The browser clients are clients of the application server, which is itself a client of the database server. This in effect abstracts the connection between the client and the database. The application server is now responsible for establishing communication with the database server and making efficient use of its resources.

Since the application server process is persistent, we can now maintain a true state for our clients. Just as with CGI, we provide a cookie to the client process to identify an individual browser. When the user at that browser begins a session, we establish and maintain a state for that particular client. The CGI program, in stopping after processing the user request, loses its context; when a subsequent request is received from that client, the state must be retrieved before processing can continue. The persistence of the application server means that it does not lose context between client requests. As new requests are received from the client, the application server maintains the state for each client within its own persistence engine. Users can terminate these sessions as easily as they can sessions of more traditional client/server or mainframe applications. Inactive sessions, or sessions for browsers that have lost their connection, can be timed out by the application server. The application server can simply terminate these sessions, or it can save their state for later retrieval.
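
The sketch below suggests how a persistent application-server process might hold this per-client state in memory between requests and time out idle sessions. The session structure and the 30-minute timeout are illustrative assumptions, not a description of any particular product.

    # Sketch: a persistent application-server process keeping per-client state in
    # memory between requests. The session structure and the 30-minute timeout
    # are illustrative assumptions, not a description of any particular product.
    import time

    SESSION_TIMEOUT = 30 * 60        # seconds of inactivity before a session is dropped
    sessions = {}                    # cookie value -> {"state": ..., "last_seen": ...}

    def handle_request(cookie_value, request):
        now = time.time()
        session = sessions.setdefault(
            cookie_value, {"state": {"drill_path": []}, "last_seen": now}
        )
        session["last_seen"] = now
        session["state"]["drill_path"].append(request)   # context survives across requests
        return session["state"]

    def expire_idle_sessions():
        cutoff = time.time() - SESSION_TIMEOUT
        for key in [k for k, s in sessions.items() if s["last_seen"] < cutoff]:
            del sessions[key]        # or persist the state here for later retrieval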

The maintenance of state is just part of the story. While state is important, the application server also optimizes the utilization of resources. Since the application server is designed to serve many requests at one time, requests share resources. As shown in Figure 5.9, the application server supports its own data cache. The basic principle of caching is that you store data in memory as close to the user as you can. The application server stores in its memory data it has retrieved from the database server. Subsequent requests from clients requiring the same data can be serviced directly from the application server's cache rather than by querying the database. Because data is pulled directly from memory, this improves system response time and reduces the number of queries to the database, ultimately reducing the load on the database server.
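
A minimal sketch of the idea: consult the application server's in-memory cache first, and go to the database only on a miss. The run_on_database callable is a hypothetical stand-in for the real query path.

    # Sketch: an application-server data cache consulted before the database.
    # run_on_database is a hypothetical callable that issues the real query.
    cache = {}

    def fetch_result(query, run_on_database):
        if query in cache:
            return cache[query]          # served from the application server's memory
        result = run_on_database(query)  # only cache misses reach the database server
        cache[query] = result
        return result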

Consider the effects of caching on IEBI. Perhaps our organization has a BI portal, as described in Chapter 3. As part of this portal, we may have a standard set of graphs and reports that are referenced by the C-level executives within our organization. These Web pages are retrieved directly from the application server's cache. As the user drills down for more detail, the application server caches the new pages as well.

Connection pooling is another optimization technique of the application server. As shown in Figure 5.10, pooling allows the application server to service more clients than it has connections to the database. Again, this is due to the persistent nature of the application server. As we stated, there is a one-to-one relationship between the client request and the CGI process. Since CGI processes cannot share resources, there is a separate connection to the database for every client request, each connection being reestablished with every new request to the database. The application server, by contrast, opens a set of connections to the database and shares them among its clients. Remember that the application server is the client of the database server. As such, the application server sends requests to the database server over whichever connection is available.

Figure 5.10. Application server resource sharing.


For example, a request is sent from browser 1. The application server may send that request over database connection A. While the response is being sent back to the client, the application server may receive a request from browser 2 and submit a query for that request over connection A as well. While the request for browser 2 is being processed, browser 1 may send a subsequent request that is processed via connection B. Which browser sends the request has nothing to do with which connection is used to service the request. Connection pooling also allows the application server to establish fewer database connections than there are clients accessing the system. If, for example, the application server has on average 100 clients, each accessing the database 30 percent of the time, the system need only establish 30 connections to the database. Another alternative is to establish as few as 20 connections for the typical load, with the option of creating up to 35 to address peak demand.
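
A connection pool along these lines might be sketched as follows, sized at 20 connections for the typical load and growing to a cap of 35 for peaks, as in the example above. The open_connection factory is a hypothetical placeholder for whatever database driver is in use.

    # Sketch: a connection pool sized for the typical load and growing toward a
    # cap for peaks, matching the 20-to-35 example above. open_connection is a
    # hypothetical factory standing in for the real database driver.
    import queue
    import threading

    class ConnectionPool:
        def __init__(self, open_connection, min_size=20, max_size=35):
            self._open = open_connection
            self._max = max_size
            self._idle = queue.Queue()
            self._lock = threading.Lock()
            self._total = min_size
            for _ in range(min_size):
                self._idle.put(self._open())

        def acquire(self):
            try:
                return self._idle.get_nowait()   # reuse whichever connection is free
            except queue.Empty:
                with self._lock:
                    if self._total < self._max:  # grow toward the peak-load cap
                        self._total += 1
                        return self._open()
                return self._idle.get()          # otherwise wait for a connection to free up

        def release(self, conn):
            self._idle.put(conn)                 # any client's next request may reuse it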

Going forward, we expect to see two types of application servers. The first will be general-purpose application servers. These will be more or less platforms, servers upon which application developers will be able to develop their own Internet-enabled applications. Examples of these application servers are Oracle9iAS and BEA WebLogic. The second category of application servers will come from the application vendors themselves. While these systems will be extensible, their primary focus will be to act as an application server for that vendor's particular set of functionality.

5.4.1 THE INTERNET-ENABLED APPLICATION: A NEW AGE FOR CLIENT/SERVER

In reviewing these architectures, we need to consider an important point. We often hear discussions of client/server versus Internet-enabled or Web-enabled applications as if the two were distinct solutions from which the data warehouse architect has to make a selection. A Web-enabled application, however, is not something different; it is merely the next phase in the evolution of client/server.

We defined client/server as an architecture in which there are two sets of processes. The first is the server, which is reactive. It runs virtually forever and responds to requests sent to it from the client. The client, the second process type in the architecture, is just the opposite. It is proactive and runs for the duration of the user session. Apply this definition to the Web-enabled application shown in Figure 5.5. We can easily see that the Web-enabled application fits this model. The client browser proactively makes requests of a middle tier, a Web server, that processes the requests and passes them on to some backend server.

As we compare Figure 5.9 to Figures 5.1 and 5.3, we see that Web-enabled applications are simply a specific implementation, or evolution, of client/server applications. Web-based applications are further down the evolutionary chain than client/server. There couldn't have been a Web without the development of client/server concepts. The basic principles of multiple processes communicating with one another to share the workload were first developed in the client/server world.

There is, however, a distinction between a Web-enabled or Internet-enabled application and traditional client/server architecture. Traditional client/server systems have a large server in the background that is front-ended by a smart or fat client. The fat client carries out most of the processing locally. The Web-enabled application has a thin client front-ending a middle tier that is composed of a Web server, an application server, or both. A thin client is a client that has a limited scope; it merely acts as a front end to the background system. The background servers provide the functionality.

5.4.2 THE BENEFITS OF WEB-ENABLED APPLICATIONS

The same technological, financial, and political forces that brought about the shift from mainframe systems to client/server are causing a shift from client/server to Web-enabled applications. These benefits become even more profound in the context of IEBI. As we expand the scope of our IEBI system, the value of the system quickly increases. The Internet allows this scope to extend throughout the entire organization and beyond. No longer bound by the four walls of our organization, the IEBI system can use the Internet to integrate partner and supplier information systems.

To parallel our discussion of client/server benefits, we begin with the technological benefits. In the client/server era, we had a diverse set of clients. Since a major part of the application ran on the client, software developers would develop versions for every major operating system, or at least every profitable one. In addition to the expense of developing multiple versions of the same software, client/server vendors were also challenged with varying degrees of system capabilities. Where one operating system provided capabilities that made certain functionality simple, other systems were more restrictive. These more restrictive systems either limited the functionality of the client or demanded more of the software developers.

The benefit of a Web-enabled application is a pervasive client. There is one client to which all applications are written: the browser. Figure 5.11 demonstrates this environment. The browser acts as an envelope in which the client operates. As clients are developed, they are written against a standard browser. The browser abstracts the network and operating system functionality. When the client requires these services, such as receiving input from the user or communicating with the server, requests are made via the browser. The browser in turn makes whatever requests are necessary of the operating system. The benefit is that the browser developer, not the application developer, deals with the differences between operating systems. While not all browsers are created equal, they are close. In a client/server environment, we have a specific client; to distribute our application to a user outside the organization, someone would have to physically install that client on the user's system. Since we are writing to a browser, we can assume that everyone already has the client, and we can easily distribute the application across our value chain.

Figure 5.11. Client-browser-server relationship.


This gives rise to another technological benefit, the portal. Consider what the portal does. It takes the interfaces to separate applications and combines them into one Web page. From one Web page, a user can access email, reports, external Web sites, and internal applications. Most important, the user is not aware of moving between applications, but is simply linking to another page on the Web site. This is extraordinarily important to the designer of IEBI systems. Consider the average decision maker. These are management types, not technicians. The IEBI system must be as simple as possible, not because of any lack of intelligence on the part of the decision maker, but because of a lack of interest. The decision maker wants to get to the point. What's the bottom line? Anything more than a few steps will cause him or her to delegate the task to a subordinate, thus diminishing the value of the IEBI system.

These technological benefits lead directly to financial benefits. First, we see that a pervasive client has significant cost savings. The application is written to one environment. This means one set of code to write and maintain. The implication of this, of course, is one programming staff, which is perhaps the most expensive software development resource.

The second financial benefit is a higher return on investment. Earlier, we stated that an information portal provides a simple interface from which the decision maker can easily gather information from a number of sources. We also noted that without such an interface, there is a high probability that the decision maker would gladly delegate the gathering of this information to a subordinate. We need to reemphasize a point made earlier: It is important to the value of the IEBI system that the primary user be the decision maker. Why? One of the shortcomings of BI in the mainframe era was that the decision maker had to go through an intermediary to gather data. In the Internet age, why would we settle for reintroducing an intermediary between the decision maker and the system? In effect, we would be returning to the days of mainframes, when the consumer of the data and the person who generated it were two separate people. The establishment of a portal gives the decision maker easy access to BI. This ease of access means that the decision maker is the primary user, thus increasing the overall system value. This portal into the business environment creates a single consolidated view of the business. This leads us to the final and perhaps the most important force leading to the development of IEBI: the political force.

We can see how the Internet has satisfied the desires of the major political factions within an organization by returning to our earlier discussion of information governance. We noted that there are three types: dictatorship, anarchy, and democracy. Information dictatorship forced the decision maker to go to a third party for information. We just discussed how such an environment reduces the value of IEBI. In the age of information anarchy, while each department may have been empowered with its own systems, there were significant barriers to establishing an enterprise-wide view of the organization. The decision maker could not get a consolidated view of the entire organization. IEBI moves us from anarchy to democracy. It provides us with the ultimate solution for the decision maker.

IEBI threads the eye of the proverbial needle. On the one hand, we have centralized control of information. Rather than a plethora of independent data marts, we have a central data warehouse. Should the need arise for a separate data mart, we establish a dependent data mart. At the same time, the Internet provides us with universal access to this central repository. Universal access has the connotation of access to output, but that is only part of the story. Universal access also means the ability to provide input from any system within the organization.

How does this create a political force that moves organizations toward Internet enablement? Let's begin by understanding that individual department managers didn't care. Voltaire noted how little one individual is truly concerned with another; it is only when an individual's needs are threatened that one develops a true concern. Basically, what's in it for me? While this is a very pessimistic view of human nature, one cannot help but see a good deal of truth in this sentiment. Department managers weren't concerned. They had their own little information fiefdoms that were meeting their needs. Centralization was the last thing on their minds. If the individual department manager wasn't concerned, then who was? Where did the political force originate? The political force behind centralization of information came from the people who were most affected: the C-level executives, the center of the corporate structure.

While a fragmented information infrastructure may have served the needs of the individual departments, it in no way met the needs of central management. There was no enterprise-wide view. How many C-level executives are challenged with the question, Who are my top ten customers? Only a centralized information infrastructure can answer this question with any degree of certainty. Second, each department built its own independent mini-IT department. These redundant departments generated greater cost to the organization. In short, these organizations had higher cost and lower effectiveness. The C-level executive, desiring an enterprise-wide view of the organization, drove the organization to develop a centralized information infrastructure.

