Computer storage, the ubiquitous component that has invaded both our personal and working space, has taken on a life of its own with the new technology of storage networking. Like it or not, we've become a global society with a currency of information that's exchanged within today's computer networks. Nearly every aspect of our world, including the places where we live and work, is accessible through the Internet, the data for which is stored on computer systems around the world.
Our lives revolve around computer technology. Although we don't always realize it, accessing information on a daily basis the way we do means there must be computers out there that store the data we need, making certain it's available when we need it, and ensuring the data's both accurate and up-to-date. Rapid changes within the computer networking industry have had a dynamic effect on our ability to retrieve information, and networking innovations have provided powerful tools that allow us to access data on a personal and global scale.
With so much data to store and with such global access to it, the collision between networking technology and storage innovations was inevitable. The gridlock of too much data demanded new ways of storing and accessing it.
Storing and accessing data starts with the requirements of a business application. What many application designers fail to recognize are the multiple dependent data access points within an application design where data storage strategies can be key. Recognizing the function of application design in today's component-driven world is a challenging task, requiring understanding, analysis, and experience. These functions are necessary to facilitate an effective data storage strategy.
Figure 1-1: Database choices for applications
In all fairness to the application designers and product developers, the choice of database is really very limited. Most designs just note the type of database or databases required, be it relational or non-relational. This decision in many cases is made from economic and existing infrastructure factors. For example, how many times does an application come online using a database purely because that's the existing database of choice for the enterprise? In other cases, applications may be implemented using file systems when they were actually designed to leverage the relational operations of an RDBMS.
The major factors influencing non-linear performance are twofold. First is the availability of sufficient online storage capacity for application data; second is the number of users accessing that data.
First, let's look at the availability of online storage. Certainly, if users are going to interact with an application, the information associated with the application needs to be accessible in real time. Online storage is the mechanism that allows this to happen. As seen in Figure 1-3, the amount of online storage required needs to account for sufficient space for existing user data, data the application requires, and unused space for expanding the user data with minimal disruption to the application's operation.
Figure 1-3: Online storage for applications
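The sizing rule behind Figure 1-3 can be sketched as a simple calculation. This is an illustrative model only; the function name and the 50 percent growth headroom are assumptions, not figures from the text:

```python
def required_online_storage(user_data_gb, app_data_gb, growth_factor=0.5):
    """Estimate online storage needed for an application: existing user
    data, plus data the application itself requires, plus unused space
    reserved so user data can grow without disrupting operation.
    growth_factor is an assumed fraction of user data kept as headroom."""
    headroom_gb = user_data_gb * growth_factor
    return user_data_gb + app_data_gb + headroom_gb

# e.g. 200 GB of user data, 50 GB of application data, 50% headroom
print(required_online_storage(200, 50))  # 350.0 (GB)
```

The point of the headroom term is the "minimal disruption" requirement above: expanding storage after it is full is far more disruptive than provisioning free space up front.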
However, as users of the application submit transactions, there must be locations where the application can temporarily store information prior to the complete execution of the transactions, successful or otherwise.
Second among the factors influencing non-linear performance is the number of users accessing the application or planning data access. As indicated in Figure 1-4, the number of user transactions accessing storage resources will determine how far the configuration can scale before performance degrades.
Figure 1-4: The optimum configuration of user transactions versus storage resources
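One way to see why this performance factor is non-linear is a basic queueing approximation. The M/M/1 formula below, and all the numbers in it, are an illustrative sketch of the effect, not a model taken from the text:

```python
def response_time(service_ms, arrival_rate, service_rate):
    """Approximate transaction response time with the M/M/1 queueing
    formula T = S / (1 - rho), where rho = arrival_rate / service_rate
    is the utilization of the storage resource. Response time climbs
    sharply, not linearly, as rho approaches 1."""
    rho = arrival_rate / service_rate
    assert rho < 1, "storage resource is saturated"
    return service_ms / (1 - rho)

# Assume 5 ms per I/O and a storage resource that handles 200 I/Os/sec.
# Doubling the transaction rate from 50 to 100 barely moves response
# time; pushing it from 150 to 190 multiplies it fivefold.
for users_per_sec in (50, 100, 150, 190):
    print(users_per_sec, round(response_time(5, users_per_sec, 200), 1))
```

This is the shape Figure 1-4 is getting at: there is an optimum region of user transactions versus storage resources, beyond which each added user costs far more than the last.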
Figure 1-5: The increasing diversity of application data
Our storage configurations today reflect the characteristics of the client/server model of distributed computing. Evolving out of the centralized mainframe era, the first elementary configurations of networked computers required a central computer for logging in to the network. This demanded that one computer within a network be designated as the server, and that the necessary login information be stored and managed there.
Figure 1-6: The client/server storage model
The client/server storage model provides data storage capabilities for the server component as well as storage for the network's clients.
Along with the demand for client network connectivity, sharing information on the server increased the need for capacity, requiring networks to use multiple servers. This was driven not so much by server capacity as by the need for online storage large enough to handle the growing amount of information. So, as demand for online information increased, so did the need for storage resources, which soon outgrew what a single server could provide.
This quickly grew into the world of server specialization. The network servers had to handle just the amount of work necessary to log clients into and out of the network, keep their profiles, and manage network resources. Client profiles and shared information were now being kept on their own file servers. The demand for online storage space and multiple access by clients required that multiple servers be deployed to handle the load. Finally, as the sophistication of the centralized mainframe computers was downsized to smaller systems, the capability to house larger and larger databases demanded the deployment of the database server.
The database server continues to be one of the main drivers pushing the client/server storage model into new and expanded storage solutions. Why? Initially, this was due to the sheer growth in the number and size of databases within the network.
As the relational database model became pervasive, the number of databases within the network grew exponentially. Many databases became derivatives of other databases and had to be replicated to other database servers within the enterprise. These activities multiplied both the storage capacity required and the data moving between servers.
The database server model also required that servers become more robust and powerful. Servers supporting databases evolved to include multiple CPUs, larger amounts of RAM, multiple levels of system cache, and several paths to their external storage resources.
Figure 1-7: An overview of database server configuration
As data grows, storage solutions within the client/server storage model continue to be stretched to their limits.
An example is the exponential growth factor for storage supporting the data warehouse and data mart application segments. Typical of this is a target marketing application that ensures customers with good credit are selected and matched with offerings for financial investments. These matches are then stored and used to produce mailing and tracking lists.
Figure 1-8: Data moving through a data warehouse application
The actual data traverses from online storage A on server A, where the customer activity and tracking database resides, to populate the data warehouse database on server DW's online storage B. There, the selection of good customers from server A is matched with the financial investment product offerings, also stored on server A. Subsequently, the aggregate data on server DW's online storage C is developed by combining the good-customer matches on storage B to form a mailing and tracking list.
As we have described in our example, data moves from server to server in the client/server storage model. The capability to store multiple types of data throughout the network provides the foundation for selecting, comparing, and combining data across servers.
As data grows, so does user access. The problem of access has to do with understanding the challenges of the client/server storage model, the biggest challenge being the Internet. The amount of data stored from the Internet has stressed even the most comprehensive systems and configurations.
As web servers began to grow due to the ease of deploying web pages, not to mention the capability to connect to an existing networking structure (the Internet, for instance), the number of users able to connect to a web server became limited only by addressing and linking facilities. Given client/server connectivity, servers on the Internet essentially had no limitations to access, other than their popularity and linking capability. Because of this, an even more demanding set of storage requirements emerged.
Figure 1-9: An overview of web server storage
If we examine the performance and growth factors for storage supporting web server applications, we find data moving in a pattern similar to the data warehouse example.
The actual data traverses from online storage on web server A, where the customer authentication and access takes place, then moves to online storage on WebUser servers where profiles, activity tracking, and home pages are stored. Essentially, the user waits here after logging in, until they issue a request. If the request is to check e-mail, the WebUser transfers the request to the WebMail server, where the request is placed in an index and transferred to the location of the user's mail. In this example, we're using WebMail TX.
However, if the user issues a request to access another URL in order to browse an outside web page, the WebUser server will send the request to web server A or B. Either of these servers will transmit the request through the Internet, locate the outside server, bring back the web page or pages, and notify the WebUser server, which in turn delivers the pages to the user.
Although Figure 1-9 is a simple example of an ISP's web server configuration, the process becomes convoluted upon much interaction with the ISP's storage infrastructure. We can estimate the scope of this infrastructure by assuming a customer base of one million. If each customer stores 8MB of data within their profiles, e-mail, and activity tracking, this becomes 8 terabytes, or 8 million MBs, of necessary online data storage. In estimating the scope of interaction (or data access), each customer issues, on average, 20 requests for data, with most going outside the ISP infrastructure. Each request requiring, on average, 500KB of data to be transferred therefore becomes 10MB of data transferred per customer. Multiplying this by one million customers equals 10 terabytes of data transferred through the ISP storage infrastructure in that period.
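The back-of-the-envelope arithmetic above can be checked in a few lines. Decimal units are assumed throughout, matching the text's "8 million MBs" equivalence for 8 terabytes:

```python
# Decimal storage units, in megabytes
MB = 1
TB = 1_000_000 * MB

customers = 1_000_000
per_customer_mb = 8 * MB    # profiles, e-mail, activity tracking
requests = 20               # average requests per customer in the period
per_request_kb = 500        # average data transferred per request

# Total online storage: 1M customers x 8 MB each
stored_mb = customers * per_customer_mb

# Transfer per customer: 20 requests x 500 KB = 10 MB, then x 1M customers
transferred_mb = customers * requests * per_request_kb / 1000

print(stored_mb / TB)       # 8.0  -> 8 terabytes of online storage
print(transferred_mb / TB)  # 10.0 -> 10 terabytes transferred
```

The exercise shows why the transfer load, not just the stored capacity, dominates the ISP's storage infrastructure: the data moved in a single period exceeds everything the ISP has stored.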
Maintaining these environments, like the database server environments, with their database systems, multiple servers, and exponentially expanding storage infrastructures, requires an ever-increasing number of skilled technicians and programmers.
We are left asking how much further this model can scale.
In addition, for every byte stored on disk there is the possibility that someone will access it. More and more users are coming online through Internet-based capabilities. Increased functionality and access to a diversity of data has created an informational workplace. Our own individual demands to store more and more data are pushing the limits of computer networks. It will be an increasing challenge to accommodate a global online community with client/server storage configurations.
Data flow challenges have historically been the boot disk of innovation. The data storage/data access problems have prompted the industry to bootstrap itself with creative solutions to respond to the limitations of the client/server storage model. In 1995, what began as an effort to decouple the storage component from the computing and connectivity elements within a computer system became a specialized architecture that supported a significantly denser storage element (in size, for example) with faster connectivity to the outside world. Combining elements from the storage world with innovations from networking technologies provided the first storage networks.
As with most out-of-the-box innovations, storage networking combines aspects of previous ideas, technologies, and solutions and applies them in a totally different manner. In the case of storage networking, it changes the traditional storage paradigm by moving storage from a direct connection on the server to a network connection. This design places storage directly on the network. This dynamic change to the I/O capabilities of servers, by decoupling the storage connectivity, provides a basis for dealing with the non-linear performance factors of applications. It also sets the foundation for highly scalable storage infrastructures that can handle larger data access demands.
However, this architectural change comes with costs. It requires rethinking how applications are deployed and how the existing client/server storage model should evolve.