Intelligent Networking


"The question is not about the data, but about applying context to the data," states Cheng Wu, founder and chairman of Acopia Networks (and formerly founder of ArrowPoint Communications). The world of data networking is going through some powerful changes that might re-label the data networking industry as information networking. The old days of moving bits without knowledge of what they are or their purpose for existence seems to be coming to a close. Wu's new venture is squarely focused on more intelligently managing data motion and data location. (Storage might be too old of a word now, because storing implies keeping something in one place for a long period of time.)

Wu believes that there is a fundamental gap between applications, which are increasingly seen as services to their users, and the computing infrastructure that supports them, and that this gap has to close before new efficiencies can be realized. He believes the computing industry has been constrained by what is now a rigid architecture model that dictates how data is stored and moved. As a result, the industry can achieve only a yearly 10 percent improvement in some aspect of the computing infrastructure (faster networking switches, faster server CPUs, faster disks, and so on). In his view, step-function improvements will come from a completely new attitude about what the "data network" really is.

"The core intelligence of the data moves away from computers and into the network. Endpoint intelligence goes into the network. Today, however, the network is assumed to only be a transport. We need to first change our own view of what a network can do for us and then match that with proper machinery," states Wu.

In Chapter 11, "Computer Storage Impacted by Inescapable Data," we looked at how the concept of a super file would allow storage systems to be far smarter than today's about how they manage, secure, and protect data. The same notion applies to networking. The more information we can gather around data (information that describes data, otherwise known as "metadata": data about data, if you will), the better the chance that a data package will move efficiently and purposefully throughout the network. Data today flies through a great many infrastructure components: first within a computer system itself, then throughout the network, and finally landing on a storage device. If each of those components had some understanding of what is inside the data package traversing the computing infrastructure, from source to destination, and of the intended use of the data, the entire infrastructure would function far more efficiently.

Every network switch, storage element, router, and computer along the way could pick apart a higher-order message contained in metadata and add value to the infrastructure that ultimately delivers computing services to application users: finding a faster route, caching the information for faster access later, protecting the information better by duplicating it, pre-gathering related information that might be useful soon, and so on. Today, a typical storage request looks like this: "Fetch block 10,221 from disk 78." A typical LAN transaction looks like this: "Here's 768 bytes of something that needs to land on port 21 on host 551." Such messages make it difficult for the vast amount of gear between the endpoints to do anything more useful with the data than send it along on its merry way. The newer view would be requests that perhaps look like, "Fetch employee record 556," or "Here is a picture of my hairy dog taken last Tuesday outside of Boston on a rainy day just before noon, and it's pretty important to me for my archives, but not as important as my work proposals, and I'm likely to access it a bunch of times this month but then almost never again. Now, go store it somewhere."
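
To make the contrast concrete, here is a minimal Python sketch; the class names, fields, and routing rules are hypothetical illustrations of the idea, not any vendor's actual interface. The point is simply that an element in the middle of the network can add value only when the request carries metadata it can reason about.

    # A sketch (hypothetical names) contrasting today's opaque storage request
    # with a metadata-rich request that intermediate gear could act on.
    from dataclasses import dataclass, field

    @dataclass
    class BlockRequest:
        # Today's style: nothing here tells a switch or cache what the data is for.
        disk_id: int
        block_number: int

    @dataclass
    class DescribedRequest:
        # The newer style: the payload travels with a description of itself.
        payload: bytes
        content_type: str         # e.g. "photo", "work proposal"
        importance: str           # e.g. "archive", "critical"
        expected_access: str      # e.g. "frequent this month, rare afterward"
        tags: dict = field(default_factory=dict)

    def route_hint(request) -> str:
        """What a network element in the middle could decide to do."""
        if isinstance(request, BlockRequest):
            return "forward unchanged"           # nothing to reason about
        if request.expected_access.startswith("frequent"):
            return "cache near the requester"    # pre-position for fast access
        if request.importance == "critical":
            return "replicate to a second site"  # protect by duplication
        return "store on the lowest-cost tier"

    print(route_hint(BlockRequest(disk_id=78, block_number=10_221)))
    print(route_hint(DescribedRequest(b"...", "photo", "archive",
                                      "frequent this month, rare afterward")))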

This new goal is not magic, and we have seen similar examples in history that at first seemed more like hope than science. For example, consider how memory management inside your computer works. When computers were first invented, a program and its data were held in the computer's memory in one flat, contiguous space. The machine language program itself specified exactly which memory locations were used to store instructions and data, and those locations had to be hard-coded into the program. At that time, computer memory was extremely expensive by today's standards. There was a real limit on how much memory a manufacturer could build into a system and still make it affordable for even the largest corporate buyers. As programs got larger and more sophisticated, the need for more memory soon outstripped a manufacturer's ability to affordably attach it to the system. A need arose to make some other form of memory or storage appear to the system's processor as real memory. The concept of "virtual memory" was born.

Virtual memory allows a processor to believe that a program and its data are still loaded into a particular place in real memory. Behind the scenes, however, the computing system's machinery transparently moves data back and forth between real memory and some other storage location such as a disk, in essence fooling the processor into thinking it has more real memory than actually exists. Computing has progressed to the point that today the processor inside your laptop has no knowledge of where programs and data physically live. At any given time, the bits could live in RAM, in some intermediate memory cache, on disk, or even out on the Internet somewhere. An intervening layer of technology presents a virtual view of data and the pathway to it, and hides the underlying complexity; program and data bits are shuffled around at will behind the scenes.
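
The mechanism can be illustrated with a toy page-table lookup. This is only a Python sketch of the idea; real virtual memory is implemented by the processor's memory-management hardware and the operating system, and the structures below are simplified stand-ins.

    # Toy illustration: the caller reads a virtual address and never learns
    # whether the page was already in RAM or had to be fetched from disk.
    PAGE_SIZE = 4096

    # Simplified page table: virtual page -> ("ram", frame) or ("disk", slot)
    page_table = {0: ("ram", 3), 1: ("disk", 17), 2: ("ram", 9)}
    ram = {3: bytearray(PAGE_SIZE), 9: bytearray(PAGE_SIZE)}
    disk = {17: bytearray(PAGE_SIZE)}

    def read_byte(virtual_address: int) -> int:
        page, offset = divmod(virtual_address, PAGE_SIZE)
        location, index = page_table[page]
        if location == "disk":
            # "Page fault": transparently pull the page into RAM, then retry.
            frame = max(ram) + 1
            ram[frame] = disk.pop(index)
            page_table[page] = ("ram", frame)
            index = frame
        return ram[index][offset]

    # The program sees one flat address space; the shuffling stays invisible.
    print(read_byte(1 * PAGE_SIZE + 42))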

This is essentially the view that Wu has for the future of the general networking and data access world. Today, it may sound like future-ware to say, "Throw your Excel spreadsheet into the network, never specifying a destination for it, and never knowing where it actually physically lives at any instant." The network may fling it around, it may make copies of it, it may decompose it a bit and index parts of it, and it may move it far away from you without asking for your permission first. No matter. Don't worry. Let the network figure all that out. The network will respond to you instantly and get it back to you as soon as you need it. The approach is similar to the virtual memory concept at work inside your computer: you have no idea exactly where in the network your Excel spreadsheet is living, how much of it is actually in your laptop's memory or on its disk, or how often it cycles around to various other locations within the network. The value of this approach to Inescapable Data is enormous because it allows the network to handle the large volumes of data generated by Inescapable Data devices automatically and with far greater speed.

Intelligent networking is currently in its infancy. We are seeing early evidence in a new class of network switch that can decompose some amount of data on the fly and perform optimizations as a result. XML-encapsulated (that is, self-describing) data bundles allow devices such as the DataPower XML processor to add value to networks. If more and more data goes the encapsulation route, we stand a better chance of seeing significant information-management benefits.
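
As a sketch of what such a self-describing bundle might look like, here is a small Python example that parses an XML envelope and branches on its metadata without touching the payload. The schema, field names, and payload are invented for illustration; this is not the format used by DataPower or any other appliance.

    # Illustrative self-describing bundle; the schema is invented. An
    # XML-aware switch could branch on the metadata without ever decoding
    # the payload itself.
    import xml.etree.ElementTree as ET

    envelope = """
    <bundle>
      <metadata>
        <owner>jsmith</owner>
        <content-type>photo</content-type>
        <importance>archive</importance>
        <expected-access>frequent-this-month</expected-access>
      </metadata>
      <payload encoding="base64">iVBORw0KGgo=</payload>
    </bundle>
    """

    doc = ET.fromstring(envelope)
    importance = doc.findtext("metadata/importance")
    access = doc.findtext("metadata/expected-access")

    if access == "frequent-this-month":
        decision = "cache on a nearby node"
    elif importance == "archive":
        decision = "migrate to low-cost storage"
    else:
        decision = "forward to the default store"
    print(decision)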


