Super Computers


Silicon Graphics (SGI), Cray, IBM, and Sun all make super-computer-class systems. Other players have come and gone, and despite their importance to the scientific community, the market for super computers has not grown substantially over the past few years. The machines themselves, however, have truly grown in power and capability, and the value of their contribution to society is rarely acknowledged.

The federated or grid computing models that we examined in Chapter 6, "Connecting Medicine," now threaten to take applications away from the current super-computing platforms because a great many problems are now solvable by these alternative architectures. Problems that can be cleanly broken up into smaller pieces, distributed outward to many small computing elements, and re-assembled later are now migrating over to the alternative models.

Even so, some of the hardest problems in the world can only be solved by super computers. In addition, studying the application, architecture, and deployment of super-computer solutions is useful in the context of Inescapable Data because, over time, many large-systems architectural approaches move downstream and add tremendous value to the general population of smaller computing platforms. Super computers are also critical to unlocking the value contained in the biggest data sets known in the shortest amount of time (albeit at significant cost), thus solving some of the most perplexing problems in areas such as meteorology, seismology, and medical research, all endeavors that will benefit from the future implementation of Inescapable Data gathering devices on Earth. Changes in present-day super-computing architectures are forthcoming that hold the hope of step-function increases in capability without the requisite increases in physical size and cost.

Dr. Goh elaborates:

Two years ago, we sat down and decided to go with a clean sheet of paper when designing our new platform for production in 2007. One thing we observed, as HPC (high-performance computing) systems evolved, was that their peak performances were running away from sustainable application performances; that is, applications could only realize a small fraction of the claimed performance of the system. That "gap" has been growing and growing over the years.

The gap exists because it is difficult to optimize applications for today's multiprocessor super-computing architectures, so the machine cannot apply its massive compute resources efficiently. Imagine being a waiter or waitress who needs to set 10 different tables for the dinner rush. You can set one table at a time, or you can carry all the knives on one loop through the tables, then all the forks, and so on. The former is analogous to how most programmers tend to create software applications, because it is simpler to debug and understand. The latter is closer to the way super-computer chips would rather see things organized. Chips perform at their best when they are running handcrafted algorithms tailored to the needs of those CPUs; an application that has not been laboriously "trained" cannot extract maximum machine efficiency.
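To make the analogy concrete, here is a minimal C sketch (not from the book; the place-setting structure and function names are invented purely for illustration). It shows the same work written the way most programmers naturally write it, and again reorganized so that each pass performs one uniform operation over contiguous memory, the shape that vector processors and their memory systems prefer.

```c
/* A minimal sketch (not from the book) contrasting the two "table setting"
 * strategies.  The 3-field "place setting" and the names are invented. */
#include <stddef.h>

#define N 100000

/* "One table at a time": natural to write and debug, but each iteration
 * mixes unrelated work, which makes it harder for a vector CPU to stream
 * one kind of operation through its pipelines. */
struct setting { double knife, fork, spoon; };

void set_tables_one_at_a_time(struct setting *t)
{
    for (size_t i = 0; i < N; i++) {
        t[i].knife = 1.0;
        t[i].fork  = 2.0;
        t[i].spoon = 3.0;
    }
}

/* "All the knives on one loop, then the forks": the same work reorganized
 * so that each pass performs a single uniform operation over contiguous
 * memory -- the organization the chip would rather see. */
void set_tables_by_item(double *knives, double *forks, double *spoons)
{
    for (size_t i = 0; i < N; i++) knives[i] = 1.0;
    for (size_t i = 0; i < N; i++) forks[i]  = 2.0;
    for (size_t i = 0; i < N; i++) spoons[i] = 3.0;
}
```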

How big is the efficiency gap? Worst case, a user might realize only 10 percent or less of the super computer's rated performance for some applications. Given that these systems can cost many millions of dollars, purchasers find this inefficiency hard to swallow, and many potentially solvable problems go unsolved because the cost of a solution is simply too great. Another way to view the costliness of this gap is to consider the power consumed by the largest super computers in the world, which is on the order of a megawatt. If we are only getting 10 percent useful work out of them, we are effectively wasting 900,000 watts of power (and more, if we also consider the air-conditioning power required to remove this wasted heat).

Nevertheless, super-computer customers continue to buy increasingly powerful systems, throwing more and more hardware at a problem in pursuit of acceptable performance. Cost aside, this poses other significant problems. For example, cooling these massive systems is problematic. Manufacturers have gone back to air-cooling processors with fans rather than the more cumbersome liquid-cooling methods. A 10,000-processor system generates a gigantic amount of heat that has to be removed rapidly. The surface of the near-microscopic working part of a computer chip reaches the same temperature as a nuclear reactor core. Consequently, air flow under the floors of these new computers has been known to approach 60 miles per hour. Needless to say, having fewer processors that each do more useful work has to be one new direction manufacturers take.

As discussed in Chapter 4, "From Warfare to Government, Connectivity Is Vitality," super-computer-sized problems are mathematically intensive. Historically, nearly all of these problems have been "floating-point" intensive as well. Dr. Goh and his team set out to research to what extent this is still true of the problems being solved by SGI's current customer base. As noted, any application suitable to a federated computing model has appropriately moved onward. The remaining applications do, in fact, share a different but common set of characteristics.

One of the hallmarks of a super-computer application is the requirement for the machine to support relatively enormous amounts of random access memory (RAM), measured on a scale at which most computers measure disk space. A super computer could have a terabyte of RAM and thousands of processors accessing that RAM simultaneously. RAM access patterns for super-computer applications also differ from those of conventional applications.

Consequently, the design of super-computer RAM differs fundamentally from that of more conventional computers. Think of super-computer RAM as a grid of cells organized much like a chessboard. Moving data into and out of those cells is exceedingly expensive by conventional computing standards. Super computers are specially designed to make such operations faster (though they remain challenging), and much of the performance increase comes from clever software that knows how to remap and move blocks of data in the most conducive manner.
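As a minimal illustration of what "moving blocks of data in the most conducive manner" can mean in practice, the following C sketch (not from the book; the matrix and block sizes are arbitrary choices) shows a matrix transpose written naively and again in a blocked form, so that data moves one small tile at a time and stays in fast, nearby memory while it is being copied.

```c
/* A minimal sketch (not from the book) of block-wise data remapping:
 * a tiled matrix transpose.  N and B are illustrative values only. */
#include <stddef.h>

#define N 4096   /* matrix is N x N doubles                  */
#define B 64     /* move data one B x B "block" at a time    */

/* Naive transpose: one of the two arrays is walked with a large stride,
 * so nearly every access lands in a different, distant memory "cell". */
void transpose_naive(double *dst, const double *src)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            dst[j * N + i] = src[i * N + j];
}

/* Blocked transpose: the same result, but the copy proceeds one small
 * block at a time, so each block stays in fast memory while it is moved. */
void transpose_blocked(double *dst, const double *src)
{
    for (size_t ib = 0; ib < N; ib += B)
        for (size_t jb = 0; jb < N; jb += B)
            for (size_t i = ib; i < ib + B; i++)
                for (size_t j = jb; j < jb + B; j++)
                    dst[j * N + i] = src[i * N + j];
}
```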

Embedded Density

Case in point: Mercury Computer Systems (the company introduced in Chapter 4) is one of a number of companies known for "handcrafted" data movement and signal-processing routines, software that is written in basic machine instruction code and uses nearly all the horsepower of the computing machinery it runs on. Mercury's computing systems are essentially mini super computers embedded into equipment designed to withstand the rigors of harsh (and physically small) military-application environments.


Handcrafting is nearly a lost art in the computer world. The general-purpose computer industry is driven to mass-produce hardware that runs applications written in higher-level programming languages and tools that can produce more "utility" with less development and debugging cost. (However, it does so at the expense of wasted CPU cycles.) Military applications have extreme constraints on heat and physical size, so the effort has to focus largely on developing algorithms that get the most efficiency out of embedded, special-purpose computers. Commercial computer manufacturers, on the other hand, can be more lax about processing efficiency as well as size and heat generation.

Super-computer applications not only use enormous amounts of memory, they spend a fair amount of time simply moving data around, a thoroughly nonmathematical operation. Dr. Goh's research into his customers showed that the actual dependency on floating-point operations, although still critical, was not the 95-plus percent everyone assumed, and that the dependency varied greatly from one application to the next. Some applications required very few floating-point operations but still needed the super-computer memory architecture and other speed benefits. Others were closer to the traditional dependency, but none as high as 90 percent. Memory movement aside, today's typical high-end application now requires a sizeable amount of integer processing alongside the floating-point operations and other work. The answer, Dr. Goh believes, is tightly integrated hybrid processing architectures.
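A small C sketch (an illustration of this workload mix, not an example drawn from SGI's or Dr. Goh's data) shows why integer processing turns up alongside floating point: in a sparse matrix-vector multiply, every floating-point multiply-add is paired with integer index arithmetic and a data-dependent memory access.

```c
/* A minimal sketch of mixed integer and floating-point work: sparse
 * matrix-vector multiply in compressed-row (CSR) form.  Illustrative only. */
#include <stddef.h>

void spmv_csr(size_t nrows,
              const size_t *row_start,  /* nrows + 1 entries             */
              const size_t *col_index,  /* column of each stored value   */
              const double *values,     /* the stored nonzero values     */
              const double *x,          /* input vector                  */
              double       *y)          /* output vector                 */
{
    for (size_t row = 0; row < nrows; row++) {
        double sum = 0.0;
        /* integer work: walking the index arrays that describe the matrix */
        for (size_t k = row_start[row]; k < row_start[row + 1]; k++)
            /* floating-point work: the actual multiply-add, fed by an
             * indirect (data-dependent) load of x[col_index[k]]           */
            sum += values[k] * x[col_index[k]];
        y[row] = sum;
    }
}
```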

Mercury Computer Systems has held the same belief for some time, necessitated by the military computing customers it largely serves. Nearly all Mercury systems are sold with a variety of processors (scalar and vector) tightly knitted together into a monolithic memory model with super-fast interconnections between the internal elements. Mercury's systems are typically optimized for placement in very small spaces and harsh environments, and thus play in somewhat of a specialty market segment. SGI and other classic super-computer manufacturers are now heading in the same direction.

Many challenges will remain, however. The need for different types of processors (floating-point plus scalar) within the same architecture adds a great deal of application complexity that the machine's internal operating system software must shield from the programmer. The open-source community will likely be the source of standard Linux-based application programming interfaces (APIs) that allow higher-level application engineers to write applications against a "core" set of mathematical and data-movement services that nearly perfectly exploits the power of the hardware.
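Purely as a sketch of what such a "core" services layer might look like (the types and function names below are invented for illustration and do not correspond to any real SGI, Mercury, or Linux API), an application engineer might program against an interface like this and leave the routing of each call to the most suitable processing element up to the library:

```c
/* A hypothetical sketch of a "core" services interface; every name here is
 * invented for illustration and is not an actual API. */
#include <stddef.h>

typedef struct core_buffer core_buffer;   /* opaque handle to machine-managed memory */

/* Data-movement services: the library, not the application, decides how to
 * remap and stream blocks across the machine's memory system. */
core_buffer *core_alloc(size_t bytes);
int          core_copy_blocked(core_buffer *dst, const core_buffer *src, size_t bytes);

/* Mathematical services: the library routes each call to whichever processing
 * element (vector, scalar, or otherwise) suits it best. */
int core_fft_1d(core_buffer *inout, size_t n);
int core_dot_product(const core_buffer *a, const core_buffer *b, size_t n, double *result);
```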

If super-computer makers continued along the path of extracting only 10 percent of the rated performance of our highest-end computing solutions, we would never solve today's toughest computational problems. Instead, they have to step back, see the processing "Muda," and completely redesign. By so doing, they will hopefully be able to leapfrog various problem spaces and advance data analysis and consumption at a rate that would otherwise take a decade or more by conventional measure. SGI, Mercury, and other forward-looking companies are designing new systems that have an appropriate variety of computing elements. If done correctly, the efficiency gap that forces applications to utilize as little as 10 percent of a machine's power could be improved to 50 percent or higher, a monumental leap that should yield significant advances in the number and types of problems that super computers can solve. The new approaches and technologies developed will ultimately move downstream and possibly wind up in computers the size of today's workstations and maybe even laptops. (Note that today's laptops pack as much computing power as a small IBM mainframe of 20 years ago.) Imagine doing home weather prediction and personal protein-folding analysis as you watch Sunday-afternoon football. Such step-function advances will be critical to enabling localized processing of the massive amounts of data that will be available to us as citizens of the Inescapable Data world.


