13.4 A Major Role for Software

Today people may think of their personal computers primarily in terms of the capabilities of the installed software. Yet until recently, the tangible computer hardware received not only most of the credit for achievements in the computer industry, but also the blame for some deficiencies that could justifiably have been directed toward the less tangible software. The vital compiler technology that produces software remains obscure to all but those familiar with the process of developing applications and system software.

Recognition of the essential role of software in making computing hardware truly useful can be traced back at least to the nineteenth century, when Augusta Ada Lovelace (1815–1852) famously wrote programs for the Analytical Engine designed by Charles Babbage (1791–1871). Although Babbage and others may have done more actual programming for his machine, her name was given to the modern programming language Ada; she was also the daughter of the poet Lord Byron.

Grace Murray Hopper (1906–1992) of the US Navy was an advisor to the team that developed the programming language COBOL, which became responsible for the rapid adoption of mainframe business computing in large corporations. At the end of the twentieth century, huge "legacy" programs written in COBOL were linked in the general culture with the notorious Y2K problem associated with shortsighted conventions for representing calendar dates. Of course, the Y2K problem was neither unique to, nor the fault of, COBOL programs or compilers.

Turning away from this sketch of the place of computer software in history, we now describe some of the challenges in the role that compilers play when new computer architectures and implementations are produced.

13.4.1 New Architectures

Throughout this book, we have seen that an instruction set defines a computer architecture. People can apply a new computer to useful work only by giving it programs in its own machine language. Compilers and the full range of software development tools mediate between human thought and machine action.

Obviously, the current generation of computers is extensively used in the theoretical exploration, modeling, simulation, design layout, and development of any new architecture. In particular, some form of cross-compiler or cross-assembler is needed to bootstrap software onto the new machine. Eranian and Mosberger have written about the porting of Linux to Itanium systems, before any native hardware was available, using tools such as the Ski simulator (Appendix B.2.3) developed by Hewlett-Packard to run on IA-32 Linux systems.
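
As a small illustration of that bootstrapping step, even a trivial test program must be compiled on an existing host and executed on a simulator before any native hardware exists. The sketch below assumes a hypothetical cross-compiler name, and the Ski invocation is likewise only indicative (see Appendix B.2.3):

    /* hello.c -- a smoke test for a new cross toolchain.
     * Built on an IA-32 Linux host, run on a simulated IA-64 system:
     *
     *   $ ia64-linux-gcc -o hello hello.c   (cross-compiler name assumed)
     *   $ ski ./hello                       (HP's Ski simulator)
     */
    #include <stdio.h>

    int main(void)
    {
        printf("hello from the new architecture\n");
        return 0;
    }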

In order for system software and third-party application software to exist for the market launch of a new architecture, compilers and other tools must evolve concurrently with the design of the architecture itself. In one sense, this is a well-rehearsed endeavor that the industry has encountered with previous architectures. In another sense, the differences and advances of each new architecture make this a fresh challenge that engages some of the best engineering talent.

When a new architecture also represents an emergent type of architecture, as with the EPIC basis of Itanium architecture, the challenge rises. Great risk attends the possibility that previous compiler and software strategies simply will not do justice to the new architecture. Care must therefore be exercised not to let lackluster performance of prematurely released software gain enough public exposure to taint the new contender unfairly. At the same time, this process of bootstrapping the software environment, like most high-stakes intellectual activity, benefits from some open sharing of ideas and discussion of obstacles among colleagues with different backgrounds and perspectives to contribute toward finding solutions.

13.4.2 New Implementations

The risks and challenges differ in degree, though not essentially in kind, when significant new implementations of an established architecture are produced. The architectural contract only assures that software should run on the new implementation, not that it will run optimally. Given the stakes for making the new processor implementation a success, considerable effort must again go into the refinement of compilers or other tools. This concern became more important for RISC than CISC architectures, and it is critical for the EPIC-based Itanium implementations.

Instruction-level parallelism (ILP) is easier to attain in loops, especially with software pipelining, when the instruction stream can remain cached and data prefetching regularities may be possible. Indeed, Li emphasizes that Itanium compilers have not improved ILP very much in code sequences outside of loops, on account of misses in the caches or other processor-maintained lookup tables. Thus better cache management and optimal use of control and data speculation remain near the top of Li's list of EPIC compiler challenges.
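
The contrast can be seen directly in source code. In the following sketch (ours, not Li's), the first loop has independent iterations and predictable addresses, so software pipelining and prefetching apply; the second chases pointers, so each load depends on the one before it and ILP collapses to the cache-miss latency of that serial chain.

    /* Regular loop: iterations are independent and addresses are
     * predictable, so the compiler can software-pipeline and prefetch. */
    void saxpy(float *y, const float *x, float a, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    struct node { struct node *next; int value; };

    /* Pointer-chasing loop: each iteration's load depends on the
     * previous one, leaving little ILP for the compiler to expose. */
    int sum_list(const struct node *p)
    {
        int total = 0;
        while (p != NULL) {
            total += p->value;
            p = p->next;        /* serial dependence chain */
        }
        return total;
    }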

Note that compiler writers must reassess the factors identified by Li and others for each new processor implementation, since the quantitative aspects of cache and other processor features change latency relationships and shift the balance points for various heuristics used by compilers. Similarly, software developers should verify that new compilers or new releases of previous compilers will produce versions of their applications that work optimally on all processor generations still relevant to the market for the software application.

Over time, compilers may acquire or retain the ability to "tune" a program for different processor implementations. For example, the HP-UX compilers recognize the +DS option to schedule machine instructions to run best on the original Itanium processor (+DSitanium) or the Itanium 2 processor (+DSitanium2 and/or +DSmckinley), or to run indifferently well on any implementation (+DSblended). The bias of the latter blended option may favor more recent implementations. Thus a later release of a compiler may not produce exactly the same tuning with +DSblended as an earlier release.
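
For example, a build on HP-UX might select the tuning target on the compiler command line (the file names here are illustrative):

    $ cc +DSitanium2 -o app app.c    # schedule for Itanium 2
    $ cc +DSblended  -o app app.c    # run acceptably on any implementation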

13.4.3 New Instructions or More Registers

An architecture that enjoys a long lifetime will very likely experience at least one extension through the addition of new instructions, or sometimes through the addition of new hardware such as more registers (e.g., additional register windows for SPARC).

Either critiques of the architecture or research demonstrating performance advantages can stimulate the consideration of extensions. Both factors influenced the introduction of load and store instructions for byte- and word-length data into the Alpha architecture, which could originally load and store only 32- or 64-bit quantities. Software and firmware changes then permitted older Alpha systems to recognize the new opcodes as exceptions and complete the requested load and store operations through emulation, though not as efficiently as the inline instruction sequences formerly recommended.
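
A sketch in C suggests why the recommended inline sequences, and the later trap-and-emulate path, were costly: storing a single byte on a machine with only aligned 64-bit loads and stores requires a read-modify-write of the containing quadword. (This illustrates the technique; it is not the actual Alpha firmware.)

    #include <stdint.h>

    /* Store one byte when only aligned 64-bit accesses exist
     * (little-endian byte order, as on Alpha). */
    void store_byte(uint8_t *p, uint8_t value)
    {
        uintptr_t addr  = (uintptr_t)p;
        uint64_t *qword = (uint64_t *)(addr & ~(uintptr_t)7); /* containing quadword */
        unsigned  shift = (unsigned)(addr & 7) * 8;           /* byte position */
        uint64_t  mask  = (uint64_t)0xFF << shift;

        *qword = (*qword & ~mask) | ((uint64_t)value << shift);
    }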

Other extensions, such as the multimedia or motion video extensions in most current architectures, process data rapidly as vectors and encode and/or decode compressed video formats using the main processor instead of requiring add-in cards in systems supporting graphics-intensive applications. Hewlett-Packard introduced MAX-2 instructions into the PA-RISC architecture, while Intel Corporation introduced multimedia extensions (some 57 MMX™ instructions) into later implementations of the IA-32 architecture. Similarly, Sun Microsystems introduced VIS instructions into later implementations of the UltraSPARC™ processors, MIPS introduced MDMX™ extensions into its processors using the MIPS V™ architecture, and Motorola introduced AltiVec™ instructions into later implementations of the PowerPC processors. Some sets of such added instructions have been quite extensive.

It is not always appreciated that new instructions or additional registers cannot benefit application software at all unless the application is recompiled with a compiler that is aware of those new features. We mentioned a demonstration, in magnetic resonance imaging (Section 12.4), of overcoming through inline assembly (see Appendix F) the inability of unenhanced compilers to provide access to multimedia instructions.
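
Once a compiler is aware of such extensions, inline assembly is no longer required; vendor-supplied intrinsics expose the new instructions directly to C. The following sketch uses Intel's MMX intrinsics from <mmintrin.h>; the function itself is our illustration.

    #include <mmintrin.h>

    /* Add two byte arrays with unsigned saturation, eight bytes at a
     * time, using the MMX PADDUSB instruction via an intrinsic.
     * Assumes n is a multiple of 8 and the arrays are 8-byte aligned. */
    void add_saturated(unsigned char *dst, const unsigned char *src, int n)
    {
        for (int i = 0; i < n; i += 8) {
            __m64 a = *(const __m64 *)(src + i);
            __m64 b = *(__m64 *)(dst + i);
            *(__m64 *)(dst + i) = _mm_adds_pu8(a, b);
        }
        _mm_empty();   /* EMMS: release the FPU registers after MMX use */
    }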

The Internet Streaming SIMD Extensions (SSE) added to the IA-32 architecture not only increase the repertoire of machine instructions, but also introduce eight new 128-bit registers in processors conforming to the architectural extension. Although it is unusual to add programmer-visible registers to an established architecture, Thakkar and Huff comment that IA-32 was an "already register-starved" architecture.

Quite obviously, new instructions or registers do little good unless compilers actually use them, since otherwise their benefit would be restricted to a limited set of hand-tuned assembly language routines. Thus, over time, some kinds of architectural extensions should probably also result in new capabilities being added to the major high-level languages themselves, in some architecture-independent manner.
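
One such architecture-independent capability already exists in C99: the restrict qualifier promises the compiler that arrays do not overlap, which is often exactly the information needed to map an ordinary loop onto whatever vector extension the target provides. The function below is an illustrative sketch.

    /* A portable loop that an extension-aware compiler can map onto
     * MMX/SSE, VIS, MAX-2, or AltiVec automatically; 'restrict' (C99)
     * guarantees the arrays do not alias, making that safe. */
    void scale(float * restrict out, const float * restrict in,
               float k, int n)
    {
        for (int i = 0; i < n; i++)
            out[i] = k * in[i];
    }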


