7.4 Program Segmentation

Good modular programming practices promote the use of subroutines, procedures, and functions for several reasons:

  • Ease of development: Large programming projects can be divided into sections more commensurate with the capabilities of individual programmers.

  • Reusability: Well-documented and well-maintained routines can be reused for different environments and marketed over the course of many years.

  • Reliability: Monolithic programs are difficult to test and debug, while individual routines can be empirically tested or proven to give correct results.

  • Maintainability: Frequently repeated sequences of instructions are well-suited to compartmentalization. Revisions are made in one place, ensuring that the change occurs throughout the entire program.

  • Reducing memory requirements: If several blocks of instructions are replaced by a more general but carefully documented subroutine, the overall program size can be substantially reduced.

Of these motivations, the first four now take clear precedence over the fifth. The cost of memory has fallen steadily over time, while the other concerns remain in force because the cost of labor continues to rise.

Another consideration is the size of cache structures. Procedural code can remain in cache for reuse, while monolithic code with repeated sequences can cause reloading from much slower memory. The performance capabilities of cache structures promote the use of shared libraries and routines.

Program segmentation can take several forms, some involving source-level modularity and others involving execution-time modularity.

7.4.1 Source-Level Modularity

All software, particularly commercial products, must work with or around many differences in operating system environments and hardware implementations. If the product is to be successful, it must have a maintainable common code base that can be tailored to a target environment at compile time. Include files, preprocessor directives, and macros are common tools for this strategy.

Include files

The C version of the SQUARES program (Section 1.7.1) started with the line #include <stdio.h>, as did getput.c (Figure 6-6). The contents of the file named stdio.h are retrieved from a system library directory and merged into the source, replacing the line starting with #include. This mechanism simplifies programming efforts by providing access to commonly required routines compatible with the target system.
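
As a minimal illustration (the message text here is arbitrary), a program that depends on stdio.h might be as simple as:

    #include <stdio.h>                      /* merged into the source by the preprocessor */

    int main(void)
    {
        printf("printf is declared in stdio.h\n");
        return 0;
    }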

Preprocessor directives

Any line in a C program that starts with the # character is a directive to the C preprocessor. These directives are carried out before the source is sent to the compiler or assembler. Other examples of C preprocessor directives are #if and #define.
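
As a sketch of how #define and #if can tailor one code base to different targets at compile time (the macro name TARGET_LP64 is hypothetical, not part of any standard header):

    #define TARGET_LP64 1                   /* typically set by the build procedure */

    #if TARGET_LP64
    typedef long  word_t;                   /* 64-bit integer word on an LP64 system */
    #else
    typedef int   word_t;                   /* 32-bit fallback */
    #endif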

Macros

Macros are another form of text substitution used on a more local scale than #include directives. Macros have a long history in assembly language programming, and Appendix E outlines how they are implemented by the GNU programming environment.
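
Although Appendix E concerns assembler macros, the C preprocessor provides the same kind of local text substitution; a minimal sketch (the macro and variable names are purely illustrative):

    #include <stdio.h>

    #define SQUARE(x) ((x) * (x))           /* expanded textually at each point of use */

    int main(void)
    {
        int width = 7;
        printf("%d\n", SQUARE(width));      /* compiler sees ((width) * (width)) */
        return 0;
    }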

7.4.2 Traditional Subroutines

A subroutine may simply be an arbitrary chunk of code that has been separated from other code for some reason. There is no requirement that a subroutine be reused, though that is the normal application.

In one sense, subroutine is the broadest generic term, encompassing procedures and functions. More specifically, it often connotes a program segment that need not take explicit inputs and outputs because it shares the global data space.

Minimally, a subroutine requires some means of preserving the return address, i.e., the value the IP would have held had execution not jumped to the subroutine:

 Calling Routine               Called Routine (Subroutine)
 <instruction sequence>
 Jump to subroutine X          X:  Entry point
                               <instruction sequence>
                               Jump back from subroutine
 Resumption point

In the worst case, the two jumps could be GOTO instructions with hard-coded target addresses.

Many architectures use a memory stack to hold return addresses, which are pushed by a jump to subroutine instruction and popped back into the IP as part of a jump back from subroutine instruction. A few architectures implement a specialized register stack exclusively to hold return addresses. In any case, the nesting depth of subroutines must not overrun the region that holds return addresses or other preserved context information.
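
In C, the compiler generates these jump-to-subroutine and jump-back instructions automatically; in the sketch below (names are illustrative), two return addresses remain preserved while inner executes:

    #include <stdio.h>

    void inner(void)                        /* two return addresses are live at this depth */
    {
        printf("in inner\n");
    }                                       /* jump back: resume in outer */

    void outer(void)
    {
        inner();                            /* preserves a second return address */
    }                                       /* jump back: resume in main */

    int main(void)
    {
        outer();                            /* preserves the first return address */
        return 0;
    }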

7.4.3 Coroutines

Hierarchical programming structures, consisting of a main program and one or more subroutines, fully execute a subroutine with each jump and then resume execution of the main routine after each jump back. The relationship between the main program and the subroutine is thus asymmetric.

Suppose instead that two routines are designed to call each other in a symmetrical manner as coroutines. Whenever one leaves off, the other resumes:

 Coroutine A                   Coroutine B
 <instruction sequence>
 Jump to coroutine B at X      X:  Entry point
                               <instruction sequence>
                               Jump to coroutine A
 Resumption point
 <instruction sequence>
 Jump to coroutine B           Resumption point
                               <instruction sequence>
                               Jump to coroutine A
 Resumption point
 etc.                          etc.

Coroutines can call each other indefinitely. Note that there is not necessarily any stack activity, and that A and B share whatever information is contained in all the processor registers.

Coroutines might be designed in such a way as to avoid making multiple passes over a large data file, yet retain the conceptual advantage of compartmentalized program code for each type of specialized processing. A pair of coroutines might also be envisioned as the Black and White players in a program that simulates the game of chess.
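
C has no native coroutine mechanism, but the idea of resumption points can be simulated with saved state; in the sketch below (all names illustrative), each routine resumes where its previous invocation left off, with a small driver alternating between the two:

    #include <stdio.h>

    static int coroutine_a(void)
    {
        static int state = 0;               /* remembers the resumption point */
        switch (state) {
        case 0:  printf("A: first section\n");  state = 1; return 1;
        case 1:  printf("A: second section\n"); state = 2; return 1;
        default: printf("A: finished\n");                  return 0;
        }
    }

    static int coroutine_b(void)
    {
        static int state = 0;
        switch (state) {
        case 0:  printf("B: first section\n");  state = 1; return 1;
        case 1:  printf("B: second section\n"); state = 2; return 1;
        default: printf("B: finished\n");                  return 0;
        }
    }

    int main(void)
    {
        int a_live = 1, b_live = 1;
        while (a_live || b_live) {          /* A yields to B, B yields to A, ... */
            if (a_live) a_live = coroutine_a();
            if (b_live) b_live = coroutine_b();
        }
        return 0;
    }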

Modern operating systems usually implement cooperative threads. While requiring more support and involving more overhead than coroutines, threads can better take advantage of parallel architectures and multiprocessor computer systems. Mosberger and Eranian discuss thread support for Itanium-based Linux systems.
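
For comparison, a minimal POSIX-threads example, which assumes a pthreads-capable system and is not specific to Itanium, might look like this:

    #include <stdio.h>
    #include <pthread.h>                    /* POSIX threads; link with -pthread */

    static void *worker(void *arg)          /* each thread executes this routine */
    {
        printf("hello from thread %s\n", (const char *) arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, "one");
        pthread_create(&t2, NULL, worker, "two");
        pthread_join(t1, NULL);             /* wait for both threads to complete */
        pthread_join(t2, NULL);
        return 0;
    }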

7.4.4 Procedures and Functions

High-level languages usually support procedures and functions, often blurring the distinction between these two ways to "divide and conquer" an application. Both constructs need well-defined specifications for their purpose, input arguments, and effects on the overall program.

A procedure is merely some portion of code that has been isolated, optimized, and given a name to facilitate its repetitive use. That is, you might accomplish a big task in FORTRAN by arranging to CALL TOM, CALL DICK, and CALL HARRY.

A function tends to convey a more mathematical or algorithmic purpose, taking input data or boundary conditions and producing a result. In high-level languages, the name of the function becomes associated with the computed result, eliminating the need for temporary assignments in multipart calculations. For example, Z=A*SQR(X) is the same as Y=SQR(X); Z=A*Y.
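
In C the distinction might be sketched as follows, with a void function playing the role of a procedure and a value-returning function standing in for SQR (the names are illustrative; sqrt comes from the standard math library):

    #include <stdio.h>
    #include <math.h>                       /* sqrt(); link with -lm on many systems */

    void report(double value)               /* "procedure": named for what it does */
    {
        printf("result = %f\n", value);
    }

    double scaled_root(double a, double x)  /* "function": returns a computed result */
    {
        return a * sqrt(x);                 /* Z = A*SQR(X) with no temporary Y */
    }

    int main(void)
    {
        report(scaled_root(2.0, 9.0));      /* prints result = 6.000000 */
        return 0;
    }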

7.4.5 Shared Library Functions

Programming environments tend to provide extensive shared libraries that can be dynamically linked with new programs. Modern operating systems enable such functions to be loaded on demand, shared by multiple processes, and unloaded when no longer required.

Prior to virtual memory and shared libraries, a separate copy of a function had to be statically bound into each program that needed it. While this is still possible, sharing a single copy makes more efficient use of system resources.
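
On a Linux system, for example, the on-demand loading can be made explicit through the POSIX dlopen interface; the sketch below (library and symbol names chosen for illustration) loads the shared math library, looks up cos, and unloads the library afterward:

    #include <stdio.h>
    #include <dlfcn.h>                      /* dlopen, dlsym, dlclose; link with -ldl */

    int main(void)
    {
        void *handle = dlopen("libm.so.6", RTLD_LAZY);   /* load only when needed */
        if (handle == NULL) {
            fprintf(stderr, "dlopen failed: %s\n", dlerror());
            return 1;
        }

        double (*cosine)(double) = (double (*)(double)) dlsym(handle, "cos");
        if (cosine != NULL)
            printf("cos(0.0) = %f\n", cosine(0.0));

        dlclose(handle);                    /* unload when no longer required */
        return 0;
    }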


