Programming in C | A Practical Guide to UNIX for Mac OS X Users

A major reason that the Mac OS X system provides an excellent C programming environment is that C programs can easily access the services of the operating system. System calls, the routines that make operating system services available to programmers, can be called from C programs. These system calls provide such services as creating files, reading from and writing to files, collecting information about files, and sending signals to processes. When you write a C program, you can use system calls in the same way you use ordinary C program modules, or functions, that you have written. For more information refer to "System Calls" on page 506.

Several libraries of functions have been developed to support programming in C. The libraries are collections of related functions that you can use just as you use your own functions and the system calls. Many of the library functions access basic operating system services through the system calls, providing the services in ways that are more suited to typical programming tasks. Other library functions, such as the math library functions, serve special purposes. Some higher-level application libraries are written in Objective-C, an object-oriented language derived from C.

This chapter describes the processes of writing and compiling C programs. However, it will not teach you to program in C.

Checking Your Compiler

The C compiler in common use on Mac OS X is GNU gcc, which comes as part of the Developer Tools package (www.gnu.org/software/gcc/gcc.html). Give the following command to determine whether you have access to the gcc compiler:

$ gcc --version
bash: gcc: command not found

If you get a response other than version information, either the compiler is not installed or your PATH variable does not contain the necessary pathname (usually gcc is installed in /usr/bin). If you get version information from the gcc command, the GNU C compiler is installed.

Next make sure that the compiler is functioning. As a simple test, create a file named Makefile with the following lines. The line that starts with gcc must be indented by using a TAB, not SPACEs.

$ cat Makefile
morning: morning.c
TAB gcc -o morning morning.c

Now create a source file named morning.c with the following lines:

$ cat morning.c
#include <stdio.h>

int main(int argc, char** argv)
{
    printf("Good Morning\n");
    return 0;
}

Compile the file with the command make morning. When it compiles successfully, the resulting file will be executable; you can run the program by giving the command ./morning. When you get output from this program, you know that you have a working C compiler.

$ make morning
gcc -o morning morning.c
$ ./morning
Good Morning

A C Programming Example

You must use an editor, such as emacs or vim, to create or modify a C program. The name of the C program file must end in .c. Entering the source code for a program is similar to typing a memo or shell script. Although emacs and vim "know" that you are editing a C program, many editors do not know whether your file is a C program, a shell script, or an ordinary text document. You are responsible for making the contents of the file syntactically suitable for the C compiler to process.

Figure 12-1 illustrates the structure of a simple C program named tabs.c. The first two lines of the program are comments that describe what the program does. The string /* identifies the beginning of the comment, and the string */ identifies the end of the comment; the C compiler ignores all characters between them. Because a comment can span two or more lines, the */ at the end of the first line and the /* at the beginning of the second line are not necessary but are included for clarity. As the comment explains, the program reads standard input, converts TAB characters into the appropriate number of SPACEs, and writes the transformed input to standard output. Like many UNIX utilities, this program is a filter.

Figure 12-1. A simple C program: tabs.c (The line numbers are not part of the source code.)

The comments at the top of tabs.c are followed by preprocessor directives, which are instructions for the C preprocessor. During the initial phase of compilation the C preprocessor expands these directives, making the program ready for the later stages of the compilation process. Preprocessor directives begin with the pound sign (#) and may optionally be preceded by SPACE and TAB characters.

Symbolic constants

You can use the #define preprocessor directive to define symbolic constants and macros. Symbolic constants are names that you can use in a program in place of constant values. For example, tabs.c uses a #define preprocessor directive to associate the symbolic constant TABSIZE with the constant 8. TABSIZE is then used in the program in place of the constant 8 as the distance between TAB stops. By convention the names of symbolic constants consist of all uppercase letters.

By defining symbolic names for constant values, you can make a program easier to read and easier to modify. If you later decide to change a constant, you need to change only the preprocessor directive; you do not need to change the value everywhere it occurs in the program. If you replace the #define directive for TABSIZE in Figure 12-1 with the following directive, the program will place TAB stops every four columns rather than every eight:

#define     TABSIZE    4

Macros

A symbolic constant, which is a type of macro, maps a symbolic name to replacement text. Macros are handy when the replacement text is needed at multiple points throughout the source code or when the definition of the macro is subject to change. The process of substituting the replacement text for the symbolic name is called macro expansion.

You can also use #define directives to define macros with arguments. Use of such a macro resembles a function call. Unlike C functions, however, macros are replaced with C code prior to compilation into object files.

The NEXTTAB macro computes the distance to the next TAB stop, given the current column position curcol:

#define NEXTTAB(curcol) (TABSIZE - ((curcol) % TABSIZE))

This definition uses the macro TABSIZE, whose definition must appear prior to NEXTTAB in the source code. The macro NEXTTAB could be used in tabs.c to assign a value to retval in the function findstop:

retval = NEXTTAB(*col);

Headers (include files)

When modules of a program use several macro definitions, the definitions are typically collected together in a single file called a header file or an include file. Although the C compiler does not place constraints on the names of header files, by convention they end in .h. The name of the header file is listed in an #include preprocessor directive in each program source file that uses any of the macros. The program in Figure 12-1 uses getchar and putchar, which are declared in stdio.h (an implementation may also provide them as macros). The stdio.h header file defines a variety of general-purpose macros and is used by many C library functions.

The angle brackets (< and >) that surround stdio.h in tabs.c instruct the C preprocessor to look for the header file in a standard list of directories (such as /usr/include). To include a header file from another directory, enclose its pathname between double quotation marks. You can specify an absolute pathname within the double quotation marks or you can give a relative pathname. If you give a relative pathname, searching begins with the directory that holds the source file containing the #include directive (the working directory when you compile a file in place) and then moves to the same directories that are searched when the header file is surrounded by angle brackets.

You can also specify directories to be searched for header files by using the -I option to the C compiler. Assume that you want to compile the program deriv.c, which contains the following preprocessor directive:

#include "eqns.h"

If the header file eqns.h is located in the subdirectory myincludes, you can compile deriv.c with the -I option to tell the C preprocessor to look for eqns.h there:

$ gcc -I./myincludes deriv.c

When the C preprocessor encounters the #include directive in the deriv.c file, it will look for eqns.h in the myincludes subdirectory of the working directory.

Tip: Use relative pathnames for include files

Using absolute pathnames for include files does not work if the location of the header file within the filesystem changes. Using relative pathnames for header files works as long as the location of the header file relative to the working directory remains the same. Relative pathnames also work with the -I option on the gcc command line and allow header files to be moved.

Function prototype

Preceding the definition of the function main is a function prototype. This declaration tells the compiler what type a function returns, how many arguments a function expects, and what the types of those arguments are. In tabs.c the prototype for the function findstop informs the compiler that findstop returns type int and that it expects a single argument of type pointer to int:

int findstop(int *);

Once the compiler has seen this declaration, it can detect and flag inconsistencies in the definition and the uses of the function. As an example, suppose that the reference to findstop in tabs.c was replaced with the following statement:

inc = findstop();

The prototype for findstop would cause the compiler to detect a missing argument and issue an error message. You could then easily fix the problem. When a function is present in a separate source file or is defined after it is referenced in a source file (as findstop is in the example), the function prototype helps the compiler check that the function is being called properly. Without the prototype, the compiler would not issue an error message and the problem might manifest itself as unexpected behavior during execution. At this late point, finding the bug might be difficult and time-consuming.

Functions

Although you can name most C functions anything you want, each program must have exactly one function named main. The function main is the control module: A program begins execution with the function main, which typically calls other functions, which in turn may call still other functions, and so forth. By putting different operations into separate functions, you can make a program easier to read and maintain. For example, the program in Figure 12-1 uses the function findstop to compute the distance to the next TAB stop. Although the few statements of findstop could easily have been included in the main function, isolating them in a separate function draws attention to a key computation.

Functions can make both development and maintenance of the program more efficient. By putting a frequently used code segment into a function, you avoid entering the same code into the program over and over again. When you later want to modify the code, you need change it only once.

If a program is long and includes several functions, you may want to split it into two or more files. Regardless of its size, you may want to place logically distinct parts of a program in separate files. A C program can be split into any number of different files; however, each function must be wholly contained within a single file.

Tip: Use a header file for multiple source files

When you are creating a program that takes advantage of multiple source files, put #define preprocessor directives into a header file. Then put an #include directive naming the header file in each source file that uses those definitions.

Compiling and Linking a C Program

To compile tabs.c and create an executable file named a.out, give the following command:

$ gcc tabs.c

The gcc utility calls the C preprocessor, the C compiler, the assembler, and the linker. Figure 12-2 shows these four components of the compilation process. The C preprocessor expands macro definitions and includes header files. The compilation phase creates assembly language code corresponding to the instructions in the source file. Then the assembler creates machine-readable object code. One object file is created for each source file. Each object file has the same name as the source file, except that the .c extension is replaced with .o. After successfully completing all phases of the compilation process for a program, the C compiler creates the executable file and then removes any temporary .o files.

Figure 12-2. The compilation process

During the final phase of the compilation process, the linker searches specified libraries for functions the program uses and combines object modules for those functions with the program's object modules. By default the C compiler links the standard C library libc.dylib (found in /usr/lib), which contains functions that handle input and output and provides many other general-purpose capabilities. If you want the linker to search other libraries, you must use the -l (lowercase "l") option to specify the libraries on the command line. Unlike most options to Mac OS X system utilities, the -l option does not come before all filenames on the command line but rather appears after the filenames of all modules that it applies to. In the next example, the C compiler searches the math library libm.dylib (also found in /usr/lib):

$ gcc calc.c -lm

The -l option uses abbreviations for library names: the text that follows -l is appended to lib, and a .dylib or .a extension is added. The m in the example stands for libm.dylib.

Using the same naming mechanism, you can have a graphics library named libgraphics.a, which can be linked with the following command:

$ gcc pgm.c -lgraphics

When you use this convention to name libraries, gcc knows to search for them in /usr/lib. You can also have gcc search other directories by using the -L option:

$ gcc pgm.c -L. -L/usr/X11R6/lib -lgraphics

The preceding command causes gcc to search for the library file libgraphics.a in the working directory and in /usr/X11R6/lib before searching /usr/lib.

As the last step of the compilation process, the linker creates an executable file named a.out unless you specify a different filename with the -o option. Object files are deleted after the executable is created.

Mach-O format

You may occasionally encounter references to the a.out format, an old UNIX binary format. Mac OS X uses the Mach-O format for binaries, not the a.out format, in spite of the filename. Use the file utility (page 726) to display the format of the executable that gcc generates:

$ file a.out
a.out: Mach-O executable ppc

In the next example, the -O3 option causes gcc to use the C compiler optimizer. The optimizer makes object code more efficient so that the executable program runs more quickly. Optimization has many facets, including locating frequently used variables and taking advantage of processor-specific features. The number after the O indicates the level of optimization, where a higher number specifies more optimization. See the gcc info page for specifics. The following example also shows that the .o files are not present after a.out is created:

$ ls
acctspay.c acctsrec.c ledger.c
$ gcc -O3 ledger.c acctspay.c acctsrec.c
$ ls
a.out acctspay.c acctsrec.c ledger.c

You can use the executable a.out in the same way you use shell scripts and other programs: by typing its name on the command line. The program in Figure 12-1 on page 481 expects to read from standard input, so once you have created the executable a.out, you can use a command such as the following to run it:

$ ./a.out < mymemo

If you want to save the a.out file, you should change its name to a more descriptive one. Otherwise, you might accidentally overwrite it during a later compilation.

$ mv a.out accounting

To save yourself the trouble of renaming an a.out file, you can specify the name of the executable file via the gcc command. The -o option causes the C compiler to give the executable the name you specify rather than a.out. In the next example, the executable is named accounting:

$ gcc -o accounting ledger.c acctspay.c acctsrec.c

If accounting does not require arguments, you can run it with the following command:

$ accounting

You can suppress the linking phase of compilation by using the -c option with the gcc command. The -c option does not treat unresolved external references as errors; this capability enables you to compile and debug the syntax of the modules of a program as you create them. Once you have compiled and debugged all of the modules, you can run gcc again with the object files as arguments to produce an executable program. In the next example, gcc produces three object files but no executable:

$ gcc -c ledger.c acctspay.c acctsrec.c
$ ls
acctspay.c acctspay.o acctsrec.c acctsrec.o ledger.c ledger.o

If you then run gcc again and name the object files on the command line, gcc will produce the executable. Because it recognizes the filename extension .o, the C compiler knows that the files need only to be linked. You can also include both .c and .o files on a single command line:

$ gcc -o accounting ledger.o acctspay.c acctsrec.o

The C compiler recognizes that the .c file needs to be preprocessed and compiled, whereas the .o files do not. The C compiler also accepts assembly language files ending in .s and assembles and links them. This feature makes it easy to modify and recompile a program.

You can use separate files to divide a project into functional groups. For instance, you might put graphics routines in one file, string functions in another file, and database calls in a third file. Having multiple files can enable several engineers to work on the same project concurrently and can speed up compilation. For example, if all functions are in one file and you make a change to one of the functions, the compiler must recompile all of the functions in the file. Thus the entire program will be recompiled, which may take considerable time even if you made only a small change. When you use separate files, only the file that you change must be recompiled. For large programs with many source files (for example, the C compiler or emacs), the time lost by recompiling one huge file for every small change would be enormous. For more information refer to "make: Keeps a Set of Programs Current" on page 489.

Tip: What not to name a program

Do not name a program test or any other name of a builtin or other executable on the local system. If you do, you will likely execute the builtin or other program instead of the program you intend to run. Use which (page 58) to determine which program you will run when you give a command.