Chapter 8: Development Tools | How Linux Works: What Every Superuser Should Know

Unix is very popular with programmers not just due to the overwhelming array of tools and environments available, but also because the system is exceptionally well documented and transparent. On a Unix machine, you don't have to be a programmer to take advantage of development tools, and when working with the system, you must know something about programming tools because they play a larger role in Unix systems management than in other operating systems. At the very least, you should be able to identify development utilities and have some idea of how to run them.

This chapter packs a lot of information into a small space, but you do not need to master everything here. Furthermore, you can easily leave the material and come back later.

8.1 The C Compiler

Knowing how to run the C compiler can give you a great deal of insight into the origin of the programs that you see on your Linux system. The source code for nearly all Linux utilities, and for many applications on Linux systems, is written in the C programming language. C is a compiled language, and C programs follow the traditional development process: you write programs, you compile them, and they run. After you write a C program, you must compile the source code that you wrote into a binary low-level form that the computer understands.

The C compiler on most Unix systems is named cc (on Linux, this is usually a link to gcc), and C source code files end with .c. Take a look at this single, self-contained C source code file called hello.c, adapted from the first program in The C Programming Language [Kernighan and Ritchie] (the return type and return statement are spelled out here because some current compilers reject the older form that omits them):

 #include <stdio.h>

 int main(void)
 {
     printf("Hello, World.\n");
     return 0;
 }

To compile this source code, run this command:

 cc hello.c

The result is an executable named a.out, which you can run like any other executable on the system. However, you should probably give the executable another name (such as hello). To do this, use the compiler's -o option:

 cc -o hello hello.c

For small programs, there isn't much more to compiling than that. You may need to add an extra include directory or library (see Sections 8.1.2 and 8.1.3), but let's look at slightly larger programs before getting into those topics.

8.1.1 Multiple Source Files

Most C programs are too large to reasonably fit inside one single source code file. Mammoth files become too disorganized for the programmer, and compilers sometimes even have trouble parsing large files. Therefore, developers group components of the source code together, giving each piece its own file.

To compile the .c files, use the compiler's -c option on each file. Let's say that you have two files, main.c and aux.c. The following two commands would do most of the work of building the program:

 cc -c main.c
 cc -c aux.c

The preceding two compiler commands compile the two source files into two object files (main.o and aux.o). An object file is a nearly complete binary that a processor can almost understand, except that there are still a few loose ends. First, the operating system does not know how to run an object file by itself, and second, you may need to combine several object files to make a complete program.

To build a fully functioning executable from one or more object files, you must run the linker (the ld command in Unix). Programmers rarely use ld on the command line, because the C compiler knows how to run the linker program properly. To create an executable called myprog from the two object files above, run this command to link them:

 cc -o myprog main.o aux.o
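
To make the "loose ends" in an object file more concrete, here is a minimal sketch of what aux.c might contain. The function names are hypothetical and chosen only for illustration; the point is that the file has no main() of its own, so aux.o is useful only after the linker combines it with main.o.

```c
/* aux.c: hypothetical helper code for myprog (names are illustrative).
 * There is no main() here, so aux.o cannot become a program on its own. */

/* Return the larger of two integers. */
int aux_max(int a, int b)
{
    return (a > b) ? a : b;
}

/* Add up the first n elements of values. */
long aux_sum(const int *values, int n)
{
    long total = 0;
    int i;

    for (i = 0; i < n; i++) {
        total += values[i];
    }
    return total;
}
```

Under this assumption, main.c would declare the two functions (for example, int aux_max(int a, int b);) and call them from main(). Compiling main.c with -c leaves those calls unresolved in main.o; the link command above is what connects them to the code in aux.o.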

Although you can compile multiple source files by hand, as the preceding example shows, it can be hard to keep track of them all during the compiling process when the number of source files multiplies. The make system described in Section 8.1.5 is the Unix standard for managing compiles. This system is especially important in managing the files described in the next two sections.

8.1.2 Header (Include) Files and Directories

C header files are additional source code files that usually contain type and library function declarations. For example, stdio.h is a header file (see the simple program in Section 8.1).

Unfortunately, a great number of compiler problems crop up with header files. Most glitches occur when the compiler can't find header files and libraries. There are even some cases where a programmer forgets to include a required header file, so some of the source code may not compile.

Tracking down the correct include files isn't always easy. Sometimes there are several include files with the same names in different directories, and it's not clear which is the correct one. When the compiler can't find an include file, the error message looks like this:

 badinclude.c:1: notfound.h: No such file or directory

This message reports that the compiler cannot find the notfound.h header file that the badinclude.c file references. This specific error is a direct result of this directive on line 1 of badinclude.c:

 #include <notfound.h>

The default include directory in Unix is /usr/include; the compiler always looks there unless you explicitly tell it not to (for example, with gcc -nostdinc). However, you can make the compiler look in other include directories (most paths that contain header files have include somewhere in the name).

For example, let's say that you find notfound.h in /usr/junk/include. You can make the compiler see this directory with the -I option:

 cc -c -I/usr/junk/include badinclude.c

Now the compiler should no longer stumble on the line of code in badinclude.c that references the header file.

You should also be careful of includes that use double quotes ( " " ) instead of angle brackets ( < > ), like this:

 #include "myheader.h"

Double quotes mean that the header file is not in a system include directory; the compiler typically looks first in the directory that contains the source file and then searches the rest of its include path. It often means that the include file is supposed to be in the same directory as the source file. If you encounter a problem with double quotes, you're probably trying to compile incomplete source code.
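
As a rough illustration of what such a project header tends to hold, here is a sketch of a possible myheader.h. Every name in it is hypothetical. It shows the usual ingredients: a type declaration, a function declaration whose definition would live in some .c file, and a wrapper of #ifndef, #define, and #endif lines that keeps the contents from being processed twice if the file is included more than once.

```c
/* myheader.h: a hypothetical project header (all names are illustrative). */
#ifndef MYHEADER_H
#define MYHEADER_H

/* A type declaration shared by every .c file that includes this header. */
struct point {
    int x;
    int y;
};

/* A function declaration; the definition would live in one of the .c files. */
int point_distance_squared(struct point a, struct point b);

/* Very small helpers are sometimes defined directly in a header. */
static inline int point_equal(struct point a, struct point b)
{
    return a.x == b.x && a.y == b.y;
}

#endif /* MYHEADER_H */
```

A source file in the same directory would pull this in with #include "myheader.h". If the header lived elsewhere, the -I option described above would be one way to tell the compiler where to look.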

What Is the C Preprocessor (cpp)?

The C preprocessor is a program that the compiler runs on your source code before parsing the actual program. The preprocessor rewrites source code into a form that the compiler understands; it's a tool for making source code easier to read (and for providing shortcuts).

Preprocessor commands in the source code are called directives, and they start with the # character. There are three basic types of directives:

Include files: An #include directive instructs the preprocessor to include an entire file. Note that the compiler's -I flag is actually an option that causes the preprocessor to search a specified directory for include files, as you saw in the previous section.
Macro definitions: A line such as #define BLAH something tells the preprocessor to substitute something for all occurrences of BLAH in the source code. Convention dictates that macros appear in all uppercase, but it should come as no shock that programmers sometimes use macros whose names look like functions and variables. (Every now and then, this causes a world of headaches. Many programmers make a sport out of abusing the preprocessor.)

Note that instead of defining macros within your source code, you can also define macros by passing parameters to the compiler: -DBLAH=something works like the directive above.
Conditionals: You can mark out certain pieces of code with #ifdef, #if, and #endif. The #ifdef MACRO directive checks to see if the preprocessor macro MACRO is defined, and #if condition tests to see if condition is non-zero. For both directives, if the condition is false, the preprocessor does not pass any of the program text between the directive and the next #endif to the compiler. If you plan to look at any C code, you'd better get used to this.

An example of a conditional directive follows. When the preprocessor sees the following code, it checks to see if the macro DEBUG is defined, and if so, passes the line containing fprintf() on to the compiler. Otherwise, the preprocessor skips this line and continues to process the file after the #endif:

 #ifdef DEBUG
   fprintf(stderr, "This is a debugging message.\n");
 #endif

Note	The C preprocessor doesn't know anything about C syntax, variables, functions, and other elements. It understands only its own macros and directives.

On Unix, the C preprocessor's name is cpp, but you can also run it with gcc -E. However, you will rarely need to run the preprocessor by itself.
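
The three kinds of directives can be seen working together in the following small sketch. The macro and function names (BUFFER_SIZE, GREETING, SQUARE, checked_square, greeting) are invented for this example and do not come from any real library.

```c
#include <stdio.h>

/* Macro definition: every later BUFFER_SIZE becomes 64 before compilation. */
#define BUFFER_SIZE 64

/* A default value that a -DGREETING=... compiler option would override. */
#ifndef GREETING
#define GREETING "Hello"
#endif

/* A function-like macro; the parentheses guard against precedence surprises. */
#define SQUARE(x) ((x) * (x))

/* Conditional: the fprintf() line reaches the compiler only with -DDEBUG. */
int checked_square(int n)
{
#ifdef DEBUG
    fprintf(stderr, "checked_square(%d), buffer is %d bytes\n", n, BUFFER_SIZE);
#endif
    return SQUARE(n);
}

/* Return the greeting that was chosen at preprocessing time. */
const char *greeting(void)
{
    return GREETING;
}
```

Running gcc -E on a file like this shows the text that the compiler proper receives: the macros are already expanded, and without -DDEBUG the fprintf() line is simply gone.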

8.1.3 Linking with Libraries

The C compiler does not know enough about your system to create a useful program all by itself. Modern systems require libraries to build complete programs. A C library is a collection of common precompiled functions that you can build into your program. For example, many executables use the math library because it provides trigonometric functions and the like.

Libraries come into play primarily at link time, when the linker program creates an executable from object files. For example, if you have a program that uses the math library, but you forget to tell the compiler to link against that library, you'll see errors like this:

 badmath.o(.text+0x28): undefined reference to 'sin'
 badmath.o(.text+0x36): undefined reference to 'pow'

The most important parts of these error messages are the words undefined reference and the function names that follow them (sin and pow). When the linker program examined the badmath.o object file, it could not find those math functions, and as a consequence, it could not create the executable. In this particular case, you might suspect that you forgot the math library because the missing functions refer to mathematical operations (sine and exponentiation).

Note	Undefined references do not always mean that you're missing a library. One of the program's object files could be missing in the link command. It's usually easy to differentiate between library functions and functions in your object files, though.

To fix this problem, you must first find the math library and then use the compiler's -l option to link against the library. As with include files, libraries are scattered throughout the system (/usr/lib is the system default location), though most libraries reside in a subdirectory named lib. For the preceding example, the math library file is libm.a (in /usr/lib), so the library name is m. Putting it all together, you would link the program like this:

 cc -o badmath badmath.o -lm

You must tell the linker about nonstandard library locations; the parameter for this is -L. Let's say that the badmath program requires libcrud.a in /usr/junk/lib. To link the object file and create the executable, use a command like this:

 cc -o badmath badmath.o -lm -L/usr/junk/lib -lcrud

Note	If you want to search a library for a particular function, use the nm command. Be prepared for a lot of output. For example, try this: nm /usr/lib/libm.a

8.1.4 Shared Libraries

A library file ending with .a (such as libm.a) is a static library. Sadly, the story on libraries doesn't end here.

When you link a program against a static library, the linker copies machine code from the library file into your executable. Therefore, the final executable does not need the original library file to run.

However, the ever-expanding size of libraries has made static libraries wasteful in terms of disk space and memory. Shared libraries counter this problem. When you run a program linked against a shared library, the system loads the library's code into the process memory space only when necessary. Furthermore, many processes can share the same shared library code in memory.

Shared libraries have their own costs: difficult management and a somewhat complicated linking procedure. However, you can bring shared libraries under control if you know four things:

How to list the shared libraries that an executable needs
How an executable looks for shared libraries
How to link a program against a shared library
What the common shared library pitfalls are

The following sections tell you how to use and maintain your system's shared libraries. If you're interested in how shared libraries work, or if you want to know about linkers in general, you can look at Linkers and Loaders [Levine], or at "The Inside Story on Shared Libraries and Dynamic Loading," [Beazley/Ward/Cooke]. The ld.so(8) manual page is also worth a read.

Listing Shared Library Dependencies

Shared library files usually reside in the same places as static libraries. The two standard library directories on a Linux system are /lib and /usr/lib. The /lib directory should not contain static libraries.

A shared library has a suffix that contains .so, as in libc-2.3.2.so and libc.so.6. To see what shared libraries a program uses, run ldd prog, where prog is the executable name. Here is the output of ldd /bin/bash:

 libreadline.so.2 => /lib/libreadline.so.2 (0x40019000)
 libncurses.so.3.4 => /lib/libncurses.so.3.4 (0x40045000)
 libdl.so.2 => /lib/libdl.so.2 (0x4008a000)
 libc.so.6 => /lib/libc.so.6 (0x4008d000)
 /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

Executables alone do not know the locations of their shared libraries; they know only the names of the libraries, and perhaps a little hint. A small program named ld.so (the runtime dynamic linker/loader) finds and loads shared libraries for a program at runtime. The preceding ldd output shows the library names on the left; that's what the executable knows. The right side shows where ld.so finds the library.

How ld.so Finds Shared Libraries

The first place that the dynamic linker should normally look for shared libraries is an executable's preconfigured runtime library search path (if one exists). You will see how to create this path shortly.

Next, the dynamic linker looks in a system cache, /etc/ld.so.cache, to see if the library is in a standard location. This is a fast cache of the names of library files found in directories listed in the cache configuration file /etc/ld.so.conf. Each line in this file is a directory that you want to include in the cache. The list of directories is usually short, containing something like this:

 /usr/X11R6/lib
 /usr/lib/libc5-compat

The standard library directories /lib and /usr/lib are implicit; you don't need to include them in /etc/ld.so.conf.

If you alter ld.so.conf or make a change to one of the shared library directories, you must rebuild the /etc/ld.so.cache file by hand with the following command:

 ldconfig -v

The -v option provides detailed information on libraries that ldconfig adds to the cache and any changes that it detects.

There is one more place that ld.so looks for shared libraries: the environment variable LD_LIBRARY_PATH . Before discussing this variable, let's look at the runtime library search path.

Linking Programs Against Shared Libraries

Don't get into the habit of adding stuff to /etc/ld.so.conf. You should know what shared libraries are in the system cache, and if you put every bizarre little shared library directory into the cache, you risk conflicts and an extremely disorganized system. When you compile software that needs an obscure library path, give your executable a built-in runtime library search path.

Let's say that you have a shared library named libweird.so.1 in /opt/obscure/lib that you need to link myprog against. Link the program as follows:

 cc -o myprog myprog.o -Wl,-rpath=/opt/obscure/lib -L/opt/obscure/lib -lweird

The -Wl,-rpath option tells the linker to include the directory that follows in the executable's runtime library search path. However, even if you use -Wl,-rpath, you still need the -L flag so that the linker can find the library when it builds the executable.

Problems with Shared Libraries

Shared libraries provide remarkable flexibility, not to mention some really incredible hacks, but it's also possible to abuse them to the point where your system is an utter and complete mess. Three particularly bad things can happen:

Missing libraries
Terrible performance
Mismatched libraries

The number one cause of all shared library problems is an environment variable named LD_LIBRARY_PATH. Setting this variable to a colon-delimited set of directory names makes ld.so search the given directories before anything else when looking for a shared library. This is a cheap way to make programs work when you move a library around, if you don't have the program's source code, or if you're just too lazy to recompile the executables. Unfortunately, you get what you pay for.

Never set LD_LIBRARY_PATH in shell startup files or when compiling software. When the dynamic runtime linker encounters this variable, it must often search through the entire contents of each specified directory more times than you would care to know. This causes a big performance hit, but more importantly, you can get conflicts and mismatched libraries because the runtime linker looks in these directories for every program.

If you must use LD_LIBRARY_PATH to run some crummy program for which you don't have the source (or an application that you'd rather not compile, like Mozilla or some other beast), use a wrapper script. Let's say that your executable is /opt/crummy/bin/crummy.bin and it needs some shared libraries in /opt/crummy/lib. Write a wrapper script called crummy that looks like this:

 #!/bin/sh
 LD_LIBRARY_PATH=/opt/crummy/lib
 export LD_LIBRARY_PATH
 exec /opt/crummy/bin/crummy.bin "$@"

Avoiding LD_LIBRARY_PATH prevents most shared library problems. But one other significant problem that occasionally comes up with developers is that a library's application programming interface (API) may change slightly from one minor version to another, breaking installed software. The best solutions here are preventive: either use a system like Encap (see Section 9.4) to install shared libraries with -Wl,-rpath to create a runtime link path, or simply use the static versions of obscure libraries.

8.1.5 Make

A program with more than one source code file or requiring strange compiler options is too cumbersome to compile by hand. This problem has been around for years, and the traditional Unix compile management utility that eases these pains is called make. You need to know a little about make if you're running a Unix system, because system utilities sometimes rely on make to operate. However, this chapter is only the tip of the iceberg. The classic guide for compiled languages and make is The UNIX Programming Environment [Kernighan and Pike].

make is a big system, and there are entire books on the subject (such as Managing Projects with make [Oram]), but it's not very difficult to get an idea of how it works. When you see a file named Makefile or makefile, you know that you're dealing with make. (Try running make to see if you can build anything.)

The basic idea behind make is the target, a goal that you want to achieve. A target can be a file (a .o file, an executable, and so on) or a label. In addition, some targets depend on other targets; for instance, you need a complete set of .o files before you can link your executable. The targets on which another target depends are called dependencies.

To build a target, make follows a rule, such as a rule for how to go from a .c source file to a .o object file. make already knows several rules, but you can customize these existing rules and create your own.

The following is a very simple Makefile that builds a program called myprog from aux.c and main.c:

 OBJS=aux.o main.o
 # object files
 all: myprog
 myprog: $(OBJS)
 	$(CC) -o myprog $(OBJS)

The first line of the Makefile is just a macro definition; it sets the OBJS variable to two object filenames. This will be important later. For now, take note of how you define the macro and also how you reference it later ($(OBJS)).

The # in the next line denotes a comment.

The next item in the Makefile contains its first target, all. The first target is always the default, the target that make wants to build when you run make by itself on the command line.

The rule for building a target comes after the colon. For all, this Makefile says that you need to satisfy something called myprog. This is the first dependency in the file; all depends on myprog. Note that myprog can be an actual file or the target of another rule. In this case, it is both: the next rule has myprog as its target, and that rule creates a file named myprog.

To build myprog, this Makefile uses the macro $(OBJS) in the dependencies. The macro expands to aux.o and main.o, so myprog depends on these two files (they must be actual files, because there aren't any targets with those names anywhere in the Makefile).

This Makefile assumes that you have two C source files named aux.c and main.c in the same directory. Running make on the Makefile yields the following output, showing the commands that make is running:

 cc    -c -o aux.o aux.c
 cc    -c -o main.o main.c
 cc -o myprog aux.o main.o

So how does make know how to go from aux.c to aux.o? After all, aux.c is not in the Makefile. The answer is that make follows its built-in rules. It knows to look for a .c file when you want a .o file, and furthermore, it knows how to run cc -c on that .c file to get to its goal of creating a .o file.

The final step of getting to myprog is a little tricky, but the idea is clear enough. Once you have the two object files in $(OBJS), you can run the C compiler according to the following line (where $(CC) expands to the compiler name):

 $(CC) -o myprog $(OBJS)

Pay special attention to the whitespace before $(CC). This is a tab. You must put a tab before any real command, and the command must be on a line by itself. Watch out for this:

 Makefile:7: *** missing separator. Stop.

An error like this means that the Makefile is broken. The tab is the separator, and if there is no separator or there's some other interference, you will see this error.

Staying up to Date

One last make fundamental is that targets should be up to date with their dependencies. If you type make twice in a row for the preceding example, the first command builds myprog, but the second command yields this output:

 make: Nothing to be done for 'all'.

For this second time through, make looked at its rules and noticed that myprog already exists. To be more specific, make did not build myprog again because none of the dependencies had changed since the last time it built myprog. You can experiment with this as follows:

Run touch aux.c.
Run make again. This time, make figures out that aux.c is newer than the aux.o already in the directory, so it must compile aux.o again.
myprog depends on aux.o, and now aux.o is newer than the preexisting myprog, so make must create myprog again.

This type of chain reaction is very typical.
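
The decision that drives this chain reaction can be approximated in a few lines of C. The sketch below is a simplification written for this discussion, not the code that make itself uses, and the function names are hypothetical. It treats a target as out of date when the target is missing or when any dependency has a later modification time.

```c
#include <stddef.h>     /* size_t */
#include <sys/stat.h>   /* stat(), struct stat */
#include <time.h>       /* time_t */

/* Core decision: rebuild if the target is missing or if any dependency
 * has a later modification time than the target. */
int out_of_date(int target_exists, time_t target_mtime,
                const time_t *dep_mtimes, size_t ndeps)
{
    size_t i;

    if (!target_exists) {
        return 1;
    }
    for (i = 0; i < ndeps; i++) {
        if (dep_mtimes[i] > target_mtime) {
            return 1;
        }
    }
    return 0;
}

/* Apply the same test to one target file and one dependency file.
 * Returns 1 if the target needs rebuilding, 0 if it is current,
 * and -1 if the dependency itself cannot be examined. */
int file_out_of_date(const char *target, const char *dep)
{
    struct stat target_info, dep_info;
    time_t dep_mtime;
    int target_exists;

    if (stat(dep, &dep_info) != 0) {
        return -1;
    }
    dep_mtime = dep_info.st_mtime;
    target_exists = (stat(target, &target_info) == 0);
    return out_of_date(target_exists,
                       target_exists ? target_info.st_mtime : 0,
                       &dep_mtime, 1);
}
```

In terms of the experiment above, touch aux.c gives aux.c a later timestamp than aux.o, so a check like file_out_of_date("aux.o", "aux.c") would report that aux.o needs rebuilding. Once aux.o is rebuilt, the same comparison between myprog and aux.o triggers the relink.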

Command-Line Options

You can get a great deal of mileage from make if you know how its command line options work.

The most common make option is specifying a single target on the command line. For the preceding Makefile, you can run make aux.o if you want only the aux.o file.

You can also define a macro on the command line. Let's say that you want to use a different compiler called my_bad_cc. Try this:

 make CC=my_bad_cc

Here, make uses your definition of CC instead of its default compiler, cc. Command-line macros come in handy when you're testing out preprocessor definitions and libraries, especially with the CFLAGS and LDFLAGS macros explained later.

You don't even need a Makefile to run make. If built-in make rules match a target, you can just ask make to try to create the target. For example, if you have the source to a very simple program called blah.c, try make blah. The make run tries the following command:

 cc blah.c -o blah

This use of make works only for the most elementary C programs; if your program needs a library or special include directory, you're probably better off writing a Makefile.

Running make without a Makefile is actually most useful when you aren't dealing with a C program, but with something like Fortran, lex, or yacc. It can be a real pain to figure out how the compiler or utility works, so why not let make try to figure it out for you? Even if make fails to create the target, it will probably still give you a pretty good hint as to how you might use the tool.

Two more make options stand out from the rest:

-n: Prints the commands necessary for a build, but prevents make from actually running any commands
-f file: Tells make to read from file instead of Makefile or makefile

Standard Macros and Variables

make has many special macros and variables. It's difficult to tell the difference between a macro and a variable, so this book uses the term macro to mean something that usually doesn't change after make starts building targets.

As you saw earlier, you can set macros at the start of your Makefile. The following list includes the most common macros:

CFLAGS: C compiler options. When creating object code from a .c file, make passes this as an argument to the compiler.
LDFLAGS: Like CFLAGS, but for the linker when creating an executable from object code.
LDLIBS: If you use LDFLAGS but do not want to combine the library name options with the search path, put the library name options in this macro.
CC: The C compiler. The default is cc.
CPPFLAGS: C preprocessor options. When make runs the C preprocessor in some way, it passes this macro's expansion on as an argument.
CXXFLAGS: GNU make uses this for C++ compiler flags. Like C++ source code extensions (and nearly everything else associated with C++), this isn't standard and probably won't work with other make variants unless you define your own rule.

A make variable changes as you build targets. Because you don't ever set variables by hand, the following list includes the $.

$@: When inside a rule, this expands to the current target.
$*: Expands to the basename of the current target. For example, if you're building blah.o, this expands to blah.

The most comprehensive list of the make variables on Linux is the make info page.

Note	Keep in mind that GNU make has many extensions, built-in rules, and features that other variants do not have. This is fine as long as you're running Linux, but if you step off onto a Sun or BSD machine and expect the same stuff to work, you might be in for a surprise.

Conventional Targets

Most Makefiles contain several standard targets that perform auxiliary tasks related to compiles.

The clean target is ubiquitous; a make clean usually instructs make to remove all of the object files and executables so that you can make a fresh start or pack up the software. Here is an example rule for the myprog Makefile:
```
clean:
	rm -f $(OBJS) myprog
```
A Makefile created with the GNU autoconf system always has a distclean target to remove everything that wasn't part of the original distribution, including the Makefile. You will see more of this in Section 9.2.3. On very rare occasions, you may find that a developer opts not to remove the executable with this target, preferring something like realclean instead.
install copies files and compiled programs to what the Makefile thinks is the proper place on the system. This can be dangerous, so you should always run a make -n install first to see what will happen without actually running any commands.
Some developers provide test or check targets to make sure that everything works after you perform a build.
depend creates dependencies by calling the compiler with a special option ( -M ) to examine the source code. This is an unusual-looking target because it often changes the Makefile itself. This is no longer common practice, but if you come across some instructions telling you to use this rule, make sure that you do it.
all is often the first target in the Makefile; you will often see references to this target instead of an actual executable.

Organizing a Makefile

Even though there are many different Makefile styles, there are still some general rules of thumb to which most programmers adhere.

In the first part of the Makefile (inside the macro definitions), you should see libraries and includes grouped according to package:

 X_INCLUDES=-I/usr/X11R6/include
 X_LIB=-L/usr/X11R6/lib -lX11 -lXt
 PNG_INCLUDES=-I/usr/local/include
 PNG_LIB=-L/usr/local/lib -lpng

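Each type of compiler and linker flag often gets a macro like the following. The += operator appends to whatever value the macro already has; GNU make rejects a definition such as CFLAGS=$(CFLAGS) followed by more options, because the macro would then refer to itself:

 CFLAGS+=$(X_INCLUDES) $(PNG_INCLUDES)
 LDFLAGS+=$(X_LIB) $(PNG_LIB)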

Object files are usually grouped according to executables. Let's say that you have a package that creates executables called boring and trite. Each has its own .c source file and requires the code in util.c. You might see something like this:

 UTIL_OBJS=util.o
 BORING_OBJS=$(UTIL_OBJS) boring.o
 TRITE_OBJS=$(UTIL_OBJS) trite.o
 PROGS=boring trite

The rest of the Makefile might look like this:

 all: $(PROGS)
 boring: $(BORING_OBJS)
 	$(CC) -o $@ $(BORING_OBJS) $(LDFLAGS)
 trite: $(TRITE_OBJS)
 	$(CC) -o $@ $(TRITE_OBJS) $(LDFLAGS)

You could combine the two executable targets into one rule, but this is usually not good practice because you would not easily be able to move a rule to another Makefile, delete an executable, or group executables differently. Furthermore, the dependencies would be incorrect: if you had just one rule for boring and trite, trite would depend on boring.c, and boring would depend on trite.c, and make would always try to rebuild both programs whenever you changed one of the two source files.

Note	If you need to define a special rule for an object file, put the rule for the object file just above the rule that builds the executable. If several executables use the same object file, put the object rule above all of the executable rules.