Installing software on a Linux system is pretty easy, after a few tries and a little experience. However, in the long term the biggest challenge is keeping everything orderly and sane. It's easy to let things get out of control and wind up with one big ball of string instead of a functional system. This chapter outlines the following steps of installing software:
Choosing between a source or binary installation
Deciding on a destination
Building the software
Configuring the software's settings
Configuring the user environment
Hooking into the operating system
This chapter will describe each of these steps in detail. It doesn't provide step-by-step instructions for installing specific software; for details like that, users should consult the documentation for the software or another, more introductory reference book. Like the rest of this book, this chapter emphasizes rules of thumb, guidelines, and general patterns and techniques. After reading these sections, users will have a broad understanding of exactly what is going on when they install software.
The first step to installing a new software package is, of course, to obtain the package. However, since the vast majority of software used on Linux systems is open source, source code is usually available. This means that your first decision is whether to install precompiled binaries of the software or build your own binaries from the source code. The appropriate option is going to vary from one situation to the next, depending on the hardware and particular distribution being used as well as the needs of the user. This section will discuss how to make this decision, based on factors such as platform architecture and software configuration needs.
Most microprocessors have a "family" of related processors. As manufacturers develop newer and faster chips, they generally augment the existing capabilities of older chips, and so the later processors from a particular manufacturer are related to, but not always 100% compatible with, other chips in the family. Two example microprocessor families are Intel's ubiquitous x86 family and Compaq's Alpha family of RISC processors. The later chips in these families are extensions of earlier ones; for example, Intel's Pentium 4 processor is backward compatible with the original 8086, while Compaq's Alpha 21264 (EV6) is backward compatible with earlier chips such as the 21164 (EV5).
However, each subsequent chip in a family generally introduces new extensions to the processor. For example, Intel introduced the Multimedia Extension (MMX) instructions to their Pentium line of processors. One of the jobs of the compiler is to take the source code and optimize it for use with the features of a particular processor. Obviously, code that was compiled to make use of instructions or optimizations available on a specific chip isn't going to run on a processor that doesn't have those features, even if the chips are in the same family.
The impact of these optimizations varies. In a case like the MMX instructions, the code simply might not run at all on processors that lack those instructions, failing with an error. In a more subtle case, code optimized to run on an Alpha EV6 makes use of the timing characteristics and instruction scheduling of the EV6 processor. Even though these instructions will physically run on an earlier EV5 processor, they might not run optimally, since the EV5 has different timing characteristics.
What does this have to do with installing software? Well, if the only binary package available is compiled for a slightly different version of your own processor, it might run suboptimally, or it might not run at all. This varies by architecture; in the Intel world, it's generally not a big deal, but Alpha users frequently choose to build their own software to make certain they have the appropriate optimizations for their specific system. So, depending on your platform you may wish to build software yourself from source code rather than rely on someone else's prebuilt package.
Sometimes it's not the compiler optimizations that the user is concerned with, but rather the software configuration options. That is, a given piece of software might support different features or behaviors that the user can request be activated or deactivated when the software is built from source code. These changes are independent of any optimizations the compiler might perform, and are called compile-time (or build-time) configurations.
The important thing to understand about compile-time settings is that they can't be changed later; if a user elects to disable a specific feature at compile-time, it cannot be added later without recompiling all or some of the software. For example, the Apache web server supports the ability to dynamically load modules that provide various capabilities; however, a user may wish to disable this behavior and build Apache as a single program, without support for modules. If an administrator chooses to do this and later wishes to use a dynamic library, Apache will have to be recompiled.
A given prebuilt software package will have been compiled with a specific set of features. If the user of this package wishes to use a different set of features, then she would need to build the software from source code. This issue is independent of packaging format; whether the binary software is shipped in RPM, Debian, or some other format, it still has been compiled with a certain set of compile-time options. When considering the installation of a binary software package, users should consult the documentation of whoever prepared the package to see what options are enabled. If the package's configuration is inadequate, the administrator will have to build her own copy from source code, after all.
Unfortunately, there's no truly general rule of thumb for deciding between a source and a binary installation. The answer varies by platform (such as the Alpha users who prefer to build from source) and by need. (A user may be more security conscious about a production server than about a workstation, and so may be more proactive about disabling unneeded features in software.) Of course, it is easier and simpler to install from binary packages when doing so is acceptable, and for most applications that policy will work perfectly well. Occasionally, though, you might need to install from source code; for such situations, "you'll know it when you see it," as the saying goes. The key is to make sure you know what you need from your software before you install it; that will tell you whether to install from source or binary.
Once it's determined how the software is going to be installed, where it's going to be installed is next. This section will discuss the options and some guidelines for choosing an installation location. The location of a single software package on the computer's filesystem isn't really that important (provided it can be properly accessed by everyone who needs it), but it is a good idea to standardize on a policy of where software is installed, to prevent maintenance and upgrade headaches later.
The best location to install software is going to vary by the type of software and the distribution. Most Linux distributions closely follow the Linux Filesystem Hierarchy Standard (FHS) mentioned first in Chapter 3, so that document may be a good guide for making the decision. For most systems, though, there is a rule of thumb that can be applied: If the software comes as a prebuilt binary package or as a preconfigured source package (such as a source RPM file), let the packager install the software according to the package's defaults. If the software is being built from raw source (such as a ZIP file or tarball), it should be installed in a subdirectory of /usr/local or /opt.
A common place to install software is /usr/local. As discussed in Chapter 3, the FHS specifies that /usr is essentially a mirror of the root directory, in that both have etc, lib, bin, and other subdirectories. The /usr directory is intended for general files, while the root directory is intended only for files that are absolutely critical to the system during startup. The /usr/local directory, meanwhile, is intended to contain software that is specific to a particular system; /usr generally contains only general software.
On Linux systems, this essentially means that /usr and the root directory contain software managed by the native package manager (such as Red Hat's RPM or Debian's dpkg), while /usr/local is intended for files that are installed manually, bypassing the native packager. The native packager typically handles conflicts between packages; however, it can't do anything for software it doesn't know about, and so it's useful to keep manually installed software in /usr/local, which package managers generally leave alone.
Typically, software installed in /usr/local is placed into the standard directories. That is, /usr/local/etc contains configuration files, /usr/local/bin contains program executables, and so forth. The /opt directory, discussed in the next section, is usually a bit different.
Sometimes you might encounter a system where you wish to install some custom software but don't have root privileges. For example, users accustomed to Linux systems and the accompanying GNU tools frequently find it frustrating to use the less capable default tools found on commercial Unix systems. These users may wish to install the GNU tools for their own use.
The lack of root access prevents you from installing in the usual places, and you're stuck with installing to your home directory. An effective technique in such cases is to simply create a subdirectory of your home directory and create your own little mini /usr/local or /opt. For example, you could create a directory named "~/usr" and install all your software into it in the standard way, so that programs go in ~/usr/bin, libraries go in ~/usr/lib, and so on. With this technique, you can use all the tricks covered in this chapter, even if you don't have root access. If you're truly ambitious, you might even install your own copy of a package manager (such as RPM) in your home directory!
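The mini /usr/local idea might look like this in practice (a minimal sketch; the ~/usr name is just the convention suggested above, and the configure line is only an illustration for a hypothetical package):

```shell
# Create a personal software tree under the home directory.
mkdir -p "$HOME/usr/bin" "$HOME/usr/lib" "$HOME/usr/man"

# Make the personal bin directory visible to the shell.
export PATH="$HOME/usr/bin:$PATH"

# Most autoconf-style packages can then be pointed at this tree, e.g.:
#   ./configure --prefix="$HOME/usr" && make && make install
```

To make the PATH change permanent, the export line would go in the shell's startup file (such as ~/.profile or ~/.bashrc).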
Like /usr/local, the /opt (for optional) directory is usually not used by the package managers. So, it's a good place to install software manually, since files placed there won't conflict with files managed by the system. However, /opt is usually used somewhat differently than /usr/local.
The /usr/local directory typically contains files from all packages mixed together. This makes it simple to configure (since, for example, you only have to add a single directory—/usr/local/bin—to the path). However, it also makes it somewhat tricky to upgrade a specific package. For example, if you installed version 1.0 of a software library called "my-software" into /usr/local and then wish to upgrade it to version 2.0, you'll have to overwrite what's already in there. Users who did not wish the library to be upgraded might be impacted by this.
Most systems that use an /opt directory use it to address these issues. Sun Microsystems' Solaris appears to have been the first to use /opt extensively, and since then many other systems have copied the style. Essentially, rather than mixing all the software into a set of shared directories, each package installs into its own subdirectory of /opt. For example, that software library could have been installed into /opt/my-software to keep it separate. Beyond that, it could have been installed into /opt/my-software-1.0; that would allow the new version to be placed in /opt/my-software-2.0. This allows for easier upgrades, and symbolic links can also be used (such as a link in /opt/my-software pointing to the current version) so that users don't have to reference specific versions' directories.
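The versioned-directory-plus-symlink scheme can be sketched in a few commands. The "my-software" name comes from the example above; a scratch directory stands in for /opt so the sketch can run without root privileges:

```shell
# Stand-in for /opt, so no root privileges are needed for this sketch.
PREFIX="/tmp/opt-demo"

# Each version gets its own directory...
mkdir -p "$PREFIX/my-software-1.0" "$PREFIX/my-software-2.0"

# ...and a version-neutral symbolic link points at the current version.
ln -sfn my-software-2.0 "$PREFIX/my-software"

# Users and scripts reference the neutral name; an upgrade (or rollback)
# is just a matter of repointing the link.
readlink "$PREFIX/my-software"   # prints "my-software-2.0"
```

Rolling back to 1.0 would be a single `ln -sfn my-software-1.0 "$PREFIX/my-software"`, with both versions' files left untouched.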
There are pros and cons to both directories. On the one hand, /usr/local makes it simpler to configure the system, since only one directory needs to be added to the shells' path for all programs to be accessible. (Similarly, only one "lib" directory needs to be added to the library path, and so on.) However, once a large number of files are mixed together, they can be hard to manage. The multiple-directories approach used in /opt makes it easier to keep track of which files belong to which packages. (It's a sort of poor man's package manager!) However, with /opt you have to hook multiple directories (program paths, library paths, and so on) into the system's configuration, which can also be a pain to manage. Which is better? It depends on the software; with a little experience, you'll work out your own rules of thumb.
Choosing the location for the software is easy if it's being installed via the native package manager: Just let it be installed wherever it pleases, and let the manager deal with it. It's a bit trickier when you have to install it manually. A good way to choose between /usr/local and /opt is on the size or complexity of the software. If it's simply a small utility package, such as, say, an SSH (secure shell) client, then it's probably fine to place it in /usr/local. However, if it's a large package, such as the Apache web server or the KDE desktop, you might want to use /opt to make it easier to upgrade later. Additionally, if the users need multiple versions of a package installed, such as the Java programming language, where developers frequently need access to several versions, /opt is probably easier to use.
After an administrator has decided where to install the software and settled on a set of compile-time configuration parameters, he can actually build the software. Of course, this step only applies to software that is being installed from source code; installations of prebuilt binary packages omit this step. Generally, there are two stages to building software: specifying compile-time options and automating the build.
Source code is turned into an actual program by a compiler (such as gcc, the GNU project's C compiler). Typically, the compiler has to be invoked on each source code file. Most software projects consist of dozens or hundreds of files (or even more), and compilers are complex beasts with hundreds of compilation options. This is clearly not something that administrators can do by hand; it's just too much typing. Almost every software project known to humankind uses some sort of scripting mechanism to automate the build process; almost universally, this is the make tool. This tool reads files (usually named, intuitively enough, Makefile) that specify compilation options and identify which files need to be compiled. In most cases, building the software is as simple as running make from the top-level directory of the source code.
Before the software can be built, though, the compile-time options have to be specified somehow. It's all well and good for the Makefile to script the compilation process, but how does it know what compilation options the user wants? There are, unfortunately, a lot of different ways of accomplishing this. Some software requires the user to edit a Makefile to set some options (such as installation directories or the locations of libraries that the software depends on). The vast majority of open source software, however, uses the GNU project's autoconf tools, which are described in more detail later in this chapter. In the end, the only way to find out how to configure an application's settings is to read its documentation. That's why the first step, as mentioned previously, is to read the ReadMe, Install, and other documentation files. As they say, "RTFM": Read The Fine Manual (or something more explicit, if you're not in polite company!).
The last step after building the software is installing it. Again, this is typically automated by the project's Makefile. The make command lets the Makefile specify targets. One target (the default target) usually builds the software, and a second target (usually called "install") handles the installation. So, in most cases installing the software after it is built is as simple as typing make install. Again, though, this can vary, and the software's own documentation is the last word.
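The default-target/install-target arrangement can be demonstrated with a deliberately tiny, hypothetical Makefile (real projects' Makefiles are far larger, but the shape is the same; printf is used here so the tab characters make requires are unambiguous):

```shell
cd "$(mktemp -d)"

# A minimal Makefile: the default target "builds" a file, and the
# install target copies the result under a configurable prefix.
printf 'all:\n\techo "hello" > hello.txt\n\ninstall: all\n\tmkdir -p $(PREFIX)/share\n\tcp hello.txt $(PREFIX)/share/hello.txt\n' > Makefile

make                          # runs the default (first) target
make install PREFIX="$PWD/dest"   # runs the install target

ls dest/share                 # prints "hello.txt"
```

This is the pattern behind the ubiquitous `make` followed by `make install`; the PREFIX variable here plays the same role as the installation prefix chosen during compile-time configuration.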
The administrator's task isn't done, even after the software is built and installed! Frequently, there are additional configuration activities that must be accomplished after the software is installed. Configuration options that alter the behavior of software while it runs are called run-time options. This section will describe the most common ways of configuring a system at run-time.
A compile-time option typically enables or disables some feature in the software, or hard-codes a parameter such as the location of a required file. Since these options are literally compiled into the software, they can't be altered. (If you disable a feature at compile-time, it's simply not there to be activated later!) A run-time option, in contrast, sets a value or alters behavior for a specific execution of the program. There are a lot of ways this can be accomplished, and the remainder of this section describes some of the most common techniques.
The simplest and most direct run-time configuration mechanism is the command-line parameter, sometimes known as command-line switches or flags. This isn't really a full-blown configuration option like the others mentioned in this section, but its end result is similar. Even the greenest Unix user is intimately acquainted with this technique, and almost all the example commands in this book use command-line parameters.
On Unix-like systems, command-line parameters typically follow the command name and are delimited by dashes; for example, the ls file listing command takes several command-line options that alter its output. Most of these are single-character options and can be joined together after a single dash; for example, ls -la is equivalent to ls -l -a. The GNU project introduced a variant of this technique that uses two dashes and a full word instead of a single dash and letter, intended as a more intuitive and easier-to-remember syntax. As an example, GNU's version of ls has color highlighting that is enabled by the command-line flag --color=always. (Typically, GNU programs offer the traditional single-dash/single-letter forms in addition to the longer forms.)
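The equivalence of bundled and separate single-letter options is easy to verify directly (the long-option line assumes GNU ls, as noted above):

```shell
cd "$(mktemp -d)"
touch visible .hidden

# Bundled single-letter options produce the same output as separate ones.
listing1=$(ls -la)
listing2=$(ls -l -a)
[ "$listing1" = "$listing2" ] && echo "flags are equivalent"
# prints "flags are equivalent"

# GNU long-option form: two dashes and a full word (GNU ls only).
ls --color=always > /dev/null
```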
There are other forms of command-line options, too. Some programs, such as the Concurrent Versions System (CVS), a software change management tool, take subcommands, which are simple words. The command cvs checkout, for example, fetches a module from a CVS repository. There are other variants as well, and they vary in complexity. The manual page (accessed via the command "man cvs") or other documentation is the best bet, but it's useful to be aware of the patterns.
Once the software has been configured for features, compiled, and installed in the appropriate directory, it might need to be configured for actual use. This differs from the first step—compile-time configuration—in which the software is configured before building it, to activate or deactivate features and behaviors. In this step, the software will be configured to actually run; this is known as run-time configuration. For example, the Apache web server can be configured to either activate or deactivate support for dynamic shared objects before compiling it, but in either case it must still be configured with values required to actually run correctly, such as the location of the HTML files it is to serve up, or the TCP/IP port number to use.
There are a number of different ways to configure software, and different packages will take different approaches. Generally, though, there are some traditional techniques used on Unix-like systems that most applications follow. The next five subsections will discuss the following most common ways of configuring software:
Global configuration files
Drop-in configuration file directories
Flat files or directories
Servers launched by inetd
Libraries
Some are typical of all Unix-like systems, and some are more specific to Linux systems or particular distributions.
The canonical way to configure software for a Unix-like system is to have the software read one or more specially-named configuration files in the /etc directory. For example, the file /etc/sendmail.cf is used to configure the sendmail SMTP server. Sometimes, applications that require more than one configuration file will use multiple files in a subdirectory of /etc. For instance, by default the OpenSSH package reads its client and server configuration files out of /etc/ssh. (These locations are frequently hard-coded at compile-time and can usually be altered by rebuilding the package.) This is probably the most common way of configuring software.
The drop-in file configuration directory is a technique used extensively by Red Hat's Linux distribution, and it's catching on in other systems as well. This technique is used when a single software package can have separate configurations for separate scenarios. The software then defines a single directory (typically as a subdirectory of /etc) and reads a separate configuration from that directory for each scenario. A scenario can be anything from a different user to a different software application—anything that requires its own unique configuration. For example, the xinetd package stores configuration files in /etc/xinetd.d and has a separate configuration file for each inetd service that it is to manage. The PAM (Pluggable Authentication Modules) package, meanwhile, reads configuration files out of /etc/pam.d and has a separate file for each service or program for which it manages authentication.
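A drop-in file for xinetd looks something like the following. The service name and server path are hypothetical, and the exact set of attributes accepted is documented in the xinetd.conf manual page:

```
# /etc/xinetd.d/my-service  (hypothetical example)
service my-service
{
    socket_type = stream
    protocol    = tcp
    wait        = no
    user        = nobody
    server      = /usr/local/bin/my-service-daemon
    disable     = yes
}
```

Each service gets its own such file, so installing or removing a service touches only that one file, never a shared configuration.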
The advantage of this approach is that software can be customized for a given scenario by altering a file specific to that scenario, rather than by modifying the contents of a single file containing configuration for all of the scenarios. For example, a new inetd service can be configured with xinetd by adding a file to /etc/xinetd.d for that service. The alternative would be to have the service being installed modify a single shared configuration file, which is prone to error and might end up breaking another service's configuration. As mentioned previously, this technique is used extensively in Red Hat Linux, and it is discussed in more detail in Chapter 4.
The drop-in configuration file technique works best with software that is providing some kind of service to other software. In such cases, the software that provides the service doesn't know in advance how many other applications need to use it. So, the service software simply provides a "hook" for other programs to add their own configuration data to the mix. Look for uses of this technique wherever a program doesn't know in advance how much it will be used. It's becoming quite popular!
Some applications simply locate their configuration files in the same directory in which they are installed. Unix applications normally place their configuration files in the /etc directory, however, so this technique is most commonly seen in applications that were ported to Unix from another platform, or that have no global configuration requirements at all (such as the OpenOffice productivity application).
Additionally, users sometimes install software in a specific directory that differs from the default. For example, if the Apache web server is built from source and installed under /usr/local/apache-2.0, its configuration files will be found in a subdirectory of that tree (/usr/local/apache-2.0/conf, in Apache's case) rather than in simply /etc.
Some server applications are not stand-alone servers, but rather services managed and launched on demand by the inetd superserver. Services such as these do not have their own configuration files, but are instead passed whatever information they require by inetd when they are launched. The inetd program, in turn, provides a way to specify configuration information for each service. There are several different implementations of the inetd server, and each has its own method of configuring the services.
Some software is not intended to be used by itself, but is rather providing functionality other programs rely on (i.e., some software is shipped as libraries used by other programs). This software may or may not have any configuration needs. For example, the Dante library, which provides SOCKS5 functionality, requires a configuration file defining which servers and addresses Dante uses, while the OpenSSL library, which provides various cryptographic algorithms, requires no run-time configuration at all. User libraries that require configuration information generally follow one of the other patterns (such as flat configuration files and the drop-in directory) discussed in the earlier subsections.
Sometimes software requires (or simply allows) that users set additional configuration values. Typically this is user-specific information such as home directory, location of various files, etc., and it is usually handled by setting shell variables. For the newly installed software to function correctly for a given user, the user may need to set some environment variables in her login shell.
Users can always do this themselves, and for software that's used only by a small number of users, that works fine. However, commonly used software demands that many users set the environment variables, and this quickly becomes an administrative burden. In these cases, it is necessary for the administrator to configure the software globally; however, how this is done varies from system to system.
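One common global mechanism (the Red Hat style mentioned below) is a snippet dropped into /etc/profile.d, which login shells source automatically. The sketch below writes the snippet to a scratch directory so it can be tried without root privileges, and the Java paths are hypothetical:

```shell
# A global environment snippet in the style of /etc/profile.d/*.sh.
# (Written to a scratch directory here; as root it would go in /etc/profile.d.)
dir="$(mktemp -d)"
cat > "$dir/java.sh" <<'EOF'
# Environment for a (hypothetical) Java installation under /opt.
export JAVA_HOME=/opt/java-1.4
export PATH="$JAVA_HOME/bin:$PATH"
EOF

# Login shells source every *.sh file in /etc/profile.d; simulate that:
. "$dir/java.sh"
echo "$JAVA_HOME"   # prints "/opt/java-1.4"
```

With the snippet in place globally, every user's login shell picks up the settings, and no per-user configuration is needed.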
The user environments for Red Hat Linux, Debian, and Slackware are described in Chapters 4, 5, and 6, respectively; the information in those chapters should be enough of a head start to figure out how to customize any Unix-like system's global user environment. In particular, see Red Hat's mechanism in Chapter 4. It appears to be catching on even among non–open source applications.
For end-user applications, installation is complete once the user environment is configured. However, server applications are obviously only useful when they are running, and so servers need to be hooked into the startup process of the computer. Each Unix system does this in its own way, but there are two general methods in common use.
One method is the set of startup scripts used by AT&T System V (SysV), and the other is the equivalent startup script mechanism used by BSD. These two systems serve the same purpose, but they are very different in practice. The SysV approach uses the "drop-in file configuration directory" technique described earlier in this chapter, while the BSD approach is simpler but relies on a single file. Red Hat Linux uses the SysV technique, while Debian and Slackware use the BSD technique. For more information on the SysV technique, see Chapter 4; for information on the BSD approach, see Chapter 5.
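The shape of a SysV-style startup script can be sketched as follows. The daemon name and paths are hypothetical, and real scripts typically also handle restart and status arguments and are linked into the runlevel directories (covered in Chapter 4):

```shell
cd "$(mktemp -d)"

# Minimal skeleton of a SysV-style init script (hypothetical daemon).
cat > my-daemon.init <<'EOF'
#!/bin/sh
case "$1" in
    start)
        echo "Starting my-daemon"
        # /usr/local/sbin/my-daemon &
        ;;
    stop)
        echo "Stopping my-daemon"
        # kill "$(cat /var/run/my-daemon.pid)"
        ;;
    *)
        echo "Usage: $0 {start|stop}"
        exit 1
        ;;
esac
EOF
chmod +x my-daemon.init

./my-daemon.init start   # prints "Starting my-daemon"
```

The init system invokes such a script with "start" at boot and "stop" at shutdown, which is what "hooking into the operating system" amounts to under the SysV scheme.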