Packaging Format | Tuning and Customizing a Linux System


Red Hat Linux uses the Red Hat Package Manager (RPM) as both its package format and its package management tool. RPM was developed explicitly for Red Hat Linux. This section discusses RPM in detail, first describing some general properties of the format and then covering its important features. Once you've read this section, you'll have a working understanding of how to manage software on an RPM-based system, and you'll know where to look to find out how to do the more esoteric tasks.

RPM was one of the first package management tools for Linux systems, but it is currently at version 4, meaning that it has been significantly overhauled three times since the original version. RPM is consequently a relatively mature package management tool and compares favorably to most commercial package management tools. In fact, it's such a solid product that even commercial Unix vendors are beginning to include tailored installations of it. RPM is really only rivaled or exceeded by the Debian packaging system.

RPM is a pretty typical example of a package management tool for Unix systems, though it far exceeds most in actual functionality. Such utilities typically allow you to install and uninstall software packages. RPM provides rich functionality for these basic tasks, but it also goes beyond them by providing upgrade and dependency-validation semantics, as well as a way to list installed packages and even query packages (both installed and uninstalled) for documentation. In addition, RPM has functionality supporting source RPMs, which are packages containing the source code for a particular software package. Users can use source RPMs to build software with customizations different from the defaults for a given package or to build the software on an architecture or platform not supported by a given package.

You can think of RPM in a certain sense as a sort of database for software information. Relational databases (such as those used for web sites or other applications) store their data so that it can be queried or updated in a variety of ways. Similarly, the RPM format allows administrators and users to query installed packages for a variety of information and to "insert" or "remove" applications by installing or uninstalling them. Also, just as relational databases enforce transactions and atomic operations, RPM enforces software dependency requirements and tries to ensure atomic operations.

Note

An atomic operation is one that can't be interrupted and is guaranteed to either complete successfully or not at all—an atomic operation can't partially succeed and leave behind a mess.

Technical Summary

This section covers some of the basic details of an RPM package. In particular, the conventions for naming RPM files and the properties of the file format itself are discussed.

File Naming Conventions

By convention, RPM file names have this format:

 package-version-build.arch.rpm

where "package" is the name of the software, "version" is the release version of the software, "build" is the version of the particular RPM package, and "arch" is the architecture that the RPM was compiled for.

For example, an RPM file for glibc v2.2.5 built for Intel-based systems would have a file name of the form glibc-2.2.5-build.i386.rpm. The version of this file that actually shipped with Red Hat Linux 7.3 is glibc-2.2.5-34.i386.rpm; in this case, "-34" indicates that this was the thirty-fourth build of this RPM that Red Hat created before releasing Red Hat Linux 7.3. (Generally, these incremental builds include minor tweaks to the package layout or configuration options, and occasionally minor bug fixes.)
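The naming convention can be taken apart mechanically. The following is a minimal sketch in plain shell; parse_rpm_name is a hypothetical helper written for this illustration, not a command that ships with RPM, and it assumes the file name follows the convention exactly.

```sh
# Split an RPM file name of the form package-version-build.arch.rpm.
# parse_rpm_name is a hypothetical helper, not part of RPM itself.
parse_rpm_name() {
    file=${1##*/}          # drop any leading directory
    base=${file%.rpm}      # drop the .rpm suffix
    arch=${base##*.}       # text after the last dot
    nvr=${base%.*}         # package-version-build
    build=${nvr##*-}       # text after the last hyphen
    nv=${nvr%-*}           # package-version
    version=${nv##*-}      # text after the next-to-last hyphen
    name=${nv%-*}          # everything else, hyphens included
    printf 'name=%s version=%s build=%s arch=%s\n' \
        "$name" "$version" "$build" "$arch"
}

parse_rpm_name glibc-2.2.5-34.i386.rpm
# prints: name=glibc version=2.2.5 build=34 arch=i386
```

Working from the right-hand end matters because package names may themselves contain hyphens, while the version and build fields may not.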

Architectures

As mentioned in the previous section, RPMs are built for specific architectures. An architecture is usually simply a type of microprocessor. Most installations of Linux are on systems with Intel microprocessors in the i386 family, but Linux runs on many other processors, and you can find RPMs for these platforms as well. Table 4-1 lists some common architectures and the name of that representation in an RPM file name.

Table 4-1: RPM Architectures
ARCHITECTURE	VENDOR	NAME IN RPM	COMMENT
Intel 386 family	Intel	i386	Most common RPM format.
Intel Pentium	Intel	i586	Makes use of Pentium-specific performance optimizations.
Intel Pentium Pro	Intel	i686	Makes use of optimizations for Pentium Pro, Pentium III, or Pentium IV.
Alpha	Compaq	alpha	Compaq's high-performance RISC processor. This is an actual architecture and it does not indicate a test version of the software!
PowerPC	IBM, Motorola, Apple	ppc	IBM's PowerPC architecture used in Apple Macintoshes and certain IBM workstations.
SPARC	Sun Microsystems	sparc	Processor used in Sun workstations and servers.
Generic	N/A	noarch	The RPM does not contain any processor-specific binary files—for example, it might consist entirely of shell scripts.
Source code	N/A	src	Contains only source code. When compiled for a specific architecture, the output will be a binary RPM for that platform.

Package File Format

The RPM file format is based on cpio; that is, an RPM file is essentially a cpio archive (normally compressed) with some extra header data prepended to it. The RPM distribution itself comes with a tool (rpm2cpio) that converts an RPM file into the underlying cpio archive. The cpio command on a Unix system is similar to tar, but it has a slightly different usage.

RPM is more than just a fancy file storage format, however. RPM archives (sometimes called packages) include additional information on dependencies that the software has. RPM also allows for scripts to be run when the package is installed or uninstalled, and these scripts can perform additional work that RPM itself doesn't do automatically. An RPM installation will enforce these dependencies and reject an administrator's attempt to install a file if the dependencies are not met (unless the user explicitly overrides this behavior).
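Because rpm2cpio writes the underlying cpio archive to standard output, the contents of a package can be examined without installing it. The sketch below assumes rpm2cpio and cpio are both on the PATH; the two function names are hypothetical, and the file names in the comments are only examples.

```sh
# list_rpm_payload and extract_rpm_payload are hypothetical helper names.
# Both rely on rpm2cpio (shipped with RPM) and the standard cpio command.

# Print the names of the files stored in an RPM without installing it.
list_rpm_payload() {
    rpm2cpio "$1" | cpio -it
}

# Unpack an RPM's files beneath a scratch directory instead of /.
extract_rpm_payload() {
    pkg=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # -i extracts, -d creates directories as needed, -m keeps timestamps.
    rpm2cpio "$pkg" | (cd "$dest" && cpio -idm)
}

# Example (the file and directory names are illustrative):
#   list_rpm_payload zsh-4.0.4-5.i386.rpm
#   extract_rpm_payload zsh-4.0.4-5.i386.rpm /tmp/zsh-unpacked
```

Note that unpacking a package this way bypasses the dependency checks and scripts described above, so it is suited to inspection rather than installation.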

The cpio and tar Commands

Two common formats for packaging and distributing files on Unix-like systems are cpio and tar. These formats are named after the programs used to create and manipulate them. The tar command is short for "tape archive" and was originally used to shuffle data to and from magnetic tapes, usually for backup purposes. However, the tool can also be used to create and manipulate archives of ordinary files. The cpio command is very similar to tar, and its name is short for "copy in, copy out."

These two commands are used to create files that contain other files. They are very similar to the ZIP file format commonly used on other operating systems such as Windows and IBM's OS/2. Unlike ZIP, however, they do not themselves provide any compression of the data, but rather are simply "containers" of other files. Typically, one of these archives is compressed via a command such as the GNU Zip (gzip) program—for example, a gzip-compressed tar file becomes the ubiquitous .tar.gz file.

These file formats differ from a more sophisticated packaging format such as RPM. Whereas a tar or cpio archive simply contains other files for "transportation", an RPM archive augments the files themselves with additional information (such as dependencies on other files not included in the archive or installation scripts) that is required to correctly install or use the files.
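The "container first, compression second" arrangement described above can be seen in a short experiment. This is only a sketch: it works in a temporary directory, the file names are made up, and the cpio step is skipped when cpio is not installed.

```sh
# Build a small tar archive, compress it with gzip, and list the result.
# All paths are temporary and purely illustrative.
workdir=$(mktemp -d)
mkdir "$workdir/demo"
echo "hello" > "$workdir/demo/README"

# tar only bundles the files; gzip does the compression afterward.
tar -cf "$workdir/demo.tar" -C "$workdir" demo
gzip "$workdir/demo.tar"                 # produces demo.tar.gz

# Decompress to a pipe and let tar list the contents.
listing=$(gzip -dc "$workdir/demo.tar.gz" | tar -tf -)
echo "$listing"

# cpio reads the list of files to archive from standard input.
if command -v cpio >/dev/null 2>&1; then
    (cd "$workdir" && find demo | cpio -o > demo.cpio) 2>/dev/null
fi
```

Neither archive records anything about dependencies or installation scripts, which is exactly the gap a packaging format such as RPM fills.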

Using RPM

This section covers the basics of using RPM to manage software installations. Red Hat's documentation on RPM is quite good, so rather than duplicate that information, this section and the following ones cover only the basics. Readers who need more detail should consult the RPM manual page (via the man rpm command) or the excellent documentation on Red Hat's web site (http://www.redhat.com/docs/books/max-rpm/index.html).

Program Modes

The command for managing RPM packages is simply rpm. This single program has the following six "modes" that it operates in to perform its various tasks:

Query
Install
Upgrade
Verify
Erase
Rebuild

Each of these modes has additional options that let users perform specific tasks. Again, for full documentation, consult the information available from Red Hat. The following sections present details on how to perform several common, useful tasks.

Access Permissions

The RPM database is a tool for managing the software installed on the system. Obviously, if a system is shared among many users, only an administrator should be able to install software. However, users may still wish to be able to check on installed packages; for this reason, the RPM database is usually installed with "world-readable" but only "root-writable" permissions. Thus, as a general rule of thumb, normal users may execute any query, but only the root user will be able to execute any command that would modify the contents of the disk, such as an installation or deletion.

Dependency Checks

Many of the commands described in this section rely on dependency checks. That is, RPM will determine whether it can perform an operation based on whether or not the operation would result in software with broken or unfulfilled dependencies. As a convenience, RPM will also include all package or file names given to it in this dependency checking. For example, the command

 rpm -i foo.i386.rpm bar.i386.rpm

will install both RPMs at the same time. Aside from saving the user the effort of issuing two rpm commands, this allows users to install multiple RPMs at the same time that might be mutually dependent on each other. Most of the RPM operations have this behavior.
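A cautious way to try this out is rpm's --test option, which runs the conflict and dependency checks for an installation without changing anything on disk. The wrapper below is a minimal sketch; check_install is a hypothetical name, and the file names in the comment are the placeholders used in the text.

```sh
# check_install is a hypothetical wrapper. --test makes rpm run its
# conflict and dependency checks without installing anything.
check_install() {
    if rpm -i --test "$@"; then
        echo "ok to install: $*"
    else
        echo "rpm reported problems; nothing was installed" >&2
        return 1
    fi
}

# Example, using the file names from the text:
#   check_install foo.i386.rpm bar.i386.rpm
```

Passing both files in one invocation lets RPM satisfy each package's requirements from the other, which two separate test runs could not do.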

Understanding Dependencies

A package has a dependency on another package or file if it requires the presence of that package or file to properly function. These dependencies can take any form. A program might require a particular shared library in order to function, or it might require that another program first be installed. (This program might be used in configuring or operating the new program.) Alternatively, a package of documentation files might require the presence of a reader program for those files.

As a concrete example, the KDE window manager and desktop system requires that XFree86 be installed, as well as the Qt windowing toolkit library. Meanwhile, the Washington University FTP server (wu-ftpd) requires that an inetd program (such as xinetd) be present. Keeping track of all these dependencies can become burdensome, and so the RPM system (like other packaging systems) was created to automate these tasks.

Querying Uninstalled Packages

RPM is a distribution format as much as it is a package management format. That is, an important goal of RPM is to provide a canonical, standardized way for distributors of software to package their applications so that end users can easily install those applications. To this end, the RPM format and the rpm program provide the capability to query uninstalled RPM packages for information about the software. This allows users to find out what the RPM will do and what it will install before they actually install it.

This is the general form of an RPM query command:

 rpm -q <options> <target>

<target> is either the name of an installed package or a path to an RPM file, and <options> are additional letters specifying the type of information to be retrieved and whether the desired package is already installed.

To query a package that has not been installed yet, the form is

 rpm -qp <options> <filename>

<options> in this case must indicate what information is to be retrieved, and <filename> must be the path to the RPM file to be queried. There are many options for obtaining information, but these four are particularly useful:

General information
File listing
Requirements
Provided capabilities

Table 4-2 lists the RPM commands required to retrieve these query types and describes what details are provided by each.

Table 4-2: RPM Package Queries
QUERY TYPE	RPM COMMAND	INFORMATION PROVIDED
General information	rpm -qpi	General information about the package, such as size in bytes and a text description of the package
File listing	rpm -qpl	A list of all files the package will install
Requirements	rpm -qpR	A list of all capabilities (such as shared libraries) required by the software
Provided capabilities	rpm -qp --provides	A list of all capabilities the software provides

Note

Each of these RPM commands must be followed by the file name of the RPM package to be queried—for example, rpm -qpi glibc-2.2.5-34.i386.rpm

Additional queries that you can perform on uninstalled packages include listing configuration and documentation files, listing the scripts used by the package, and so forth. Some of the more advanced RPM queries are actually quite powerful but can become a bit arcane. Interested readers should consult Red Hat's documentation on RPM.
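The four queries from Table 4-2 are often run together when sizing up an unfamiliar package. The function below is a small sketch of that habit; inspect_rpm is a hypothetical name, and the section headings it prints are its own rather than anything rpm produces.

```sh
# inspect_rpm is a hypothetical helper that runs the four queries from
# Table 4-2 against one uninstalled package file.
inspect_rpm() {
    pkg=$1
    echo "== General information =="
    rpm -qpi "$pkg"
    echo "== File listing =="
    rpm -qpl "$pkg"
    echo "== Requirements =="
    rpm -qpR "$pkg"
    echo "== Provided capabilities =="
    rpm -qp --provides "$pkg"
}

# Example (any RPM file will do):
#   inspect_rpm glibc-2.2.5-34.i386.rpm | less
```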

Installing Packages

Installing packages with RPM is quite easy. The command used is simply

 rpm -i <filename>

Two useful options are --force and --nodeps, which respectively force the package to be installed even if it conflicts with existing files and instruct RPM to ignore dependency checks when installing the package.

Caution

Use the --force and --nodeps options with great care, since they are overriding RPM's default behavior.

Users may take advantage of additional options to instruct RPM to customize the installation of a package, but generally installations are as simple as executing the previous command.

RPM goes through the following routine when installing a package:

It extracts a file list from the package and checks it against its database of installed files, looking for conflicts.
It checks the dependencies listed by the package to make sure they are met. If all is well, RPM then executes any preinstallation scripts identified by the package.
The files are extracted from the archive and placed in the appropriate locations on the disk.
Finally, RPM executes any postinstallation scripts identified by the package.

After these steps are completed, the package is installed. However, sometimes a given package that has already been installed may need to be notified when a related package is installed. For example, the xinetd package may need to be notified whenever any applications that use xinetd are installed. In cases such as this, RPM allows a package to designate trigger scripts that get executed whenever certain conditions are met during a subsequent package installation. Executing any relevant triggers is the last step in installing an RPM package. In practice, comparatively few RPM packages actually use triggers, but the functionality is available.
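Since the preinstallation, postinstallation, and trigger scripts run with the installer's privileges, it can be worth reading them before installing a package from an unfamiliar source. The sketch below uses rpm's --scripts and --triggers query options; preview_package_scripts is a hypothetical name.

```sh
# preview_package_scripts is a hypothetical helper name. The --scripts
# and --triggers query options print the shell code a package carries.
preview_package_scripts() {
    echo "== install and uninstall scripts =="
    rpm -qp --scripts "$1"
    echo "== trigger scripts =="
    rpm -qp --triggers "$1"
}

# Example (the file name is illustrative):
#   preview_package_scripts zsh-4.0.4-5.i386.rpm
```

Many packages will print little or nothing here, since scripts and triggers are optional.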

Upgrading Packages

If one thing is true of software, it's that it's never finished. Whenever a new version of software is released, users will probably want to upgrade their copies. RPM allows users to upgrade the packages installed on their system to a later version.

The command for an RPM upgrade is

 rpm -U <filename>

The process for upgrading an RPM is similar to installing one, except that RPM first checks to see if an earlier version of the software is already installed and, if so, removes it once the new version is in place. (If the software is not already present, rpm -U simply installs it; the related -F option, for "freshen," upgrades only packages that are already installed.) Most of the same options available for an RPM installation also work for an upgrade. Using an RPM upgrade essentially saves the user from having to manually erase the old RPM and install the new one.

Beware the Case!

The syntax for an RPM upgrade is rpm -U <filename>; however, older versions of RPM used a lowercase "u" to mean "uninstall"! Current versions of RPM use the -e option (for "erase") instead of -u, but you should take care nonetheless, especially when using older Red Hat Linux systems.

Querying Installed Packages

One of the most powerful features of RPM is its capability to query the system for installed packages. This allows users to easily check whether a given piece of software is installed (and if so, where it is located), rather than have to hunt around the system to find out the hard way. It also allows administrators to keep track of a system, since they can generate a list of all software installed. Because the RPM database also keeps track of the additional information described earlier in this chapter under the "Querying Uninstalled Packages" section, RPM also provides access to documentation and dependency information on installed packages.

Querying an installed package is almost identical to querying a package that hasn't been installed yet. The only difference is that instead of executing the command

 rpm -qp <options> <filename>

users simply execute the command

 rpm -q<options> <package>

and substitute the name of an installed package for the file name of an uninstalled package. Otherwise, all of the query options listed in Table 4-2 can be used.

One notable variant of the package query command is the option to query all packages. The command

 rpm -qa

allows users to perform a query on all packages. Normally, this is used to obtain a list of all installed packages, but users can also add the query options listed in Table 4-2. Be forewarned, though: Commands such as rpm -qai will generate a lot of output!
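Queries are also convenient in scripts, because rpm -q exits with a nonzero status when the named package is not installed. The following sketch builds a small report on that behavior; is_installed and report_packages are hypothetical names.

```sh
# is_installed is a hypothetical helper: rpm -q exits with a nonzero
# status when the named package is not in the database.
is_installed() {
    rpm -q "$1" >/dev/null 2>&1
}

# Print one line per requested package.
report_packages() {
    for pkg in "$@"; do
        if is_installed "$pkg"; then
            echo "$pkg: installed as $(rpm -q "$pkg")"
        else
            echo "$pkg: not installed"
        fi
    done
}

# Example (package names are illustrative):
#   report_packages glibc zsh wu-ftpd
```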

Verifying Installed Packages

Over the course of a system's normal usage, files get changed. This could be as innocent as simply changing a configuration file in the /etc directory or as problematic as a file being corrupted by a power failure. To assist in the detection of these changes, RPM allows users and administrators to verify the integrity of an installed package.

The command

 rpm -V <package>

will verify a single installed package, and the command

 rpm -Va

will verify all installed packages. In either case, RPM computes the MD5 checksum of each file on the disk (and checks attributes such as its size, ownership, and permissions) and compares the results to the values stored from the original RPM. The output of the command is one line for each file that differs from the version that was originally installed, along with a code indicating how the file has changed.

RPM verification is useful for detecting accidentally damaged files and for maintaining a list of files that have been manually changed. It is important to note that many files (such as the device node files in /dev) get changed almost immediately after installation, so there will almost always be changed files on any given system.
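The change code at the start of each line of rpm -V output is a string of flag characters, with a dot wherever a test passed. The sketch below expands the letters used by RPM 4.0 (S, M, 5, D, L, U, G, and T); explain_verify_flags is a hypothetical helper, and it simply ignores any additional characters that other versions of RPM may print.

```sh
# explain_verify_flags is a hypothetical helper that expands the change
# code rpm -V prints at the start of each line (for example "S.5....T").
explain_verify_flags() {
    flags=$1
    out=
    while [ -n "$flags" ]; do
        rest=${flags#?}
        c=${flags%"$rest"}      # first character of $flags
        flags=$rest
        case $c in
            S) out="$out size" ;;
            M) out="$out mode" ;;
            5) out="$out md5" ;;
            D) out="$out device" ;;
            L) out="$out symlink" ;;
            U) out="$out owner" ;;
            G) out="$out group" ;;
            T) out="$out mtime" ;;
        esac
    done
    echo "changed:${out:- nothing}"
}

explain_verify_flags S.5....T
# prints: changed: size md5 mtime
```

A hand-edited configuration file typically shows exactly this pattern: its size, checksum, and modification time differ, while its ownership and permissions do not.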

Also, while RPM verification can be a crude first line of defense in detecting an unauthorized intrusion (such as a cracker gaining root on the machine), there are better tools available, such as Tripwire, which comes with many distributions, including Red Hat 7.3. RPM's package verification functionality should be used strictly as a damage detector, and should not be relied on as a security tool. You can find the open source version of Tripwire at http://www.tripwire.org.

Rebuilding Packages

Sometimes a user will want to install a piece of software with customizations different from those in the default RPM as distributed, or will be running Linux on an architecture for which there is no binary RPM. Sometimes a software vendor simply can't keep track of all the possible target configurations of their software and wishes to provide a convenient package of source code for their product. To handle these cases, Red Hat's RPM format includes support for source RPMs that contain source code for software. These RPMs have the extension .src.rpm and they can be rebuilt into binary RPMs.

Recall that an RPM file is really a cpio file with some additional information. This additional information is generated from a file called a spec file (which is short for "specification file"). A spec file contains all the information needed to build an RPM, such as the list of dependencies, the list of required capabilities, and so forth. Spec files are written by the developers of the software or by the packager of the RPM file. All that is needed to build a binary RPM file is the spec file and the source code.

There are two ways to rebuild an RPM package. If a source RPM is available, the command

 rpm --rebuild <filename>

will rebuild that file into a binary RPM (assuming no errors occur during the build). This is the simplest case, and it will work for almost all source RPMs. However, the RPM command supports additional options for conveniently building RPM files out of other formats.

Sometimes, the developers of a piece of software will release their application in some neutral format, such as a tarball, but include a spec file with it. In this case, the tarball can be placed in the special directory /usr/src/redhat/SOURCES, and the spec file can be placed in /usr/src/redhat/SPECS. Then, the command

 rpm -ba /usr/src/redhat/SPECS/<filename>

will rebuild the tarball according to the directions in the spec file. The output will be an RPM file in /usr/src/redhat/RPMS/<arch> (where <arch> is the architecture of your system, such as i386, as listed in Table 4-1). This RPM file can then be installed normally, like any other RPM file. It can also be copied to another system and installed there (assuming any dependencies are met); in fact, this is generally how most RPMs are created for distribution.
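One caveat for readers on newer systems: beginning with RPM 4.1, the build modes were moved out of rpm into a separate rpmbuild command that accepts the same --rebuild and -ba options. The sketch below picks whichever command is available; the three function names are hypothetical, and the file names in the comments are placeholders.

```sh
# pick_build_tool is a hypothetical helper. RPM 4.1 and later moved the
# build modes into a separate rpmbuild command, so prefer it when present.
pick_build_tool() {
    if command -v rpmbuild >/dev/null 2>&1; then
        echo rpmbuild
    else
        echo rpm
    fi
}

# Rebuild a source RPM into a binary RPM.
rebuild_srpm() {
    "$(pick_build_tool)" --rebuild "$1"
}

# Build from a spec file whose sources are already in the SOURCES directory.
build_from_spec() {
    "$(pick_build_tool)" -ba "$1"
}

# Examples (file names are placeholders):
#   rebuild_srpm zsh-4.0.4-5.src.rpm
#   build_from_spec /usr/src/redhat/SPECS/zsh.spec
```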

Uninstalling Packages

Occasionally software must be uninstalled from a system. This is historically one of the most difficult tasks about managing a computer system. Windows users are familiar with DLL hell, which refers to the problem of "orphaned" shared libraries and multiple (incompatible) versions of libraries installed or left behind by several programs. On RPM-based systems, however, this problem is greatly reduced.

Because the RPM database keeps track of all the files installed by a given package and ensures that one package doesn't replace a file owned by another package, RPM always knows which files belong to a given package, making it easy to uninstall a package. RPM's capability to track the dependency information of packages also allows it to make sure that a user doesn't delete a package that provides a capability required by another package. These behaviors can be overridden or ignored, but generally RPM is very successful at uninstalling software.

The command to uninstall an RPM package is simply

 rpm -e <package>

Several of the options for install and upgrade (for example, --nodeps) are also supported by the erase operation, although --force is not, since only installations and upgrades can be forced. If RPM determines that it is safe to uninstall the software, it removes the package and the command simply returns without printing anything. If a dependency check fails or some other error occurs, however, RPM will indicate what the error is and will not remove the package. If the problem is that another package depends on the package being removed, the user must decide whether both programs can be removed.
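The erase mode also accepts the --test option, which reports dependency problems without removing anything. A cautious removal might therefore look like the sketch below; erase_if_safe is a hypothetical name.

```sh
# erase_if_safe is a hypothetical wrapper: do a dry run first, and only
# erase the package for real when the dry run reports no problems.
erase_if_safe() {
    if rpm -e --test "$1"; then
        rpm -e "$1"
    else
        echo "not removing $1; see the messages above" >&2
        return 1
    fi
}

# Example (the package name is illustrative):
#   erase_if_safe zsh
```

RPM performs the same checks during a real erase, so the dry run is redundant in the success case; its value is in showing what would break before anything is committed.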

Creating New Packages

Creating a new RPM package generally involves creating a spec file and using the RPM program to build an RPM file from a source code archive according to the directions in the spec file. Creating a spec file requires extensive knowledge of the software being distributed, and so it generally must be created by a developer or other individual familiar with the software. There are many sources of documentation on how to create a spec file and RPM, so interested readers and developers should consult these sources. One excellent source is Red Hat's book Maximum RPM, which you can find at http://www.rpm.org/max-rpm.

Additional Functionality

RPM has some extra features that make it easier to use. These features don't include any extra functionality beyond the basic operations discussed earlier in this chapter, but rather make it easier and safer to use RPM for those operations. Specifically, RPM supports cryptographically signed RPM files and operations on RPM files over a network.

Most users of Red Hat Linux either install the system off of some physical medium such as a CD or install it over the network from a public server. In either case, it is extremely common for users to download RPM files from the Internet and install them on their systems. After all, network connectivity is a large part of the success of Linux systems. As a convenience for these operations, RPM has built-in support for automatically downloading RPM files from FTP or HTTP servers. This allows users to install an RPM package in one command rather than having to manually download a file. For example, the rather ponderous command

 rpm -i ftp://ftp.redhat.com/pub/redhat/redhat-7.3-en/os/i386/RedHat/RPMS/zsh-4.0.4-5.i386.rpm

will install the zsh package for Red Hat Linux 7.3 from Red Hat's public FTP server.

However, downloading files over the network can be dangerous, as it's possible that hostile users could place "Trojan horse" packages that appear to be authentic but in fact contain hostile code that could damage a user's system or expose it to attack. To ameliorate this risk, RPM is able to use the GNU Privacy Guard (GPG) program to verify the authenticity of RPM files that have been cryptographically signed. You can access the GPG program via the gpg command. For more information, visit the GNU Privacy Guard web site at http://www.gnupg.org.
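In practice the check is run with rpm --checksig (also spelled -K) on a downloaded file before installing it. The sketch below assumes the packager's public key has already been made available to GPG; install_if_verified is a hypothetical name. Be aware that a package carrying no signature at all can still pass on its checksum alone, so the output of --checksig should be read and not just its exit status.

```sh
# install_if_verified is a hypothetical wrapper: check the package's
# checksum and signature first, and install only when the check passes.
install_if_verified() {
    if rpm --checksig "$1"; then
        rpm -i "$1"
    else
        echo "signature check failed for $1; not installing" >&2
        return 1
    fi
}

# Example (the file name is illustrative):
#   install_if_verified zsh-4.0.4-5.i386.rpm
```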

Mechanism vs. User Interface

So far, RPM has only been discussed as a tool for managing packages that the user already has in hand. RPM itself is also only a command line tool, and has no graphical user interface. The reason for these limitations is that RPM is focused strictly on package management. Red Hat's philosophy is to keep mechanism separate from user interface, and so RPM relies on additional tools to provide a pretty user interface.

Red Hat's up2date Program

Red Hat's preferred user interface tool is the up2date program. up2date is both a command-line program and a graphical program that communicates with a server to obtain software updates and install them. up2date keeps track of a system's profile (which is essentially a list of what RPMs the system has installed) and will update any packages for which there is a later version on the server. This allows users and administrators to closely track security and bug fixes released by the administrators of the server.

In a default Red Hat Linux installation, up2date is configured to use Red Hat's servers. Red Hat allows basic access to their servers for free but has a subscription fee plan based on up2date. Users who do not wish to pay for Red Hat's service (such as those managing a base of Red Hat Linux installations at a business) may either set up their own server or choose another provider if one is available.

Other GUI Programs

A myriad of other GUI programs act as a front end to RPM. For example, the GNOME and KDE projects each have their own such tools, named gnoRPM and kpackage, respectively. These programs are fairly similar in functionality and provide a point-and-click, drag-and-drop interface for managing RPM packages. Generally, these applications are also integrated into their desktops in various ways.

Fundamentally, RPM really is just a specification of a file format and a database of information on the RPMs installed in a system. Another packaging system can be compatible with the RPM format by simply supporting the file format. The Debian packaging system has some support for RPM files in its own tool, apt-get.

Examples of Using RPM

Like many tools on Unix systems, RPM is pretty sophisticated and even—dare I say it—arcane. Typically, much of the true power of RPM comes from using it with other commands. This section demonstrates some useful examples of RPM to perform common tasks. Table 4-3 lists some common commands and describes their output. Except where noted, these commands are generally safe—users should feel free to experiment with them.

Table 4-3: Useful RPM Commands
COMMAND	DESCRIPTION
rpm -qf `which <command>`	Shows the package that installed <command>.
rpm -ql <package> | grep /bin	Locates program binaries (e.g., files in /usr/bin) installed by <package>.
rpm -qfi `which <command>`	Shows the description of the package that installed <command>; may be useful for commands with no man page.
rpm -qfl `locate <file>`	Shows all files in the same package as <file>. (Watch out: locate may return more than one result!)
rpm -qa | grep <package>	Displays the exact package name for <package>; useful when you know a package is installed, but don't know its exact name.
rpm -qi `rpm -qa | grep <package>`	Shows the description of a package whose name isn't fully known.
rpm -e `rpm -qa | grep <package>`	Useful for erasing a package whose name isn't fully known. This command can be very dangerous; use it with caution.
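The danger in the last command is that grep may match several packages, all of which would then be erased. One way to reduce the risk, sketched below, is to insist on exactly one match before doing anything; find_unique_package is a hypothetical helper, not an rpm feature.

```sh
# find_unique_package is a hypothetical helper: it prints the single
# installed package whose name contains the pattern, and fails when the
# pattern matches zero packages or more than one.
find_unique_package() {
    matches=$(rpm -qa | grep -- "$1") || matches=
    count=$(printf '%s\n' "$matches" | grep -c . || true)
    if [ "$count" -ne 1 ]; then
        echo "pattern '$1' matched $count packages" >&2
        return 1
    fi
    printf '%s\n' "$matches"
}

# Example: erase only when exactly one package matches.
#   pkg=$(find_unique_package zsh) && rpm -e "$pkg"
```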

If you have read this entire section, you should now have a working knowledge of Red Hat's packaging format, RPM. With this knowledge, you should be able to manage the software installed on your system, and with a little exploration of the detailed documentation available from Red Hat, you'll be using RPM like a pro. The rest of this chapter gives you a similar understanding of the other key aspects of Red Hat Linux.
