9.4. Packaging Formats

A released product usually consists of many files, all of which need to be carefully installed in different places on each customer's machine. An appropriate file format for this collection of files has to be chosen for the released package of a product. Sometimes simply unpacking a collection of files is enough for the customer to use the product. For instance, when a product is distributed as source code, a customer unpacks the source files and then follows the build and installation instructions that are part of the package. This section describes some of the packaging formats used to release software.

For most products, simply extracting the files from a package is not nearly enough for a complete installation. Other steps in a successful installation may include running other programs, running tests on the customers' machines, and preserving existing data and configuration settings. The unpacking of files and each of the other steps could be done, one at a time, by the customer. However, it's often more convenient to run a single installation program and have it perform all of the different steps. Section 9.5, later in this chapter, describes some tools that can produce such installation programs, or installers.
The first guideline to follow when choosing a packaging format for your product is to use the most common format for each platform and language. Windows packages are commonly zip archives created with tools such as WinZip, Unix packages are often built with tar and then compressed with gzip or bzip2, and Java products are often distributed as JAR files. Red Hat Linux uses rpm files, Debian uses deb files, and some other GNU/Linux distributions have their own packaging formats. Since many products are downloaded rather than read from a CD or DVD, compressing the package is a normal part of releasing a product.[2] Quite often, a build tool that is favored for a particular language will also support the most common packaging format for that language. For example, Ant is used to build many Java products and can also generate JAR files.
9.4.1. Unix

The original packaging format is the one used by tar, a tape archive program that dates back to the early days of Unix. In an archive, or tarball, each file is prefixed with an ASCII header that holds information about the file, including its size, which tells tar where the next header in the archive begins. Each header also includes a simple checksum (not a full cyclic redundancy check) so that corrupted headers are detected. If all the files in an archive are ASCII, then the whole tar archive is also ASCII. Older tar files had limits on filename length (Solaris tar still does, apparently), but newer versions do not.

tar is most commonly used with a compression program such as gzip or bzip2 to produce compressed .tar.gz (alternatively, .tgz) or .tar.bz2 files, respectively. These compression programs are often fully integrated with tar nowadays, so creating a compressed tar file is done with a single command.

By default, a tar file preserves the directory hierarchy and the permissions of the files inside it. However, there is no support in tar itself for cryptographically signing the generated tar files. Another problem with the tar format is that extracting individual files is slow: the archive has no central index, so the headers have to be read one after another until the correct file is found. Although tar was originally Unix-based, some Windows tools such as WinZip can now also unpack tar archives.
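As an illustration of creating a compressed tar file in a single step, here is a minimal sketch using the tarfile module from Python's standard library rather than the tar command itself. The function names make_tarball and list_tarball, and the product-1.0 directory layout, are hypothetical and chosen only for this example.

```python
import os
import tarfile
import tempfile


def make_tarball(source_dir, archive_path):
    """Pack source_dir into a gzip-compressed tar archive in one step."""
    # Mode "w:gz" writes through gzip while archiving, so no separate
    # compression pass is needed. arcname stores paths relative to the
    # directory's own name instead of its full path on this machine.
    top_level = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname=top_level)


def list_tarball(archive_path):
    """Return (name, permission bits) for every member of the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return [(m.name, m.mode & 0o777) for m in archive.getmembers()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        # Hypothetical product layout: one executable script in bin/.
        product = os.path.join(work, "product-1.0")
        os.makedirs(os.path.join(product, "bin"))
        script = os.path.join(product, "bin", "run.sh")
        with open(script, "w") as f:
            f.write("#!/bin/sh\necho hello\n")
        os.chmod(script, 0o755)

        tarball = os.path.join(work, "product-1.0.tar.gz")
        make_tarball(product, tarball)
        for name, mode in list_tarball(tarball):
            print(f"{mode:o} {name}")
```

Listing the archive shows that both the directory hierarchy and the permission bits were recorded with each member, which is the default behavior described above.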
Two other packaging formats are also encountered on Unix systems: cpio and pax. cpio (http://www.gnu.org/software/cpio/manual) is intended more for system backups. pax (which may stand for "portable archive exchange") is designed to combine the strengths of tar and cpio. A good introduction to pax can be found at http://www.onlamp.com/pub/a/bsd/2002/08/22/FreeBSD_Basics.html. Both cpio and pax can read tar archives.

Though they are almost unheard of nowadays, you may come across shar archives, perhaps in old postings to USENET. These are simply shell scripts that unpack the files embedded within them.

Other common packaging formats for GNU/Linux are Red Hat's rpm (http://rpm.org) and Debian's deb (http://www.debian.org/doc/FAQ/ch-pkg_basics.en.html), which both add more information to the package formats so that the installers that use them can track which files were installed from various packages by using a local database. Internally, rpm uses cpio archives and deb uses gzip-compressed tar archives. Both formats can contain the binary executables or the source files for a package. An extensive comparison of the differences between rpm, deb, and gzip'd tar files can be found at http://kitenet.net/~joey/pkg-comp. A handy tool for checking that rpms are correctly constructed is rpmlint, which used to be found at http://people.mandrakesoft.com/~flepied/projects/rpmlint.

9.4.2. Windows

While there are versions of tar for most platforms, the most common packaging format for Windows is zip, a freely documented format from PKWARE (http://www.pkware.com), which also sells applications such as PKZip to create zip archives. Other Windows tools such as WinZip (http://www.winzip.com) and 7-Zip (http://www.7-zip.org) also work with the zip format. Some of these tools can extract files from many other packaging formats as well. Info-ZIP (http://www.info-zip.org) is an open source, highly portable zip tool that runs on both Windows and Unix.
Note that zip is not related to the compression utility gzip.
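One way to see that the two formats are unrelated is to compare the bytes they produce. The following sketch uses Python's standard zipfile and gzip modules; the helper names zip_bytes and gzip_bytes and the sample payload are hypothetical. A zip file is an archive that can hold many named files plus a directory of its contents, while gzip compresses a single stream of bytes, and the two start with different signatures.

```python
import gzip
import io
import zipfile


def zip_bytes(name, data):
    """Return an in-memory zip archive containing one named file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, data)
    return buffer.getvalue()


def gzip_bytes(data):
    """Return data compressed as a single gzip stream (no file names)."""
    return gzip.compress(data)


if __name__ == "__main__":
    payload = b"hello, packaging\n"
    zipped = zip_bytes("hello.txt", payload)
    gzipped = gzip_bytes(payload)

    # A zip archive begins with a local file header signature, "PK\x03\x04";
    # a gzip stream begins with the two magic bytes 0x1f 0x8b.
    print(zipped[:4], gzipped[:2])

    # A gzip stream is not a valid zip archive.
    print(zipfile.is_zipfile(io.BytesIO(zipped)))
    print(zipfile.is_zipfile(io.BytesIO(gzipped)))
```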
JAR, the standard packaging format for Java products, is an extension of the zip format, with optional signing and versioning abilities. Some of these extensions were later added to the zip format after JAR was defined. The jar tool can also be used to create and extract ordinary zip files.
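Because a JAR file is a zip archive with some added conventions, any zip library can read one. The sketch below, which assumes Python's standard zipfile module, builds a small JAR-like archive in memory and reads back the version information from its manifest, which by convention is stored at META-INF/MANIFEST.MF. The function names, the entry name com/example/Hello.class, and the version 1.2.3 are hypothetical, and the manifest parser is deliberately simplified: it ignores continuation lines and per-entry sections.

```python
import io
import zipfile

# A minimal manifest; the version number is made up for this example.
MANIFEST = "Manifest-Version: 1.0\r\nImplementation-Version: 1.2.3\r\n\r\n"


def build_jar_like(entries):
    """Return the bytes of a zip archive laid out like a JAR file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as jar:
        jar.writestr("META-INF/MANIFEST.MF", MANIFEST)
        for name, data in entries.items():
            jar.writestr(name, data)
    return buffer.getvalue()


def read_manifest(jar_bytes):
    """Return the main manifest attributes as a dict (simplified parser)."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    attributes = {}
    for line in text.splitlines():
        if not line:
            break  # a blank line ends the main section
        if line.startswith(" ") or ": " not in line:
            continue  # skip continuation lines in this sketch
        key, value = line.split(": ", 1)
        attributes[key] = value
    return attributes


if __name__ == "__main__":
    jar = build_jar_like({"com/example/Hello.class": b"\xca\xfe\xba\xbe"})
    print(read_manifest(jar))
```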