9.4. Packaging Formats

A released product usually consists of many files, all of which need to be carefully installed in different places on each customer's machine. An appropriate file format for this collection of files has to be chosen for the released package of a product. Sometimes simply unpacking a collection of files is enough for the customer to use the product. For instance, when a product is distributed as source code, a customer unpacks the source files and then follows the build and installation instructions that are part of the package. This section describes some of the packaging formats used to release software.

For most products, simply extracting the files from a package is not nearly enough for a complete installation. Other steps in a successful installation may include running other programs, running tests on the customers' machines, and preserving existing data and configuration settings. The unpacking of files and each of the other steps could be done, one at a time, by the customer. However, it's often more convenient to run a single installation program and have it perform all of the different steps. Section 9.5, later in this chapter, describes some tools that can produce such installation programs, or installers.

How an installation program actually packages a product's files is mostly irrelevant to the customer, though many installers do use one of the packaging formats described in this section. When you click on setup.exe for a Windows installer, you neither know nor really care how the files are actually packaged and compressed within the installer. However, if there is no separate installation program, then you will need to know how to extract the product's files from a particular packaging format.


The first guideline to follow when choosing a packaging format for your product is to use the most common format for each platform and language. Windows packages commonly use zip files, Unix packages often use tar and then gzip or bzip2, and Java products are often distributed as JAR files. Red Hat Linux uses rpm files, Debian uses deb files, and some other GNU/Linux distributions have their own packaging formats. Since many products are downloaded rather than read from a CD or DVD, compressing the package before releasing it is a normal part of releasing a product.[2] Quite often, a build tool that is favored for a particular language will also support the most common packaging format for that language. For example, Ant is used to build many Java products and can also generate JAR files.

[2] The days of using 1.44MB floppy disks for installers have passed at last. Creating installers that would fit on those disks, often breaking them up into a dozen or more disk images, was a pain for everyone concerned. "Please insert disk 14 of 20," indeed.

9.4.1. Unix

The original Unix packaging format is the one used by tar, a tape archive program that dates back to the early days of Unix. In an archive, or tarball, each file is preceded by a header of mostly ASCII fields holding information about the file, such as its name, size, and permissions. The size recorded in each header determines where the next file's header begins. Each header also includes a simple checksum (a sum of the header's bytes rather than a true CRC) so that corrupted headers are detected. If all the files in an archive are ASCII, then the whole tar archive is ASCII apart from the NUL bytes used for padding.
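The header layout can be inspected directly. The following is a minimal sketch, assuming the POSIX ustar field offsets (name at byte 0, size at byte 124, checksum at byte 148) and a made-up file name and payload. It builds a one-file archive in memory with Python's tarfile module, then recomputes the header checksum by hand.

```python
import io
import tarfile

# Build a one-file archive in memory (the file name and contents are made up).
payload = b"hello, tar\n"
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    info = tarfile.TarInfo(name="docs/readme.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

archive = buf.getvalue()
header = archive[:512]  # every member starts with a 512-byte header


def parse_octal(field):
    """Decode a NUL- or space-terminated octal number field."""
    return int(field.rstrip(b"\0 ").decode("ascii") or "0", 8)


name = header[0:100].rstrip(b"\0").decode("ascii")
size = parse_octal(header[124:136])
stored_checksum = parse_octal(header[148:156])

# The checksum is the sum of all header bytes, with the checksum field
# itself counted as eight ASCII spaces.
computed_checksum = sum(header[:148]) + 8 * ord(" ") + sum(header[156:512])

# File data follows the header, padded with NULs to a multiple of 512 bytes,
# so the next header (or the end-of-archive marker) starts at this offset.
next_header_offset = 512 + ((size + 511) // 512) * 512

print(name, size, stored_checksum == computed_checksum, next_header_offset)
```

Because the position of each header is derived from the size stored in the previous one, a reader has to walk the archive from the beginning to find a given file.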

Older tar files had limits on filename length (Solaris tar still does, apparently), but newer versions do not. tar is most commonly used with a compression program such as gzip or bzip2 to produce compressed .tar.gz (alternatively, .tgz) or .tar.bz2 files, respectively. These compression programs are often fully integrated with tar nowadays, so creating a compressed tar file is done with a single command. By default, a tar file preserves the directory hierarchy and the permissions of the files inside it. However, there is no support in tar itself for cryptographically signing the generated tar files. Another problem with the tar format is that extracting individual files is slow: a tar archive has no index, so the file headers have to be read one after another from the start of the archive until the correct file is found. Although tar was originally Unix-based, some Windows tools such as WinZip can now also unpack tar archives.
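As an illustration of one-step compression and of permissions being stored with each file, here is a small sketch using Python's tarfile module rather than the tar command itself. The paths, contents, and helper names (make_tarball, list_members) are hypothetical.

```python
import io
import tarfile


def make_tarball(files):
    """Return a gzip-compressed tar archive built from {path: (data, mode)}."""
    buf = io.BytesIO()
    # "w:gz" writes the tar stream through gzip in one step;
    # "w:bz2" would use bzip2 instead.
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, (data, mode) in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mode = mode  # permission bits are stored per member
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_members(tarball):
    """Return (path, octal permissions) for each member, in archive order."""
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        # Iterating reads the member headers one after another from the start.
        return [(member.name, oct(member.mode)) for member in tar]


tarball = make_tarball({
    "product/bin/run.sh": (b"#!/bin/sh\necho running\n", 0o755),
    "product/etc/app.conf": (b"verbose = no\n", 0o644),
})
members = list_members(tarball)
print(members)
```

The directory hierarchy is carried in the member paths, and the executable bit on run.sh is recorded in the archive along with the file.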

When tar and bzip2 fail, the most common reason is an incomplete download or a lack of space to decompress the files, not corrupted files in the archive. Obvious, but well worth remembering.
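A quick way to rule out an incomplete download before unpacking is to read the compressed stream through to its end. The sketch below assumes the archive is bzip2-compressed and already in memory; is_complete_bz2 is a hypothetical helper name, and the sample data is made up.

```python
import bz2
import io


def is_complete_bz2(data):
    """Return True if data decompresses cleanly to the end of the bzip2 stream."""
    try:
        with bz2.open(io.BytesIO(data), "rb") as stream:
            while stream.read(64 * 1024):  # read and discard in chunks
                pass
        return True
    except (EOFError, OSError):
        # EOFError: the stream stopped before its end-of-stream marker.
        # OSError: the bytes are not valid bzip2 data.
        return False


original = bz2.compress(b"release notes\n" * 1000)
truncated = original[: len(original) // 2]  # simulate an interrupted download

print(is_complete_bz2(original), is_complete_bz2(truncated))
```

This only checks the compression layer. It says nothing about free disk space, which still has to be checked separately.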


Two other packaging formats are also encountered on Unix systems: cpio and pax. cpio (http://www.gnu.org/software/cpio/manual) is intended more for system backups. pax (which may stand for "portable archive exchange") is designed to combine the strengths of tar and cpio. A good introduction to pax can be found at http://www.onlamp.com/pub/a/bsd/2002/08/22/FreeBSD_Basics.html. Both cpio and pax can read tar archives. Though they are almost unheard of nowadays, you may come across shar archives, perhaps in old postings to USENET. These are simply shell scripts that unpack the files embedded within them.

Other common packaging formats for GNU/Linux are Red Hat's rpm (http://rpm.org) and Debian's deb (http://www.debian.org/doc/FAQ/ch-pkg_basics.en.html), which both add more information to the package formats so that the installers that use them can track which files were installed from various packages by using a local database. Internally, rpm uses cpio archives, and a deb file is an ar archive that contains gzip-compressed tar archives. Both formats can contain the binary executables or the source files for a package. An extensive comparison of the differences between rpm, deb, and gzip'd tar files can be found at http://kitenet.net/~joey/pkg-comp. A handy tool for checking that rpms are correctly constructed is rpmlint, which used to be found at http://people.mandrakesoft.com/~flepied/projects/rpmlint.

9.4.2. Windows

While there are versions of tar for most platforms, the most common packaging format for Windows is zip, a freely documented format from PKWARE (http://www.pkware.com), which also sells applications such as PKZip to create zip archives. Other Windows tools such as WinZip (http://www.winzip.com) and 7-Zip (http://www.7-zip.org) also work with the zip format. Some of these tools can extract files from many other packaging formats as well. Info-ZIP (http://www.info-zip.org) is an open source, highly portable zip tool that runs on both Windows and Unix. Note that zip is not related to the compression utility gzip.
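For comparison with tar, here is a minimal sketch of creating and reading a zip archive with Python's zipfile module. The member names and contents are invented for the example. A zip archive keeps a directory of its members and compresses each member separately, so a single file can be read without unpacking the rest.

```python
import io
import zipfile

# Create a zip archive in memory with two made-up members.
buf = io.BytesIO()
with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("product/README.txt", "Install notes\n")
    archive.writestr("product/bin/tool.cfg", "level = 3\n" * 200)

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as archive:
    names = archive.namelist()  # taken from the archive's directory
    # Each member is compressed on its own, so one file can be read directly.
    config = archive.read("product/bin/tool.cfg")
    info = archive.getinfo("product/bin/tool.cfg")

print(names)
print(info.file_size, info.compress_size)
```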

By default, unzipping an archive with WinZip converts any directory names that are entirely uppercase to all lowercase, though there is an option for disabling this overhelpful behavior.


JAR, the standard packaging format for Java products, is an extension of the zip format, with optional signing and versioning abilities. Some of these extensions were later added to the zip format after JAR was defined. You can also use jar to zip and unzip zip files.
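Since a JAR file is a zip archive with some conventions layered on top, ordinary zip tools can inspect one. The sketch below builds a JAR-like archive in memory instead of using the jar tool. The class name com.example.Main, the placeholder class bytes, and the read_manifest helper are all hypothetical, and the manifest parsing ignores continuation lines for brevity.

```python
import io
import zipfile

MANIFEST = "Manifest-Version: 1.0\r\nMain-Class: com.example.Main\r\n\r\n"

# Build a JAR-like zip archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as jar:
    jar.writestr("META-INF/MANIFEST.MF", MANIFEST)
    # Placeholder bytes only; this is not a loadable class file.
    jar.writestr("com/example/Main.class", b"\xca\xfe\xba\xbe")


def read_manifest(jar_bytes):
    """Return the main manifest attributes of a JAR as a dict."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        text = jar.read("META-INF/MANIFEST.MF").decode("utf-8")
    attributes = {}
    for line in text.splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            attributes[key] = value
    return attributes


manifest = read_manifest(buf.getvalue())
print(manifest)
```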



Practical Development Environments
ISBN: 0596007965
Year: 2004
Pages: 150
