Pack200 | JAR Archives

Java 5 introduced a new compression format called Pack200, designed specifically to compress JAR archives. Pack200 takes advantage of detailed knowledge of the format of JAR files to achieve much better compression at a lower cost than the general-purpose deflate algorithms used by zip, gzip, and the like. For example, every Java .class file begins with the 4-byte sequence 0xCAFEBABE (in hexadecimal). If you know that every .class file begins with these four bytes, you don't actually need to include them. You can strip them out when compressing and add them back in when decompressing, automatically saving four bytes per file in the archive. The algorithm has a lot of little Java-specific tricks like this. Pack200 won't compress War and Peace as well as zip, but it will compress .class files three to four times smaller than zip will.
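The magic-number trick is easy to picture in a few lines of code. What follows is only a sketch of the idea, not the real Pack200 codec; the MagicStripper class and its method names are made up for this illustration.

```java
import java.util.Arrays;

// Illustration only: this is not the Pack200 codec, just the
// "don't store what is always there" idea applied to one field.
final class MagicStripper {

    // Every .class file starts with these four bytes.
    static final byte[] MAGIC = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};

    // Drop the leading magic number; reject input that does not start with it.
    static byte[] strip(byte[] classFile) {
        if (classFile.length < MAGIC.length
                || !Arrays.equals(Arrays.copyOf(classFile, MAGIC.length), MAGIC)) {
            throw new IllegalArgumentException("Not a .class file");
        }
        return Arrays.copyOfRange(classFile, MAGIC.length, classFile.length);
    }

    // Put the magic number back in front of the stripped bytes.
    static byte[] restore(byte[] stripped) {
        byte[] result = new byte[MAGIC.length + stripped.length];
        System.arraycopy(MAGIC, 0, result, 0, MAGIC.length);
        System.arraycopy(stripped, 0, result, MAGIC.length, stripped.length);
        return result;
    }

    public static void main(String[] args) {
        // Eight bytes standing in for the start of a real class file.
        byte[] original = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 49};
        byte[] stripped = strip(original);
        System.out.println("Saved " + (original.length - stripped.length) + " bytes");
        System.out.println("Round trip OK: " + Arrays.equals(original, restore(stripped)));
    }
}
```

The decompressor can always regenerate what was dropped, so nothing that matters is lost.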

The Pack200 format first reorganizes the archive to make it more suitable for compression, for instance by merging and sorting the constant pools in the different classes in the archive. It then throws away some details that zip would normally preserve but that aren't important to a JAR (Unix file permissions, for instance). Next, it compresses this carefully prepared archive with the deflate algorithm so that it can still be uncompressed with existing tools. Compared to a regular zip compression, a Pack200 compression is lossy; you don't get the same bytes out of it that went in. However, all the changes are changes that don't matter in the context of a JAR archive.

There is one problem. Converting an archive into Pack200 format tends to break digital signatures, because it reorganizes the files stored in the archive (and the contents of those files) to enable greater compression. To digitally sign a Pack200 archive, you should first normalize it:

1. Compress it with Pack200.
2. Decompress it with Pack200.
3. Sign the decompressed archive.
4. Compress it again with Pack200.

Of course, use the same options for each compression and decompression. You may sometimes also need to set the SEGMENT_LIMIT property to -1.

The JDK includes two tools that compress and decompress JARs in the Pack200 format, called, simply enough, pack200 and unpack200. These tools are customarily used to compress JAR files ahead of time on a web server. If a browser indicates willingness to accept this format by including an Accept-Encoding: pack200-gzip field in the HTTP header, the server sends it the .pack.gz form of the file rather than the original. While it would be possible for the server to compress these files on the fly, Pack200 compression normally takes more time than it would take to send the uncompressed file. Precompression with the pack200 tool is preferable.
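The server-side decision might look something like the following sketch. The PackNegotiator class and its chooseFile( ) method are hypothetical names, not part of any server API, and the code assumes the .pack.gz file has already been generated next to the original JAR. It also ignores quality values such as q=0, which a production server would want to honor.

```java
import java.util.Locale;

// Hypothetical helper: names are illustrative, not from any server API.
final class PackNegotiator {

    // Returns the name of the file to serve for a JAR, given the value of
    // the request's Accept-Encoding header (which may be null).
    static String chooseFile(String acceptEncoding, String jarName) {
        if (acceptEncoding != null) {
            for (String token : acceptEncoding.split(",")) {
                // Drop optional parameters such as ";q=0.8" and stray whitespace.
                int semi = token.indexOf(';');
                String coding = (semi >= 0 ? token.substring(0, semi) : token)
                        .trim().toLowerCase(Locale.ROOT);
                if (coding.equals("pack200-gzip")) {
                    return jarName + ".pack.gz";
                }
            }
        }
        return jarName; // client did not ask for Pack200; send the plain JAR
    }

    public static void main(String[] args) {
        System.out.println(chooseFile("pack200-gzip, gzip", "app.jar"));
        System.out.println(chooseFile("gzip", "app.jar"));
    }
}
```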

The Pack200 format is also available to your programs through the Pack200 class in the java.util.jar package:

public abstract class Pack200 extends Object

The Pack200 class itself doesn't do anything except provide instances of its inner Packer and Unpacker interfaces that actually compress and decompress files. These are returned by the static newPacker( ) and newUnpacker( ) methods:

public static Pack200.Packer newPacker( )
public static Pack200.Unpacker newUnpacker( )

To convert an existing archive to Pack200 format, you pass it to the pack( ) method:

public void pack(JarFile in, OutputStream out) throws IOException
public void pack(JarInputStream in, OutputStream out) throws IOException

This packs the existing JAR file or input stream and writes it onto an OutputStream you provide. (Files are not converted in place.) Close the OutputStream when you're done, as the pack( ) method does not close it for you. You can, in fact, pack several JAR files onto the same OutputStream by repeatedly invoking pack( ) and not closing the OutputStream until you're done.

Example 11-2 is a simple program that packs an existing JAR file using Pack200. By convention, the packed file takes the name of the original with .pack appended (or .pack.gz if the .pack file is subsequently compressed further with gzip).

Example 11-2. Packing a JAR archive

import java.io.*;
import java.util.jar.*;

public class Packer200 {

  public static void main(String[] args) {

    OutputStream out = null;
    try {
      JarFile f = new JarFile(args[0]);
      Pack200.Packer packer = Pack200.newPacker( );
      // Write the packed archive next to the original, adding .pack
      out = new FileOutputStream(args[0] + ".pack");
      packer.pack(f, out);
    }
    catch (IOException ex) {
      ex.printStackTrace( );
    }
    finally {
      // pack( ) leaves the output stream open, so close it here
      if (out != null) {
        try {
          out.close( );
        }
        catch (IOException ex) {
          System.err.println("Error closing file: " + ex.getMessage( ));
        }
      }
    }
  }
}
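To produce the .pack.gz form directly, you can put a GZIPOutputStream between the packer and the file. The sketch below shows only that gzip layer, working in memory on arbitrary bytes that stand in for packed data; the GzipLayer class and its method names are invented for the illustration, and it doesn't invoke the packer at all.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative helper: shows the gzip layer only, not Pack200 itself.
final class GzipLayer {

    // Compress bytes (standing in for the contents of a .pack file) with gzip.
    static byte[] gzip(byte[] packed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(packed);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        // The stream is closed by now, so the gzip trailer has been written.
        return buffer.toByteArray();
    }

    // Undo the gzip layer, returning the bytes that were originally compressed.
    static byte[] gunzip(byte[] gzipped) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] chunk = new byte[4096];
            for (int n = in.read(chunk); n != -1; n = in.read(chunk)) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "stand-in for packed bytes".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = gunzip(gzip(data));
        System.out.println("Round trip OK: " + Arrays.equals(data, roundTrip));
    }
}
```

In a file-based program, the same idea amounts to passing new GZIPOutputStream(new FileOutputStream(name + ".pack.gz")) to pack( ) and closing that stream afterward.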

Provided you're using Java 5, decompression of Pack200 archives is mostly automatic. The usual JarFile and JarInputStream classes can detect that an archive was compressed with Pack200 and decompress it accordingly. However, you might need to manually convert a Pack200 archive to a regular JAR archive for use with earlier versions of Java. For this purpose, the Pack200.Unpacker interface has an unpack( ) method:

public void unpack(File in, JarOutputStream out) throws IOException
public void unpack(InputStream in, JarOutputStream out) throws IOException

The unpack( ) method does not close its JarOutputStream either, and you can also unpack several Pack200 files onto the same JarOutputStream by repeatedly invoking unpack( ). Close the JarOutputStream when you're done.

Example 11-3 is a simple program that unpacks a Pack200 file. Unlike Example 11-2, the command-line pack200 tool bundled with the JDK tends to run a final gzip over the entire packed archive when it's done to get even more compression. Thus, if the input filename ends in .pack.gz, a chained GZIPInputStream decompresses the file before passing it to the unpacker.

Example 11-3. Unpacking a JAR archive

import java.io.*;
import java.util.jar.*;
import java.util.zip.GZIPInputStream;

public class Unpacker200 {

  public static void main(String[] args) {

    String inName = args[0];
    String outName;
    if (inName.endsWith(".pack.gz")) {
      outName = inName.substring(0, inName.length( )-8);
    }
    else if (inName.endsWith(".pack")) {
      outName = inName.substring(0, inName.length( )-5);
    }
    else {
      outName = inName + ".unpacked";
    }

    JarOutputStream out = null;
    InputStream in = null;
    try {
      Pack200.Unpacker unpacker = Pack200.newUnpacker( );
      out = new JarOutputStream(new FileOutputStream(outName));
      in = new FileInputStream(inName);
      if (inName.endsWith(".gz")) in = new GZIPInputStream(in);
      unpacker.unpack(in, out);
    }
    catch (IOException ex) {
      ex.printStackTrace( );
    }
    finally {
      if (out != null) {
        try {
          out.close( );
        }
        catch (IOException ex) {
          System.err.println("Error closing file: " + ex.getMessage( ));
        }
      }
      if (in != null) {
        try {
          in.close( );
        }
        catch (IOException ex) {
          System.err.println("Error closing file: " + ex.getMessage( ));
        }
      }
    }
  }
}

The default options are reasonable, but for extreme tuning you may want to set additional properties. For both the packer and unpacker, this is controlled by a map of properties. This map is returned by the properties( ) method:

public SortedMap<String,String> properties( )

The map is live. Modifying a property or adding a property to the map returned by this method immediately affects the behavior of the corresponding Packer or Unpacker object. For example, this code fragment tells the packer not to respect the original order of the archive entries:

Map<String, String> properties = packer.properties( );
properties.put(Pack200.Packer.KEEP_FILE_ORDER, Pack200.Packer.FALSE);

Both the names and values in this map are strings, even when the strings hold values that are semantically numbers or Booleans. For instance, Packer.FALSE is the string "false", not Boolean.FALSE. The names of all the standard properties and some of the possible values are available as named constants in the Packer class, as follows:
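Because everything is a string, reading a value back means parsing it yourself. Here's a small sketch of that; it uses an ordinary TreeMap as a stand-in for the map a real packer would return, and the StringProperties class is invented for the example. The keys are the literal property names listed in this section.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Illustration with a plain TreeMap standing in for packer.properties( ).
final class StringProperties {

    // Builds a map shaped like the packer's: string keys, string values.
    static SortedMap<String, String> sampleProperties() {
        SortedMap<String, String> p = new TreeMap<>();
        p.put("pack.keep.file.order", "false"); // a Boolean, stored as a string
        p.put("pack.segment.limit", "-1");      // a number, stored as a string
        return p;
    }

    public static void main(String[] args) {
        SortedMap<String, String> p = sampleProperties();
        // Values must be parsed explicitly; no Boolean or Integer objects are stored.
        boolean keepOrder = Boolean.parseBoolean(p.get("pack.keep.file.order"));
        int segmentLimit = Integer.parseInt(p.get("pack.segment.limit"));
        System.out.println("keep order: " + keepOrder + ", segment limit: " + segmentLimit);
    }
}
```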

Pack200.Packer.SEGMENT_LIMIT ("pack.segment.limit")

Memory-limited J2ME environments may not be able to load the entire archive at once. The Pack200 algorithm can split archives into multiple segments, each of which can be decompressed separately from the other segments, at the cost of a somewhat larger total file size. The default value is "1000000" (one million bytes). If the archive grows larger than this, it will be split into multiple segments. You can adjust the segment size to fit the needs of your device, generally making it smaller for devices with less memory.

Two values are special: "0" stores each file and class in its own segment; "-1" stores the complete contents in a single segment regardless of size.

Pack200.Packer.KEEP_FILE_ORDER ("pack.keep.file.order")

Set this to Pack200.Packer.TRUE ("true") if the JAR entries must keep their original order during packing, or to Pack200.Packer.FALSE ("false") if they can be reordered. The default is "true", but "false" should improve the compression.

Pack200.Packer.EFFORT ("pack.effort")

A single digit ("0" to "9") indicating the trade-off between time and compression. "0" is no compression at all; "9" is maximum compression. The default is "5".

Pack200.Packer.DEFLATE_HINT ("pack.deflate.hint")

By default each archive entry contains a hint for the decoder indicating how it was stored. If you set this property to either Pack200.Packer.TRUE or Pack200.Packer.FALSE, individual hints will not be used for each file in the archive. Instead, the entire archive will be hinted as either using compression (true) or not (false).

Pack200.Packer.MODIFICATION_TIME ("pack.modification.time")

The default value, Pack200.Packer.KEEP, maintains the last modification time of each entry in the archive. Setting this to Pack200.Packer.LATEST signals the compressor to set all entries within each archive segment to the same last modification time, thereby saving a little space.

Pack200.Packer.UNKNOWN_ATTRIBUTE ("pack.unknown.attribute")

This property defines what to do when a .class file being compressed contains an unrecognized attribute. The default value, Pack200.Packer.PASS, logs a warning and does not attempt to compress the file. Setting this to Pack200.Packer.STRIP indicates that any such attributes should be removed and the remaining file should be compressed. Setting this to Pack200.Packer.ERROR indicates that the entire operation should fail and an exception should be thrown.

Pack200.Packer.PASS_FILE_PFX ("pack.pass.file.")

All files in the archive whose paths begin with this string are not compressed. For example, setting this to "com/elharo/io/ui" would exclude all files in the com.elharo.io.ui package and its subpackages from compression. This can be a complete filename to uniquely identify a file, as in "com/elharo/io/ui/StreamedTextArea.class".

To exclude multiple prefixes, just set new properties that all begin with Pack200.Packer.PASS_FILE_PFX ("pack.pass.file."); for example, Pack200.Packer.PASS_FILE_PFX+1 ("pack.pass.file.1"), Pack200.Packer.PASS_FILE_PFX+2 ("pack.pass.file.2"), and so on.
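A loop handles this numbering nicely. The following sketch fills in a family of pack.pass.file.N properties; the PassFiles class is made up for the example, the map is a plain TreeMap standing in for packer.properties( ), and the second path in main( ) is purely hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative helper; with a real packer you would pass packer.properties( ).
final class PassFiles {

    // The string value of Pack200.Packer.PASS_FILE_PFX, as given in the text.
    static final String PASS_FILE_PFX = "pack.pass.file.";

    // Adds one numbered pass-through property per path: prefix + 0, prefix + 1, ...
    static void passThrough(Map<String, String> properties, List<String> paths) {
        int i = 0;
        for (String path : paths) {
            properties.put(PASS_FILE_PFX + i, path);
            i++;
        }
    }

    public static void main(String[] args) {
        SortedMap<String, String> properties = new TreeMap<>();
        // "images/" is a made-up second prefix, used only to show the numbering.
        passThrough(properties, Arrays.asList("com/elharo/io/ui", "images/"));
        System.out.println(properties);
    }
}
```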

Pack200.Packer.CLASS_ATTRIBUTE_PFX ("pack.class.attribute.")
Pack200.Packer.FIELD_ATTRIBUTE_PFX ("pack.field.attribute.")
Pack200.Packer.METHOD_ATTRIBUTE_PFX ("pack.method.attribute.")
Pack200.Packer.CODE_ATTRIBUTE_PFX ("pack.code.attribute.")

These four values are used to specify what the Pack200 algorithm does with particular attributes in Java .class files. Each of these can be set to Pack200.Packer.PASS, Pack200.Packer.STRIP, or Pack200.Packer.ERROR to indicate what should happen to a particular attribute. For example, to remove the coverage table generated for the JCOV profiler, set the Pack200.Packer.CODE_ATTRIBUTE_PFX+"CoverageTable" ("pack.code.attribute.CoverageTable") property to Pack200.Packer.STRIP.

Besides the three mnemonic constants, you can also set one of these values to a string in a special language defined in the Pack200 specification that specifies precisely how each attribute is laid out in the file. This is really quite advanced and only worth it if you're trying to squeeze every last byte out of a JAR to fit it into an extremely memory-constrained environment.

Pack200.Packer.PROGRESS ("pack.progress")

This read-only property indicates the approximate percentage of the data that has been compressed; i.e., it is a number between 0 and 100. If this is "-1", the operation has stalled. This property applies to unpackers as well as packers.

Nonstandard properties not in this list are mostly ignored. You can set any property you like, but if it's not in this list, it won't change anything. There are a few undocumented properties that do not have mnemonic constants. The only one I've encountered is "strip.debug"; when it's set to "true", all debugging symbols are removed from the packed JAR.

For example, this code fragment sets up a packer for maximum compression, at the possible cost of taking more time and memory to compress and decompress:

Pack200.Packer packer = Pack200.newPacker( );
Map<String, String> p = packer.properties( );
p.put(Pack200.Packer.SEGMENT_LIMIT, "-1");
p.put(Pack200.Packer.KEEP_FILE_ORDER, Pack200.Packer.FALSE);
p.put(Pack200.Packer.DEFLATE_HINT, Pack200.Packer.TRUE);
p.put(Pack200.Packer.MODIFICATION_TIME, Pack200.Packer.LATEST);
p.put(Pack200.Packer.UNKNOWN_ATTRIBUTE, Pack200.Packer.STRIP);
p.put(Pack200.Packer.EFFORT, "9");

I'm not sure the extra effort is really worth it, though. When testing this, the default options compressed the Saxon 8 JAR archive from 2,457,114 bytes to 628,945 bytes, an impressive 74.4% reduction. Adding these options reduced the final file size to 585,837 bytes, an additional savings of roughly 1.8% of the original size. However, the time to compress jumped dramatically from almost instantaneous to "go get a soda" territory (if not quite all the way to "brew a pot of coffee"), and this was on quite fast hardware. It might be worth doing this if you're only compressing once and then distributing the compressed archive many thousands of times, but I wouldn't try it when compressing archives dynamically.
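If you want to recheck the percentages from the three byte counts, the arithmetic is short. This is just a worked calculation; the PackSavings class and percentSaved( ) method are names I've invented, and both figures are expressed relative to the original JAR size.

```java
// Worked arithmetic for the Saxon 8 measurements quoted above.
final class PackSavings {

    // Bytes saved going from 'before' to 'after', as a percentage of 'original'.
    static double percentSaved(long original, long before, long after) {
        return 100.0 * (before - after) / original;
    }

    public static void main(String[] args) {
        long original = 2457114L; // unpacked JAR
        long defaults = 628945L;  // packed with default options
        long tuned = 585837L;     // packed with the maximum-compression options

        System.out.printf("Default options save %.1f%% of the original%n",
                percentSaved(original, original, defaults));
        System.out.printf("Tuning saves a further %.2f%% of the original%n",
                percentSaved(original, defaults, tuned));
    }
}
```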

Both Packer and Unpacker support PropertyChangeListener if you need to monitor the state of various properties:

public void addPropertyChangeListener(PropertyChangeListener listener)
public void removePropertyChangeListener(PropertyChangeListener listener)

Mostly, you'll set the properties yourself, so there's little need to listen for changes. However, monitoring the PROGRESS property does allow you to keep a progress bar or other indicator up to date, so users can tell whether they've got time to make some coffee or just to grab a soda out of the fridge. For instance, Example 11-4 is a simple program that shows a progress bar while the packer is compressing a JAR.

Example 11-4. A ProgressMonitor for packing or unpacking

import java.awt.Component;
import javax.swing.ProgressMonitor;
import java.beans.*;
import java.util.jar.Pack200;

public class PackProgressMonitor extends ProgressMonitor
 implements PropertyChangeListener {

  public PackProgressMonitor(Component parent) {
    super(parent, null, "Packing", -1, 100);
  }

  public void propertyChange(PropertyChangeEvent event) {
    if (event.getPropertyName( ).equals(Pack200.Packer.PROGRESS)) {
      String newValue = (String) event.getNewValue( );
      int value = Integer.parseInt(newValue);
      this.setProgress(value);
    }
  }
}

I'm not sure how useful this is. While the compression can easily take long enough for the progress bar to pop up, the packing code doesn't seem to call propertyChange( ) very frequently. In my tests it was called only three times, once at 0, once at 50, and once at 100. Still, if an operation is going to take a while, it's better to show the user some sign of progress, even if it's a less than perfectly accurate one.
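You can try out the listener logic without a GUI or a real packer. The sketch below is a simulation, not the packer itself: it uses a java.beans.PropertyChangeSupport object to fire the same three updates I saw in my tests, with the values delivered as strings under the "pack.progress" name. The ProgressDemo and Recorder classes are invented for the illustration.

```java
import java.beans.PropertyChangeEvent;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.List;

// Simulation only: PropertyChangeSupport plays the role of the packer.
final class ProgressDemo {

    // Records every progress percentage it is told about; ignores other properties.
    static final class Recorder implements PropertyChangeListener {
        final List<Integer> seen = new ArrayList<>();

        @Override
        public void propertyChange(PropertyChangeEvent event) {
            if ("pack.progress".equals(event.getPropertyName())) {
                // The value arrives as a string and has to be parsed.
                seen.add(Integer.parseInt((String) event.getNewValue()));
            }
        }
    }

    // Fires three string-valued updates and returns what the listener recorded.
    static List<Integer> simulate() {
        PropertyChangeSupport source = new PropertyChangeSupport(new Object());
        Recorder recorder = new Recorder();
        source.addPropertyChangeListener(recorder);
        String previous = null;
        for (String value : new String[] {"0", "50", "100"}) {
            source.firePropertyChange("pack.progress", previous, value);
            previous = value;
        }
        return recorder.seen;
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints [0, 50, 100]
    }
}
```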
