Chapter 11. Input/Output Facilities

CONTENTS

11.1 Streams
11.2 Files
11.3 Serialization
11.4 Data Compression
11.5 The NIO Package

In this chapter, we continue our exploration of the Java API by looking at many of the classes in the java.io and java.nio packages. These packages offer a rich set of tools for basic I/O and also provide the framework on which all file and network communication in Java is built.

Figure 11-1 shows the class hierarchy of these packages.

Figure 11-1. The java.io package


We'll start by looking at the stream classes in java.io, which are subclasses of the basic InputStream, OutputStream, Reader, and Writer classes. Then we'll examine the File class and discuss how you can interact with the filesystem using classes in java.io. We'll also take a quick look at the data compression classes provided in java.util.zip. Finally, we'll begin our investigation of the new java.nio package, introduced in Java 1.4. The NIO package adds significant new functionality for building high-performance services.

11.1 Streams

Most fundamental I/O in Java is based on streams. A stream represents a flow of data, or a channel of communication with (at least conceptually) a writer at one end and a reader at the other. When you work with the java.io package to perform terminal input and output, read or write files, or communicate through sockets, you are using various types of streams. Later in this chapter we'll look at the NIO package, which introduces a similar concept called a channel. But for now we'll start by summarizing the available types of streams.

InputStream/OutputStream: Abstract classes that define the basic functionality for reading or writing an unstructured sequence of bytes. All other byte streams in Java are built on top of the basic InputStream and OutputStream.
Reader/Writer: Abstract classes that define the basic functionality for reading or writing a sequence of character data, with support for Unicode. All other character streams in Java are built on top of Reader and Writer.
InputStreamReader/OutputStreamWriter: "Bridge" classes that convert bytes to characters and vice versa. Remember: in Unicode, a character is not a byte!
DataInputStream/DataOutputStream: Specialized stream filters that add the ability to read and write simple data types, such as numeric primitives and String objects, in a universal format.
ObjectInputStream/ObjectOutputStream: Specialized stream filters that are capable of writing whole serialized Java objects and reconstructing them.
BufferedInputStream/BufferedOutputStream/BufferedReader/BufferedWriter: Specialized stream filters that add buffering for additional efficiency.
PrintWriter: A specialized character stream that makes it simple to print text.
PipedInputStream/PipedOutputStream/PipedReader/PipedWriter: "Double-ended" streams that normally occur in pairs. Data written into a PipedOutputStream or PipedWriter is read from its corresponding PipedInputStream or PipedReader.
FileInputStream/FileOutputStream/FileReader/FileWriter: Implementations of InputStream, OutputStream, Reader, and Writer that read from and write to files on the local filesystem.

Streams in Java are one-way streets. The java.io input and output classes represent the ends of a simple stream, as shown in Figure 11-2. For bidirectional conversations, you'll use one of each type of stream.

Figure 11-2. Basic input and output stream functionality


InputStream and OutputStream are abstract classes that define the lowest-level interface for all byte streams. They contain methods for reading or writing an unstructured flow of byte-level data. Because these classes are abstract, you can't create a generic input or output stream. Java implements subclasses of these for activities such as reading from and writing to files and communicating with sockets. Because all byte streams inherit the structure of InputStream or OutputStream, the various kinds of byte streams can be used interchangeably. A method specifying an InputStream as an argument can, of course, accept any subclass of InputStream. Specialized types of streams can also be layered to provide features, such as buffering, filtering, or handling larger data types.

Reader and Writer are very much like InputStream and OutputStream, except that they deal with characters instead of bytes. As true character streams, these classes correctly handle Unicode characters, which was not always the case with byte streams. Often, a bridge is needed between these character streams and the byte streams of physical devices such as disks and networks. InputStreamReader and OutputStreamWriter are special classes that use an encoding scheme to translate between character and byte streams.

We'll discuss all the interesting stream types in this section, with the exception of FileInputStream, FileOutputStream, FileReader, and FileWriter. We'll postpone the discussion of file streams until the next section, where we'll cover issues involved with accessing the filesystem in Java.

11.1.1 Terminal I/O

The prototypical example of an InputStream object is the standard input of a Java application. Like stdin in C or cin in C++, this is the source of input to a command-line (non-GUI) program. It is the input stream from the environment, usually a terminal window or the output of another command. The java.lang.System class, a general repository for system-related resources, provides a reference to standard input in the static variable System.in. It also provides streams for standard output and standard error in the out and err variables, respectively. The following example shows the correspondence:

    InputStream stdin = System.in;
    OutputStream stdout = System.out;
    OutputStream stderr = System.err;

This example hides the fact that System.out and System.err aren't really OutputStream objects, but more specialized and useful PrintStream objects. We'll explain these later, but for now we can reference out and err as OutputStream objects, because they are a type of OutputStream as well.

We can read a single byte at a time from standard input with the InputStream's read() method. If you look closely at the API, you'll see that the read() method of the base InputStream class is an abstract method. What lies behind System.in is a particular implementation of InputStream; the subclass provides a real implementation of the read() method.

    try {
        int val = System.in.read( );
        ...
    }
    catch ( IOException e ) {
        ...
    }

Note that the return type of read() in this example is int, not byte as you'd expect. That's because Java's input stream read() method uses a convention of the C language. Although read() provides only a byte of information, its return type is int. This allows it to use the special return value of an integer -1, indicating that end of stream has been reached. You'll need to test for this condition when using the simple read() method. If an error occurs during the read, an IOException is thrown. All basic input and output stream commands can throw an IOException, so you should arrange to catch and handle them appropriately.

To retrieve the value as a byte, perform a cast:

byte b = (byte) val;

Be sure to check for the end-of-stream condition before you perform the cast.

An overloaded form of read() fills a byte array with as much data as possible up to the capacity of the array, and returns the number of bytes read:

    byte [] buff = new byte [1024];
    int got = System.in.read( buff );

We can also check the number of bytes available for reading on an InputStream with the available() method. Using that information, we could create an array of exactly the right size:

    int waiting = System.in.available( );
    if ( waiting > 0 ) {
        byte [] data = new byte [ waiting ];
        System.in.read( data );
        ...
    }

However, the reliability of this technique depends on the ability of the underlying stream implementation to detect how much data can be retrieved. It generally works for files but should not be relied upon for all types of streams.

These read() methods block until at least some data is read (at least one byte). You must, in general, check the returned value to determine how much data you got and if you need to read more.
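The end-of-stream test and the need to check how much data each call returned can be pulled together in a short sketch. The class name ReadLoopSketch and the helper readFully() are hypothetical names chosen for illustration, and a ByteArrayInputStream stands in for System.in so that the example runs without a terminal. UncheckedIOException assumes Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReadLoopSketch {

    // Drains a stream by calling read(byte[]) until it reports end of stream (-1).
    static byte[] readFully(InputStream in) {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buff = new byte[4];   // deliberately small, to force several reads
        try {
            int got;
            while ((got = in.read(buff)) != -1) {
                // Only the first 'got' bytes of buff are valid on this pass.
                collected.write(buff, 0, got);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) {
        byte[] source = "hello, streams".getBytes(StandardCharsets.US_ASCII);
        byte[] copy = readFully(new ByteArrayInputStream(source));
        System.out.println(new String(copy, StandardCharsets.US_ASCII));
    }
}
```

The loop never assumes that a single call fills the array; it uses the returned count on every pass and stops only when read() returns -1.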

InputStream provides the skip() method as a way of jumping over a number of bytes. Depending on the implementation of the stream, skipping bytes may be more efficient than reading them. The close() method shuts down the stream and frees up any associated system resources. It's a good idea to close a stream when you are done using it.

11.1.2 Character Streams

Some InputStream and OutputStream subclasses in early versions of Java included methods for reading and writing strings, but most of them operated by naively assuming that a 16-bit Unicode character was equivalent to an 8-bit byte in the stream. Unfortunately, this works only for Latin-1 (ISO 8859-1) characters. To remedy this, the character stream classes Reader and Writer were introduced.

Two special classes, InputStreamReader and OutputStreamWriter, bridge the gap between the world of character streams and the world of byte streams. These are character streams that are wrapped around an underlying byte stream. An encoding scheme is used to convert between bytes and characters. An encoding scheme name can be specified in the constructor of InputStreamReader or OutputStreamWriter. If you use the constructor that takes only the stream, the system's default encoding scheme is used.

For example, let's parse a human-readable string from the standard input into an integer. We'll assume that the bytes coming from System.in use the system's default encoding scheme:

    try {
        InputStreamReader converter = new InputStreamReader(System.in);
        BufferedReader in = new BufferedReader(converter);

        String text = in.readLine( );
        int i = NumberFormat.getInstance(  ).parse(text).intValue( );
    }
    catch ( IOException e ) { }
    catch ( ParseException pe ) { }

First, we wrap an InputStreamReader around System.in. This object converts the incoming bytes of System.in to characters using the default encoding scheme. Then, we wrap a BufferedReader around the InputStreamReader. BufferedReader gives us the readLine() method, which we can use to convert a full line of text into a String. The string is then parsed into an integer using the techniques described in Chapter 9.

We could have programmed the previous example using only byte streams, and it might have worked for users in the United States, at least. But character streams correctly support Unicode strings. Unicode was designed to support almost all the written languages of the world. If you want to write a program that works in any part of the world, in any language, you definitely want to use streams that don't mangle Unicode.

So how do you decide when you need a byte stream (InputStream or OutputStream) and when you need a character stream? If you want to read or write character strings, use some variety of Reader or Writer. Otherwise, a byte stream should suffice. Let's say, for example, that you want to read text from a file that was written by an earlier Java application. In this case, you could simply create a FileReader, which will convert the bytes in the file to characters using the system's default encoding scheme. If you have a file in a specific encoding scheme, you can create an InputStreamReader with the specified encoding scheme wrapped around a FileInputStream and read characters from it.
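As a rough illustration of the last case, the sketch below wraps an InputStreamReader with an explicitly named encoding around a FileInputStream. The class name EncodingSketch, the helper readFirstLine(), and the temporary file are assumptions made for the example; try-with-resources and StandardCharsets assume Java 7 or later, and UncheckedIOException assumes Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class EncodingSketch {

    // Decodes the bytes with the named encoding and returns the first line of text.
    static String readFirstLine(InputStream bytes, String encodingName) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(bytes, encodingName))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-8 file to read back; \u00e9 is e with an acute accent.
        File tmp = File.createTempFile("encoding-sketch", ".txt");
        tmp.deleteOnExit();
        try (OutputStream out = new FileOutputStream(tmp)) {
            out.write("caf\u00e9\n".getBytes(StandardCharsets.UTF_8));
        }

        // The right encoding yields 4 characters; the wrong one yields 5,
        // because the two-byte accented letter is decoded as two characters.
        System.out.println(readFirstLine(new FileInputStream(tmp), "UTF-8").length());
        System.out.println(readFirstLine(new FileInputStream(tmp), "ISO-8859-1").length());
    }
}
```

The same bytes produce different strings depending on the encoding named in the constructor, which is why relying on the system default is safe only when you know the file was written with it.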

Another example comes from the Internet. Web servers serve files as byte streams. If you want to read Unicode strings with a particular encoding scheme from a file on the network, you'll need an appropriate InputStreamReader wrapped around the InputStream of the web server's socket (as we'll see in Chapter 12).

11.1.3 Stream Wrappers

What if we want to do more than read and write a sequence of bytes or characters? We can use a "filter" stream, which is a type of InputStream, OutputStream, Reader, or Writer that wraps another stream and adds new features. A filter stream takes the target stream as an argument in its constructor and delegates calls to it after doing some additional processing of its own. For example, you could construct a BufferedInputStream to wrap the system standard input:

InputStream bufferedIn = new BufferedInputStream( System.in );

The BufferedInputStream is a type of filter stream that reads ahead and buffers a certain amount of data. (We'll talk more about it later in this chapter.) The BufferedInputStream wraps an additional layer of functionality around the underlying stream. Figure 11-3 shows this arrangement for a DataInputStream.

As you can see from the previous code snippet, the BufferedInputStream filter is a type of InputStream. Because filter streams are themselves subclasses of the basic stream types, they can be used as arguments to the construction of other filter streams. This allows filter streams to be layered on top of one another to provide different combinations of features. For example, we could first wrap our System.in with a BufferedInputStream and then wrap the BufferedInputStream with a DataInputStream for reading special data types.

There are four superclasses corresponding to the four types of filter streams: FilterInputStream, FilterOutputStream, FilterReader, and FilterWriter. The first two are for filtering byte streams, and the last two are for filtering character streams. These superclasses provide the basic machinery for a "no op" filter (a filter that doesn't do anything) by delegating all their method calls to their underlying stream. Real filter streams subclass these and override various methods to add their additional processing. We'll make an example filter stream a little later in this chapter.

Figure 11-3. Layered streams


11.1.3.1 Data streams

DataInputStream and DataOutputStream are filter streams that let you read or write strings and primitive data types comprised of more than a single byte. DataInputStream and DataOutputStream implement the DataInput and DataOutput interfaces, respectively. These interfaces define the methods required for streams that read and write strings and Java primitive numeric and boolean types in a machine-independent manner.

You can construct a DataInputStream from an InputStream and then use a method such as readDouble() to read a primitive data type:

    DataInputStream dis = new DataInputStream( System.in );
    double d = dis.readDouble( );

This example wraps the standard input stream in a DataInputStream and uses it to read a double value. readDouble() reads bytes from the stream and constructs a double from them. The DataInputStream methods expect the bytes of numeric data types to be in network byte order, a standard that specifies that the high-order bytes are sent first (also known as "big endian," as we'll discuss later).

The DataOutputStream class provides write methods that correspond to the read methods in DataInputStream. For example, writeInt() writes an integer in binary format to the underlying output stream.
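A small round trip makes the binary format concrete. In this sketch the class name DataStreamSketch and its encode() and decode() helpers are hypothetical, and in-memory byte array streams stand in for a file or socket. It assumes Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class DataStreamSketch {

    // Writes an int followed by a double in DataOutputStream's binary format.
    static byte[] encode(int i, double d) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(i);       // 4 bytes, high-order byte first
            out.writeDouble(d);    // 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the values back in the same order in which they were written.
    static String decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int i = in.readInt();
            double d = in.readDouble();
            return i + " " + d;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = encode(1, 2.5);
        System.out.println(data.length);     // 12
        System.out.println(data[3]);         // 1: the low-order byte of the int comes last
        System.out.println(decode(data));    // 1 2.5
    }
}
```

The reader has to know the order and types of the values in advance; the stream itself carries no type information.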

The readUTF() and writeUTF() methods of DataInputStream and DataOutputStream read and write a Java String of Unicode characters using the UTF-8 "transformation format." (Strictly speaking, they use a slightly modified form of UTF-8, preceded by a two-byte length.) UTF-8 is an ASCII-compatible encoding of Unicode characters commonly used for the transmission and storage of Unicode text. This differs from the Reader and Writer streams, which can use arbitrary encodings that may not preserve all the Unicode characters.

We can use a DataInputStream with any kind of input stream, whether it be from a file, a socket, or standard input. The same applies to using a DataOutputStream, or, for that matter, any other specialized streams in java.io.

11.1.3.2 Buffered streams

The BufferedInputStream, BufferedOutputStream, BufferedReader, and BufferedWriter classes add a data buffer of a specified size to the stream path. A buffer can increase efficiency by reducing the number of physical read or write operations that correspond to read() or write() method calls. You create a buffered stream with an appropriate input or output stream and a buffer size. (You can also wrap another stream around a buffered stream, so that it benefits from the buffering.) Here's a simple buffered input stream called bis:

    BufferedInputStream bis =
        new BufferedInputStream(myInputStream, 4096);
    ...
    bis.read(  );

In this example, we specify a buffer size of 4096 bytes. If we leave off the size of the buffer in the constructor, a reasonably sized one is chosen for us. On our first call to read(), bis tries to fill the entire 4096-byte buffer with data. Thereafter, calls to read() retrieve data from the buffer, which is refilled as necessary.

A BufferedOutputStream works in a similar way. Calls to write() store the data in a buffer; data is actually written only when the buffer fills up. You can also use the flush() method to wring out the contents of a BufferedOutputStream at any time. The flush() method is actually a method of the OutputStream class itself. It's important because it allows you to be sure that all data in any underlying streams and filter streams has been sent (before, for example, you wait for a response).
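The effect of flush() is easy to observe with an in-memory destination. The following sketch uses a hypothetical class, FlushSketch, and a ByteArrayOutputStream as the underlying stream so that we can ask how many bytes have actually arrived. It assumes Java 8 or later.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class FlushSketch {

    // Returns how many bytes reached the underlying stream before and after flush().
    static int[] sizesBeforeAndAfterFlush() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(sink, 4096);
        try {
            out.write("ping".getBytes(StandardCharsets.US_ASCII));
            int before = sink.size();   // the four bytes are still in the buffer
            out.flush();
            int after = sink.size();    // now they have been passed along
            return new int[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesBeforeAndAfterFlush();
        System.out.println(sizes[0] + " -> " + sizes[1]);   // 0 -> 4
    }
}
```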

Some input streams such as BufferedInputStream support the ability to mark a location in the data and later reset the stream to that position. The mark() method sets the return point in the stream. It takes an integer value that specifies the number of bytes that can be read before the stream gives up and forgets about the mark. The reset() method returns the stream to the marked point; any data read after the call to mark() is read again.

This functionality is especially useful when you are reading the stream in a parser. You may occasionally fail to parse a structure and so must try something else. In this situation, you can have your parser generate an error (a homemade ParseException) and then reset the stream to the point before it began parsing the structure:

    BufferedInputStream input;
    ...
    try {
        input.mark( MAX_DATA_STRUCTURE_SIZE );
        return( parseDataStructure( input ) );
    }
    catch ( ParseException e ) {
        input.reset( );
        ...
    }
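Here is a self-contained version of the same idea that can be run as is. The class name MarkResetSketch and the helper peekThenReread() are hypothetical, and the helper assumes that peekLength is no larger than the amount of data supplied. It assumes Java 8 or later.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class MarkResetSketch {

    // Reads peekLength bytes, rewinds to the mark, then reads the whole stream.
    static String peekThenReread(byte[] data, int peekLength) {
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        StringBuilder seen = new StringBuilder();
        try {
            in.mark(peekLength);                 // remember the current position
            for (int i = 0; i < peekLength; i++) {
                seen.append((char) in.read());
            }
            in.reset();                          // go back to the marked position
            seen.append('|');
            int c;
            while ((c = in.read()) != -1) {
                seen.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        byte[] data = "abcdef".getBytes(StandardCharsets.US_ASCII);
        System.out.println(peekThenReread(data, 3));   // abc|abcdef
    }
}
```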

The BufferedReader and BufferedWriter classes work just like their byte-based counterparts but operate on characters instead of bytes.

11.1.3.3 PrintWriter

Another useful wrapper stream is java.io.PrintWriter. This class provides a suite of overloaded print() methods that turn their arguments into strings and push them out the stream. A complementary set of println() methods adds a newline to the end of the strings. PrintWriter is an unusual character stream because it can wrap either an OutputStream or another Writer.

PrintWriter is the more capable big brother of the older PrintStream byte stream. The System.out and System.err streams are PrintStream objects; you have already seen such streams strewn throughout this book:

    System.out.print("Hello world...\n");
    System.out.println("Hello world...");
    System.out.println( "The answer is: " + 17 );
    System.out.println( 3.14 );

PrintWriter and PrintStream have a strange, overlapping history. Early versions of Java did not have the Reader and Writer classes, so streams such as PrintStream, which must of necessity convert characters to bytes, simply made assumptions about the character encoding. As of Java 1.1, the PrintStream class was enhanced to translate characters to bytes using the system's default encoding scheme. For all new development, however, use a PrintWriter instead of a PrintStream. Because a PrintWriter can wrap an OutputStream, the two classes are more or less interchangeable.

When you create a PrintWriter object, you can pass an additional boolean value to the constructor. If this value is true, the PrintWriter automatically performs a flush() on the underlying OutputStream or Writer each time it sends a newline:

    boolean autoFlush = true;
    PrintWriter p = new PrintWriter( myOutputStream, autoFlush );

When this technique is used with a buffered output stream, it corresponds to the behavior of terminals that send data line by line.

Unlike methods in other stream classes, the methods of PrintWriter and PrintStream do not throw IOExceptions. This makes life a lot easier for printing text, which is a very common operation. Instead, if we are interested, we can check for errors with the checkError() method:

    System.out.println( reallyLongString );
    if ( System.out.checkError( ) )
        // uh oh
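To see checkError() report a problem, we need a stream that fails on demand. In the sketch below, CheckErrorSketch, BrokenStream, and printFails() are all hypothetical names invented for the illustration; BrokenStream simply throws an IOException on every write.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;

class CheckErrorSketch {

    // A stand-in for a failing device: every write attempt throws.
    static class BrokenStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated failure");
        }
    }

    // Prints one line and reports whether the PrintWriter recorded an error.
    static boolean printFails(OutputStream target) {
        PrintWriter out = new PrintWriter(target, true);   // autoflush on newline
        out.println("hello");        // never throws, even if the write fails
        return out.checkError();     // flushes and returns the error flag
    }

    public static void main(String[] args) {
        System.out.println(printFails(new ByteArrayOutputStream()));   // false
        System.out.println(printFails(new BrokenStream()));            // true
    }
}
```

The exception is swallowed inside the PrintWriter, so polling checkError() is the only way the caller learns that something went wrong.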

11.1.4 Pipes

Normally, our applications are directly involved with one side of a given stream at a time. PipedInputStream and PipedOutputStream (or PipedReader and PipedWriter), however, let us create two sides of a stream and connect them together, as shown in Figure 11-4. This can be used to provide a stream of communication between threads, for example, or as a "loopback" for testing.

Figure 11-4. Piped streams


To create a byte-stream pipe, we use both a PipedInputStream and a PipedOutputStream. We can simply choose a side and then construct the other side using the first as an argument:

    PipedInputStream pin = new PipedInputStream( );
    PipedOutputStream pout = new PipedOutputStream( pin );

Alternatively:

    PipedOutputStream pout = new PipedOutputStream( );
    PipedInputStream pin = new PipedInputStream( pout );

In each of these examples, the effect is to produce an input stream, pin, and an output stream, pout, that are connected. Data written to pout can then be read by pin. It is also possible to create the PipedInputStream and the PipedOutputStream separately and then connect them with the connect() method.

We can do exactly the same thing in the character-based world, using PipedReader and PipedWriter in place of PipedInputStream and PipedOutputStream.

Once the two ends of the pipe are connected, use the two streams as you would other input and output streams. You can use read() to read data from the PipedInputStream (or PipedReader) and write() to write data to the PipedOutputStream (or PipedWriter). If the internal buffer of the pipe fills up, the writer blocks and waits until space is available. Conversely, if the pipe is empty, the reader blocks and waits until some data is available.
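A minimal sketch of two threads talking through a character pipe follows. The class name PipeSketch and the helper sendThroughPipe() are hypothetical; the lambda and UncheckedIOException assume Java 8 or later. One thread writes a line into the PipedWriter while the calling thread blocks in readLine() on the connected PipedReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

class PipeSketch {

    // Sends one line from a producer thread to the calling thread through a pipe.
    static String sendThroughPipe(String message) {
        try {
            PipedWriter pout = new PipedWriter();
            PipedReader pin = new PipedReader(pout);   // connect the two ends

            Thread producer = new Thread(() -> {
                // Closing the writer tells the reading side that no more data is coming.
                try (PrintWriter out = new PrintWriter(pout)) {
                    out.println(message);
                }
            });
            producer.start();

            try (BufferedReader in = new BufferedReader(pin)) {
                String line = in.readLine();   // blocks until the producer has written
                producer.join();
                return line;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendThroughPipe("Application starting..."));
    }
}
```

Note that the pipe is connected before the producer thread starts, so neither side ever touches an unconnected end.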

One advantage to using piped streams is that they provide stream functionality in our code without compelling us to build new, specialized streams. For example, we can use pipes to create a simple logging or "console" facility for our application. We can send messages to the logging facility through an ordinary PrintWriter, and then it can do whatever processing or buffering is required before sending the messages off to their ultimate destination. Because we are dealing with string messages, we use the character-based PipedReader and PipedWriter classes. The following example shows the skeleton of our logging facility:

    //file: LoggerDaemon.java
    import java.io.*;

    class LoggerDaemon extends Thread {
        PipedReader in = new PipedReader( );

        LoggerDaemon( ) {
            start( );
        }

        public void run( ) {
            BufferedReader bin = new BufferedReader( in );
            String s;
            try {
                while ( (s = bin.readLine( )) != null ) {
                    // process line of data
                }
            } catch (IOException e ) { }
        }

        PrintWriter getWriter( ) throws IOException {
            return new PrintWriter( new PipedWriter( in ) );
        }
    }

    class myApplication {
        public static void main ( String [] args ) throws IOException {
            PrintWriter out = new LoggerDaemon(  ).getWriter( );

            out.println("Application starting...");
            // ...
            out.println("Warning: does not compute!");
            // ...
        }
    }

LoggerDaemon reads strings from its end of the pipe, the PipedReader named in. LoggerDaemon also provides a method, getWriter(), which returns a PrintWriter wrapped around a PipedWriter that is connected to its input stream. To begin sending messages, we create a new LoggerDaemon and fetch the output stream. In order to read strings with the readLine() method, LoggerDaemon wraps a BufferedReader around its PipedReader. For convenience, it presents its output pipe as a PrintWriter, rather than a simple Writer.

One advantage of implementing LoggerDaemon with pipes is that we can log messages as easily as we write text to a terminal or any other stream. In other words, we can use all our normal tools and techniques. Another advantage is that the processing happens in another thread, so we can go about our business while the processing takes place.

11.1.5 Streams from Strings and Back

StringReader is another useful stream class; it essentially wraps stream functionality around a String. Here's how to use a StringReader:

    String data = "There once was a man from Nantucket...";
    StringReader sr = new StringReader( data );

    char T = (char)sr.read( );
    char h = (char)sr.read( );
    char e = (char)sr.read( );

Note that you will still have to catch IOExceptions thrown by some of the StringReader's methods.

The StringReader class is useful when you want to read data in a String as if it were coming from a stream, such as a file, pipe, or socket. For example, suppose you create a parser that expects to read from a stream, but you want to provide an alternative method that also parses a big string. You can easily add one using StringReader.

Turning things around, the StringWriter class lets us write to a character buffer via an output stream. The internal buffer grows as necessary to accommodate the data. When we are done we can fetch the contents of the buffer as a String. In the following example, we create a StringWriter and wrap it in a PrintWriter for convenience:

    StringWriter buffer = new StringWriter( );
    PrintWriter out = new PrintWriter( buffer );

    out.println("A moose once bit my sister.");
    out.println("No, really!");

    String results = buffer.toString( );

First we print a few lines to the output stream, to give it some data, then retrieve the results as a string with the toString() method. Alternatively, we could get the results as a StringBuffer object using the getBuffer() method.

The StringWriter class is useful if you want to capture the output of something that normally sends output to a stream, such as a file or the console. A PrintWriter wrapped around a StringWriter is a viable alternative to using a StringBuffer to construct large strings piece by piece.

The ByteArrayInputStream and ByteArrayOutputStream work with bytes in the same way the previous examples worked with characters. You can write byte data to a ByteArrayOutputStream and retrieve it later with the toByteArray() method. Conversely, you can construct a ByteArrayInputStream from a byte array, just as you construct a StringReader from a String.

11.1.6 The rot13InputStream Class

Before we leave streams, let's try our hand at making one of our own. We mentioned earlier that specialized stream wrappers are built on top of the FilterInputStream and FilterOutputStream classes. It's quite easy to create our own subclass of FilterInputStream that can be wrapped around other streams to add new functionality.

The following example, rot13InputStream, performs a rot13 (rotate by 13 letters) operation on the bytes that it reads. rot13 is a trivial obfuscation algorithm that shifts alphabetic characters to make them not quite human-readable (it simply passes over nonalphabetic characters without modifying them). rot13 is cute because it's symmetric; to "un-rot13" some text, simply rot13 it again. We use the rot13InputStream class in the "Content and Protocol Handlers" section of the expanded material on the CD that comes with this book (view CD content online at http://examples.oreilly.com/learnjava2/CD-ROM/), so we've put the class in the learningjava.io package to facilitate reuse. Here's our rot13InputStream class:

    //file: rot13InputStream.java
    package learningjava.io;
    import java.io.*;

    public class rot13InputStream extends FilterInputStream {

        public rot13InputStream ( InputStream i ) {
            super( i );
        }

        public int read( ) throws IOException {
            return rot13( in.read( ) );
        }

        private int rot13 ( int c ) {
            if ( (c >= 'A') && (c <= 'Z') )
                c=(((c-'A')+13)%26)+'A';
            if ( (c >= 'a') && (c <= 'z') )
                c=(((c-'a')+13)%26)+'a';
            return c;
        }
    }

The FilterInputStream needs to be initialized with an InputStream; this is the stream to be filtered. We provide an appropriate constructor for the rot13InputStream class and invoke the parent constructor with a call to super(). FilterInputStream contains a protected instance variable, in, in which it stores a reference to the specified InputStream, making it available to the rest of our class.

The primary feature of a FilterInputStream is that it delegates its input tasks to the underlying InputStream. So, for instance, a call to FilterInputStream's read() method simply turns around and calls the read() method of the underlying InputStream, to fetch a byte. The filtering happens when we do our extra work on the data as it passes through. In our example, the read() method fetches a byte from the underlying InputStream, in, and then performs the rot13 shift on the byte before returning it. Note that the rot13() method shifts alphabetic characters while simply passing over all other values, including the end-of-stream value (-1). Our subclass is now a rot13 filter.

read() is the only InputStream method that our rot13InputStream overrides. All other normal functionality of an InputStream, such as skip() and available(), is unmodified, so calls to these methods are passed by FilterInputStream to the underlying InputStream.

Strictly speaking, rot13InputStream works only on an ASCII byte stream, since the underlying algorithm is based on the Roman alphabet. A more generalized character-scrambling algorithm would have to be based on FilterReader to handle 16-bit Unicode characters correctly. (Anyone want to try rot32768?) We should also note that we have not fully implemented our filter: we should also override the version of read() that takes a byte array and range specifiers, perhaps delegating it to our own read(). Unless we do so, a reader using that method would get the raw stream.
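One way to close that gap is sketched below. Rot13Filter is a hypothetical variant of the class above, not the version shipped with the book; it also overrides the array form of read() and transforms the bytes in place after delegating to the underlying stream. The decode() helper is likewise an assumption added for demonstration, and it assumes ASCII input. The code assumes Java 8 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Rot13Filter extends FilterInputStream {

    Rot13Filter(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        return rot13(in.read());
    }

    // Array form: let the underlying stream fill the range, then shift it in place.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int got = in.read(b, off, len);
        for (int i = off; i < off + got; i++) {   // skipped when got is -1 or 0
            b[i] = (byte) rot13(b[i]);
        }
        return got;
    }

    static int rot13(int c) {
        if (c >= 'A' && c <= 'Z') {
            c = ((c - 'A' + 13) % 26) + 'A';
        } else if (c >= 'a' && c <= 'z') {
            c = ((c - 'a' + 13) % 26) + 'a';
        }
        return c;   // everything else, including -1, passes through unchanged
    }

    // Demonstration helper: filters an ASCII string through the array read.
    static String decode(String text) {
        byte[] buff = new byte[text.length()];
        byte[] source = text.getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new Rot13Filter(new ByteArrayInputStream(source))) {
            int got = in.read(buff);   // FilterInputStream routes this to read(b, off, len)
            return got <= 0 ? "" : new String(buff, 0, got, StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("Hello, World!"));           // Uryyb, Jbeyq!
        System.out.println(decode(decode("Hello, World!")));   // Hello, World!
    }
}
```

Transforming the array in place keeps the efficiency of a bulk read; delegating to the single-byte read() in a loop would also work but would make one call per byte.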

11.2 Files

Working with files in Java poses some conceptual problems. The host filesystem lies outside of Java's virtual environment, in the real world, and can therefore still suffer from architecture and implementation differences. Java tries to mask some of these differences by providing information to help an application tailor itself to the local environment; we'll mention these areas as they occur.

11.2.1 The java.io.File Class

The java.io.File class encapsulates access to information about a file or directory entry in the filesystem. It can be used to get attribute information about a file, list the entries in a directory, and perform basic filesystem operations such as removing a file or making a directory. While the File object handles these tasks, it doesn't provide direct access for reading and writing file data; there are specialized streams for that purpose.

11.2.1.1 File constructors

You can create an instance of File from a String pathname:

    File fooFile = new File( "/tmp/foo.txt" );
    File barDir = new File( "/tmp/bar" );

You can also create a file with a relative path:

File f = new File( "foo" );

In this case, Java works relative to the current directory of the Java interpreter. You can determine the current working directory by checking the user.dir property in the System Properties list:

    System.getProperty("user.dir");

An overloaded version of the File constructor lets you specify the directory path and filename as separate String objects:

File fooFile = new File( "/tmp", "foo.txt" );

With yet another variation, you can specify the directory with a File object and the filename with a String:

    File tmpDir = new File( "/tmp" );
    File fooFile = new File ( tmpDir, "foo.txt" );

None of the File constructors throws a checked exception (although passing a null pathname does produce a NullPointerException). This means the object is created whether or not the file or directory actually exists; it isn't an error to create a File object for a nonexistent file. You can use the object's exists() instance method to find out whether the file or directory exists. The File object simply exists as a handle for getting information about what is (potentially at least) a file or directory.

11.2.1.2 Path localization

One of the reasons that working with files in Java is problematic is that pathnames are expected to follow the conventions of the local filesystem. Java's designers intended to provide an abstraction for working with system-dependent filename features, such as the file separator, path separator, device specifier, and root directories.

On some systems, Java can also compensate for differences such as the direction of the file separator slashes in a pathname. For example, in the current implementation on Windows platforms, Java accepts paths with either forward slashes or backslashes. However, under Solaris, Java accepts only paths with forward slashes.

Your best bet is to make sure you follow the filename conventions of the host filesystem. If your application has a GUI that is opening and saving files at the user's request, you should be able to handle that functionality with the Swing JFileChooser class. This class encapsulates a graphical file-selection dialog box. The methods of JFileChooser take care of system-dependent filename features for you.

If your application needs to deal with files on its own behalf, however, things get a little more complicated. The File class contains a few static variables to make this task possible. File.separator defines a String that specifies the file separator on the local host (e.g., / on Unix and Macintosh systems and \ on Windows systems); File.separatorChar provides the same information as a char.

You can use this system-dependent information in several ways. Probably the simplest way to localize pathnames is to pick a convention you use internally, for instance the forward slash (/), and do a String replace to substitute for the localized separator character:

    // we'll use forward slash as our standard
    String path = "mail/1999/june/merle";
    path = path.replace('/', File.separatorChar);
    File mailbox = new File( path );

Alternately, you could work with the components of a pathname and build the local pathname when you need it:

    String [] path = { "mail", "1999", "june", "merle" };

    StringBuffer sb = new StringBuffer(path[0]);
    for (int i=1; i< path.length; i++)
        sb.append( File.separator + path[i] );
    File mailbox = new File( sb.toString( ) );

One thing to remember is that Java interprets the backslash character (\) as an escape character when used in a String. To get a backslash in a String, you have to use \\.

Another issue to grapple with is that some operating systems use special identifiers for the roots of filesystems. For example, Windows uses C:\. Should you need it, the File class provides the static method listRoots(), which returns an array of File objects corresponding to the filesystem root directories.
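Another way to build a local pathname from components, shown here only as a sketch, is to lean on the File(File, String) constructor described earlier, which inserts the local separator for us. The class name LocalPath and its build() helper are made up for this illustration; the main() method also prints whatever listRoots() reports on the machine where it runs.

```java
import java.io.File;

class LocalPath {
    // Join path components with the File(File, String) constructor,
    // which supplies the local separator between parent and child
    static File build( String[] parts ) {
        File f = new File( parts[0] );
        for ( int i = 1; i < parts.length; i++ )
            f = new File( f, parts[i] );
        return f;
    }

    public static void main( String[] args ) {
        String[] parts = { "mail", "1999", "june", "merle" };
        System.out.println( build( parts ) );

        // listRoots() may return null if the roots can't be determined
        File[] roots = File.listRoots();
        if ( roots != null )
            for ( int i = 0; i < roots.length; i++ )
                System.out.println( "root: " + roots[i] );
    }
}
```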

11.2.1.3 File operations

Once we have a File object, we can use it to ask for information about the file or directory and to perform standard operations on it. A number of methods let us ask certain questions about the File. For example, isFile() returns true if the File represents a file while isDirectory() returns true if it's a directory. isAbsolute() indicates whether the File has an absolute or relative path specification.

Components of the File pathname are available through the following methods: getName(), getPath(), getAbsolutePath(), and getParent(). getName() returns a String for the filename without any directory information; getPath() returns the full pathname as it was given to the constructor, directory information and filename included. If the File has an absolute path specification, getAbsolutePath() returns that path. Otherwise it returns the relative path appended to the current working directory. getParent() returns the parent directory of the File, or null if the pathname doesn't name a parent.

The string returned by getPath() or getAbsolutePath() may not follow the same case conventions as the underlying filesystem. You can retrieve the filesystem's own or "canonical" version of the file's path using the method getCanonicalPath(). In Windows, for example, you can create a File object whose getAbsolutePath() is C:\Autoexec.bat but whose getCanonicalPath() is C:\AUTOEXEC.BAT. This is useful for comparing filenames that may have been supplied with different case conventions or for showing them to the user. Because it may have to consult the filesystem, getCanonicalPath() can throw an IOException.
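To make the differences among these methods concrete, here is a small sketch that collects the various views of one pathname. The class name PathParts and the sample path docs/notes.txt are assumptions made for the illustration; the absolute and canonical results depend on the directory from which you run it.

```java
import java.io.File;
import java.io.IOException;

class PathParts {
    // Returns name, path, parent, absolute path, and canonical path, in that order
    static String[] parts( File f ) throws IOException {
        return new String[] {
            f.getName(),           // filename only
            f.getPath(),           // pathname as supplied to the constructor
            f.getParent(),         // parent portion, or null if there is none
            f.getAbsolutePath(),   // resolved against user.dir if relative
            f.getCanonicalPath()   // the filesystem's own form of the path
        };
    }

    public static void main( String[] args ) throws IOException {
        String[] p = parts( new File( "docs", "notes.txt" ) );
        for ( int i = 0; i < p.length; i++ )
            System.out.println( p[i] );
    }
}
```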

You can get or set the modification time of a file or directory with lastModified() and setLastModified() methods. The value is a long that is the number of milliseconds since the epoch (Jan 1, 1970, 00:00:00 GMT). We can also get the size of the file in bytes with length().

Here's a fragment of code that prints some information about a file:

    File fooFile = new File( "/tmp/boofa" );

    String type = fooFile.isFile( ) ? "File " : "Directory ";
    String name = fooFile.getName( );
    long len = fooFile.length( );
    System.out.println(type + name + ", " + len + " bytes " );

If the File object corresponds to a directory, we can list the files in the directory with the list() method or the listFiles() method:

    String [] fileNames = fooFile.list( );
    File [] files = fooFile.listFiles( );

list() returns an array of String objects that contains filenames. listFiles() returns an array of File objects. Note that in neither case are the files guaranteed to be in any kind of order (alphabetical, for example). You can use the Collections API to sort strings alphabetically like so:

    List list = Arrays.asList( fileNames );
    Collections.sort( list );

If the File refers to a nonexistent directory, we can create the directory with mkdir() or mkdirs(). The mkdir() method creates a single directory; mkdirs() creates all the intervening directories in a File specification. Use renameTo() to rename a file or directory and delete() to delete a file or directory.

Although we can create a directory using the File object, this isn't the most common way to create a file; that's normally done implicitly with a FileOutputStream or FileWriter, as we'll discuss in a moment. The exception is the createNewFile() method, which can be used to attempt to create a new zero-length file at the location pointed to by the File object. The useful thing about this method is that the operation is guaranteed to be "atomic" with respect to all other file creation. createNewFile() returns a boolean value that tells you whether the file was created.

You can use this to implement simple file locking from Java. (The NIO package supports true file locks, as we'll see later). This is useful in combination with deleteOnExit(), which flags the file to be automatically removed when the Java Virtual Machine exits. Another file creation method related to the File class itself is the static method createTempFile(), which creates a file in a specified location using an automatically generated unique name. This, too, is useful in combination with deleteOnExit().
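The lock-file idea might be sketched as follows. The class name SimpleLock and its two methods are hypothetical, and the scheme is purely cooperative: it only works if every participant agrees to check for the same file, which is one reason the NIO file locks mentioned above are the more robust choice.

```java
import java.io.File;
import java.io.IOException;

class SimpleLock {
    // Returns true if we created the lock file, false if it already existed
    static boolean tryLock( File lockFile ) throws IOException {
        boolean acquired = lockFile.createNewFile();
        if ( acquired )
            lockFile.deleteOnExit();   // clean up if we exit without unlocking
        return acquired;
    }

    // Release the lock by removing the file
    static boolean unlock( File lockFile ) {
        return lockFile.delete();
    }

    public static void main( String[] args ) throws IOException {
        // createTempFile() picks a unique name in the default temp directory
        File scratch = File.createTempFile( "demo", ".tmp" );
        scratch.deleteOnExit();
        File lock = new File( scratch.getPath() + ".lock" );

        System.out.println( "first attempt:  " + tryLock( lock ) );
        System.out.println( "second attempt: " + tryLock( lock ) );
        unlock( lock );
    }
}
```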

The toURL() method converts a file path to a file: URL object. We'll talk about URLs in Chapter 13. They are an abstraction that allows you to point to any kind of object anywhere on the Net. Converting a File reference to a URL may be useful for consistency with more general routines that deal with URLs.

Table 11-1 summarizes the methods provided by the File class.

Table 11-1. File methods
Method	Return type	Description
`canRead()`	`boolean`	Is the file (or directory) readable?
`canWrite()`	`boolean`	Is the file (or directory) writable?
`createNewFile()`	`boolean`	Creates a new file
`createTempFile(String pfx, String sfx)`	`File`	Static method to create a new file, with the specified prefix and suffix, in the default temp file directory
`delete()`	`boolean`	Deletes the file (or directory)
`deleteOnExit()`	`void`	When it exits, Java runtime system deletes the file
`exists()`	`boolean`	Does the file (or directory) exist?
`getAbsolutePath()`	`String`	Returns the absolute path of the file (or directory)
`getCanonicalPath()`	`String`	Returns the absolute, case-correct path of the file (or directory)
`getName()`	`String`	Returns the name of the file (or directory)
`getParent()`	`String`	Returns the name of the parent directory of the file (or directory)
`getPath()`	`String`	Returns the path of the file (or directory)
`isAbsolute()`	`boolean`	Is the filename (or directory name) absolute?
`isDirectory()`	`boolean`	Is the item a directory?
`isFile()`	`boolean`	Is the item a file?
`lastModified()`	`long`	Returns the last modification time of the file (or directory)
`length()`	`long`	Returns the length of the file
`list()`	`String []`	Returns a list of files in the directory
`listFiles()`	`File[]`	Returns the contents of the directory as an array of `File` objects
`mkdir()`	`boolean`	Creates the directory
`mkdirs()`	`boolean`	Creates all directories in the path
`renameTo(File dest)`	`boolean`	Renames the file (or directory)
`setLastModified(long time)`	`boolean`	Sets the last-modified time of the file (or directory)
`setReadOnly()`	`boolean`	Sets the file to read-only status
`toURL()`	`java.net.URL`	Generates a URL object for the file (or directory)

11.2.2 File Streams

Java provides two specialized streams for reading from and writing to files in the filesystem: FileInputStream and FileOutputStream. These streams provide the basic InputStream and OutputStream functionality applied to reading and writing files. They can be combined with the filter streams described earlier to work with files in the same way we do other stream communications.

Because FileInputStream is a subclass of InputStream, it inherits all standard InputStream functionality for reading a file. FileInputStream provides only a low-level interface to reading data, however, so you'll typically wrap it with another stream, such as a DataInputStream.

You can create a FileInputStream from a String pathname or a File object:

FileInputStream in = new FileInputStream( "/etc/passwd" );

When you create a FileInputStream, the Java runtime system attempts to open the specified file. Thus, the FileInputStream constructors can throw a FileNotFoundException if the specified file doesn't exist or can't be opened for reading (for example, because it is a directory). You must catch this exception, or its superclass IOException, in your code. When the stream is first created, its available() method and the File object's length() method should return the same value. To release the underlying system resources promptly, call the close() method when you are done with the file.

To read characters from a file, you can wrap an InputStreamReader around a FileInputStream. If you want to use the default character-encoding scheme, you can use the FileReader class instead, which is provided as a convenience. FileReader works just like FileInputStream, except that it reads characters instead of bytes and wraps a Reader instead of an InputStream.

The following class, ListIt, is a small utility that sends the contents of a file or directory to standard output:

    //file: ListIt.java
    import java.io.*;

    class ListIt {
        public static void main ( String args[] ) throws Exception {
            File file = new File( args[0] );

            if ( !file.exists( ) || !file.canRead( ) ) {
                System.out.println( "Can't read " + file );
                return;
            }

            if ( file.isDirectory( ) ) {
                String [] files = file.list( );
                for (int i=0; i< files.length; i++)
                    System.out.println( files[i] );
            } else
                try {
                    FileReader fr = new FileReader ( file );
                    BufferedReader in = new BufferedReader( fr );
                    String line;
                    while ((line = in.readLine( )) != null)
                        System.out.println(line);
                }
                catch ( FileNotFoundException e ) {
                    System.out.println( "File Disappeared" );
                }
        }
    }

ListIt constructs a File object from its first command-line argument and tests the File to see whether it exists and is readable. If the File is a directory, ListIt outputs the names of the files in the directory. Otherwise, ListIt reads and outputs the file.

FileOutputStream is a subclass of OutputStream, so it inherits all the standard OutputStream functionality for writing to a file. Just like FileInputStream though, FileOutputStream provides only a low-level interface to writing data. You'll typically wrap another stream, such as a DataOutputStream or a PrintWriter, around the FileOutputStream to provide higher-level functionality.

You can create a FileOutputStream from a String pathname or a File object. Unlike FileInputStream, however, FileOutputStream doesn't require the file to exist: if the specified file doesn't exist, the FileOutputStream creates it. The constructors can still throw a FileNotFoundException (a kind of IOException) if the file can't be created or opened for writing, for example because the path names a directory, so you still need to handle this exception.

If the specified file does exist, the FileOutputStream opens it for writing and discards its current contents, so the data you subsequently write() replaces whatever was there. If you need to append data to an existing file, you can use a form of the constructor that accepts an append flag:

    FileOutputStream fooOut = new FileOutputStream( fooFile );
    FileOutputStream pwdOut = new FileOutputStream( "/etc/passwd", true );

Another way to append data to files is with RandomAccessFile, as we'll discuss shortly.
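The effect of the append flag is easy to see in a small experiment. The following sketch uses a hypothetical AppendDemo class and a temporary file rather than a real system file; it writes to the same file three times and reports the file's length after each step.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

class AppendDemo {
    // Write the ASCII bytes of s to f, either replacing or appending
    static void write( File f, String s, boolean append ) throws IOException {
        FileOutputStream out = new FileOutputStream( f.getPath(), append );
        try {
            out.write( s.getBytes( "US-ASCII" ) );
        } finally {
            out.close();
        }
    }

    public static void main( String[] args ) throws IOException {
        File f = File.createTempFile( "append", ".txt" );
        f.deleteOnExit();

        write( f, "abc", false );   // replaces the (empty) contents
        System.out.println( f.length() );
        write( f, "de", true );     // adds to the end
        System.out.println( f.length() );
        write( f, "x", false );     // starts over again
        System.out.println( f.length() );
    }
}
```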

To write characters (instead of bytes) to a file, you can wrap an OutputStreamWriter around a FileOutputStream. If you want to use the default character-encoding scheme, you can instead use the FileWriter class, which is provided as a convenience. FileWriter works just like FileOutputStream, except that it writes characters instead of bytes and wraps a Writer instead of an OutputStream.

The following example reads a line of data from standard input and writes it to the file /tmp/foo.txt:

    String s = new BufferedReader(
        new InputStreamReader( System.in ) ).readLine( );
    File out = new File( "/tmp/foo.txt" );
    FileWriter fw = new FileWriter ( out );
    PrintWriter pw = new PrintWriter( fw );
    pw.println( s );
    fw.close( );

Notice how we wrapped a PrintWriter around the FileWriter to facilitate writing the data. Also, to be a good filesystem citizen, we've called the close() method when we're done with the FileWriter.
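When the default character encoding isn't what you want, the wrapping approach lets you name one explicitly. Here is a sketch, with a hypothetical EncodedText class, that writes and reads a file using an encoding passed in as a String; the choice of UTF-8 in main() is just an assumption for the example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

class EncodedText {
    // Write text to f using the named character encoding
    static void write( File f, String text, String enc ) throws IOException {
        Writer w = new OutputStreamWriter( new FileOutputStream( f ), enc );
        try {
            w.write( text );
        } finally {
            w.close();   // flushes the encoder and closes the file
        }
    }

    // Read the whole file back as characters using the named encoding
    static String read( File f, String enc ) throws IOException {
        Reader r = new InputStreamReader( new FileInputStream( f ), enc );
        try {
            StringBuffer sb = new StringBuffer();
            char[] buf = new char[1024];
            int n;
            while ( ( n = r.read( buf ) ) != -1 )
                sb.append( buf, 0, n );
            return sb.toString();
        } finally {
            r.close();
        }
    }

    public static void main( String[] args ) throws IOException {
        File f = File.createTempFile( "text", ".txt" );
        f.deleteOnExit();
        write( f, "caf\u00e9", "UTF-8" );
        System.out.println( read( f, "UTF-8" ) + " took " + f.length() + " bytes" );
    }
}
```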

11.2.3 The java.io.RandomAccessFile Class

The java.io.RandomAccessFile class provides the ability to read and write data at a specified location in a file. RandomAccessFile implements both the DataInput and DataOutput interfaces, so you can use it to read and write strings and primitive types. In other words, RandomAccessFile defines the same methods for reading and writing data as DataInputStream and DataOutputStream. However, because the class provides random, rather than sequential, access to file data, it's not a subclass of either InputStream or OutputStream.

You can create a RandomAccessFile from a String pathname or a File object. The constructor also takes a second String argument that specifies the mode of the file. Use r for a read-only file or rw for a read-write file. Here's how we would start to create a simple database to keep track of user information:

    try {
        RandomAccessFile users =
            new RandomAccessFile( "Users", "rw" );
    } catch (IOException e) { ... }

When you create a RandomAccessFile in read-only mode, Java tries to open the specified file. If the file doesn't exist, RandomAccessFile throws an IOException (specifically, a FileNotFoundException). If, however, you're creating a RandomAccessFile in read-write mode, the object creates the file if it doesn't exist. The constructor can still throw an IOException if another I/O error occurs, so you still need to handle this exception.

After you have created a RandomAccessFile, call any of the normal reading and writing methods, just as you would with a DataInputStream or DataOutputStream. If you try to write to a read-only file, the write method throws an IOException.

What makes a RandomAccessFile special is the seek() method. This method takes a long value and uses it to set the location for reading and writing in the file. You can use the getFilePointer() method to get the current location. If you need to append data to the end of the file, use length() to determine that location, then seek() to it. You can write or seek beyond the end of a file, but you can't read beyond the end of a file. If you try, the DataInput methods such as readInt() and readUTF() throw an EOFException, while the plain read() methods return -1.

Here's an example of writing some data to a user database:

    users.seek( userNum * RECORDSIZE );
    users.writeUTF( userName );
    users.writeInt( userID );

Of course, in this naive example we assume that the String length for userName, along with any data that comes after it, fits within the specified record size.
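One way to avoid that assumption, sketched below, is to encode each record into a byte array first and refuse to write it if it is too big. The UserDb class, its 64-byte RECORDSIZE, and the method names are all hypothetical choices for this illustration, not part of the chapter's example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

class UserDb {
    static final int RECORDSIZE = 64;   // assumed fixed record size, in bytes

    // Encode the record in memory, check that it fits, then write it in place
    static void writeUser( RandomAccessFile users, int userNum,
                           String userName, int userID ) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream rec = new DataOutputStream( bytes );
        rec.writeUTF( userName );
        rec.writeInt( userID );
        if ( bytes.size() > RECORDSIZE )
            throw new IOException( "record too large: " + bytes.size() );

        users.seek( (long) userNum * RECORDSIZE );
        users.write( bytes.toByteArray() );
    }

    // Read a record back as "name:id"
    static String readUser( RandomAccessFile users, int userNum )
            throws IOException {
        users.seek( (long) userNum * RECORDSIZE );
        String userName = users.readUTF();
        int userID = users.readInt();
        return userName + ":" + userID;
    }
}
```

Records can be written in any order; writing record 2 before record 0 simply leaves a gap in the file that is filled in later.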

11.2.4 Applets and Files

Unless otherwise restricted, a Java application can read and write to the host filesystem with the same level of access as the user running the Java interpreter. For security reasons, untrusted applets and applications are not permitted to read from or write to arbitrary places in the filesystem. The ability of untrusted code to read and write files, as with any kind of system resource, is under the control of the system security policy, through a SecurityManager object. A security policy is set by the application that is running the untrusted code, such as appletviewer or a Java-enabled web browser. All filesystem access must first pass the scrutiny of the SecurityManager.

Some web browsers allow untrusted applets to have access to specific files designated by the user. Netscape Navigator and Internet Explorer currently do not allow untrusted applets any access to the filesystem. However, as we'll see in Chapter 22, signed applets can be given arbitrary access to the filesystem, just like a standalone Java application.

It's not unusual to want an applet to maintain some kind of state information on the system on which it's running. But for a Java applet that is restricted from access to the local filesystem, the only option is to store data over the network on its server (or possibly in a client-side cookie). Applets have at their disposal powerful general means for communicating data over networks. The only limitation is that, by convention, an applet's network communication is restricted to the server that launched it. This limits the options for where the data will reside.

Currently, the only way for a Java program to send data to a server is through a network socket or tools such as RMI, which run over sockets. In Chapter 12 we'll take a detailed look at building networked applications with sockets. With the tools described in that chapter, it's possible to build powerful client/server applications. Sun also has a Java extension called WebNFS, which allows applets and applications to work with files on an NFS server in much the same way as the ordinary File API.

11.2.5 Loading Application Resources

We often package data files and other objects with our applications. Java provides many ways to access these resources. In a standalone application, we can simply open files and read the bytes. In both standalone applications and applets, we can construct URLs to well-known locations. The problem with these methods is that we generally have to know where our application lives in order to find our data. This is not always as easy as it seems. What is needed is a universal way to access resources associated with our application and our application's individual classes. The Class class's getResource() method provides just this.

What does getResource() do for us? To construct a URL to a file, we normally have to figure out a home directory for our code and construct a path relative to that. As we'll see in Chapter 22, in an applet, we could use getCodeBase() or getDocumentBase() to find the base URL and then use that base to create the URL for the resource we want. But these methods don't help a standalone application, and there's no reason that a standalone application and an applet shouldn't be written in the same way anyway. To solve this problem, the getResource() method provides a standard way to get objects relative to a given class file or to the system classpath. getResource() returns a special URL that uses the class's class loader. This means that no matter where the class came from (a web server, the local filesystem, or even a JAR file), we can simply ask for an object, get a URL for the object, and use the URL to access the object.

getResource() takes as an argument a slash-separated pathname for the resource and returns a URL. There are two kinds of paths: absolute and relative. An absolute path begins with a slash, for example, /foo/bar/blah.txt. In this case, the search for the object begins at the top of the classpath. If there is a directory foo/bar in the classpath, getResource() searches that directory for the blah.txt file. A relative path does not begin with a slash. In this case, the search begins at the location of the class file, whether it is local, on a remote server, or in a JAR file (either local or remote). So if we were calling getResource() on the Class object of a class in the foo.bar package, we could refer to the file as blah.txt. In this case, the class itself would be loaded from the directory foo/bar somewhere on the classpath, and we'd expect to find the file in the same directory.

For example, here's an application that looks up some resources:

    //file: FindResources.java
    package mypackage;

    import java.net.URL;
    import java.io.IOException;

    public class FindResources {
        public static void main( String [] args ) throws IOException {
            // absolute from the classpath
            URL url = FindResources.class.getResource("/mypackage/foo.txt");
            // relative to the class location
            url = FindResources.class.getResource("foo.txt");
            // another relative document
            url = FindResources.class.getResource("docs/bar.txt");
        }
    }

The FindResources class belongs to the mypackage package, so its class file will live in a mypackage directory somewhere on the classpath. FindResources locates the document foo.txt using an absolute and then a relative URL. At the end, FindResources uses a relative path to reach a document in the mypackage/docs directory. In each case we refer to the FindResources's Class object using the static .class notation. Alternatively, if we had an instance of the object, we could use its getClass() method to reach the Class object.

For an applet, the search is similar but occurs on the host from which the applet was loaded. getResource() first checks any JAR files loaded with the applet, and then searches the normal remote applet classpath, constructed relative to the applet's codebase URL.

getResource() returns a URL for whatever type of object you reference. This could be a text file or properties file that you want to read as a stream, or it might be an image or sound file or some other object. If you want the data as a stream, the Class class also provides a getResourceAsStream() method. In the case of an image, you'd probably hand the URL over to a getImage() method, such as the one in the AWT Toolkit or in an applet, for loading.
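As a brief sketch of the stream variant, the helper below reads the first line of a text resource located relative to a class. The ResourceText class, the firstLine() method, and the resource name used in main() are hypothetical; note that getResourceAsStream() returns null, rather than throwing an exception, when the resource can't be found.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class ResourceText {
    // Returns the first line of the named resource, found relative to the
    // given class, or null if there is no such resource
    static String firstLine( Class<?> owner, String name ) throws IOException {
        InputStream in = owner.getResourceAsStream( name );
        if ( in == null )
            return null;
        try {
            BufferedReader reader =
                new BufferedReader( new InputStreamReader( in, "UTF-8" ) );
            return reader.readLine();
        } finally {
            in.close();
        }
    }

    public static void main( String[] args ) throws IOException {
        // "foo.txt" is a made-up name; place such a file next to the class file to try it
        System.out.println( firstLine( ResourceText.class, "foo.txt" ) );
    }
}
```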

11.3 Serialization

Using a DataOutputStream, you could write an application that saves the data content of your objects as simple types. However, Java provides an even more powerful mechanism called object serialization that does almost all the work for you. In its simplest form, object serialization is an automatic way to save and load the state of an object. However, object serialization has depths that we cannot plumb within the scope of this book, including complete control over the serialization process and interesting conundrums such as class versioning.

Basically, an object of any class that implements the Serializable interface can be saved and restored from a stream. Special stream subclasses, ObjectInputStream and ObjectOutputStream, are used to serialize primitive types and objects. Subclasses of Serializable classes are also serializable. The default serialization mechanism saves the value of an object's nonstatic and nontransient (see the following explanation) member variables.

One of the most important (and tricky) things about serialization is that when an object is serialized, any object references it contains are also serialized. Serialization can capture entire "graphs" of interconnected objects and put them back together on the receiving end (we'll demonstrate this in an upcoming example). The implication is that any object we serialize must contain only references to other Serializable objects. We can take control by marking nonserializable members as transient or overriding the default serialization mechanisms. The transient modifier can be applied to any instance variable to indicate that its contents are not useful outside of the current context and should never be saved.
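To see what transient means in practice, consider this sketch of a hypothetical Login class whose password field is marked transient. The roundTrip() helper, also invented for the illustration, serializes an object into a byte array and reads it straight back; the ordinary field survives the trip, while the transient one comes back with its default value, null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Login implements Serializable {
    private static final long serialVersionUID = 1L;

    String user;
    transient String password;   // skipped by default serialization

    Login( String user, String password ) {
        this.user = user;
        this.password = password;
    }

    // Serialize to memory and deserialize again
    static Login roundTrip( Login original )
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream( bytes );
        out.writeObject( original );
        out.close();

        ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream( bytes.toByteArray() ) );
        try {
            return (Login) in.readObject();
        } finally {
            in.close();
        }
    }
}
```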

In the following example, we create a Hashtable and write it to a disk file called h.ser. The Hashtable object is serializable because it implements the Serializable interface.

    //file: Save.java
    import java.io.*;
    import java.util.*;

    public class Save {
        public static void main(String[] args) {
            Hashtable h = new Hashtable( );
            h.put("string", "Gabriel Garcia Marquez");
            h.put("int", new Integer(26));
            h.put("double", new Double(Math.PI));

            try {
                FileOutputStream fileOut = new FileOutputStream("h.ser");
                ObjectOutputStream out = new ObjectOutputStream(fileOut);
                out.writeObject(h);
            }
            catch (Exception e) {
                System.out.println(e);
            }
        }
    }

First we construct a Hashtable with a few elements in it. Then, in the three lines of code inside the try block, we write the Hashtable to a file called h.ser, using the writeObject() method of ObjectOutputStream. The ObjectOutputStream class is a lot like the DataOutputStream class, except that it includes the powerful writeObject() method.

The Hashtable we created has internal references to the items it contains. Thus, these components are automatically serialized along with the Hashtable. We'll see this in the next example when we deserialize the Hashtable.

    //file: Load.java
    import java.io.*;
    import java.util.*;

    public class Load {
        public static void main(String[] args) {
            try {
                FileInputStream fileIn = new FileInputStream("h.ser");
                ObjectInputStream in = new ObjectInputStream(fileIn);
                Hashtable h = (Hashtable)in.readObject( );
                System.out.println(h.toString( ));
            }
            catch (Exception e) {
                System.out.println(e);
            }
        }
    }

In this example, we read the Hashtable from the h.ser file, using the readObject() method of ObjectInputStream. The ObjectInputStream class is a lot like DataInputStream, except that it includes the readObject() method. The return type of readObject() is Object, so we need to cast it to a Hashtable. Finally, we print out the contents of the Hashtable using its toString() method.

11.3.1 Initialization with readObject( )

Often simple deserialization alone is not enough to reconstruct the full state of an object. For example, the object may have had transient fields representing state that could not be serialized, such as network connections, event registration, or decoded image data. Objects have an opportunity to do their own setup after deserialization by implementing a special method named readObject().

Not to be confused with the readObject() method of the ObjectInputStream, this method is implemented by the serializable object itself. The readObject() method must have a specific signature, and it must be private. The following snippet is taken from an animated JavaBean that we'll talk about in Chapter 21:

    private void readObject(ObjectInputStream s)
        throws IOException, ClassNotFoundException
    {
        s.defaultReadObject( );
        initialize( );
        if ( isRunning )
            start( );
    }

When the readObject() method with this signature exists in an object, it is called during the deserialization process. The argument to the method is the ObjectInputStream doing the object construction. We delegate to its defaultReadObject() method to do the normal deserialization and then do our custom setup. In this case we call one of our methods, named initialize(), and optionally a method called start().
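Since the JavaBean itself isn't shown here, the following self-contained sketch illustrates the same pattern with a hypothetical Temperature class. Its label field is transient derived state that the private readObject() method rebuilds after the default deserialization has restored the celsius value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Temperature implements Serializable {
    private static final long serialVersionUID = 1L;

    private double celsius;
    private transient String label;   // derived state, not serialized

    Temperature( double celsius ) {
        this.celsius = celsius;
        initialize();
    }

    // Rebuild whatever can be computed from the serialized fields
    private void initialize() {
        label = celsius + " C";
    }

    // Called by the serialization machinery, not by our own code
    private void readObject( ObjectInputStream s )
            throws IOException, ClassNotFoundException {
        s.defaultReadObject();   // restores celsius
        initialize();            // restores label
    }

    String getLabel() {
        return label;
    }

    // Serialize to memory and deserialize again
    static Temperature roundTrip( Temperature original )
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream( bytes );
        out.writeObject( original );
        out.close();
        ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream( bytes.toByteArray() ) );
        try {
            return (Temperature) in.readObject();
        } finally {
            in.close();
        }
    }
}
```

Without the readObject() method, getLabel() would return null on the deserialized copy, because constructors of a Serializable class are not run during deserialization.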

We'll talk more about serialization in Chapter 21 when we discuss JavaBeans. There we'll see that it is even possible to serialize a GUI component in mid-use and bring it back to life later.

11.4 Data Compression

The java.util.zip package contains classes you can use for data compression. In this section, we'll talk about how to use these classes. We'll also present two useful example programs that build on what you have just learned about streams and files. The classes in the java.util.zip package support two widespread compression formats: GZIP and ZIP.

11.4.1 Compressing Data

The java.util.zip package provides two FilterOutputStream subclasses to write compressed data to a stream. To write compressed data in the GZIP format, simply wrap a GZIPOutputStream around an underlying stream and write to it. The following is a complete example that shows how to compress a file using the GZIP format.

    //file: GZip.java
    import java.io.*;
    import java.util.zip.*;

    public class GZip {
        public static int sChunk = 8192;

        public static void main(String[] args) {
            if (args.length != 1) {
                System.out.println("Usage: GZip source");
                return;
            }
            // create output stream
            String zipname = args[0] + ".gz";
            GZIPOutputStream zipout;
            try {
                FileOutputStream out = new FileOutputStream(zipname);
                zipout = new GZIPOutputStream(out);
            }
            catch (IOException e) {
                System.out.println("Couldn't create " + zipname + ".");
                return;
            }
            byte[] buffer = new byte[sChunk];
            // compress the file
            try {
                FileInputStream in = new FileInputStream(args[0]);
                int length;
                while ((length = in.read(buffer, 0, sChunk)) != -1)
                    zipout.write(buffer, 0, length);
                in.close( );
            }
            catch (IOException e) {
                System.out.println("Couldn't compress " + args[0] + ".");
            }
            try { zipout.close( ); }
            catch (IOException e) {}
        }
    }

First we check to make sure we have a command-line argument representing a filename. Then we construct a GZIPOutputStream wrapped around a FileOutputStream representing the given filename, with the .gz suffix appended. With this in place, we open the source file. We read chunks of data and write them into the GZIPOutputStream. Finally, we clean up by closing our open streams.

Writing data to a ZIP archive file is a little more involved but still quite manageable. While a GZIP file contains only one compressed file, a ZIP file is actually a collection of files, some (or all) of which may be compressed. Each item in the ZIP file is represented by a ZipEntry object. When writing to a ZipOutputStream, you'll need to call putNextEntry() before writing the data for each item. The following example shows how to create a ZipOutputStream. You'll notice it's just like creating a GZIPOutputStream:

    ZipOutputStream zipout;
    try {
        FileOutputStream out = new FileOutputStream("archive.zip");
        zipout = new ZipOutputStream(out);
    }
    catch (IOException e) {}

Let's say we have two files we want to write into this archive. Before we begin writing, we need to call putNextEntry(). We'll create a simple entry with just a name. There are other fields in ZipEntry that you can set, but most of the time you won't need to bother with them.

try {
  ZipEntry first = new ZipEntry("First");
  zipout.putNextEntry(first);
  // write the data for the first item here
  ZipEntry second = new ZipEntry("Second");
  zipout.putNextEntry(second);
  . . .
}
catch (IOException e) {}
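Putting the pieces together, a minimal sketch of writing two items might look like the following. The class name ZipWriteSketch, the helper writeArchive(), and the entry contents are invented for illustration, and the try-with-resources statement assumes Java 7 or later. closeEntry() is called explicitly here, although putNextEntry() also closes any entry that is still open.

```java
import java.io.*;
import java.util.zip.*;

class ZipWriteSketch {
    // Writes two small entries into a ZIP archive and returns the archive file.
    static File writeArchive(File archive) throws IOException {
        try (ZipOutputStream zipout =
                 new ZipOutputStream(new FileOutputStream(archive))) {
            zipout.putNextEntry(new ZipEntry("First"));
            zipout.write("contents of first".getBytes("UTF-8"));
            zipout.closeEntry();

            zipout.putNextEntry(new ZipEntry("Second"));
            zipout.write("contents of second".getBytes("UTF-8"));
            zipout.closeEntry();
        }   // closing the stream finishes the archive
        return archive;
    }

    public static void main(String[] args) throws IOException {
        File archive = File.createTempFile("archive", ".zip");
        archive.deleteOnExit();
        writeArchive(archive);
        System.out.println("Wrote " + archive.length() + " bytes");
    }
}
```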

11.4.2 Decompressing Data

To decompress data, you can use one of the two FilterInputStream subclasses provided in java.util.zip. To decompress data in the GZIP format, simply wrap a GZIPInputStream around an underlying FileInputStream and read from it. The following is a complete example that shows how to decompress a GZIP file:

//file: GUnzip.java
import java.io.*;
import java.util.zip.*;

public class GUnzip {
  public static int sChunk = 8192;

  public static void main(String[] args) {
    if (args.length != 1) {
      System.out.println("Usage: GUnzip source");
      return;
    }
    // create input stream
    String zipname, source;
    if (args[0].endsWith(".gz")) {
      zipname = args[0];
      source = args[0].substring(0, args[0].length() - 3);
    }
    else {
      zipname = args[0] + ".gz";
      source = args[0];
    }
    GZIPInputStream zipin;
    try {
      FileInputStream in = new FileInputStream(zipname);
      zipin = new GZIPInputStream(in);
    }
    catch (IOException e) {
      System.out.println("Couldn't open " + zipname + ".");
      return;
    }
    byte[] buffer = new byte[sChunk];
    // decompress the file
    try {
      FileOutputStream out = new FileOutputStream(source);
      int length;
      while ((length = zipin.read(buffer, 0, sChunk)) != -1)
        out.write(buffer, 0, length);
      out.close();
    }
    catch (IOException e) {
      System.out.println("Couldn't decompress " + args[0] + ".");
    }
    try { zipin.close(); }
    catch (IOException e) {}
  }
}

First we check to make sure we have a command-line argument representing a filename. If the argument ends with .gz, we figure out what the filename for the uncompressed file should be. Otherwise, we use the given argument and assume the compressed file has the .gz suffix. Then we construct a GZIPInputStream wrapped around a FileInputStream, representing the compressed file. With this in place, we open the target file. We read chunks of data from the GZIPInputStream and write them into the target file. Finally, we clean up by closing our open streams.

Again, the ZIP archive presents a little more complexity than the GZIP file. When reading from a ZipInputStream, you should call getNextEntry() before reading each item. When getNextEntry() returns null, there are no more items to read. The following example shows how to create a ZipInputStream. You'll notice it's just like creating a GZIPInputStream:

ZipInputStream zipin;
try {
  FileInputStream in = new FileInputStream("archive.zip");
  zipin = new ZipInputStream(in);
}
catch (IOException e) {}

Suppose we want to read two files from this archive. Before we begin reading, we need to call getNextEntry(). At the least, the entry will give us a name of the item we are reading from the archive:

try {
  ZipEntry first = zipin.getNextEntry();
}
catch (IOException e) {}

At this point, you can read the contents of the first item in the archive. When you come to the end of the item, the read() method will return -1. Now you can call getNextEntry() again to read the second item from the archive:

try {
  ZipEntry second = zipin.getNextEntry();
}
catch (IOException e) {}

If you call getNextEntry(), and it returns null, there are no more items, and you have reached the end of the archive.
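In practice you rarely know how many items an archive holds, so the calls above are usually folded into a loop that runs until getNextEntry() returns null. The following is a sketch under that assumption; the names ZipReadSketch, readAll(), and sampleArchive() are made up for this example, the archive is built in memory so that the code is self-contained, and the generics and try-with-resources syntax assume Java 7 or later.

```java
import java.io.*;
import java.util.*;
import java.util.zip.*;

class ZipReadSketch {
    // Reads every entry from a ZIP stream, mapping entry names to contents.
    static Map<String, String> readAll(InputStream source) throws IOException {
        Map<String, String> items = new LinkedHashMap<String, String>();
        try (ZipInputStream zipin = new ZipInputStream(source)) {
            ZipEntry entry;
            byte[] buffer = new byte[8192];
            while ((entry = zipin.getNextEntry()) != null) {
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                int length;
                // read() returns -1 at the end of the current entry
                while ((length = zipin.read(buffer, 0, buffer.length)) != -1)
                    data.write(buffer, 0, length);
                items.put(entry.getName(), data.toString("UTF-8"));
            }
        }
        return items;
    }

    // Builds a small two-entry archive in memory to read back.
    static byte[] sampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zipout = new ZipOutputStream(bytes)) {
            zipout.putNextEntry(new ZipEntry("First"));
            zipout.write("one".getBytes("UTF-8"));
            zipout.putNextEntry(new ZipEntry("Second"));
            zipout.write("two".getBytes("UTF-8"));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(sampleArchive());
        System.out.println(readAll(in));   // prints {First=one, Second=two}
    }
}
```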

11.5 The NIO Package

The java.nio package is a major new addition in Java 1.4. The name NIO stands for "new I/O," which may seem to imply that it is to be a replacement for the java.io package. In fact, much of the NIO functionality does overlap with existing APIs. NIO was added primarily to address specific issues of scalability for large systems, especially in networked applications. That said, NIO also provides several new features Java lacked in basic I/O, so there are some tools here that you'll want to look at even if you aren't planning to write any large or high-performance services. The primary features of NIO are outlined in the following sections.

11.5.1 Asynchronous I/O

Most of the need for the NIO package was driven by the desire to add nonblocking and selectable I/O to Java. Prior to NIO, most read and write operations in Java were bound to threads that were forced to block for unpredictable amounts of time. Although certain APIs such as Sockets (which we'll see in Chapter 12) provided specific means to limit how long an I/O call could take, this was a workaround to compensate for the lack of a more general mechanism. Prior to the introduction of threads, in many languages I/O could still be done efficiently by setting I/O streams to a nonblocking mode and testing them for their readiness to send or receive data. In a nonblocking mode, a read or write does only as much work as can be done immediately, filling or emptying a buffer, and then returns. Combined with the ability to test for readiness, this allows a single thread to continuously service many channels efficiently. The main thread "selects" a stream that is ready and works with it until it would block, then moves to another. On a single-processor system, this is fundamentally equivalent to using multiple threads. Even now, this style of processing has advantages when using a pool of threads (rather than just one). We'll discuss this in detail in Chapter 12 when we discuss networking and building servers that can handle many clients simultaneously.

In addition to nonblocking and selectable I/O, the NIO package enables closing and interrupting I/O operations asynchronously. As discussed in Chapter 8, prior to NIO there was no reliable way to stop or wake up a thread blocked in an I/O operation. With NIO, threads blocked in I/O operations always wake up when interrupted or when the channel is closed by anyone. Additionally, if you interrupt a thread while it is blocked in an NIO operation, its channel is automatically closed. (Closing the channel because the thread is interrupted might seem too strong, but usually it's the right thing to do.)

11.5.2 Performance

Channel I/O is designed around the concept of buffers, which are a more sophisticated form of array, tailored to working with communications. The NIO package supports the concept of direct buffers, buffers that maintain their memory outside the Java virtual machine, in the native host operating system. Since all real I/O operations ultimately have to work with the host OS, by maintaining the buffer space there, some operations can be made much more efficient. Data can be transferred without first copying it into Java and back out.

11.5.3 Mapped and Locked Files

NIO provides two general-purpose file-related features: memory-mapped files and file locking. We'll discuss memory-mapped files later, but suffice it to say that they allow you to work with file data as if it were all magically resident in memory. File locking supports the concept of shared and exclusive locks on regions of files, which is useful for concurrent access by multiple applications.

11.5.4 Channels

While java.io deals with streams, java.nio works with channels. A channel is an endpoint for communication. Although in practice channels are similar to streams, the underlying notion of a channel is a bit more abstract and primitive. Whereas streams in java.io are defined in terms of input or output with methods to read and write bytes, the basic channel interface says nothing about how communications happen. It simply defines whether the channel is open or closed, via the methods isOpen() and close(). Implementations of channels for files, network sockets, or arbitrary devices then add their own methods for operations such as reading, writing, or transferring data. The following channels are provided by NIO:

FileChannel
Pipe.SinkChannel, Pipe.SourceChannel
SocketChannel, ServerSocketChannel, DatagramChannel

We'll cover FileChannel in this chapter. The Pipe channels are simply the channel equivalents of the java.io Pipe facilities. We'll talk about Socket and Datagram channels in Chapter 12.

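FileChannel, SocketChannel, and DatagramChannel implement the ByteChannel interface, designed for channels that have read and write methods such as I/O streams. (The two Pipe channels each implement only the read or the write half of that interface, and ServerSocketChannel only accepts connections.) ByteChannels read and write ByteBuffers, however, not byte arrays.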

In addition to these native channels, you can bridge to channels from java.io I/O streams and readers and writers for interoperability. Be aware that, if you mix these features, you may not get the full benefits of performance and asynchronous I/O.
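The bridging is done with the static methods of the java.nio.channels.Channels utility class. As a rough illustration, the sketch below wraps an ordinary InputStream as a channel and then views that channel as a Reader again. The class name ChannelBridgeSketch and the helper firstLine() are hypothetical.

```java
import java.io.*;
import java.nio.channels.*;

class ChannelBridgeSketch {
    // Reads the first line of text from a channel by viewing it as a Reader.
    static String firstLine(ReadableByteChannel channel) throws IOException {
        BufferedReader reader =
            new BufferedReader(Channels.newReader(channel, "UTF-8"));
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        // Stream to channel: wrap an ordinary InputStream
        InputStream stream =
            new ByteArrayInputStream("hello\nworld\n".getBytes("UTF-8"));
        ReadableByteChannel channel = Channels.newChannel(stream);

        // Channel to reader: return to the java.io world
        System.out.println(firstLine(channel));   // prints "hello"
    }
}
```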

11.5.5 Buffers

Most of the utilities of the java.io and java.net packages operate on byte arrays. The corresponding tools of the NIO package are built around ByteBuffers (with another type of buffer, CharBuffer, serving as a bridge to the text world). Byte arrays are simple, so why are buffers necessary? They serve several purposes.

They formalize the usage patterns for buffered data: they provide for things like read-only buffers, and they keep track of read/write positions and limits within a large buffer space. They also provide a mark/reset facility like that of BufferedInputStream.
They provide additional APIs for working with raw data representing primitive types. You can create buffers that "view" your byte data as a series of larger primitives such as shorts, ints, or floats. The most general type of data buffer, ByteBuffer, includes methods that let you read and write all primitive types like DataOutputStream does for streams.
They abstract the underlying storage of the data, allowing for special optimizations by Java. Specifically, buffers may be allocated as direct buffers that use native buffers of the host operating system instead of arrays in Java's memory. The NIO Channel facilities that work with buffers can recognize direct buffers automatically and try to optimize I/O to use them. For example, a read from a file channel into a Java byte array normally requires Java to copy the data for the read from the host operating system into Java's memory. But with a direct buffer the data can remain outside Java's normal memory space, in the host operating system.

11.5.5.1 Buffer operations

Every buffer is a subclass of java.nio.Buffer. The base Buffer is something like an array with state. The base Buffer class does not specify what type of elements it holds (that is for subtypes to decide), but it does define functionality common to all data buffers. A Buffer has a fixed size called its capacity. Although all the standard Buffers provide "random access" to their contents, a Buffer expects to be read and written sequentially, so Buffers maintain the notion of a position where the next element is read or written. In addition to the position, a Buffer can maintain two other pieces of state information: a limit, which is a position that is a "soft" limit to the extent of a read or write, and a mark, which can be used to remember an earlier position for future recall.

Implementations of Buffer add specific, typed get and put methods that read and write the buffer contents. For example, ByteBuffer is a buffer of bytes and it has get() and put() methods that read and write bytes and arrays of bytes (along with many other useful methods we'll discuss later). Getting from and putting to the Buffer changes the position marker, so the Buffer keeps track of its contents somewhat like a stream. Attempting to read or write past the limit marker generates a BufferUnderflowException or BufferOverflowException, respectively.

The mark, position, limit, and capacity values always obey the formula:

mark ≤ position ≤ limit ≤ capacity

The position for reading and writing the Buffer always lies between the mark, which serves as a lower bound, and the limit, which serves as an upper bound. The capacity represents the physical extent of the buffer space.

You can set the position and limit markers explicitly with the position() and limit() methods. But several convenience methods are provided for the common usage patterns. The reset() method sets the position back to the mark. If no mark has been set, an InvalidMarkException is thrown. The clear() method resets the position to zero and makes the limit the capacity, readying the buffer for new data (the mark is discarded). Note that the clear() method does not actually do anything to the data in the buffer; it simply changes the position markers.
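To make the bookkeeping concrete, here is a small sketch that prints the position and limit of an eight-byte buffer as the markers are moved. The class BufferMarkersSketch and its state() helper are invented for the example.

```java
import java.nio.ByteBuffer;

class BufferMarkersSketch {
    // Returns "position/limit" for compact printing.
    static String state(ByteBuffer b) {
        return b.position() + "/" + b.limit();
    }

    public static void main(String[] args) {
        ByteBuffer buff = ByteBuffer.allocate(8);   // capacity 8
        System.out.println(state(buff));            // 0/8

        buff.put((byte) 1).put((byte) 2);
        buff.mark();                                // remember position 2
        buff.put((byte) 3).put((byte) 4);
        System.out.println(state(buff));            // 4/8

        buff.reset();                               // back to the mark
        System.out.println(state(buff));            // 2/8

        buff.limit(6);                              // "soft" end of the buffer
        System.out.println(buff.remaining());       // 4

        buff.clear();                               // position 0, limit = capacity
        System.out.println(state(buff));            // 0/8
        System.out.println(buff.get(2));            // 3: the data is still there
    }
}
```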

The flip() method is used for the common pattern of writing data into the buffer and then reading it back out. flip() makes the current position the limit and then resets the current position to zero (any mark is thrown away). This saves having to keep track of how much data was read. Another method, rewind(), simply resets the position to zero, leaving the limit alone. You might use it to write the same data again. Here is a snippet of code that uses these methods to read data from a channel and write it to two channels:

ByteBuffer buff = ...
while ( inChannel.read( buff ) > 0 ) { // position = ?
    buff.flip();      // limit = position; position = 0;
    outChannel.write( buff );
    buff.rewind();    // position = 0
    outChannel2.write( buff );
    buff.clear();     // position = 0; limit = capacity
}

This might be confusing the first time you look at it because here the read from the Channel is actually a write to the Buffer and vice versa. Because the first write drains all the available data up to the limit, leaving the position equal to the limit, calling flip() a second time would have the same effect as rewind() in this case.
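For readers who want to run the pattern, the following sketch fills in the missing pieces with in-memory streams bridged to channels through java.nio.channels.Channels. The names FlipRewindSketch and tee() and the four-byte buffer size are assumptions made for the example. The inner hasRemaining() loops are a precaution, since a channel write is not required to drain the buffer in one call.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.*;

class FlipRewindSketch {
    // Copies one input channel to two output channels through a single buffer.
    static void tee(ReadableByteChannel in, WritableByteChannel out1,
                    WritableByteChannel out2) throws IOException {
        ByteBuffer buff = ByteBuffer.allocate(4);
        while (in.read(buff) > 0) {
            buff.flip();               // limit = position; position = 0
            while (buff.hasRemaining())
                out1.write(buff);
            buff.rewind();             // position = 0; limit unchanged
            while (buff.hasRemaining())
                out2.write(buff);
            buff.clear();              // position = 0; limit = capacity
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes("US-ASCII");
        ByteArrayOutputStream a = new ByteArrayOutputStream();
        ByteArrayOutputStream b = new ByteArrayOutputStream();
        tee(Channels.newChannel(new ByteArrayInputStream(data)),
            Channels.newChannel(a), Channels.newChannel(b));
        System.out.println(a + " " + b);   // prints "0123456789 0123456789"
    }
}
```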

11.5.5.2 Buffer types

As stated earlier, various buffer types add get and put methods for reading and writing specific data types. There is a buffer type for each of the Java primitive types except boolean: ByteBuffer, CharBuffer, ShortBuffer, IntBuffer, LongBuffer, FloatBuffer, and DoubleBuffer. Each provides get and put methods for reading and writing its type and arrays of its type. Of these, ByteBuffer is the most flexible. Because it has the "finest grain" of all the buffers, it has been given a full complement of get and put methods for reading and writing all the other data types, as well as byte. Here are some ByteBuffer methods:

byte get()
char getChar()
short getShort()
int getInt()
long getLong()
float getFloat()
double getDouble()

ByteBuffer put(byte b)
ByteBuffer put(ByteBuffer src)
ByteBuffer put(byte[] src, int offset, int length)
ByteBuffer put(byte[] src)
ByteBuffer putChar(char value)
ByteBuffer putShort(short value)
ByteBuffer putInt(int value)
ByteBuffer putLong(long value)
ByteBuffer putFloat(float value)
ByteBuffer putDouble(double value)

As we said, all the standard buffers also support random access. So for each of the aforementioned methods of ByteBuffer, there is an additional form that takes an index:

getLong( int index )
putLong( int index, long value )

But that's not all. ByteBuffer can also provide "views" of itself as any of the larger grained types. For example, you can fetch a ShortBuffer view of a ByteBuffer with the asShortBuffer() method. The ShortBuffer view is backed by the ByteBuffer, which means that they work on the same data, and changes to either one affect the other. The view buffer's extent starts at the ByteBuffer's current position, and its capacity is a function of the remaining number of bytes, divided by the new type's size. (For example, shorts consume two bytes each, ints and floats four, and longs and doubles eight.) View buffers are convenient for reading and writing large blocks of a contiguous type within a ByteBuffer.
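A short sketch may help show how a view shares data with its ByteBuffer and how its capacity is derived. The class ViewBufferSketch and the helper toShorts() are hypothetical names; the byte values assume the default big-endian byte order.

```java
import java.nio.*;
import java.util.Arrays;

class ViewBufferSketch {
    // Interprets raw bytes as big-endian shorts using a view buffer.
    static short[] toShorts(byte[] raw) {
        ShortBuffer view = ByteBuffer.wrap(raw).asShortBuffer();
        short[] result = new short[view.remaining()];
        view.get(result);
        return result;
    }

    public static void main(String[] args) {
        byte[] raw = { 0, 1, 1, 0, 0, 2 };
        System.out.println(Arrays.toString(toShorts(raw)));   // [1, 256, 2]

        ByteBuffer bytes = ByteBuffer.wrap(raw);
        ShortBuffer view = bytes.asShortBuffer();
        view.put(0, (short) 0x0A0B);                 // a write through the view
        System.out.println(raw[0] + " " + raw[1]);   // 10 11: the shared data changed

        bytes.position(4);                           // a view starts at the current position
        System.out.println(bytes.asShortBuffer().capacity());   // 1: two bytes remain
    }
}
```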

CharBuffers are interesting as well, primarily because of their integration with Strings. Both CharBuffers and Strings implement the java.lang.CharSequence interface. This is the interface that provides the standard charAt() and length() methods. Because of this, newer APIs (such as the java.util.regex package) allow you to use a CharBuffer or a String interchangeably. In this case, the buffer acts like a String with user-configurable start and end positions.
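The following sketch illustrates the point with a method that accepts any CharSequence. The names CharSequenceSketch and countMatches() are invented for the example; moving the buffer's position changes where the sequence appears to start.

```java
import java.nio.CharBuffer;
import java.util.regex.*;

class CharSequenceSketch {
    // Counts regex matches in any CharSequence: a String or a CharBuffer.
    static int countMatches(CharSequence text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find())
            count++;
        return count;
    }

    public static void main(String[] args) {
        CharBuffer cbuf = CharBuffer.wrap("one two three two");
        System.out.println(countMatches(cbuf, "two"));   // 2

        cbuf.position(8);   // the sequence now starts at "three two"
        System.out.println(countMatches(cbuf, "two"));   // 1
        System.out.println(cbuf.length());               // 9
    }
}
```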

11.5.5.3 Byte order

Now, since we're talking about reading and writing types larger than a byte here, the question arises: in what order do the bytes of multibyte values (e.g., shorts, ints) get written? There are two camps in this world: "big endian" and "little endian."[1] Big endian means that the most significant bytes come first; little endian is the reverse. If you're writing binary data for consumption by some native application, this is important. Intel-compatible computers use little endian, and many workstations that run Unix use big endian. The ByteOrder class encapsulates the choice. You can specify the byte order to use with the ByteBuffer order() method, using the identifiers ByteOrder.BIG_ENDIAN and ByteOrder.LITTLE_ENDIAN like so:

byteBuffer.order( ByteOrder.BIG_ENDIAN );

You can retrieve the native ordering for your platform using the static ByteOrder.nativeOrder() method.
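As a quick illustration, the sketch below writes the same int in both orders and prints the resulting bytes. The class ByteOrderSketch and its encode() helper are made up for the example. A newly created ByteBuffer starts out big endian, whatever the native order of the platform.

```java
import java.nio.*;
import java.util.Arrays;

class ByteOrderSketch {
    // Encodes an int in the given byte order and returns the four bytes.
    static byte[] encode(int value, ByteOrder order) {
        ByteBuffer buff = ByteBuffer.allocate(4);
        buff.order(order);
        buff.putInt(value);
        return buff.array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            encode(0x01020304, ByteOrder.BIG_ENDIAN)));      // [1, 2, 3, 4]
        System.out.println(Arrays.toString(
            encode(0x01020304, ByteOrder.LITTLE_ENDIAN)));   // [4, 3, 2, 1]
        System.out.println(ByteOrder.nativeOrder());         // depends on the platform
    }
}
```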

11.5.5.4 Allocating buffers

You can create a buffer either by allocating it explicitly using allocate() or by wrapping an existing array type. Each buffer type has a static allocate() method that takes a capacity (size) and also a wrap() method that takes an existing array:

CharBuffer cbuf = CharBuffer.allocate( 64*1024 );

A direct buffer is allocated in the same way, with the allocateDirect() method:

ByteBuffer bbuf = ByteBuffer.allocateDirect( 64*1024 );

As we described earlier, direct buffers can use native host operating-system memory structures that are optimized for use with some kinds of I/O operations. The tradeoff is that allocating a direct buffer is a little slower than a plain buffer, so you should try to use them for longer term buffers. (For example, on a 400-MHz Sparc Ultra 60, it took about 10 milliseconds to allocate a 1-MB direct buffer versus 2 milliseconds for a plain buffer of the same size.)

11.5.6 Character Encoders and Decoders

Character encoders and decoders turn characters into raw bytes and vice versa, mapping from the Unicode standard to particular encoding schemes. Encoders and decoders have always existed in Java for use by Reader and Writer streams and in the methods of the String class that work with byte arrays. However, prior to Java 1.4, there was no API for working with encoding explicitly; you simply referred to encoders and decoders wherever necessary by name as a String. The java.nio.charset package formalizes the idea of a Unicode character set with the Charset class.

The Charset class is a factory for Charset instances, which know how to encode character buffers to byte buffers and decode byte buffers to character buffers. You can look up a character set by name with the static Charset.forName() method and use it in conversions:

Charset charset = Charset.forName("US-ASCII");
CharBuffer charBuff = charset.decode( byteBuff );  // bytes to characters
byteBuff = charset.encode( charBuff );             // and back to bytes

You can also test to see if an encoding is available with the static Charset.isSupported() method.

The following character sets are guaranteed to be supplied:

US-ASCII
ISO-8859-1
UTF-8
UTF-16BE
UTF-16LE
UTF-16

You can list all the character sets available on your platform using the static availableCharsets() method:

Map map = Charset.availableCharsets();
Iterator it = map.keySet().iterator();
while ( it.hasNext() )
    System.out.println( it.next() );

The result of availableCharsets() is a sorted map from each character set's canonical name to its Charset object. A character set may also be known by one or more "aliases"; Charset.forName() accepts them, and the aliases() method of Charset lists them.

In addition to the buffer-oriented classes of the java.nio package, the InputStreamReader and OutputStreamWriter bridge classes of the java.io package have been updated to work with Charset as well. You can specify the encoding as a Charset object or by name.
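The lookup and conversion calls described in this section can be tried out with a small round trip. In this sketch the class CharsetSketch and the helper roundTrip() are hypothetical. Note that the convenience methods Charset.encode() and Charset.decode() quietly substitute a replacement for anything they cannot map, which is why the non-ASCII character is lost in the second conversion.

```java
import java.nio.*;
import java.nio.charset.Charset;

class CharsetSketch {
    // Encodes text to bytes in the named charset and decodes it back again.
    static String roundTrip(String text, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        ByteBuffer bytes = charset.encode(CharBuffer.wrap(text));
        return charset.decode(bytes).toString();
    }

    public static void main(String[] args) {
        System.out.println(Charset.isSupported("UTF-8"));   // true

        // e-acute survives a trip through UTF-8...
        System.out.println(roundTrip("caf\u00e9", "UTF-8").equals("caf\u00e9"));   // true
        // ...but US-ASCII cannot represent it and substitutes '?'
        System.out.println(roundTrip("caf\u00e9", "US-ASCII"));   // caf?
    }
}
```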

11.5.6.1 CharsetEncoder and CharsetDecoder

You can get more control over the encoding and decoding process by creating an instance of CharsetEncoder or CharsetDecoder (codec) with the Charset newEncoder() and newDecoder() methods. In our earlier example, we assumed that all the data was available in a single buffer. More often, however, we might have to process data as it arrives in chunks. The encoder/decoder API allows for this by providing more general encode() and decode() methods that take a flag specifying whether more data is expected. The codec needs to know this because it might have been left hanging in the middle of a multibyte character conversion when the data ran out. If it knows that more data is coming, it will not throw an error on this incomplete conversion. In the following snippet, we use a decoder to read from a ByteBuffer bbuff and accumulate character data into a CharBuffer cbuff:

CharsetDecoder decoder = Charset.forName("US-ASCII").newDecoder();

boolean done = false;
bbuff.clear();
while ( !done ) {
    done = ( in.read( bbuff ) == -1 );
    bbuff.flip();
    decoder.decode( bbuff, cbuff, done );
    bbuff.compact();   // keep any bytes not yet decoded for the next pass
}
cbuff.flip();
// use cbuff. . .

Here we look for the end of input condition on the in channel to set the flag done. The encode() and decode() methods also return a special result object, CoderResult, that can determine the progress of encoding. The methods isError() , isUnderflow(), and isOverflow() on the CoderResult specify why encoding stopped: for an error, a lack of bytes on the input buffer, or a full output buffer, respectively.
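The sketch below shows one way the chunked pattern and the CoderResult check might fit together, using UTF-8 so that a character can actually be split between two chunks. Everything named here (ChunkedDecodeSketch, decodeInChunks(), the chunkSize parameter, which is assumed to be at least 1) is invented for the example, and a byte array stands in for the channel. It uses compact() rather than clear() so that bytes the decoder has not consumed are carried over to the next pass.

```java
import java.nio.*;
import java.nio.charset.*;

class ChunkedDecodeSketch {
    // Decodes UTF-8 bytes that arrive in small chunks, carrying over
    // any bytes of a character that is split between two chunks.
    static String decodeInChunks(byte[] data, int chunkSize)
            throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        // room for one chunk plus the leftover bytes of a partial character
        ByteBuffer bbuff = ByteBuffer.allocate(chunkSize + 4);
        // UTF-8 never produces more chars than it consumes bytes
        CharBuffer cbuff = CharBuffer.allocate(data.length);
        int offset = 0;
        boolean done = false;
        while (!done) {
            int n = Math.min(chunkSize, data.length - offset);
            bbuff.put(data, offset, n);          // stands in for a channel read
            offset += n;
            done = (offset == data.length);
            bbuff.flip();
            CoderResult result = decoder.decode(bbuff, cbuff, done);
            if (result.isError())
                result.throwException();         // malformed or unmappable input
            bbuff.compact();                     // keep undecoded bytes
        }
        decoder.flush(cbuff);
        cbuff.flip();
        return cbuff.toString();
    }

    public static void main(String[] args) throws Exception {
        String text = "na\u00efve caf\u00e9";
        byte[] data = text.getBytes("UTF-8");
        // With chunks of 3 bytes, the two-byte i-diaeresis straddles two chunks
        System.out.println(decodeInChunks(data, 3).equals(text));   // true
    }
}
```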

11.5.7 FileChannel

Now that we've covered the basics of channels and buffers, it's time to look at a real channel type. The FileChannel is the NIO equivalent of the java.io.RandomAccessFile, but it provides several basic new features, in addition to some performance optimizations. You will want to use a FileChannel in place of a plain java.io file stream if you wish to use file locking, memory mapped file access, or perform highly optimized data transfer between files or between file and network channels.

A FileChannel is constructed from a FileInputStream, FileOutputStream, or RandomAccessFile:

FileChannel readOnlyFc = new FileInputStream("file.txt").getChannel();
FileChannel readWriteFc =
    new RandomAccessFile("file.txt", "rw").getChannel();

FileChannels for file input and output streams are read-only or write-only, respectively. To get a read-write FileChannel you must construct a RandomAccessFile with read-write permissions, as in the previous example.

Using a FileChannel is just like a RandomAccessFile, but it works with ByteBuffer instead of byte arrays:

bbuf.clear();
readOnlyFc.position( index );
readOnlyFc.read( bbuf );
bbuf.flip();
readWriteFc.write( bbuf );

You can control how much data is read and written either by setting the buffer's position and limit markers or by using the scattering and gathering forms of read/write, which take an array of buffers along with a starting offset and length within that array. You can also read and write at an absolute position in the file, without moving the channel's own position, using:

readWriteFc.read( bbuf, index );
readWriteFc.write( bbuf, index2 );

In each case, the actual number of bytes read or written depends on several factors. The operation tries to read or write to the limit of the buffer and the vast majority of the time that is what happens with local file access. But the operation is only guaranteed to block until at least one byte has been processed. Whatever happens, the number of bytes processed is returned, and the buffer position is updated accordingly. This is one of the things that is convenient about buffers; they can manage the count for you. Like standard streams, the channel read() method returns -1 upon reaching the end of input.

The size of the file is always available with the size() method. It can change if you write past the end of the file. Conversely, you can truncate the file to a specified length with the truncate() method.
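Here is a hedged, self-contained sketch of these operations against a temporary file. The names FileChannelSketch, writeThenRead(), and truncateTo() are made up for illustration, and try-with-resources assumes Java 7 or later. Because a read or write may process fewer bytes than requested, both are wrapped in loops.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

class FileChannelSketch {
    // Writes text at the start of the file, then reads back `length` bytes
    // starting at the absolute file position `index`.
    static String writeThenRead(File file, String text, long index, int length)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel fc = raf.getChannel()) {
            ByteBuffer out = ByteBuffer.wrap(text.getBytes("US-ASCII"));
            while (out.hasRemaining())
                fc.write(out);               // relative write: advances the channel position

            ByteBuffer in = ByteBuffer.allocate(length);
            long at = index;
            while (in.hasRemaining()) {
                int n = fc.read(in, at);     // absolute read: channel position untouched
                if (n == -1)
                    break;                   // end of file
                at += n;
            }
            in.flip();
            return new String(in.array(), 0, in.limit(), "US-ASCII");
        }
    }

    // Truncates the file and returns its new size.
    static long truncateTo(File file, long size) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel fc = raf.getChannel()) {
            fc.truncate(size);
            return fc.size();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("channel", ".txt");
        file.deleteOnExit();
        System.out.println(writeThenRead(file, "hello, channel", 7, 7));   // prints "channel"
        System.out.println(truncateTo(file, 5));                           // prints 5
    }
}
```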

11.5.7.1 Concurrent access

FileChannels are safe for use by multiple threads and guarantee that data "viewed" by them is consistent across channels in the same VM. However no guarantees are made about how quickly writes are propagated to the storage mechanism. If you need to be sure that data is safe before moving on, you can use the force() method to flush changes to disk. The force() method takes a boolean argument indicating whether or not file metadata, including timestamp and permissions, must be written. Some systems keep track of reads on files as well as writes, so you can save a lot of updates if you set the flag to false, which indicates that you don't care about syncing that data immediately.

As with all Channels, a FileChannel may be closed by any thread. Once closed, all its read/write and position-related methods throw a ClosedChannelException.

11.5.7.2 File locking

FileChannels support exclusive and shared locks on regions of files through the lock() method:

FileLock fileLock = fileChannel.lock();
long start = 0, len = fileChannel2.size();
FileLock readLock = fileChannel2.lock( start, len, true );

Locks may be either shared or exclusive. An exclusive lock prevents others from acquiring a lock of any kind on the specified file or file region. A shared lock allows others to acquire overlapping shared locks but not exclusive locks. These are useful as write locks and read locks respectively. When you are writing, you don't want others to be able to write until you're done, but when reading, you need only to block others from writing, not reading concurrently.

The simple lock() method in the previous example attempts to acquire an exclusive lock for the whole file. The second form accepts a starting and length parameter, as well as a flag indicating whether the lock should be shared (or exclusive). The FileLock object returned by the lock() method can be used to release the lock:

fileLock.release(  );

Note that file locks are a cooperative API; they do not necessarily prevent anyone from reading or writing to the locked file contents. In general the only way to guarantee that locks are obeyed is for both parties to attempt to acquire the lock and use it. Also, shared locks are not implemented on some systems, in which case all requested locks are exclusive. You can test if a lock is shared with the isShared() method.
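A complete acquire, test, and release cycle might look like the following sketch, which locks a temporary file. The class FileLockSketch and the helper lockAndReport() are hypothetical, and try-with-resources assumes Java 7 or later. Releasing the lock in a finally block means it is let go even if the work done under the lock fails.

```java
import java.io.*;
import java.nio.channels.*;

class FileLockSketch {
    // Locks the whole file, reports whether the lock obtained is shared,
    // and then releases it.
    static boolean lockAndReport(File file, boolean shared) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel channel = raf.getChannel()) {
            FileLock lock = channel.lock(0, Long.MAX_VALUE, shared);
            try {
                // work with the locked file here
                return lock.isShared();
            } finally {
                lock.release();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("locked", ".dat");
        file.deleteOnExit();
        System.out.println(lockAndReport(file, false));   // false: an exclusive lock
        System.out.println(lockAndReport(file, true));    // true where shared locks are supported
    }
}
```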

11.5.7.3 Memory mapped files

One of the most interesting new features offered through FileChannel is the ability to map a file into memory. When a file is memory mapped, like magic it becomes accessible through a single ByteBuffer just as if the entire file was read into memory at once. The implementation of this is extremely efficient, generally among the fastest ways to access the data. In fact, for working with large files, memory mapping can save a lot of resources and time.

This may seem counterintuitive; we're getting a conceptually easier way to access our data and it's also faster and more efficient? What's the catch? There really is no catch. The reason for this is that all modern operating systems are based on the idea of virtual memory. In a nutshell, that means the operating system makes disk space act like memory by continually paging (swapping fixed-size blocks called "pages," commonly 4 KB each) between memory and disk, transparent to the applications. Operating systems are very good at this; they efficiently cache the data the application is using and let go of what is not. So memory mapping a file is really just taking advantage of what the OS is doing internally.

A good example of where a memory-mapped file would be useful is in a database. Imagine a 100-MB file containing records indexed at various positions. By mapping the file we can work with a standard ByteBuffer, reading and writing data at arbitrary positions and let the native operating system read and write the underlying data in fine grained pages, as necessary. We could emulate this behavior with RandomAccessFile or FileChannel, but we would have to explicitly read and write data into buffers first, and the implementation would almost certainly not be as efficient.

A mapping is created with the FileChannel map() method. For example:

FileChannel fc = new RandomAccessFile("index.db", "rw").getChannel();
MappedByteBuffer mappedBuff =
    fc.map( FileChannel.MapMode.READ_WRITE, 0, fc.size() );

The map() method returns a MappedByteBuffer, which is simply the standard ByteBuffer with a few additional methods relating to the mapping. The most important is force(), which ensures that any data written to the buffer is flushed out to permanent storage on the disk. The READ_ONLY and READ_WRITE constant identifiers of the FileChannel.MapMode static inner class specify the type of access. Read-write access is available only when mapping a read-write file channel. Data read through the buffer is always consistent within the same Java VM. It can also be consistent across applications on the same host machine, but this is not guaranteed.

Again, a MappedByteBuffer acts just like a ByteBuffer. Continuing with the previous example, we could decode the buffer with a character decoder and search for a pattern like so:

CharBuffer cbuff = Charset.forName("US-ASCII").decode( mappedBuff );
Matcher matcher = Pattern.compile("abc*").matcher( cbuff );
while ( matcher.find() )
    System.out.println( matcher.start()+": "+matcher.group(0) );

Here we have effectively implemented the Unix grep command in about five lines of code (thanks to the fact that the Regular Expression API can work with our CharBuffer as a CharSequence). Of course in this example, the CharBuffer allocated by the decode() method is as large as the mapped file and must be held in memory. More generally, we can use the CharsetDecoder shown earlier to iterate through a large mapped space.
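To see a read-write mapping from end to end, the following sketch changes a single byte of a small temporary file through a MappedByteBuffer and then reads the file back with an ordinary stream. The names MappedFileSketch and pokeByte() and the file contents are invented for the example, and try-with-resources assumes Java 7 or later.

```java
import java.io.*;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class MappedFileSketch {
    // Overwrites one byte of a file through a read-write mapping.
    static void pokeByte(File file, int index, byte value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw");
             FileChannel fc = raf.getChannel()) {
            MappedByteBuffer mapped =
                fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size());
            mapped.put(index, value);   // random access, no explicit read or write call
            mapped.force();             // flush the change to the file
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("mapped", ".db");
        file.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write("record-A".getBytes("US-ASCII"));
        }
        pokeByte(file, 7, (byte) 'B');
        try (FileInputStream in = new FileInputStream(file)) {
            byte[] data = new byte[8];
            int n = in.read(data);
            System.out.println(new String(data, 0, n, "US-ASCII"));   // prints "record-B"
        }
    }
}
```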

11.5.7.4 Direct transfer

The final feature of FileChannel that we'll look at is performance optimization. FileChannel supports two highly optimized data transfer methods: transferFrom() and transferTo(), which move data between the file channel and another channel. These methods can take advantage of direct buffers internally to move data between the channels as fast as possible, often without copying the bytes into Java's memory space at all. The following example is currently the fastest way to implement a file copy in Java:

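import java.io.*;
import java.nio.*;
import java.nio.channels.*;

public class CopyFile {
    public static void main( String [] args ) throws Exception
    {
        String fromFileName = args[0];
        String toFileName = args[1];
        FileChannel in = new FileInputStream( fromFileName ).getChannel();
        FileChannel out = new FileOutputStream( toFileName ).getChannel();
        // transferTo() takes long values (no int cast needed) and may
        // move fewer bytes than requested, so loop until all are copied
        long size = in.size();
        long position = 0;
        while ( position < size )
            position += in.transferTo( position, size - position, out );
        in.close();
        out.close();
    }
}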

11.5.8 Scalable I/O with NIO

We've laid the groundwork for using the NIO package in this chapter but left out some of the important pieces. In the next chapter, we'll see more of the real motivation for java.nio when we talk about nonblocking and selectable I/O. In addition to the performance optimizations that can be made through direct buffers, these capabilities make possible a design for network servers that uses fewer threads and can scale well to large systems. We'll also look at the other significant Channel types: SocketChannel, ServerSocketChannel, and DatagramChannel.

[1] The terms "big endian" and "little endian" come from Jonathan Swift's novel Gulliver's Travels, where they denote two camps of Lilliputians: those who eat their eggs from the big end and those who eat them from the little end.
