New IO

	Core Java™ 2: Volume I - Fundamentals By Cay S. Horstmann, Gary Cornell

	Chapter 12. Streams and Files

New I/O

SDK 1.4 contains a number of features for improved input/output processing, collectively called the "new I/O," in the java.nio package. (Of course, the "new" moniker is somewhat regrettable because, a few years down the road, the package won't be new any longer.)

The package includes support for the following features:

Memory-mapped files;
File locking;
Character set encoders and decoders;
Non-blocking I/O.

We discuss only the first three features here. Non-blocking I/O requires the use of threads, which are covered in Volume 2.

Memory-Mapped Files

Most operating systems can take advantage of the virtual memory implementation to "map" a file, or a region of a file, into memory. Then the file can be accessed as if it were an in-memory array, which is much faster than the traditional file operations.

At the end of this section, you will find two short programs that compute the CRC32 checksum of a file. One version reads the file data with the read method, the other uses a memory-mapped file. The second version is dramatically faster. On one machine, we got the following timing data when computing the checksum of the 22MB file rt.jar in the jre/lib directory of the SDK.

Traditional File Input: 350 seconds
Memory-Mapped File: 8 seconds

Of course, the exact values will differ greatly from one machine to another, but it is obvious that the performance gain can be substantial.

The java.nio package makes memory mapping quite simple. Here is what you need to do.

First, get a channel from the file. A channel is a new abstraction for disk files that lets you access operating system features such as memory mapping, file locking, and fast data transfers between files. You get a channel by calling the getChannel method that has been added to the FileInputStream, FileOutputStream, and RandomAccessFile classes.

 FileInputStream in = new FileInputStream(. . .);
 FileChannel channel = in.getChannel();

Then you get a MappedByteBuffer from the channel by calling the map method of the FileChannel class. You specify the area of the file that you want to map and a mapping mode. Three modes are supported:

FileChannel.MapMode.READ_ONLY: The resulting buffer is read-only. Any attempt to write to the buffer results in a ReadOnlyBufferException.
FileChannel.MapMode.READ_WRITE: The resulting buffer is writable, and the changes will be written back to the file at some point in time. Note that other programs that have mapped the same file may not see those changes immediately. The exact behavior of simultaneous file mapping by multiple programs is operating-system dependent.
FileChannel.MapMode.PRIVATE: The resulting buffer is writable, but any changes are private to this buffer and are not propagated to the file.

Once you have the buffer, you can read and write data, using the methods of the ByteBuffer class and the Buffer superclass.
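The steps above can be sketched as a small program. This is only an illustration under stated assumptions: the class name MapDemo, the method writeThroughMapping, and the temporary file name are hypothetical, and the sketch uses try-with-resources and java.nio.file.Files, which were added after SDK 1.4. It maps a file in READ_WRITE mode, writes through the buffer, and reads the bytes back with ordinary file input.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

class MapDemo
{
   /** Writes the data to the file through a READ_WRITE mapping, then reads the file back. */
   static byte[] writeThroughMapping(Path path, byte[] data) throws IOException
   {
      try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw");
           FileChannel channel = file.getChannel())
      {
         file.setLength(data.length); // make the file exactly as long as the mapped region
         MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
         buffer.put(data);
         buffer.force(); // ask the operating system to write the changes to the file
      }
      return Files.readAllBytes(path);
   }

   public static void main(String[] args) throws IOException
   {
      Path path = Files.createTempFile("mapdemo", ".bin");
      path.toFile().deleteOnExit();
      byte[] result = writeThroughMapping(path, new byte[] { 1, 2, 3, 4 });
      System.out.println(result.length + " bytes written through the mapping");
   }
}
```

The call to force is optional. Without it, the changes are still written back, but at a time chosen by the operating system.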

Buffers support both relative and absolute data access. A buffer has a position that is advanced by get and put operations. For example, you can traverse all bytes in the buffer as

 while (buffer.hasRemaining())
 {
    byte b = buffer.get();
    . . .
 }

Alternatively, you can use absolute addressing:

 for (int i = 0; i < buffer.limit(); i++)
 {
    byte b = buffer.get(i);
    . . .
 }
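To make the difference between the two access styles concrete, here is a minimal sketch that runs both loops over the same buffer. The class and method names (BufferAccessDemo, sumRelative, sumAbsolute) are made up for this illustration, and the buffer wraps a small array instead of a mapped file so that the example is self-contained.

```java
import java.nio.ByteBuffer;

class BufferAccessDemo
{
   /** Sums all remaining bytes with relative get calls; the position ends up at the limit. */
   static int sumRelative(ByteBuffer buffer)
   {
      int sum = 0;
      while (buffer.hasRemaining())
         sum += buffer.get();
      return sum;
   }

   /** Sums all bytes below the limit with absolute get calls; the position is not changed. */
   static int sumAbsolute(ByteBuffer buffer)
   {
      int sum = 0;
      for (int i = 0; i < buffer.limit(); i++)
         sum += buffer.get(i);
      return sum;
   }

   public static void main(String[] args)
   {
      ByteBuffer buffer = ByteBuffer.wrap(new byte[] { 1, 2, 3, 4 });
      System.out.println(sumAbsolute(buffer) + " at position " + buffer.position());
      System.out.println(sumRelative(buffer) + " at position " + buffer.position());
   }
}
```

Both loops compute the same sum. Only the relative traversal moves the position, so a second relative traversal of the same buffer would find nothing remaining.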

You can also read and write arrays of bytes with the methods

 get(byte[] bytes)
 get(byte[] bytes, int offset, int length)

Finally, there are methods

 getInt getLong getShort getChar getFloat getDouble

to read primitive type values that are stored as binary values in the file. For example, the integer 1234 would be stored as a sequence of four bytes

 00 00 04 D2

(in hexadecimal), since 1234 = 4 x 256 + 13 x 16 + 2. Actually, there are two ways of storing 1234 as a binary value, depending on the byte ordering. The byte ordering given above is called "big-endian" since the more significant bytes come first. Another ordering, called "little-endian," starts with the least significant byte:

 D2 04 00 00

By default, Java uses big-endian ordering, probably because that is the ordering used in Sun SPARC processors. However, Intel processors use little-endian byte ordering. Therefore, file formats with a PC pedigree often store binary numbers in little-endian order. If you need to process such a file, simply call

 buffer.order(ByteOrder.LITTLE_ENDIAN);

To find out the current byte order of a buffer, call

 ByteOrder b = buffer.order();

This pair of methods does not use the set/get naming convention.

To write numbers to a buffer, use one of the methods

 putInt putLong putShort putChar putFloat putDouble
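The two byte orderings of the integer 1234 can be reproduced with a short sketch. The class ByteOrderDemo and its helper methods are hypothetical names chosen for this example. It stores the value with putInt under each byte order and prints the four resulting bytes in hexadecimal.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo
{
   /** Returns the four bytes that putInt stores for the value in the given byte order. */
   static byte[] encode(int value, ByteOrder order)
   {
      ByteBuffer buffer = ByteBuffer.allocate(4);
      buffer.order(order);
      buffer.putInt(value);
      return buffer.array();
   }

   /** Formats bytes as two-digit hexadecimal numbers separated by spaces. */
   static String hex(byte[] bytes)
   {
      StringBuilder result = new StringBuilder();
      for (byte b : bytes)
      {
         if (result.length() > 0) result.append(' ');
         result.append(String.format("%02X", b & 0xFF));
      }
      return result.toString();
   }

   public static void main(String[] args)
   {
      System.out.println(hex(encode(1234, ByteOrder.BIG_ENDIAN)));    // 00 00 04 D2
      System.out.println(hex(encode(1234, ByteOrder.LITTLE_ENDIAN))); // D2 04 00 00
   }
}
```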

Examples 12-8 and 12-9 are two programs that compute the 32-bit cyclic redundancy checksum (CRC32) of a file. That quantity is a checksum that is often used to determine whether a file has been corrupted. Corruption of a file makes it very likely that the checksum has changed. The java.util.zip package contains a class CRC32 that computes the checksum of a sequence of bytes, using the following loop:

 CRC32 crc = new CRC32();
 while (more bytes)
    crc.update(next byte);
 long checksum = crc.getValue();

For a nice explanation of the CRC algorithm, see http://www.relisoft.com/Science/CrcMath.html.

The details of the CRC computation are not important. We just use it as an example of a useful file operation.
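As a quick way to try the CRC32 class without a file, the following sketch applies the update loop to a byte array. The class name CRC32Demo and the method checksum are illustrative. The input "123456789" is the customary test string for CRC implementations, and its CRC-32 value is cbf43926.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class CRC32Demo
{
   /** Computes the CRC32 checksum of a byte array, one byte at a time. */
   static long checksum(byte[] bytes)
   {
      CRC32 crc = new CRC32();
      for (byte b : bytes)
         crc.update(b); // update(int) uses only the low-order eight bits
      return crc.getValue();
   }

   public static void main(String[] args)
   {
      byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
      System.out.println(Long.toHexString(checksum(data))); // cbf43926
   }
}
```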

The first program uses traditional file input, and the second uses memory mapping. As we already said, the second one is much faster. Try it out. Run both programs with the same file, as

 java CRC filename

and

 java NIOCRC filename

Example 12-8 CRC.java

 import java.io.*;
 import java.util.zip.*;

 /**
    This program computes the CRC checksum of a file, using
    an input stream.
    Usage: java CRC filename
 */
 public class CRC
 {
    public static void main(String[] args) throws IOException
    {
       InputStream in = new FileInputStream(args[0]);
       CRC32 crc = new CRC32();
       int c;
       long start = System.currentTimeMillis();
       while ((c = in.read()) != -1)
          crc.update(c);
       long end = System.currentTimeMillis();
       System.out.println(Long.toHexString(crc.getValue()));
       System.out.println((end - start) + " milliseconds");
    }
 }

Example 12-9 NIOCRC.java

 import java.io.*;
 import java.nio.*;
 import java.nio.channels.*;
 import java.util.zip.*;

 /**
    This program computes the CRC checksum of a file, using
    a memory-mapped file.
    Usage: java NIOCRC filename
 */
 public class NIOCRC
 {
    public static void main(String[] args) throws Exception
    {
       FileInputStream in = new FileInputStream(args[0]);
       FileChannel channel = in.getChannel();

       CRC32 crc = new CRC32();
       long start = System.currentTimeMillis();

       // map accepts the region size as a long, so no cast is needed
       MappedByteBuffer buffer = channel.map(
          FileChannel.MapMode.READ_ONLY, 0, channel.size());
       while (buffer.hasRemaining())
          crc.update(buffer.get());

       long end = System.currentTimeMillis();
       System.out.println(Long.toHexString(crc.getValue()));
       System.out.println((end - start) + " milliseconds");
    }
 }
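To check that both approaches yield the same checksum, the two computations can be placed side by side. The following is a sketch, not the book's program: the class CRCCompare and its method names are hypothetical, and it relies on library features added after SDK 1.4 (try-with-resources, java.nio.file, and the update(ByteBuffer) method of CRC32). It also wraps the input stream in a BufferedInputStream, a deliberate deviation from Example 12-8, because an unbuffered read call goes to the operating system for every single byte.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

class CRCCompare
{
   /** Computes the checksum with a buffered input stream, one byte at a time. */
   static long streamChecksum(Path path) throws IOException
   {
      CRC32 crc = new CRC32();
      try (InputStream in = new BufferedInputStream(Files.newInputStream(path)))
      {
         int c;
         while ((c = in.read()) != -1)
            crc.update(c);
      }
      return crc.getValue();
   }

   /** Computes the checksum from a read-only memory mapping of the whole file. */
   static long mappedChecksum(Path path) throws IOException
   {
      CRC32 crc = new CRC32();
      try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ))
      {
         MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
         crc.update(buffer); // consumes all remaining bytes of the buffer
      }
      return crc.getValue();
   }

   public static void main(String[] args) throws IOException
   {
      Path path = Paths.get(args[0]);
      System.out.println(Long.toHexString(streamChecksum(path)));
      System.out.println(Long.toHexString(mappedChecksum(path)));
   }
}
```

Run it as java CRCCompare filename. The two printed values should be identical for any file that is not modified between the two passes.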

`java.io.FileInputStream` 1.0

FileChannel getChannel() 1.4
returns a channel for accessing this stream.

`java.io.FileOutputStream` 1.0

FileChannel getChannel() 1.4
returns a channel for accessing this stream.

`java.io.RandomAccessFile` 1.0

FileChannel getChannel() 1.4
returns a channel for accessing this file.

`java.nio.channels.FileChannel` 1.4

MappedByteBuffer map(FileChannel.MapMode mode, long position, long size)
maps a region of the file to memory.
Parameters:
mode
One of the constants READ_ONLY, READ_WRITE, or PRIVATE in the FileChannel.MapMode class

position
The start of the mapped region

size
The size of the mapped region

`java.nio.Buffer` 1.4

boolean hasRemaining()
returns true if the current buffer position has not yet reached the buffer's limit position.
int limit()
returns the limit position of the buffer, that is, the first position at which no more values are available.

`java.nio.ByteBuffer` 1.4

byte get()
gets a byte from the current position and advances the current position to the next byte.
byte get(int index)
gets a byte from the specified index.
ByteBuffer put(byte b)
puts a byte to the current position and advances the current position to the next byte. Returns a reference to this buffer.
ByteBuffer put(int index, byte b)
puts a byte at the specified index. Returns a reference to this buffer.
ByteBuffer get(byte[] destination)
ByteBuffer get(byte[] destination, int offset, int length)
fill a byte array, or a region of a byte array, with bytes from the buffer, and advance the current position by the number of bytes read. If there are not enough bytes remaining in the buffer, then no bytes are read, and a BufferUnderflowException is thrown. Return a reference to this buffer.
Parameters:
destination
The byte array to be filled

offset
The offset of the region to be filled

length
The length of the region to be filled
ByteBuffer put(byte[] source)
ByteBuffer put(byte[] source, int offset, int length)
put all bytes from a byte array, or the bytes from a region of a byte array, into the buffer, and advance the current position by the number of bytes written. If there are not enough bytes remaining in the buffer, then no bytes are written, and a BufferOverflowException is thrown. Return a reference to this buffer.
Parameters:
source
The byte array to be written

offset
The offset of the region to be written

length
The length of the region to be written
Xxx getXxx()
Xxx getXxx(int index)
ByteBuffer putXxx(xxx value)
ByteBuffer putXxx(int index, xxx value)
These methods are used for relative and absolute reading and writing of binary numbers. Xxx is one of Int, Long, Short, Char, Float, or Double.
ByteBuffer order(ByteOrder order)
ByteOrder order()
set or get the byte order. The value for order is one of the constants BIG_ENDIAN or LITTLE_ENDIAN of the ByteOrder class.

File Locking

Consider a situation in which multiple simultaneously executing programs need to modify the same file. Clearly, the programs need to communicate in some way, or the file can easily become damaged.

File locks can be used to control access to a file or a range of bytes within a file. However, file locking varies greatly among operating systems, which explains why file locking capabilities were absent from prior versions of the SDK.

Frankly, file locking is not all that common in application programs. Many applications use a database for data storage, and the database has mechanisms for resolving concurrent access problems. If you store information in flat files and are worried about concurrent access, you may well find it simpler to start using a database rather than designing complex file locking schemes.

Still, there are situations where file locking is essential. Suppose your application saves a configuration file with user preferences. If a user invokes two instances of the application, it could happen that both of them want to write the configuration file at the same time. In that situation, the first instance should lock the file. When the second instance finds the file locked, it can decide to wait until the file is unlocked, or simply skip the writing process.

To lock a file, call either the lock or tryLock method of the FileChannel class:

 FileLock lock = channel.lock();

 FileLock lock = channel.tryLock();

The first call blocks until the lock becomes available. The second call returns immediately, either with the lock, or null if the lock is not available. The file remains locked until the channel is closed or the release method is invoked on the lock.

You can also lock a portion of the file with the calls

 FileLock lock(long start, long size, boolean shared)

 FileLock tryLock(long start, long size, boolean shared)

The shared flag is false to lock the file for both reading and writing. It is true for a shared lock, which allows multiple processes to read from the file, while preventing any process from acquiring an exclusive lock. Not all operating systems support shared locks. You may get an exclusive lock even if you just asked for a shared one. Call the isShared method of the FileLock class to find out which kind you have.

If you lock the tail portion of a file, and the file subsequently grows beyond the locked portion, the additional area is not locked. To lock all bytes, use a size of Long.MAX_VALUE.
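The configuration file scenario described earlier might be handled along the following lines. This is a sketch under assumptions: the class LockDemo and the method runLocked are invented for the example, and it uses FileChannel.open and try-with-resources, which were added after SDK 1.4. It requests an exclusive lock on all bytes with tryLock, runs an action only if the lock was granted, and releases the lock afterward.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class LockDemo
{
   /**
      Tries to lock the whole file exclusively and runs the action while holding the lock.
      Returns false without running the action if another program holds a conflicting lock.
   */
   static boolean runLocked(Path path, Runnable action) throws IOException
   {
      try (FileChannel channel = FileChannel.open(path,
              StandardOpenOption.CREATE, StandardOpenOption.WRITE))
      {
         // start 0, size Long.MAX_VALUE, shared = false: exclusive lock on all bytes
         FileLock lock = channel.tryLock(0, Long.MAX_VALUE, false);
         if (lock == null) return false;
         try
         {
            action.run();
            return true;
         }
         finally
         {
            lock.release();
         }
      }
   }

   public static void main(String[] args) throws IOException
   {
      Path path = Files.createTempFile("lockdemo", ".tmp");
      path.toFile().deleteOnExit();
      boolean done = runLocked(path, () -> System.out.println("holding the lock"));
      System.out.println(done ? "finished" : "file is locked by another program");
   }
}
```

Using tryLock matches the "skip the writing process" choice. Replacing it with lock would make the second instance wait instead.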

Keep in mind that file locking is system-dependent. Here are some points to watch for:

On some systems, file locking is merely advisory. If an application fails to get a lock, it may still write to a file that another application has currently locked.
On some systems, you cannot simultaneously lock a file and map it into memory.
File locks are held by the entire Java virtual machine. If two programs are launched by the same virtual machine (such as an applet or application launcher), then they can't each acquire a lock on the same file. The lock and tryLock methods will throw an OverlappingFileLockException if the virtual machine already holds another overlapping lock on the same file.
On some systems, closing a channel releases all locks on the underlying file held by the Java virtual machine. You should therefore avoid multiple channels on the same locked file.
Locking files on a networked file system is highly system-dependent and should probably be avoided.
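The point about locks being held by the entire virtual machine can be observed directly. In the following sketch (the class OverlapDemo and the method secondLockRejected are hypothetical names), a second lock request on a file that the same virtual machine has already locked fails with an OverlappingFileLockException instead of returning null. The sketch issues both requests on one channel to keep the example short.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class OverlapDemo
{
   /** Returns true if a second lock request in the same virtual machine is rejected. */
   static boolean secondLockRejected(Path path) throws IOException
   {
      try (FileChannel channel = FileChannel.open(path, StandardOpenOption.WRITE))
      {
         FileLock first = channel.lock();
         try
         {
            channel.tryLock(); // overlaps the lock that is already held
            return false;
         }
         catch (OverlappingFileLockException e)
         {
            return true;
         }
         finally
         {
            first.release();
         }
      }
   }

   public static void main(String[] args) throws IOException
   {
      Path path = Files.createTempFile("overlapdemo", ".tmp");
      path.toFile().deleteOnExit();
      System.out.println("second lock rejected: " + secondLockRejected(path));
   }
}
```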

`java.nio.channels.FileChannel` 1.4

FileLock lock()
acquires an exclusive lock on the entire file. This method blocks until the lock is acquired.
FileLock tryLock()
acquires an exclusive lock on the entire file, or returns null if the lock cannot be acquired.
FileLock lock(long position, long size, boolean shared)
FileLock tryLock(long position, long size, boolean shared)
These methods acquire a lock on a region of the file. The first method blocks until the lock is acquired, and the second method returns null if the lock cannot be acquired.
Parameters:
position
The start of the region to be locked

size
The size of the region to be locked

shared
true for a shared lock, false for an exclusive lock

`java.nio.channels.FileLock` 1.4

void release()
releases this lock.

Character Sets

In the past, international character sets have been handled rather unsystematically throughout the Java library. The java.nio package unifies character set conversion with the introduction of the Charset class. (Note that the s is lowercase.)

A character set maps between sequences of 16-bit Unicode characters and byte sequences used in a local character encoding. One of the most popular character encodings is ISO-8859-1, an 8-bit encoding of the first 256 Unicode characters. Gaining in importance is ISO-8859-15, which replaces some of the less useful characters of ISO-8859-1 with accented letters used in French and Finnish, and, more importantly, replaces the "international currency" character with the Euro symbol (€) in code point 0xA4. Other examples of character encodings are the variable-byte encodings commonly used for Japanese and Chinese.

The SDK now uses the character set names standardized in the IANA Character Set Registry (http://www.iana.org/assignments/character-sets). These names differ slightly from those used in previous versions. For example, the "official" name of ISO-8859-1 is now "ISO-8859-1" and no longer "ISO8859_1", which was the preferred name in SDK 1.3. For compatibility with other naming conventions, each character set can have a number of aliases. For example, ISO-8859-1 has aliases

 ISO8859-1 ISO_8859_1 ISO8859_1 ISO_8859-1 ISO_8859-1:1987 8859_1 latin1 l1 csISOLatin1 iso-ir-100 cp819 IBM819 IBM-819 819

Character set names are case-insensitive.

You obtain a Charset by calling the static forName method with either the official name or one of its aliases:

 Charset cset = Charset.forName("ISO-8859-1");

The aliases method returns a Set object of the aliases. A Set is a collection that we will discuss in Volume 2; here is the code to iterate through the set elements:

 Set aliases = cset.aliases();
 Iterator iter = aliases.iterator();
 while (iter.hasNext())
 {
    String alias = (String)iter.next();
    . . .
 }
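The lookup by alias and the case-insensitive matching of names can be tried with a short sketch. The class AliasDemo and the method officialName are illustrative names. The sketch uses generic collections and the enhanced for loop, which were added to the language after SDK 1.4 and make the cast unnecessary.

```java
import java.nio.charset.Charset;
import java.util.Set;
import java.util.TreeSet;

class AliasDemo
{
   /** Returns the official name of the character set with the given name or alias. */
   static String officialName(String nameOrAlias)
   {
      return Charset.forName(nameOrAlias).name();
   }

   public static void main(String[] args)
   {
      Charset cset = Charset.forName("latin1"); // an alias of ISO-8859-1
      System.out.println(cset.name());
      Set<String> aliases = new TreeSet<>(cset.aliases()); // sorted copy for stable output
      for (String alias : aliases)
         System.out.println("  " + alias);
   }
}
```

The exact set of aliases that is printed depends on the Java implementation.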

All virtual machines must support the character encodings given in Table 12-3.

Table 12-3. Required character encodings
Standard Name	Description
`US-ASCII`	7-bit US ASCII code
`ISO-8859-1`	8-bit ISO Latin 1 alphabet
`UTF-8`	8-bit Unicode Transformation Format
`UTF-16BE`	16-bit Unicode Transformation Format, big-endian byte order
`UTF-16LE`	16-bit Unicode Transformation Format, little-endian byte order
`UTF-16`	16-bit Unicode Transformation Format; byte order is big-endian by default, but can be set explicitly by a byte order mark (`'\uFEFF'`).

An excellent reference for the "ISO 8859 alphabet soup" is http://czyborra.com/charsets/iso8859.html. See RFC 2279 (http://ietf.org/rfc/rfc2279.txt) and RFC 2781 (http://ietf.org/rfc/rfc2781.txt) for definitions of UTF-8 and UTF-16.
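The difference between the three UTF-16 variants in Table 12-3 shows up when a single character is encoded. In this sketch (the class Utf16Demo and the method encodeHex are made-up names), the letter A is encoded with each variant and the resulting bytes are printed in hexadecimal. When encoding with plain UTF-16, Java writes a byte order mark as the bytes FE FF, followed by big-endian data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class Utf16Demo
{
   /** Encodes the string in the named character set and returns the bytes in hexadecimal. */
   static String encodeHex(String str, String charsetName)
   {
      ByteBuffer buffer = Charset.forName(charsetName).encode(str);
      StringBuilder result = new StringBuilder();
      while (buffer.hasRemaining())
      {
         if (result.length() > 0) result.append(' ');
         result.append(String.format("%02X", buffer.get() & 0xFF));
      }
      return result.toString();
   }

   public static void main(String[] args)
   {
      System.out.println(encodeHex("A", "UTF-16BE")); // 00 41
      System.out.println(encodeHex("A", "UTF-16LE")); // 41 00
      System.out.println(encodeHex("A", "UTF-16"));   // FE FF 00 41
   }
}
```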

International versions of Java support many more encodings. There is even a mechanism for adding additional character set providers; see the SDK documentation for details. To find out which character sets are available in a particular implementation, call the static availableCharsets method. It returns a SortedMap, another collection class. Use this code to find out the names of all available character sets:

 Set names = Charset.availableCharsets().keySet();
 Iterator iter = names.iterator();
 while (iter.hasNext())
 {
    String name = (String)iter.next();
    . . .
 }
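As a small check of the claim that every virtual machine supports the encodings in Table 12-3, the following sketch tests whether all six names appear among the available character sets. The class AvailableDemo, the constant REQUIRED, and the method hasRequiredCharsets are names invented for this example, and the code uses generics, which postdate SDK 1.4.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

class AvailableDemo
{
   /** The standard names listed in Table 12-3. */
   static final List<String> REQUIRED = Arrays.asList(
      "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16");

   /** Returns true if every required character set name is reported as available. */
   static boolean hasRequiredCharsets()
   {
      Set<String> names = Charset.availableCharsets().keySet();
      return names.containsAll(REQUIRED);
   }

   public static void main(String[] args)
   {
      System.out.println("required character sets present: " + hasRequiredCharsets());
      for (String name : Charset.availableCharsets().keySet())
         System.out.println(name);
   }
}
```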

Once you have a character set, you can use it to convert between Unicode strings and encoded byte sequences. Here is how you encode a Unicode string.

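 String str = . . .;
 ByteBuffer buffer = cset.encode(str);
 // The backing array can be longer than the encoded data, so copy only the valid bytes.
 byte[] bytes = new byte[buffer.remaining()];
 buffer.get(bytes);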

Conversely, to decode a byte sequence, you need a byte buffer. Use the static wrap method of the ByteBuffer class to turn a byte array into a byte buffer. The result of the decode method is a CharBuffer. Call its toString method to get a string.

 byte[] bytes = . . .;
 ByteBuffer bbuf = ByteBuffer.wrap(bytes, offset, length);
 CharBuffer cbuf = cset.decode(bbuf);
 String str = cbuf.toString();
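Encoding and decoding can be combined into a round trip, sketched below. The class CodecDemo and its encode and decode helpers are hypothetical names. The encode helper copies only the bytes between the buffer's position and limit, because the array behind the buffer can be longer than the encoded data for some encodings. The sample string contains two non-ASCII letters, written as Unicode escapes so that the source file encoding does not matter.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

class CodecDemo
{
   /** Encodes a string and returns exactly the encoded bytes. */
   static byte[] encode(Charset cset, String str)
   {
      ByteBuffer buffer = cset.encode(str);
      byte[] bytes = new byte[buffer.remaining()];
      buffer.get(bytes);
      return bytes;
   }

   /** Decodes a byte array into a string. */
   static String decode(Charset cset, byte[] bytes)
   {
      CharBuffer cbuf = cset.decode(ByteBuffer.wrap(bytes));
      return cbuf.toString();
   }

   public static void main(String[] args)
   {
      String str = "Gr\u00fc\u00dfe"; // five characters, two of them outside ASCII
      for (String name : new String[] { "ISO-8859-1", "UTF-8" })
      {
         Charset cset = Charset.forName(name);
         byte[] bytes = encode(cset, str);
         System.out.println(name + ": " + bytes.length + " bytes, round trip "
            + decode(cset, bytes).equals(str));
      }
   }
}
```

ISO-8859-1 needs one byte for each of the five characters, whereas UTF-8 needs two bytes for each of the two non-ASCII letters, for a total of seven.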

`java.nio.charset.Charset` 1.4

static SortedMap availableCharsets()
gets all available character sets for this virtual machine. Returns a map whose keys are character set names and whose values are character sets.
static Charset forName(String name)
gets a character set for the given name.
Set aliases()
returns the set of alias names for this character set.
ByteBuffer encode(String str)
encodes the given string into a sequence of bytes.
CharBuffer decode(ByteBuffer buffer)
decodes the given byte sequence. Unrecognized inputs are converted to the Unicode "replacement character" ('\uFFFD').

`java.nio.ByteBuffer` 1.4

byte[] array()
returns the array of bytes that this buffer manages.
static ByteBuffer wrap(byte[] bytes)
static ByteBuffer wrap(byte[] bytes, int offset, int length)
return a byte buffer that manages the given array of bytes or the given range.

`java.nio.CharBuffer` 1.4

char[] array()
returns the array of characters that this buffer manages.
