Integers | Data Streams

The DataOutputStream class has methods for writing all of Java's primitive integer data types: byte, short, int, and long. The DataInputStream class has methods to read these types. It also has methods for reading two integer data types not directly supported by Java or the DataOutputStream class: the unsigned byte and the unsigned short.

8.2.1. Integer Formats

While Java's platform independence guarantees that you don't have to worry about the precise data formats when working exclusively in Java, you frequently need to read data created by a program written in another language. Similarly, it's not unusual to have to write data that will be read by a program written in a different language. For example, most Java network clients talk to servers written in other languages, and most Java network servers talk to clients written in other languages. You cannot naïvely assume that the data format Java uses is a data format other programs will understand; you must take care to understand and recognize the data formats being used.

Although other schemes are possible, almost all modern computers have standardized on binary arithmetic performed on integers composed of an integral number of 8-bit bytes. Furthermore, they've standardized on two's complement arithmetic for signed numbers. In two's complement arithmetic, the most significant bit is 1 for a negative number and 0 for zero or a positive number. The absolute value of a negative number is calculated by taking the complement of the number and adding 1. In Java terms, this means (-n == ~n + 1) is true for any int n. The one negative value with no positive counterpart is Integer.MIN_VALUE, which negates to itself.
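The complement-and-add-1 rule can be checked directly in Java. The following is a minimal sketch; the class name TwosComplementDemo and the method name negate are illustrative and do not come from the class library.

```java
class TwosComplementDemo {
  // Negates n the two's complement way: flip every bit, then add 1.
  static int negate(int n) {
    return ~n + 1;
  }

  public static void main(String[] args) {
    int n = -42;
    System.out.println(negate(n) == -n);            // the identity from the text
    System.out.println(Integer.toBinaryString(n));  // 32 bits; the most significant bit is 1
    System.out.println(Integer.toBinaryString(-n)); // leading zero bits are not printed
  }
}
```

Note that Integer.MIN_VALUE is its own negation under this rule, because 2,147,483,648 does not fit in an int.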

Regrettably, this is about all that's been standardized. One big difference between computer architectures is the size of an int. Probably the majority of modern computers still use 4-byte integers that can hold a value between -2,147,483,648 and 2,147,483,647. However, some systems are moving to 64-bit architectures where the native integer ranges from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 and takes 8 bytes; and many older and smaller systems use 16-bit integers with a far narrower range (from -32,768 to 32,767). Exactly how many bytes a C compiler uses for each int is platform-dependent, which is one of many reasons C code isn't as portable as one might wish. The sizes of C's short and long are even less predictable and may or may not be the same as the size of a C int. Java always uses a 2-byte short, a 4-byte int, and an 8-byte long, and this is one of the reasons Java code is more portable than C code. However, you must be aware of varying integer widths when your Java code needs to communicate binary numbers with programs written in other languages.

C compilers also allow various unsigned types. For example, an unsigned byte is a binary number between 0 and 255; an unsigned 2-byte integer is a number between 0 and 65,535; an unsigned 4-byte integer is a number between 0 and 4,294,967,295. Java doesn't have any unsigned numeric data types (unless you count char), but the DataInputStream class does provide two methods to read unsigned bytes and unsigned shorts.
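When unsigned data arrives in a signed Java type, one common approach is to widen it and mask off the sign extension. The sketch below assumes nothing beyond the core language; the class and method names are hypothetical.

```java
class UnsignedDemo {
  // Each method widens the value and masks away the sign-extended bits.
  static int unsignedByte(byte b) {
    return b & 0xFF;          // 0 to 255
  }

  static int unsignedShort(short s) {
    return s & 0xFFFF;        // 0 to 65,535
  }

  static long unsignedInt(int i) {
    return i & 0xFFFFFFFFL;   // 0 to 4,294,967,295; needs a long to hold it
  }

  public static void main(String[] args) {
    System.out.println(unsignedByte((byte) -1));
    System.out.println(unsignedShort((short) -1));
    System.out.println(unsignedInt(-1));
  }
}
```

The result type is always one size larger than the source type, since no Java signed type can hold the full unsigned range of a type of its own width.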

Perhaps worst of all, modern computers are split almost down the middle between those that use a big-endian and those that use a little-endian ordering of the bytes in an integer. In a little-endian design, used on X86 architectures, the most significant byte is at the highest address in memory. On the other hand, on a big-endian system, the most significant byte is at the lowest address in memory.

For example, consider the number 1,108,836,360. In hexadecimal, this number is written as 0x42178008. On a big-endian system, the bytes are ordered much as they are in a hex literal; that is, 42, 17, 80, 08. On the other hand, on a little-endian system, this order is reversed: 08, 80, 17, 42. If 1,108,836,360 is written into a file on a little-endian system and then read on a big-endian system without any special treatment, it comes out as 0x08801742, or 142,612,290, which is not the same thing at all.
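The effect of reading the bytes in the wrong order can be reproduced with Integer.reverseBytes( ), which swaps the byte order of an int. This is only a sketch of the arithmetic; the class name ByteOrderDemo and the method name misread are made up for illustration.

```java
class ByteOrderDemo {
  // Returns the value a reader sees when it assumes the opposite byte order.
  static int misread(int value) {
    return Integer.reverseBytes(value);
  }

  public static void main(String[] args) {
    int original = 0x42178008;
    int swapped = misread(original);
    System.out.println(original);                        // 1108836360
    System.out.println(String.format("0x%08X", swapped)); // 0x08801742
    System.out.println(swapped);                         // 142612290
  }
}
```

Applying the swap a second time restores the original value, which is why the same method can convert in either direction.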

Java uses big-endian integers exclusively. Data input streams read and data output streams write big-endian integers. Most Internet protocols that rely on binary numbers, such as the time protocol, implicitly assume "network byte order," which is a fancy way of saying "big-endian." And finally, almost all computers manufactured today, except those based on the X86 architecture, use big-endian byte orders, so X86 is really the odd one out. However, X86 is the 1000-pound gorilla of computer architectures, so it's impossible to ignore it or the data formats it supports. Later in this chapter, I'll develop a class for reading little-endian data.

8.2.2. The Char Format

Unicode characters (more specifically, the UTF-16 code units used for Java chars) are two bytes long and are interpreted as an unsigned number between 0 and 65,535. This means they have an "endianness" problem too. The Unicode standard specifically does not require a particular endianness of text written in Unicode; both big- and little-endian encodings are allowed. The Unicode standard does suggest that character 65,279 (0xFEFF in hex), the byte order mark, be placed at the beginning of each file of Unicode text. Thus, by reading the first character, you can determine the endianness of the file and take appropriate action. For example, if you're reading a Unicode file containing little-endian data using big-endian methods, the first character will appear as 0xFFFE (65,534), signaling that the bytes need to be swapped. Java's data stream classes always read and write chars and strings in big-endian order.
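A byte order check along these lines might look like the following sketch. It reads the first two bytes with readChar( ), which is always big-endian, and compares the result against the two possible appearances of the byte order mark. The class name, method name, and returned strings are all hypothetical, and the IOException is wrapped only to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ByteOrderMarkDemo {
  // Reads the first char big-endian and reports what the byte order mark implies.
  static String detect(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      char first = in.readChar();
      if (first == '\uFEFF') return "big-endian";
      if (first == '\uFFFE') return "little-endian";
      return "no byte order mark";
    } catch (IOException ex) {
      // Includes EOFException when fewer than two bytes are available
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    byte[] bigEndianText = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};
    byte[] littleEndianText = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
    System.out.println(detect(bigEndianText));
    System.out.println(detect(littleEndianText));
  }
}
```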

8.2.3. Writing Integers

The DataOutputStream class has the usual three write( ) methods you'll find in any output stream class:

public void write(int b) throws IOException
public void write(byte[] data) throws IOException
public void write(byte[] data, int offset, int length)
 throws IOException

These methods behave exactly as they do in the superclass, so I won't discuss them further here.

The DataOutputStream class also declares the following void methods that write signed integer types onto its underlying output stream:

public final void writeByte(int b) throws IOException
public final void writeShort(int s) throws IOException
public final void writeInt(int i) throws IOException
public final void writeLong(long l) throws IOException

Because Java doesn't fully support the byte or short types, the writeByte( ) and writeShort( ) methods each take an int as an argument. The excess bytes in the int are ignored before the byte or short is written. Thus writeByte( ) writes only the low-order byte of its argument. writeShort( ) writes only the low-order two bytes of its argument, higher-order byte first; that is, in big-endian order. The writeInt( ) and writeLong( ) methods write all of the bytes of their arguments in big-endian order. These methods can throw IOExceptions if the underlying stream throws an IOException.
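The truncation behavior is easy to observe by writing the same int three ways into a ByteArrayOutputStream and inspecting the bytes. This is a minimal sketch; the class name WriteTruncationDemo and the method name written are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class WriteTruncationDemo {
  // Writes value as a byte, then a short, then an int, and returns all 7 bytes.
  static byte[] written(int value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeByte(value);   // low-order byte only
      out.writeShort(value);  // low-order two bytes, high byte first
      out.writeInt(value);    // all four bytes, high byte first
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    StringBuilder hex = new StringBuilder();
    for (byte b : written(0x12345678)) {
      hex.append(String.format("%02X ", b));
    }
    System.out.println(hex.toString().trim()); // 78 56 78 12 34 56 78
  }
}
```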

Example 8-1 fills a file called 1000.dat with the integers between 1 and 1000. This filename is used to construct a FileOutputStream. This stream is then chained to a DataOutputStream whose writeInt( ) method writes the data into the file.

Example 8-1. One thousand ints

import java.io.*;
public class File1000 {
  public static void main(String[] args) {
    DataOutputStream dos = null;
    try {
      dos = new DataOutputStream(new FileOutputStream("1000.dat"));
      for (int i = 1; i <= 1000; i++) {
        dos.writeInt(i);
      }
    }
    catch (IOException ex) {System.err.println(ex);}
    finally {
      try { if (dos != null) dos.close( ); }
      catch (IOException ex) { /* Not much else we can do */ }
    }
  }
}

Let me emphasize that the numbers written by this program or by any other data output stream are binary numbers. They are not text strings such as 1, 2, 3, 4, 5, ...999, 1000. If you try to open 1000.dat with a text editor, you'll see a lot of gibberish or an error message. The data this program writes is meant to be read by other programs, not by people.
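To see what the binary output looks like without opening a file, the same loop can be pointed at a ByteArrayOutputStream. The sketch below is an in-memory variation on Example 8-1, not part of it; the class name BinaryNotTextDemo and the method name thousandInts are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class BinaryNotTextDemo {
  // Writes the ints 1 through 1000 into memory instead of into 1000.dat.
  static byte[] thousandInts() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream dos = new DataOutputStream(bytes)) {
      for (int i = 1; i <= 1000; i++) {
        dos.writeInt(i);
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return bytes.toByteArray();
  }

  public static void main(String[] args) {
    byte[] data = thousandInts();
    System.out.println(data.length + " bytes"); // 4 bytes per int
    StringBuilder hex = new StringBuilder();
    for (int i = 0; i < 8; i++) {
      hex.append(String.format("%02X ", data[i]));
    }
    System.out.println(hex.toString().trim()); // 00 00 00 01 00 00 00 02
  }
}
```

Every number occupies exactly 4 bytes regardless of how many decimal digits it has, which is why a text editor cannot make sense of the file.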

8.2.4. Reading Integers

DataInputStream has the usual three read( ) methods it inherits from its superclass. The no-argument version reads a single byte and returns it as an int; the other two read bytes into an array and return the number of bytes read. They behave exactly as they do in the superclass, so I won't discuss them further:

public int read( ) throws IOException
public int read(byte[] data) throws IOException
public int read(byte[] data, int offset, int length) throws IOException

The DataInputStream class declares the following methods that return integer types. All of these are signed except char:

public final byte readByte( ) throws IOException
public final short readShort( ) throws IOException
public final char readChar( ) throws IOException
public final int readInt( ) throws IOException
public final long readLong( ) throws IOException

Each of the integer read( ) methods reads the necessary number of bytes and converts them into the appropriate integer type. readByte( ) reads a single byte and returns a signed byte between -128 and 127. readShort( ) reads two bytes and returns a short between -32,768 and 32,767. readChar( ) reads two bytes and returns a char between 0 and 65,535. readInt( ) reads 4 bytes and returns an int between -2,147,483,648 and 2,147,483,647. readLong( ) reads 8 bytes and returns a long between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807. All numbers are read as big-endian.

-1 is a valid return value for these methods. Therefore, if the end of stream is encountered while reading, a java.io.EOFException, which is a subclass of java.io.IOException, is thrown. An EOFException can be thrown while more bytes of data remain in the stream. For example, readInt( ) reads 4 bytes. If only two bytes are left in the stream, those two bytes are read and the EOFException is thrown. However, at this point, those two bytes are lost. You can't go back and reread those two bytes as a short. (If the underlying stream supports marking and resetting, you could mark before each read and reset on an EOFException.)
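The mark-and-reset recovery mentioned in the parenthetical can be sketched as follows. It relies on ByteArrayInputStream, which supports marking; a stream that does not would need to be wrapped in a BufferedInputStream first. The class name PartialReadDemo and the method name readIntOrShort are hypothetical, and falling back to a short is just one possible policy.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

class PartialReadDemo {
  // Tries to read an int; if fewer than 4 bytes remain, rewinds and reads a short.
  static int readIntOrShort(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      in.mark(4); // remember the position before attempting the 4-byte read
      try {
        return in.readInt();
      } catch (EOFException ex) {
        in.reset(); // recover the bytes consumed by the failed readInt()
        return in.readShort();
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    System.out.println(readIntOrShort(new byte[] {0, 0, 0, 7})); // a complete int
    System.out.println(readIntOrShort(new byte[] {1, 2}));       // only a short's worth
  }
}
```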

The DataInputStream class also has two methods that read unsigned bytes and shorts:

public final int readUnsignedByte( ) throws IOException
public final int readUnsignedShort( ) throws IOException

Since Java has no unsigned byte or unsigned short data type, both of these methods return an int. readUnsignedByte( ) returns an int between 0 and 255, and readUnsignedShort( ) returns an int between 0 and 65,535. However, both methods still indicate end of stream with an EOFException rather than by returning -1.
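The difference between the signed and unsigned readers shows up when the same bytes are read both ways. Here is a minimal sketch using an in-memory stream; the class name UnsignedReadDemo and the method name readBothWays are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class UnsignedReadDemo {
  // Reads 0xFF as a signed then an unsigned byte, and 0xFFFE as a signed then an unsigned short.
  static int[] readBothWays() {
    byte[] data = {
      (byte) 0xFF,              // for readByte()
      (byte) 0xFF,              // for readUnsignedByte()
      (byte) 0xFF, (byte) 0xFE, // for readShort()
      (byte) 0xFF, (byte) 0xFE  // for readUnsignedShort()
    };
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      int signedByte = in.readByte();
      int unsignedByte = in.readUnsignedByte();
      int signedShort = in.readShort();
      int unsignedShort = in.readUnsignedShort();
      return new int[] {signedByte, unsignedByte, signedShort, unsignedShort};
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  public static void main(String[] args) {
    for (int value : readBothWays()) {
      System.out.println(value); // -1, 255, -2, 65534
    }
  }
}
```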

Example 8-2 interprets a file as 4-byte signed integers, reads them, and prints them out. You might use this to read the output of Example 8-1. However, it is not necessarily the case that the program or person who created the file actually intended it to contain 32-bit, two's complement integers. The file contains bytes, and these bytes may be interpreted as ints, with the possible exception of one to three bytes at the end of the file (if the file's length is not an exact multiple of 4 bytes). Therefore, it's important to be very careful about what you read.

Example 8-2. The IntReader program

import java.io.*;
public class IntReader {
  public static void main(String[] args) throws IOException {
    DataInputStream din = null;
    try {
      FileInputStream fin = new FileInputStream(args[0]);
      System.out.println("-----------" + args[0] + "-----------");
      din = new DataInputStream(fin);
      while (true) {
        int theNumber = din.readInt( );
        System.out.println(theNumber);
      } // end while
    } // end try
    catch (EOFException ex) {
      // normal termination
    }
    catch (IOException ex) {
      // abnormal termination
      System.err.println(ex);
    }
    finally {
      // close the stream after both normal and abnormal termination
      if (din != null) din.close( );
    }
  } // end main
} // end IntReader

This program opens the file named by the first command-line argument with a file input stream. The file input stream is chained to a data input stream, which reads successive integers until an IOException occurs. IntReader does not print an error message in the event of an EOFException since that now indicates normal termination.
