The Complete Stream Zoo

	Core Java™ 2: Volume I - Fundamentals By Cay S. Horstmann, Gary Cornell
	Table of Contents

	Chapter 12. Streams and Files

Unlike C, which gets by just fine with a single type FILE*, Java has a whole zoo of more than 60 (!) different stream types (see Figures 12-1 and 12-2). Library designers claim that there is a good reason to give users a wide choice of stream types: it is supposed to reduce programming errors. For example, in C, some people think it is a common mistake to send output to a file that was open only for reading. (Well, it is not that common, actually.) Naturally, if you do this, the output is ignored at run time. In Java and C++, the compiler catches that kind of mistake because an InputStream (Java) or istream (C++) has no methods for output.

Figure 12-1. Input and Output stream hierarchy


Figure 12-2. Reader and Writer hierarchy


(We would argue that in C++, and even more so in Java, the main tool that the stream interface designers have against programming errors is intimidation. The sheer complexity of the stream libraries keeps programmers on their toes.)

ANSI C++ gives you more stream types than you want, such as istream, ostream, iostream, ifstream, ofstream, fstream, wistream, wifstream, istrstream, and so on (18 classes in all). But Java really goes overboard with streams and gives you separate classes for selecting buffering, lookahead, random access, text formatting, or binary data.

Let us divide the animals in the stream class zoo by how they are used. Four abstract classes are at the base of the zoo: InputStream, OutputStream, Reader, and Writer. You do not make objects of these types, but other methods can return them. For example, as you saw in Chapter 10, the URL class has the method openStream that returns an InputStream. You then use this InputStream object to read from the URL. As we mentioned before, the InputStream and OutputStream classes let you read and write only individual bytes and arrays of bytes; they have no methods to read and write strings and numbers. You need more-capable child classes for this. For example, DataInputStream and DataOutputStream let you read and write all the basic Java types.

For Unicode text, on the other hand, as we mentioned before, you use classes that descend from Reader and Writer. The basic methods of the Reader and Writer classes are similar to the ones for InputStream and OutputStream.

 abstract int read()
 abstract void write(int b)

They work just as the comparable methods do in the InputStream and OutputStream classes except, of course, that the read method returns either a Unicode character (as an integer between 0 and 65535) or -1 when you have reached the end of the file.
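The end-of-input convention can be sketched as follows. This is only an illustration, not code from the book: the class name ReadUntilEnd and the helper countChars are hypothetical, a StringReader stands in for a file so that the example is self-contained, and a JDK 8 or later is assumed (for UncheckedIOException).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadUntilEnd {
    // Hypothetical helper: counts the characters a reader delivers
    // before read() signals the end of the input.
    static int countChars(Reader in) {
        try {
            int count = 0;
            // read() yields a value in 0..65535, or -1 at end of input
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countChars(new StringReader("zoo")));
    }
}
```

The loop compares the int result with -1 before doing anything else with it; casting to char first would make the end marker indistinguishable from the character 65535.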

Finally, there are streams that do useful stuff, for example, the ZipInputStream and ZipOutputStream that let you read and write files in the familiar ZIP compression format.

Layering Stream Filters

FileInputStream and FileOutputStream give you input and output streams attached to a disk file. You give the file name or full path name of the file in the constructor. For example,

 FileInputStream fin = new FileInputStream("employee.dat");

looks in the current directory for a file named "employee.dat".

Since the backslash character is the escape character in Java strings, be sure to use \\ for Windows-style path names ("C:\\Windows\\win.ini"). In Windows, you can also use a single forward slash ("C:/Windows/win.ini") since most Windows file handling system calls will interpret forward slashes as file separators. However, this is not recommended: the behavior of the Windows system functions is subject to change, and on other operating systems, the file separator may be different again. Instead, for portable programs, you should use the correct file separator character. It is stored in the constant string File.separator.

You can also use a File object (see the end of the chapter for more on file objects):

 File f = new File("employee.dat");
 FileInputStream fin = new FileInputStream(f);

Like the abstract InputStream and OutputStream classes, these classes only support reading and writing on the byte level. That is, we can only read bytes and byte arrays from the object fin.

 byte b = (byte)fin.read();

Since all the classes in java.io interpret relative path names as starting with the user's current working directory, you may want to know this directory. You can get at this information via a call to System.getProperty("user.dir").
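As a rough sketch of how File.separator and the user.dir property fit together, the following hypothetical helper builds a relative path from its components and resolves it against the current working directory. The class name PortablePath, the method resolve, and the directory name data are assumptions made for illustration; String.join requires a JDK 8 or later.

```java
import java.io.File;

class PortablePath {
    // Hypothetical helper: joins the components with the platform's
    // separator and resolves the result against the working directory.
    static String resolve(String... parts) {
        String relative = String.join(File.separator, parts);
        return new File(System.getProperty("user.dir"), relative).getPath();
    }

    public static void main(String[] args) {
        // The output depends on the platform and on where the program runs
        System.out.println(resolve("data", "employee.dat"));
    }
}
```

This mirrors what the java.io classes do implicitly when they are handed a relative path name; spelling it out is mainly useful for error messages and logging.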

As you will see in the next section, if we just had a DataInputStream, then we could read numeric types:

 DataInputStream din = . . .;
 double s = din.readDouble();

But just as the FileInputStream has no methods to read numeric types, the DataInputStream has no method to get data from a file.

Java uses a clever mechanism to separate two kinds of responsibilities. Some streams (such as the FileInputStream and the input stream returned by the openStream method of the URL class) can retrieve bytes from files and other more exotic locations. Other streams (such as the DataInputStream and the PrintWriter) can assemble bytes into more useful data types. The Java programmer has to combine the two into what are often called filtered streams by feeding an existing stream to the constructor of another stream. For example, to be able to read numbers from a file, first create a FileInputStream and then pass it to the constructor of a DataInputStream.

 FileInputStream fin = new FileInputStream("employee.dat");
 DataInputStream din = new DataInputStream(fin);
 double s = din.readDouble();

It is important to keep in mind that the data input stream that we created with the above code does not correspond to a new disk file. The newly created stream still accesses the data from the file attached to the file input stream, but the point is that it now has a more capable interface.

If you look at Figure 12-1 again, you can see the classes FilterInputStream and FilterOutputStream. You combine their child classes into a new filtered stream to construct the streams you want. For example, by default, streams are not buffered. That is, every call to read contacts the operating system to ask it to dole out yet another byte. If you want buffering and data input for a file named employee.dat in the current directory, you need to use the following rather monstrous sequence of constructors:

 DataInputStream din = new DataInputStream
    (new BufferedInputStream
       (new FileInputStream("employee.dat")));

Notice that we put the DataInputStream last in the chain of constructors because we want to use the DataInputStream methods, and we want them to use the buffered read method. Regardless of the ugliness of the above code, it is necessary: you must be prepared to continue layering stream constructors until you have access to the functionality you want.
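To make the layering concrete, here is a minimal round-trip sketch under a few assumptions: the class LayeredStreams and its roundTrip method are hypothetical, a temporary file is used instead of employee.dat so the example does not touch existing data, and a JDK 8 or later is assumed (try-with-resources, UncheckedIOException). The output side uses the mirror-image chain of DataOutputStream, BufferedOutputStream, and FileOutputStream.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class LayeredStreams {
    // Hypothetical helper: writes the values to a temporary file through
    // a data/buffered/file chain, then reads them back the same way.
    static double[] roundTrip(double[] values) {
        try {
            File f = File.createTempFile("employee", ".dat");
            f.deleteOnExit();
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream(f)))) {
                for (double v : values) {
                    out.writeDouble(v);
                }
            } // closing the outermost stream flushes and closes the whole chain
            double[] result = new double[values.length];
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(f)))) {
                for (int i = 0; i < result.length; i++) {
                    result[i] = in.readDouble();
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTrip(new double[] { 75000, 1.5 })));
    }
}
```

Only the outermost stream is kept in a variable, because it is the one whose methods are called; closing it closes the streams it wraps.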

Sometimes you'll need to keep track of the intermediate streams when chaining them together. For example, when reading input, you often need to peek at the next byte to see if it is the value that you expect. Java provides the PushbackInputStream for this purpose.

 PushbackInputStream pbin = new PushbackInputStream
    (new BufferedInputStream
       (new FileInputStream("employee.dat")));

Now you can speculatively read the next byte

 int b = pbin.read();

and throw it back if it isn't what you wanted.

 if (b != '<') pbin.unread(b);

But reading and unreading are the only methods that apply to the pushback input stream. If you want to look ahead and also read numbers, then you need both a pushback input stream and a data input stream reference.

 DataInputStream din = new DataInputStream
    (pbin = new PushbackInputStream
       (new BufferedInputStream
          (new FileInputStream("employee.dat"))));
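The following sketch shows why both references are needed: the lookahead goes through the pushback stream, and the number is read through the data stream layered on top of it. The class PeekThenRead, the method readAfterOptionalMarker, and the idea of an optional '<' marker in front of an int are assumptions made for illustration, and a ByteArrayInputStream replaces the file so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;

class PeekThenRead {
    // Hypothetical helper: skips an optional '<' marker byte, then reads
    // a 4-byte int through the data stream.
    static int readAfterOptionalMarker(byte[] data) {
        try {
            PushbackInputStream pbin =
                new PushbackInputStream(new ByteArrayInputStream(data));
            DataInputStream din = new DataInputStream(pbin);
            int b = pbin.read();          // speculative read
            if (b != '<' && b != -1) {
                pbin.unread(b);           // not the marker: put it back
            }
            return din.readInt();         // reads through the pushback stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] withMarker = { '<', 0, 0, 4, (byte) 0xD2 };
        byte[] withoutMarker = { 0, 0, 4, (byte) 0xD2 };
        System.out.println(readAfterOptionalMarker(withMarker));
        System.out.println(readAfterOptionalMarker(withoutMarker));
    }
}
```

The extra test for -1 guards against pushing the end-of-stream value back into the stream, where it would reappear as the byte 0xFF.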

Of course, in the stream libraries of other programming languages, niceties such as buffering and lookahead are automatically taken care of, so it is a bit of a hassle in Java that one has to resort to layering stream filters in these cases. But the ability to mix and match filter classes to construct truly useful sequences of streams does give you an immense amount of flexibility. For example, you can read numbers from a compressed ZIP file by using the following sequence of streams (see Figure 12-3).

 ZipInputStream zin
    = new ZipInputStream(new FileInputStream("employee.zip"));
 DataInputStream din = new DataInputStream(zin);

Figure 12-3. A sequence of filtered streams


(See the section on ZIP file streams later in this chapter for more on Java's ability to handle ZIP files.)
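As a self-contained sketch of this combination, the example below writes one number into a ZIP archive held in memory and reads it back. The class ZipDataDemo, the method roundTrip, and the entry name employee.dat are hypothetical, and a JDK 8 or later is assumed. One detail that the short constructor sequence above does not show: a ZipInputStream must be positioned on an entry with getNextEntry before any data can be read from it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipDataDemo {
    // Hypothetical helper: stores a double in a one-entry ZIP archive
    // held in memory, then reads it back through layered streams.
    static double roundTrip(double value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zout = new ZipOutputStream(bytes)) {
                zout.putNextEntry(new ZipEntry("employee.dat"));
                DataOutputStream dout = new DataOutputStream(zout);
                dout.writeDouble(value);
                dout.flush();
                zout.closeEntry();
            }
            try (ZipInputStream zin = new ZipInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                zin.getNextEntry();       // position on the first entry
                DataInputStream din = new DataInputStream(zin);
                return din.readDouble();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(75000));
    }
}
```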

All in all, apart from the rather monstrous constructors that are needed to layer streams, the ability to mix and match streams is a very useful feature of Java!

`java.io.FileInputStream` 1.0

FileInputStream(String name)
creates a new file input stream, using the file whose path name is specified by the name string.
FileInputStream(File f)
creates a new file input stream, using the information encapsulated in the File object. (The File class is described at the end of this chapter.)

`java.io.FileOutputStream` 1.0

FileOutputStream(String name)
creates a new file output stream specified by the name string. Path names that are not absolute are resolved relative to the current working directory. Caution: This method automatically deletes any existing file with the same name.
FileOutputStream(String name, boolean append)
creates a new file output stream specified by the name string. Path names that are not absolute are resolved relative to the current working directory. If the append parameter is true, then data is added at the end of the file. An existing file with the same name will not be deleted.
FileOutputStream(File f)
creates a new file output stream using the information encapsulated in the File object. (The File class is described at the end of this chapter.) Caution: This method automatically deletes any existing file with the same name as the name of f.

`java.io.BufferedInputStream` 1.0

BufferedInputStream(InputStream in)
creates a new buffered stream with a default buffer size. A buffered input stream reads characters from a stream without causing a device access every time. When the buffer is empty, a new block of data is read into the buffer.
BufferedInputStream(InputStream in, int n)
creates a new buffered stream with a user-defined buffer size.

`java.io.BufferedOutputStream` 1.0

BufferedOutputStream(OutputStream out)
creates a new buffered stream with a default buffer size. A buffered output stream collects characters to be written without causing a device access every time. When the buffer fills up, or when the stream is flushed, the data is written.
BufferedOutputStream(OutputStream out, int n)
creates a new buffered stream with a user-defined buffer size.

`java.io.PushbackInputStream` 1.0

PushbackInputStream(InputStream in)
constructs a stream with one-byte lookahead.
PushbackInputStream(InputStream in, int size)
constructs a stream with a pushback buffer of specified size.
void unread(int b)
pushes back a byte, which is retrieved again by the next call to read. You can push back only one character at a time.
Parameters:
b
The byte to be read again

Data Streams

You often need to write the result of a computation or read one back. The data streams support methods for reading back all of the basic Java types. To write a number, character, Boolean value, or string, use one of the following methods of the DataOutput interface:

`writeChars`	`writeFloat`
`writeByte`	`writeDouble`
`writeInt`	`writeChar`
`writeShort`	`writeBoolean`
`writeLong`	`writeUTF`

For example, writeInt always writes an integer as a 4-byte binary quantity regardless of the number of digits, and writeDouble always writes a double as an 8-byte binary quantity. The resulting output is not humanly readable but the space needed will be the same for each data type, and reading it back in will be faster. (See the section on the PrintWriter class later in this chapter for how to output numbers as human readable text.)

There are two different methods of storing integers and floating-point numbers in memory, depending on the platform you are using. Suppose, for example, you are working with a 4-byte quantity, like an int or a float. This can be stored in such a way that the first of the 4 bytes in memory holds the most significant byte (MSB) of the value, the so-called big-endian method, or it can hold the least significant byte (LSB) first, which is called, naturally enough, the little-endian method. For example, the SPARC uses big-endian; the Pentium, little-endian. This can lead to problems. For example, when saving a file using C or C++, the data is saved exactly as the processor stores it. That makes it challenging to move even the simplest data files from one platform to another. In Java, all values are written in the big-endian fashion, regardless of the processor. That makes Java data files platform independent.
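The big-endian layout is easy to observe by writing into a byte array and printing the bytes. In this sketch the class BigEndianDemo and the method hexOfInt are hypothetical names, and a JDK 8 or later is assumed.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class BigEndianDemo {
    // Hypothetical helper: returns the bytes that writeInt produces,
    // formatted as two-digit hexadecimal values.
    static String hexOfInt(int value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(value);
            out.flush();
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes.toByteArray()) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(String.format("%02X", b & 0xFF));
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hexOfInt(1234)); // prints 00 00 04 D2
    }
}
```

The most significant byte comes first on every platform, whatever byte order the underlying processor uses.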

The writeUTF method writes string data using a slightly modified form of the 8-bit Unicode Transformation Format (UTF-8). The format is as follows. A 7-bit ASCII value (that is, a 16-bit Unicode character with the top 9 bits zero) is written as one byte:

 0a₆a₅a₄a₃a₂a₁a₀

A 16-bit Unicode character with the top 5 bits zero is written as a 2-byte sequence:

 110a₁₀a₉a₈a₇a₆    10a₅a₄a₃a₂a₁a₀

(The top zero bits are not stored.)

All other Unicode characters are written as 3-byte sequences:

 1110a₁₅a₁₄a₁₃a₁₂   10a₁₁a₁₀a₉a₈a₇a₆   10a₅a₄a₃a₂a₁a₀

This is a useful format for text consisting mostly of ASCII characters because ASCII characters still take only a single byte. On the other hand, it is not a good format for Asiatic languages, for which you are better off directly writing sequences of double-byte Unicode characters. Use the writeChars method for that purpose.

Note that the top bits of a UTF byte determine the nature of the byte in the encoding scheme.

`0xxxxxxx`	`:`	ASCII
`10xxxxxx`	`:`	Second or third byte
`110xxxxx`	`:`	First byte of 2-byte sequence
`1110xxxx`	`:`	First byte of 3-byte sequence
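The three rules can be checked against the library with a small hand-written encoder. This is a sketch, not the library's implementation: the class UtfEncodingDemo and both method names are hypothetical, and a JDK 8 or later is assumed. Two details go beyond the description above and are stated here as facts about DataOutputStream: writeUTF puts a 2-byte length in front of the encoded bytes (the sketch strips it before comparing), and it writes the null character '\u0000' as a 2-byte sequence rather than as a single byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class UtfEncodingDemo {
    // Hand-written encoder that follows the 1-, 2-, and 3-byte rules.
    static byte[] encodeByHand(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                out.write(c);                          // 0xxxxxxx
            } else if (c <= 0x07FF) {
                out.write(0xC0 | (c >> 6));            // 110xxxxx
                out.write(0x80 | (c & 0x3F));          // 10xxxxxx
            } else {
                out.write(0xE0 | (c >> 12));           // 1110xxxx
                out.write(0x80 | ((c >> 6) & 0x3F));   // 10xxxxxx
                out.write(0x80 | (c & 0x3F));          // 10xxxxxx
            }
        }
        return out.toByteArray();
    }

    // Bytes produced by writeUTF, without the leading 2-byte length.
    static byte[] encodeWithWriteUTF(String s) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new DataOutputStream(bytes).writeUTF(s);
            byte[] all = bytes.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = "A\u00E9\u20AC"; // needs 1, 2, and 3 bytes respectively
        System.out.println(Arrays.equals(encodeByHand(s), encodeWithWriteUTF(s)));
    }
}
```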

To read the data back in, use the following methods:

`readInt`	`readDouble`
`readShort`	`readChar`
`readLong`	`readBoolean`
`readFloat`	`readUTF`

The binary data format is compact and platform independent. Except for the UTF strings, it is also suited to random access. The major drawback is that binary files are not readable by humans.

`java.io.DataInput` 1.0

boolean readBoolean()
reads in a Boolean value.
byte readByte()
reads an 8-bit byte.
char readChar()
reads a 16-bit Unicode character.
double readDouble()
reads a 64-bit double.
float readFloat()
reads a 32-bit float.
void readFully(byte[] b)
reads bytes into the array b, blocking until all bytes are read.
Parameters:
b
The buffer into which the data is read
void readFully(byte[] b, int off, int len)
reads bytes into the array b, blocking until all bytes are read.
Parameters:
b
The buffer into which the data is read

off
The start offset of the data

len
The maximum number of bytes read
int readInt()
reads a 32-bit integer.
String readLine()
reads in a line that has been terminated by a \n, \r, \r\n, or EOF. Returns a string containing all bytes in the line converted to Unicode characters.
long readLong()
reads a 64-bit long integer.
short readShort()
reads a 16-bit short integer.
String readUTF()
reads a string of characters in UTF format.
int skipBytes(int n)
skips n bytes, blocking until all bytes are skipped.
Parameters:
n
The number of bytes to be skipped

`java.io.DataOutput` 1.0

void writeBoolean(boolean b)
writes a Boolean value.
void writeByte(int b)
writes an 8-bit byte.
void writeChar(int c)
writes a 16-bit Unicode character.
void writeChars(String s)
writes all characters in the string.
void writeDouble(double d)
writes a 64-bit double.
void writeFloat(float f)
writes a 32-bit float.
void writeInt(int i)
writes a 32-bit integer.
void writeLong(long l)
writes a 64-bit long integer.
void writeShort(int s)
writes a 16-bit short integer.
void writeUTF(String s)
writes a string of characters in UTF format.

Random-Access File Streams

The RandomAccessFile stream class lets you find or write data anywhere in a file. It implements both the DataInput and DataOutput interfaces. Disk files are random access, but streams of data from a network are not. You open a random-access file either for reading only or for both reading and writing. You specify the option by using the string "r" (for read access) or "rw" (for read/write access) as the second argument in the constructor.

 RandomAccessFile in = new RandomAccessFile("employee.dat", "r");
 RandomAccessFile inOut
    = new RandomAccessFile("employee.dat", "rw");

When you open an existing file as a RandomAccessFile, it does not get deleted.

A random-access file also has a file pointer setting that comes with it. The file pointer always indicates the position of the next byte that will be read or written. The seek method sets the file pointer to an arbitrary byte position within the file. The argument to seek is a long integer between zero and the length of the file in bytes.

The getFilePointer method returns the current position of the file pointer.

To read from a random-access file, you use the same methods such as readInt and readUTF as for DataInputStream objects. That is no accident. These methods are actually defined in the DataInput interface that both DataInputStream and RandomAccessFile implement.

Similarly, to write a random-access file, you use the same writeInt and writeUTF methods as in the DataOutputStream class. These methods are defined in the DataOutput interface that is common to both classes.

The advantage of having the RandomAccessFile class implement both DataInput and DataOutput is that this lets you use or write methods whose argument types are the DataInput and DataOutput interfaces.

 class Employee
 {
    . . .
    void read(DataInput in) { . . . }
    void write(DataOutput out) { . . . }
 }

Note that the read method can handle either a DataInputStream or a RandomAccessFile object because both of these classes implement the DataInput interface. The same is true for the write method.
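A sketch of the idea with fixed-size records follows. Everything specific in it is an assumption made for illustration: the class RandomAccessDemo, a record layout of one int (an ID) followed by one double (a salary), and a temporary file in place of employee.dat. A JDK 8 or later is assumed. The helper methods take DataOutput and DataInput parameters, so they would work equally well with a DataOutputStream or DataInputStream.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class RandomAccessDemo {
    // Assumed record layout: a 4-byte int followed by an 8-byte double
    static final int RECORD_SIZE = 4 + 8;

    static void writeRecord(DataOutput out, int id, double salary)
            throws IOException {
        out.writeInt(id);
        out.writeDouble(salary);
    }

    static double readSalary(DataInput in) throws IOException {
        in.readInt();                 // skip over the ID
        return in.readDouble();
    }

    // Writes three records, then jumps directly to the requested one.
    static double salaryOfRecord(int index) {
        try {
            File f = File.createTempFile("employee", ".dat");
            f.deleteOnExit();
            try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
                for (int i = 0; i < 3; i++) {
                    writeRecord(raf, i, 1000.0 * (i + 1));
                }
                raf.seek((long) index * RECORD_SIZE);
                return readSalary(raf);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(salaryOfRecord(2)); // prints 3000.0
    }
}
```

Because every record occupies the same number of bytes, the position of record n is simply n times the record size, which is what makes seek useful here.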

`java.io.RandomAccessFile` 1.0

RandomAccessFile(String name, String mode)
Parameters:
name
System-dependent file name

mode
"r" for reading only, or "rw" for reading and writing
RandomAccessFile(File file, String mode)
Parameters:
file
A File object encapsulating a system-dependent file name. (The File class is described at the end of this chapter.)

mode
"r" for reading only, or "rw" for reading and writing
long getFilePointer()
returns the current location of the file pointer.
void seek(long pos)
sets the file pointer to pos bytes from the beginning of the file.
long length()
returns the length of the file in bytes.

Text Streams

In the last section, we discussed binary input and output. While binary I/O is fast and efficient, it is not easily readable by humans. In this section, we will focus on text I/O. For example, if the integer 1234 is saved in binary, it is written as the sequence of bytes 00 00 04 D2 (in hexadecimal notation). In text format, it is saved as the string "1234".

Unfortunately, doing this in Java requires a bit of work, because, as you know, Java uses Unicode characters. That is, the character encoding for the string "1234" really is 00 31 00 32 00 33 00 34 (in hex). However, at the present time most environments where your Java programs will run use their own character encoding. This may be a single-byte, double-byte, or variable-byte scheme. For example, under Windows, the string would need to be written in ASCII, as 31 32 33 34, without the extra zero bytes. If the Unicode encoding were written into a text file, then it would be quite unlikely that the resulting file will be humanly readable with the tools of the host environment. To overcome this problem, as we mentioned before, Java now has a set of stream filters that bridges the gap between Unicode-encoded text and the character encoding used by the local operating system. All of these classes descend from the abstract Reader and Writer classes, and the names are reminiscent of the ones used for binary data. For example, the InputStreamReader class turns an input stream that contains bytes in a particular character encoding into a reader that emits Unicode characters. Similarly, the OutputStreamWriter class turns a stream of Unicode characters into a stream of bytes in a particular character encoding.

For example, here is how you make an input reader that reads keystrokes from the console and automatically converts them to Unicode.

 InputStreamReader in = new InputStreamReader(System.in);

This input stream reader assumes the normal character encoding used by the host system. For example, under Windows, it uses the ISO 8859-1 encoding (also known as ISO Latin-1 or, among Windows programmers, as "ANSI code"). You can choose a different encoding by specifying it in the constructor for the InputStreamReader. This takes the form

 InputStreamReader(InputStream, String)

where the string describes the encoding scheme that you want to use. For example,

 InputStreamReader in = new InputStreamReader(
    new FileInputStream("kremlin.dat"), "8859_5");

Tables 12-1 and 12-2 list the currently supported encoding schemes.

Local encoding schemes cannot represent all Unicode characters. If a character cannot be represented, it is transformed to a question mark (?).
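The conversion in both directions, including the fate of a character that the target encoding cannot represent, can be sketched as follows. The class EncodingDemo and its two methods are hypothetical; byte array streams stand in for files, the encoding name ISO8859_1 is taken from Table 12-1, and a JDK 8 or later is assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

class EncodingDemo {
    // Hypothetical helper: converts Unicode text to bytes in the
    // given encoding.
    static byte[] encode(String text, String encoding) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (Writer out = new OutputStreamWriter(bytes, encoding)) {
                out.write(text);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: converts bytes in the given encoding back
    // to Unicode text.
    static String decode(byte[] data, String encoding) {
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(data), encoding)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Four bytes, 31 32 33 34, with no extra zero bytes
        System.out.println(encode("1234", "ISO8859_1").length);
        // The euro sign is not part of ISO 8859-1 and becomes '?'
        System.out.println((char) encode("\u20AC", "ISO8859_1")[0]);
        System.out.println(decode(encode("1234", "ISO8859_1"), "ISO8859_1"));
    }
}
```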

Table 12-1. Basic character encodings (in `rt.jar`)
Name	Description
`ASCII`	American Standard Code for Information Interchange
`Cp1252`	Windows Latin-1
`ISO8859_1`	ISO 8859-1, Latin alphabet No. 1
`UnicodeBig`	Sixteen-bit Unicode Transformation Format, big-endian byte order, with byte-order mark
`UnicodeBigUnmarked`	Sixteen-bit Unicode Transformation Format, big-endian byte order
`UnicodeLittle`	Sixteen-bit Unicode Transformation Format, little-endian byte order, with byte-order mark
`UnicodeLittleUnmarked`	Sixteen-bit Unicode Transformation Format, little-endian byte order
`UTF8`	Eight-bit Unicode Transformation Format
`UTF-16`	Sixteen-bit Unicode Transformation Format, byte order specified by a mandatory initial byte-order mark

Table 12-2. Extended Character Encodings (in `i18n.jar`)
Name	Description
`Big5`	Big5, Traditional Chinese
`Cp037`	USA, Canada (Bilingual, French), Netherlands, Portugal, Brazil, Australia
`Cp273`	IBM Austria, Germany
`Cp277`	IBM Denmark, Norway
`Cp278`	IBM Finland, Sweden
`Cp280`	IBM Italy
`Cp284`	IBM Catalan/Spain, Spanish Latin America
`Cp285`	IBM United Kingdom, Ireland
`Cp297`	IBM France
`Cp420`	IBM Arabic
`Cp424`	IBM Hebrew
`Cp437`	MS-DOS United States, Australia, New Zealand, South Africa
`Cp500`	EBCDIC 500V1
`Cp737`	PC Greek
`Cp775`	PC Baltic
`Cp838`	IBM Thailand extended SBCS
`Cp850`	MS-DOS Latin-1
`Cp852`	MS-DOS Latin-2
`Cp855`	IBM Cyrillic
`Cp856`	IBM Hebrew
`Cp857`	IBM Turkish
`Cp858`	Variant of `Cp850` with Euro character
`Cp860`	MS-DOS Portuguese
`Cp861`	MS-DOS Icelandic
`Cp862`	PC Hebrew
`Cp863`	MS-DOS Canadian French
`Cp864`	PC Arabic
`Cp865`	MS-DOS Nordic
`Cp866`	MS-DOS Russian
`Cp868`	MS-DOS Pakistan
`Cp869`	IBM Modern Greek
`Cp870`	IBM Multilingual Latin-2
`Cp871`	IBM Iceland
`Cp874`	IBM Thai
`Cp875`	IBM Greek
`Cp918`	IBM Pakistan (Urdu)
`Cp921`	IBM Latvia, Lithuania (AIX, DOS)
`Cp922`	IBM Estonia (AIX, DOS)
`Cp930`	Japanese Katakana-Kanji mixed with 4370 UDC, superset of 5026
`Cp933`	Korean Mixed with 1880 UDC, superset of 5029
`Cp935`	Simplified Chinese Host mixed with 1880 UDC, superset of 5031
`Cp937`	Traditional Chinese Host mixed with 6204 UDC, superset of 5033
`Cp939`	Japanese Latin Kanji mixed with 4370 UDC, superset of 5035
`Cp942`	IBM OS/2 Japanese, superset of Cp932
`Cp942C`	Variant of Cp942
`Cp943`	IBM OS/2 Japanese, superset of Cp932 and `Shift-JIS`
`Cp943C`	Variant of Cp943
`Cp948`	OS/2 Chinese (Taiwan) superset of 938
`Cp949`	PC Korean
`Cp949C`	Variant of Cp949
`Cp950`	PC Chinese (Hong Kong, Taiwan)
`Cp964`	AIX Chinese (Taiwan)
`Cp970`	AIX Korean
`Cp1006`	IBM AIX Pakistan (Urdu)
`Cp1025`	IBM Multilingual Cyrillic: Bulgaria, Bosnia, Herzegovina, Macedonia (FYR)
`Cp1026`	IBM Latin-5, Turkey
`Cp1046`	IBM Arabic - Windows
`Cp1097`	IBM Iran (Farsi)/Persian
`Cp1098`	IBM Iran (Farsi)/Persian (PC)
`Cp1112`	IBM Latvia, Lithuania
`Cp1122`	IBM Estonia
`Cp1123`	IBM Ukraine
`Cp1124`	IBM AIX Ukraine
`Cp1140`	Variant of Cp037 with Euro character
`Cp1141`	Variant of Cp273 with Euro character
`Cp1142`	Variant of Cp277 with Euro character
`Cp1143`	Variant of Cp278 with Euro character
`Cp1144`	Variant of Cp280 with Euro character
`Cp1145`	Variant of Cp284 with Euro character
`Cp1146`	Variant of Cp285 with Euro character
`Cp1147`	Variant of Cp297 with Euro character
`Cp1148`	Variant of Cp500 with Euro character
`Cp1149`	Variant of Cp871 with Euro character
`Cp1250`	Windows Eastern European
`Cp1251`	Windows Cyrillic
`Cp1253`	Windows Greek
`Cp1254`	Windows Turkish
`Cp1255`	Windows Hebrew
`Cp1256`	Windows Arabic
`Cp1257`	Windows Baltic
`Cp1258`	Windows Vietnamese
`Cp1381`	IBM OS/2, DOS People's Republic of China (PRC)
`Cp1383`	IBM AIX People's Republic of China (PRC)
`Cp33722`	IBM-eucJP - Japanese (superset of 5050)
`EUC_CN`	GB2312, EUC encoding, Simplified Chinese
`EUC_JP`	JIS X 0201, 0208, 0212, EUC encoding, Japanese
`EUC_KR`	KS C 5601, EUC encoding, Korean
`EUC_TW`	CNS11643 (Plane 1-3), EUC encoding, Traditional Chinese
`GBK`	GBK, Simplified Chinese
`ISO2022CN`	ISO 2022 CN, Chinese (conversion to Unicode only)
`ISO2022CN_CNS`	CNS 11643 in ISO 2022 CN form, Traditional Chinese (conversion from Unicode only)
`ISO2022CN_GB`	GB 2312 in ISO 2022 CN form, Simplified Chinese (conversion from Unicode only)
`ISO2022JP`	JIS X 0201, 0208 in ISO 2022 form, Japanese
`ISO2022KR`	ISO 2022 KR, Korean
`ISO8859_2`	ISO 8859-2, Latin alphabet No. 2
`ISO8859_3`	ISO 8859-3, Latin alphabet No. 3
`ISO8859_4`	ISO 8859-4, Latin alphabet No. 4
`ISO8859_5`	ISO 8859-5, Latin/Cyrillic alphabet
`ISO8859_6`	ISO 8859-6, Latin/Arabic alphabet
`ISO8859_7`	ISO 8859-7, Latin/Greek alphabet
`ISO8859_8`	ISO 8859-8, Latin/Hebrew alphabet
`ISO8859_9`	ISO 8859-9, Latin alphabet No. 5
`ISO8859_13`	ISO 8859-13, Latin alphabet No. 7
`ISO8859_15_FDIS`	ISO 8859-15, Latin alphabet No. 9
`JIS0201`	JIS X 0201, Japanese
`JIS0208`	JIS X 0208, Japanese
`JIS0212`	JIS X 0212, Japanese
`JISAutoDetect`	Detects and converts from Shift-JIS, EUC-JP, ISO 2022 JP (conversion to Unicode only)
`Johab`	Johab, Korean
`KOI8_R`	KOI8-R, Russian
`MS874`	Windows Thai
`MS932`	Windows Japanese
`MS936`	Windows Simplified Chinese
`MS949`	Windows Korean
`MS950`	Windows Traditional Chinese
`MacArabic`	Macintosh Arabic
`MacCentralEurope`	Macintosh Latin-2
`MacCroatian`	Macintosh Croatian
`MacCyrillic`	Macintosh Cyrillic
`MacDingbat`	Macintosh Dingbat
`MacGreek`	Macintosh Greek
`MacHebrew`	Macintosh Hebrew
`MacIceland`	Macintosh Iceland
`MacRoman`	Macintosh Roman
`MacRomania`	Macintosh Romania
`MacSymbol`	Macintosh Symbol
`MacThai`	Macintosh Thai
`MacTurkish`	Macintosh Turkish
`MacUkraine`	Macintosh Ukraine
`SJIS`	Shift-JIS, Japanese
`TIS620`	TIS620, Thai

Because it is so common to want to attach a reader or writer to a file, there is a pair of convenience classes, FileReader and FileWriter, for this purpose. For example, the writer definition

 FileWriter out = new FileWriter("output.txt");

is equivalent to

 OutputStreamWriter out = new OutputStreamWriter(
    new FileOutputStream("output.txt"));

Writing Text Output

For text output, you want to use a PrintWriter. A print writer can print strings and numbers in text format. Just as a DataOutputStream has useful output methods but no destination, a PrintWriter must be combined with a destination writer.

 PrintWriter out = new PrintWriter(
    new FileWriter("employee.txt"));

You can also combine a print writer with a destination (output) stream.

 PrintWriter out = new PrintWriter(
    new FileOutputStream("employee.txt"));

The PrintWriter(OutputStream) constructor automatically adds an OutputStreamWriter to convert Unicode characters to bytes in the stream.

To write to a print writer, you use the same print and println methods that you used with System.out. You can use these methods to print numbers (int, short, long, float, double), characters, Boolean values, strings, and objects.

Java veterans probably wonder whatever happened to the PrintStream class and to System.out. In Java 1.0, the PrintStream class simply truncated all Unicode characters to ASCII characters by dropping the top byte. Conversely, the readLine method of the DataInputStream turned ASCII to Unicode by setting the top byte to 0. Clearly, that was not a clean or portable approach, and it was fixed with the introduction of readers and writers in Java 1.1. For compatibility with existing code, System.in, System.out, and System.err are still streams, not readers and writers. But now the PrintStream class internally converts Unicode characters to the default host encoding in the same way as the PrintWriter. Objects of type PrintStream act exactly like print writers when you use the print and println methods, but unlike print writers, they allow you to send raw bytes to them with the write(int) and write(byte[]) methods.

For example, consider this code:

 String name = "Harry Hacker";
 double salary = 75000;
 out.print(name);
 out.print(' ');
 out.println(salary);

This writes the characters

 Harry Hacker 75000.0

to the stream out. The characters are then converted to bytes and end up in the file employee.txt.

The println method automatically adds the correct end-of-line character for the target system ("\r\n" on Windows, "\n" on UNIX, "\r" on Macs) to the line. This is the string obtained by the call System.getProperty("line.separator").

If the writer is set to autoflush mode, then all characters in the buffer are sent to their destination whenever println is called. (Print writers are always buffered.) By default, autoflushing is not enabled. You can enable or disable autoflushing by using the PrintWriter(Writer, boolean) constructor and passing the appropriate Boolean as the second argument.

 PrintWriter out = new PrintWriter(
    new FileWriter("employee.txt"), true); // autoflush

The print methods don't throw exceptions. You can call the checkError method to see if something went wrong with the stream.

You cannot write raw bytes to a PrintWriter. Print writers are designed for text output only.
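A compact sketch of these points: the print writer below is created in autoflush mode, writes one line, and is then checked with checkError. The class PrintWriterDemo and the method formatLine are hypothetical, and a StringWriter is used as the destination so that the result can be inspected without creating a file. Note that println prints a double together with its fractional part, so the salary appears as 75000.0.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class PrintWriterDemo {
    // Hypothetical helper: prints a name and a salary on one line and
    // returns the text that reached the destination writer.
    static String formatLine(String name, double salary) {
        StringWriter destination = new StringWriter();
        PrintWriter out = new PrintWriter(destination, true); // autoflush
        out.print(name);
        out.print(' ');
        out.println(salary);
        // The print methods do not throw; errors must be polled
        if (out.checkError()) {
            throw new IllegalStateException("output failed");
        }
        return destination.toString();
    }

    public static void main(String[] args) {
        System.out.print(formatLine("Harry Hacker", 75000));
    }
}
```

The returned string ends with the platform's line separator, the same string that System.getProperty("line.separator") reports.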

`java.io.PrintWriter` 1.1

PrintWriter(Writer out)
creates a new PrintWriter, without automatic line flushing.
Parameters:
out
A character-output writer
PrintWriter(Writer out, boolean autoFlush)
creates a new PrintWriter.
Parameters:
out
A character-output writer

autoFlush
If true, the println methods will flush the output buffer
PrintWriter(OutputStream out)
creates a new PrintWriter, without automatic line flushing, from an existing OutputStream by automatically creating the necessary intermediate OutputStreamWriter.
Parameters:
out
An output stream
PrintWriter(OutputStream out, boolean autoFlush)
creates a new PrintWriter from an existing OutputStream but allows you to determine whether the writer autoflushes or not.
Parameters:
out
An output stream

autoFlush
If true, the println methods will flush the output buffer
void print(Object obj)
prints an object by printing the string resulting from toString.
Parameters:
obj
The object to be printed
void print(String s)
prints a Unicode string.
void println(String s)
prints a string followed by a line terminator. Flushes the stream if the stream is in autoflush mode.
void print(char[] s)
prints an array of Unicode characters.
void print(char c)
prints a Unicode character.
void print(int i)
prints an integer in text format.
void print(long l)
prints a long integer in text format.
void print(float f)
prints a floating-point number in text format.
void print(double d)
prints a double-precision floating-point number in text format.
void print(boolean b)
prints a Boolean value in text format.
boolean checkError()
returns true if a formatting or output error occurred. Once the stream has encountered an error, it is tainted and all calls to checkError return true.

Reading Text Input

As you know:

To write data in binary format, you use a DataOutputStream.
To write in text format, you use a PrintWriter.

Therefore, you might expect that there is an analog to the DataInputStream that lets you read data in text format. Unfortunately, Java does not provide such a class. (That is why we wrote our own Console class for use in the beginning chapters.) The only game in town for processing text input is the BufferedReader class, which has a method, readLine, that lets you read a line of text. You need to combine a buffered reader with an input source.

 BufferedReader in = new BufferedReader(
    new FileReader("employee.txt"));

The readLine method returns null when no more input is available. A typical input loop, therefore, looks like this:

 String line;
 while ((line = in.readLine()) != null)
 {
    // do something with line
 }

The FileReader class already converts bytes to Unicode characters. For other input sources, you need to use the InputStreamReader. Unlike the PrintWriter, the InputStreamReader has no automatic convenience method to bridge the gap between bytes and Unicode characters.

 BufferedReader in2 = new BufferedReader(
    new InputStreamReader(System.in));
 BufferedReader in3 = new BufferedReader(
    new InputStreamReader(url.openStream()));

To read numbers from text input, you need to read a string first and then convert it.

 String s = in.readLine();
 double x = Double.parseDouble(s);

That works if there is a single number on each line. Otherwise, you must work harder and break up the input string, for example, by using the StringTokenizer utility class. We will see an example of this later in this chapter.
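In the meantime, here is a minimal sketch of the read-then-convert approach for lines that may hold several numbers. The class SumNumbers and the method sum are hypothetical, whitespace is assumed to separate the numbers, a StringReader supplies the input so that the example runs without a file, and a JDK 8 or later is assumed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.StringTokenizer;

class SumNumbers {
    // Hypothetical helper: adds up all whitespace-separated numbers
    // found in the input, line by line.
    static double sum(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            double total = 0;
            String line;
            while ((line = in.readLine()) != null) {
                StringTokenizer tokens = new StringTokenizer(line);
                while (tokens.hasMoreTokens()) {
                    total += Double.parseDouble(tokens.nextToken());
                }
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(new StringReader("1.5 2.5\n4\n"))); // prints 8.0
    }
}
```

A token that is not a valid number makes Double.parseDouble throw a NumberFormatException, which this sketch does not try to handle.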

Java has StringReader and StringWriter classes that allow you to treat a string as if it were a data stream. This can be quite convenient if you want to use the same code to parse both strings and data from a stream.
