Unlike C, which gets by just fine with a single type FILE*, Java has a whole zoo of more than 60 (!) different stream types (see Figures 12-1 and 12-2). Library designers claim that there is a good reason to give users a wide choice of stream types: it is supposed to reduce programming errors. For example, in C, some people think it is a common mistake to send output to a file that was opened only for reading. (Well, it is not actually that common.) Naturally, if you do this, the output is ignored at run time. In Java and C++, the compiler catches that kind of mistake because an InputStream (Java) or istream (C++) has no methods for output.
Figure 12-1. Input and output stream hierarchy
Figure 12-2. Reader and writer hierarchy
(We would argue that in C++, and even more so in Java, the main tool that the stream interface designers have against programming errors is intimidation. The sheer complexity of the stream libraries keeps programmers on their toes.)
Let us divide the animals in the stream class zoo by how they are used. Four abstract classes are at the base of the zoo: InputStream, OutputStream, Reader, and Writer. You do not make objects of these types, but other methods can return them. For example, as you saw in Chapter 10, the URL class has the method openStream that returns an InputStream. You then use this InputStream object to read from the URL. As we said, the InputStream and OutputStream classes let you read and write only individual bytes and arrays of bytes. They have no methods to read and write strings and numbers, so you need more capable subclasses for this. For example, DataInputStream and DataOutputStream let you read and write all the basic Java types.
For Unicode text, on the other hand, as we said, you use classes that descend from Reader and Writer. The basic methods of the Reader and Writer classes are similar to the ones for InputStream and OutputStream:
   int read()
   void write(int c)
They work just as the comparable methods do in the InputStream and OutputStream classes, except that the read method returns either a Unicode code unit (as an integer between 0 and 65535) or -1 when you have reached the end of the file.
Finally, there are streams that do useful things, for example, the ZipInputStream and ZipOutputStream that let you read and write files in the familiar ZIP compression format.
Moreover, JDK 5.0 introduces four new interfaces: Closeable, Flushable, Readable, and Appendable (see Figure 12-3). The first two interfaces are very simple, with the methods
   void close() throws IOException
and
   void flush() throws IOException
respectively. The classes InputStream, OutputStream, Reader, and Writer all implement the Closeable interface. OutputStream and Writer implement the Flushable interface.
Figure 12-3. The Closeable, Flushable, Readable, and Appendable interfaces
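The read contract described above, where the method returns a code unit or -1 at the end, leads to a standard loop. The sketch below uses a StringReader as a stand-in for any Reader. The class name ReaderLoopDemo and the helper readAll are illustrative, not library API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReaderLoopDemo {
    // Collects every UTF-16 code unit a Reader delivers; read() returns -1 at the end
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c); // c is in the range 0..65535 here
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = readAll(new StringReader("stream zoo"));
        System.out.println(text); // stream zoo
    }
}
```

Keeping the result in an int until after the -1 test matters. Casting to char first would turn -1 into '\uFFFF' and hide the end of input.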
The Readable interface has a single method
   int read(CharBuffer cb) throws IOException
The CharBuffer class has methods for sequential and random read/write access. It represents an in-memory buffer or a memory-mapped file (see page 696).
The Appendable interface has methods for appending single characters and character sequences, among them
   Appendable append(char c) throws IOException
   Appendable append(CharSequence s) throws IOException
The CharSequence type is yet another interface, describing minimal properties of a sequence of char values. It is implemented by String, CharBuffer, StringBuilder, and StringBuffer (see page 656).
Of the stream zoo classes, Writer and PrintStream implement Appendable.
java.io.Closeable 5.0
java.io.Flushable 5.0
java.lang.Readable 5.0
java.lang.Appendable 5.0
java.lang.CharSequence 1.4
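The Readable and Appendable interfaces described above let one method serve many kinds of sources and sinks. The following is a minimal sketch. The helper names appendGreeting and drain are hypothetical. The four-character buffer is deliberately tiny so that draining a Reader takes several read calls.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.CharBuffer;

class AppendableDemo {
    // Hypothetical helper: accepts any Appendable (StringBuilder, any Writer, PrintStream, ...)
    static void appendGreeting(Appendable out, String name) throws IOException {
        out.append("Hello, ").append(name).append('!');
    }

    // Hypothetical helper: drains any Readable through a small CharBuffer
    static String drain(Readable src) throws IOException {
        CharBuffer cb = CharBuffer.allocate(4); // deliberately tiny to force several reads
        StringBuilder sb = new StringBuilder();
        while (src.read(cb) != -1) {
            cb.flip();      // switch the buffer from filling to draining
            sb.append(cb);  // appends the chars between position and limit
            cb.clear();     // ready for the next read
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        appendGreeting(sb, "zoo");
        StringWriter sw = new StringWriter();
        appendGreeting(sw, "zoo");
        System.out.println(sb + " " + sw); // Hello, zoo! Hello, zoo!
        Reader reader = new StringReader("abcdefghij");
        System.out.println(drain(reader)); // abcdefghij
    }
}
```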
Layering Stream Filters
FileInputStream and FileOutputStream give you input and output streams attached to a disk file. You give the file name or full path name of the file in the constructor. For example,
   FileInputStream fin = new FileInputStream("employee.dat");
looks in the current directory for a file named "employee.dat".
You can also use a File object (see page 685 for more on file objects):
   File f = new File("employee.dat");
   FileInputStream fin = new FileInputStream(f);
Like the abstract InputStream and OutputStream classes, these classes support only reading and writing on the byte level. That is, we can only read bytes and byte arrays from the object fin.
   byte b = (byte) fin.read();
(The cast discards the -1 that signals the end of the file, so test the int result before casting.)
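The sketch below shows byte-level reading with the end-of-file test done before any cast. A temporary file stands in for employee.dat. The class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class ByteReadDemo {
    // Reads a file one byte at a time; read() yields 0..255, or -1 at end of file
    static byte[] readBytes(File f) throws IOException {
        // try-with-resources (Java 7+) calls close() from the Closeable interface
        try (FileInputStream fin = new FileInputStream(f)) {
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            int b;
            while ((b = fin.read()) != -1) { // compare as an int, before any (byte) cast
                collected.write(b);          // keeps the low 8 bits
            }
            return collected.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("employee", ".dat"); // stand-in for employee.dat
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(new byte[] {1, (byte) 0xFF, 2}); // 0xFF must not be mistaken for end of file
        }
        System.out.println(readBytes(f).length); // 3
    }
}
```

The byte 0xFF comes back from read() as 255, not -1. That is why the comparison happens on the int value.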
As you will see in the next section, if we just had a DataInputStream, then we could read numeric types:
   DataInputStream din = . . .;
   double s = din.readDouble();
But just as the FileInputStream has no methods to read numeric types, the DataInputStream has no method to get data from a file.
Java uses a clever mechanism to separate two kinds of responsibilities. Some streams (such as the FileInputStream and the input stream returned by the openStream method of the URL class) can retrieve bytes from files and other more exotic locations. Other streams (such as the DataInputStream and the PrintWriter) can assemble bytes into more useful data types. The Java programmer has to combine the two into what are often called filtered streams by feeding an existing stream to the constructor of another stream. For example, to be able to read numbers from a file, first create a FileInputStream and then pass it to the constructor of a DataInputStream.
   FileInputStream fin = new FileInputStream("employee.dat");
   DataInputStream din = new DataInputStream(fin);
   double s = din.readDouble();
It is important to keep in mind that the data input stream that we created with the above code does not correspond to a new disk file. The newly created stream still accesses the data from the file attached to the file input stream, but the point is that it now has a more capable interface.
If you look at Figure 12-1 again, you can see the classes FilterInputStream and FilterOutputStream. You combine their subclasses into a new filtered stream to construct the streams you want. For example, by default, streams are not buffered. That is, every call to read contacts the operating system to ask it to dole out yet another byte.
If you want buffering and the data input methods for a file named employee.dat in the current directory, you need to use the following rather monstrous sequence of constructors:
   DataInputStream din = new DataInputStream(
      new BufferedInputStream(
         new FileInputStream("employee.dat")));
Notice that the DataInputStream is the outermost wrapper, the last stream in the chain to be constructed, because we want to use the DataInputStream methods, and we want them to use the buffered read method. Regardless of the ugliness of the above code, it is necessary: you must be prepared to continue layering stream constructors until you have access to the functionality you want.
Sometimes you'll need to keep track of the intermediate streams when chaining them together. For example, when reading input, you often need to peek at the next byte to see if it is the value that you expect. Java provides the PushbackInputStream for this purpose.
   PushbackInputStream pbin = new PushbackInputStream(
      new BufferedInputStream(
         new FileInputStream("employee.dat")));
Now you can speculatively read the next byte
   int b = pbin.read();
and throw it back if it isn't what you wanted.
   if (b != '<') pbin.unread(b);
But reading and unreading are the only methods that apply to the pushback input stream. If you want to look ahead and also read numbers, then you need both a pushback input stream and a data input stream reference.
   DataInputStream din = new DataInputStream(
      pbin = new PushbackInputStream(
         new BufferedInputStream(
            new FileInputStream("employee.dat"))));
Of course, in the stream libraries of other programming languages, niceties such as buffering and lookahead are automatically taken care of, so it is a bit of a hassle in Java that one has to resort to layering stream filters in these cases. But the ability to mix and match filter classes to construct truly useful sequences of streams does give you an immense amount of flexibility.
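Here is the pushback-plus-data-input chain from the text as a runnable sketch. It assumes a hypothetical file format: an optional '<' marker byte followed by one double. The helpers writeValue and readValue are illustrative names.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

class PushbackDemo {
    // Hypothetical file format: an optional '<' marker byte, then one double
    static void writeValue(File f, boolean withMarker, double value) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f))) {
            if (withMarker) out.writeByte('<');
            out.writeDouble(value);
        }
    }

    static double readValue(File f) throws IOException {
        PushbackInputStream pbin;
        // keep a reference to the pushback stream while the data stream wraps it
        try (DataInputStream din = new DataInputStream(
                pbin = new PushbackInputStream(
                    new BufferedInputStream(
                        new FileInputStream(f))))) {
            int b = pbin.read();                     // peek at the next byte
            if (b != -1 && b != '<') pbin.unread(b); // not the marker: throw it back
            return din.readDouble();                 // reads through the same chain
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("employee", ".dat");
        f.deleteOnExit();
        writeValue(f, true, 3.5);
        System.out.println(readValue(f)); // 3.5
        writeValue(f, false, 2.25);
        System.out.println(readValue(f)); // 2.25
    }
}
```

Closing the outermost DataInputStream closes every stream beneath it, so only din needs to be a resource.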
As another example, you can read numbers from a compressed ZIP file by using the following sequence of streams (see Figure 12-4):
   ZipInputStream zin = new ZipInputStream(new FileInputStream("employee.zip"));
   DataInputStream din = new DataInputStream(zin);
Before reading, call zin.getNextEntry() to position the stream at an entry of the archive.
Figure 12-4. A sequence of filtered streams
(See the section on ZIP file streams starting on page 643 for more on Java's ability to handle ZIP files.)
All in all, apart from the rather monstrous constructors that are needed to layer streams, the ability to mix and match streams is a very useful feature of Java!
java.io.FileInputStream 1.0
java.io.FileOutputStream 1.0
java.io.BufferedInputStream 1.0
java.io.BufferedOutputStream 1.0
java.io.PushbackInputStream 1.0
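The ZIP layering described in this section can be tried end to end with the sketch below. It writes a double into a single entry (the entry name employee.dat is illustrative) and reads it back through a DataInputStream stacked on a ZipInputStream.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipDataDemo {
    // Writes one double into a single ZIP entry (entry name is illustrative)
    static void writeZip(File f, double value) throws IOException {
        try (ZipOutputStream zout = new ZipOutputStream(new FileOutputStream(f))) {
            zout.putNextEntry(new ZipEntry("employee.dat"));
            DataOutputStream dout = new DataOutputStream(zout);
            dout.writeDouble(value);
            dout.flush();      // don't close dout: that would close zout too early
            zout.closeEntry();
        }
    }

    // ZipInputStream supplies the decompressed bytes, DataInputStream decodes them
    static double readZip(File f) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(new FileInputStream(f))) {
            if (zin.getNextEntry() == null) throw new EOFException("no entries");
            DataInputStream din = new DataInputStream(zin);
            return din.readDouble();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("employee", ".zip");
        f.deleteOnExit();
        writeZip(f, 42.5);
        System.out.println(readZip(f)); // 42.5
    }
}
```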
Data Streams
You often need to write the result of a computation or read one back. The data streams support methods for writing all the basic Java types and reading them back. To write a number, character, Boolean value, or string, use one of the following methods of the DataOutput interface:
   writeChars
   writeByte
   writeInt
   writeShort
   writeLong
   writeFloat
   writeDouble
   writeChar
   writeBoolean
   writeUTF
For example, writeInt always writes an integer as a 4-byte binary quantity regardless of the number of digits, and writeDouble always writes a double as an 8-byte binary quantity. The resulting output is not human-readable, but the space needed will be the same for each value of a given type and reading it back in will be faster. (See the section on the PrintWriter class later in this chapter for how to output numbers as human-readable text.)
The writeUTF method writes string data by using a modified version of the 8-bit Unicode Transformation Format. Instead of simply using the standard UTF-8 encoding (which is shown in Table 12-1), character strings are first represented in UTF-16 (see Table 12-2) and then the result is encoded using the UTF-8 rules. The modified encoding differs for characters with code higher than 0xFFFF: each half of the UTF-16 surrogate pair is encoded separately, taking six bytes instead of four. (The null character \u0000 also differs; it is encoded as the two bytes C0 80.) The modification exists for backwards compatibility with virtual machines that were built when Unicode had not yet grown beyond 16 bits.
Because nobody else uses this modification of UTF-8, you should only use the writeUTF method to write strings that are intended for a Java virtual machine, for example, if you write a program that generates bytecodes. Use the writeChars method for other purposes.
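The difference between writeUTF and standard UTF-8 is easy to observe. The sketch below strips the 2-byte length prefix that writeUTF emits and compares byte counts for a character above 0xFFFF (U+1F600, chosen arbitrarily) and for the null character. The helper modifiedUtf8 is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ModifiedUtf8Demo {
    // Returns what writeUTF emits for s, without its 2-byte length prefix
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        DataOutputStream dout = new DataOutputStream(bout);
        dout.writeUTF(s);
        dout.flush();
        byte[] all = bout.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) throws IOException {
        // U+1F600 lies above 0xFFFF, so Java stores it as a surrogate pair
        String smiley = new String(Character.toChars(0x1F600));
        System.out.println(modifiedUtf8(smiley).length);                    // 6: 3 bytes per surrogate
        System.out.println(smiley.getBytes(StandardCharsets.UTF_8).length); // 4
        System.out.println(modifiedUtf8("\0").length);                      // 2: C0 80
        System.out.println("\0".getBytes(StandardCharsets.UTF_8).length);   // 1
    }
}
```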
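To read the data back in, use the following methods of the DataInput interface:
   readByte
   readInt
   readShort
   readLong
   readFloat
   readDouble
   readChar
   readBoolean
   readUTF
There is no readChars method. To read back a string written with writeChars, call readChar once for each character.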
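Writing and reading must use matching methods in the same order. The sketch below round-trips a hypothetical record (id, salary, name) through in-memory streams. It also confirms the fixed sizes mentioned earlier: 4 bytes for an int, 8 for a double, and a 2-byte length prefix plus the encoded characters for writeUTF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataRoundTrip {
    // Hypothetical record: id, salary, name, written with DataOutput methods
    static byte[] encode(int id, double salary, String name) throws IOException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bout)) {
            out.writeInt(id);        // always 4 bytes
            out.writeDouble(salary); // always 8 bytes
            out.writeUTF(name);      // 2-byte length + modified UTF-8
        }
        return bout.toByteArray();
    }

    // Must read the values back in exactly the order they were written
    static String decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();
            double salary = in.readDouble();
            String name = in.readUTF();
            return id + " " + name + " " + salary;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(1234, 75000, "Harry Hacker");
        System.out.println(data.length);  // 26 = 4 + 8 + (2 + 12)
        System.out.println(decode(data)); // 1234 Harry Hacker 75000.0
    }
}
```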
java.io.DataInput 1.0
java.io.DataOutput 1.0
Random-Access File Streams
The RandomAccessFile stream class lets you read or write data anywhere in a file. It implements both the DataInput and DataOutput interfaces. Disk files are random access, but streams of data from a network are not. You open a random-access file either for reading only or for both reading and writing. You specify the option by using the string "r" (for read access) or "rw" (for read/write access) as the second argument in the constructor.
   RandomAccessFile in = new RandomAccessFile("employee.dat", "r");
   RandomAccessFile inOut = new RandomAccessFile("employee.dat", "rw");
When you open an existing file as a RandomAccessFile, it does not get deleted.
A random-access file also has a file pointer. The file pointer always indicates the position of the next byte that will be read or written. The seek method sets the file pointer to an arbitrary byte position within the file. The argument to seek is a long integer between zero and the length of the file in bytes. The getFilePointer method returns the current position of the file pointer.
To read from a random-access file, you use the same methods such as readInt and readChar as for DataInputStream objects. That is no accident. These methods are actually defined in the DataInput interface that both DataInputStream and RandomAccessFile implement. Similarly, to write a random-access file, you use the same writeInt and writeChar methods as in the DataOutputStream class. These methods are defined in the DataOutput interface that is common to both classes.
The advantage of having the RandomAccessFile class implement both DataInput and DataOutput is that this lets you use or write methods whose argument types are the DataInput and DataOutput interfaces.
   class Employee
   {
      . . .
      void read(DataInput in) throws IOException { . . . }
      void write(DataOutput out) throws IOException { . . . }
   }
Note that the read method can handle either a DataInputStream or a RandomAccessFile object because both of these classes implement the DataInput interface. The same is true for the write method.
java.io.RandomAccessFile 1.0
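Random access pays off when every record has the same size, because record n then starts at byte n times the record size. The sketch below assumes a hypothetical 12-byte record (an int id followed by a double salary). It uses seek to update one salary in place. Because writeRecord takes a DataOutput, it would accept a DataOutputStream just as well.

```java
import java.io.DataOutput;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class RandomAccessDemo {
    // Hypothetical fixed-size record: int id (4 bytes) followed by double salary (8 bytes)
    static final int RECORD_SIZE = 4 + 8;

    // Takes the DataOutput interface, so a DataOutputStream would work here too
    static void writeRecord(DataOutput out, int id, double salary) throws IOException {
        out.writeInt(id);
        out.writeDouble(salary);
    }

    static double readSalary(RandomAccessFile raf, int index) throws IOException {
        raf.seek((long) index * RECORD_SIZE + 4); // jump past the id of record 'index'
        return raf.readDouble();
    }

    static void setSalary(RandomAccessFile raf, int index, double salary) throws IOException {
        raf.seek((long) index * RECORD_SIZE + 4);
        raf.writeDouble(salary);                  // overwrites in place
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("employee", ".dat");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            writeRecord(raf, 1, 50000);
            writeRecord(raf, 2, 60000);
            writeRecord(raf, 3, 70000);
            setSalary(raf, 1, 65000);
            System.out.println(raf.length());       // 36
            System.out.println(readSalary(raf, 1)); // 65000.0
        }
    }
}
```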
Text Streams
In the last section, we discussed binary input and output. While binary I/O is fast and efficient, it is not easily readable by humans. In this section, we will focus on text I/O. For example, if the integer 1234 is saved in binary, it is written as the sequence of bytes 00 00 04 D2 (in hexadecimal notation). In text format, it is saved as the string "1234".
Unfortunately, doing this in Java requires a bit of work, because, as you know, Java uses Unicode characters. That is, the character encoding for the string "1234" really is 00 31 00 32 00 33 00 34 (in hex). However, at the present time most environments in which your Java programs will run use their own character encoding. This may be a single-byte, double-byte, or variable-byte scheme. For example, if you use Windows, the string would be written in ASCII, as 31 32 33 34, without the extra zero bytes. If the Unicode encoding were written into a text file, then it would be quite unlikely that the resulting file would be human-readable with the tools of the host environment.
To overcome this problem, Java has a set of stream filters that bridges the gap between Unicode-encoded strings and the character encoding used by the local operating system. All of these classes descend from the abstract Reader and Writer classes, and the names are reminiscent of the ones used for binary data. For example, the InputStreamReader class turns an input stream that contains bytes in a particular character encoding into a reader that emits Unicode characters. Similarly, the OutputStreamWriter class turns a stream of Unicode characters into a stream of bytes in a particular character encoding.
For example, here is how you make an input reader that reads keystrokes from the console and automatically converts them to Unicode.
   InputStreamReader in = new InputStreamReader(System.in);
This input stream reader assumes the normal character encoding used by the host system.
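The three byte sequences quoted for 1234 at the start of this section can be reproduced directly. The sketch below shows them: binary via DataOutputStream, UTF-16 via the big-endian UTF-16BE charset (plain "UTF-16" would prepend a byte-order mark), and ASCII. The hex helper is an illustrative name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class TextVsBinary {
    // Formats bytes as space-separated hex pairs, e.g. "00 00 04 D2"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    static byte[] binary(int n) throws IOException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        DataOutputStream dout = new DataOutputStream(bout);
        dout.writeInt(n); // big-endian, always 4 bytes
        dout.flush();
        return bout.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hex(binary(1234)));                               // 00 00 04 D2
        System.out.println(hex("1234".getBytes(StandardCharsets.UTF_16BE))); // 00 31 00 32 00 33 00 34
        System.out.println(hex("1234".getBytes(StandardCharsets.US_ASCII))); // 31 32 33 34
    }
}
```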
The host system's default encoding varies. On U.S. and Western European versions of Windows, for example, it is Windows-1252, which agrees with ISO 8859-1 (also known as ISO Latin-1) except in the range 0x80 to 0x9F and which Windows programmers call the "ANSI code page." You can choose a different encoding by specifying it in the constructor for the InputStreamReader. This takes the form
   InputStreamReader(InputStream, String)
where the string describes the encoding scheme that you want to use. For example,
   InputStreamReader in = new InputStreamReader(
      new FileInputStream("kremlin.dat"), "ISO8859_5");
The next section has more information on character sets.
Because it is so common to want to attach a reader or writer to a file, a pair of convenience classes, FileReader and FileWriter, is provided for this purpose. For example, the writer definition
   FileWriter out = new FileWriter("output.txt");
is equivalent to
   Writer out = new OutputStreamWriter(new FileOutputStream("output.txt"));
Character Sets
In the past, international character sets have been handled rather unsystematically throughout the Java library. The java.nio package introduced in JDK 1.4 unifies character set conversion with the introduction of the Charset class. (Note that the s is lower case.)
A character set maps between sequences of two-byte Unicode code units and byte sequences used in a local character encoding. One of the most popular character encodings is ISO-8859-1, a single-byte encoding of the first 256 Unicode characters. Gaining in importance is ISO-8859-15, which replaces some of the less useful characters of ISO-8859-1 with accented letters used in French and Finnish, and, more important, replaces the "international currency" character ¤ with the Euro symbol (€) in code point 0xA4. Other examples of character encodings are the variable-byte encodings commonly used for Japanese and Chinese.
The Charset class uses the character set names standardized in the IANA Character Set Registry (http://www.iana.org/assignments/character-sets). These names differ slightly from those used in previous versions.
For example, the "official" name of ISO-8859-1 is now "ISO-8859-1" and no longer "ISO8859_1", which was the preferred name up to JDK 1.3. For compatibility with other naming conventions, each character set can have a number of aliases. For example, ISO-8859-1 has the aliases
   ISO8859-1, ISO_8859_1, ISO8859_1, ISO_8859-1, ISO_8859-1:1987, 8859_1, latin1, l1, csISOLatin1, iso-ir-100, cp819, IBM819, IBM-819, 819
Character set names are case insensitive.
You obtain a Charset by calling the static forName method with either the official name or one of its aliases:
   Charset cset = Charset.forName("ISO-8859-1");
The aliases method returns a Set object of the aliases. A Set is a collection that we discuss in Volume 2; here is the code to iterate through the set elements:
   Set<String> aliases = cset.aliases();
   for (String alias : aliases)
      System.out.println(alias);
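The sketch below ties the naming rules together with the InputStreamReader constructor from the previous section. Lookups ignore letter case and accept aliases such as the pre-1.4 names. The sample Cyrillic word, the class name, and the decode helper are illustrative. The sketch uses the InputStreamReader constructor that takes a Charset rather than an encoding name string.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class CharsetNamesDemo {
    // Decodes bytes through an InputStreamReader that uses an explicit Charset
    static String decode(byte[] bytes, Charset cset) throws IOException {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(bytes), cset)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = in.read()) != -1) sb.append((char) c);
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset latin1 = Charset.forName("ISO-8859-1");
        for (String alias : latin1.aliases()) System.out.println(alias);
        // lookups are case insensitive and accept aliases such as the pre-1.4 name
        System.out.println(Charset.forName("LATIN1").equals(latin1));    // true
        System.out.println(Charset.forName("ISO8859_1").equals(latin1)); // true

        Charset cyrillic = Charset.forName("ISO8859_5"); // old-style name
        System.out.println(cyrillic.name());             // ISO-8859-5
        String kremlin = "\u041a\u0440\u0435\u043c\u043b\u044c";
        byte[] bytes = kremlin.getBytes(cyrillic);
        System.out.println(bytes.length);                            // 6: one byte per letter
        System.out.println(decode(bytes, cyrillic).equals(kremlin)); // true
    }
}
```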
International versions of Java support many more encodings. There is even a mechanism for adding additional character set providers; see the JDK documentation for details.
To find out which character sets are available in a particular implementation, call the static availableCharsets method. It returns a SortedMap, another collection class. Use this code to find out the names of all available character sets:
   SortedMap<String, Charset> charsets = Charset.availableCharsets();
   for (String name : charsets.keySet())
      System.out.println(name);
Table 12-3 lists the character encodings that every Java implementation is required to have. Table 12-4 lists the encoding schemes that the JDK installs by default. The character sets in Tables 12-5 and 12-6 are installed only on operating systems that use non-European languages. The encoding schemes in Table 12-6 are supplied for compatibility with previous versions of the JDK.
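Every Java implementation is required to support the standard character sets US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, and UTF-16. The sketch below lists what the running JVM offers and checks that those six are present. The helper missingRequired is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;

class AvailableCharsetsDemo {
    // Standard charsets that every Java implementation must support
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // Returns the required names that are absent from the given map
    static List<String> missingRequired(SortedMap<String, Charset> charsets) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!charsets.containsKey(name)) missing.add(name);
        }
        return missing;
    }

    public static void main(String[] args) {
        SortedMap<String, Charset> charsets = Charset.availableCharsets();
        for (String name : charsets.keySet()) System.out.println(name);
        System.out.println("missing: " + missingRequired(charsets)); // missing: []
    }
}
```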
Local encoding schemes cannot represent all Unicode characters. If a character cannot be represented, it is transformed to a ?.
Once you have a character set, you can use it to convert between Unicode strings and encoded byte sequences. Here is how you encode a Unicode string:
   String str = . . .;
   ByteBuffer buffer = cset.encode(str);
   byte[] bytes = new byte[buffer.remaining()];
   buffer.get(bytes);
(Don't simply call buffer.array(): the backing array can be longer than the encoded data.)
Conversely, to decode a byte sequence, you need a byte buffer. Use the static wrap method of the ByteBuffer class to turn a byte array into a byte buffer. The result of the decode method is a CharBuffer. Call its toString method to get a string.
   byte[] bytes = . . .;
   ByteBuffer bbuf = ByteBuffer.wrap(bytes, offset, length);
   CharBuffer cbuf = cset.decode(bbuf);
   String str = cbuf.toString();
java.nio.charset.Charset 1.4
java.nio.ByteBuffer 1.4
java.nio.CharBuffer
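The encode and decode steps above fit into two small helpers. The sketch uses ISO-8859-1 and a sample string containing an e-acute (representable) and a euro sign (not representable), so it also shows the replacement by ? that Charset.encode performs. The helper names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

class EncodeDecodeDemo {
    static byte[] encode(Charset cset, String str) {
        ByteBuffer buffer = cset.encode(str);
        // copy exactly the encoded bytes; buffer.array() may have unused capacity
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    static String decode(Charset cset, byte[] bytes, int offset, int length) {
        ByteBuffer bbuf = ByteBuffer.wrap(bytes, offset, length);
        CharBuffer cbuf = cset.decode(bbuf);
        return cbuf.toString();
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        String str = "caf\u00e9 \u20ac"; // e-acute exists in ISO-8859-1, the euro sign does not
        byte[] bytes = encode(latin1, str);
        System.out.println(bytes.length); // 6
        String back = decode(latin1, bytes, 0, bytes.length);
        System.out.println(back.equals("caf\u00e9 ?")); // true: the euro sign became '?'
    }
}
```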
How to Write Text Output
For text output, you want to use a PrintWriter. A print writer can print strings and numbers in text format. Just as a DataOutputStream has useful output methods but no destination, a PrintWriter must be combined with a destination writer.
   PrintWriter out = new PrintWriter(new FileWriter("employee.txt"));
You can also combine a print writer with a destination (output) stream.
   PrintWriter out = new PrintWriter(new FileOutputStream("employee.txt"));
The PrintWriter(OutputStream) constructor automatically adds an OutputStreamWriter to convert Unicode characters to bytes in the stream.
To write to a print writer, you use the same print and println methods that you used with System.out. You can use these methods to print numbers (int, short, long, float, double), characters, Boolean values, strings, and objects.
For example, consider this code:
   String name = "Harry Hacker";
   double salary = 75000;
   out.print(name);
   out.print(' ');
   out.println(salary);
This writes the characters
   Harry Hacker 75000.0
to the stream out. The characters are then converted to bytes and end up in the file employee.txt.
The println method automatically adds the correct end-of-line character for the target system ("\r\n" on Windows, "\n" on UNIX, "\r" on classic Mac OS) to the line. This is the string obtained by the call System.getProperty("line.separator").
If the writer is set to autoflush mode, then all characters in the buffer are sent to their destination whenever println is called. (Print writers are always buffered.) By default, autoflushing is not enabled. You can enable or disable autoflushing by using the PrintWriter(Writer, boolean) constructor and passing the appropriate Boolean as the second argument.
   PrintWriter out = new PrintWriter(new FileWriter("employee.txt"), true); // autoflush
The print methods don't throw exceptions. You can call the checkError method to see if something went wrong with the stream.
java.io.PrintWriter 1.1
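The print/println example and the checkError method can be combined in a short sketch. It assumes a StringWriter as a stand-in destination instead of FileWriter("employee.txt"), so the output can be inspected directly. The helper name format is illustrative.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class PrintWriterDemo {
    static String format(String name, double salary) {
        StringWriter sink = new StringWriter(); // stand-in for a file
        PrintWriter out = new PrintWriter(sink, true); // autoflush on println
        out.print(name);
        out.print(' ');
        out.println(salary);
        // print methods never throw; checkError flushes and reports any failure
        if (out.checkError()) throw new IllegalStateException("write failed");
        return sink.toString();
    }

    public static void main(String[] args) {
        String line = format("Harry Hacker", 75000);
        System.out.println(line.equals("Harry Hacker 75000.0" + System.lineSeparator())); // true
    }
}
```

The double prints as 75000.0, and println appends the platform's line separator.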
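How to Read Text Input
As you know, to write data in binary format, you use a DataOutputStream, and to write in text format, you use a PrintWriter.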
Therefore, you might expect that there is an analog to the DataInputStream that lets you read data in text format. The closest analog is the Scanner class that we have used extensively. However, before JDK 5.0, the only game in town for processing text input was the BufferedReader class. It has a method, readLine, that lets you read a line of text. You need to combine a buffered reader with an input source.
   BufferedReader in = new BufferedReader(new FileReader("employee.txt"));
The readLine method returns null when no more input is available. A typical input loop, therefore, looks like this:
   String line;
   while ((line = in.readLine()) != null)
   {
      do something with line
   }
The FileReader class already converts bytes to Unicode characters. For other input sources, you need to supply an InputStreamReader yourself. Unlike the PrintWriter, which has a convenience constructor that accepts an OutputStream, the BufferedReader has no constructor that bridges the gap between bytes and Unicode characters.
   BufferedReader in2 = new BufferedReader(new InputStreamReader(System.in));
   BufferedReader in3 = new BufferedReader(new InputStreamReader(url.openStream()));
To read numbers from text input, you need to read a string first and then convert it.
   String s = in.readLine();
   double x = Double.parseDouble(s);
That works if there is a single number on each line. Otherwise, you must work harder and break up the input string, for example, by using the StringTokenizer utility class. We see an example of this later in this chapter.
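The readLine loop and the StringTokenizer approach can be combined as shown in the sketch below. It sums every whitespace-separated number in the input. A StringReader stands in for a file or console source, and the helper name sum is illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.StringTokenizer;

class ReadNumbersDemo {
    // Sums all whitespace-separated numbers, reading the input line by line
    static double sum(BufferedReader in) throws IOException {
        double total = 0;
        String line;
        while ((line = in.readLine()) != null) { // null signals end of input
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                total += Double.parseDouble(tokens.nextToken());
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("1.5 2.5\n4\n"));
        System.out.println(sum(in)); // 8.0
    }
}
```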