4.4 Readers and Writers

     

Many programmers have a bad habit of writing code as if all text were ASCII or at least in the native encoding of the platform. While some older, simpler network protocols, such as daytime, quote of the day, and chargen, do specify ASCII encoding for text, this is not true of HTTP and many other more modern protocols, which allow a wide variety of localized encodings, such as KOI8-R Cyrillic, Big-5 Chinese, and ISO 8859-2 for most Central European languages. Java's native character set is the UTF-16 encoding of Unicode. When the encoding is no longer ASCII, the assumption that bytes and chars are essentially the same things also breaks down. Consequently, Java provides an almost complete mirror of the input and output stream class hierarchy designed for working with characters instead of bytes.

In this mirror image hierarchy, two abstract superclasses define the basic API for reading and writing characters. The java.io.Reader class specifies the API by which characters are read. The java.io.Writer class specifies the API by which characters are written. Wherever input and output streams use bytes, readers and writers use Unicode characters. Concrete subclasses of Reader and Writer allow particular sources to be read and targets to be written. Filter readers and writers can be attached to other readers and writers to provide additional services or interfaces.

The most important concrete subclasses of Reader and Writer are the InputStreamReader and the OutputStreamWriter classes. An InputStreamReader contains an underlying input stream from which it reads raw bytes. It translates these bytes into Unicode characters according to a specified encoding. An OutputStreamWriter receives Unicode characters from a running program. It then translates those characters into bytes using a specified encoding and writes the bytes onto an underlying output stream.

In addition to these two classes, the java.io package provides several raw reader and writer classes that read characters without directly requiring an underlying input stream, including:

  • FileReader

  • FileWriter

  • StringReader

  • StringWriter

  • CharArrayReader

  • CharArrayWriter

The first two classes in this list work with files and the last four work inside Java, so they aren't of great use for network programming. However, aside from different constructors, these classes have pretty much the same public interface as all other reader and writer classes.

4.4.1 Writers

The Writer class mirrors the java.io.OutputStream class. It's abstract and has two protected constructors. Like OutputStream, the Writer class is never used directly; instead, it is used polymorphically, through one of its subclasses. It has five write( ) methods as well as a flush( ) and a close( ) method:

 protected Writer( )
 protected Writer(Object lock)
 public abstract void write(char[] text, int offset, int length) throws IOException
 public void write(int c) throws IOException
 public void write(char[] text) throws IOException
 public void write(String s) throws IOException
 public void write(String s, int offset, int length) throws IOException
 public abstract void flush( ) throws IOException
 public abstract void close( ) throws IOException

The write(char[] text, int offset, int length) method is the base method in terms of which the other four write( ) methods are implemented. A subclass must override at least this method as well as flush( ) and close( ), although most override some of the other write( ) methods as well in order to provide more efficient implementations. For example, given a Writer object w, you can write the string "Network" like this:

 char[] network = {'N', 'e', 't', 'w', 'o', 'r', 'k'};
 w.write(network, 0, network.length);

The same task can be accomplished with these other methods, as well:

 w.write(network);
 for (int i = 0; i < network.length; i++) w.write(network[i]);
 w.write("Network");
 w.write("Network", 0, 7);

All of these examples are different ways of expressing the same thing. Which you use in any given situation is mostly a matter of convenience and taste. However, how many and which bytes are written by these lines depends on the encoding w uses. If it's using big-endian UTF-16, it will write these 14 bytes (shown here in hexadecimal) in this order:

 00 4E 00 65 00 74 00 77 00 6F 00 72 00 6B 

On the other hand, if w uses little-endian UTF-16, this sequence of 14 bytes is written:

 4E 00 65 00 74 00 77 00 6F 00 72 00 6B 00 

If w uses Latin-1, UTF-8, or MacRoman, this sequence of seven bytes is written:

 4E 65 74 77 6F 72 6B 

Other encodings may write still different sequences of bytes. The exact output depends on the encoding.
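The byte sequences above can be reproduced with a short sketch that wraps a ByteArrayOutputStream in an OutputStreamWriter for each encoding. The class name EncodingBytes and the helper hexBytes( ) are hypothetical names chosen for illustration; the encoding names are standard Java charset names (ISO-8859-1 is Latin-1).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class EncodingBytes {

    // Hypothetical helper: encodes text with the named encoding and
    // returns the resulting bytes as space-separated hexadecimal.
    public static String hexBytes(String text, String encoding) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(out, encoding);
        w.write(text);
        w.close(); // flushes the encoder, then closes the underlying stream

        StringBuilder hex = new StringBuilder();
        for (byte b : out.toByteArray()) {
            if (hex.length() > 0) hex.append(' ');
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        String[] encodings = {"UTF-16BE", "UTF-16LE", "ISO-8859-1", "UTF-8"};
        for (String encoding : encodings) {
            System.out.println(encoding + ": " + hexBytes("Network", encoding));
        }
    }
}
```

The sketch names the byte order explicitly because Java's plain "UTF-16" encoder writes a byte order mark before the text, which would add two bytes to the output.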

Writers may be buffered, either directly by being chained to a BufferedWriter or indirectly because their underlying output stream is buffered. To force a write to be committed to the output medium, invoke the flush() method:

 w.flush( ); 

The close( ) method behaves similarly to the close( ) method of OutputStream. It flushes the writer, then closes the underlying output stream and releases any resources associated with it:

 public abstract void close( ) throws IOException 

After a writer has been closed, further writes throw IOExceptions.
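A minimal sketch of both behaviors, assuming an in-memory ByteArrayOutputStream stands in for a network stream. The class and method names (FlushAndClose, bytesBeforeAndAfterFlush, writeAfterCloseFails) are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class FlushAndClose {

    // Returns how many bytes had reached the stream before and after flush().
    public static int[] bytesBeforeAndAfterFlush() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer w = new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
        w.write("Network");
        int before = out.size(); // text is still sitting in the writer's buffer
        w.flush();
        int after = out.size();  // text has now been committed to the stream
        w.close();
        return new int[] {before, after};
    }

    // Returns true if writing to an already closed writer throws an IOException.
    public static boolean writeAfterCloseFails() {
        try {
            Writer w = new OutputStreamWriter(new ByteArrayOutputStream(), "UTF-8");
            w.close();
            w.write("too late");
            return false;
        } catch (IOException ex) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = bytesBeforeAndAfterFlush();
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
        System.out.println("write after close fails: " + writeAfterCloseFails());
    }
}
```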

4.4.2 OutputStreamWriter

OutputStreamWriter is the most important concrete subclass of Writer. An OutputStreamWriter receives characters from a Java program. It converts these into bytes according to a specified encoding and writes them onto an underlying output stream. Its constructor specifies the output stream to write to and the encoding to use:

 public OutputStreamWriter(OutputStream out, String encoding) throws UnsupportedEncodingException
 public OutputStreamWriter(OutputStream out)

Valid encodings are listed in the documentation for Sun's native2ascii tool included with the JDK and available from http://java.sun.com/j2se/1.4.2/docs/guide/intl/encoding.doc.html. If no encoding is specified, the default encoding for the platform is used. (In the United States, the default encoding is ISO Latin-1 on Solaris and Windows, MacRoman on the Mac.) For example, this code fragment writes a Greek string (printed as an image in the original, figs/jnp2_greek10.gif, and not reproduced here) in the Cp1253 Windows Greek encoding:

 OutputStreamWriter w = new OutputStreamWriter(
   new FileOutputStream("OdysseyB.txt"), "Cp1253");
 w.write("[Greek text: figs/jnp2_greek10.gif]");

Other than the constructors, OutputStreamWriter has only the usual Writer methods (which are used exactly as they are for any Writer class) and one method to return the encoding of the object:

 public String getEncoding( ) 
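As a hedged sketch of how the constructor and getEncoding( ) interact, the following class asks a writer which encoding it is using and checks whether an encoding name is supported. The names EncodingNames, reportedEncoding, and isSupported are hypothetical. The name that getEncoding( ) returns may be a historical alias rather than the exact string passed to the constructor, so the sketch does not assume the two are equal.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;

public class EncodingNames {

    // Returns the encoding name the writer reports for the requested encoding.
    public static String reportedEncoding(String requested) throws IOException {
        OutputStreamWriter w =
            new OutputStreamWriter(new ByteArrayOutputStream(), requested);
        try {
            return w.getEncoding();
        } finally {
            w.close();
        }
    }

    // Returns false if the constructor rejects the encoding name.
    public static boolean isSupported(String encoding) throws IOException {
        try {
            new OutputStreamWriter(new ByteArrayOutputStream(), encoding).close();
            return true;
        } catch (UnsupportedEncodingException ex) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(reportedEncoding("UTF-8"));
        System.out.println(isSupported("ISO-8859-1"));
        System.out.println(isSupported("no-such-encoding"));
    }
}
```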

4.4.3 Readers

The Reader class mirrors the java.io.InputStream class. It's abstract with two protected constructors. Like InputStream and Writer, the Reader class is never used directly, only through one of its subclasses. It has three read( ) methods, as well as skip( ), close( ), ready( ), mark( ), reset( ), and markSupported( ) methods:

 protected Reader( )
 protected Reader(Object lock)
 public abstract int read(char[] text, int offset, int length) throws IOException
 public int read( ) throws IOException
 public int read(char[] text) throws IOException
 public long skip(long n) throws IOException
 public boolean ready( ) throws IOException
 public boolean markSupported( )
 public void mark(int readAheadLimit) throws IOException
 public void reset( ) throws IOException
 public abstract void close( ) throws IOException

The read(char[] text, int offset, int length) method is the fundamental method through which the other two read( ) methods are implemented. A subclass must override at least this method as well as close( ), although most will override some of the other read( ) methods as well in order to provide more efficient implementations.

Most of these methods are easily understood by analogy with their InputStream counterparts. The read( ) method returns a single Unicode character as an int with a value from 0 to 65,535 or -1 on end of stream. The read(char[] text) method tries to fill the array text with characters and returns the actual number of characters read or -1 on end of stream. The read(char[] text, int offset, int length) method attempts to read length characters into the subarray of text beginning at offset and continuing for length characters. It also returns the actual number of characters read or -1 on end of stream. The skip(long n) method skips n characters. The mark( ) and reset( ) methods allow some readers to reset back to a marked position in the character sequence. The markSupported( ) method tells you whether the reader supports marking and resetting. The close( ) method closes the reader and any underlying input stream so that further attempts to read from it throw IOExceptions.

The exception to the rule of similarity is ready( ), which has the same general purpose as available( ) but not quite the same semantics, even modulo the byte-to-char conversion. Whereas available( ) returns an int specifying a minimum number of bytes that may be read without blocking, ready( ) only returns a boolean indicating whether the reader may be read without blocking. The problem is that some character encodings, such as UTF-8, use different numbers of bytes for different characters. Thus, it's hard to tell how many characters are waiting in the network or filesystem buffer without actually reading them out of the buffer.
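The usual way to drain a reader is a loop around the three-argument read( ) method that stops when it returns -1. The sketch below assumes a StringReader as the source; the class name ReadLoop, the method readAll( ), and the deliberately tiny buffer size are choices made only for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadLoop {

    // Reads every remaining character from the reader into a string.
    public static String readAll(Reader r) throws IOException {
        char[] buffer = new char[4]; // tiny on purpose, to force several reads
        StringBuilder sb = new StringBuilder();
        int n;
        // read() reports how many chars it actually stored, or -1 at end of stream
        while ((n = r.read(buffer, 0, buffer.length)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader r = new StringReader("Readers and Writers");
        System.out.println(r.ready());  // says only whether a read may block, not how much is waiting
        System.out.println(readAll(r));
        r.close();
    }
}
```

Only the first n characters of the buffer are appended on each pass, because a read may return fewer characters than requested.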

InputStreamReader is the most important concrete subclass of Reader. An InputStreamReader reads bytes from an underlying input stream such as a FileInputStream or TelnetInputStream. It converts these into characters according to a specified encoding and returns them. The constructor specifies the input stream to read from and the encoding to use:

 public InputStreamReader(InputStream in)
 public InputStreamReader(InputStream in, String encoding) throws UnsupportedEncodingException

If no encoding is specified, the default encoding for the platform is used. If an unknown encoding is specified, then an UnsupportedEncodingException is thrown.

For example, this method reads an input stream and converts it all to one Unicode string using the MacCyrillic encoding:

 public static String getMacCyrillicString(InputStream in) throws IOException {

   InputStreamReader r = new InputStreamReader(in, "MacCyrillic");
   StringBuffer sb = new StringBuffer( );
   int c;
   while ((c = r.read( )) != -1) sb.append((char) c);
   r.close( );
   return sb.toString( );

 }
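The same pattern works for any encoding the runtime supports. The following sketch uses UTF-8 rather than MacCyrillic, on the assumption that UTF-8 is available on every Java platform, and feeds the reader from a byte array instead of a network stream. The class name DecodeBytes and the method decode( ) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class DecodeBytes {

    // Decodes the bytes with the named encoding, one character at a time.
    public static String decode(byte[] data, String encoding) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        Reader r = new InputStreamReader(in, encoding);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) sb.append((char) c);
        r.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 encoding of the two Cyrillic letters U+041F and U+0440
        byte[] data = {(byte) 0xD0, (byte) 0x9F, (byte) 0xD1, (byte) 0x80};
        System.out.println(decode(data, "UTF-8").length());      // four bytes become two chars
        System.out.println(decode(data, "ISO-8859-1").length()); // Latin-1 maps each byte to one char
    }
}
```

Decoding the same four bytes with two different encodings gives strings of different lengths, which is why the encoding has to be known before bytes are turned into chars.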

4.4.4 Filter Readers and Writers

The InputStreamReader and OutputStreamWriter classes act as decorators on top of input and output streams that change the interface from a byte-oriented interface to a character-oriented interface. Once this is done, additional character-oriented filters can be layered on top of the reader or writer using the java.io.FilterReader and java.io.FilterWriter classes. As with filter streams, there are a variety of subclasses that perform specific filtering, including:

  • BufferedReader

  • BufferedWriter

  • LineNumberReader

  • PushbackReader

  • PrintWriter

4.4.4.1 Buffered readers and writers

The BufferedReader and BufferedWriter classes are the character-based equivalents of the byte-oriented BufferedInputStream and BufferedOutputStream classes. Where BufferedInputStream and BufferedOutputStream use an internal array of bytes as a buffer, BufferedReader and BufferedWriter use an internal array of chars.

When a program reads from a BufferedReader , text is taken from the buffer rather than directly from the underlying input stream or other text source. When the buffer empties, it is filled again with as much text as possible, even if not all of it is immediately needed, making future reads much faster. When a program writes to a BufferedWriter , the text is placed in the buffer. The text is moved to the underlying output stream or other target only when the buffer fills up or when the writer is explicitly flushed, which can make writes much faster than would otherwise be the case.

BufferedReader and BufferedWriter have the usual methods associated with readers and writers, like read( ), ready( ), write( ), and close( ). They each have two constructors that chain the BufferedReader or BufferedWriter to an underlying reader or writer and set the size of the buffer. If the size is not set, the default size of 8,192 characters is used:

 public BufferedReader(Reader in, int bufferSize)
 public BufferedReader(Reader in)
 public BufferedWriter(Writer out)
 public BufferedWriter(Writer out, int bufferSize)

For example, the earlier getMacCyrillicString( ) example was less than efficient because it read characters one at a time. Since MacCyrillic is a 1-byte character set, it also read bytes one at a time. However, it's straightforward to make it run faster by chaining a BufferedReader to the InputStreamReader , like this:

 public static String getMacCyrillicString(InputStream in) throws IOException {

   Reader r = new InputStreamReader(in, "MacCyrillic");
   r = new BufferedReader(r, 1024);
   StringBuffer sb = new StringBuffer( );
   int c;
   while ((c = r.read( )) != -1) sb.append((char) c);
   r.close( );
   return sb.toString( );

 }

All that was needed to buffer this method was one additional line of code. None of the rest of the algorithm had to change, since the only InputStreamReader methods used were the read( ) and close( ) methods declared in the Reader superclass and shared by all Reader subclasses, including BufferedReader .

The BufferedReader class also has a readLine( ) method that reads a single line of text and returns it as a string:

 public String readLine( ) throws IOException 

This method is supposed to replace the deprecated readLine( ) method in DataInputStream, and it has mostly the same behavior as that method. The big difference is that by chaining a BufferedReader to an InputStreamReader, you can correctly read lines in character sets other than the default encoding for the platform. Unfortunately, this method shares the same bugs as the readLine( ) method in DataInputStream, discussed earlier in this chapter. That is, readLine( ) tends to hang its thread when reading streams where lines end in carriage returns, as is commonly the case when the stream comes from a Macintosh or from a Macintosh text file. Consequently, you should scrupulously avoid this method in network programs.

It's not all that difficult, however, to write a safe version of this class that correctly implements the readLine( ) method. Example 4-1 is such a SafeBufferedReader class. It has exactly the same public interface as BufferedReader; it just has a slightly different private implementation. I'll use this class in future chapters in situations where it's extremely convenient to have a readLine( ) method.

Example 4-1. The SafeBufferedReader class
 package com.macfaq.io;

 import java.io.*;

 public class SafeBufferedReader extends BufferedReader {

   public SafeBufferedReader(Reader in) {
     this(in, 1024);
   }

   public SafeBufferedReader(Reader in, int bufferSize) {
     super(in, bufferSize);
   }

   // true if the previous line ended in a carriage return, so that a
   // linefeed arriving next belongs to that terminator, not to a new line
   private boolean lookingForLineFeed = false;

   public String readLine( ) throws IOException {
     StringBuffer sb = new StringBuffer("");
     while (true) {
       int c = this.read( );
       if (c == -1) { // end of stream
         // StringBuffer does not override equals( ), so test the length
         if (sb.length( ) == 0) return null;
         return sb.toString( );
       }
       else if (c == '\n') {
         if (lookingForLineFeed) {
           lookingForLineFeed = false;
           continue;
         }
         else {
           return sb.toString( );
         }
       }
       else if (c == '\r') {
         lookingForLineFeed = true;
         return sb.toString( );
       }
       else {
         lookingForLineFeed = false;
         sb.append((char) c);
       }
     }
   }

 }

The BufferedWriter class adds one new method not included in its superclass, called newLine( ), also geared toward writing lines:

 public void newLine( ) throws IOException 

This method inserts a platform-dependent line-separator string into the output. The line.separator system property determines exactly what the string is: probably a linefeed on Unix and Mac OS X, a carriage return on Mac OS 9, and a carriage return/linefeed pair on Windows. Since network protocols generally specify the required line-terminator, you should not use this method for network programming. Instead, explicitly write the line-terminator the protocol requires.
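A minimal sketch of writing the terminator explicitly, assuming a protocol such as HTTP whose lines end with a carriage return/linefeed pair. The class name ExplicitLineEndings, the helper writeLine( ), and the sample request line are illustrative assumptions, and a ByteArrayOutputStream stands in for a socket's output stream.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class ExplicitLineEndings {

    // Writes one line followed by CR LF, whatever line.separator says.
    public static void writeLine(Writer w, String line) throws IOException {
        w.write(line);
        w.write("\r\n");
    }

    // Builds a sample request in memory and returns the bytes that were sent.
    public static byte[] sampleRequest() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Writer w = new BufferedWriter(new OutputStreamWriter(out, "US-ASCII"));
        writeLine(w, "GET / HTTP/1.0");
        writeLine(w, ""); // a blank line ends the request header
        w.flush();        // push the buffered text to the stream
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sampleRequest().length + " bytes written");
    }
}
```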

4.4.4.2 LineNumberReader

LineNumberReader is a subclass of BufferedReader that keeps track of the current line number. This can be retrieved at any time with the getLineNumber( ) method:

 public int getLineNumber( ) 

By default, the first line number is 0. However, the number of the current line and all subsequent lines can be changed with the setLineNumber( ) method:

 public void setLineNumber(int lineNumber) 

This method adjusts only the line numbers that getLineNumber( ) reports. It does not change the point at which the stream is read.

The readLine( ) method of LineNumberReader shares the same bug as the readLine( ) methods of BufferedReader and DataInputStream, and is not suitable for network programming. However, the line numbers are also tracked if you use only the regular read( ) methods, and these do not share that bug. Besides these methods and the usual Reader methods, LineNumberReader has only these two constructors:

 public LineNumberReader(Reader in)
 public LineNumberReader(Reader in, int bufferSize)

Since LineNumberReader is a subclass of BufferedReader , it has an internal character buffer whose size can be set with the second constructor. The default size is 8,192 characters.
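The line counting described in this section can be sketched with the plain read( ) method alone. The class name LineCounting and both method names are hypothetical, and the input is an in-memory string whose last line ends with a terminator.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;

public class LineCounting {

    // Drains the text with read() and returns the final line number.
    public static int countLines(String text) throws IOException {
        return countLinesFrom(text, 0);
    }

    // Same as countLines(), but starts numbering at the given value.
    public static int countLinesFrom(String text, int firstNumber) throws IOException {
        LineNumberReader r = new LineNumberReader(new StringReader(text));
        r.setLineNumber(firstNumber); // changes only what getLineNumber() reports
        while (r.read() != -1) {
            // nothing to do; the reader counts line terminators as it goes
        }
        int lines = r.getLineNumber();
        r.close();
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // a linefeed, a carriage return/linefeed pair, and another linefeed
        System.out.println(countLines("a\nb\r\nc\n"));
        System.out.println(countLinesFrom("x\ny\n", 100));
    }
}
```

A carriage return/linefeed pair counts as a single line terminator, so the first sample advances the line number three times, not four.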

4.4.4.3 PushbackReader

The PushbackReader class is the mirror image of the PushbackInputStream class. As usual, the main difference is that it pushes back chars rather than bytes. It provides three unread( ) methods that push characters onto the reader's input buffer:

 public void unread(int c) throws IOException
 public void unread(char[] text) throws IOException
 public void unread(char[] text, int offset, int length) throws IOException

The first unread( ) method pushes a single character onto the reader. The second pushes an array of characters. The third pushes the specified subarray of characters, starting with text[offset] and continuing through text[offset+length-1].

By default, the size of the pushback buffer is only one character. However, the size can be adjusted in the second constructor:

 public PushbackReader(Reader in)
 public PushbackReader(Reader in, int bufferSize)

Trying to unread more characters than the buffer will hold throws an IOException.
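A common use of unread( ) is to peek at the next character without consuming it. The sketch below shows that idiom along with the overflow case; the class name PeekDemo and its methods are hypothetical, and the reader wraps an in-memory string.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

public class PeekDemo {

    // Returns the next character without consuming it, or -1 at end of stream.
    public static int peek(PushbackReader r) throws IOException {
        int c = r.read();
        if (c != -1) r.unread(c); // put it back for the next caller
        return c;
    }

    // Returns true if a second unread() overflows the default one-char buffer.
    public static boolean overflowThrows() throws IOException {
        PushbackReader r = new PushbackReader(new StringReader("abc"));
        try {
            r.unread('x');
            r.unread('y'); // no room left in the pushback buffer
            return false;
        } catch (IOException ex) {
            return true;
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws IOException {
        PushbackReader r = new PushbackReader(new StringReader("abc"));
        System.out.println((char) peek(r));   // looks at the first character
        System.out.println((char) r.read());  // reads that same character again
        System.out.println(overflowThrows());
        r.close();
    }
}
```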

4.4.4.4 PrintWriter

The PrintWriter class is a replacement for Java 1.0's PrintStream class that properly handles multibyte character sets and international text. Sun originally planned to deprecate PrintStream in favor of PrintWriter but backed off when it realized this step would invalidate too much existing code, especially code that depended on System.out . Nonetheless, new code should use PrintWriter instead of PrintStream .

Aside from the constructors, the PrintWriter class has an almost identical collection of methods to PrintStream . These include:

 public PrintWriter(Writer out)
 public PrintWriter(Writer out, boolean autoFlush)
 public PrintWriter(OutputStream out)
 public PrintWriter(OutputStream out, boolean autoFlush)
 public void flush( )
 public void close( )
 public boolean checkError( )
 protected void setError( )
 public void write(int c)
 public void write(char[] text, int offset, int length)
 public void write(char[] text)
 public void write(String s, int offset, int length)
 public void write(String s)
 public void print(boolean b)
 public void print(char c)
 public void print(int i)
 public void print(long l)
 public void print(float f)
 public void print(double d)
 public void print(char[] text)
 public void print(String s)
 public void print(Object o)
 public void println( )
 public void println(boolean b)
 public void println(char c)
 public void println(int i)
 public void println(long l)
 public void println(float f)
 public void println(double d)
 public void println(char[] text)
 public void println(String s)
 public void println(Object o)

Most of these methods behave the same for PrintWriter as they do for PrintStream. The exceptions are the five write( ) methods, which write characters rather than bytes; also, if the underlying writer properly handles character set conversion, so do all the methods of the PrintWriter. This is an improvement over the noninternationalizable PrintStream class, but it's still not good enough for network programming. PrintWriter still has the problems of platform dependency and minimal error reporting that plague PrintStream.
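The minimal error reporting can be seen in a small sketch. BrokenWriter is a hypothetical writer that fails on every write, standing in for a dropped network connection; the class name SilentFailure and the sample text are likewise illustrative.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

public class SilentFailure {

    // Hypothetical stand-in for a broken connection: every write fails.
    static class BrokenWriter extends Writer {
        @Override
        public void write(char[] text, int offset, int length) throws IOException {
            throw new IOException("connection lost");
        }
        @Override
        public void flush() { }
        @Override
        public void close() { }
    }

    // Returns what checkError() reports after printing to a broken writer.
    public static boolean errorAfterFailedPrint() {
        PrintWriter pw = new PrintWriter(new BrokenWriter());
        pw.println("sample line"); // the IOException is swallowed here
        return pw.checkError();    // polling is the only way to learn about it
    }

    public static void main(String[] args) {
        System.out.println("error flag set: " + errorAfterFailedPrint());
    }
}
```

Note that println( ) does not declare IOException at all, so the caller cannot tell at the point of the write that anything went wrong.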

It isn't hard to write a PrintWriter class that does work for network programming. You simply have to require the programmer to specify a line separator and let the IOExceptions fall where they may. Example 4-2 demonstrates. Notice that all the constructors require an explicit line-separator string to be provided.

Example 4-2. SafePrintWriter
 /*
  * @(#)SafePrintWriter.java 1.0 04/06/28
  *
  * Placed in the public domain
  * No rights reserved.
  */
 package com.macfaq.io;

 import java.io.*;

 /**
  * @version 1.1, 2004-06-28
  * @author Elliotte Rusty Harold
  * @since Java Network Programming, 2nd edition
  */
 public class SafePrintWriter extends Writer {

   protected Writer out;

   private boolean autoFlush = false;
   private String lineSeparator;
   private boolean closed = false;

   public SafePrintWriter(Writer out, String lineSeparator) {
     this(out, false, lineSeparator);
   }

   public SafePrintWriter(Writer out, char lineSeparator) {
     this(out, false, String.valueOf(lineSeparator));
   }

   public SafePrintWriter(Writer out, boolean autoFlush, String lineSeparator) {
     super(out);
     this.out = out;
     this.autoFlush = autoFlush;
     if (lineSeparator == null) {
       throw new NullPointerException("Null line separator");
     }
     this.lineSeparator = lineSeparator;
   }

   public SafePrintWriter(OutputStream out, boolean autoFlush,
    String encoding, String lineSeparator)
    throws UnsupportedEncodingException {
     this(new OutputStreamWriter(out, encoding), autoFlush, lineSeparator);
   }

   public void flush( ) throws IOException {
     synchronized (lock) {
       if (closed) throw new IOException("Stream closed");
       out.flush( );
     }
   }

   public void close( ) throws IOException {
     try {
       this.flush( );
     }
     catch (IOException ex) {
     }
     synchronized (lock) {
       out.close( );
       this.closed = true;
     }
   }

   public void write(int c) throws IOException {
     synchronized (lock) {
       if (closed) throw new IOException("Stream closed");
       out.write(c);
     }
   }

   public void write(char[] text, int offset, int length) throws IOException {
     synchronized (lock) {
       if (closed) throw new IOException("Stream closed");
       out.write(text, offset, length);
     }
   }

   public void write(char[] text) throws IOException {
     synchronized (lock) {
       if (closed) throw new IOException("Stream closed");
       out.write(text, 0, text.length);
     }
   }

   public void write(String s, int offset, int length) throws IOException {
     synchronized (lock) {
       if (closed) throw new IOException("Stream closed");
       out.write(s, offset, length);
     }
   }

   public void print(boolean b) throws IOException {
     if (b) this.write("true");
     else this.write("false");
   }

   public void println(boolean b) throws IOException {
     if (b) this.write("true");
     else this.write("false");
     this.write(lineSeparator);
     if (autoFlush) out.flush( );
   }

   public void print(char c) throws IOException {
     this.write(String.valueOf(c));
   }

   public void println(char c) throws IOException {
     this.write(String.valueOf(c));
     this.write(lineSeparator);
     if (autoFlush) out.flush( );
   }

   public void print(int i) throws IOException {
     this.write(String.valueOf(i));
   }

   public void println(int i) throws IOException {
     this.write(String.valueOf(i));
     this.write(lineSeparator);
     if (autoFlush) out.flush( );
   }

   public void print(long l) throws IOException {
     this.write(String.valueOf(l));
   }

   public void println(long l) throws IOException {
     this.write(String.valueOf(l));
     this.write(lineSeparator);
     if (autoFlush) out.flush( );
   }

   public void print(float f) throws IOException {
     this.write(String.valueOf(f));
   }

   public void println(float f) throws IOException {
     this.write(String.valueOf(f));
     this.write(lineSeparator);
     if (autoFlush) out.flush( );
   }

   public void print(double d) throws IOException {
     this.write(String.valueOf(d));
   }

   public void println(double d) throws IOException {
     this.write(String.valueOf(d));
     this.write(lineSeparator);
     if (autoFlush) out.flush( );
   }

   public void print(char[] text) throws IOException {
     this.write(text);
   }

   public void println(char[] text) throws IOException {
     this.write(text);
     this.write(lineSeparator);
     if (autoFlush) out.flush( );
   }

   public void print(String s) throws IOException {
     if (s == null) this.write("null");
     else this.write(s);
   }

   public void println(String s) throws IOException {
     if (s == null) this.write("null");
     else this.write(s);
     this.write(lineSeparator);
     if (autoFlush) out.flush( );
   }

   public void print(Object o) throws IOException {
     if (o == null) this.write("null");
     else this.write(o.toString( ));
   }

   public void println(Object o) throws IOException {
     if (o == null) this.write("null");
     else this.write(o.toString( ));
     this.write(lineSeparator);
     if (autoFlush) out.flush( );
   }

   public void println( ) throws IOException {
     this.write(lineSeparator);
     if (autoFlush) out.flush( );
   }

 }

This class actually extends Writer rather than FilterWriter, unlike PrintWriter. It could extend FilterWriter instead; however, this would save only one field and one line of code, since this class needs to override every single method in FilterWriter (close( ), flush( ), and all three write( ) methods). The reason for this is twofold. First, the PrintWriter class has to be much more careful about synchronization than the FilterWriter class. Second, some of the classes that may be used as an underlying Writer for this class, notably CharArrayWriter, do not implement the proper semantics for close( ) and allow further writes to take place even after the writer is closed. Consequently, programmers have to handle the checks for whether the stream is closed in this class rather than relying on the underlying Writer out to do it for them.
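The close( ) behavior of CharArrayWriter mentioned above is easy to confirm with a short sketch. The class name CloseSemantics and the method writeAfterClose( ) are hypothetical.

```java
import java.io.CharArrayWriter;
import java.io.IOException;

public class CloseSemantics {

    // Writes to a CharArrayWriter both before and after closing it.
    public static String writeAfterClose() throws IOException {
        CharArrayWriter w = new CharArrayWriter();
        w.write("before");
        w.close();         // has no effect for this class
        w.write(" after"); // accepted without an IOException
        return w.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAfterClose());
    }
}
```

This is why a wrapper that promises to reject writes after close( ) has to track its own closed flag rather than rely on the writer it wraps.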

This chapter has been a whirlwind tour of the java.io package, covering the bare minimum you need to know to write network programs. For a more detailed and comprehensive look with many more examples, check out my other book in this series, Java I/O (O'Reilly).




Java Network Programming, Third Edition
ISBN: 0596007213
Year: 2003
Pages: 164
