4.2 Input Streams
Java's basic input class is java.io.InputStream:
public abstract class InputStream
This class provides the fundamental methods needed to read data as raw bytes. These are:
public abstract int read( ) throws IOException
public int read(byte[] input) throws IOException
public int read(byte[] input, int offset, int length) throws IOException
public long skip(long n) throws IOException
public int available( ) throws IOException
public void close( ) throws IOException
Concrete subclasses of InputStream use these methods to read data from particular media. For instance, a FileInputStream reads data from a file. A TelnetInputStream reads data from a network connection. A ByteArrayInputStream reads data from an array of bytes. But whichever source you're reading, you mostly use only these same six methods. Sometimes you don't know exactly what kind of stream you're reading from. For instance, TelnetInputStream is an undocumented class hidden inside the sun.net package. Instances of it are returned by various methods in the java.net package: for example, the openStream( ) method of java.net.URL. However, these methods are declared to return only InputStream, not the more specific subclass TelnetInputStream. That's polymorphism at work once again. The instance of the subclass can be used transparently as an instance of its superclass. No specific knowledge of the subclass is required.
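To make the polymorphism point concrete, here is a minimal sketch of a method that works on any InputStream without knowing its concrete class. The class name StreamCounter and the method countBytes are hypothetical, chosen only for illustration, and a ByteArrayInputStream stands in for a file or network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamCounter {

    // Counts the bytes remaining in any InputStream, whatever its concrete subclass.
    static long countBytes(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // The caller only ever sees the InputStream type.
        InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30});
        System.out.println(countBytes(in)); // prints 3
    }
}
```

The same countBytes method would accept the stream returned by URL.openStream( ) or a FileInputStream, since it relies only on the methods declared in InputStream.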
The basic method of InputStream is the no-argument read( ) method. This method reads a single byte of data from the input stream's source and returns it as an int from 0 to 255. End of stream is signified by returning -1. The read( ) method blocks the calling thread until a byte of data is available, the end of the stream is reached, or an exception is thrown. Input and output can be slow, so if your program is doing anything else of importance, try to put I/O in its own thread.
The read( ) method is declared abstract because each subclass must implement it for its particular medium. For instance, a ByteArrayInputStream can implement this method with pure Java code that copies the byte from its array. However, a TelnetInputStream needs to use a native library that understands how to read data from the network interface on the host platform.
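As an illustration of what implementing the abstract method involves, the following sketch defines a tiny subclass in pure Java. CountingInputStream is a hypothetical class invented for this example; it simply yields the byte values 0, 1, 2, and so on up to a fixed limit. The multibyte read methods are inherited from InputStream, whose default implementations call read( ) repeatedly.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical example class: yields the bytes 0, 1, 2, ... up to a fixed count.
class CountingInputStream extends InputStream {
    private final int limit;
    private int next = 0;

    CountingInputStream(int limit) {
        this.limit = limit;
    }

    // The only method a concrete subclass is required to supply.
    @Override
    public int read() throws IOException {
        if (next >= limit) {
            return -1;              // end of stream
        }
        return (next++) & 0xFF;     // always in the range 0 to 255
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new CountingInputStream(3);
        byte[] buffer = new byte[8];
        int n = in.read(buffer);    // inherited method, built on read()
        System.out.println(n);      // prints 3
    }
}
```

Real subclasses usually override the multibyte read methods as well, because the inherited versions read one byte at a time.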
The following code fragment reads up to 10 bytes from the InputStream in and stores them in the byte array input. If end of stream is detected, the loop terminates early:
byte[] input = new byte[10];
for (int i = 0; i < input.length; i++) {
    int b = in.read( );
    if (b == -1) break;   // end of stream; remaining elements stay 0
    input[i] = (byte) b;
}
Although read( ) only reads a byte, it returns an int. Thus, a cast is necessary before storing the result in the byte array. Of course, this produces a signed byte from -128 to 127 instead of the unsigned value from 0 to 255 returned by the read( ) method. However, as long as you're clear about which one you're working with, this is not a major problem. You can convert a signed byte b to its unsigned value, held in an int, like this:
int i = b >= 0 ? b : 256 + b;
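The same conversion is often written with a bit mask. The sketch below shows both the mask and the Byte.toUnsignedInt method (available since Java 8); the class name UnsignedBytes and the helper toUnsigned are hypothetical names used only for this example.

```java
class UnsignedBytes {

    // Masking with 0xFF keeps the low eight bits and discards the sign extension.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 200;                        // stored as the signed value -56
        System.out.println(b);                      // prints -56
        System.out.println(toUnsigned(b));          // prints 200
        System.out.println(Byte.toUnsignedInt(b));  // prints 200 (Java 8 and later)
    }
}
```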
Reading a byte at a time is as inefficient as writing data one byte at a time. Consequently, there are two overloaded read( ) methods that fill a specified array with multiple bytes of data read from the stream, read(byte[] input) and read(byte[] input, int offset, int length). The first method attempts to fill the specified array input. The second attempts to fill the specified subarray of input, starting at offset and continuing for length bytes.
Notice I said these methods attempt to fill the array, not that they necessarily succeed. An attempt may fail in several ways. For instance, it's not unheard of that while your program is reading data from a remote web server over a PPP dialup link, a bug in a switch at a phone company central office will disconnect you and several thousand of your neighbors from the rest of the world. This would cause an IOException. More commonly, however, a read attempt won't completely fail but won't completely succeed, either. Some of the requested bytes may be read, but not all of them. For example, you may try to read 1,024 bytes from a network connection, when only 512 have actually arrived from the server; the rest are still in transit. They'll arrive eventually, but they aren't available at this moment. To account for this, the multibyte read methods return the number of bytes actually read. For example, consider this code fragment:
byte[] input = new byte[1024];
int bytesRead = in.read(input);
It attempts to read 1,024 bytes from the InputStream in into the array input. However, if only 512 bytes are available, that's all that will be read, and bytesRead will be set to 512. To guarantee that all the bytes you want are actually read, place the read in a loop that reads repeatedly until the array is filled. For example:
int bytesRead = 0;
int bytesToRead = 1024;
byte[] input = new byte[bytesToRead];
while (bytesRead < bytesToRead) {
    bytesRead += in.read(input, bytesRead, bytesToRead - bytesRead);
}
This technique is especially crucial for network streams. Chances are that if a file is available at all, all the bytes of a file are also available. However, since networks move much more slowly than CPUs, it is very easy for a program to empty a network buffer before all the data has arrived. If one of these two methods tries to read one or more bytes from a temporarily empty but open network buffer, it blocks until at least one byte arrives, just as the single-byte read( ) method does in the same circumstances. It then returns however many bytes are available at that moment, which may be far fewer than requested. These methods return 0 only when they are asked to read zero bytes.
All three read( ) methods return -1 to signal the end of the stream. If the stream ends while there's still data that hasn't been read, the multibyte read( ) methods return the data until the buffer has been emptied. The next call to any of the read( ) methods will return -1. The -1 is never placed in the array. The array only contains actual data. The previous code fragment had a bug because it didn't consider the possibility that all 1,024 bytes might never arrive (as opposed to not being immediately available). In that case the -1 would be added to bytesRead, corrupting the count. Fixing that bug requires testing the return value of read( ) before adding it to bytesRead. For example:
int bytesRead = 0;
int bytesToRead = 1024;
byte[] input = new byte[bytesToRead];
while (bytesRead < bytesToRead) {
    int result = in.read(input, bytesRead, bytesToRead - bytesRead);
    if (result == -1) break;
    bytesRead += result;
}
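The corrected loop can be packaged as a reusable helper. The following is a minimal sketch under stated assumptions: the class ReadFully and its methods readFully and trickle are hypothetical names, and trickle exists only to imitate a slow network by handing out at most three bytes per call. Unlike the fragment above, this helper throws an EOFException if the stream ends early, which is one possible design choice rather than the only one.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadFully {

    // Hypothetical helper: fills the whole buffer or throws EOFException.
    static void readFully(InputStream in, byte[] buffer) throws IOException {
        int bytesRead = 0;
        while (bytesRead < buffer.length) {
            int result = in.read(buffer, bytesRead, buffer.length - bytesRead);
            if (result == -1) {
                throw new EOFException("Stream ended after " + bytesRead
                        + " of " + buffer.length + " bytes");
            }
            bytesRead += result;
        }
    }

    // Test stream that returns at most 3 bytes per call, imitating partial reads.
    static InputStream trickle(byte[] data) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, 3));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        byte[] buffer = new byte[data.length];
        readFully(trickle(data), buffer);   // takes four partial reads
        System.out.println(buffer[9]);      // prints 10
    }
}
```

The standard library offers comparable facilities: DataInputStream.readFully( ) behaves much like this helper, and InputStream.readNBytes(byte[], int, int), added in Java 9, returns the count actually read instead of throwing.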
If you do not want to wait until all the bytes you need are immediately available, you can use the available( ) method to determine how many bytes can be read without blocking. The value it returns is an estimate of the minimum number of bytes you can read. You may in fact be able to read more, but you should be able to read at least as many bytes as available( ) suggests. For example:
int bytesAvailable = in.available( );
byte[] input = new byte[bytesAvailable];
int bytesRead = in.read(input, 0, bytesAvailable);
// continue with rest of program immediately...
In this case, you can generally expect bytesRead to equal bytesAvailable. You cannot, however, assume that bytesRead is greater than zero. It is possible that no bytes were available. On end of stream, available( ) returns 0. Generally, read(byte[] input, int offset, int length) returns -1 on end of stream; but if length is 0, then it typically does not notice the end of stream and returns 0 instead.
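A short sketch of the available( ) idiom follows. The class AvailableDemo and the helper readAvailable are hypothetical names; the helper treats both 0 and -1 from read( ) as "nothing read", since individual stream classes differ in what a zero-length read returns at end of stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class AvailableDemo {

    // Reads whatever can be read without blocking; may return an empty array.
    static byte[] readAvailable(InputStream in) throws IOException {
        int bytesAvailable = in.available();
        byte[] input = new byte[bytesAvailable];
        int bytesRead = in.read(input, 0, bytesAvailable);
        if (bytesRead <= 0) {
            return new byte[0];     // nothing available, or end of stream
        }
        return Arrays.copyOf(input, bytesRead);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        System.out.println(readAvailable(in).length); // prints 5
        System.out.println(readAvailable(in).length); // prints 0
    }
}
```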
On rare occasions, you may want to skip over data without reading it. The skip( ) method accomplishes this task. It returns the number of bytes actually skipped, which may be fewer than the number requested. It's less useful on network connections than when reading from files. Network connections are sequential and normally quite slow, so it's not significantly more time-consuming to read data than to skip over it. Files are random access, so skipping can be implemented simply by repositioning a file pointer rather than processing each byte to be skipped.
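Because skip( ) may skip fewer bytes than requested, code that needs to pass over an exact number of bytes usually calls it in a loop. The sketch below shows one way to do that; SkipDemo and skipFully are hypothetical names, and falling back to a single-byte read when skip( ) makes no progress is an assumption of this example, not a requirement of the API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class SkipDemo {

    // Hypothetical helper: skips up to n bytes and returns how many were actually skipped.
    static long skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
                continue;
            }
            // skip() made no progress; read one byte to detect end of stream.
            if (in.read() == -1) {
                break;
            }
            remaining--;
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
        System.out.println(skipFully(in, 3));   // prints 3
        System.out.println(in.read());          // prints 3
        System.out.println(skipFully(in, 100)); // prints 6
    }
}
```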
As with output streams, once your program has finished with an input stream, it should close it by invoking its close( ) method. This releases any resources associated with the stream, such as file handles or ports. Once an input stream has been closed, further reads from it generally throw IOExceptions. However, some kinds of streams may still allow you to do things with the object. For instance, you generally won't want to get the message digest from a java.security.DigestInputStream until after the data has been read and the stream closed.
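The DigestInputStream case can be sketched as follows. This example assumes Java 7 or later for the try-with-resources statement, which closes the stream automatically; the class DigestDemo and the method sha256 are hypothetical names, and SHA-256 is chosen here only as a commonly available algorithm.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestDemo {

    // Reads the source to its end, closes it, and only then asks for the digest.
    static byte[] sha256(InputStream source) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        try (DigestInputStream din = new DigestInputStream(source, sha)) {
            byte[] buffer = new byte[4096];
            while (din.read(buffer) != -1) {
                // Nothing to do: the digest is updated as a side effect of reading.
            }
        } // din, and the underlying source, are closed here
        return sha.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] digest = sha256(new ByteArrayInputStream(data));
        System.out.println(digest.length); // prints 32
    }
}
```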
4.2.1 Marking and Resetting
The InputStream class also has three less commonly used methods that allow programs to back up and reread data they've already read. These are:
public void mark(int readAheadLimit)
public void reset( ) throws IOException
public boolean markSupported( )
In order to reread data, mark the current position in the stream with the mark( ) method. At a later point, you can reset the stream to the marked position using the reset( ) method. Subsequent reads then return data starting from the marked position. However, you may not be able to reset as far back as you like. The number of bytes you can read from the mark and still reset is determined by the readAheadLimit argument to mark( ). If you try to reset too far back, an IOException is thrown. Furthermore, there can be only one mark in a stream at any given time. Marking a second location erases the first mark.
Marking and resetting are usually implemented by storing every byte read from the marked position on in an internal buffer. However, not all input streams support this. Before trying to use marking and resetting, check to see whether the markSupported( ) method returns true. If it does, the stream supports marking and resetting. Otherwise, mark( ) will do nothing and reset( ) will throw an IOException.
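A typical use of marking is to peek at the start of a stream without consuming it. The following is a minimal sketch of that idea; MarkDemo and peek are hypothetical names, and the helper refuses to proceed when markSupported( ) returns false.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class MarkDemo {

    // Hypothetical helper: returns up to n upcoming bytes, leaving the stream position unchanged.
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("mark/reset not supported by " + in.getClass().getName());
        }
        in.mark(n);                 // allow reading n bytes before the mark is invalidated
        byte[] buffer = new byte[n];
        int total = 0;
        while (total < n) {
            int result = in.read(buffer, total, n - total);
            if (result == -1) break;
            total += result;
        }
        in.reset();                 // back to the marked position
        return Arrays.copyOf(buffer, total);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
        System.out.println(Arrays.toString(peek(in, 2))); // prints [1, 2]
        System.out.println(in.read());                    // prints 1
    }
}
```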
In my opinion, this demonstrates very poor design. In practice, more streams don't support marking and resetting than do. Attaching functionality to an abstract superclass that is not available to many, probably most, subclasses is a very poor idea. It would be better to place these three methods in a separate interface that could be implemented by those classes that provided this functionality. The disadvantage of this approach is that you couldn't then invoke these methods on an arbitrary input stream of unknown type, but in practice, you can't do that anyway because not all streams support marking and resetting. Providing a method such as markSupported( ) to check for functionality at runtime is a more traditional, non-object-oriented solution to the problem. An object-oriented approach would embed this in the type system through interfaces and classes so that it could all be checked at compile time.
The only two input stream classes in java.io that always support marking are BufferedInputStream and ByteArrayInputStream. However, other input streams such as TelnetInputStream may support marking if they're chained to a buffered input stream first.
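Chaining to a buffered stream might look like the sketch below. BufferedMarkDemo, markable, and plain are hypothetical names; plain builds a bare InputStream subclass that inherits the default markSupported( ), which returns false, so that the effect of wrapping it is visible.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedMarkDemo {

    // Returns a stream that supports mark/reset, wrapping only when necessary.
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // A bare-bones stream with no mark support, used here as a stand-in.
    static InputStream plain(final byte[] data) {
        return new InputStream() {
            private int pos = 0;

            @Override
            public int read() {
                return pos < data.length ? data[pos++] & 0xFF : -1;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = plain(new byte[] {7, 8, 9});
        System.out.println(raw.markSupported()); // prints false
        InputStream in = markable(raw);
        in.mark(2);
        in.read();
        in.read();
        in.reset();
        System.out.println(in.read());           // prints 7
    }
}
```

After wrapping, all further reads should go through the buffered stream rather than the original one, since the buffer may already hold bytes taken from the underlying stream.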