Most programs need to interact with the outside world, and one common way of doing so is by reading and writing files. Files are normally on some persistent medium such as a disk drive, and, for the most part, we shall happily ignore the differences between a hard disk (and all the operating system-dependent filesystem types), a floppy or zip drive, a CD-ROM, and others. For now, they're just files.
This chapter covers all the normal input/output operations such as opening/closing and reading/writing files. Files are assumed to reside on some kind of file store or permanent storage. I don't discuss how such a filesystem or disk I/O system works; for that, consult a book on operating system design or a platform-specific book on system internals or filesystem design. Network filesystems such as Sun's Network File System (NFS, common on Unix and available for Windows through products such as Hummingbird NFS Maestro), Macintosh Appletalk File System (used for OS 9; available for Unix via the open source Netatalk), and SMB (Windows network filesystem, available for Unix with the open source Samba program) are assumed to work "just like" disk filesystems, except where noted.
JDK 1.5 introduced the Formatter and Scanner classes, which provide substantial new functionality. Formatter allows many formatting tasks to be performed either into a String or to almost any output destination. Scanner parses many kinds of objects, again either from a String or from almost any input source. These are new and very powerful; each is given its own recipe in this chapter.
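As a quick taste before the recipes, here is a minimal sketch of both classes working on Strings. The class name FormatterScannerSketch, the method names, and the sample data are my own inventions for illustration; only java.util.Formatter and java.util.Scanner come from the JDK.

```java
import java.util.Formatter;
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo class; the name and methods are illustrative only.
public class FormatterScannerSketch {

    /** Formats an item name and a price into a String using Formatter. */
    public static String formatLine(String item, double price) {
        StringBuilder sb = new StringBuilder();
        // Locale.ROOT keeps the decimal separator a '.' whatever the default locale is
        Formatter fmt = new Formatter(sb, Locale.ROOT);
        // %-8s  = left-justified string in 8 columns
        // %8.2f = floating-point number in 8 columns with 2 decimals
        fmt.format("%-8s|%8.2f", item, price);
        fmt.close();
        return sb.toString();
    }

    /** Reads leading whitespace-separated integers from a String using Scanner and sums them. */
    public static int sumInts(String text) {
        Scanner sc = new Scanner(text);
        int total = 0;
        while (sc.hasNextInt()) {
            total += sc.nextInt();
        }
        sc.close();
        return total;
    }

    public static void main(String[] args) {
        System.out.println(formatLine("widget", 3.14159));
        System.out.println(sumInts("10 20 12"));
    }
}
```

The same Formatter could be constructed around a file or an output stream instead of a StringBuilder, and the same Scanner around a file or an input stream instead of a String; that is the "almost any destination or source" flexibility mentioned above.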
Streams and Readers/Writers
Java provides two sets of classes for reading and writing. The Stream section of package java.io (see Figure 10-1) is for reading or writing bytes of data. Older languages tended to assume that a byte (which is a machine-specific collection of bits, usually eight bits on modern computers) is exactly the same thing as a "character": a letter, digit, or other linguistic element. However, Java is designed to be used internationally, and eight bits is simply not enough to handle the many different character sets used around the world. Script-based languages like Arabic and the Indian languages, and pictographic languages like Chinese and Japanese, each have many more than 256 characters, the maximum that can be represented in an eight-bit byte. The unification of these many character code sets is called, not surprisingly, Unicode. Actually, it's not the first such unification, but it's the most widely used standard at this time. Both Java and XML use Unicode as their character sets, allowing you to read and write text in any of these human languages. But you have to use Readers and Writers, not Streams, for textual data.
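To make the byte/character distinction concrete, the following sketch reads the same data twice: once through an InputStream, which hands back bytes, and once through a Reader, which hands back characters. The class name BytesVersusChars and the five-letter Arabic sample string are assumptions of mine, and the code assumes Java 8 or later (for StandardCharsets and UncheckedIOException), which is newer than the JDK this chapter was written against.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and sample text are illustrative only.
public class BytesVersusChars {

    /** Counts the units delivered by a byte stream: one per byte. */
    public static int countBytes(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    /** Counts the units delivered by a Reader that decodes the bytes as UTF-8: one per char. */
    public static int countChars(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int count = 0;
            while (reader.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Five Arabic letters; each occupies two bytes when encoded as UTF-8
        String text = "\u0645\u0631\u062D\u0628\u0627";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + countBytes(utf8)); // bytes: 10
        System.out.println("chars: " + countChars(utf8)); // chars: 5
    }
}
```

A program that treated each byte as a character would see ten meaningless units here instead of five letters, which is exactly the mistake that Readers and Writers exist to prevent.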
Figure 10-1. java.io classes
Unicode itself doesn't solve the entire problem. Many of these human languages were used on computers long before Unicode was invented, and they didn't all pick the same representation as Unicode. And they all have zillions of files encoded in a particular representation that isn't Unicode. So conversion routines are needed when reading and writing to convert between Unicode String objects used inside the Java machine and the particular external representation that a user's files are written in. These converters are packaged inside a powerful set of classes called Readers and Writers. Readers and Writers are always used instead of InputStreams and OutputStreams when you want to deal with characters instead of bytes. We'll see more on this conversion, and how to specify which conversion, a little later in this chapter.
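The conversion idea can be sketched in a few lines: an InputStreamReader decodes bytes from one external representation into Java chars, and an OutputStreamWriter encodes those chars into another. The class TranscodeSketch and its transcode method are hypothetical names of my own, the choice of ISO-8859-1 and UTF-8 is just an example, and the code assumes Java 8 or later; treat it as an illustration rather than the approach the later recipes take.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name and method are illustrative only.
public class TranscodeSketch {

    /** Decodes input using the "from" charset, then re-encodes the characters using "to". */
    public static byte[] transcode(byte[] input, Charset from, Charset to) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(input), from);
             Writer writer = new OutputStreamWriter(out, to)) {
            char[] buf = new char[1024];
            int n;
            // Inside this loop the data exists only as Unicode chars
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer was flushed and closed by try-with-resources before we get here
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "cafe" with an acute accent on the e: 4 bytes in ISO-8859-1, 5 bytes in UTF-8
        byte[] latin1 = "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(latin1.length + " -> " + utf8.length); // 4 -> 5
    }
}
```

In a real program the byte arrays would typically be replaced by file streams; the Reader and Writer wrapping stays the same.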
One topic not addressed in depth here is the Java "New IO" package (it was "new" in 1.4). NIO is more complex to use, and the benefits accrue primarily in large-scale server-side processing. Recipe 4.5 provides one example of using NIO. The NIO package is given full coverage in Java NIO by Ron Hitchens (O'Reilly).
Another issue not addressed here is hardcopy printing. Java's scheme for printing onto paper uses the same graphics model as is used in AWT, the basic Window System package. For this reason, I defer discussion of printing to Chapter 13.
Another topic not covered here is that of having the read or write occur concurrently with other program activity. This requires the use of threads, or multiple flows of control within a single program. Threaded I/O is a necessity in many programs: those reading from slow devices such as tape drives, those reading from or writing to network connections, and those with a GUI. For this reason the topic is given considerable attention, in the context of multithreaded applications, in Chapter 24.