6.9 IO STREAMS FOR JAVA | Programming with Objects: A Comparative Presentation of Object Oriented Programming with C++ and Java

6.9 I/O STREAMS FOR JAVA

Here are some interesting differences between how I/O is set up in C++ and in Java:

Whereas in C++ all the needed input/output functionality is packed into a relatively small number of stream classes, Java offers a separate class for practically every situation.^[22] If you want to carry out byte stream I/O in its most basic form, Java offers classes designed just for that purpose. But if you are interested in the reading and writing of higher level primitive types, such as integers, floating point numbers, and so on, you use a different set of stream classes that you wrap around the streams for byte-level I/O. If you want to use buffering, you bring in yet additional classes. If your input/output happens to consist of Unicode character streams, there are stream classes that possess methods designed specially for that purpose. If you are interested in reading and writing with random access, Java gives you separate classes for that, too, and so on. Like C++, Java also gives you separate classes for reading from and writing into strings. Additionally, Java gives you classes for reading from and writing into arrays.
While the C++ streams are buffered by default, the Java streams are unbuffered by default. However, you can incorporate buffering in a Java stream by invoking classes specified for that purpose.
In Java, all integer and floating point numbers are output in the big-endian representation regardless of the underlying platform. This is different from C++, as we saw in the previous section. Therefore, data files produced by Java tend to be more portable.
A typical C++ program uses ASCII-encoded bytes for the characters used either individually or in strings. When these characters are written out to an output device, they remain in their ASCII-encoded form. On the other hand, a character in a Java program is always represented by its 2-byte Unicode. However, if it is desired to write out these characters to an output device, a typical Java program will do so using the ASCII encodings. This is true of input also. A typical C++ program will expect the text based input to be ASCII encoded; the characters in such input will then remain ASCII-encoded inside the program. A typical Java program will also expect its text based input to be ASCII encoded, but will convert the characters into their Unicode representations on the fly as they are read into memory. We are only describing typical situations here. Java I/O provides the functionality needed for writing out character streams using Unicode and other representations and for reading character streams that are based on Unicode and other representations.
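The big-endian point made in the list above can be checked without touching the disk. The following is only a small sketch: the class name BigEndianDemo and the helper hexOfInt are our own illustrative names, and a ByteArrayOutputStream is assumed as a stand-in for a file so that the bytes can be inspected in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class BigEndianDemo {

    // Returns, as a hex string, the bytes that DataOutputStream.writeInt emits.
    // (hexOfInt is a hypothetical helper, not part of java.io.)
    static String hexOfInt(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(sink)) {
            dos.writeInt(value);            // most significant byte goes out first
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : sink.toByteArray()) {
            hex.append(String.format("%02x ", b));
        }
        return hex.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(hexOfInt(98));       // 00 00 00 62
        System.out.println(hexOfInt(123456));   // 00 01 e2 40
    }
}
```

The byte order shown is the same on every platform, since it is fixed by DataOutputStream and not by the underlying hardware.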

Shown in Figures 6.3 and 6.4 are the hierarchies of the Java stream classes. The classes that are particularly suitable for sequential bytestream I/O descend from the OutputStream and InputStream abstract classes. For character streams, the same is done by the classes that descend from Reader and Writer. Random access I/O capability is provided by the class RandomAccessFile:

Figure 6.3

Figure 6.4

The rest of this section is organized in a somewhat unconventional manner, the aim being to augment what the reader can glean easily from the on-line description for each of the classes. Toward that end, we will present some of the more commonly used functionality of the more frequently used classes as a whole, rather than piecemeal. Of the classes whose use we will not illustrate in the programs shown here, PipedOutputStream and PipedInputStream will be taken up later in Chapter 18 on multithreading.

6.9.1 Writing Primitive Types

It is rather easy to get a feeling of being lost in the very large number of stream classes offered by Java, especially because one is often able to do the same thing by chaining different stream classes together in different ways. To illustrate the extent of choice available for all the primitive types, in this subsection we will show some of the different ways in which we can try to write an int into a file and examine what happens in each case. Although all our discussion here will be for the writing of ints, parallel arguments would hold for the other primitive types.

With regard to the writing of an int to a file, realize (as we did for the case of C++ streams) that an int like 98 will be stored in the memory of the computer as a 4-byte entity whose bit pattern will correspond to the hex

      00 00 00 62

in its big-endian representation, which, as was mentioned before, is the only representation Java understands for both integers and floating-point types. So when we talk about writing this integer to a file, the question becomes whether we want this bit pattern to be written out exactly as it is, or whether we want the characters 9 and 8 to be written out as two separate bytes. If we desire the character representation for the output, we then have to decide whether Java should output the 2-byte Unicode representation for the characters, or the 1-byte ASCII-based representation, or, perhaps, some other encoding that might be relevant in a particular context.

Shown below is a program that illustrates some of the many different ways in which an int can be written out to a disk file in Java. Not all of these attempts will produce the desired result, but the reader will nonetheless find it educational to see what happens to the output in each case.

 
//WriteIntToFile.java

import java.io.*;

class WriteIntToFile {

    public static void main( String[] args ) throws Exception {

        int anInt = 98;                                            //(A)

        FileOutputStream fos = new FileOutputStream( "out.fos" );  //(B)
        fos.write( anInt );                                        //(C)
        fos.close();

        FileWriter fw = new FileWriter( "out.fw" );                //(D)
        fw.write( anInt );                                         //(E)
        fw.close();

        DataOutputStream dos = new DataOutputStream(
            new FileOutputStream( "out.dos" ) );                   //(F)
        dos.writeInt( anInt );                                     //(G)
        dos.close();

        DataOutputStream dbos = new DataOutputStream(
            new BufferedOutputStream(
                new FileOutputStream( "out.dbos" ) ) );            //(H)
        dbos.writeInt( anInt );                                    //(I)
        dbos.close();

        PrintStream ps = new PrintStream(
            new FileOutputStream( "out.ps" ) );                    //(J)
        ps.print( anInt );                                         //(K)
        ps.close();

        PrintStream pbs = new PrintStream(
            new BufferedOutputStream(
                new FileOutputStream( "out.pbs" ) ) );             //(L)
        pbs.print( anInt );                                        //(M)
        pbs.close();

        PrintWriter pw = new PrintWriter(
            new FileOutputStream( "out.pw" ) );                    //(N)
        pw.print( anInt );                                         //(O)
        pw.close();

        PrintWriter pbw = new PrintWriter(
            new BufferedOutputStream(
                new FileOutputStream( "out.pbw" ) ) );             //(P)
        pbw.print( anInt );                                        //(Q)
        pbw.close();

        PrintWriter pw2 = new PrintWriter(
            new FileWriter( "out.pw2" ) );                         //(R)
        pw2.print( anInt );                                        //(S)
        pw2.close();

        RandomAccessFile ra =
            new RandomAccessFile( "out.ra", "rw" );                //(T)
        ra.writeInt( anInt );                                      //(U)
        ra.close();
    }
}

Table 6.1 shows how many bytes were output to the disk file in each case, the contents of each file in hex, and what the contents of each file are according to a text reader. The int to be written out is declared in line (A).

Table 6.1
filename	file size in bytes	file content in hex	output as displayed by 'cat filename' (cat reads in text mode)
out.fos	1	62	b
out.fw	1	62	b
out.dos	4	00 00 00 62	b
out.dbos	4	00 00 00 62	b
out.ps	2	39 38	98
out.pbs	2	39 38	98
out.pw	2	39 38	98
out.pbw	2	39 38	98
out.pw2	2	39 38	98
out.ra	4	00 00 00 62	b

Our first attempt to write out this int, in lines (B) and (C) of the program, consists of the following code fragment:

      FileOutputStream fos = new FileOutputStream( "out.fos" );
      fos.write( anInt );

We attach a FileOutputStream object fos to a file named out.fos and invoke the method write( int ) defined for the stream class to write out the integer. This stream class only supports methods for writing a single byte or an array of bytes.

When its write method is supplied with an int argument, it takes the lowest byte of that int and writes that out into the file. As a consequence, only the byte whose hex is 62 is written into the file. This explains why the size of the output file out.fos in Table 6.1 is only one byte and why, when this file is read in the text mode, the content of the file is the letter 'b'. FileOutputStream is an example of a byte stream in Java.
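To see that the three upper bytes really are discarded, it helps to use an int whose upper bytes are not zero. The sketch below assumes that a ByteArrayOutputStream behaves like a FileOutputStream in this respect, which follows from the contract of OutputStream.write( int ); the class name LowByteDemo and the helper written are our own.

```java
import java.io.ByteArrayOutputStream;

class LowByteDemo {

    // Passes an int to OutputStream.write(int) and returns what reached the sink.
    // (An in-memory sink is assumed here in place of a disk file.)
    static byte[] written(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        sink.write(value);      // only the eight low-order bits are kept
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] out = written(0x12345662);
        System.out.println(out.length);                    // 1
        System.out.println(Integer.toHexString(out[0]));   // 62
    }
}
```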

Our next attempt, in lines (D) and (E) of the program, consists of

      FileWriter fw = new FileWriter( "out.fw" );
      fw.write( anInt );

We attach a FileWriter stream to an output file called out.fw and invoke the method write defined for the stream class. This stream accepts a character or an array of characters, and converts the 2-byte Unicode representation of each into a single byte according to the platform's default encoding scheme. When the write method of this stream is invoked with an int argument, it puts out the character corresponding to the lowest 16 bits of the four bytes of the integer. In our example, the character put out will be the one corresponding to the hex 0062. That character is the letter 'b', and the one byte that will be written into the output file will be the hex 62. That explains why the size of the output file out.fw in Table 6.1 is only one byte and why, when read in the text mode, the content of the file appears to be the letter 'b'. FileWriter is an example of a character output stream in Java.
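The same experiment can be repeated for a character stream. In the sketch below we assume an OutputStreamWriter (the superclass of FileWriter) wrapped around an in-memory sink, with the encoding fixed to US-ASCII so that the result does not depend on the platform default; WriterLowCharDemo and writtenAsChar are illustrative names only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterLowCharDemo {

    // Passes an int to Writer.write(int) and returns the bytes that come out.
    static byte[] writtenAsChar(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, StandardCharsets.US_ASCII)) {
            w.write(value);     // only the 16 low-order bits are used, as a char
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // The low 16 bits are 0x0062, the letter 'b'; the upper 16 bits are ignored.
        byte[] out = writtenAsChar(0x00010062);
        System.out.println(out.length);        // 1
        System.out.println((char) out[0]);     // b
    }
}
```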

Our next attempt, in lines (F) and (G) of the program, consists of

      DataOutputStream dos = new DataOutputStream(
                                 new FileOutputStream( "out.dos" ) );
      dos.writeInt( anInt );

This is an example of a compound stream, a stream obtained by wrapping one stream, in our case a DataOutputStream, around another stream, FileOutputStream in the code shown. What do we achieve by wrapping one stream around another? Wrapping in the manner shown here allows you to use the "higher-level" functionality of the wrapper stream and the utility of the stream that is wrapped. Compared to just a byte or an array of bytes that a FileOutputStream can write, the class DataOutputStream supports methods for writing out higher level data types such as ints, floats, doubles, Strings, and so on, using methods with names like writeInt, writeFloat, writeDouble, writeChars, and so on. A DataOutputStream knows how to convert these higher level data types into byte streams, which can then be output by the FileOutputStream object. As a consequence, what will be written into the file out.dos will be the four bytes of hex 00 00 00 62, the value of the variable anInt. This accounts for the size 4 for the file out.dos in Table 6.1. If this file is read in text mode, its only displayable character would correspond to the byte of hex 62, the letter 'b'. DataOutputStream is also a byte stream, like the FileOutputStream.

The next attempt to write out an int, in lines (H) and (I) of the program, illustrates the syntax one uses for buffering a stream (although you'd, of course, have no need for buffering if all you want to do is to write out a single integer):

      DataOutputStream dbos = new DataOutputStream(
          new BufferedOutputStream( new FileOutputStream( "out.dbos" ) ) );
      dbos.writeInt( anInt );

As mentioned already, unlike C++ streams, Java streams do not provide buffering by default. Since the actual transfer of data to and from a physical device such as a disk can be very slow in relation to the processing speeds, buffering can make a significant performance difference when doing I/O on long streams of data. A BufferedOutputStream accumulates the data to be written out in a section of the memory called the buffer. The actual transfer of the data to the physical device takes place only when the buffer is full or when it is explicitly flushed under program control. As in our previous attempt in lines (F) and (G), note that we are still attaching a DataOutputStream to the output file and we are still using the writeInt method of this stream to write out the integer. The only difference is that the byte stream produced by the DataOutputStream is being routed through a BufferedOutputStream. So, as shown by the entry for the output file out.dbos in Table 6.1, the overall effect in terms of what gets written into the output file and the size of the file remains the same as before. BufferedOutputStream is also a byte stream.
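The effect of the buffer can be made visible by checking how many bytes have reached the underlying stream before and after a flush. This is a sketch under the assumption that an in-memory ByteArrayOutputStream plays the role of the disk file; BufferingDemo and sizesAroundFlush are names we made up for the illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class BufferingDemo {

    // Returns { bytes in the sink before flush, bytes in the sink after flush }.
    static int[] sizesAroundFlush() {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream dbos =
                 new DataOutputStream(new BufferedOutputStream(sink))) {
            dbos.writeInt(98);
            int before = sink.size();   // the four bytes are still in the buffer
            dbos.flush();               // pushes the buffer contents to the sink
            int after = sink.size();
            return new int[] { before, after };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesAroundFlush();
        System.out.println(sizes[0]);   // 0
        System.out.println(sizes[1]);   // 4
    }
}
```

Closing a buffered stream also flushes it, which is why the program in this subsection gets away without any explicit calls to flush.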

Our next attempt, in lines (J) and (K) of the program, consists of

      PrintStream ps = new PrintStream( new FileOutputStream( "out.ps" ) );
      ps.print( anInt );

A PrintStream first creates a character representation for the data that is output and then puts out a byte stream, with one byte for each character. The character to byte translation takes place using the platform's default character encoding, which means ASCII in most cases. So for the integer 98, the print method invoked in line (K) will convert the 4-byte int into a two character sequence, the character 9 followed by the character 8. It will then put out two bytes, one for 9 and one for 8. This two-byte sequence will then be fed into the FileOutputStream that is enclosed by the PrintStream object. The row for the output file out.ps in Table 6.1 corresponds to this case. The print and println methods defined for PrintStream take different types of arguments such as int, double, String, and so on. The println methods output a line separator after the characters corresponding to the data. PrintStream is a byte stream because, in addition to putting out characters, it also supports write methods for outputting individual bytes and arrays of bytes. The standard output stream System.out and the standard error stream System.err are PrintStream objects.
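The conversion from the 4-byte int to its two-character print representation can be observed as follows. The sketch assumes an in-memory sink in place of the file and a platform default encoding that is ASCII-compatible; PrintStreamDemo and printed are our own names.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class PrintStreamDemo {

    // Prints an int through a PrintStream and returns the bytes produced.
    // Assumes the platform default encoding maps digits to their ASCII codes.
    static byte[] printed(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream ps = new PrintStream(sink)) {
            ps.print(value);    // 98 becomes the characters '9' and '8'
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] out = printed(98);
        System.out.println(out.length);                    // 2
        System.out.println(Integer.toHexString(out[0]));   // 39
        System.out.println(Integer.toHexString(out[1]));   // 38
    }
}
```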

Our next attempt, in lines (L) and (M) of the program, consists of

      PrintStream pbs = new PrintStream(
          new BufferedOutputStream( new FileOutputStream( "out.pbs" ) ) );
      pbs.print( anInt );

This produces exactly the same result as the previous example. In both cases, we attach a PrintStream with the output file, the difference here being that the output of the PrintStream is now routed through a BufferedOutputStream. The row for the output file out.pbs in Table 6.1 corresponds to this case.

Our next attempt, in lines (N) and (O), consists of

      PrintWriter pw = new PrintWriter( new FileOutputStream( "out.pw" ) );
      pw.print( anInt );

The various print and println methods of the PrintWriter class behave exactly like those of a PrintStream. So the integer 98 will first be converted into its print representation, that is, a pair of characters consisting of 9 followed by 8. Subsequently, as shown by the entry for the output file out.pw in Table 6.1, a byte will be output for each character. PrintWriter is an example of a character stream because, unlike PrintStream, it does not support any methods for the writing of individual bytes or arrays of bytes.

Our next attempt to write out an int to a disk file, in lines (P) and (Q) of the program, consists of

      PrintWriter pbw = new PrintWriter(
          new BufferedOutputStream( new FileOutputStream( "out.pbw" ) ) );
      pbw.print( anInt );
      pbw.close();

This is exactly the same case as in the lines (N) and (O), except that now we buffer the output of the PrintWriter. The output produced is shown in the row for the output file out.pbw in Table 6.1.

With regard to using compound streams, our next attempt in lines (R) and (S) of the program shows a departure from all the previous cases. Instead of having a FileOutputStream do the actual writing into the disk file, we now give that task to a FileWriter.

      PrintWriter pw2 = new PrintWriter( new FileWriter( "out.pw2" ) );
      pw2.print( anInt );

In this case, the PrintWriter object pw2 outputs the sequence of characters corresponding to the print representation of the argument anInt supplied to the print method. The FileWriter converts the sequence of characters into an array of bytes that it outputs to the file, as shown by the row for out.pw2 in Table 6.1.

All of the stream classes we have discussed so far give us sequential access to a disk file. With sequential access in the write mode, successive items of information are placed at the end of what's already there. But sometimes it is important to be able to read and write data anywhere in a file. Java gives us the RandomAccessFile stream class for that purpose. This class allows us to open a file in either just the read mode or the read/write mode. Since we want to write out an integer, in the code fragment shown below [these are lines (T) and (U) of the program], we have no choice but to open the file in the latter mode. This is done by supplying the string argument "rw" to the stream constructor.

      RandomAccessFile ra = new RandomAccessFile( "out.ra", "rw" );
      ra.writeInt( anInt );
      ra.close();

A RandomAccessFile stream allows us to associate a file pointer with a file. An output operation writes bytes starting at the current position of the file pointer and advances the pointer until it is past the last byte written. (By the same token, input operations that we will talk about later start reading at the current location of the file pointer and the pointer is advanced until it is past the last byte read.) The methods getFilePointer and seek defined for the RandomAccessFile return the current position of the file pointer and allow us to set the position of the pointer under program control. In the example code shown above, we have invoked the writeInt method to write our integer into the output file. This is one of the many write methods defined for RandomAccessFile, one for each primitive type, and some additional ones for the string type. Each of these methods writes out a binary representation of the data. In our example, the four bytes of the integer will be written out, explaining in Table 6.1 the size of the output file out.ra and why the content of the file appears as the letter ‘b' when read in the text mode.
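The role of the file pointer is perhaps easiest to appreciate when a value in the middle of a file is overwritten in place. The following sketch writes three ints, moves the pointer back with seek, and replaces the second one. The class name SeekDemo, the helper overwriteMiddle, and the temporary scratch file are assumptions made only for this illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

class SeekDemo {

    // Writes 10, 20, 30, overwrites the middle int with 99, and reads it back.
    static int overwriteMiddle() {
        try {
            File scratch = File.createTempFile("seekdemo", ".ra"); // hypothetical file
            scratch.deleteOnExit();
            try (RandomAccessFile ra = new RandomAccessFile(scratch, "rw")) {
                ra.writeInt(10);
                ra.writeInt(20);
                ra.writeInt(30);    // file pointer is now at byte 12
                ra.seek(4);         // first byte of the second int
                ra.writeInt(99);    // overwrites 20; file pointer is now at byte 8
                ra.seek(4);
                return ra.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(overwriteMiddle());   // 99
    }
}
```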

The reader might have wondered why we had to designate a separate output file for each case. Why couldn't we have used a single output file and simply appended the output to the file for each case, perhaps in a separate line for clarity? Yes, we could have done so. Java allows the append file mode to be invoked for the two stream classes that we used for the physical transfer of bytes into the output files, FileOutputStream and FileWriter. To open a disk file in the append mode, the constructors for the two classes would have to follow the syntax

      FileOutputStream( String fileName, boolean append )
      FileWriter( String fileName, boolean append )

For the RandomAccessFile stream when used in the read/write mode, new information can be appended to what's already in a file by setting the file pointer to the end of the file:

      RandomAccessFile ra = new RandomAccessFile( "out.ra", "rw" );
      ra.seek( ra.length() );
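The three ways of appending just described can be tried out on one file. What follows is a sketch only: AppendDemo and appendDemo are illustrative names, the file is a temporary scratch file, and the FileWriter step assumes an ASCII-compatible default encoding.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

class AppendDemo {

    // Writes 'a' with truncation, then appends 'b', 'c', 'd' in three ways.
    static String appendDemo() {
        try {
            File scratch = File.createTempFile("appenddemo", ".txt"); // hypothetical
            scratch.deleteOnExit();
            String name = scratch.getPath();

            try (FileOutputStream fos = new FileOutputStream(name)) {
                fos.write('a');         // no append flag: existing content is lost
            }
            try (FileOutputStream fos = new FileOutputStream(name, true)) {
                fos.write('b');         // append flag set
            }
            try (FileWriter fw = new FileWriter(name, true)) {
                fw.write('c');          // append flag set
            }
            try (RandomAccessFile ra = new RandomAccessFile(name, "rw")) {
                ra.seek(ra.length());   // move the file pointer to the end
                ra.write('d');
            }
            return new String(Files.readAllBytes(scratch.toPath()),
                              StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(appendDemo());   // abcd
    }
}
```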

Finally, note that Java's stream classes are in the package java.io, which you'd need to import into your program if it uses any of the stream classes shown here.

6.9.2 Writing Strings

We believe that the reader will gain additional insights into the Java stream classes if we repeat the exercise of the previous subsection for the case of writing a string to a disk file. Although no less illuminating, the discussion here will be shorter since the reader already knows about the basic properties of the stream classes that we will use. For example, the reader already knows that a FileWriter stream accepts a character or an array of characters, and converts the 2-byte Unicode representation of each into a single byte according to the platform's default encoding scheme.

Shown in the program below are ten of the many different ways of writing a string to a disk file. In some cases, the different ways correspond to the different write methods defined for the same stream class.

 
//WriteStringToFile.java

import java.io.*;

class WriteStringToFile {

    public static void main( String[] args ) throws Exception {

        String aString = "hello";                                   //(A)

        FileWriter fw = new FileWriter( "out.fw" );                 //(B)
        fw.write( aString );                                        //(C)
        fw.close();

        DataOutputStream dos = new DataOutputStream(
            new FileOutputStream( "out.dos" ) );                    //(D)
        dos.writeBytes( aString );                                  //(E)
        dos.close();

        DataOutputStream dos2 = new DataOutputStream(
            new FileOutputStream( "out.dos2" ) );                   //(F)
        dos2.writeChars( aString );                                 //(G)
        dos2.close();

        DataOutputStream dos3 = new DataOutputStream(
            new FileOutputStream( "out.dos3" ) );                   //(H)
        dos3.writeUTF( aString );                                   //(I)
        dos3.close();

        PrintStream ps =
            new PrintStream( new FileOutputStream( "out.ps" ) );    //(J)
        ps.print( aString );                                        //(K)
        ps.close();

        PrintWriter pw =
            new PrintWriter( new FileOutputStream( "out.pw" ) );    //(L)
        pw.print( aString );                                        //(M)
        pw.close();

        PrintWriter pw2 =
            new PrintWriter( new FileWriter( "out.pw2" ) );         //(N)
        pw2.print( aString );                                       //(O)
        pw2.close();

        RandomAccessFile ra =
            new RandomAccessFile( "out.ra", "rw" );                 //(P)
        ra.writeBytes( aString );                                   //(Q)
        ra.close();

        RandomAccessFile ra2 =
            new RandomAccessFile( "out.ra2", "rw" );                //(R)
        ra2.writeChars( aString );                                  //(S)
        ra2.close();

        RandomAccessFile ra3 =
            new RandomAccessFile( "out.ra3", "rw" );                //(T)
        ra3.writeUTF( aString );                                    //(U)
        ra3.close();
    }
}

Table 6.2 shows how many bytes were output to the disk file in each case, the actual contents of each file in hex, and what the contents of each file are according to a text reader.

Table 6.2
filename	file size in bytes	file content in hex	output as displayed by 'cat filename' (cat reads in text mode)
out.fw	5	68 65 6c 6c 6f	hello
out.dos	5	68 65 6c 6c 6f	hello
out.dos2	10	00 68 00 65 00 6c 00 6c 00 6f	hello
out.dos3	7	00 05 68 65 6c 6c 6f	hello
out.ps	5	68 65 6c 6c 6f	hello
out.pw	5	68 65 6c 6c 6f	hello
out.pw2	5	68 65 6c 6c 6f	hello
out.ra	5	68 65 6c 6c 6f	hello
out.ra2	10	00 68 00 65 00 6c 00 6c 00 6f	hello
out.ra3	7	00 05 68 65 6c 6c 6f	hello

To explain the output produced by the lines (B) and (C) of the program, the write method of the FileWriter stream class writes out the 5 ASCII bytes corresponding to the five letters of the string "hello". This is confirmed by the contents of the file out.fw shown in hex in the first row of Table 6.2.^[23]

This behavior is also exhibited by the writeBytes method of the DataOutputStream class used in the lines (D) and (E) of the program, as shown by the size and the contents of the file out.dos in Table 6.2. The writeBytes method of DataOutputStream knows how to convert a string into an array of bytes, one for each character of the string. The byte array is then written out by the FileOutputStream object.

Contrast the previous two attempts with the code shown in lines (F) and (G) of the program. We now use the writeChars method of DataOutputStream to write out the string. This method outputs a character stream, meaning that for each character in the string it outputs the two bytes of the Unicode representation of the character. This byte array is then written out to the file by the FileOutputStream object. This explains the 10-byte size of the file out.dos2 in the third row of Table 6.2 and the hex content shown in the next column.

Lines (H) and (I) of the program again use a DataOutputStream, but this time we have used the method writeUTF. This method uses UTF-8 encoding^[24] (strictly speaking, Java's slightly modified form of it) to convert the string "hello" into an array of 7 bytes, the first two of which represent an unsigned integer whose value is the number of bytes needed for the characters of the string. Since the string "hello" consists entirely of ASCII characters and since in UTF-8 each such character is represented by a single byte consisting of the ASCII encoding itself, we have in the output file out.dos3 what's shown in Table 6.2.

The fifth row of the table corresponds to the lines (J) and (K) of the program. We now write out the string by invoking the print method of the PrintStream class. This method converts the string into an array of bytes, one for each character in the string using the platform's default encoding, which in most cases will be the ASCII encoding. As a result, the size of the output file out.ps is 5 and its contents are just the ASCII codes for the five characters of "hello", as shown by the row for out.ps in Table 6.2. Exactly the same thing happens when we invoke the print method of PrintWriter wrapped around FileOutputStream, as in lines (L) and (M), and of PrintWriter wrapped around FileWriter, as in lines (N) and (O). This is demonstrated by the rows for out.pw and out.pw2 in Table 6.2.

The last three rows in Table 6.2 correspond to the output being written out by a RandomAccessFile stream: by using the writeBytes method in lines (P) and (Q); by using the writeChars method in lines (R) and (S); and, finally, by using the writeUTF method in lines (T) and (U). As you'd expect by now and as shown by the row for out.ra in Table 6.2, the writeBytes method writes out a single ASCII encoded byte for each character. On the other hand, the writeChars method writes out the two Unicode bytes for each character, as shown by the row for out.ra2 in Table 6.2. Finally, the writeUTF method outputs the UTF-8 encoding for the entire string, as demonstrated by the row for out.ra3 in Table 6.2.
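The three byte counts that recur in Table 6.2 (5, 10, and 7) can be reproduced in memory with the size method of DataOutputStream. The sketch below uses in-memory sinks instead of files, and also tries a string with one non-ASCII character to show where the three methods begin to differ in more than just size. The names StringEncodingSizes and sizes are ours.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class StringEncodingSizes {

    // Returns the number of bytes produced by writeBytes, writeChars and writeUTF.
    static int[] sizes(String s) {
        try {
            DataOutputStream asBytes =
                new DataOutputStream(new ByteArrayOutputStream());
            asBytes.writeBytes(s);      // low-order byte of each character
            DataOutputStream asChars =
                new DataOutputStream(new ByteArrayOutputStream());
            asChars.writeChars(s);      // two bytes per character
            DataOutputStream asUtf =
                new DataOutputStream(new ByteArrayOutputStream());
            asUtf.writeUTF(s);          // 2-byte length prefix, then the encoding
            return new int[] { asBytes.size(), asChars.size(), asUtf.size() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sizes("hello")));       // [5, 10, 7]
        // \u00e9 (e with acute accent) needs two bytes in UTF-8
        System.out.println(Arrays.toString(sizes("h\u00e9llo")));  // [5, 10, 8]
    }
}
```

Note that writeBytes keeps only the low-order byte of each character, so it is safe only for text whose characters all fit in a single byte.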

6.9.3 Reading the Primitive Types

Compared to the options for writing the primitive types into a file, the options for reading the types are not as numerous. Basically, it boils down to this: If a primitive type is in its binary form, it can be read into a Java program by using one of the methods defined for either the DataInputStream or the RandomAccessFile stream classes. But if the primitive exists in its print representation (meaning as a sequence of characters), then you have to devise your own method for reading it in.
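For the second case, where the int sits in the file in its print representation, one possible home-grown reading method is to read a line of characters and hand it to Integer.parseInt. This is merely one way to do it, sketched under the assumption that each number occupies its own line; ReadPrintedInt and readInt are illustrative names, and a StringReader stands in for a FileReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadPrintedInt {

    // Reads one line of characters from the source and parses it as a decimal int.
    static int readInt(Reader source) {
        try (BufferedReader br = new BufferedReader(source)) {
            String line = br.readLine();
            if (line == null) {
                throw new IllegalStateException("no input to read");
            }
            return Integer.parseInt(line.trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // With a real file one would pass new FileReader( fileName ) instead.
        System.out.println(readInt(new StringReader("98\n")));   // 98
    }
}
```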

Shown below is a program that first writes out an integer in its binary representation into a disk file, out.num, in lines (A) and (B). So for the integer 123456, the four bytes of hex 00 01 e2 40 get written into the file. In lines (C) and (D), we then use the readInt method of the unbuffered DataInputStream to read the four bytes from the file into the local variable x. We do the same through a buffered version of DataInputStream in lines (E) and (F). Then the program does the same thing again by invoking the readInt method of the RandomAccessFile stream class in lines (G) and (H).

The output of the program is shown in the commented out portions of the lines containing calls to the System.out.println method.

 
//ReadIntFromFile.java

import java.io.*;

class ReadIntFromFile {

    public static void main( String[] args ) throws Exception {

        int anInt = 123456;
        int x;

        DataOutputStream dos = new DataOutputStream(
            new FileOutputStream( "out.num" ) );                  //(A)
        dos.writeInt( anInt ); //writes hex 00 01 e2 40 to file   //(B)
        dos.close();

        // read int with DataInputStream
        DataInputStream dis = new DataInputStream(
            new FileInputStream( "out.num" ) );                   //(C)
        x = dis.readInt();                                        //(D)
        System.out.println( x );                 // 123456
        dis.close();

        // read int with buffered DataInputStream
        DataInputStream dbis = new DataInputStream(
            new BufferedInputStream(
                new FileInputStream( "out.num" ) ) );             //(E)
        x = dbis.readInt();                                       //(F)
        System.out.println( x );                 // 123456
        dbis.close();

        // read int with RandomAccessFile
        RandomAccessFile rai =
            new RandomAccessFile( "out.num", "r" );               //(G)
        x = rai.readInt();                                        //(H)
        System.out.println( x );                 // 123456
        rai.close();
    }
}

One could write similar demonstration programs for the reading of the other primitive types, such as longs, floats, doubles, bytes, chars, and so on.

6.9.4 Reading Strings

The following two issues arise in the reading of strings from files, sockets, or any other source:

Does a string have some overall encoding, such as a UTF-8 encoding, or is it represented by a fixed number of bytes for each character?
If each character of a string is represented by the same number of bytes, is it one ASCII-encoded byte per character or is it two Unicode bytes per character?

Since a string can be composed of an arbitrary number of characters, in general it is not possible to devise a readString method that would work for strings the way the readInt of the DataInputStream class works for the reading of integers. For the case of reading integers in their binary representation, the readInt method can safely use the fact that every four bytes represent an integer, implying that it does not have to look for boundaries between consecutive integers. But any method that tries to read a string in one fell swoop would have no way to detect the end of one string and the beginning of another for the cases of ASCII or Unicode encodings. That problem does not arise for strings written out by writeUTF, because the two-byte length prefix tells the reader exactly how many of the bytes that follow belong to the string.

Shown below is a program that first writes two UTF-8 encoded strings consecutively into a file called out.dos in lines (C), (D), and (E). The strings were originally declared in lines (A) and (B). The hex for the bytes that are written into the file is shown commented out in the program. These byte patterns are in accord with our explanation of UTF-8 coding in the previous subsection. Subsequently, the program invokes the method readUTF of the DataInputStream in lines (F), (G), and (H) to read the two strings in the file. Finally, the program does the same with the help of the RandomAccessFile class in lines (I), (J), and (K).

 
//ReadStringFromFile.java

import java.io.*;

class ReadStringFromFile {

    public static void main( String[] args ) throws Exception {

        String aString = "hello";                               //(A)
        String bString = "there";                               //(B)
        String str;

        DataOutputStream dos = new DataOutputStream(
            new FileOutputStream( "out.dos" ) );                //(C)
        //hex output for "hello": 00 05 68 65 6c 6c 6f
        dos.writeUTF( aString );                                //(D)
        //hex output for "there": 00 05 74 68 65 72 65
        dos.writeUTF( bString );                                //(E)
        dos.close();

        DataInputStream dis = new DataInputStream(
            new FileInputStream( "out.dos" ) );                 //(F)
        str = dis.readUTF();                                    //(G)
        System.out.println( str );               // hello
        str = dis.readUTF();                                    //(H)
        System.out.println( str );               // there
        dis.close();

        RandomAccessFile ra =
            new RandomAccessFile( "out.dos", "r" );             //(I)
        str = ra.readUTF();                                     //(J)
        System.out.println( str );               // hello
        str = ra.readUTF();                                     //(K)
        System.out.println( str );               // there
        ra.close();
    }
}

For reading strings that are ASCII encoded or in Unicode, you have to write your own routines that either read a file character by character and look for delimiter characters for the string boundaries, or that read the entire file into a single large string and then use a string tokenizer to recover the strings placed originally in the file.
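The first of these two approaches can be sketched as follows. This is only an illustration under stated assumptions: the class name ReadDelimitedStrings, the method readDelimited, the file name out.txt, and the choice of the newline character as the delimiter are all made up for this example.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

class ReadDelimitedStrings {

    // Reads characters one at a time and cuts a new string at each
    // occurrence of the delimiter. Empty strings are skipped.
    static List<String> readDelimited( Reader in, char delimiter )
                                                  throws IOException {
        List<String> strings = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int ch;
        while ( ( ch = in.read() ) != -1 ) {
            if ( ch == delimiter ) {
                if ( current.length() > 0 ) {
                    strings.add( current.toString() );
                    current.setLength( 0 );
                }
            } else {
                current.append( (char) ch );
            }
        }
        // Keep a trailing string that has no closing delimiter.
        if ( current.length() > 0 ) strings.add( current.toString() );
        return strings;
    }

    public static void main( String[] args ) throws IOException {
        // Write two newline-terminated ASCII strings (no length prefix).
        try ( Writer out = new OutputStreamWriter(
                  new FileOutputStream( "out.txt" ),
                  StandardCharsets.US_ASCII ) ) {
            out.write( "hello\nthere\n" );
        }
        // Read them back character by character.
        try ( Reader in = new BufferedReader( new InputStreamReader(
                  new FileInputStream( "out.txt" ),
                  StandardCharsets.US_ASCII ) ) ) {
            for ( String s : readDelimited( in, '\n' ) )
                System.out.println( s );
        }
    }
}
```

Because readDelimited accepts any Reader, the same routine would work for a Unicode-encoded file if the InputStreamReader were constructed with a different character set.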

^[22]It is a matter of perspective whether you'd consider the number of stream classes in C++ to be "relatively small." The number of C++ stream classes begins to look not so small if (for a fair comparison with Java) you include the "w" prefixed versions of the classes shown in the previous section. C++ stream classes such as wistream, wostream, etc., are designed for dealing with 2-byte character representations.

^[23]In ASCII coding, the hex for the letter 'h' is 68, for the letter 'e' 65, for the letter 'l' 6c, and for the letter 'o' 6f.

^[24]While Unicode has the advantage that its 16 bits allow us to create bit patterns for almost all the characters used in the different languages of the world, it makes for inefficient storage and transmission of text that is in English and other closely related languages (whose characters can be well represented by the 7-bit ASCII code). To get around this problem, a variable-width encoding called UTF-8 was invented. It retains a single-byte representation for the characters in the ASCII set, but has 2-byte and 3-byte representations for the other character sets. This is how Unicode is translated into UTF-8 bytes:

      Characters in the range \u0001 to \u007f encoded as a single byte
      Characters in the range \u0080 to \u07ff encoded as two bytes
      Characters in the range \u0800 to \uffff encoded as three bytes

In order to be able to distinguish between 1-byte, 2-byte, and 3-byte characters in a byte stream, a 1-byte character always begins with the high-order bit 0, a 2-byte character with the high-order bits 110, and a 3-byte character with the high-order bits 1110. The second byte of a 2-byte character and the second and the third bytes of a 3-byte character must always begin with 10 for the high-order bits. This is illustrated pictorially below, where 'x' represents a data bit:

      1-byte character: 0xxx xxxx
      2-byte character: 110x xxxx 10xx xxxx
      3-byte character: 1110 xxxx 10xx xxxx 10xx xxxx
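The bit patterns above can be produced with a few shifts and masks. The following is a minimal sketch, not the encoder used by the Java library; the class name Utf8Bits and the methods encode and toHex are hypothetical names chosen for this illustration. It handles one 16-bit character at a time, encodes the zero character as a single zero byte (as plain UTF-8 does), and makes no attempt to combine surrogate pairs.

```java
class Utf8Bits {

    // Encodes one 16-bit character as 1, 2, or 3 UTF-8 bytes.
    static byte[] encode( char c ) {
        if ( c <= 0x007f ) {
            // 0xxx xxxx : 7 data bits
            return new byte[] { (byte) c };
        } else if ( c <= 0x07ff ) {
            // 110x xxxx 10xx xxxx : 5 + 6 = 11 data bits
            return new byte[] {
                (byte) ( 0xc0 | ( c >> 6 ) ),
                (byte) ( 0x80 | ( c & 0x3f ) ) };
        } else {
            // 1110 xxxx 10xx xxxx 10xx xxxx : 4 + 6 + 6 = 16 data bits
            return new byte[] {
                (byte) ( 0xe0 | ( c >> 12 ) ),
                (byte) ( 0x80 | ( ( c >> 6 ) & 0x3f ) ),
                (byte) ( 0x80 | ( c & 0x3f ) ) };
        }
    }

    // Formats bytes as space-separated two-digit hex values.
    static String toHex( byte[] bytes ) {
        StringBuilder sb = new StringBuilder();
        for ( byte b : bytes ) {
            if ( sb.length() > 0 ) sb.append( ' ' );
            sb.append( String.format( "%02x", b & 0xff ) );
        }
        return sb.toString();
    }

    public static void main( String[] args ) {
        System.out.println( toHex( encode( 'h' ) ) );        // 68
        System.out.println( toHex( encode( '\u00e9' ) ) );   // c3 a9
        System.out.println( toHex( encode( '\u20ac' ) ) );   // e2 82 ac
    }
}
```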

We therefore have 7 data bits for 1-byte characters, 11 for 2-byte characters, and 16 for 3-byte characters. Additionally, the first two bytes of the UTF-8 encoding of a string must represent the number of bytes needed for all the characters in the string. This 2-byte representation of the length must be in the network byte order, meaning that the first byte holds the largest part of the value. This is the agreed-upon byte order for transmitting integer and floating-point numbers over the internet. (Network byte order is the same as the big-endian byte order used by many processors for the representation of integers and floating-point values. The other byte order, used by the x86 family of processors, is the little-endian byte order in which the first byte is the least significant byte. As mentioned earlier in this chapter, Java uses the big-endian byte order for representing numbers regardless of the platform.) Here is a UTF-8 encoding of the string "hello":

      00 05 68 65 6c 6c 6f

while its ASCII encoding would be

      68 65 6c 6c 6f

Java uses UTF-8 with a slight modification: the character \u0000 is encoded using the two bytes 1100 0000 1000 0000, which follows the 2-byte rule shown previously. This eliminates the need for encoding this character with a byte of all zeros. The acronym UTF stands for UCS Transformation Format, where UCS stands for Universal Character Set.
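Both the 2-byte length prefix and the special treatment of \u0000 can be observed by sending the output of writeUTF into a byte array instead of a file. The sketch below is only an illustration; the class name ModifiedUtf8Demo and the helper methods writeUtfHex and toHex are hypothetical names introduced here.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

class ModifiedUtf8Demo {

    // Formats bytes as space-separated two-digit hex values.
    static String toHex( byte[] bytes ) {
        StringBuilder sb = new StringBuilder();
        for ( byte b : bytes ) {
            if ( sb.length() > 0 ) sb.append( ' ' );
            sb.append( String.format( "%02x", b & 0xff ) );
        }
        return sb.toString();
    }

    // Returns the hex of the bytes that writeUTF produces for s.
    static String writeUtfHex( String s ) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try ( DataOutputStream dos = new DataOutputStream( bytes ) ) {
            dos.writeUTF( s );
        }
        return toHex( bytes.toByteArray() );
    }

    public static void main( String[] args ) throws IOException {
        // Length prefix 00 05 in big-endian order, then five 1-byte chars.
        System.out.println( writeUtfHex( "hello" ) );   // 00 05 68 65 6c 6c 6f

        // The zero character takes two bytes in the writeUTF format.
        System.out.println( writeUtfHex( "\u0000" ) );  // 00 02 c0 80

        // For contrast, the standard UTF-8 charset uses a single zero byte
        // and writes no length prefix.
        System.out.println(
            toHex( "\u0000".getBytes( StandardCharsets.UTF_8 ) ) );   // 00
    }
}
```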