Chapter 12. Working with Java Streams | Python Programming with the Java™ Class Libraries: A Tutorial for Building Web and Enterprise Applications with Jython

CONTENTS

The Java Way of File Operations
Text Streams
Binary Streams: InputStream and OutputStream
DataInput and DataOutput
The File Class
The RandomAccessFile Class
The StreamTokenizer Class
Persisting Objects with Java Streams
Using Java Streams to Work with Memory
Summary

Terms in This Chapter

Abstract class
ASCII
autoexec.bat file
Buffering
Caching
Callable object
Canonical path
Chaining
Character/byte stream
Class hierarchy
Concrete class
Current directory
Debug utility
Design pattern
File descriptor
File path (relative/absolute)
Flag

Helper class
Interface
Java networking APIs
Parameter
Parent directory
Polymorphism
Path separator
Prefetching
Separator
Single inheritance
Slice
Stream (binary/text)
Source code
Source file
Token
Unicode

Streams are the Java programming language's way to support I/O. A stream can represent a file, a network connection, or access to a Web site. Learning to deal with Java streams is essential for understanding Java's networking APIs.

Most of the time conversion to and from the Java type system is transparent. When it isn't, this chapter will demonstrate how to do low-level type conversion straightforwardly.

As If One Way Weren't Bad Enough

The joke is that there are always two ways of doing things in Jython: the Python way and the Java way. For example, if you use Python to prototype for Java applications, you need to know how Java does it. You also need Java streams to do various forms of Java I/O, such as the I/O performed through the networking APIs.

The Java Way of File Operations

Interfaces and classes for dealing with files and other I/O types are in the java.io package. An interface is like a class that contains only abstract methods. Classes in java.io form a class hierarchy.

The two main class types in java.io are text oriented (character streams) and binary oriented (byte streams). Subclasses of the Reader and Writer classes are text oriented; those of the InputStream and OutputStream classes are binary oriented.

InputStream and OutputStream are abstract; that is, they can't be instantiated directly. To use an abstract class you must subclass it and instantiate the subclass. The subclasses of InputStream and OutputStream allow the reading and writing of binary data from and to various types of input and output such as byte arrays (memory), files, and even network sockets.

Streams can be chained to provide extra functionality. For example, you can buffer a FileInputStream by chaining it to a BufferedInputStream. Then you can chain the BufferedInputStream to an ObjectInputStream to read in whole objects at one time. (This is similar to the pickle functionality in Python.)

Java 1.1's binary data input streams are complemented by somewhat equivalent text input streams. The parents of these classes are the abstract classes Reader and Writer. Having an equivalent set of text-oriented character stream classes allows the conversion of Unicode text. Readers and Writers, like streams, can be chained together. For example, you can buffer a FileReader by chaining it to a BufferedReader. (Buffering will be explained shortly.)

I/O Classes to Be Covered

There are more than thirty I/O classes, not including interfaces and abstract classes. This seems like a lot, but if you understand how Reader, Writer, InputStream, and OutputStream work, you can easily understand the rest.

Reader and Writer subclasses deal with character streams, that is, text. InputStream and OutputStream subclasses deal with binary streams. An easy way to remember this is, if you can read it, use Reader and Writer; if you can't, use InputStream and OutputStream.

For Beginners: Understanding Streams

Think of a stream as an abstract file. Just as you can read and write to a file, you can read and write to a stream. You might use a stream for reading and writing with an RS-232 serial connection, a TCP/IP connection, or a memory location (like a sequence). A stream is abstract, so if you know how to read and write to a file, you basically already know how to read and write to a memory location or an RS-232 serial connection or, for that matter, a Web site. This is the art and magic of polymorphism.
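The polymorphism described above can be sketched in plain Python, without any Java classes. The two stream classes (StringSource and ListSource) and the read_all() function are hypothetical names invented for this illustration; they are not part of java.io or Jython. The point is only that read_all() works with any object that offers a read() method.

```python
class StringSource:
    """Hypothetical stream that reads characters from a string in memory."""
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def read(self):
        # Return the next character, or "" when the data is used up.
        if self.pos >= len(self.text):
            return ""
        ch = self.text[self.pos]
        self.pos += 1
        return ch


class ListSource:
    """Hypothetical stream that reads characters from a list of chunks."""
    def __init__(self, chunks):
        self.chars = [ch for chunk in chunks for ch in chunk]

    def read(self):
        if not self.chars:
            return ""
        return self.chars.pop(0)


def read_all(stream):
    # Works with any object that has read(); it never asks what kind it is.
    result = []
    ch = stream.read()
    while ch != "":
        result.append(ch)
        ch = stream.read()
    return "".join(result)


print(read_all(StringSource("abc")))
print(read_all(ListSource(["ab", "c"])))
```

Both calls go through the same read_all() code even though the data comes from different kinds of sources, which is the same arrangement java.io uses for files, memory, and network connections.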

Text Streams

All of the text-oriented classes are derived from Reader and Writer. By understanding them, you'll have a handle on dealing with their descendants.

Writer

The Writer class writes character data to a stream. It has the following methods:

write(c) writes a single character to the text stream
write(cbuf) writes a sequence of characters to the text stream
write(cbuf, off, len) writes len characters from the sequence cbuf, starting at offset off in the sequence, to the text stream
write(str) writes out a string to the text stream
write(str, off, len) writes out len characters of the string, starting at the given offset
close() closes the text stream
flush() forces any buffered output to be written to the text stream; useful when the stream buffers its output

Writer is the superclass of all character output streams, for example, FileWriter, BufferedWriter, CharArrayWriter, OutputStreamWriter, and PrintWriter.

Reader

The Reader class reads data from a stream. Reader is the superclass of all character input streams. It has the following methods:

read() reads in a single character
read(cbuf) reads in a sequence of characters
read(cbuf, iOff, iLen) reads up to iLen characters from the current position in the file into the sequence, starting at the given offset in the sequence
ready() determines if the buffer has input data ready
mark(iReadAheadLimit) marks the current position in the stream; the argument limits how many characters can be read while still preserving the mark
markSupported() returns true if the mark() method is supported
skip(n) skips n characters ahead in the text input stream
reset() moves the file pointer to the marked position
close() closes the file

I can't show you any examples of using Reader or Writer because they're abstract and can't be instantiated on their own. However, I can show examples of their subclasses, FileReader and FileWriter, which are concrete.

As an exercise, look up Reader and Writer in the Java API documentation, and compare their Python-friendly method definitions to the official Java versions. Notice the conversion from one type to another.

FileReader and FileWriter

FileReader and FileWriter read and write text files. They have the same methods their base classes have as well as the following constructor methods.

FileReader:

__init__(strFilename) opens the file specified by the string (strFilename)
__init__(File) opens the file specified by the file object
__init__(fd) opens the file specified by the file descriptor

FileWriter:

__init__(strFilename) opens the file specified by the string (strFilename)
__init__(strFilename, bAppend) same as above, but optionally opens the file in append mode
__init__(File) opens the file specified by the file object
__init__(fd) opens the file specified by the file descriptor

As an exercise, look up FileReader and FileWriter in the Java API documentation. Then modify the address book application from Chapter 8 to use these classes.

Creating a File with FileWriter

The following interactive session creates a file with FileWriter and writes some text to it. (Follow along.)

Import the FileWriter class from the java.io package.

>>> from java.io import FileWriter

Create an instance of it.

>>> fw = FileWriter("c:\\dat\\File.txt")

Write a single character to the output character stream.

>>> fw.write('w')

Write out a string.

>>> fw.write("\r\nthis is a string")

Write out a sequence of characters.

>>> characters = ('\r','\n','a','b','c')
>>> fw.write(characters)

Close the file.

>>> fw.close()

Here's what the file we just created, called c:\dat\file.txt, looks like. You can open it with Notepad or any other text editor.

w
this is a string
abc

Finding the File Length

You can use a java.io.File instance to see the length of c:\dat\file.txt. It should be 24 bytes: 1 for the single character, 18 for the string, and 5 for the character sequence.

>>> from java.io import File
>>> f = File("c:\\dat\\File.txt")
>>> f.length()
24L

Don't worry too much about what the File class does; we'll cover it later. For now, just think of it as a way to specify a file and get its properties.

Reopen c:\dat\file.txt in append mode.

>>> fw = FileWriter('c:\\dat\\File.txt', 1)

Append some text.

>>> fw.write("\r\nAnother String appended to the File")
>>> fw.close()

We open the file in append mode by passing a true (1) to the second parameter of the FileWriter constructor. This means that we want to append text. You can see from the following example (c:\dat\file.txt) that the text has been added.

w
this is a string
abc
Another String appended to the File

If we now write to the file and don't open it in append mode, all of the old text will be deleted.

>>> fw = FileWriter(f)
>>> fw.write("Oops where is all the other text")
>>> fw.close()
>>> f.length()
32L

Look at the file with a text editor. It should contain only the text from the last write statement.

By the way, the concepts just covered for working with FileWriter aren't much different from those for working with the Python file object.
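For comparison, here is a minimal sketch of the same append-versus-overwrite behavior using the Python file object. It writes to a scratch file in a temporary directory rather than c:\dat\file.txt, and the read_text() helper is a hypothetical name introduced only for this example. Plain "\n" is used instead of "\r\n" because the file is opened in text mode.

```python
import os
import tempfile

# Hypothetical scratch location; the chapter's sessions use c:\dat\file.txt.
path = os.path.join(tempfile.mkdtemp(), "file.txt")


def read_text(name):
    # Helper for this sketch: return the whole file as one string.
    f = open(name, "r")
    try:
        return f.read()
    finally:
        f.close()


f = open(path, "w")      # "w" creates the file or truncates an existing one
f.write("w\nthis is a string\nabc")
f.close()

f = open(path, "a")      # "a" appends, like FileWriter(name, 1)
f.write("\nAnother String appended to the File")
f.close()
appended = read_text(path)

f = open(path, "w")      # reopening with "w" discards the old text
f.write("Oops where is all the other text")
f.close()
overwritten = read_text(path)

print(appended)
print(overwritten)
print(len(overwritten))
```

The mode string passed to open() plays the role of the second argument to the FileWriter constructor.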

Reading a File with FileReader

Now it's FileReader's turn with an interactive session. We'll open c:\dat\file.txt and read the text we wrote to it with FileWriter.

Import the FileReader class from the java.io package, and create an instance.

>>> from java.io import FileReader
>>> File_Reader = FileReader("C:\\dat\\File.txt")

The tricky part is that the read() method expects you to pass it a chars array that it can fill in, but the closest thing in Python to a Java array is a sequence or list, and neither is close enough. So Jython has added a helper class, called PyArray, that passes primitive arrays to Java methods (see Chapter 11).

Import the zeros() function from the jarray module.

>>> from jarray import zeros

Use zeros() to create a PyArray.

>>> chars = zeros(4, 'c')

Print the chars array to the screen.

>>> chars
array(['\x0', '\x0', '\x0', '\x0'], char)

The first argument to zeros() tells it how big we want our array. The second argument tells zeros() that we want our array to be of type char. Now zeros() creates an array in which all of the values are zeros. When we call File_Reader.read, the read() method fills in the empty slots with characters from the file.

Here's our call to the read() method, which should read the first four letters of the file.

>>> File_Reader.read(chars)
4

As you can see, chars contains the first four letters in our file, 'O', 'o', 'p', and 's'.

>>> chars
array(['O', 'o', 'p', 's'], char)

This is nice, but what we really want is a string object, 'Oops'. Fortunately, a PyArray can be used anywhere a sequence is suitable, including with the functions of the Python class library.

The Python string module has functions for doing special things with strings. One function, joinfields(), combines strings in a sequence to form a string object. Let's use joinfields() to convert chars to a string.

Import the string module.

>>> import string

Use joinfields() to turn the chars array into a Python string.

>>> first_word = string.joinfields(chars, "")

Print the value of first_word to the screen.

>>> first_word
'Oops'

Notice that the second argument to joinfields() specifies the separator we want for the fields. Since we don't want any separator, we pass an empty string.
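As a side note, the separator idea can also be sketched with the join() method of string objects, which takes the separator as the object and the sequence as the argument. This is offered as an alternative spelling, not as what the chapter's sessions use; the list of characters below stands in for the chars array.

```python
# A plain list stands in for the PyArray of characters read from the file.
chars = ['O', 'o', 'p', 's']

# An empty separator glues the characters together with nothing in between.
first_word = "".join(chars)

# Any other separator is placed between each pair of items.
dashed = "-".join(chars)

print(first_word)
print(dashed)
```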

Let's recap what we've learned to read in the whole c:\dat\file.txt file at once.

>>> from jarray import zeros             # import the zeros to create an array of primitives
>>> from java.io import FileReader, File # import the File to get the length of the file
>>> from string import joinfields        # import the joinfields to convert arrays to strings
>>> file = File("c:\\dat\\file.txt")     # create a java.io.File instance
>>> chars = zeros(file.length(), 'c')    # create an array to hold the contents of file.txt
>>> file_Reader = FileReader(file)       # create a FileReader instance to read the file
>>>
>>> file_Reader.read(chars)              # Read in the whole file
32

That took only eight steps. Let's see the same thing with the Python file object.

>>> python_file = open("c:\\dat\\file.txt","r")   # open the file
>>> str = python_file.read()                      # read the contents of the file
>>> print str
Oops where is all the other text

What took eight steps with the java.io.FileReader instance now takes two steps. Which one would you rather work with?

BufferedReader and BufferedWriter

The BufferedReader class provides input buffering to the Reader stream. The BufferedWriter class provides output buffering to the Writer stream. Input buffering consists of prefetching data and caching it in a memory queue so that, for example, not every call to the read() method equates to a read operation on the physical I/O. Output buffering applies writes to a memory image that is periodically written out to a character stream. You want buffering support for input and output streams, especially large ones, to gain speed and efficiency.
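To make the idea of output buffering concrete, here is a minimal sketch in plain Python. CountingSink and BufferedSink are hypothetical classes written for this illustration only; they imitate the idea behind BufferedWriter but are not how java.io implements it. The sink counts how many times the "physical" device is touched.

```python
class CountingSink:
    """Hypothetical stand-in for a slow device; counts physical writes."""
    def __init__(self):
        self.physical_writes = 0
        self.data = []

    def write(self, text):
        self.physical_writes += 1
        self.data.append(text)


class BufferedSink:
    """Hypothetical wrapper that collects writes in memory before passing them on."""
    def __init__(self, sink, buf_size):
        self.sink = sink
        self.buf_size = buf_size
        self.pending = []
        self.pending_len = 0

    def write(self, text):
        # Keep the text in memory; touch the device only when the buffer fills.
        self.pending.append(text)
        self.pending_len += len(text)
        if self.pending_len >= self.buf_size:
            self.flush()

    def flush(self):
        # Push whatever is pending to the device in a single write.
        if self.pending:
            self.sink.write("".join(self.pending))
            self.pending = []
            self.pending_len = 0


device = CountingSink()
out = BufferedSink(device, 64)
for i in range(10):
    out.write("Line %d\n" % i)
out.flush()

print(device.physical_writes)
print("".join(device.data))
```

Ten calls to write() reach the device far fewer than ten times, which is where the speed gain of buffering comes from.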

BufferedReader and BufferedWriter have all of the methods that Reader and Writer have, as well as the following:

BufferedWriter:

__init__ (Writer_in) creates a BufferedWriter instance with the specified output stream
__init__(Writer_in, iBufSize) same as above but specifies the size of the buffer
newLine() writes a newline character to the output stream

BufferedReader:

__init__ (Reader_in) creates a BufferedReader instance with the specified input stream
__init__(Reader_in, iBufSize) same as above but specifies the size of the buffer
readLine() reads in a line of text from the input stream

Using the Buffering Classes

Let's have a short interactive session showing how to use our buffering classes. We'll also cover the newLine() and readLine() methods.

Import the BufferedReader and BufferedWriter classes and the FileReader and FileWriter classes.

>>> from java.io import BufferedReader, BufferedWriter, FileReader, FileWriter

Create a FileWriter instance that creates a file called c:\dat\buf_file.txt.

>>> file_out = FileWriter("c:\\dat\\buf_file.txt")

Create a BufferedWriter instance, buffer_out, passing file_out as an argument to the BufferedWriter constructor.

>>> buffer_out = BufferedWriter(file_out)

Write three lines of text to the file using write() to write the characters and then newLine() to write the platform-specific newline characters.

>>> buffer_out.write("Line 1"); buffer_out.newLine()
>>> buffer_out.write("Line 2"); buffer_out.newLine()
>>> buffer_out.write("Line 3"); buffer_out.newLine()

Close the stream.

>>> buffer_out.close()

This code demonstrates chaining a BufferedWriter to a FileWriter, which adds buffering to file output.

The next session uses FileReader to open the file we created in the last example.

Create a FileReader instance, and pass it to the BufferedReader constructor to create a BufferedReader instance.

>>> file_in = BufferedReader(FileReader("c:\\dat\\buf_file.txt"))

Read in all three lines at once with three method calls to readLine().

>>> line1, line2, line3 = file_in.readLine(), file_in.readLine(), file_in.readLine()

Print all three lines at once.

>>> print line1; print line2; print line3
Line 1
Line 2
Line 3

You may be wondering why there's a newLine() function in BufferedWriter. It's there because it knows how to represent a newline character on whatever operating system you happen to be writing your code for. (We used "\r\n" in our FileWriter example, which won't work on UNIX or Mac.)

PrintWriter

PrintWriter provides a print() and a println() function for all primitive types. These functions convert primitive types to characters and then write them to the output stream. PrintWriter has all of the write() methods of Writer, as well as these:

print() writes out primitive data types to an output stream (as readable text)
println() same as above, but adds a newline character to the output

In the next example, you'll see how to use PrintWriter, how to chain output streams together, and how to work with Java primitive types.

Import the classes needed.

>>> from java.io import PrintWriter, FileWriter, BufferedWriter

Create a PrintWriter by passing it a new instance of BufferedWriter, which is created from a new instance of FileWriter.

>>> out = PrintWriter(BufferedWriter(FileWriter("pr.txt")))

Write three strings to a file using println().

>>> out.println("Line1")
>>> out.println("Line2")
>>> out.println("Line3")

Use println() to write a Python Double object and a Python Integer object to a file.

>>> out.println(4.4)
>>> out.println(5)

Write a Java Boolean to the file.

>>> from java.lang import Boolean
>>> out.println(Boolean(1))
>>> out.println(Boolean("true"))

Write a Java Byte to the file.

>>> from java.lang import Byte
>>> out.println(Byte(1))

Show how print() works.

>>> out.print("Line 9")
>>> out.print("still line 9")
>>> out.close()

(It also works with primitive types.)

Now you know how to chain stream classes to add the functionality you want. The out instance, for example, can write to files from FileWriter, work with output buffering from BufferedWriter, and work with primitive types from PrintWriter. By the way, the technique of chaining Writer stream classes is known as the Decorator design pattern. To learn more about it, read Design Patterns (Gamma et al., [1995]).
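The Decorator idea behind chaining can be sketched in a few lines of plain Python. The three classes below (MemoryWriter, UpperCaseWriter, and PrintingWriter) are hypothetical and exist only for this illustration; they are not the java.io classes. Each wrapper holds an inner writer, adds one feature, and delegates the rest.

```python
class MemoryWriter:
    """Hypothetical innermost writer: collects text in a list."""
    def __init__(self):
        self.parts = []

    def write(self, text):
        self.parts.append(text)


class UpperCaseWriter:
    """Hypothetical decorator: changes the text, then delegates to the inner writer."""
    def __init__(self, inner):
        self.inner = inner

    def write(self, text):
        self.inner.write(text.upper())


class PrintingWriter:
    """Hypothetical decorator: adds println() on top of any writer."""
    def __init__(self, inner):
        self.inner = inner

    def write(self, text):
        self.inner.write(text)

    def println(self, value):
        # Convert any value to text and end the line.
        self.inner.write(str(value) + "\n")


memory = MemoryWriter()
out = PrintingWriter(UpperCaseWriter(memory))   # chain the writers together
out.println("Line1")
out.println(4.4)
out.println(5)
result = "".join(memory.parts)
print(result)
```

Because every wrapper exposes write(), the wrappers can be stacked in any order, just as PrintWriter, BufferedWriter, and FileWriter were stacked in the session above.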

Binary Streams: InputStream and OutputStream

InputStream is the analog of Reader; OutputStream is the analog of Writer. Their methods are listed here.

InputStream:

read(byte_sequence) reads a sequence of bytes and returns the number of bytes actually read
read(byte_sequence, off, len) same as above but allows you to set the slice for the sequence; returns the number of actual bytes read
read() reads one byte of data from the input stream
skip(n) skips ahead n bytes in the file
close() closes the stream
reset() moves the file pointer to the marked position
mark(iReadlimit) marks the file pointer
available() returns the number of bytes available that can be read without blocking (similar to the Reader class's ready() method)

OutputStream:

close() closes the stream
flush() flushes the buffer, that is, forces the output buffer to be written to the output stream
write(b) writes a single byte to the output stream
write(byte_sequence) writes a sequence of bytes to the output stream
write(byte_sequence, off, len) same as above, but specifies the slice

FileInputStream and FileOutputStream

FileInputStream and FileOutputStream extend InputStream and OutputStream, respectively. They are the analogs to FileReader and FileWriter. Here are their constructors.

FileInputStream:

__init__(strFilename) opens the file specified by the string (strFilename)
__init__(File) opens the file specified by the file object
__init__(fd) opens the file specified by the file descriptor

FileOutputStream:

__init__(strFilename) opens the file specified by the string (strFilename)
__init__(strFilename, bAppend) same as above, but optionally opens the file in append mode
__init__(File) opens the file specified by the file object
__init__(fd) opens the file specified by the file descriptor

Reading and writing to a file with FileInputStream and FileOutputStream are a lot like reading and writing to a file with FileReader and FileWriter. That being the case, I'm omitting much of the detail in the following example (OutputStream1.py) to avoid repetition.

# First we import the FileOutputStream class from the java.io package
from java.io import FileOutputStream
from jarray import array

# Next create an instance of the FileOutputStream
out = FileOutputStream("c:\\dat\\file.bin")

# Now write a byte to the output binary stream
out.write(1)

# Now write some more bytes to the binary stream
out.write([2,3,4,5,6,7,8,9,10,11,12,13,14,15,16])

# Next write out a string
#out.write("\r\nthis is a string") #Jython converts this automatically

# Here is the hard way
str = "\r\nthis is a string"

# First convert every character in the string into its ASCII equivalent
seq = map(ord, str)

# Now convert the sequence into a Java array of the primitive type byte
bytes = array(seq, 'b')

# Now write out this string to the file as follows
out.write(bytes)
out.close()

The code above may seem familiar at first, but as you examine it you'll notice a couple of strange things. First, since write() works only with bytes and byte arrays, we have to use the jarray module again. Second, Jython converts a string into a byte array automatically, but if the class has two write() methods (one that takes a string and one that takes a byte array), we have to do some extra work to make sure that the correct method is called. Thus, we have to convert the string to a byte array if we want the byte array method.

Using the intrinsic functions ord() and map(), we'll see how to manually convert the string into a byte array before passing it to the write() method.

ord()

Remember from Chapter 9 that the ord() function takes a single character from a string and converts it to its ASCII equivalent (i.e., the number representation of the character stored in the file). Let's work with an interactive example.

Convert "A", "B", and "C" to ASCII.

>>> ord("A"), ord("B"), ord("C")
(65, 66, 67)

Convert "a", "b", and "c" to ASCII.

>>> ord("a"), ord("b"), ord("c")
(97, 98, 99)

Figure the distance from "a" to "z" and from "A" to "Z".

>>> ord("z") - ord("a")
25
>>> ord("Z") - ord("A")
25

map()

The built-in map() function deals with sequences, as we also learned in Chapter 9. It takes two arguments: first, a callable object such as a function or a method; second, a type of sequence like a string, list, or tuple. map() executes the callable object against every item in the sequence and then returns a list of the results. Thus,

>>> [ord("a"), ord("b"), ord("c"), ord("d")]
[97, 98, 99, 100]

is the same as

>>> map(ord,"abcd")
[97, 98, 99, 100]

Once we have a sequence, we can convert it into a byte array using the array() function from jarray (see the section on FileReader). array() takes two arguments: the first is a sequence and the second is a character representing the Java primitive type you want the array to be. (This is an extra step; Jython automatically converts an integer sequence into a byte array.)

This code creates an array full of bytes:

>>> from jarray import array
>>> seq = map(ord, "abcd")
>>> array(seq, 'b')
array([97, 98, 99, 100], byte)

The Debug Utility

Of course, having both binary data and text data makes the file difficult to read. We can at least read the text part with a text editor, but besides the text there's only a lot of black boxes where our binary data should be. To see the binary contents of the file, we need another program.

If you're running some variation of Windows or DOS, you can use the debug utility by typing debug file.bin at the C:\dat> prompt. As shown below, the d command (entered at debug's - prompt) dumps some of the file to the screen, and the q command quits the program. You enter the commands at the prompt much the way you enter statements in Jython.

C:\dat>debug File.bin
-d
0E7F:0100 01 02 03 04 05 06 07 08-09 0A 0B 0C 0D 0E 0F 10 ................
0E7F:0110 0D 0A 74 68 69 73 20 69-73 20 61 20 73 74 72 69 ..this is a stri
0E7F:0120 6E 67 2B DE 59 03 CB 8B-D6 C6 06 BB DB 00 E3 31 ng+.Y..........1
0E7F:0130 49 AC E8 D9 F6 74 08 49-46 FE 06 BB DB EB EF E8 I....t.IF.......
0E7F:0140 DB F9 75 04 FE 06 17 D9-3C 3F 75 05 80 0E 1B D9 ..u.....<?u.....
0E7F:0150 02 3C 2A 75 05 80 0E 1B-D9 02 3A 06 02 D3 75 C9 .<*u......:...u.
0E7F:0160 4E 32 C0 86 04 46 3C 0D-75 02 88 04 89 36 D9 D7 N2...F<.u....6..
0E7F:0170 89 0E D7 D7 C3 BE BC DB-8B 4C 05 8B 74 09 E8 08 .........L..t...
-q

What you have is three main columns of data. To the far left is the position of the data in hexadecimal notation. In the middle is the value of the binary data, also in hexadecimal notation, and to the far right is the data's ASCII equivalent, with a dot standing in for each byte that isn't a printable character. Only the first 34 bytes (up to the "ng" that ends the string) belong to our file; the rest is whatever happened to be in memory.
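To see how the three columns relate to each other, here is a rough sketch of a dump routine in plain Python. The hexdump() and printable() functions are hypothetical helpers written for this illustration; they only imitate the layout of debug's output and leave out the segment part of the address (the 0E7F: prefix).

```python
def printable(value):
    # Printable ASCII is shown as is; everything else becomes a dot.
    if 32 <= value < 127:
        return chr(value)
    return "."


def hexdump(values, start=0x0100):
    """Return debug-style dump lines for a sequence of byte values (0-255)."""
    lines = []
    for offset in range(0, len(values), 16):
        row = values[offset:offset + 16]
        cells = ["%02X" % v for v in row]
        # debug puts a dash between the 8th and 9th bytes of each row.
        middle = " ".join(cells[:8])
        if len(row) > 8:
            middle = middle + "-" + " ".join(cells[8:])
        text = "".join([printable(v) for v in row])
        lines.append("%04X %-47s %s" % (start + offset, middle, text))
    return lines


# The same bytes the OutputStream1.py example writes to file.bin.
data = list(range(1, 17)) + [ord(c) for c in "\r\nthis is a string"]
dump = hexdump(data)
for line in dump:
    print(line)
```

Unlike debug, this sketch stops at the end of the data, so the last row shows only the two bytes for "ng".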

Hexadecimal

Hexadecimal is a base 16 number system. This list shows some decimal numbers and their hexadecimal equivalents:

Decimal	Hexadecimal
1	1
2	2
3	3
4	4
5	5
6	6
7	7
8	8
9	9
10	A
11	B
12	C
13	D
14	E
15	F
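If you'd rather let Python do the conversions in the table, the built-in hex() and int() functions and the %X format code cover both directions. This is a small sketch; the variable names are made up for the example.

```python
# Decimal to hexadecimal: hex() returns a string with a 0x prefix.
ten = hex(10)
sixteen = hex(16)

# Hexadecimal to decimal: int() takes the base as its second argument.
thirty_one = int("1F", 16)

# The %X format code gives uppercase digits; 02 pads to two places,
# which is how each byte appears in the debug dump.
padded = "%02X" % 10
top = "%02X" % 255

print(ten)
print(sixteen)
print(thirty_one)
print(padded)
print(top)
```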

Here's the first line of binary data. Every two digits represent a single byte.

01 02 03 04 05 06 07 08-09 0A 0B 0C 0D 0E 0F 10

We wrote it with the following calls:

out.write(1)
out.write([2,3,4,5,6,7,8,9,10,11,12,13,14,15,16])

You can see that the bytes in the file increase from 1 to 16 (0x01 to 0x10 in hexadecimal).

Let's see how to read the file back in by reading in the string.

Import FileInputStream, and then create an instance that refers to c:\dat\file.bin.

>>> from java.io import FileInputStream
>>> file = FileInputStream("c:\\dat\\file.bin")

We'll use the jarray module to create binary input buffers. The 'b' flag, the second argument to the zeros() function call, signifies a byte array. We'll use input_buffer to read in the first 16 bytes of data.

Import the zeros() function from the jarray module.

>>> from jarray import zeros

Create a byte input buffer with zeros().

>>> input_buffer = zeros(16, 'b')

Read in the first 16 bytes to the buffer with the read() method, which returns the number of bytes read in.

>>> file.read(input_buffer)
16

Print the array of bytes buffered.

>>> input_buffer
array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], byte)

Create a small buffer to read "\r\n".

>>> input_buffer = zeros (2,'b')
>>> file.read(input_buffer)
2

Read in "this is a string" from the file.

>>> input_buffer = zeros(16,'b')      #read in the bytes for the string
>>> file.read(input_buffer)
16

Since we read this in as bytes, the code shows the ASCII equivalents for the characters we want instead of the actual characters of the string. We need to convert this byte array into a string sequence.

>>> input_buffer
array([116, 104, 105, 115, 32, 105, 115, 32, 97, 32, 115, 116, 114, 105, 110, 103], byte)

To do the conversion we create a sequence and iterate through the byte array. At the same time we append the results of executing the chr() built-in function against each byte array item. chr() is the reverse of ord(); it converts a number (ASCII code) into a character (i.e., a string with one item).

Lastly, we convert the sequence into a string with a call to string.joinfields().

>>> seq = []                       # create an empty sequence
>>> for num in input_buffer:       # for every number in the input_buffer
...     seq.append(chr(num))       # append the results of chr(num) to the sequence
...
>>> seq                            # show what the sequence contains
['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 's', 't', 'r', 'i', 'n', 'g']
>>> import string                    # import the string module
>>> str = string.joinfields(seq, "") # create the string from the sequence
>>> print str                        # print out the string
this is a string

Of course, a more Pythonesque way to convert the byte array into a string is with the built-in map() function.

>>> str = string.joinfields(map(chr,input_buffer),"")
>>> print str
this is a string

map() can be very useful; it takes one line of code to accomplish what previously took four.
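Since ord() and chr() undo each other, the two conversions can be wrapped in a pair of small helpers and checked with a round trip. The function names to_codes() and to_text() are hypothetical, chosen only for this sketch, and a plain list stands in for the Java byte array. List comprehensions are used here in place of map(); they produce the same lists.

```python
def to_codes(text):
    # Same result as map(ord, text) in the sessions above.
    return [ord(ch) for ch in text]


def to_text(codes):
    # Same result as joining map(chr, codes) with an empty separator.
    return "".join([chr(n) for n in codes])


codes = to_codes("this is a string")
round_trip = to_text(codes)

print(codes)
print(round_trip)
```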

BufferedInputStream and BufferedOutputStream

BufferedInputStream and BufferedOutputStream provide the same kind of support to InputStream and OutputStream that BufferedReader and BufferedWriter provide to Reader and Writer: input and output buffering to their respective derivations. As I said before, input buffering entails prefetching data and caching it in memory so that read operations don't have to fetch the data off a slow device. Output buffering applies write operations to a memory image that's periodically written out to a stream.

BufferedInputStream and BufferedOutputStream have all of the methods that InputStream and OutputStream have plus the following:

BufferedInputStream:

__init__(InputStream) creates an input buffer for the specified input stream
__init__(InputStream, iBufSize) same as above, but specifies the size of the input buffer

BufferedOutputStream:

__init__(OutputStream) creates an output buffer for the specified output stream
__init__(OutputStream, iBufSize) same as above, but specifies the size of the output buffer

The mark(), markSupported(), reset(), skip(), and flush() methods provide the buffering support.

BufferedOutputStream can be chained to an existing stream, as the following example shows. Import BufferedOutputStream and FileOutputStream.

>>> from java.io import BufferedOutputStream, FileOutputStream

Create a BufferedOutputStream instance chained to a new FileOutputStream instance.

>>> out = BufferedOutputStream(FileOutputStream("bufout.bin"))

Write a sequence of bytes from 0 to 16 (0x00 to 0x10 in hexadecimal).

>>> out.write([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])

Write a sequence of bytes from 17 to 31 (0x11 to 0x1F in hexadecimal).

>>> out.write([17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31])

Write 255 (0xFF hexadecimal) a few times.

>>> for index in xrange(0,16):
...     out.write(255)
...

Here's another way of writing out a string:

>>> str = "Hello"
>>> for char in str:
...     out.write(ord(char))
...
>>> out.close()

The output for the file created looks like this (from C:\dat>debug bufout.bin):

-d
0E7F:0100 00 01 02 03 04 05 06 07-08 09 0A 0B 0C 0D 0E 0F ................
0E7F:0110 10 11 12 13 14 15 16 17-18 19 1A 1B 1C 1D 1E 1F ................
0E7F:0120 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0E7F:0130 48 65 6C 6C 6F 00 00 00-00 00 00 00 00 00 00 00 Hello...........

The first line has the hexadecimal numbers 0x0 to 0xF. The second line lists the numbers 0x10 to 0x1F. The third line contains the 255 (0xFF) we wrote out, and the fourth line contains the "Hello" string we wrote out.

Using BufferedInputStream

Now it's time for a small example of BufferedInputStream. This one demonstrates the mark(), skip(), and reset() methods.

Import the classes needed, and create the BufferedInputStream instance, chaining it to a FileInputStream instance.

>>> from java.io import BufferedInputStream, FileInputStream
>>> input = BufferedInputStream(FileInputStream("bufout.bin"))

Mark the position in the file we want to return to.

>>> input.mark(100)

Read in the first three bytes. (Remember, we set the first three bytes to 0x0, 0x1, and 0x2, that is, 0, 1, and 2 in decimal, in the last example.)

>>> input.read(), input.read(), input.read()
(0, 1, 2)

Use the skip() method to skip ahead twenty bytes in the file.

>>> input.skip(20)
20L

The reset() method sets the file back to the position marked with the mark() method, which happens to be at the beginning. A reading of the first three bytes proves this.

>>> input.reset()
>>> input.read(), input.read(), input.read()
(0, 1, 2)
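The bookkeeping behind mark(), skip(), and reset() can be modeled in a few lines of plain Python. MarkableBytes is a hypothetical class written for this sketch; it keeps all of its data in a list and ignores the read limit passed to mark(), so it only approximates what BufferedInputStream does.

```python
class MarkableBytes:
    """Hypothetical in-memory model of a stream that supports mark and reset."""
    def __init__(self, values):
        self.values = values
        self.pos = 0        # index of the next byte to read
        self.marked = 0     # position remembered by mark()

    def read(self):
        # Return the next byte value, or -1 at the end of the data.
        if self.pos >= len(self.values):
            return -1
        value = self.values[self.pos]
        self.pos += 1
        return value

    def skip(self, n):
        # Move ahead up to n bytes and report how many were skipped.
        skipped = min(n, len(self.values) - self.pos)
        self.pos += skipped
        return skipped

    def mark(self, read_limit):
        # This simple model remembers the position and ignores read_limit.
        self.marked = self.pos

    def reset(self):
        self.pos = self.marked


stream = MarkableBytes(list(range(32)))
stream.mark(100)
first = (stream.read(), stream.read(), stream.read())
skipped = stream.skip(20)
after_skip = stream.read()
stream.reset()
again = (stream.read(), stream.read(), stream.read())

print(first)
print(skipped)
print(after_skip)
print(again)
```

After the skip, the next byte read is number 23 (three bytes read plus twenty skipped), and after reset() the reads start over from the marked position.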

Reset the file to the beginning. Use the zeros() function to create byte arrays to read in the first three lines. (The lines aren't real, like lines of text, but refer to the way the hexdump program, that is, the debug program, displays the data.) Then read in the first three lines and display them.

>>> input.reset()             # reset to the beginning of the file.
>>> from jarray import zeros  # import the zeros function
>>> line1 = zeros(16, 'b')    # buffer to read in first 16 bytes
>>> line2 = zeros(16, 'b')    # buffer to read in next 16 bytes
>>> line3 = zeros(16, 'b')    # buffer to read in the third 16
>>> input.read(line1)         # read in line 1
16
>>> input.read(line2), input.read(line3)   #read in line 2 and 3
(16, 16)
>>> line1 #display line1
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], byte)
>>> line2       #display line2
array([16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31], byte)
>>> line3 #display line3
array([-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1], byte)

Compare the output to the hexdump we did with the debug program.
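One difference you'll notice is that line3 shows -1 where the dump shows FF. A Java byte is signed and holds values from -128 to 127, so the bit pattern 0xFF is displayed as -1 in a byte array, while the no-argument read() returns an int from 0 to 255. The sketch below shows the arithmetic for moving between the two views; to_unsigned() and to_signed() are hypothetical helper names used only here.

```python
def to_unsigned(signed_byte):
    # Keep only the low eight bits: -1 becomes 255, 16 stays 16.
    return signed_byte & 0xFF


def to_signed(value):
    # Values above 127 wrap around into the negative range of a Java byte.
    if value > 127:
        return value - 256
    return value


line3 = [-1] * 16                       # what the byte array displays
as_written = [to_unsigned(b) for b in line3]

print(as_written)
print(to_signed(255))
print(to_signed(127))
print(to_signed(128))
```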

Now let's read in the "Hello" string. This repeats the FileInputStream exercise, so we won't cover it in the same detail as before.

Read in the bytes from the "Hello" string. (A better way would have been to create a Java array of bytes with jarray, but I wanted to show this way.)

>>> binstr=input.read(),input.read(),input.read(),input.read(),input.read()
>>> binstr      #display it
(72, 101, 108, 108, 111)

Convert the tuple of byte values to a sequence of characters.

>>> char_seq = []
>>> for num in binstr:
...     char_seq.append(chr(num))
...

Convert the sequence of characters to a string.

>>> char_seq
['H', 'e', 'l', 'l', 'o']
>>> import string
>>> str = string.joinfields(char_seq, "")
>>> print str
Hello
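The conversion above can also be written more compactly. The following is a minimal sketch in plain Python that uses the join() string method instead of string.joinfields(); the tuple is copied from the session output rather than read from a stream.

```python
# Byte values copied from the session above (the ASCII codes for "Hello").
binstr = (72, 101, 108, 108, 111)

# chr() maps each code to a character; join() concatenates the characters.
text = "".join([chr(num) for num in binstr])
print(text)
```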

DataInput and DataOutput

DataInput and DataOutput define methods for reading and writing primitive Java types from and to a stream. Since they're interfaces, all of their methods, which follow, are abstract and so do nothing until a concrete class implements them.

DataInput:

readBoolean()
readByte()
readFloat()
readChar()
readDouble()
readInt()
readShort()
readLong()
readLine()
readUTF()
readUnsignedByte()
readUnsignedShort()
readFully(byte_sequence)
readFully(byte_sequence, off, len)
skipBytes(n)

DataOutput:

write(b)
write(byte_sequence)
writeBoolean(boolean)
writeByte(byte)
writeBytes(string)
writeChar(char)
writeChars(string)
writeDouble(double)
writeFloat(float)
writeInt(int)
writeLong(long)
writeShort(short)
writeUTF(string)

DataInputStream and DataOutputStream

The data stream classes, DataInputStream and DataOutputStream, implement the DataInput and DataOutput interfaces, respectively, reading and writing Java primitive types from and to a stream. Here are their constructors.

DataInputStream:

__init__(InputStream) creates a data input stream that reads from the specified input stream

DataOutputStream:

__init__(OutputStream) creates a data output stream that writes to the specified output stream

Using DataInputStream

To demonstrate DataInputStream, we'll read in data_out.bin, the file written with DataOutputStream in the session shown in the next section (run that session first if you're following along).

Import the classes needed.

>>> from java.io import DataInputStream, FileInputStream

Create a DataInputStream instance that's chained to a new instance of FileInputStream.

>>> data_in = DataInputStream(FileInputStream("data_out.bin"))

Read in a Boolean from the stream.

>>> data_in.readBoolean()
1

Read in a Byte from the stream.

>>> data_in.readByte()
1

Read in a Char from the stream.

>>> data_in.readChar()
'\x1'

Read in an Int from the stream.

>>> data_in.readInt()
1

Read in a Long and a Short from the stream.

>>> data_in.readLong(), data_in.readShort()
(1L, 1)

Close the stream.

>>> data_in.close()

Using DataOutputStream

Though not very creative, the following example shows how to write the Java primitive types using DataOutputStream:

>>> from java.io import DataOutputStream, FileOutputStream
>>> data_out = DataOutputStream(FileOutputStream("data_out.bin"))
>>> data_out.writeBoolean(1)
>>> data_out.writeByte(1)
>>> data_out.writeChar(1)
>>> data_out.writeInt(1)
>>> data_out.writeLong(1)
>>> data_out.writeShort(1)
>>> data_out.close()

Here's a hexdump listing of data_out.bin:

0E7F:0100 01 01 00 01 00 00 00 01-00 00 00 00 00 00 00 01 ................
0E7F:0110 00 01 FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
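The first eighteen bytes of the listing are the six values just written, each stored big-endian in its Java size. As a rough cross-check, the sketch below rebuilds that layout in plain Python with the struct module; it doesn't use java.io, and the format string is my own mapping of the Java types to struct codes.

```python
import struct

# ">" selects big-endian byte order with no padding.
# ? = boolean (1 byte), b = byte (1), H = char (2, unsigned),
# i = int (4), q = long (8), h = short (2)
image = struct.pack(">?bHiqh", True, 1, 1, 1, 1, 1)

# Format the bytes the way a hexdump shows them.
hexdump = " ".join("%02X" % b for b in bytearray(image))
print(hexdump)
```

The bytes after the first eighteen in the debug listing (the FF values) aren't part of the file; they're whatever happened to be in memory.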

readUTF() and writeUTF()

Two methods that make life easier are readUTF() and writeUTF(), which read and write strings in Unicode (UTF-8 transformation format).

Here's an example of writing two strings:

>>> from java.io import *
>>> out = DataOutputStream(FileOutputStream("strs.bin"))
>>> out.writeUTF("Hello world")
>>> out.writeUTF(" Hello earth")
>>> out.close()

Here's an example of reading two strings:

>>> input = DataInputStream(FileInputStream("strs.bin"))
>>> input.readUTF()
'Hello world'
>>> input.readUTF()
' Hello earth'
>>> input.close()

As you can see, it is a lot easier to write strings with this technique than with the others we had to use without DataInputStream and DataOutputStream.
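Part of what makes this work is that writeUTF() stores a two-byte length in front of each string, so readUTF() knows where one string ends and the next begins. The sketch below imitates that layout in plain Python. The helper names write_utf and read_utf are hypothetical, and the sketch assumes plain ASCII text, for which Java's modified UTF-8 and standard UTF-8 produce the same bytes.

```python
import struct

def write_utf(text):
    """Imitate the byte layout of writeUTF() for plain ASCII text."""
    data = text.encode("utf-8")
    # Two-byte big-endian length (counted in bytes), then the encoded bytes.
    return struct.pack(">H", len(data)) + data

def read_utf(buf, offset=0):
    """Return (text, next_offset) for a record produced by write_utf()."""
    (length,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    return buf[start:start + length].decode("utf-8"), start + length

blob = write_utf("Hello world") + write_utf(" Hello earth")
first, pos = read_utf(blob)
second, pos = read_utf(blob, pos)
print(first + second)
```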

The File Class

The Java File class is nothing like the Python file object, in spite of the name. It allows you to check the following attributes of a file:

Read
Write
Size
Last modification date
Directory or not a directory

It also provides the methods listed below to view a directory's contents, to create and delete directories, and to delete directory files:

canRead() determines permission and access to read the file
canWrite() determines read-only or read/write access
delete() deletes the current file referenced by this file object
equals(File) determines if the file referenced by this object is the same as the argument
exists() determines if the file exists
getPath() gets the file path (can be relative)
getAbsolutePath() same as above, but resolves relative to absolute paths
getCanonicalPath() gets the absolute path with relative references (such as ..) resolved
getName() gets the file name
getParent() gets the parent directory
isAbsolute() determines if the path is absolute
isDirectory() determines if the file is a directory
isFile() determines if the file is a regular file
lastModified() gets the date the file was last modified
length() gets the length of the file
list() lists the files in the directory
list(filter) lists the files in the directory with a filter
mkdir() creates the directory referenced by this file object
mkdirs() creates the directory, along with any parent directories that don't yet exist
renameTo(File) renames the file
toString() returns the string equivalent of the file (toString is inherited from Object)

File Interactive Session

Let's look at an example of File. Import the File class from the java.io package.

>>> from java.io import File

Create an instance of the file object that refers to the properties of the file (c:\\dat\\File.txt).

>>> file = File("c:\\dat\\File.txt")

Determine if the file is read-only by calling canWrite(). If canWrite() returns 1 (true), the file is read/write. (If you change the properties of the file to read-only, canWrite() returns 0, that is, false.)

>>> file.canWrite()
1

Determine if the file exists. If not, the exists() method returns 0.

>>> file.exists()
1

Get just the file name without the path.

>>> file.getName()
'File.txt'

Get the file path.

>>> file.getPath()
'c:\\dat\\File.txt'

Get the path of the parent directory.

>>> file.getParent()
'c:\\dat'

In the following code, we'll see if the file is a directory. Then we'll create a File instance that refers to the file's parent directory and test if the directory is actually a directory.

Is the file a directory? (1 means yes; 0 means no)

>>> file.isDirectory()
0

Create a File instance that refers to the file's parent directory, and then see if the directory is a directory.

>>> directory = File(file.getParent())
>>> directory.isDirectory()
1

Is the file a file?

>>> file.isFile()
1

Is the directory a file?

>>> directory.isFile()
0

The lastModified() method returns the time as a Long: the number of milliseconds elapsed since Jan. 1, 1970.

>>> file.lastModified()
934294386000L

You can convert the Long to a date using the following technique. Import the Date class, and pass the file's last-modified value to its constructor.

>>> from java.util import Date
>>> date = Date(file.lastModified())
>>> print date
Tue Aug 10 07:13:06 PDT 1999

Get the length of the file.

>>> file.length()
63L

Does the File instance refer to an absolute path (c:\dat\text.txt) or a relative path (..\..\text.txt)?

>>> file.isAbsolute()
1

Delete the file and check to see if it exists; get its length.

>>> file.delete()
1
>>> file.exists()         # File no longer exists
0
>>> file.length()         # Thus its length is zero
0L

Create a new File instance that refers to the relative location of autoexec.bat.

>>> file = File("..\\..\\..\\autoexec.bat")

Check to see if the file exists.

>>> file.exists()
1

Get the path (note that it's relative).

>>> file.getPath()
'..\\..\\..\\autoexec.bat'

Get the absolute path (this looks weird).

>>> file.getAbsolutePath()
'C:\\book\\..\\..\\..\\autoexec.bat'

Get the canonical path (this looks better).

>>> file.getCanonicalPath()
'C:\\AUTOEXEC.BAT'
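The relationship among the three paths can be sketched in plain Python with the ntpath module, which applies Windows path rules on any platform. This is an analogy rather than the Java API: the current directory C:\book is assumed from the session output, and normpath() only collapses the .. segments, whereas getCanonicalPath() also consults the file system (which is why Java reported the name in uppercase).

```python
import ntpath  # Windows path rules; importable on any platform

current_dir = "C:\\book"                       # assumed current directory
relative = "..\\..\\..\\autoexec.bat"

absolute = ntpath.join(current_dir, relative)  # roughly getAbsolutePath()
normalized = ntpath.normpath(absolute)         # collapses the ".." segments

print(ntpath.isabs(relative))
print(absolute)
print(normalized)
```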

Path Separators: Sometimes \, Sometimes /

The path separator can vary depending on what operating system you're using, so it's not a good idea to hardcode it. On Windows machines, the path separator is \, but on Unix it's /.

Given this situation, the proper way to create the directory string is

>>> new_directory = directory.getCanonicalPath() + File.separator \
...                 + "MyNewDir"

not

>>> new_directory = directory.getCanonicalPath() \
...                 + "\\MyNewDir"
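The Python way of doing the same thing is os.path.join(), which inserts the platform's separator for you. A minimal sketch, in which the directory name book is only a placeholder:

```python
import os

base = "book"                                   # placeholder directory name
new_directory = os.path.join(base, "MyNewDir")  # inserts os.sep between the parts
print(new_directory)
```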

File class instances can work with directories as well. Create a relative directory that points to the current directory.

>>> directory = File(".")
>>> directory.isAbsolute()
0

List the files in the current directory (an array of Java strings is returned).

>>> directory.list()
array(['TOC2.txt', 'readme.txt', 'Silver', 'status.xls', 'TOC.txt', 'chap9', 'chap5', 'chap3', 'chap2', 'chap10', 'chap1', '~WRL1019.tmp', '~WRL0244.tmp', '~WRL2798.tmp', '~WRL2319.tmp', 'Code Samples', 'chap6', 'scripts', 'Gold'], java.lang.String)

Let's create a new directory called MyNewDir. First create the directory string.

>>> new_directory = directory.getCanonicalPath() + "\\MyNewDir"

Show it.

>>> print new_directory
C:\book\MyNewDir

Create an instance of the directory that will be created.

>>> newDir = File(new_directory)

See if the directory to which the instance refers already exists.

>>> newDir.exists()
0

Create the directory with the mkdir() method.

>>> newDir.mkdir()
1

See if it exists (it should).

>>> newDir.exists()
1

The RandomAccessFile Class

RandomAccessFile both reads from and writes to a binary file. It's the Java class most similar to the Python file object and has the following methods:

__init__(strName, strMode) opens the file in the specified mode
__init__(File, strMode) same as above, but passes a Java file object
getFD() gets the file descriptor
getFilePointer() similar to the tell() method for the Python file object
length() gets the length of the file
seek() similar to the seek() method for the Python file object
read(byte_sequence) similar to the read() method in InputStream
read(byte_sequence, off, len) similar to the read() method in InputStream

The following methods implement the abstract methods of DataInput:

readBoolean()
readByte()
readFloat()
readChar()
readDouble()
readInt()
readShort()
readLong()
readLine()
readUTF()
readUnsignedByte()
readUnsignedShort()
readFully(byte_sequence)
readFully(byte_sequence, off, len)
skipBytes(n)

These methods implement the abstract methods of DataOutput:

write(b)
write(byte_sequence)
writeBoolean(boolean)
writeByte(byte)
writeBytes(string)
writeChar(char)
writeChars(string)
writeDouble(double)
writeFloat(float)
writeInt(int)
writeLong(long)
writeShort(short)
writeUTF(string)

Advanced Note: The Shortcoming of Single Inheritance

Instead of inheriting from both InputStream and OutputStream, RandomAccessFile implements DataInput and DataOutput (which are interfaces). A Java class can extend only one class, so RandomAccessFile could have inherited from both stream classes only if they were interfaces. Because RandomAccessFile is neither an InputStream nor an OutputStream, it can't be chained to other streams.

A while back I wanted a random access file that worked with ObjectOutputStream and ObjectInputStream. To get it I had to write my own class that implemented ObjectOutput and ObjectInput, and re-implement object streaming. This is a situation where the single inheritance model of Java falls short.

RandomAccessFile Modes

The RandomAccessFile modes are r for read only, which is similar to Python's r, and rw for read/write, which is similar to Python's r+. Working with RandomAccessFile isn't much different from working with DataInputStream and DataOutputStream. It's also not much different from working with the Python file object, as I said before. Since there are no truly new concepts here, this next interactive session is a short one. (You still have to follow along, though.)

Import RandomAccessFile from java.io.

>>> from java.io import RandomAccessFile

Create an instance of it in read/write mode.

>>> file = RandomAccessFile("c:\\dat\\rFile.bin", "rw")

Create an array that can be written to the file; write it to the file.

>>> import jarray
>>> byte_array = jarray.array((0,1,2,3,4,5,6,7,8,9,10), 'b')
>>> print byte_array
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], byte)
>>> file.write(byte_array)

Write some strings to the file.

>>> file.writeUTF("Hello")
>>> file.writeChars(" how are you")
>>> file.writeBytes(" fine thanks")

Get the location in the file.

>>> file.getFilePointer()
54L
>>> location = file.getFilePointer()    # Save it for later
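Where does 54 come from? It's the total number of bytes written so far. The arithmetic can be checked with a few lines of plain Python; the variable names are mine, and the sizes follow from how each write method encodes its data.

```python
# Bytes produced by each write call in the session above.
array_bytes = 11                        # write(byte_array): one byte per element
utf_bytes = 2 + len("Hello")            # writeUTF(): 2-byte length + 1 byte per ASCII char
chars_bytes = 2 * len(" how are you")   # writeChars(): 2 bytes per character
bytes_bytes = len(" fine thanks")       # writeBytes(): 1 byte per character

position = array_bytes + utf_bytes + chars_bytes + bytes_bytes
print(position)
```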

Move to the start of the file using the seek() method.

>>> file.seek(0)

Create an empty array (buffer), and read in the values from byte_array (written a few steps back). Notice that buffer's values are the same as byte_array's values after the file.read() method call.

>>> buffer = jarray.zeros(11, 'b')
>>> file.read(buffer)
11
>>> print buffer
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], byte)

Move to the saved location; show that it's the same place in the file.

>>> file.seek(location)
>>> file.getFilePointer()
54L

Write the buffer to the file and show its current location.

>>> buffer[10] = 0xa
>>> file.write(buffer)
>>> file.getFilePointer()
65L
>>> file.close()

Unicode

Unicode is a standard for working with character sets from all common languages. It's essential for internationalizing software.

Are you the least bit curious about the differences between writeUTF(), writeChars(), and writeBytes()? Here they are:

>>> file.writeUTF("Hello")
>>> file.writeChars(" how are you")
>>> file.writeBytes(" fine thanks")

In essence, writeChars() writes the Unicode representation of the characters to the file; writeUTF() and writeBytes() write out the ASCII equivalents. (More precisely, writeUTF() writes in a Java-modified UTF-8 format, which you can read about in the Java API documentation under the DataInput interface. In the previous interactive example, UTF-8 equated to ASCII.)

The Unicode data in this example is represented by two bytes per character, whereas the writeBytes() and writeUTF() data is represented by one byte per character. Fire up a hexdump utility, and see for yourself (this is a hexdump of the rFile.bin file):

C:\dat>debug rFile.bin
-d
1876:0100 00 01 02 03 04 05 06 07-08 09 0A 00 05 48 65 6C .............Hel
1876:0110 6C 6F 00 20 00 68 00 6F-00 77 00 20 00 61 00 72 lo. .h.o.w. .a.r
1876:0120 00 65 00 20 00 79 00 6F-00 75 20 66 69 6E 65 20 .e. .y.o.u fine
1876:0130 74 68 61 6E 6B 73 00 01-02 03 04 05 06 07 08 09 thanks..........
1876:0140 0A 02 D3 74 0A 41 3C 22-75 E6 80 F7 20 EB E1 5E ...t.A<"u... ..^
1876:0150 58 C3 A1 D7 D7 8B 36 D9-D7 C6 06 1B D9 00 C6 06 X.....6.........
1876:0160 17 D9 00 8B 36 D9 D7 8B-0E D7 D7 8B D6 E3 42 51 ....6.........BQ
1876:0170 56 5B 2B DE 59 03 CB 8B-D6 C6 06 BB DB 00 E3 31 V[+.Y..........1

Notice that the how are you in the right column, which was written with writeChars(), has a period (".") between each character, while the Hello and the fine thanks don't. This shows that how are you is using 2-byte Unicode.
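The same difference can be reproduced in plain Python by encoding the string two ways. In this sketch, "utf-16-be" stands in for writeChars() (two bytes per character, high byte first) and "ascii" stands in for writeBytes(); the equivalence holds for ordinary characters like these.

```python
text = " how are you"

as_chars = text.encode("utf-16-be")   # two bytes per character, high byte first
as_bytes = text.encode("ascii")       # one byte per character

print(len(as_chars))
print(len(as_bytes))
```

The zero high bytes are what the hexdump shows as periods between the letters.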

The StreamTokenizer Class

StreamTokenizer breaks up an input stream into tokens and can be used to parse a simple file (excuse me, "input stream"). Read the Java API documentation on StreamTokenizer, and then compare what you read to the following methods:

__init__(Reader)
__init__(InputStream)
nextToken() returns the next token in the stream
lineno() returns the current line number
lowerCaseMode(flag) converts all word tokens to lowercase if passed a true value
parseNumbers() specifies that numbers should be parsed as number tokens
pushBack() pushes the current token back onto the stream, so that the next nextToken() method call returns it again
quoteChar(char) specifies the character string delimiter; the whole string is returned as a token in sval
resetSyntax() sets all characters to ordinary so that they aren't ignored as tokens
commentChar(char) specifies a character that begins a comment that lasts until the end of the line; characters in a comment are not returned
slashSlashComments(flag) allows recognition of // to denote a comment (this is a Java comment)
slashStarComments(flag) allows recognition of /* */ to denote a comment
toString()
whitespaceChars(low,hi) specifies the range of characters that denote delimiters
wordChars(low, hi) specifies the range of characters that make up words
ordinaryChar(char) specifies a character that is never part of a token (the character should be returned as is)
ordinaryChars(low, hi) specifies a range of characters that are never part of a token (the character should be returned as is)
eolSignificant(flag) specifies if end-of-line (EOL) characters are significant (they're ignored if not, i.e., treated like whitespace)

StreamTokenizer's variables are ttype (one of the constant values TT_EOF, TT_EOL, TT_NUMBER, and TT_WORD); sval (contains the token of the last string read); and nval (contains the token of the last number read).

Using StreamTokenizer

Reading the documentation probably isn't enough to get you started with StreamTokenizer, so we're going to work with a simple application that produces a report on the number of classes and functions in a Python source file. Here's the source code:

class MyClass:      #This is my class
    def method1(self):
        pass
    def method2(self):
        pass
#Comment should be ignored
def AFunction():
    pass
class SecondClass:
    def m1(self):
        print "Hi Mom"       #Say hi to mom
    def m2(self):
        print "Hi Son"       #Say hi to Son
#Comment should be ignored
def BFunction():
    pass

Follow along with the next interactive session. Afterward we'll look at the code to count the classes and functions.

Import the FileInputStream class from java.io, and create an instance of it.

>>> from java.io import FileInputStream
>>> file = FileInputStream("C:\\dat\\ParseMe.py")

Import the StreamTokenizer class, and create an instance of it. Pass its constructor the FileInputStream instance.

>>> from java.io import StreamTokenizer
>>> token = StreamTokenizer(file)

Call nextToken() to get the first token in the file (that is, class).

>>> token.nextToken()
-3

As you can see, nextToken() returns a numeric value, although you may have been expecting a string value containing "class". In fact, nextToken() returns the type of token, that is, a word, a number, or an EOL or EOF (end-of-file) character, so -3 refers to TT_WORD.

The ttype variable holds the last type of token read.

>>> token.ttype
-3

The sval variable holds the actual last token read. If we want to check if the last token type was a word, we can write this, and, if it was a word, we can print it out.

>>> if token.ttype == token.TT_WORD:
...     print token.sval
...
class
>>>

Call nextToken() again to get the next token, which is MyClass.

>>> token.nextToken()
-3
>>> print token.sval
MyClass

Call nextToken() again; this time it returns the ':' character that follows MyClass.

>>> token.nextToken()
58
>>> print token.sval
None

Since the token is a ':', StreamTokenizer doesn't classify it as one of its token types, so nextToken() returns the character's code (58) and sval is None. The only token types are NUMBER, EOL, EOF, and WORD. So for ':' to be recognized as part of a word, it has to be registered with the wordChars() method.

>>> token.TT_NUMBER
-2
>>> token.TT_EOL
10
>>> token.TT_EOF
-1
>>> token.TT_WORD
-3

If the type isn't one of these, the number corresponding to the character encoding is returned. Let's see what nextToken() returns for the next character.

>>> token.nextToken()
35

The 35 refers to '#', which you can prove with the built-in ord() function.

>>> ord('#')
35

Get the next token.

>>> token.nextToken()
-3

The token is a word (-3 equates to TT_WORD). Print sval to find out what the word is.

>>> print token.sval
This

As you can see, the StreamTokenizer instance is reading text out of the comment from the first line. We want to ignore comments, so we need to return the tokens we took out back into the stream.

Push the token back into the stream.

>>> token.pushBack()

Attempt to push the token before the last one back into the stream.

>>> token.pushBack()

Set commentChar() to ignore '#'. (commentChar() takes an integer argument corresponding to the encoding of the character.)

>>> token.commentChar(ord('#'))

Get the next token, and print it out.

>>> token.nextToken()
-3
>>> print token.sval
This

Are you wondering why we still have the comment text? The pushBack() method can only push back the last token, so calling it more than once won't do any good. Let's start from the beginning, creating a new FileInputStream instance and a new StreamTokenizer instance.

Create the StreamTokenizer instance by passing its constructor a new instance of FileInputStream. Set the comment character again, since the new instance starts out with the default settings.

>>> file = FileInputStream("c:\\dat\\parseMe.py")
>>> token = StreamTokenizer(file)
>>> token.commentChar(ord('#'))

Iterate through the source code, printing out the words in the file. Quit the while loop when the token type is EOF.

>>> while token.ttype != token.TT_EOF:
...     token.nextToken()
...     if(token.ttype == token.TT_WORD):
...             print token.sval

Notice that the comment text isn't in the words printed out.

class
MyClass
def
method1
self
pass
def
method2
self
pass
def
AFunction
pass
...
...

Parsing Python with StreamTokenizer

Okay, we've done our experimentation. Now it's time for the actual code for counting the classes and functions in our Python source code.

from java.io import FileInputStream, StreamTokenizer

# Create a stream tokenizer by passing a new
# instance of the FileInputStream
token = StreamTokenizer(FileInputStream("c:\\dat\\parseMe.py"))

# Set the comment character.
token.commentChar(ord('#'))

classList = []
functionList = []

# Add an element to a list
def addToList(theList, token):
    token.nextToken()
    if (token.ttype == token.TT_WORD):
        theList.append(token.sval)

# Adds a class to the class list
def parseClass(token):
    global classList
    addToList(classList, token)

# Adds a function to the function list
def parseFunction(token):
    global functionList
    addToList(functionList, token)

# Iterate through the list until the
# token is of type TT_EOF, end of File
while token.ttype != token.TT_EOF:
    token.nextToken()
    if(token.ttype == token.TT_WORD):
        if (token.sval == "class"):
            parseClass(token)
        elif(token.sval == "def"):
            parseFunction(token)

# Print out detail about a function or class list
def printList(theList, type):
    print "There are " + `len(theList)` + " " + type
    print theList

# Print the results.
printList(classList, "classes")
printList(functionList, "functions and methods")

Here's the output:

There are 2 classes
['MyClass', 'SecondClass']
There are 6 functions and methods
['method1', 'method2', 'AFunction', 'm1', 'm2', 'BFunction']

The main part of the code (where all the action is happening) is

# Iterate through the list until the
# token is of type TT_EOF, end of File
while token.ttype != token.TT_EOF:
    token.nextToken()
    if(token.ttype == token.TT_WORD):
        if (token.sval == "class"):
            parseClass(token)
        elif(token.sval == "def"):
            parseFunction(token)

Let's look at it step by step.

If the token type isn't equal to EOF, get the next token.

while token.ttype != token.TT_EOF:
    token.nextToken()

If the token type is WORD,

    if(token.ttype == token.TT_WORD):

check to see if the token is a class modifier. If it is, call the parseClass() function, which uses the StreamTokenizer instance to extract the class name and put it on a list.

        if (token.sval == "class"):
            parseClass(token)

If the token isn't a class modifier, check to see if it's a function modifier. If so, call parseFunction(), which uses StreamTokenizer to extract the function name and put it on a list.

        elif(token.sval == "def"):
            parseFunction(token)

StreamTokenizer is a good way to parse an input stream. If you understand its runtime behavior (which you should from the preceding interactive session), you'll be more likely to use it.
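To see the same idea without java.io, here's a rough sketch in plain Python that counts classes and functions with a regular expression in place of StreamTokenizer. The function name count_definitions and the shortened sample source are hypothetical, and the sketch assumes that '#' never appears inside a string literal.

```python
import re

SOURCE = '''\
class MyClass:      #This is my class
    def method1(self):
        pass
#Comment should be ignored
def AFunction():
    pass
'''

def count_definitions(source):
    """Collect the names that follow the words 'class' and 'def'."""
    classes, functions = [], []
    for line in source.splitlines():
        code = line.split("#", 1)[0]               # drop the comment, as commentChar() does
        words = re.findall(r"[A-Za-z_]\w*", code)  # keep word tokens only
        for keyword, name in zip(words, words[1:]):
            if keyword == "class":
                classes.append(name)
            elif keyword == "def":
                functions.append(name)
    return classes, functions

classes, functions = count_definitions(SOURCE)
print(classes)
print(functions)
```

As in the StreamTokenizer version, a keyword is recognized first and the word that follows it is taken as the name.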

The more astute among you probably noticed that functions and methods were counted together in the last example. As an exercise, change the code so that each class has an associated list of methods and so that these methods are counted separately.

Hint: You'll need to use the resetSyntax() method of StreamTokenizer to set all characters to ordinary. Then you'll need to count the spaces (ord(" ")) and tabs (ord("\t")) that occur before the first word on a line. For this you also need to track whether you hit an EOL token type. (If you can do this exercise, I do believe that you can do any exercise in the book.)

As another exercise, create a stream that can parse a file whose contents look like this:

[SectionType:SectionName]
value1=1
value2 = 3       #This is a comment that should be ignored
value4 = "Hello"

SectionType defines a class of section, and SectionName is like defining a class instance. value equates to a class attribute.

Here's an example.

[Communication:Host]
type = "TCP/IP" #Possible values are TCP/IP or RS-232
port = 978          #Sets the port of the TCP/IP
[Communication:Client]
type = "RS-232"
baudrate = 9600
baudrate = 2800
baudrate = 19200
[Greeting:Client]
sayHello = "Good morning Mr. Bond"
[Greeting:Host]
sayHello = "Good morning sir"

Create a dictionary of dictionaries of dictionaries. The name of the top-level dictionary should correspond to the section type (Communication, Greeting); its value should be a dictionary whose name corresponds to the section names (Client, Host) and whose values correspond to another dictionary. The names and values of the third-level dictionaries should correspond to the name values in the file (sayHello = "Good morning Mr. Bond", type = "RS-232"). If, like baudrate, the name repeats itself, you should create a list corresponding to the name baudrate and, instead of a single value inserted in the bottom-tier dictionaries, put the list as the value.

The structure will look like this:

{} --- Communication {} --- Client {} --- type = "RS-232"
 |                      |              |
 |                      |              - baudrate = [9600, 2800, 19200]
 |                      |
 |                      - Host {} ----- type = "TCP/IP"
 |                                     |
 |                                     - port = 978
 |
 - Greeting {} -------- Client {} --- sayHello = "Good morning Mr. Bond"
                        |
                        - Host {} ----- sayHello = "Good morning sir"
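The parsing is the exercise, but the storage rule may be easier to see in code. Here's a minimal sketch of just that part, in plain Python: the helper name add_value is hypothetical, and it assumes the values have already been parsed out of the file.

```python
def add_value(config, section_type, section_name, name, value):
    """Store value under config[section_type][section_name][name]."""
    section = config.setdefault(section_type, {}).setdefault(section_name, {})
    if name not in section:
        section[name] = value                   # first occurrence: plain value
    elif isinstance(section[name], list):
        section[name].append(value)             # already a list: extend it
    else:
        section[name] = [section[name], value]  # second occurrence: promote to a list

config = {}
add_value(config, "Communication", "Client", "type", "RS-232")
for rate in (9600, 2800, 19200):
    add_value(config, "Communication", "Client", "baudrate", rate)
add_value(config, "Greeting", "Host", "sayHello", "Good morning sir")
print(config["Communication"]["Client"])
```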

Persisting Objects with Java Streams

ObjectOutputStream writes out Java class instances (objects) to an output stream. It accomplishes for Java what the pickle module does for Python. Only instances of classes that implement the Serializable interface can be serialized with ObjectOutputStream. All Jython objects (class instances, functions, dictionaries, lists) implement Serializable.

Here's a short example. Import ObjectOutputStream and FileOutputStream from the java.io package.

>>> from java.io import ObjectOutputStream, FileOutputStream

Create an instance of ObjectOutputStream, passing the constructor a new instance of FileOutputStream.

>>> oos = ObjectOutputStream(FileOutputStream("c:\\dat\\out.bin"))

Define a simple class.

>>> class MyClass:
...     def __init__(self):
...             self.a = "a"
...

Create an instance of the class.

>>> object = MyClass()

Write the instance to the output stream with the writeObject() method.

>>> oos.writeObject(object)
>>> oos.close()

Now we can use ObjectInputStream to read the object back. Import ObjectInputStream and FileInputStream from package java.io.

>>> from java.io import ObjectInputStream, FileInputStream

Create an instance of ObjectInputStream.

>>> ois = ObjectInputStream(FileInputStream("c:\\dat\\out.bin"))

Read the object from the stream.

>>> object2 = ois.readObject()

Show that the attribute of object2 is the same as the attribute of object but that object and object2 aren't the same.

>>> print "The a attribute of object 2 is " + object2.a
>>> print "Are object and object2 the same? " + `(object is object2)`

As I said, object streams function a lot like the pickle module.

As an exercise, modify the address book application from Chapter 8 to use object streams instead of pickle.

Using Java Streams to Work with Memory

We spoke earlier about streams being abstract metaphors for files or, more precisely, any type of input/output mechanism. With streams, for example, you can write data to a byte array, which is essentially a location in the computer's memory. ByteArrayInputStream and ByteArrayOutputStream allow reading and writing to and from memory. You might want to look them up in the Java API documentation.

Instead of passing a file stream to a stream derivation, you can pass a byte array stream. Here's the earlier example reworked to use ByteArrayOutputStream and ByteArrayInputStream in place of the file streams.

from java.io import ObjectOutputStream, ByteArrayOutputStream

bytes = ByteArrayOutputStream()
oos = ObjectOutputStream(bytes)

class MyClass:
    def __init__(self):
        self.a = "a"

object = MyClass()
oos.writeObject(object)
oos.close()

print "Here is the binary image of a Jython class instance"
print bytes.toByteArray()

from java.io import ObjectInputStream, ByteArrayInputStream

ois = ObjectInputStream(ByteArrayInputStream(bytes.toByteArray()))
object2 = ois.readObject()

print "The a attribute of object 2 is " + object2.a
print "Are object and object2 the same? " + `(object is object2)`

Note that, to create the ByteArrayInputStream instance, I passed the byte array returned by the toByteArray() method of the ByteArrayOutputStream instance. In a later chapter, I'll show you how to work with network streams.
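For comparison, here's roughly the same round trip done the Python way, with pickle and an in-memory stream from the io module. This is a sketch rather than a translation: a dictionary stands in for the MyClass instance, and the comments name the Java calls that each step loosely corresponds to.

```python
import io
import pickle

original = {"a": "a"}            # stand-in for the MyClass instance

buffer = io.BytesIO()            # in-memory stream, like ByteArrayOutputStream
pickle.dump(original, buffer)    # like writeObject()
image = buffer.getvalue()        # like toByteArray()

# Read the object back from a new in-memory stream over the same bytes.
restored = pickle.load(io.BytesIO(image))

print(restored == original)
print(restored is original)
```

As with the object streams, the restored object is equal to the original but is a separate object.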

As an exercise, read the Java API documentation on CharArrayReader and CharArrayWriter, and write a simple program that reads and writes text strings to and from a memory location.

Summary

Streams are Java's way to support I/O. They can represent a file, a network connection, or access to a Web site. In this chapter, we dealt mostly with file streams because they're easy to work with and demonstrate stream fundamentals. Learning to deal with Java streams is essential for learning to use the Java APIs.

We covered some nonstream classes: RandomAccessFile, File, and StreamTokenizer. As we saw in the examples, RandomAccessFile works most like the Python file object. The Java File class allows access to a file's attributes: Is the file read-only? Is it a directory? and so forth. It also allows the creation of directories. The StreamTokenizer class works with any text stream (a class derived from Reader) or binary stream (a class derived from InputStream).

Streams work with more than just files. We demonstrated the ByteArrayInputStream and the ByteArrayOutputStream classes, which allow reading and writing to memory buffers.
