Performance

In this section we compare the performance of Java IO and NIO with native IO and consider several different techniques. We will see how to use direct byte buffers to achieve file IO as fast as the fastest native file IO, with benchmarks to support the claim. We will also see that achieving fast file IO requires care: using NIO without appropriate consideration can result in much slower IO than the regular IO package, while the standard IO package, used well, can deliver very reasonable performance.

When a Java application reads a sequence of bytes from a file, the data retrieved from the file can incur an additional copy, which would not be necessary if a native application read the same data from the file. In fact, this is true for any form of IO. For example, if a chunk of data is passed from the Java heap to the sound card, written to a file, or sent to the video card, an extra copying operation has to be performed. This extra copying can have an adverse effect on the performance of the game, depending on how much data is involved and how often it needs to be retrieved from or sent to a device.

When the OS retrieves the data from the file, but before it passes the data to the application, the data is copied to a temporary buffer in system memory and then copied from system memory to the Java heap. This extra step is necessary because the Java heap is managed by the garbage collector and should not be written to directly by the OS. A native application, on the other hand, can read a sequence of bytes without the use of a temporary buffer (see Figure 5.11).

Figure 5.11: The flow of data when reading from a file in Java versus in a native application.

The following code segment shows how to duplicate a file in Java and the equivalent native code that we will use for our benchmarks.

 On the CD   The following code can be found on the book’s companion CD-ROM in Code/Chapter 5/ IO Performance Test:

public void test(String inputFilename, String outputFilename) throws IOException {
    File inFile = new File(inputFilename);
    File outFile = new File(outputFilename);
    FileInputStream fis = new FileInputStream(inFile);
    FileOutputStream fos = new FileOutputStream(outFile);

    byte[] bytes = new byte[bufferSize];

    while (true) {
        int count = fis.read(bytes);
        if (count <= 0)
            break;
        fos.write(bytes, 0, count);
    }
    fis.close();
    fos.close();
}

void nativeTest(char* inputFilename, char* outputFilename) {
    FILE *streamIn = fopen(inputFilename, "rb");
    FILE *streamOut = fopen(outputFilename, "wb");
    // disable stdio buffer
    setvbuf(streamIn, NULL, _IONBF, 0);
    setvbuf(streamOut, NULL, _IONBF, 0);
    unsigned char *bytes = (unsigned char*)malloc(bufferSize);
    size_t readCount;
    while (true) {
        readCount = fread(bytes, 1, bufferSize, streamIn);
        if (readCount == 0)
            break;
        fwrite(bytes, 1, readCount, streamOut);
    }
    fclose(streamIn);
    fclose(streamOut);
    free(bytes);
}

Java FileInputStream and FileOutputStream have a few private native methods that provide the functionality for opening, reading, writing, and closing a file. Internally they store a file handle that these native methods use, and the native methods in turn call fopen, fread, fwrite, and fclose.

The native equivalent uses the functionality provided by stdio.h. Note that stdio creates a buffer for each stream by default; here we explicitly disable the stdio buffering. The two versions are otherwise similar. Table 5.1 shows the performance of the code segments with different buffer sizes passed to their read/fread functions. The tests were performed on a file larger than 16MB, and the timings were collected using a high-resolution timer. The Java tests were performed carefully to ensure that no housekeeping took place while the tests ran. In addition, the VM was allowed to warm up and perform the necessary compilation.

 On the CD   Even though some of the tests were performed sequentially in the same instance of the VM, care was taken to ensure that one test did not affect another. For additional details, you may view the test application provided on the companion CD-ROM at Code/Chapter 5/ IO Performance Test. Table 5.1 shows the performance of Java file IO using file input and output streams against the equivalent C code that was presented earlier. The data in the table has been normalized with respect to the optimal buffer size for the Java test.
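The timing approach described above (warm-up runs followed by a high-resolution timer measurement) might be sketched as follows. This is an illustrative harness, not the book's actual test code: the file size, warm-up count, and class name are assumptions.

```java
import java.io.*;

public class CopyBenchmark {
    // Copies the file once and returns the elapsed time in nanoseconds.
    static long timedCopy(File in, File out, int bufferSize) throws IOException {
        long start = System.nanoTime();
        try (FileInputStream fis = new FileInputStream(in);
             FileOutputStream fos = new FileOutputStream(out)) {
            byte[] bytes = new byte[bufferSize];
            int count;
            while ((count = fis.read(bytes)) > 0) {
                fos.write(bytes, 0, count);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        File in = File.createTempFile("bench", ".in");
        File out = File.createTempFile("bench", ".out");
        try (FileOutputStream fos = new FileOutputStream(in)) {
            fos.write(new byte[1 << 20]); // 1MB of test data
        }
        // Warm-up runs let the VM compile the hot loop before measuring.
        for (int i = 0; i < 3; i++) {
            timedCopy(in, out, 16 * 1024);
        }
        long elapsed = timedCopy(in, out, 16 * 1024);
        System.out.println(out.length() == in.length());
        System.out.println("elapsed ns: " + elapsed);
        in.delete();
        out.delete();
    }
}
```

Only relative timings between runs of such a harness are meaningful, since absolute numbers depend on the drive and the OS.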

Table 5.1: JAVA FILE IO VERSUS STANDARD C FILE IO

Buffer Size    Java      Native
1B             80530%    65880%
1K             280%      230%
4K             180%      150%
8K             125%      110%
16K            100%      78%
32K            168%      60%
64K            158%      59%
128K           183%      64%
File Size      750%      600%

The best buffer size for the Java method was 16K, and 64K for the native function. A 16K buffer allowed the Java code to run quite reasonably: it outperformed the native function run with an 8K buffer, and it was only about 40 percent slower than the fastest native benchmark. The performance gap exists because the Java code invokes native methods through the VM, which adds some overhead, and because the VM has to copy the data between the Java heap and system memory. Even though both the native and the Java benchmarks depend on the drive and on implementation details of the underlying operating system, they are sufficient for relative comparisons.

Note that single-byte reads result in extremely poor performance for both the Java and the native code. In fact, the one-byte Java test was more than 800 times slower than the Java test that used a 16K byte array. If you write a large file one byte at a time, you can watch the file slowly grow on the drive. When possible, you should read and write data on a slow medium, such as a hard drive, in chunks. Chunks that are multiples of 4K are typically preferred in Java because they can be page aligned, allowing the VM (and the OS) to transfer data more efficiently. Loading the entire file into a buffer and then writing it out not only results in poor performance but also assumes that the system has enough free memory. It is important to know that when the VM needs to copy data from the Java heap to system memory, it uses a static temporary buffer (see Figure 5.11) for data up to 8K, but it must dynamically allocate a larger buffer to copy larger amounts of data between the Java heap and system memory. Keep in mind that a larger buffer does not necessarily result in better performance and can, in fact, result in both more CPU usage and system heap fragmentation.

Table 5.2 shows the performance of different techniques. The first three listings are from Table 5.1 and are provided for easier comparison. The new tests in this table include a test that uses 16K BufferedInputStream and BufferedOutputStream objects, four tests that use direct and nondirect byte buffers, and a test that uses the transferTo method of FileChannel.

Table 5.2: THE PERFORMANCE OF DIFFERENT JAVA FILE IO METHODS

Method                                             Performance
One byte                                           80530%
Array with the same size as the file               750%
Array with the size of 16K                         100%
One-byte read with buffered streams of 16K         1600%
Nondirect buffer with the same size as the file    700%
Direct buffer with the same size as the file       690%
Direct buffer with the size of 64K                 60%
Nondirect buffer with the size of 16K              110%
File channel transfer                              700%

Table 5.2 shows that reading one byte at a time through BufferedInputStream and BufferedOutputStream is 16 times slower than the 16K array test. This is because buffering adds an extra level of copying: the buffer labeled as the intermediate buffer in Figure 5.11.

So, does this mean that buffering is not good? Note that even though using buffered streams is not the fastest possible approach, it performed more than 50 times faster than reading the data one byte at a time. The idea is that if you need to read bytes in small segments, you should use a buffered stream such as BufferedInputStream or BufferedOutputStream. If you can read data in chunks, you should definitely use an array and manage it in your own code. If you are already reading data in optimally sized chunks, there is no reason to use the Java buffered stream objects.
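A minimal sketch of the small-segment case: wrapping the file stream in a BufferedInputStream so that single-byte read() calls are served from the internal 16K buffer rather than hitting the OS each time. The file name and contents here are illustrative assumptions.

```java
import java.io.*;

public class BufferedByteReads {
    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("buffered", ".bin");
        try (FileOutputStream fos = new FileOutputStream(file)) {
            fos.write(new byte[]{1, 2, 3, 4, 5});
        }
        long sum = 0;
        // The 16K internal buffer means most read() calls never touch the OS.
        try (InputStream in = new BufferedInputStream(
                new FileInputStream(file), 16 * 1024)) {
            int b;
            while ((b = in.read()) != -1) { // single-byte reads, but buffered
                sum += b;
            }
        }
        System.out.println(sum);
        file.delete();
    }
}
```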

Note that the native and Java tests that read and write data in chunks as large as the file (byte array, direct buffer, nondirect buffer, and channel transfer) all perform similarly, and similarly poorly. This is because when an overly large chunk is written, the primary bottleneck becomes the OS and the device. In general, the buffer of the device can be compared to a funnel. If a relatively small burst of data is written to the device, the buffer can hold the burst and let the device write out the data as soon as it gets the chance. However, if an excessive amount of data must be written to the drive, the buffer cannot hold all of it, and new data can be written into the buffer only as fast as it can be written to the device.

The direct buffer test does not perform much better with the very large buffer. On the other hand, you have probably noticed that a direct buffer of 64K performed very well; in fact, it performed exactly as well as the fastest native operation. Because the backing storage of a direct buffer resides in system memory, the VM does not have to copy the data between the Java heap and system memory.
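The 64K direct buffer technique might be sketched as follows: a file-to-file copy through a direct ByteBuffer and a pair of FileChannels. The file sizes and names are illustrative assumptions; the flip/clear pattern is the standard way to switch a buffer between filling and draining.

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class DirectBufferCopy {
    static void copy(File in, File out) throws IOException {
        try (FileChannel src = new FileInputStream(in).getChannel();
             FileChannel dst = new FileOutputStream(out).getChannel()) {
            // The 64K direct buffer lives in system memory, so the VM
            // can hand it to the OS without an extra copy.
            ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);
            while (src.read(buffer) > 0) {
                buffer.flip();                 // switch from filling to draining
                while (buffer.hasRemaining()) {
                    dst.write(buffer);
                }
                buffer.clear();                // ready for the next read
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File in = File.createTempFile("direct", ".in");
        File out = File.createTempFile("direct", ".out");
        try (FileOutputStream fos = new FileOutputStream(in)) {
            fos.write(new byte[200_000]); // larger than one buffer fill
        }
        copy(in, out);
        System.out.println(out.length());
        in.delete();
        out.delete();
    }
}
```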

The last entry, labeled as file channel transfer, does the following:

// channel-to-channel transfer
void test(String inputFilename, String outputFilename) throws IOException {
    File inFile = new File(inputFilename);
    File outFile = new File(outputFilename);
    FileInputStream fis = new FileInputStream(inFile);
    FileOutputStream fos = new FileOutputStream(outFile);
    FileChannel ifc = fis.getChannel();
    FileChannel ofc = fos.getChannel();
    ifc.transferTo(0, ifc.size(), ofc);
    fis.close();
    fos.close();
}

A FileChannel has transferFrom and transferTo methods that try to perform optimal transfers. They attempt an OS-level transfer and fall back to a direct byte buffer transfer if lower-level transfers are not supported. The test performed here fell back to the byte buffer transfer. Be aware that accessing the elements of byte buffers in general, and direct buffers in particular, is much more expensive (about 12 times) than accessing the elements of an array through accessors (see Table 5.3).

Table 5.3: ACCESS OVERHEAD OF BYTE BUFFERS

Structure                                Performance
Byte array access (through accessors)    100%
Nondirect byte buffer                    1187%
Direct byte buffer                       1286%
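One common way to sidestep the per-element overhead is to drain a buffer into a byte array with a single bulk get and then work on the array. The sizes and class name below are illustrative assumptions.

```java
import java.nio.ByteBuffer;

public class BulkGet {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024);
        for (int i = 0; i < 1024; i++) {
            buffer.put((byte) 1);
        }
        buffer.flip();

        // Bulk transfer: one native copy instead of 1024 checked accesses.
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);

        int sum = 0;
        for (byte b : bytes) { // cheap array access from here on
            sum += b;
        }
        System.out.println(sum);
    }
}
```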

As a last note in regard to direct buffers, you should know that because they are page aligned, they are allocated in 4K chunks. Avoid creating many small direct buffers: 100 direct buffers that each have a capacity of one byte will take up almost half a megabyte. In other words, creating 100 direct buffers with a capacity of one byte is as costly as creating 100 direct buffers with a capacity of 4K.
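If you do need many small direct buffers, one option is to allocate a single page-sized direct buffer and carve it into views with slice(). The slice count and sizes below are illustrative assumptions; each view shares the parent's system memory rather than triggering its own 4K allocation.

```java
import java.nio.ByteBuffer;

public class SlicedDirectBuffers {
    public static void main(String[] args) {
        // One 4K direct allocation instead of many tiny ones.
        ByteBuffer page = ByteBuffer.allocateDirect(4 * 1024);
        int sliceSize = 1024;
        ByteBuffer[] views = new ByteBuffer[4];
        for (int i = 0; i < views.length; i++) {
            page.position(i * sliceSize).limit((i + 1) * sliceSize);
            views[i] = page.slice(); // view shares the parent's memory
        }
        page.clear();
        views[0].put(0, (byte) 42);
        // A write through a view is visible in the parent buffer.
        System.out.println(page.get(0));
    }
}
```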

When it comes to parsing text files, it does not make much difference whether you read one byte at a time, read an array of bytes, or wrap a FileReader or InputStreamReader with a BufferedReader, because the readers already use an internal buffer so that they can convert characters in chunks. In fact, wrapping them with a BufferedReader can only slow down the parsing. If you want optimal text parsing, you should read the file as bytes and, if needed, use a charset to convert the bytes yourself. A custom reader can result in a significant performance gain. If you are sure that the input file contains only ASCII characters, you can simply cast each byte to a character without using charsets and paying the additional conversion cost.
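The ASCII shortcut might be sketched as follows: read the file as raw bytes and cast each byte to a char, skipping the charset machinery entirely. The file name and contents are illustrative assumptions; this is safe only when the input is guaranteed to be pure ASCII.

```java
import java.io.*;

public class AsciiParse {
    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("ascii", ".txt");
        try (FileOutputStream fos = new FileOutputStream(file)) {
            fos.write("score=100".getBytes("US-ASCII"));
        }
        StringBuilder sb = new StringBuilder();
        try (FileInputStream fis = new FileInputStream(file)) {
            byte[] bytes = new byte[16 * 1024];
            int count;
            while ((count = fis.read(bytes)) > 0) {
                // Safe only for pure ASCII input: each byte is one character.
                for (int i = 0; i < count; i++) {
                    sb.append((char) bytes[i]);
                }
            }
        }
        System.out.println(sb);
        file.delete();
    }
}
```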



Practical Java Game Programming (Charles River Media Game Development)
ISBN: 1584503262
Year: 2003
Pages: 171