Developer Considerations

A developer needs to consider the size of each file I/O data block that an application writes, the mode in which the application passes those blocks to the file system, and the technique the application uses to extend the file size during a write operation.

Blocks and Clusters

Sectors are the smallest storage unit in a file storage medium such as a hard disk volume. The disk driver handles data in sector-size blocks. The sector size is a feature of the physical storage medium.

Clusters are the smallest data block that a file system handles, and they are a whole multiple of the volume sector size. Users specify the file system cluster size when they format a hard disk.

Applications are capable of writing file I/O data blocks of arbitrary size. However, an application writes data most efficiently when it passes I/O data blocks that are whole multiples of the file system cluster size (itself a whole multiple of the volume sector size) and begins a write operation at a position on a cluster boundary.

Consider what happens when an application writes 100-byte I/O blocks to 4-kilobyte (KB) file system clusters. The application will have to provide about 40 data blocks to actually fill one cluster. For each data block, the following actions take place:

  1. The file system updates its internal mapping table and directs the disk driver to read the data from certain volume sectors into the file system cache.

  2. The disk heads move from the file system data, typically at the beginning of the physical disk, to the location of the file system cluster containing the first sector.

  3. The disk driver can handle data in sectors, but the file system must handle data in clusters. The disk driver therefore reads all the sectors in the cluster, that is, 4 KB of existing data.

  4. The file system modifies the existing data to include the 100 bytes of new data.

  5. The disk driver writes the modified 4 KB of data back to the file system cluster.
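The cost of this read-modify-write cycle can be sketched numerically. The sizes below (100-byte application blocks, 4-KB clusters) come from the example above; the function name and model are illustrative, not an API:

```python
import math

def write_amplification(block_size: int, cluster_size: int) -> tuple[int, int]:
    """Return (application blocks per cluster, bytes physically transferred
    per cluster) when an application writes block_size-byte I/O blocks into
    cluster_size-byte file system clusters.

    Each partial write forces the file system to read the whole cluster,
    merge the new bytes, and write the whole cluster back: 2 * cluster_size
    bytes of disk traffic per application block.
    """
    blocks = math.ceil(cluster_size / block_size)
    bytes_moved = blocks * 2 * cluster_size  # one read + one write per block
    return blocks, bytes_moved

blocks, moved = write_amplification(100, 4096)
print(blocks)         # 41 application blocks to fill one 4-KB cluster
print(moved // 4096)  # 82 cluster-size transfers (2 per block)
```

Filling a single 4-KB cluster this way moves over 300 KB of data across the disk interface, which is why block size dominates this scenario.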

Obviously, you can improve the performance of this scenario by writing larger I/O data blocks. However, if the blocks are 5 KB, for example, the application still incurs additional overhead by almost always writing to partial file system clusters.

If the application writes an I/O data block the same size as a file system cluster (or a whole multiple of that size), the disk driver doesn't need to read the existing cluster and modify it. It simply writes the I/O data block into the file system cache and later writes it to the on-disk cluster.

You can't change the fact that writing a 10-GB file takes about ten times longer than writing a 1-GB file. Your opportunity for optimization is in the system overhead: the time spent handling metadata and moving the physical parts of the disk. That time is most affected by the number of calls to the file system.
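The effect of block size on the number of file system calls is simple arithmetic; a sketch (the function name is illustrative):

```python
def filesystem_calls(file_size: int, block_size: int) -> int:
    """Number of write calls needed to move file_size bytes in
    block_size-byte I/O blocks. Each call carries fixed per-call
    overhead, which is the part the application can optimize."""
    return -(-file_size // block_size)  # ceiling division

GB = 1024 ** 3
KB = 1024

# The same 1-GB file costs very different amounts of per-call overhead:
print(filesystem_calls(1 * GB, 4 * KB))    # 262144 calls
print(filesystem_calls(1 * GB, 256 * KB))  # 4096 calls
```

Moving from 4-KB to 256-KB blocks cuts the call count by a factor of 64 for the same amount of data.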

An application that writes digital media files should follow these guidelines:

  • Always start write operations at the beginning of a cluster, which is also the beginning of a sector.

  • Write file I/O data blocks that are at least 64 KB, and preferably 256 KB, in size. Writing extremely large files to a file system configured to use large clusters will benefit from data blocks as large as 1 MB.
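The two guidelines above can be expressed as a precondition check before each write. The function name, default size, and error messages here are illustrative, not a real API:

```python
def check_write_params(offset: int, length: int, cluster_size: int = 64 * 1024) -> None:
    """Validate a planned write against the guidelines: start on a
    cluster boundary (which is also a sector boundary) and write a
    whole multiple of the cluster size."""
    if offset % cluster_size:
        raise ValueError(f"offset {offset} is not on a {cluster_size}-byte boundary")
    if length == 0 or length % cluster_size:
        raise ValueError(f"length {length} is not a whole multiple of {cluster_size}")

check_write_params(0, 256 * 1024)          # OK: aligned start, whole multiple
check_write_params(64 * 1024, 128 * 1024)  # OK: next cluster boundary
```

A write starting at an arbitrary offset such as 100, or with a 5-KB length, would be rejected by this check, matching the partial-cluster scenario described earlier.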

Writing Data Blocks

An application can open a file for buffered or unbuffered I/O. Buffered I/O means that the operating system first copies the I/O data block to one or more system cache buffers. Data being read is then copied to the application address space; data being written is then copied to the disk. Buffered I/O benefits applications performing many small I/O operations but degrades the reading or writing of very large files.

When you implement unbuffered file I/O, you're required to read and write data blocks that are a multiple of the file system cluster size and are aligned on cluster boundaries.

Programmatic requirements for unbuffered file I/O are very specific. For details, read the Remarks sections of the documentation for the Microsoft Windows kernel functions WriteFile and WriteFileEx.

When correctly implemented, unbuffered I/O reduces the amount of time required to write each data block because it's not necessary to copy the data block to the system cache buffers. Unbuffered I/O also assures that the application doesn't incur the overhead of reading unmodified data from the disk and then writing it back again.

One benefit of a larger data block is that it reduces the system overhead of NTFS logging: the log file is updated once for every allocation request, not once for every volume sector or file system cluster. With buffered I/O, you have no definite control over the number of requests it takes to write the file. An application that reads or writes digital media files should therefore open them for unbuffered file I/O.

Extending Files

Programmers generally take one of two approaches to extending a file: they either seek to a new end-of-file position and write, or they call the Windows kernel functions SetFilePointer and SetEndOfFile. You should avoid both of these approaches.

When you use the seek approach, the file system allocates the new space and then writes zeroes into each file system cluster. (This technique, called cleaning, is a requirement for C2-level security as defined by the U.S. National Computer Security Center.) Each cluster actually has data written into it twice: first during the cleaning operation that follows the seek-and-write operation, and then again when the application sends a data block to that cluster.
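The double-write cost of the seek approach is easy to quantify; a sketch of the model described above (the function name is illustrative):

```python
def seek_and_write_cost(extension_bytes: int) -> int:
    """Bytes physically written when a file is extended by seeking past
    the current end of file: every newly allocated cluster is written
    twice, once with zeroes during cleaning and once with the
    application's data."""
    return 2 * extension_bytes

GB = 1024 ** 3
print(seek_and_write_cost(10 * GB) // GB)  # 20 (GB written to extend a file by 10 GB)
```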

When you call SetEndOfFile and the file system is NTFS, you gain a significant performance benefit over the seek-and-write approach because NTFS simply reduces the free-cluster counter and updates the file size. NTFS doesn't clean the new space, although it does mark the clusters as belonging to the file. Therefore, each cluster receives data only once when the application writes a data block.

When you call SetEndOfFile and the file system is FAT32, no performance gain occurs over seek-and-write because FAT32 allocates the new space and then cleans it. You should recommend that your customers use NTFS.

To take full advantage of this technique, you must also take care to write data sequentially. Whenever you actually write to a cluster that is beyond the current end of the file (as opposed to merely notifying the system that you plan to write beyond the current end), the file system always cleans the intervening clusters.

Instead of using either of these approaches, applications that write digital media files should extend files during the write operation by writing I/O data blocks sequentially. If the file system is NTFS, the effect is the same as if the application had called SetEndOfFile, and it's also more efficient. Therefore, you should also recommend that your customers format their hard disks using NTFS.
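The two extension patterns discussed above can be contrasted in a short sketch. Python's file.truncate is only a rough, cross-platform analogue of the SetFilePointer/SetEndOfFile pattern (on Windows it ultimately sets the end of file through the same kernel machinery); the file name and sizes are illustrative:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "media.bin")

# Pattern 1: set the end of file up front. The logical size grows
# immediately, without the application writing any data blocks yet.
with open(path, "wb") as f:
    f.truncate(1024 * 1024)
print(os.path.getsize(path))  # 1048576

# Pattern 2 (recommended): extend the file by writing sequentially.
# Each write moves the end of file forward by exactly one block, so the
# file system never has to clean skipped-over clusters.
with open(path, "wb") as f:
    for _ in range(16):
        f.write(b"\0" * (64 * 1024))  # one aligned 64-KB block at a time
print(os.path.getsize(path))  # 1048576
```

Both runs end with a 1-MB file, but the sequential pattern reaches that size one block at a time, with each cluster written exactly once.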

Developer Summary

To achieve the best file I/O performance in applications that handle digital media files, programmers should follow these guidelines:

  • Use unbuffered file I/O.

  • Write and read I/O data blocks sequentially.

  • Start all write operations on a 64-KB boundary.

  • Write and read I/O data blocks that are at least 64 KB and preferably 256 KB in size.

  • Recommend that customers configure their computers to use NTFS and follow the other suggestions in the following section, User Considerations.



Programming Microsoft DirectShow for Digital Video and Television
ISBN: 0735618216
Year: 2002
Pages: 108
Author: Mark D. Pesce