By default, Linux systems read from and write to a buffer/page cache that is kept in memory. They defer transferring data to disk until the cache fills, the kernel's periodic writeback runs, or the application calls a sync function to flush the buffer/page cache. This strategy increases performance by avoiding the relatively slow mechanical process of writing to disk more often than necessary.

Input and output operations are of two types:

- Asynchronous I/O, which frees the application to perform other tasks while data is written or read
- Synchronized I/O, which performs the write or read operation and verifies its completion before returning

Synchronized I/O is useful when the integrity of data and files is critical to an application. Synchronized output assures that the data that is written to a device is actually stored there. Synchronized input assures that the data that is read from a device is a current image of data on that device.

Two levels of file synchronization are available:

- Data integrity
  - Write operations: Data in the buffer is transferred to disk, along with the file system information necessary to retrieve the data.
  - Read operations: Any pending write operations relevant to the data being read complete with data integrity before the read operation is performed.
- File integrity
  - Write operations: Data in the buffer and all file system information related to the operation are transferred to disk.
  - Read operations: Any pending write operations relevant to the data being read complete with file integrity before the read operation is performed.

How to Assure Data or File Integrity

You can assure data integrity or file integrity at specific times by using function calls, or you can set file status flags to force automatic file synchronization for each read or write call associated with that file. Note that synchronized I/O can degrade system performance, because each synchronized call waits for the disk.

Using Function Calls

You can choose to write to the buffer/page cache as usual and call functions explicitly when you want the program to flush the buffer to disk. For instance, you may want to use the buffer/page cache when a significant amount of I/O is occurring and call these functions when activity slows down. Two functions are available:

Function | Description |
---|---|
fdatasync | Flushes all data buffers, providing operation completion with data integrity. |
fsync | Flushes all data and file control information from the buffers, providing operation completion with file integrity. |

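As a rough illustration of the explicit-flush approach, the following minimal sketch writes through the page cache and then calls fdatasync or fsync once. The helper name `write_then_flush` and its `file_integrity` parameter are hypothetical and chosen only for this example; error handling is kept deliberately simple.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): write len bytes to path
 * through the page cache, then flush explicitly. If file_integrity is
 * nonzero, fsync() is used; otherwise fdatasync() is used.
 * Returns 0 on success, -1 on error. */
int write_then_flush(const char *path, const void *buf, size_t len,
                     int file_integrity)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);  /* data lands in the page cache */
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted: retry the write */
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Blocks until the kernel reports the flush as complete. */
    int rc = file_integrity ? fsync(fd) : fdatasync(fd);
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

A caller might use `write_then_flush("filea", data, size, 0)` when only the data and the metadata needed to read it back must be durable, and pass `1` as the last argument when all file control information must also reach the disk.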
For a complete description of these functions, refer to the man pages for fdatasync and fsync.

Using File Descriptors

If you want to write data to disk in all cases automatically, you can set file status flags to force this behavior instead of making explicit calls to fdatasync or fsync. To set this behavior, use these flags with the open function:

Flag | Description |
---|---|
O_DSYNC | Forces data synchronization for each write operation. |
O_SYNC | Forces file and data synchronization for each write operation. |

For example:

- O_DSYNC: `fd = open("filea", O_RDWR|O_CREAT|O_DSYNC, 0666);`
- O_SYNC: `fd = open("filea", O_RDWR|O_CREAT|O_SYNC, 0666);`

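To show how a descriptor opened with one of these flags is used, here is a small sketch that appends a record to a file opened with O_DSYNC, so that no separate fdatasync call is needed. The helper name `append_record_synced` is hypothetical, and treating a short write as an error is a simplification made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): append one text record
 * to a file opened with O_DSYNC. With this flag, write() does not return
 * until the data has been transferred with data integrity.
 * Returns 0 on success, -1 on error. */
int append_record_synced(const char *path, const char *record)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_DSYNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(record);
    ssize_t n = write(fd, record, len);  /* synchronized by the flag */

    /* Simplification: a short write is reported as an error. */
    int rc = (n == (ssize_t)len) ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Replacing O_DSYNC with O_SYNC in the open call would request file integrity instead of data integrity for each write, at the cost of more work per call.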
Performance Implications of sync/fsync

Forced synchronization of the contents of real memory and disk takes place in several ways:

- An application program makes an fsync() call for a specified file. This causes all the pages that contain modified data for that file to be written to disk. The writing is complete when the fsync() call returns to the program.
- An application program makes a sync() call. This causes all the file pages in memory that contain modified data to be scheduled for writing to disk. POSIX allows sync() to return before the writing is complete, although current Linux kernels wait for the writes to finish before returning.
- A user can enter the sync command, which in turn issues a sync() call. On systems where sync() returns early, some of the writes might not be complete when the user is prompted for input (or the next command in a shell script is processed).
- The sync daemon (bdflush on older kernels, kernel flusher threads on current ones) runs at regular intervals. This ensures that the system does not accumulate large amounts of data that exists only in volatile RAM.
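One way to get a feel for the cost of forced synchronization is to time the same writes with an fsync() after every block and with a single fsync() at the end. The sketch below is only a rough measuring aid: the helper name `timed_write` and the 4 KiB block size are assumptions made for this example, and the results depend heavily on the storage device and file system.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical benchmark helper (not part of any library): write nblocks
 * blocks of 4 KiB to path. If sync_each is nonzero, fsync() follows every
 * write; otherwise a single fsync() is issued after the last write.
 * Returns the elapsed time in seconds, or a negative value on error. */
double timed_write(const char *path, int nblocks, int sync_each)
{
    char block[4096];
    memset(block, 'x', sizeof block);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1.0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    int ok = 1;
    for (int i = 0; i < nblocks && ok; i++) {
        if (write(fd, block, sizeof block) != (ssize_t)sizeof block)
            ok = 0;                      /* error or short write */
        else if (sync_each && fsync(fd) < 0)
            ok = 0;                      /* per-write flush failed */
    }
    if (ok && !sync_each && fsync(fd) < 0)
        ok = 0;                          /* single flush at the end */

    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (close(fd) < 0)
        ok = 0;
    if (!ok)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Comparing `timed_write("filea", 1000, 1)` with `timed_write("filea", 1000, 0)` on a disk-backed file system gives a rough sense of how much per-write synchronization costs on a particular machine.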