Let's now get our hands dirty by working through some examples of GNU/Linux stream file I/O programming.
To write an application that performs file handling, the first step is to make visible the file I/O APIs. This is done by simply including the stdio.h header file, as:
#include <stdio.h>
Omitting it results in compiler errors (undeclared symbols). The next step is to declare the handle to be used in file I/O operations. This is often called a file pointer and is an opaque structure whose members should not be accessed by the developer.
FILE *my_fp;
We'll build on this in the next sections to illustrate ASCII and binary applications.
Let's now open a file and illustrate the variety of modes that can be used. Recall that opening a file can also be the mechanism to create a file. We'll investigate this first.
The fopen function is very simple and provides the following API:
FILE * fopen (const char *filename, const char *mode);
We specify the filename that we wish to access (or create) through the first argument (filename) and then the mode we wish to use (mode). The result of the fopen operation is a FILE pointer, which could be NULL, indicating that the operation failed.
The key to the fopen call is the mode that is provided. Table 10.1 provides an initial list of access modes.
The mode is simply a string that the fopen call uses to determine how to open (or create) the file. If we wanted to create a new file, we could simply use the fopen call as follows:
my_fp = fopen ("myfile.txt", "w");
Mode | Description |
---|---|
r | Open an existing file for read |
w | Open a file for write (create if it doesn't exist, truncate if it does) |
a | Open a file for append (create if file doesn't exist) |
r+ | Open an existing file for read and write |
w+ | Open for read and write (create if it doesn't exist, truncate if it does) |
The result would be the creation of a new file (or the truncation of an existing file) in preparation for write operations. If instead we wanted to read from an existing file, we'd open it as follows:
my_fp = fopen ("myfile.txt", "r");
Note that we've simply used a different mode here. The read mode assumes that the file exists, and if not, a NULL is returned.
In both cases, it is assumed that our file myfile.txt will either exist or be created in the current working directory. The current directory is the directory from which we invoked our application.
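As a rough illustration of how the modes in Table 10.1 behave, the following sketch writes, appends to, and reads back a small file. The helper names (write_text and file_length) are hypothetical and exist only for this example.

```c
#include <stdio.h>

/* Hypothetical helper: write text to path using the given fopen mode.
 * Returns 0 on success, -1 if the file could not be opened or written. */
int write_text(const char *path, const char *mode, const char *text)
{
    FILE *fp = fopen(path, mode);
    int rc;

    if (fp == NULL) return -1;

    rc = (fputs(text, fp) == EOF) ? -1 : 0;

    /* fclose flushes the buffer, so its result matters too */
    if (fclose(fp) == EOF) rc = -1;

    return rc;
}

/* Hypothetical helper: count the bytes in a file opened with mode "r".
 * Returns -1 if the file does not exist or cannot be opened. */
long file_length(const char *path)
{
    FILE *fp = fopen(path, "r");
    long n = 0;

    if (fp == NULL) return -1;

    while (fgetc(fp) != EOF) n++;

    fclose(fp);
    return n;
}
```

Calling write_text with "w" twice leaves only the second string in the file, while "a" adds to whatever is already there. Calling file_length on a missing file returns -1 because the "r" mode never creates a file.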
It's very important that the results of all file I/O operations be checked for success. For the fopen call, we simply test the response for NULL. What happens upon error is ultimately application dependent (you decide). An example of one mechanism is provided in Listing 10.1.
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

#define MYFILE "missing.txt"

int main()
{

  FILE *fin;

  /* Try to open the file for read */
  fin = fopen (MYFILE, "r");

  /* Check for failure to open */
  if (fin == (FILE *)NULL) {

    /* Emit an error message and exit */
    printf("%s: %s\n", MYFILE, strerror(errno));
    exit(-1);

  }

  /* All was well, close the file */
  fclose(fin);

  return 0;
}
$ ./app
missing.txt: No such file or directory
$
Let's now move on to writing and then reading data from a file.
A number of methods exist for both reading and writing data to a file. More options can be a blessing, but it's also important to know where to use which mechanism. For example, we could read or write on a character basis or on a string basis (for ASCII text only). We could also use a more general API that permits reading and writing records, which supports both ASCII and binary representations. We'll look at each here, but we'll focus primarily on the latter mechanism.
The standard I/O library presents a buffered interface. This has two very important properties. First, system reads and writes are in blocks (typically 8KB in size, although the size is implementation dependent). Character I/O is simply written to the FILE buffer, and the buffer is written to the media automatically when it's full. Second, if the data is being sent to an interactive device such as the console terminal and must appear immediately, fflush must be called or nonbuffered I/O must be set.
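The effect of buffering can be observed with a small experiment. The sketch below is only an illustration: the helper names (size_on_disk and buffered_write_demo) are hypothetical, and it assumes that a short write to a fully buffered stream stays in the FILE buffer until fflush is called, which is the usual behavior but is not something the text above guarantees.

```c
#include <stdio.h>

/* Hypothetical helper: size of a file as seen through a second stream. */
long size_on_disk(const char *path)
{
    FILE *fp = fopen(path, "r");
    long size;

    if (fp == NULL) return -1;

    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }

    size = ftell(fp);
    fclose(fp);
    return size;
}

/* Hypothetical helper: write text to a fully buffered stream and record
 * the file's size before and after the explicit fflush. */
int buffered_write_demo(const char *path, const char *text,
                        long *before, long *after)
{
    char buf[BUFSIZ];   /* stays valid because we fclose before returning */
    FILE *fp = fopen(path, "w");

    if (fp == NULL) return -1;

    /* setvbuf must be called before any other operation on the stream */
    if (setvbuf(fp, buf, _IOFBF, sizeof(buf)) != 0) {
        fclose(fp);
        return -1;
    }

    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }

    *before = size_on_disk(path);   /* data is still in the FILE buffer */

    fflush(fp);                     /* push the buffer out to the file */

    *after = size_on_disk(path);

    return (fclose(fp) == EOF) ? -1 : 0;
}
```

Passing _IONBF instead of _IOFBF to setvbuf selects nonbuffered I/O, in which case no explicit fflush is needed.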
The character interfaces are demonstrated in Listings 10.2 and 10.3. In Listing 10.2, we illustrate character output using fputc and in Listing 10.3, character input using fgetc. These functions have the following prototypes:
int fputc (int c, FILE *stream);
int fgetc (FILE *stream);
In this example, we'll generate an output file using fputc and then use this file as the input to fgetc. In Listing 10.2, we open our output file at line 9, check the result at line 11, and then work our way through our sample string. Our simple loop walks through the entire string until the terminating NUL character is detected, at which point we exit and close the file (line 21). At line 16, we use fputc to emit our character (as an int, per the fputc prototype) as well as specifying our output stream (fout).
 1: #include <stdio.h>
 2: #include <stdlib.h>
 3: int main()
 4: {
 5:   int i;
 6:   FILE *fout;
 7:   const char string[]={"This\r\nis a test\r\nfile.\r\n\0"};
 8:
 9:   fout = fopen("inpfile.txt", "w");
10:
11:   if (fout == (FILE *)NULL) exit(-1);
12:
13:   i = 0;
14:   while (string[i] != '\0') {
15:
16:     fputc((int)string[i], fout);
17:     i++;
18:
19:   }
20:
21:   fclose(fout);
22:
23:   return 0;
24: }
The function to read this file using the character interface is shown in Listing 10.3. This function is very similar to our file creation example. We open the file for read at line 8 and follow with a test at line 10. We then enter a loop to get the characters from the file (lines 12-22). The loop simply reads characters from the file using fgetc and stops when the special EOF symbol is encountered. This is the indication that we've reached the end of the file. For all characters that are not EOF (line 16), we emit the character to standard-out using the printf function. Upon reaching the end of the file, we close it using fclose at line 24.
 1: #include <stdio.h>
 2: #include <stdlib.h>
 3: int main()
 4: {
 5:   int c;
 6:   FILE *fin;
 7:
 8:   fin = fopen("inpfile.txt", "r");
 9:
10:   if (fin == (FILE *)0) exit(-1);
11:
12:   do {
13:
14:     c = fgetc(fin);
15:
16:     if (c != EOF) {
17:
18:       printf("%c", (char)c);
19:
20:     }
21:
22:   } while (c != EOF);
23:
24:   fclose(fin);
25:
26:   return 0;
27: }
Executing our applications is illustrated as follows:
$ ./charout
$ ./charin
This
is a test
file.
$
The character interfaces are obviously simple, but they are also inefficient and should be used only if a string-based method cannot be used. We'll look at this interface next.
In this section, we'll look at four library functions in particular that provide the means to read and write strings. The first two (fputs and fgets) are simple string interfaces, and the second two (fprintf and fscanf) are more complex and provide additional capabilities.
The fputs and fgets interfaces mirror our previously discussed fputc and fgetc functions. They provide the means to write and read variable-length strings to files in a very simple way. The prototypes for fputs and fgets are defined as:
int fputs (const char *s, FILE *stream);
char * fgets (char *s, int size, FILE *stream);
Let's first look at a sample application that accepts strings from the user (via standard-input) and then writes them to a file (see Listing 10.4). We'll halt the input process once the end of the input has been signaled (Ctrl+D at the keyboard).
 1: #include <stdio.h>
 2: #include <stdlib.h>
 3: #define LEN 80
 4:
 5: int main()
 6: {
 7:   char line[LEN+1];
 8:   FILE *fout, *fin;
 9:
10:   fout = fopen("testfile.txt", "w");
11:   if (fout == (FILE *)0) exit(-1);
12:
13:   fin = fdopen(0, "r");
14:
15:   while ((fgets(line, LEN, fin)) != NULL) {
16:
17:     fputs(line, fout);
18:
19:   }
20:
21:   fclose(fout);
22:   fclose(fin);
23:
24:   return 0;
25: }
At line 10, we open our output file using fopen to a new file called testfile.txt. We check the error status of this call at line 11, exiting if a failure occurred. At line 13, we use a special function, fdopen, to associate an existing file descriptor with a stream. In this case, we associate the standard-input descriptor with a new stream called fin (returned by fdopen). Whatever we now type in (standard-in) will be routed to this file stream. Next, we enter a loop that attempts to read from the fin stream (standard-in) and write this out to the output stream (fout). At line 15, we read using fgets and check the return against NULL. The NULL will appear when we close the descriptor (which is achieved through pressing Ctrl+D at the keyboard). The line read is then emitted to the output stream using fputs. Finally, when the input stream has closed, we exit our loop and close the two streams at lines 21 and 22.
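If stopping on a blank line is preferred over waiting for Ctrl+D, the fgets loop can test each line before writing it. The following is a minimal sketch of that variation; the function name copy_until_blank is hypothetical and is not part of Listing 10.4.

```c
#include <stdio.h>
#include <string.h>

#define LEN 80

/* Hypothetical helper: copy lines from in to out until a blank line or
 * end-of-file is seen. Returns the number of fgets reads copied. */
int copy_until_blank(FILE *in, FILE *out)
{
    char line[LEN + 1];
    int count = 0;

    while (fgets(line, sizeof(line), in) != NULL) {

        /* fgets keeps the newline, so a blank line is just "\n"
         * (or "\r\n" for text with carriage returns) */
        if (strcmp(line, "\n") == 0 || strcmp(line, "\r\n") == 0) break;

        if (fputs(line, out) == EOF) break;

        count++;
    }

    return count;
}
```

A call such as copy_until_blank(stdin, fout) would take the place of the loop in the listing. Note that a line longer than LEN characters is returned by fgets in several pieces, so the count is a count of reads rather than of logical lines.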
Let's now look at another example of the read side, fgets. In this example (Listing 10.5), we read the contents of our test file using fgets and then printf it to standard-out.
 1: #include <stdio.h>
 2: #include <stdlib.h>
 3: #define LEN 80
 4:
 5: int main()
 6: {
 7:   char line[LEN+1];
 8:   FILE *fin;
 9:
10:   fin = fopen("testfile.txt", "r");
11:   if (fin == (FILE *)0) exit(-1);
12:
13:   while ((fgets(line, LEN, fin)) != NULL) {
14:
15:     printf("%s", line);
16:
17:   }
18:
19:   fclose(fin);
20:
21:   return 0;
22: }
In this example, we open our input file and create a new input stream handle called fin. We use this at line 13 to read variable-length strings from the file, and when one is read, we emit it to standard-out via printf at line 15.
This demonstrates writing and reading strings to and from a file, but what if our data is more structured than simply strings? If our strings are actually made up of lower-level structures (such as integers, floating-point values, or other types), we can use another method to more easily deal with them. This is the next topic of discussion.
Consider the problem of reading and writing data that takes a regular form but consists of various data types. Let's say that we want to store an integer item (an id), two floating-point values (2D coordinates), and a string (an object name). Let's look first at the application that creates this file (see Listing 10.6). Note that in this example we ultimately deal with strings, but using the API functions, the ability to translate to the native data types is provided.
 1: #include <stdio.h>
 2: #include <stdlib.h>
 3: #define MAX_LINE 40
 4:
 5: #define FILENAME "myfile.txt"
 6:
 7: typedef struct {
 8:   int id;
 9:   float x_coord;
10:   float y_coord;
11:   char name[MAX_LINE+1];
12: } MY_TYPE_T;
13:
14: #define MAX_OBJECTS 3
15:
16: /* Initialize an array of three objects */
17: MY_TYPE_T objects[MAX_OBJECTS]={
18:   { 0, 1.5, 8.4, "First-object" },
19:   { 1, 9.2, 7.4, "Second-object" },
20:   { 2, 4.1, 5.6, "Final-object" }
21: };
22:
23: int main()
24: {
25:   int i;
26:   FILE *fout;
27:
28:   /* Open the output file */
29:   fout = fopen (FILENAME, "w");
30:   if (fout == (FILE *)0) exit(-1);
31:
32:   /* Emit each of the objects, one per line */
33:   for (i = 0 ; i < MAX_OBJECTS ; i++) {
34:
35:     fprintf (fout, "%d %f %f %s\n",
36:              objects[i].id,
37:              objects[i].x_coord, objects[i].y_coord,
38:              objects[i].name);
39:
40:   }
41:
42:   fclose (fout);
43:
44:   return 0;
45: }
Note | We could have achieved this with an snprintf call (the bounded form of sprintf, used here to create our output string) followed by fputs to write it out: char line[81]; ... snprintf (line, sizeof(line), "%d %f %f %s\n", objects[i].id, objects[i].x_coord, objects[i].y_coord, objects[i].name); fputs (line, fout); |
Note | The disadvantage is that local space must be declared for the string being emitted. This would not be required with a call to fprintf directly. |
The prototypes for both fprintf and sprintf are shown here:
int fprintf (FILE* stream, const char *format, ...);
int sprintf (char *str, const char *format, ...);
From the file created in Listing 10.6, we read this file in Listing 10.7. This function utilizes the fscanf function to both read and interpret the data. After opening the input file (lines 21-22), we loop and read the data while the end of file has not been found. We detect the end of file marker using the feof function at line 25. The fscanf function utilizes the input stream (fin) and the format to be used to interpret the data. This string is identical to that used to write the data out (see Listing 10.6, line 35).
Once a line of data has been read, it's immediately printed to standard-out using the printf function at lines 32-35. Finally, the input file is closed using the fclose call at line 39.
 1: #include <stdio.h>
 2: #include <stdlib.h>
 3: #define MAX_LINE 40
 4:
 5: #define FILENAME "myfile.txt"
 6:
 7: typedef struct {
 8:   int id;
 9:   float x_coord;
10:   float y_coord;
11:   char name[MAX_LINE+1];
12: } MY_TYPE_T;
13:
14: int main()
15: {
16:   int i;
17:   FILE *fin;
18:   MY_TYPE_T object;
19:
20:   /* Open the input file */
21:   fin = fopen (FILENAME, "r");
22:   if (fin == (FILE *)0) exit(-1);
23:
24:   /* Read the records from the file and emit */
25:   while (! feof (fin)) {
26:
27:     fscanf (fin, "%d %f %f %s\n",
28:             &object.id,
29:             &object.x_coord, &object.y_coord,
30:             object.name);
31:
32:     printf("%d %f %f %s\n",
33:            object.id,
34:            object.x_coord, object.y_coord,
35:            object.name);
36:
37:   }
38:
39:   fclose (fin);
40:
41:   return 0;
42: }
Note | We could have achieved this functionality with fgets (to read the line) and an sscanf call (to parse our input string): char line[81]; ... fgets (line, sizeof(line), fin); sscanf (line, "%d %f %f %s", &object.id, &object.x_coord, &object.y_coord, object.name); |
Note | The disadvantage is that local space must be declared for the parse to be performed on the input string. This would not be required with a call to fscanf directly. |
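The loop in Listing 10.7 depends on feof and never inspects the value returned by fscanf, so a malformed line would go unnoticed. One common alternative, sketched here under the assumption that each record has exactly four fields, is to loop on the number of items fscanf reports as converted. The function name read_records is hypothetical, and the %40s width is an addition that keeps the name within MAX_LINE characters.

```c
#include <stdio.h>

#define MAX_LINE 40

typedef struct {
    int id;
    float x_coord;
    float y_coord;
    char name[MAX_LINE + 1];
} MY_TYPE_T;

/* Hypothetical helper: read up to max records from the text file at path.
 * Returns the number of complete records read, or -1 if the open fails. */
int read_records(const char *path, MY_TYPE_T *out, int max)
{
    FILE *fin = fopen(path, "r");
    int count = 0;

    if (fin == NULL) return -1;

    /* fscanf returns the number of fields converted; anything other
     * than 4 means end-of-file or a malformed record. The width in
     * %40s must be kept in step with MAX_LINE. */
    while (count < max &&
           fscanf(fin, "%d %f %f %40s",
                  &out[count].id,
                  &out[count].x_coord, &out[count].y_coord,
                  out[count].name) == 4) {
        count++;
    }

    fclose(fin);
    return count;
}
```

With this form the loop ends cleanly at the first record that cannot be parsed, and the caller can compare the returned count with the number of records expected.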
The fscanf and sscanf function prototypes are both shown here:
int fscanf (FILE *stream, const char *format, ...);
int sscanf (const char *str, const char *format, ...);
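To show how the two prototypes relate, the sketch below splits the work in two: fgets reads a line and sscanf parses it. This is only one possible arrangement; the function names parse_record and read_record are hypothetical, and the 81-byte line buffer is an assumption about the longest line in the file.

```c
#include <stdio.h>

#define MAX_LINE 40

typedef struct {
    int id;
    float x_coord;
    float y_coord;
    char name[MAX_LINE + 1];
} MY_TYPE_T;

/* Hypothetical helper: parse one text record from a string.
 * Returns 1 if all four fields were converted, 0 otherwise. */
int parse_record(const char *line, MY_TYPE_T *obj)
{
    return sscanf(line, "%d %f %f %40s",
                  &obj->id,
                  &obj->x_coord, &obj->y_coord,
                  obj->name) == 4;
}

/* Hypothetical helper: read the next line from fin and parse it.
 * Returns 1 on success, 0 for a malformed line, -1 at end-of-file. */
int read_record(FILE *fin, MY_TYPE_T *obj)
{
    char line[81];   /* assumed upper bound on line length */

    if (fgets(line, sizeof(line), fin) == NULL) return -1;

    return parse_record(line, obj);
}
```

Because the line is consumed by fgets before parsing begins, a malformed record does not leave the stream stuck on the offending text, which is one reason this split is sometimes preferred despite the extra buffer.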
All of the methods discussed thus far require that we're dealing with ASCII text data. In the next section, we'll look at API functions that permit dealing with binary data.
Note | For survivability, it's important to not leave files open over long durations of time. When I/O is complete, the file should be closed with fclose (or at a minimum, flushed with fflush). This has the effect of writing any buffered data to the actual file. |
In this section, we'll look at a set of library functions that provide the ability to deal with both binary and ASCII text data. The fwrite and fread functions provide the ability to deal not only with the I/O of objects, but also with arrays of objects. The prototypes of the fwrite and fread functions are provided here:
size_t fread (void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t fwrite (const void *ptr, size_t size, size_t nmemb, FILE *stream);
Let's look at a couple of simple examples of fwrite and fread to explore their use (see Listing 10.8). In this first example, we'll emit the MY_TYPE_T structure first encountered in Listing 10.6.
#include <stdio.h>
#include <stdlib.h>

#define MAX_LINE 40

#define FILENAME "myfile.bin"

typedef struct {
  int id;
  float x_coord;
  float y_coord;
  char name[MAX_LINE+1];
} MY_TYPE_T;

#define MAX_OBJECTS 3

MY_TYPE_T objects[MAX_OBJECTS]={
  { 0, 1.5, 8.4, "First-object" },
  { 1, 9.2, 7.4, "Second-object" },
  { 2, 4.1, 5.6, "Final-object" }
};

int main()
{
  FILE *fout;

  /* Open the output file */
  fout = fopen (FILENAME, "w");
  if (fout == (FILE *)0) exit(-1);

  /* Write out the entire objects structure */
  fwrite ((void *)objects, sizeof(MY_TYPE_T), MAX_OBJECTS, fout);

  fclose (fout);

  return 0;
}
Let's look at the invocation of this application (called binout) and a method for inspecting the contents of the binary file (see Listing 10.9). After executing the binout executable, the file myfile.bin is generated. Attempting to use the more utility to inspect the file results in a blank line. This is because the first character in the file is a NUL character, which is interpreted by more as the end. Next, we use the od utility (octal dump) to emit the file without interpreting it. We specify -x as the option to emit the file in hexadecimal format. (For navigation purposes, note that the integer id field opens each 56-byte record, at octal offsets 0, 070, and 0160 in the dump.)
$ ./binout
$ more myfile.bin
$ od -x myfile.bin
0000000 0000 0000 0000 3fc0 6666 4106 6946 7372
0000020 2d74 626f 656a 7463 0000 0000 0000 0000
0000040 0000 0000 0000 0000 0000 0000 0000 0000
0000060 0000 0000 0000 0000 0001 0000 3333 4113
0000100 cccd 40ec 6553 6f63 646e 6f2d 6a62 6365
0000120 0074 0000 0000 0000 0000 0000 0000 0000
0000140 0000 0000 0000 0000 0000 0000 0000 0000
0000160 0002 0000 3333 4083 3333 40b3 6946 616e
0000200 2d6c 626f 656a 7463 0000 0000 0000 0000
0000220 0000 0000 0000 0000 0000 0000 0000 0000
0000240 0000 0000 0000 0000
0000250
$
Note | One important item to note about reading and writing binary data is the issue of portability and endianness. Consider that we create our binary data on a Pentium system, but the binary file is moved to a PowerPC system to read. The data will be in the incorrect byte order and therefore essentially corrupt. The Pentium uses little endian byte order (least significant byte first in memory), whereas the PowerPC uses big endian (most significant byte first in memory). For portability, endianness should always be considered when dealing with binary data. Also consider the use of host and network byte swapping functions, as discussed in Chapter 11, Programming with Pipes. |
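As a small illustration of the byte swapping functions mentioned in the note, the sketch below stores a 32-bit integer in network (big endian) byte order regardless of the host's native order. The helper names (write_u32_be and read_u32_be) are hypothetical; htonl and ntohl are the standard host/network conversion functions declared in arpa/inet.h. Floating-point fields and structure padding would need separate treatment that is not shown here.

```c
#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical helper: write a 32-bit value in big endian byte order.
 * Returns 0 on success, -1 on a short write. */
int write_u32_be(FILE *fp, uint32_t value)
{
    uint32_t wire = htonl(value);   /* host order to network order */

    return (fwrite(&wire, sizeof(wire), 1, fp) == 1) ? 0 : -1;
}

/* Hypothetical helper: read a 32-bit value stored in big endian order.
 * Returns 0 on success, -1 on a short read. */
int read_u32_be(FILE *fp, uint32_t *value)
{
    uint32_t wire;

    if (fread(&wire, sizeof(wire), 1, fp) != 1) return -1;

    *value = ntohl(wire);           /* network order back to host order */
    return 0;
}
```

A file written this way carries the most significant byte first on every host, so the same bytes can be read back correctly on both little endian and big endian machines.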
Now let's look at reading this file using fread, but rather than reading it sequentially, let's read it in a nonsequential way (otherwise known as random access). In this example, we'll read the records of the file in reverse order. This requires the use of two new functions that will permit us to seek into a file (fseek) and also rewind back to the start (rewind):
void rewind (FILE *stream);
int fseek (FILE *stream, long offset, int whence);
The rewind function simply resets the file read pointer back to the start of the file, while the fseek function allows us to move to a new position given an offset. The whence argument defines whether the position is relative to the start of the file (SEEK_SET), the current position (SEEK_CUR), or the end of the file (SEEK_END). See Table 10.2. The lseek function operates like fseek, but instead on a file descriptor:
off_t lseek (int fildes, off_t offset, int whence);
Name | Description |
---|---|
SEEK_SET | Moves the file position to the position defined by offset, measured from the start of the file. |
SEEK_CUR | Moves the file position the number of bytes defined by offset from the current file position. |
SEEK_END | Moves the file position the number of bytes defined by offset from the end of the file. |
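The three whence values in Table 10.2 can be compared side by side with a few small helpers. This is a sketch only: REC_T is a hypothetical fixed-size record used for illustration, and the function names are not part of the standard library.

```c
#include <stdio.h>

/* Hypothetical fixed-size record used only for this illustration */
typedef struct {
    int id;
    float value;
} REC_T;

/* SEEK_SET: read record number index (0-based), counted from the start */
int read_at(FILE *fp, long index, REC_T *rec)
{
    if (fseek(fp, index * (long)sizeof(REC_T), SEEK_SET) != 0) return -1;

    return (fread(rec, sizeof(REC_T), 1, fp) == 1) ? 0 : -1;
}

/* SEEK_CUR: step over one record relative to the current position */
int skip_one(FILE *fp)
{
    return fseek(fp, (long)sizeof(REC_T), SEEK_CUR);
}

/* SEEK_END: a negative offset backs up from the end of the file */
int read_last(FILE *fp, REC_T *rec)
{
    if (fseek(fp, -(long)sizeof(REC_T), SEEK_END) != 0) return -1;

    return (fread(rec, sizeof(REC_T), 1, fp) == 1) ? 0 : -1;
}
```

Each helper returns 0 on success and a nonzero value on failure, mirroring fseek itself, so a failed seek is never followed by a read from the wrong position.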
In the listing that follows, after opening the file (lines 22-23), we use fseek at line 26 to move to the third record and then read it with fread at line 28. We repeat this process, setting the file read position to the second element at line 38, and then read again with fread. The final step is reading the first element in the file. This requires no fseek because after the rewind (at line 48), we're at the top of the file. We can then fread the first record at line 50.
 1: #include <stdio.h>
 2: #include <stdlib.h>
 3: #define MAX_LINE 40
 4:
 5: #define FILENAME "myfile.bin"
 6:
 7: typedef struct {
 8:   int id;
 9:   float x_coord;
10:   float y_coord;
11:   char name[MAX_LINE+1];
12: } MY_TYPE_T;
13:
14: MY_TYPE_T object;
15:
16: int main()
17: {
18:   int i;
19:   FILE *fin;
20:
21:   /* Open the input file */
22:   fin = fopen(FILENAME, "r");
23:   if (fin == (FILE *)0) exit(-1);
24:
25:   /* Get the last entry */
26:   fseek(fin, (2 * sizeof(MY_TYPE_T)), SEEK_SET);
27:
28:   fread(&object, sizeof(MY_TYPE_T), 1, fin);
29:
30:   printf("%d %f %f %s\n",
31:          object.id,
32:          object.x_coord, object.y_coord,
33:          object.name);
34:
35:   /* Get the second to last entry */
36:   rewind (fin);
37:
38:   fseek (fin, (1 * sizeof(MY_TYPE_T)), SEEK_SET);
39:
40:   fread (&object, sizeof(MY_TYPE_T), 1, fin);
41:
42:   printf("%d %f %f %s\n",
43:          object.id,
44:          object.x_coord, object.y_coord,
45:          object.name);
46:
47:   /* Get the first entry */
48:   rewind (fin);
49:
50:   fread (&object, sizeof(MY_TYPE_T), 1, fin);
51:
52:   printf("%d %f %f %s\n",
53:          object.id,
54:          object.x_coord, object.y_coord,
55:          object.name);
56:
57:   fclose (fin);
58:
59:   return 0;
60: }
The process of reading the third record is illustrated graphically in Figure 10.1. We illustrate the fopen, fseek, fread, and finally the rewind.
The function ftell provides the means to identify the current position. This function returns the current position as a long type and can be passed as the offset to fseek (with SEEK_SET) to reset to that position. The ftell prototype is provided here:
long ftell (FILE *stream);
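One common use of ftell, shown here as a minimal sketch, is to measure a file without losing the caller's place: save the current position, seek to the end, read the position there, and then seek back. The function name file_size is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: return the size in bytes of an open file while
 * leaving the stream at its original position. Returns -1 on error. */
long file_size(FILE *fp)
{
    long saved, size;

    /* Remember where the caller currently is */
    saved = ftell(fp);
    if (saved < 0) return -1;

    /* The position at the end of the file is its size in bytes */
    if (fseek(fp, 0L, SEEK_END) != 0) return -1;
    size = ftell(fp);

    /* Pass the saved value back to fseek with SEEK_SET to restore it */
    if (fseek(fp, saved, SEEK_SET) != 0) return -1;

    return size;
}
```

Dividing the result by the size of a record gives the number of fixed-size records in a binary file such as the one written in Listing 10.8.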
An alternate API exists to ftell and fseek. The fgetpos and fsetpos functions provide the same functionality, but in a different form. Rather than an absolute position, an opaque type is used to represent the position (returned by fgetpos, passed into fsetpos). The prototypes for these functions are provided here:
int fgetpos (FILE *stream, fpos_t *pos);
int fsetpos (FILE *stream, const fpos_t *pos);
An example code snippet of these functions is shown here:
fpos_t file_pos;
...
/* Get desired position */
fgetpos (fin, &file_pos);
...
rewind (fin);
/* Return to desired position */
fsetpos (fin, &file_pos);
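The snippet can be turned into a complete function. The sketch below, which uses the hypothetical name peek_line, reads the next line of a stream and then uses the saved fpos_t to put the stream back where it was, so the same line can be read again.

```c
#include <stdio.h>

/* Hypothetical helper: copy the next line into buf without consuming it.
 * Returns 0 on success, -1 on error or if no line is available. */
int peek_line(FILE *fp, char *buf, int size)
{
    fpos_t pos;
    int got_line;

    /* Save the current position in the opaque fpos_t */
    if (fgetpos(fp, &pos) != 0) return -1;

    got_line = (fgets(buf, size, fp) != NULL);

    /* Return to the saved position whether or not a line was read */
    if (fsetpos(fp, &pos) != 0) return -1;

    return got_line ? 0 : -1;
}
```

Because the position is never inspected, only saved and restored, the code does not depend on how fpos_t is represented.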
It's recommended to use the fgetpos and fsetpos APIs over the ftell and fseek methods. Because ftell and fseek expose the position as a plain long rather than abstracting the details of the mechanism, the fgetpos and fsetpos functions are less likely to be deprecated in the future.