executes.

4.6 File Representation

Files are designated within C programs either by file pointers or by file descriptors. The standard I/O library functions for ISO C (fopen, fscanf, fprintf, fread, fwrite, fclose and so on) use file pointers. The UNIX I/O functions (open, read, write, close and ioctl) use file descriptors. File pointers and file descriptors provide logical designations called handles for performing device-independent input and output. The symbolic names for the file pointers that represent standard input, standard output and standard error are stdin, stdout and stderr, respectively. These symbolic names are defined in stdio.h. The symbolic names for the file descriptors that represent standard input, standard output and standard error are STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO, respectively. These symbolic names are defined in unistd.h.

Exercise 4.19

Explain the difference between a library function and a system call.

Answer:

The POSIX standard does not make a distinction between library functions and system calls. Traditionally, a library function is an ordinary function that is placed in a collection of functions called a library, usually because it is useful, widely used or part of a specification, such as C. A system call is a request to the operating system for service. It involves a trap to the operating system and often a context switch. System calls are associated with particular operating systems. Many library functions such as read and write are, in fact, jackets for system calls. That is, they reformat the arguments in the appropriate system-dependent form and then call the underlying system call to perform the actual operation.

Although the implementation details differ, versions of UNIX follow a similar implementation model for handling file descriptors and file pointers within a process. The remainder of this section provides a schematic model of how file descriptors (UNIX I/O) and file pointers (ISO C I/O) work. We use this model to explain redirection (Section 4.7) and inheritance (Section 4.6.3, Section 6.2 and Chapter 7).

4.6.1 File descriptors

The open function associates a file or physical device with the logical handle used in the program. The file or physical device is specified by a character string (e.g., /home/johns/my.dat or /dev/tty). The handle is an integer that can be thought of as an index into a file descriptor table that is specific to a process. The table contains an entry for each open file in the process. The file descriptor table is part of the process user area, but the program cannot access it except through functions using the file descriptor.

Example 4.20

Figure 4.2 shows a schematic of the file descriptor table after a program executes the following.

 myfd = open("/home/ann/my.dat", O_RDONLY);

The open function creates an entry in the file descriptor table that points to an entry in the system file table. The open function returns the value 3, specifying that the file descriptor entry is at index 3 of the process file descriptor table. This is the fourth entry, because indices 0, 1 and 2 are already used by standard input, standard output and standard error.

Figure 4.2. Schematic diagram of the relationship between the file descriptor table, the system file table and the in-memory inode table in a UNIX-like operating system after the code of Example 4.20 executes.

The system file table, which is shared by all the processes in the system, has an entry for each active open. Each system file table entry contains the file offset, an indication of the access mode (i.e., read, write or read-write) and a count of the number of file descriptor table entries pointing to it.

Several system file table entries may correspond to the same physical file. Each of these entries points to the same entry in the in-memory inode table. The in-memory inode table contains an entry for each active file in the system. When a program opens a particular physical file that is not currently open, the call creates an entry in this inode table for that file. Figure 4.2 shows that the file /home/ann/my.dat had been opened before the code of Example 4.20 because there are two entries in the system file table with pointers to the entry in the inode table. (The label B designates the earlier pointer in the figure.)

Exercise 4.21

What happens when the process whose file descriptor table is shown in Figure 4.2 executes the close(myfd) function?

Answer:

The operating system deletes the fourth entry in the file descriptor table and the corresponding entry in the system file table. (See Section 4.6.3 for a more complete discussion.) If the operating system also deleted the inode table entry, it would leave pointer B hanging in the system file table. Therefore, the inode table entry must have a count of the system file table entries that are pointing to it. When a process executes the close function, the operating system decrements the count in the inode entry. If the inode entry has a 0 count, the operating system deletes the inode entry from memory. (The operating system might not actually delete the entry right away on the chance that it will be accessed again in the immediate future.)

Exercise 4.22

The system file table entry contains an offset that gives the current position in the file. If two processes have each opened a file for reading, each process has its own offset into the file and reads the entire file independently of the other process. What happens if each process opens the same file for writing? What would happen if the file offset were stored in the inode table instead of the system file table?

Answer:

The writes are independent of each other. Each process can write over what the other process has written because each active open has its own file offset. On the other hand, if the offsets were stored in the inode table rather than in the system file table, the writes from different active opens would be consecutive. Also, the processes that had opened a file for reading would only read parts of the file because the file offset they were using could be updated by other processes.

Exercise 4.23

Suppose a process opens a file for reading and then forks a child process. Both the parent and child can read from the file. How are reads by these two processes related? What about writes?

Answer:

The child receives a copy of the parent's file descriptor table at the time of the fork. The processes share a system file table entry and therefore also share the file offset. Each read advances the shared offset, so the two processes read different parts of the file. If the file is instead open for writing and no other processes have it open, each write starts where the previous one ended, so the output of one process does not overwrite that of the other and no data is lost. Subsection 4.6.3 covers this situation in more detail.

4.6.2 File pointers and buffering

The ISO C standard I/O library uses file pointers rather than file descriptors as handles for I/O. A file pointer points to a data structure called a FILE structure in the user area of the process.

Example 4.24

The following code segment opens the file /home/ann/my.dat for output and then writes a string to the file.

 FILE *myfp;
 if ((myfp = fopen("/home/ann/my.dat", "w")) == NULL)
    perror("Failed to open /home/ann/my.dat");
 else
    fprintf(myfp, "This is a test");

Figure 4.3 shows a schematic of the FILE structure allocated by the fopen call of Example 4.24. The FILE structure contains a buffer and a file descriptor value. The file descriptor value is the index of the entry in the file descriptor table that is actually used to write the data to the file. In some sense the file pointer is a handle to a handle.

Figure 4.3. Schematic handling of a file pointer after `fopen`.

What happens when the program calls fprintf? The result depends on the type of file that was opened. Disk files are usually fully buffered, meaning that fprintf does not actually write the This is a test message to disk, but instead writes the bytes to a buffer in the FILE structure. When the buffer fills, the I/O subsystem calls write with the file descriptor, as in the previous section. The delay between the time when a program executes fprintf and the time when the writing actually occurs may have interesting consequences, especially if the program crashes. Buffered data is sometimes lost on system crashes, so a program can appear to complete normally even though its disk output is incomplete.

How can a program avoid the effects of buffering? An fflush call forces whatever has been buffered in the FILE structure to be written out. A program can also call setvbuf to disable buffering.

Terminal I/O works a little differently. Files associated with terminals are line buffered rather than fully buffered (except for standard error, which by default is not buffered). On output, line buffering means that the line is not written out until the buffer is full or until a newline symbol is encountered.

Exercise 4.25 `bufferout.c`

How does the output appear when the following program executes?

 #include <stdio.h>
 int main(void) {
    fprintf(stdout, "a");
    fprintf(stderr, "a has been written\n");
    fprintf(stdout, "b");
    fprintf(stderr, "b has been written\n");
    fprintf(stdout, "\n");
    return 0;
 }

Answer:

The messages written to standard error appear before the 'a' and 'b' because standard output is line buffered, whereas standard error is not buffered.

Exercise 4.26 `bufferinout.c`

How does the output appear when the following program executes?

 #include <stdio.h>
 int main(void) {
    int i;
    fprintf(stdout, "a");
    scanf("%d", &i);
    fprintf(stderr, "a has been written\n");
    fprintf(stdout, "b");
    fprintf(stderr, "b has been written\n");
    fprintf(stdout, "\n");
    return 0;
 }

Answer:

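On most systems, when standard input and standard output both refer to a terminal, the scanf function flushes the buffer for stdout before it waits for input, so 'a' is displayed before the number is read in. After the number has been entered, 'b' still appears after the b has been written message.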

The issue of buffering is more subtle than the previous discussion might lead you to believe. If a program that uses file pointers for a buffered device crashes, the last partial buffer created from the fprintf calls may never be written out. When the buffer is full, a write operation is performed. Completion of a write operation does not mean that the data actually made it to disk. In fact, the operating system copies the data to a system buffer cache. Periodically, the operating system writes these dirty blocks to disk. If the operating system crashes before it writes the block to disk, the program still loses the data. Presumably, a system crash is less likely to happen than an individual program crash.

4.6.3 Inheritance of file descriptors

When fork creates a child, the child inherits a copy of most of the parent's environment and context, including the signal state, the scheduling parameters and the file descriptor table. The implications of inheritance are not always obvious. Because children receive a copy of their parent's file descriptor table at the time of the fork, the parent and children share the same file offsets for files that were opened by the parent prior to the fork.

Example 4.27 `openfork.c`

In the following program, the child inherits the file descriptor for my.dat. Each process reads and outputs one character from the file.

 #include <fcntl.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/stat.h>
 int main(void) {
    char c = '!';
    int myfd;
    if ((myfd = open("my.dat", O_RDONLY)) == -1) {
       perror("Failed to open file");
       return 1;
    }
    if (fork() == -1) {
       perror("Failed to fork");
       return 1;
    }
    read(myfd, &c, 1);
    printf("Process %ld got %c\n", (long)getpid(), c);
    return 0;
 }

Figure 4.4 shows the parent and child file descriptor tables for Example 4.27. The file descriptor table entries of the two processes point to the same entry in the system file table. The parent and child therefore share the file offset, which is stored in the system file table.

Figure 4.4. If the parent opens `my.dat` before forking, both parent and child share the system file table entry.

Exercise 4.28

Suppose the first few bytes in the file my.dat are abcdefg. What output would be generated by Example 4.27?

Answer:

Since the two processes share the file offset, the first one to read gets a and the second one to read gets b. Two lines are generated in the following form.

 Process nnn got a
 Process mmm got b

In theory, the lines could be output in either order but most likely would appear in the order shown.

Exercise 4.29

When a program closes a file, the entry in the file descriptor table is freed. What about the corresponding entry in the system file table?

Answer:

The system file table entry can only be freed if no more file descriptor table entries are pointing to it. For this reason, each system file table entry contains a count of the number of file descriptor table entries that are pointing to it. When a process closes a file, the operating system decrements the count and deletes the entry only when the count becomes 0.

Exercise 4.30

How does fork affect the system file table?

Answer:

The system file table is in system space and is not duplicated by fork. However, each entry in the system file table keeps a count of the number of file descriptor table entries pointing to it. These counts must be adjusted to reflect the new file descriptor table created for the child.

Example 4.31 `forkopen.c`

In the following program, the parent and child each open my.dat for reading, read one character, and output that character.

 #include <fcntl.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <sys/stat.h>
 int main(void) {
    char c = '!';
    int myfd;
    if (fork() == -1) {
       perror("Failed to fork");
       return 1;
    }
    if ((myfd = open("my.dat", O_RDONLY)) == -1) {
       perror("Failed to open file");
       return 1;
    }
    read(myfd, &c, 1);
    printf("Process %ld got %c\n", (long)getpid(), c);
    return 0;
 }

Figure 4.5 shows the file descriptor tables for Example 4.31. The file descriptor table entries corresponding to my.dat point to different system file table entries. Consequently, the parent and child do not share the file offset. The child does not inherit the file descriptor, because each process opens the file after the fork and each open creates a new entry in the system file table. The parent and child still share system file table entries for standard input, standard output and standard error.

Figure 4.5. If the parent and child open `my.dat` after the `fork` call, their file descriptor table entries point to different system file table entries.

Exercise 4.32

Suppose the first few bytes in the file my.dat are abcdefg. What output would be generated by Example 4.31?

Answer:

Since the two processes use different file offsets, each process reads the first byte of the file. Two lines are generated in the following form.

 Process nnn got a
 Process mmm got a

Exercise 4.33 `fileiofork.c`

What output would be generated by the following program?

 #include <stdio.h>
 #include <unistd.h>
 int main(void) {
    printf("This is my output.");
    fork();
    return 0;
 }

Answer:

Because of buffering, the output of printf is likely to be written to the buffer corresponding to stdout, but not to the actual output device. Since this buffer is part of the user space, it is duplicated by fork. When the parent and the child each terminate, the return from main causes the buffers to be flushed as part of the cleanup. The output appears as follows.

 This is my output.This is my output.

Exercise 4.34 `fileioforkline.c`

What output would be generated by the following program?

 #include <stdio.h>
 #include <unistd.h>
 int main(void) {
    printf("This is my output.\n");
    fork();
    return 0;
 }

Answer:

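When standard output refers to a terminal, it is usually line buffered. This means that the buffer is flushed when it contains a newline. Since in this case a newline is output, the buffer will probably be flushed before the fork and only one line of output will appear. If standard output is redirected to a file or a pipe, it is usually fully buffered instead, and the line appears twice for the reason given in Exercise 4.33.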
