Chapter 5: Dealing with Files | Programming from the Ground Up

A lot of computer programming deals with files. After all, when we reboot our computers, the only thing that remains from previous sessions are the things that have been put on disk. Data which is stored in files is called persistent data, because it persists in files that remain on the disk even when the program isn't running..

The UNIX File Concept

Each operating system has its own way of dealing with files. However, the UNIX method, which is used on Linux, is the simplest and most universal. UNIX files, no matter what program created them, can all be accessed as a sequential stream of bytes. When you access a file, you start by opening it by name. The operating system then gives you a number, called a file descriptor, which you use to refer to the file until you are through with it. You can then read and write to the file using its file descriptor. When you are done reading and writing, you then close the file, which then makes the file descriptor useless.

In our programs we will deal with files in the following ways:

Tell Linux the name of the file to open, and in what mode you want it opened (read, write, both read and write, create it if it doesn't exist, etc.). This is handled with the open system call, which takes a filename, a number representing the mode, and a permission set as its parameters. %eax will hold the system call number, which is 5. The address of the first character of the filename should be stored in %ebx. The read/write intentions, represented as a number, should be stored in %ecx. For now, use 0 for files you want to read from, and 03101 for files you want to write to (you must include the leading zero). ^[1] Finally, the permission set should be stored as a number in %edx. If you are unfamiliar with UNIX permissions, just use 0666 for the permissions (again, you must include the leading zero).
Linux will then return to you a file descriptor in %eax. Remember, this is a number that you use to refer to this file throughout your program.
Next you will operate on the file doing reads and/or writes, each time giving Linux the file descriptor you want to use. read is system call 3, and to call it you need to have the file descriptor in %ebx, the address of a buffer for storing the data that is read in %ecx, and the size of the buffer in %edx. Buffers will be explained in the Section called Buffers and .bss. read will return with either the number of characters read from the file, or an error code. Error codes can be distinguished because they are always negative numbers (more information on negative numbers can be found in Chapter 10). write is system call 4, and it requires the same parameters as the read system call, except that the buffer should already be filled with the data to write out. The write system call will give back the number of bytes written in %eax or an error code.
When you are through with your files, you can then tell Linux to close them. Afterwards, your file descriptor is no longer valid. This is done using close, system call 6. The only parameter to close is the file descriptor, which is placed in %ebx

^[1]This will be explained in more detail in the Section called Truth, Falsehood, and Binary Numbers in Chapter 10.