5.3 UNIX File System Implementation | UNIX Systems Programming: Communication, Concurrency and Threads

Team-FLY

Disk formatting divides a physical disk into regions called partitions . Each partition can have its own file system associated with it. A particular file system can be mounted at any node in the tree of another file system. The topmost node in a file system is called the root of the file system. The root directory of a process (denoted by / ) is the topmost directory that the process can access. All fully qualified paths in UNIX start from the root directory / .

Figure 5.2 shows a typical root file system tree containing some of the standard UNIX subdirectories. The /dev directory holds specifications for the devices (special files) on the system. The /etc directory holds files containing information regarding the network, accounts and other databases that are specific to the machine. The /home directory is the default directory for user accounts. The /opt directory is a standard location for applications in System V Release 4. Look for include files in the /usr/include directory. The /var directory contains system files that vary and can grow arbitrarily large (e.g., log files, or mail when it arrives but before it has been read). POSIX does not require that a file system have these subdirectories, but many systems organize their directory structure in a similar way.

Figure 5.2. Structure of a typical UNIX file system

graphics/05fig02.gif

5.3.1 UNIX file implementation

POSIX does not mandate any particular representation of files on disk, but traditionally UNIX files have been implemented with a modified tree structure, as described in this section. Directory entries contain a filename and a reference to a fixed-length structure called an inode . The inode contains information about the file size , the file location, the owner of the file, the time of creation, time of last access, time of last modification, permissions and soon.

Figure 5.3 shows the inode structure for a typical file. In addition to descriptive information about the file, the inode contains pointers to the first few data blocks of the file. If the file is large, the indirect pointer is a pointer to a block of pointers that point to additional data blocks. If the file is still larger, the double indirect pointer is a pointer to a block of indirect pointers. If the file is really huge, the triple indirect pointer contains a pointer to a block of double indirect pointers. The word block can mean different things (even within UNIX). In this context a block is typically 8K bytes. The number of bytes in a block is always a power of 2.

Figure 5.3. Schematic structure of a traditional UNIX file.

graphics/05fig03.gif

Exercise 5.11

Suppose that an inode is 128 bytes, pointers are 4 bytes long, and the status information takes up 68 bytes. Assume a block size of 8K bytes and block pointers of 32 bits each. How much room is there for pointers in the inode? How big a file can be represented with direct pointers? Indirect? Double indirect? Triple indirect?

Answer:

The single, double, and triple indirect pointers take 4 bytes each, so 128 - 68 - 12 = 48 bytes are available for 12 direct pointers. The size of the inode and the block size depend on the system. A file as large as 8192 x 12 = 98, 304 bytes can be represented solely with direct pointers. If the block size is 8K bytes, the single indirect pointer addresses an 8K block that can hold 8192 · 4 = 2048 pointers to data blocks. Thus, the single indirect pointer provides the capability of addressing an additional 2048 x 8192 = 16, 777, 216 bytes or 16 megabytes of information. Double indirect addressing provide 2048 x 2048 pointers with the capability of addressing an additional 32 gigabytes. Triple indirect addressing provides 2048 x 2048 x 2048 pointers with the capability of addressing an additional 64 terabytes. However, since 2048 ³ = 2 ³³ , pointers would need to be longer than 4 bytes to fully address this storage.

Exercise 5.12

How large a file can you access using only the single indirect, double indirect, and triple indirect pointers if the block size is 8K bytes and pointers are 64 bits?

Answer:

A block can now hold only 1024 pointers, so the single indirect pointer can address 1024 x 8192 = 8,388,608 bytes. Double indirect addressing provides 1024 x 1024 pointers with the capability of addressing an additional 8 gigabytes. Triple indirect addressing provides 1024 x 1024 x 1024 pointers with the capability of addressing an additional 8 terabytes.

Exercise 5.13

How big can you make a disk partition if the block size is 8K bytes and pointers are 32 bits? How can bigger disks be handled? What are the tradeoffs?

Answer:

32-bit addresses can access approximately 4 billion blocks (4,294,967,296 to be exact). 8K blocks give 2 ⁴⁵ 3.5 x 10 ¹³ bytes. With a block address of fixed size, there is a tradeoff between maximum partition size and block size. Larger blocks mean a larger partition for a fixed address size. The block size usually determines the smallest retrievable unit on disk. Larger blocks can be retrieved relatively more efficiently but can result in greater internal fragmentation because of partially filled blocks.

The tree-structured representation of files is fairly efficient for small files and is also flexible if the size of the file changes. When a file is created, the operating system finds free blocks on the disk in which to place the data. Performance considerations dictate that blocks of the same file should be located close to one another on the disk to reduce the seek time. It takes about twenty times as long to read a 16-megabyte file in which the data blocks are randomly placed than one in which the data blocks are contiguous.

When a system administrator creates a file system on a physical disk partition, the raw bytes are organized into data blocks and inodes. Each physical disk partition has its own pool of inodes that are uniquely numbered. Files created on that partition use inodes from that partition's pool. The relative layout of the disk blocks and inodes has been optimized for performance.

POSIX does not require that a system actually represent its files by using inodes. The ino_t st_ino member of the struct stat is now called a file serial number rather than an inode number . POSIX-compliant systems must provide the information corresponding to the mandatory members of the struct stat specified on page 155, but POSIX leaves the actual implementation unspecified. In this way, the POSIX standard tries to separate implementation from the interface.

Exercise 5.14

Give some limitations of a file implementation based on inodes.

Answer:

The file must fit entirely in a single disk partition. The partition size and maximum number of files are fixed when the system is set up.

5.3.2 Directory implementation

A directory is a file containing a correspondence between filenames and file locations. UNIX has traditionally implemented the location specification as an inode number, but as noted above, POSIX does not require this. The inode itself does not contain the filename. When a program references a file by pathname, the operating system traverses the file system tree to find the filename and inode number in the appropriate directory. Once it has the inode number, the operating system can determine other information about the file by accessing the inode. (For performance reasons, this is not as simple as it seems, because the operating system caches both directory entries and inode entries in main memory.)

A directory implementation that contains only names and inode numbers has the following advantages.

Changing the filename requires changing only the directory entry. A file can be moved from one directory to another just by moving the directory entry, as long as the move keeps the file on the same partition or slice. (The mv command uses this technique for moving files to locations within the same file system. Since a directory entry refers to an inode on the same partition as the directory entry itself, mv cannot use this approach to move files between different partitions.)
Only one physical copy of the file needs to exist on disk, but the file may have several names or the same name in different directories. Again, all of these references must be on the same physical partition.
Directory entries are of variable length because the filename is of variable length. Directory entries are small, since most of the information about each file is kept in its inode. Manipulating small variable-length structures can be done efficiently. The larger inode structures are of fixed length.

Team-FLY