6.4. Page Cache

In the introductory sections, we mentioned that the page cache is an in-memory collection of pages. When data is accessed frequently, it is important to be able to reach it quickly. When data is duplicated and synchronized across two devices, one of which is typically smaller in storage size but allows much faster access than the other, we call the faster device a cache. The page cache is how an operating system stores parts of the hard drive in memory for faster access. We now look at how it works and how it is implemented.

When you perform a write to a file on your hard drive, that file is broken into chunks called pages, which are swapped into memory (RAM). The operating system updates the page in memory and, at a later time, the page is written to disk. Once a page has been copied from the hard drive to RAM (which is called swapping into memory), it is either clean or dirty. A dirty page has been modified in memory, but the modifications have not yet been written to disk. A clean page exists in memory in the same state in which it exists on disk.

In Linux, memory is divided into zones.[8] Each zone has a list of active pages and a list of inactive pages. When a page has been inactive for a certain amount of time, it gets swapped out (written back to disk) to free memory. Each page in a zone's lists has a pointer to an address_space. Each address_space has a pointer to an address_space_operations structure. Pages are marked dirty by calling the set_page_dirty() function of the address_space_operations structure. Figure 6.12 illustrates this dependency.
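The clean and dirty states described above can be modeled in a few lines of user-space C. This is only a minimal sketch under stated assumptions, not kernel code: the disk array, struct cached_page, and the cache_get(), cache_write(), and cache_writeback() helpers are hypothetical names invented for the illustration, and the page size is deliberately tiny.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE_BYTES 16   /* tiny page size, for illustration only */
#define NUM_PAGES 4

/* Hypothetical stand-in for a disk: an array of page-sized blocks. */
static char disk[NUM_PAGES][PAGE_SIZE_BYTES];

/* Hypothetical cached page: a copy of one disk page plus state bits. */
struct cached_page {
    bool present;               /* a copy is held in memory */
    bool dirty;                 /* the memory copy differs from disk */
    char data[PAGE_SIZE_BYTES];
};

static struct cached_page cache[NUM_PAGES];

/* Bring a page into the cache if needed; a freshly read page is clean. */
static struct cached_page *cache_get(unsigned idx)
{
    assert(idx < NUM_PAGES);
    struct cached_page *p = &cache[idx];
    if (!p->present) {
        memcpy(p->data, disk[idx], PAGE_SIZE_BYTES);
        p->present = true;
        p->dirty = false;
    }
    return p;
}

/* Modify only the in-memory copy; the page becomes dirty. */
static void cache_write(unsigned idx, unsigned off, char byte)
{
    struct cached_page *p = cache_get(idx);
    assert(off < PAGE_SIZE_BYTES);
    p->data[off] = byte;
    p->dirty = true;
}

/* Write every dirty page back to disk; returns the number written. */
static int cache_writeback(void)
{
    int written = 0;
    for (unsigned i = 0; i < NUM_PAGES; i++) {
        struct cached_page *p = &cache[i];
        if (p->present && p->dirty) {
            memcpy(disk[i], p->data, PAGE_SIZE_BYTES);
            p->dirty = false;   /* memory and disk agree again */
            written++;
        }
    }
    return written;
}
```

After a call to cache_write(), the disk copy is still stale; only cache_writeback() makes the page clean again. The kernel's real writeback path is considerably more involved, but the state transition is the same.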
Figure 6.12. Page Cache and Zones

6.4.1. address_space Structure

The core of the page cache is the address_space object. Let's take a close look at it.

-----------------------------------------------------------------------
include/linux/fs.h
struct address_space {
    struct inode            *host;            /* owner: inode, block_device */
    struct radix_tree_root  page_tree;        /* radix tree of all pages */
    spinlock_t              tree_lock;        /* and spinlock protecting it */
    unsigned long           nrpages;          /* number of total pages */
    pgoff_t                 writeback_index;  /* writeback starts here */
    struct address_space_operations *a_ops;   /* methods */
    struct prio_tree_root   i_mmap;           /* tree of private mappings */
    unsigned int            i_mmap_writable;  /* count VM_SHARED mappings */
    struct list_head        i_mmap_nonlinear; /* list VM_NONLINEAR mappings */
    spinlock_t              i_mmap_lock;      /* protect tree, count, list */
    atomic_t                truncate_count;   /* Cover race condition with truncate */
    unsigned long           flags;            /* error bits/gfp mask */
    struct backing_dev_info *backing_dev_info; /* device readahead, etc */
    spinlock_t              private_lock;     /* for use by the address_space */
    struct list_head        private_list;     /* ditto */
    struct address_space    *assoc_mapping;   /* ditto */
};
-----------------------------------------------------------------------

The inline comments of the structure are fairly descriptive. Some additional explanation might help in understanding how the page cache operates.

Usually, an address_space is associated with an inode, and the host field points to this inode. However, the page cache and the address_space structure are meant to be generic and do not require this field to be set. It could be NULL if the address_space is associated with a kernel object that is not an inode.

The address_space structure has a field that should be intuitively familiar to you by now: address_space_operations.
Like the file_operations structure of the file object, address_space_operations contains information about what operations are valid for this address_space.

-----------------------------------------------------------------------
include/linux/fs.h
struct address_space_operations {
    int (*writepage)(struct page *page, struct writeback_control *wbc);
    int (*readpage)(struct file *, struct page *);
    int (*sync_page)(struct page *);

    /* Write back some dirty pages from this mapping. */
    int (*writepages)(struct address_space *, struct writeback_control *);

    /* Set a page dirty */
    int (*set_page_dirty)(struct page *page);

    int (*readpages)(struct file *filp, struct address_space *mapping,
                     struct list_head *pages, unsigned nr_pages);

    /*
     * ext3 requires that a successful prepare_write() call be followed
     * by a commit_write() call - they must be balanced
     */
    int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
    int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
    /* Unfortunately this kludge is needed for FIBMAP. Don't use it */
    sector_t (*bmap)(struct address_space *, sector_t);
    int (*invalidatepage) (struct page *, unsigned long);
    int (*releasepage) (struct page *, int);
    ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
                         loff_t offset, unsigned long nr_segs);
};
-----------------------------------------------------------------------

These functions are reasonably straightforward. readpage() and writepage() read and write, respectively, a single page associated with an address space. Multiple pages can be read and written via readpages() and writepages(). Journaling file systems, such as ext3, can provide functions for prepare_write() and commit_write().

When the kernel checks the page cache for a page, the lookup must be blazingly fast.
As such, each address space has a radix tree (the page_tree field), which allows a quick search to determine whether the page is in the page cache. Figure 6.13 illustrates how files, inodes, address spaces, and pages relate to each other; this figure is useful for the upcoming analysis of the page cache code.

Figure 6.13. Files, Inodes, Address Spaces, and Pages

6.4.2. buffer_head Structure

Each sector on a block device is represented by the Linux kernel as a buffer_head structure. A buffer_head contains all the information necessary to map a physical sector to a buffer in physical memory. The buffer_head structure is illustrated in Figure 6.14.

-----------------------------------------------------------------------
include/linux/buffer_head.h
struct buffer_head {
    /* First cache line: */
    unsigned long       b_state;      /* buffer state bitmap (see above) */
    atomic_t            b_count;      /* users using this block */
    struct buffer_head  *b_this_page; /* circular list of page's buffers */
    struct page         *b_page;      /* the page this bh is mapped to */

    sector_t            b_blocknr;    /* block number */
    u32                 b_size;       /* block size */
    char                *b_data;      /* pointer to data block */

    struct block_device *b_bdev;
    bh_end_io_t         *b_end_io;    /* I/O completion */
    void                *b_private;   /* reserved for b_end_io */
    struct list_head    b_assoc_buffers; /* associated with another mapping */
};
-----------------------------------------------------------------------

Figure 6.14. buffer_head Structure

The physical sector that a buffer_head structure refers to is logical block b_blocknr on device b_bdev. The physical memory that a buffer_head structure refers to is a block of b_size bytes starting at b_data. This memory block lies within the physical page b_page. The other fields of the buffer_head structure are used for housekeeping: they manage how the physical sector is mapped to the physical memory.
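The mapping just described (block number on one side, an address and length inside a page on the other) can be sketched in user-space C. This is a simplified model under stated assumptions, not the kernel's implementation: struct demo_bh keeps only four of the buffer_head fields, struct demo_page and demo_map_page() are hypothetical, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE   4096u   /* assumed page size */
#define DEMO_MAX_BUFFERS 8       /* enough for 512-byte blocks */

/* Hypothetical, trimmed-down buffer_head: only the mapping fields. */
struct demo_bh {
    unsigned long long b_blocknr;   /* block number on the device */
    unsigned int b_size;            /* block size in bytes */
    char *b_data;                   /* where the block lives in memory */
    struct demo_bh *b_this_page;    /* circular list of the page's buffers */
};

/* Hypothetical page: raw memory plus the buffers that carve it up. */
struct demo_page {
    char mem[DEMO_PAGE_SIZE];
    struct demo_bh bh[DEMO_MAX_BUFFERS];
};

/*
 * Split a page into equal-sized blocks that map to consecutive device
 * blocks starting at first_block. Returns the number of buffers set up,
 * or 0 if the block size does not divide the page or is too small.
 */
static unsigned int demo_map_page(struct demo_page *pg,
                                  unsigned long long first_block,
                                  unsigned int block_size)
{
    assert(pg != NULL);
    if (block_size == 0 || block_size > DEMO_PAGE_SIZE ||
        DEMO_PAGE_SIZE % block_size != 0)
        return 0;

    unsigned int n = DEMO_PAGE_SIZE / block_size;
    if (n > DEMO_MAX_BUFFERS)
        return 0;

    for (unsigned int i = 0; i < n; i++) {
        struct demo_bh *bh = &pg->bh[i];
        bh->b_blocknr = first_block + i;
        bh->b_size = block_size;
        bh->b_data = pg->mem + (size_t)i * block_size;
        bh->b_this_page = &pg->bh[(i + 1) % n];   /* last wraps to first */
    }
    return n;
}
```

With 1024-byte blocks, for example, demo_map_page() sets up four buffers whose b_data pointers are 1024 bytes apart inside the same page, and following b_this_page from the last buffer leads back to the first.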
(Because this discussion is a digression on bio structures rather than buffer_head structures, refer to mpage.c for more detailed information on struct buffer_head.)

As mentioned in Chapter 4, each physical memory page in the Linux kernel is represented by a struct page. A page is composed of one or more I/O blocks, because an I/O block can be smaller than a page but never larger.

In older versions of Linux, block I/O was done only via buffers, but in 2.6, a new way was developed that uses bio structures. The new way allows the Linux kernel to group block I/O together in a more manageable way.

Suppose we write a portion of the top of a text file and a portion of the bottom of the same file. This update would likely need two buffer_head structures for the data transfer: one that points to the top and one that points to the bottom. A bio structure allows file operations to bundle discrete chunks together in a single structure. This alternative view of buffers and pages works in terms of the contiguous memory segments of a buffer. The bio_vec structure represents a contiguous memory segment in a buffer. The bio_vec structure is illustrated in Figure 6.15.

-----------------------------------------------------------------------
include/linux/bio.h
struct bio_vec {
    struct page  *bv_page;
    unsigned int bv_len;
    unsigned int bv_offset;
};
-----------------------------------------------------------------------

Figure 6.15. Bio Structure

The bio_vec structure holds a pointer to a page, the length of the segment, and the offset of the segment within the page. A bio structure is composed of an array of bio_vec structures (along with other housekeeping fields). Thus, a bio structure represents a number of contiguous memory segments of one or more buffers on one or more pages.[9]
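The "top of the file and bottom of the file" example can be sketched as one request that carries two segments. This is a user-space model under stated assumptions, not the kernel's bio API: struct demo_bio_vec copies the three bio_vec fields, while struct demo_mem_page, struct demo_bio, demo_bio_add(), and demo_bio_gather() are hypothetical names, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u   /* assumed page size */
#define DEMO_MAX_VECS  4

/* Hypothetical page; in the kernel this would be a struct page. */
struct demo_mem_page {
    char mem[DEMO_PAGE_SIZE];
};

/* The same three fields as bio_vec, renamed to mark this as a model. */
struct demo_bio_vec {
    struct demo_mem_page *bv_page;
    unsigned int bv_len;
    unsigned int bv_offset;
};

/* Hypothetical bio: only the segment array and a count. */
struct demo_bio {
    struct demo_bio_vec vecs[DEMO_MAX_VECS];
    unsigned int vcnt;
};

/* Append one segment; returns 0 on success, -1 if it does not fit. */
static int demo_bio_add(struct demo_bio *bio, struct demo_mem_page *pg,
                        unsigned int len, unsigned int offset)
{
    assert(bio != NULL && pg != NULL);
    if (bio->vcnt >= DEMO_MAX_VECS)
        return -1;
    if (offset > DEMO_PAGE_SIZE || len > DEMO_PAGE_SIZE - offset)
        return -1;   /* a segment must stay inside its page */
    bio->vecs[bio->vcnt++] = (struct demo_bio_vec){ pg, len, offset };
    return 0;
}

/* Copy every segment, in order, into out; returns the bytes copied. */
static size_t demo_bio_gather(const struct demo_bio *bio, char *out, size_t cap)
{
    size_t done = 0;
    for (unsigned int i = 0; i < bio->vcnt; i++) {
        const struct demo_bio_vec *bv = &bio->vecs[i];
        if (bv->bv_len > cap - done)
            break;   /* not enough room for this segment */
        memcpy(out + done, bv->bv_page->mem + bv->bv_offset, bv->bv_len);
        done += bv->bv_len;
    }
    return done;
}
```

Two calls to demo_bio_add(), one for a segment in the first page and one for a segment in the last page, produce a single structure that describes both chunks; demo_bio_gather() then walks the segments the way a driver would walk the bio_vec array.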