Table and Index Management > Module Functionality
The module helps the VM to organize all tables and indexes into B/B+-trees: one B+-tree for each table, and one B-tree for each index. Each tree consists of one or more database pages. The VM can store and retrieve variable length records in any tree. It can delete records from a tree any time. The tree module does self-balancing on insert or delete, and it performs automatic reclamation and reuse of free space. For a tree with m tuples, the module provides the VM with O(log m) time-bound lookup, insert, and delete of records, and O(1) amortized bidirectional traversal of records. For more information on B- and B+-tree algorithms, see "The Art of Computer Programming, Volume 3: Sorting and Searching" by Donald E. Knuth.
The module receives requests for cell insertions and deletions at random from the VM. An insert operation requires allocation of space in tree (and overflow) pages. A delete operation frees up occupied space from tree (and overflow) pages. The management of free space on each page is very crucial for effective utilization of space allocated to the database.
When a page is removed from a tree, the page is added to the file freelist for later reuse. When a tree needs expansion, a page is taken off the freelist and is added to the tree. If the freelist is empty, a page is taken from the native filesystem. (Pages taken from the filesystem are always appended at the end of the database file.)
There are three partitions of free space on a tree page:
The space between the cell pointer array and the top of the cell content area.
The free blocks inside the cell content area.
Scattered fragments in the cell content area.
The space allocation is done from the former two partitions. At each allocation or deallocation, the corresponding affected partition is updated accordingly:
Cell allocation
The space allocator does not allocate space less than 4 bytes; such requests are rounded up to 4 bytes. Suppose a new request for nRequired, nRequired >= 4, bytes comes on a page when the page has is a total of nFree bytes. If nRequired > nFree, the request fails. Suppose nRequired <= nFree. The allocator performs the following steps to satisfy the request:
It traverses down the free block list to see whether there is a large enough block to satisfy the request. This is a first fit search. If a suitable block is found, it does one of the following:
If the block size is less than nRequired + 4, it removes the block from the free block list, satisfies the request from the beginning of the block, and puts the remaining space (<= 3 bytes) in the fragment partition.
Otherwise, it satisfies the request from the bottom part of the block, and reduces the size of the block by nRequired bytes.
Otherwise, there is no sufficiently large free block to satisfy the request. If there is not much space in the middle unallocated partition, or there are too many fragments, the module defragments the page first. It runs a compaction algorithm on the page to consolidate the entire free space in the middle. During the compaction, it transfers the existing cells, one after another, to the bottom of the page.
It allocates nRequired bytes from the bottom of the free space area, and increases the top value by nRequired bytes.
Cell deallocation
Suppose a request comes to release nFree(>=4) bytes that were previously allocated by the allocator. The allocator creates a new free block of size nFree bytes, and inserts the block in the free block list at appropriate position. Then, it tries to merge free blocks in the neighborhood of the released one. If there is a fragment in between two adjacent free blocks, it also merges the fragment with the blocks. If there is a free block right at the top pointer, it merges the block with the space in the middle unallocated partition, and increases the value of the top.