Table and Index Management > B+-Tree Structure
I already discussed how the pager module implements a page-oriented file abstraction on the top of a native byte-oriented file. In this section, I will discuss how the B+-tree module implements a tuple (row) oriented file abstraction on the top of a page-oriented file.
Tuples of a table can be organized in many ways, such as entry sequence, relative, hash, key sequence. SQLite uses a single B+-tree to organize all tuples of a table, and different ones for different tables. SQLite treats the content of an index as a kind of table, and stores the content in a B-tree and different indexes in different B-trees. B- and B+-trees are similar kind of key sequence data structures. SQLite does not use any other tuple organization techniques. Consequently, a database is a collection of B- and B+-trees. All of these trees are spread across database pages, and they can be interspersed. But, no database page stores tuples from two or more tables or indexes. It is the duty of the tree module to organize pages of a tree so that tuples can be stored and retrieved efficiently.
In the rest of this section, I restrict myself primarily to B+-tree. I will use B+-tree for generic purpose. The module implements primitives to read, insert, and delete individual tuples from trees, and, of course, to create and delete trees. It is a passive entity as far as the structures of tuples are concerned, and it treats tuples as variable length byte strings.
B-tree, where B stands for "balanced," is by far the most important index structure in many external storage based DBMSs. It is a way of organizing a collection of similar data records in a sorted order by their keys. (A sort order is any total order on the keys. Different B-trees in the same database can have different sort orders.) B-tree is a special kind of height balanced n-ary, n > 2, tree, where all leaf nodes are at the same level. Entries[*] and search information (i.e., key values) are stored in both internal nodes and leaf nodes. B-tree provides near optimum performance over the full range of tree operations, namely insertion, deletion, search, and search-next.
[*] To avoid confusion, I use the term "entry" for a tuple or record here. An entry consists of a key and other optional data.
A B+-tree is a variant of B-tree, where all entries reside at leaf nodes. The entries are (key value, data value) pairs; and they are sorted by the key values. Internal nodes contain only search information (key values) and child pointers. Keys in an internal node are stored in the sort order, and are used in directing a search to appropriate child node.
For either kind of trees, internal nodes can have a variable number of child pointers within a preset range. For a particular implementation, there are lower and upper bounds on the number of child pointers an internal node can have. The lower bound is normally larger or equal to half of the upper bound. The root node can violate this rule; it can have any number of child pointers (from zero up to the upper bound limit). All leaf nodes are at the same lowest level, and in some implementations, they are linked in an ordered chain. The root node is always an internal node for B+-tree.
For a given upper bound of n + 1, n > 1, in B+-trees, each internal node contains at most n keys and at most n + 1 child pointers. The logical organization of keys and pointers is shown in Figure 5-1. For any internal node, the following holds:
All of the keys on the child subtree that Ptr(0) points to have values less than or equal to Key(0);
All of the keys on the child subtree that Ptr(1) points to have values greater than Key(0) and less than or equal to Key(1), and so forth;
All of the keys on the child subtree that Ptr(n) points to have values greater than Key(n - 1).
Finding a particular entry for a given key value requires traversing O(log m) nodes, where m is the total number of entries in the tree.