Chapter 6. SQLite Engine

The topmost module of the backend is commonly called the virtual database engine (VDBE), or the virtual machine (VM) in SQLite terminology. The VM is the heart of SQLite and the interface between the frontend and the backend; core information processing happens in it. It implements an abstraction of a new machine on top of the native system, and it executes programs written in SQLite's internal bytecode programming language, which is specifically designed to search, read, and modify databases. The VM accepts bytecode programs generated by the frontend and executes them. (You may recall that bytecode programs are prepared statements.) To execute a program and produce its output, the VM uses the infrastructure provided by the B⁺-tree module.

The VM does not do any query optimization work. It blindly executes bytecode programs. In doing so, it converts data from one format into another on demand. On-the-fly data conversion is the primary task of the VM; everything else is controlled by the bytecode programs it executes.

A bytecode program is encapsulated by an in-memory object of type sqlite3_stmt (internally called Vdbe). The following SQLite APIs can be applied to the object to manipulate it, to execute the bytecode program, and to retrieve the output produced by the program: sqlite3_bind_*, sqlite3_step, sqlite3_column_*, and sqlite3_finalize.

The internal state of a Vdbe object includes the following:

a bytecode program
names and data types for all result columns
values bound to input parameters
a program counter
an execution stack of operands
an arbitrary amount of "numbered" memory cells
other run-time state information (such as open BTree objects, sorters, lists, sets)

6.1. Bytecode Programming Language

SQLite defines an internal programming language in which bytecode programs are written. The language is akin to the assembly languages of physical and virtual machines: it defines bytecode instructions. A bytecode instruction has the form <opcode, P1, P2, P3>, where opcode identifies a specific bytecode operation, and P1, P2, and P3 are operands to the operation. Each bytecode operation performs a small amount of VM work. The P1 operand is a 32-bit signed integer. The P2 operand is a 31-bit non-negative integer; in any operation that might cause a jump, P2 is always the jump destination, and other operations use it for other purposes. The P3 operand is a pointer to a null-terminated string, a pointer to some other structured object, or a native NULL (0). Some opcodes use all three operands, some ignore one or two of them, and many ignore all three.
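The instruction format described above can be pictured as a small C structure. The following is only a sketch for illustration: the names ToyInstr and toy_instr and the opcode numbers are hypothetical, and the layout is not claimed to match SQLite's own internal definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up opcode numbers; the real values are internal to SQLite. */
enum { TOY_OP_GOTO = 1, TOY_OP_OPENREAD = 2 };

/* Hypothetical instruction layout mirroring <opcode, P1, P2, P3>. */
typedef struct ToyInstr {
    uint8_t     opcode; /* which bytecode operation to perform */
    int32_t     p1;     /* 32-bit signed integer operand */
    int32_t     p2;     /* non-negative; jump destination in jumping opcodes */
    const void *p3;     /* string, other structured object, or NULL */
} ToyInstr;

/* Build one instruction, checking that P2 is in its documented range. */
static ToyInstr toy_instr(uint8_t opcode, int32_t p1, int32_t p2,
                          const void *p3)
{
    ToyInstr in;
    assert(p2 >= 0); /* P2 is a 31-bit non-negative integer */
    in.opcode = opcode;
    in.p1 = p1;
    in.p2 = p2;
    in.p3 = p3;
    return in;
}
```

With these assumptions, toy_instr(TOY_OP_OPENREAD, 0, 2, "t1") would carry a string in P3, whereas toy_instr(TOY_OP_GOTO, 0, 11, NULL) uses only P2, as a jump destination.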

NOTE

Opcodes are internal VM operation names, and they are not a part of the SQLite interface specification. Consequently, their operational semantics may change from one release to another. The SQLite development team does not encourage SQLite users to write bytecode programs on their own.

Table 6-1 displays a typical bytecode program that is equivalent to SELECT * FROM t1. The table t1 has two columns, namely x and y. The top line of the table is a header and is not a part of the program. Every other line is a bytecode instruction.

Table 6-1. A typical bytecode program
Address	Opcode	P1	P2	P3
0	Goto	0	11
1	Integer	0	0
2	OpenRead	0	2	#`t1`
3	SetNumColumn	0	2
4	Rewind	0	9
5	Column	0	0	#`x`
6	Column	0	1	#`y`
7	Callback	2	0
8	Next	0	5
9	Close	0	0
10	Halt	0	0
11	Transaction	0	0
12	VerifyCookie	0	1
13	Goto	0	1
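To see how control flows through Table 6-1, the following sketch replays only the jumps of that program and records the addresses visited. It is a toy model, not SQLite code: the names are hypothetical, P3 is omitted, no database is touched, and it assumes that Rewind jumps to P2 when the table is empty and that Next jumps to P2 while more rows remain.

```c
#include <assert.h>

typedef enum {
    T_GOTO, T_INTEGER, T_OPENREAD, T_SETNUMCOLUMN, T_REWIND, T_COLUMN,
    T_CALLBACK, T_NEXT, T_CLOSE, T_HALT, T_TRANSACTION, T_VERIFYCOOKIE
} ToyOpcode;

typedef struct { ToyOpcode opcode; int p1; int p2; } ToyOp;

/* The program of Table 6-1, addresses 0 through 13. */
static const ToyOp table_6_1[] = {
    {T_GOTO, 0, 11},        {T_INTEGER, 0, 0},  {T_OPENREAD, 0, 2},
    {T_SETNUMCOLUMN, 0, 2}, {T_REWIND, 0, 9},   {T_COLUMN, 0, 0},
    {T_COLUMN, 0, 1},       {T_CALLBACK, 2, 0}, {T_NEXT, 0, 5},
    {T_CLOSE, 0, 0},        {T_HALT, 0, 0},     {T_TRANSACTION, 0, 0},
    {T_VERIFYCOOKIE, 0, 1}, {T_GOTO, 0, 1}
};

/* Replay the control flow for a table holding nRows rows.  Visited
** addresses are written to trace[] (at most maxTrace of them) and their
** count to *pnTrace.  Returns how many times Callback was reached. */
static int toy_trace(int nRows, int *trace, int maxTrace, int *pnTrace)
{
    const int nOp = (int)(sizeof table_6_1 / sizeof table_6_1[0]);
    int row = 0, callbacks = 0, n = 0, pc;
    assert(nRows >= 0);
    for (pc = 0; pc < nOp; pc++) {
        const ToyOp *op = &table_6_1[pc];
        if (n < maxTrace) trace[n++] = pc;
        switch (op->opcode) {
        case T_GOTO:     pc = op->p2 - 1; break; /* loop's pc++ lands on P2 */
        case T_REWIND:   row = 0; if (nRows == 0) pc = op->p2 - 1; break;
        case T_CALLBACK: callbacks++; break;     /* one result row reported */
        case T_NEXT:     row++; if (row < nRows) pc = op->p2 - 1; break;
        case T_HALT:     pc = nOp; break;        /* ends the loop */
        default:         break;                  /* no effect on control flow */
        }
    }
    *pnTrace = n;
    return callbacks;
}
```

Under these assumptions, a table with two rows visits the addresses 0, 11, 12, 13, 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8, 9, 10: the program first jumps to its tail to start the transaction and verify the schema cookie, then returns to address 1 and loops over instructions 5 to 8 once per row.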

6.1.1. Program execution

The VM is an interpreter, and here is its structure:

    for (; pc < nOp && rc == SQLITE_OK; pc++){
        switch (aOp[pc].opcode){
        case OP_Add:
            /* Implementation of the ADD operation here */
            break;
        case OP_Goto:
            /* pc++ in the loop header then lands on address P2 */
            pc = aOp[pc].p2-1;
            break;
        case OP_Halt:
            /* Moving pc past the last instruction ends the loop */
            pc = nOp;
            break;
        /* other cases for other opcodes */
        }
    }

The interpreter is a simple loop containing a massive switch statement. Each case implements one bytecode operation. (Opcode names are prefixed by OP_.) In each iteration, the VM fetches the next bytecode instruction from the program, i.e., from the aOp array, using pc as the index (both are members of the Vdbe object). It then decodes and carries out the operation specified by the instruction. The VM begins executing a bytecode program at instruction number 0.
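The loop structure can be exercised with a very small, self-contained interpreter. This is a sketch under simplifying assumptions rather than SQLite's implementation: only four opcodes exist, operands are plain integers, the stack has a fixed size, and the names toy_exec, ToyOp, TOY_OK, and TOY_ERROR are hypothetical. Integer is assumed to push P1, and Add to replace the top two stack values with their sum.

```c
#include <assert.h>

#define TOY_OK        0
#define TOY_ERROR     1
#define TOY_STACK_MAX 16

typedef enum { OP_Integer, OP_Add, OP_Goto, OP_Halt } ToyOpcode;
typedef struct { ToyOpcode opcode; int p1; int p2; } ToyOp;

/* Execute aOp[0..nOp-1] starting at instruction 0.  On success, the value
** left on top of the stack (if any) is stored in *pTop. */
static int toy_exec(const ToyOp *aOp, int nOp, int *pTop)
{
    int stack[TOY_STACK_MAX];
    int sp = 0;        /* number of values currently on the stack */
    int rc = TOY_OK;   /* status of instruction execution */
    int pc;            /* program counter */
    assert(aOp != 0 && pTop != 0);
    for (pc = 0; pc < nOp && rc == TOY_OK; pc++) {
        switch (aOp[pc].opcode) {
        case OP_Integer:                 /* push P1 */
            if (sp == TOY_STACK_MAX) { rc = TOY_ERROR; break; }
            stack[sp++] = aOp[pc].p1;
            break;
        case OP_Add:                     /* pop two values, push their sum */
            if (sp < 2) { rc = TOY_ERROR; break; }
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_Goto:                    /* pc++ then lands on address P2 */
            pc = aOp[pc].p2 - 1;
            break;
        case OP_Halt:                    /* move pc past the last instruction */
            pc = nOp;
            break;
        default:
            rc = TOY_ERROR;
            break;
        }
    }
    if (rc == TOY_OK && sp > 0) *pTop = stack[sp - 1];
    return rc;
}
```

For example, the program Integer 2; Goto 3; Integer 100; Integer 3; Add; Halt skips the instruction at address 2 and leaves 5 on top of the stack, while a program consisting of a lone Add stops with TOY_ERROR because the stack is empty.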

The VM accesses a database using cursors. It can have zero or more open cursors on the database. Each cursor is a pointer into a single table or index tree. A cursor can seek to an entry with a particular key, or loop over all entries of the tree. Through a cursor, the VM can insert new entries, retrieve the key and data of the current entry, or delete that entry.

The VM uses an operand stack and an arbitrary amount of numbered memory locations to hold all intermediate results. Many of the opcodes use operands from the stack. Computation results are also stored on the stack. Each stack or memory location holds a single data value. The memory locations are typically used to hold the result of a scalar SELECT that is part of a larger expression.
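The interplay between the operand stack and the numbered memory locations can be sketched as follows. This is a toy model with hypothetical names (ToyVm, toy_push, toy_mem_store, and so on): values are plain integers, the stack has a fixed size, and the memory cells are assumed to be an array that grows on demand so that any cell number can be used.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_STACK_MAX 32

typedef struct ToyVm {
    int  stack[TOY_STACK_MAX]; /* operand stack */
    int  sp;                   /* number of values on the stack */
    int *aMem;                 /* numbered memory cells, grown on demand */
    int  nMem;                 /* number of cells currently allocated */
} ToyVm;

/* Make sure cell i exists; new cells start at 0.  Returns 0 on success. */
static int toy_mem_reserve(ToyVm *vm, int i)
{
    assert(vm != NULL);
    if (i < 0) return 1;
    if (i >= vm->nMem) {
        int k;
        int *p = (int *)realloc(vm->aMem, (size_t)(i + 1) * sizeof *p);
        if (p == NULL) return 1;
        for (k = vm->nMem; k <= i; k++) p[k] = 0;
        vm->aMem = p;
        vm->nMem = i + 1;
    }
    return 0;
}

static int toy_push(ToyVm *vm, int v)
{
    if (vm->sp == TOY_STACK_MAX) return 1;
    vm->stack[vm->sp++] = v;
    return 0;
}

static int toy_pop(ToyVm *vm, int *pOut)
{
    if (vm->sp == 0) return 1;
    *pOut = vm->stack[--vm->sp];
    return 0;
}

/* Pop the top of the stack into memory cell i. */
static int toy_mem_store(ToyVm *vm, int i)
{
    int v;
    if (toy_mem_reserve(vm, i) || toy_pop(vm, &v)) return 1;
    vm->aMem[i] = v;
    return 0;
}

/* Push a copy of memory cell i onto the stack. */
static int toy_mem_load(ToyVm *vm, int i)
{
    if (i < 0 || i >= vm->nMem) return 1;
    return toy_push(vm, vm->aMem[i]);
}

/* Release the memory cells and reset the stack. */
static void toy_vm_close(ToyVm *vm)
{
    free(vm->aMem);
    vm->aMem = NULL;
    vm->nMem = 0;
    vm->sp = 0;
}
```

In this model, a value computed on the stack (such as the result of a scalar subquery) could be parked in a cell with toy_mem_store and pushed back later with toy_mem_load, without disturbing whatever the stack holds in between.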

The VM continues executing a bytecode program until it processes a halt instruction, encounters an error (in the interpreter loop, the rc variable stores the status of instruction execution), or the program counter moves past the last instruction. When the VM halts, it releases all allocated memory and closes all cursors. If execution has stopped due to an error, the VM terminates the transaction or subtransaction and removes the changes made by that (sub)transaction from the database.
