SQLite Basics

If you have used SQL with other database systems, the language elements will be familiar to you, but it is still worth following the tutorial to see some of the differences between SQLite's implementation of SQL and the version you are used to.

If you are new to SQL, you will begin to pick up the basics of the language by following the examples, and we will go into more depth on each topic in the following chapters.

Prerequisites

To follow the examples in this tutorial you will need access to a workstation or space on a server system on which you can create a SQLite database.

You will also need the sqlite command-line tool to be installed and in your path. Full installation instructions can be found in Appendix A, "Downloading and Installing SQLite," but for now the quickest way to get started is to use one of the precompiled sqlite binaries from http://www.hwaci.com/sw/sqlite/download.html.

Remember, SQLite writes its databases to the filesystem and does not require a database server to be running. The single executable sqlite (or sqlite.exe on Windows systems) is all that you need.

Obtaining the Sample Database

All of the code in this book is available for download on the Sams Publishing website at http://www.samspublishing.com/. Enter this book's ISBN (without the hyphens) in the Search box and click Search. When the book's title is displayed, click the title to go to a page where you can download the code to save you from retyping all the CREATE TABLE commands that follow. However, you don't have to download the sample database as we have included the full set of commands required for the tutorial in this book.

Creating and Connecting to a Database

SQLite stores its databases in files, and you should specify the filename or path to the file when the sqlite utility is invoked. If the filename given does not exist, a new one is created; otherwise, you will connect to the database that you specified as the filename.

 $ sqlite demodb SQLite Version 2.8.12 Enter ".help" for instructions sqlite>

Even without executing any SQLite commands, without any tables being created, connecting to a database as indicated in the preceding example will create an empty database file with the name specified on the command line.

 $ ls l -rw-r--r--    1 chris    chris           0 Apr  1 12:00 demodb

No file extension is required or assigned by sqlite; the database file is created with the exact name specified. So how do we know that this is a SQLite database? You could opt for a consistent file extension, for example using somename.db (as long as .db is a unique extension for your system) or use a well-organized directory structure to separate your data from your program files.

When there is something inside your database, you can identify the file as a SQLite database by taking a look at the first few bytes using a binary-safe paging program such as less. The first line you will see will look like this:

 ** This file contains an SQLite 2.1 database **

We'll examine the format of the database file in more detail later in this chapter, but for now there's nothing more you need to know about creating and connecting to databases in SQLite. The CREATE DATABASE statement that you may be used to is not used.

Executing SQL from a File

The sample database is defined by a series of CREATE TABLE statements in a file called demodb.sql and there are some sample data records in sampledata.sql.

To execute the SQL in these files, use the .read command in sqlite.

 $ sqlite demodb SQLite version 2.8.12 Enter ".help" for instructions sqlite> .read demodb.sql

Nothing will be displayed on screen unless there is an error in the SQL filesilence means that the commands were executed successfully.

It is also possible to execute an SQL file by redirecting it into sqlite from the command line, for instance:

 $ sqlite demodb < sampledata.sql

As before, there will only be any output to screen if there are errors in the SQL file.

The Sample Database

The sample database used in this tutorial is a very simple time-tracking system that might be used by a company to log employees' hours against client projects. There are five tables in the system, with the schema illustrated in Figure 2.1.

Figure 2.1. Schema of the sample database.

The tables holding employee and client details are very simple for this tutorial. In a real system the number of fields in these tables would be much higher, but for simplicity we've limited the employees and clients tables to just the basics.

 CREATE TABLE employees (   id INTEGER PRIMARY KEY,   first_name CHAR NOT NULL,   last_name CHAR NOT NULL,   sex CHAR NOT NULL,   email CHAR NOT NULL ); CREATE TABLE clients (   id INTEGER PRIMARY KEY,   company_name CHAR,   contact_name CHAR,   telephone CHAR );

Each table has an id field specified as INTEGER PRIMARY KEY, which we'll look at in more detail shortly. In a nutshell, this field is used to autogenerate a unique id number for each record added so that we don't have to.

Suppose that the company is working on, or has undertaken in the past, more than one job for each client and wants to keep the time spent on each project separate. To enable them to associate the hours on a timesheet with a particular project, we will create a projects table that looks like this:

 CREATE TABLE projects (   code CHAR PRIMARY KEY,   client_id INTEGER NOT NULL,   title CHAR NOT NULL,   start_date INTEGER,   due_date INTEGER );

This time, rather than having SQLite assign our primary key, we've specified it as CHAR so that the company would assign its own project codes and employees would know the textual codes of the projects they are working on.

There is no direct relationship between employees and projects or clientswe have assumed that it is possible that any member of the workforce could be working on any of the current projects. Our timesheets table creates a link between projects and employees when they have done some work and records the date and how long they spent on that project.

 CREATE TABLE timesheets (   id INTEGER PRIMARY KEY,   employee_id INTEGER NOT NULL,   project_code CHAR NOT NULL,   date_worked INTEGER NOT NULL,   hours DECIMAL(5,2) NOT NULL );

The final table, employee_rates, stores data on how much employees get paid. Rather than a single attribute in the employees table, this system allows us to remember the previous rates that an employee was paid before any raises were given. If we wanted to recalculate the cost of a project from some time ago, we would want to use the rates of pay in effect at the time and not the present ones.

 CREATE TABLE employee_rates (   employee_id INTEGER NOT NULL,   rate DECIMAL(5,2),   start_date INTEGER NOT NULL,   end_date INTEGER );

Typeless Data Types

The way in which SQLite stores data is built upon the realization that strong typing of data, as found in virtually every RDBMS on the market, is actually not a useful thing. A database is designed to store and retrieve data, plain and simple, and as such the developer should not have to declare a column as numeric, textual, or binary, or tie it to any other specific data type. Needing to do so is a legacy weakness of the underlying system that had to be reflected in SQL, not a feature added to the language.

SQLite is "typeless." It makes no difference whether you put a text string into an integer column or try to shove binary data into a text type. In fact you canand shouldspecify the column types when each table is created. SQLite's SQL implementation allows you to do this, but it's actually ignored by the engine.

That said, it's still a good idea to include the column data types in your CREATE TABLE statements to help you think through the database design thoroughly and as a reminder to yourself and a hint to other programmers as to what your intended use of each column was. Suppose, also, that in the future you want to migrate or mirror your data onto an RDBMS that does require column typingyou'll be glad your table definitions are well documented.

You can always retrieve the CREATE TABLE statement used to create a table using the .schema command from within sqlite:

 sqlite> .schema employees CREATE TABLE employees (   id INTEGER PRIMARY KEY,   first_name CHAR NOT NULL,   last_name CHAR NOT NULL,   sex CHAR NOT NULL,   email CHAR NOT NULL );

In the demo database, all the data types were declared using familiar data type labels, but you may have noticed that we did not specify any lengths for the CHAR types, which would usually be required.

In fact the syntax of SQLite's CREATE TABLE statement specifies the column type attribute as optional. A valid column type is defined as any sequence of zero or more character strings (other than reserved words) optionally followed by one or two signed integers in parentheses.

Listing 2.1 shows a CREATE TABLE statement with some examples of ANSI SQL data types that are also valid in SQLite.

Listing 2.1. Creating a Table with ANSI SQL Data Types

 CREATE TABLE dummy (   mychar CHAR(6),   myvarchar VARCHAR(10)   mydecimal DECIMAL(7,2),   myinteger INTEGER,   mytinyint TINYINT,   myfloat FLOAT,   mydate DATE );

Listing 2.2 creates a table identical to the one in Listing 2.1, with imaginary column types to show that SQLite pays no attention to the actual types given.

Listing 2.2. Creating a Table with Made-Up Data Types

 CREATE TABLE dummy (   mychar CHARACTER(6),   myvarchar VARIABLE LENGTH CHARACTER   mydecimal DECIMAL NUMBER(7,2),   myinteger INT,   mytinyint SMALL NUMBER,   myfloat FLOATING POINT NUMBER,   mydate CALENDAR DATE );

Finally, Listing 2.3 shows that the same table can actually be created, if you are in a real hurry, without specifying any column types whatsoever.

Listing 2.3. Creating a Table Without Specifying Data Types

 CREATE TABLE dummy (   mychar,   myvarchar,   mydecimal,   myinteger,   mytinyint,   myfloat,   mydate );

The INTEGER PRIMARY KEY

You saw in the preceding section that the words that make up the name of a data type in the CREATE TABLE statement have no bearing on the data that can be stored in the table. There is actually one exceptionif the column is defined exactly as INTEGER PRIMARY KEY, the value in the column must be a 32-bit signed integer; otherwise, an insert instruction will fail.

Note

INTEGER PRIMARY KEY must be used exactly as written here. If it's abbreviated, for instance INT PRIMARY KEY, the column will be created typeless.

So why, with the decision that data typing was a bad idea, has SQLite implemented one column type that's different? The reason for an INTEGER PRIMARY KEY is to allow for the equivalent of an AUTOINCREMENT or IDENTITY column.

You can insert any 32-bit signed integer value into an INTEGER PRIMARY KEY column, or insert a NULL to tell SQLite to assign the value itself. Inserting a NULL will cause a value of one greater than the largest value already stored in that column to be used.

If the largest key value is already the maximum value that the column type allows2^311, or 2147483647then it's not possible to use a greater value for a new key. In this situation, SQLite picks a new key randomly.

We have used an INTEGER PRIMARY KEY called id on the employees, clients, and timesheets tables so that as new records are added, we do not need to assign this value ourselves.

Working with Dates

In our sample tables, we have declared the date fields as INTEGER. Though SQLite has included some date and time functions since version 2.8.7, the typeless nature of SQLite does not provide a native storage format for dates. In our demo database tables, using an integer representation of the date serves our purposes well, for instance timesheets.date_worked is declared as INTEGER.

The SQLite date functions are based on a string date format, and we will examine them in detail in the next chapter.

Of course, SQLite will not enforce that the value stored in our (typeless) date columns must be an integer, but the use of INTEGER in the CREATE TABLE statement shows our intention for this field.

Using an integer date allows us to use one of two common formats. The UNIX timestamp format is the number of seconds since midnight on January 1, 1970 (known as the epoch), and a signed 32-bit integer is capable of storing dates up to 2037.

Some date calculations are made very easy using the timestamp format; for instance, to add one day to a value you just need to add 86400the number of seconds in a dayto it. Most programming languages that SQLite can be embedded in work with this date format, and the SQLite date functions are also able to convert to and from a timestamp.

A more readable integer format is to present a date as YYYYMMDD (or YYYYMMDDHHIISS if the time is also required). It's much easier to see that 20040622 is June 22, 2004, than its equivalent timestamp value of 1322222400.

Although the YYYYMMDD format does not make date arithmetic possible without splitting up the individual components or converting to some other format, it can still be used for comparisons because an earlier date will always be represented as a lower integer value than a later date.

For our sample system, we are storing the dates that an employee worked and a due date for each project, so we might want to find out the hours worked in a particular month or how many projects are overdue, but we don't need to do any calculations and the time element is not required. Therefore, YYYYMMDD is a suitable date format for this system.

Inserting the Sample Data

First of all, we'll add some employees and clients to the system.

 INSERT INTO employees (id, first_name, last_name, sex, email) VALUES (101, 'Alex', 'Gladstone', 'M', 'alex@mycompany.com'); INSERT INTO employees (id, first_name, last_name, sex, email) VALUES (102, 'Brenda', 'Dudson', 'F', 'brenda@mycompany.com'); INSERT INTO employees (id, first_name, last_name, sex, email) VALUES (103, 'Colin', 'Aynsley', 'M', 'colin@mycompany.com'); INSERT INTO employees (id, first_name, last_name, sex, email) VALUES (104, 'Debbie', 'Churchill', 'F', 'peter@mycompany.com'); INSERT INTO clients (id, company_name, contact_name, telephone) VALUES (501, 'Acme Products', 'Mr R. Runner', '555-6800'); INSERT INTO clients (id, company_name, contact_name, telephone) VALUES (502, 'ABC Enterprises', 'Mr T. Boss', '555-2999'); INSERT INTO clients (id, company_name, contact_name, telephone) VALUES (503, 'Premier Things Ltd', 'Mr U. First', '555-4001');

Note

The values in the primary key columns arbitrarily begin at 101 for employees and 501 for clients. Using different sequences will make it easier to show whether an id number refers to an employee or a client when we are selecting data in this tutorial. It doesn't actually create a problem if the primary key values of different tables overlap.

Figures 2.2 and 2.3 show the contents of the employees and clients tables after this data has been loaded.

Figure 2.2. Contents of the `employees` table.

Figure 2.3. Contents of the `clients` table.

The sample data we have loaded specifies each id value explicitly so that, for instance, we can be sure that projects belong to the right client. However, if we add a new client to the system using the following statement, it will be given a new id of the next highest available number:

 sqlite> INSERT INTO clients (company_name, contact_name, telephone)    ...> VALUES ('Integer Primary Key Corp', 'Mr A. Increment', '555-1234');

We don't have to go trawling through the data to find out what value SQLite assigned for the new primary key. The function last_insert_rowid() will return the integer value that was generated.

 sqlite> select last_insert_rowid(); 504

Next we'll insert details of the clients' projects. The due_date field is optional, and in the case of the second item in the following code being an ongoing project, we give it a NULL value.

 INSERT INTO projects (code, client_id, title, start_date, due_date) VALUES ('ACME01', 501, 'Ordering system', 20030401, 20031231); INSERT INTO projects (code, client_id, title, start_date, due_date) VALUES ('ABCCONS', 502, 'Ongoing consultancy', 20030601, NULL); INSERT INTO projects (code, client_id, title, start_date, due_date) VALUES ('PREM3K', 503, 'Thing 3000', 20031201, 20040430); INSERT INTO projects (code, client_id, title, start_date, due_date) VALUES ('PREM4K', 503, 'Thing 4000', 20031201, 20040731); INSERT INTO projects (code, client_id, title, start_date, due_date) VALUES ('PREM5K', 503, 'Thing 5000', 20031201, 20041031);

Figure 2.4 shows the contents of the projects table after these INSERT statements have been executed. Notice that the values in the client_id column correspond to records in the clients table, shown in Figure 2.3.

Figure 2.4. Contents of the `projects` table.

The following statements insert a rate of pay for each employee. The start_date is the date they joined the company and in most cases that rate is still in effect. We indicate that a rate is current by the end_date field being NULL.

Debbie Churchill, employee id 104, was given a raise, so there are two entries. Her hours worked before January 1, 2004, were paid at $20; from this date onwards she will be paid $25.

 INSERT INTO employee_rates (employee_id, rate, start_date, end_date) VALUES (101, 30.00, 20030401, NULL); INSERT INTO employee_rates (employee_id, rate, start_date, end_date) VALUES (102, 15.00, 20020601, NULL); INSERT INTO employee_rates (employee_id, rate, start_date, end_date) VALUES (103, 25.00, 20011001, NULL); INSERT INTO employee_rates (employee_id, rate, start_date, end_date) VALUES (104, 20.00, 20010401, 20031231); INSERT INTO employee_rates (employee_id, rate, start_date, end_date) VALUES (104, 25.00, 20040101, NULL);

Figure 2.5 shows the records inserted into the employee_rates table by the preceding statements. The employee_id column references records from the employees tablethe values in this column correspond to the id column in Figure 2.2.

Figure 2.5. Contents of the `employee_rates` table.

Finally we have some timesheet information. This is only a small sample of information that might be in such a system where new data is added every day, but should be sufficient for demonstration purposes.

 INSERT INTO timesheets (employee_id, project_code, date_worked, hours) VALUES (101, 'ACME01', 20031229, 4); INSERT INTO timesheets (employee_id, project_code, date_worked, hours) VALUES (101, 'ABCCONS', 20031229, 2); INSERT INTO timesheets (employee_id, project_code, date_worked, hours) VALUES (101, 'PREM3K', 20040102, 6); INSERT INTO timesheets (employee_id, project_code, date_worked, hours) VALUES (102, 'ACME01', 20031229, 3); INSERT INTO timesheets (employee_id, project_code, date_worked, hours) VALUES (102, 'PREM4K', 20040102, 5); INSERT INTO timesheets (employee_id, project_code, date_worked, hours) VALUES (103, 'ABCCONS', 20031229, 2); INSERT INTO timesheets (employee_id, project_code, date_worked, hours) VALUES (103, 'PREM4K', 20040102, 3); INSERT INTO timesheets (employee_id, project_code, date_worked, hours) VALUES (103, 'ACME01', 20031229, 8); INSERT INTO timesheets (employee_id, project_code, date_worked, hours) VALUES (104, 'ACME01', 20040102, 8); INSERT INTO timesheets (employee_id, project_code, date_worked, hours) VALUES (104, 'ABCCONS', 20040102, 4);

Figure 2.6 shows the contents of the timesheets table after we have inserted this sample data. Notice that this table references two others, employees and clientsreferenced by employee_id and project_code respectively.

Figure 2.6. The `timesheets` table after it has had the sample data inserted.

BLOBs and Binary Data

When you define a column as BLOB the column is still typeless, though as you saw earlier this does tell SQLite to use a text-based comparison when sorting on the field. SQLite is fully capable of storing binary data in any column, provided that there are no NUL (0x00, ASCII character zero) characters in the data.

As SQLite will store strings in its columns much more often than binary data, a NUL is the terminator for a variable-length character string and must be encoded somehow if it appears inside a piece of binary data.

Encoding can be done at the application level, for instance by encoding to base-64 or with URL-style encoding where NUL becomes its hex value represented as %00, and the percent character itself is encoded to %25.

SQLite also includes a pair of functions to encode and decode binary data, sqlite_encode_binary() and sqlite_decode_binary(). How binary data is encoded for any given column is an implementation choice left to the developer, so you can use supplied functions, adapt them from the sources in src/encode.c, or create your own.