SQL Equivalents

Everything that I have done so far can be done using the Database object's SQLSelect and SQLExecute methods. Because SQL is a programming language of its own, I do not have the time to fully explain or demonstrate its wonders, which is unfortunate because you can do much, much more with SQL than you can with REALbasics database controls and classes (it's the classic trade-off of simplicity-of-use against flexibility).

The SQL supported by the database varies according to which database you are using. That means that you should really look to documentation about the particular database engine you will be using to find out exactly which SQL statements they support, as well as what kind of extensions to SQL they have implemented. There is a lot of information online for the open source database, including SQLite, MySQL, and Postgres, and you can get to it at the following websites:

http://sqlite.org/
http://mysql.org/
http://postgresql.org/

SQLite supports a large part of the SQL specification, including many advanced features. It's really very cool. Here is a quick overview of what you will find:

SELECT UPDATE INSERT DELETE REPLACE COPY BEGIN TRANSACTION COMMIT TRANSACTION ROLLBACK TRANSACTION END TRANSACTION CREATE INDEX REINDEX DROP INDEX CREATE TABLE ALTER TABLE DROP TABLE CREATE TRIGGER DROP TRIGGER CREATE VIEW DROP VIEW ON CONFLICT clause PRAGMA

SQL is not case sensitive, but there is a common practice of using uppercase letters for SQL keywords such as SELECT, INSERT, and so on. This makes the SQL a little easier to read.

Delete Statement

The following statement deletes a record from the Subscriptions table:

DELETE FROM Subscriptions WHERE Name='choate.info'

Insert Statement

The INSERT statement inserts a new record into the Subscriptions table:

INSERT INTO Subscriptions (Name, URL) VALUES ('A Site Name', 'an URL')

Update Statement

Suppose I want to change the Name field in a record where the URL is http://w3c.org. The following UPDATE statement takes care of it:

UPDATE Subscriptions SET Name='W3C' WHERE URL='http://w3c.org'

Creating Indexes

In addition to replacing the Insert, Update, and Delete methods of the REALbasic database classes, you can also use SQL to create indexes. Each index you create has a name, and you need to specify the name of the table and the particular field that should be indexed:

CREATE INDEX NameIndex ON Subscriptions(Name)

Select and Joins

A relational database is so named because you are able to relate fields in two different tables based on the existence in each table of fields that share a common value. Suppose I create another table, called Feeds, using the following SQL statement:

CREATE TABLE feeds (ID VarChar Not Null, LastDownloaded DateTime, Content VarChar)

The field in this table that I will use to link to the other table will be the ID field. I will use the URL field in the other table. I could have called this field URL as well, because they both will contain URLs, but I did it this way because it avoids some ambiguity and simplifies the example. Note that the fields on which two tables are joined do not have to share a common name, only a common value, but very often they do share a common name because they tend to represent the same thing and are thus called the same thing. I could join these two tables with the following SQL statement:

SELECT * FROM Subscriptions,Feeds WHERE URL=ID

If the fields shared a common name, I would have to use the fully qualified name of the fields to avoid any ambiguity, and the statement would look like this:

SELECT * FROM Subscriptions,Feeds WHERE Subscriptions.URL= Feeds.URL

The result of the first select statement would be a RecordSet that contained the following fields:

Name URL ID LastDownloaded Content

You work with the RecordSet in the same way that you would work with a RecordSet that was the result of a query against one table, rather than a join of two. There is one exception, however, and that is that most databases will not let you update or delete the database from a joined RecordSet.

You can use a join with a DatabaseQuery control and a DataControl control, but the same rule applies for DataControls with respect to joinsnow adding, updating, or deleting is allowed.

Data Normalization

There is a tendency on the part of new database developers to try to fit everything into one table because it is simpler. REALbasic's data controls tend to reinforce this tendency because they make it so easy to work with tables. In the long run, especially with relatively more complex data, you should begin to take a different approach and use multiple tables for the data in your application. This process is called data normalization, which is one of those phrases for which there are many definitions and no clear hard-and-fast rules.

I can go back to the RSSReader application to provide a more specific example. Suppose that instead of storing the cached XML files as text files, I want to store the data in a database. The first step, then, would be to create a table that had fields that represented the data in question. In fact, the members of the Group and Item classes introduced previously can serve as a good starting point, so we could create a table using a SQL statement like this:

[View full width]
CREATE TABLE Feeds (Title VarChar, Guid VarChar, Description VarChar, ModifiedDate  DateTime, CreationDate DateTime)

Although these fields represent the common data that the different RSS formats have, there is also a lot of data in some formats that is not available in others; in addition, some of the data is considered optional and is not available in all RSS files of that type. One strategy is to add additional fields to this table that will hold data that any individual feed may or may not have, but after a while, your database will be taking up more information than it needs to and won't be as efficient over time. There will also be situations where you may want to add a field at a later time, which you can do with the SQL statement ALTER TABLE, but there are some limitations on what you can do with that statement.

The other solution, and generally a better one overall, is to place those optional or unique fields in separate tables and reference the main table. For example, some RSS feeds are broken down into categories, but not all of them. If I were to create a separate table for categories, I would do this:

CREATE TABLE Categories(FeedGuid VarChar, Category VarChar)

This table would be linked to the other by way of the FeedGuid. It would contain the Guid value for the Feeds record it is associated with. Doing this also opens up another possibility, which is that you can now have multiple categories for the same feed. If you had decided instead to use the single table method and added a category field, you would have space for only one category. If you thought you might have more than one category, you might add a second field, calling it category2 or something similar. Either way, you have to decide in advance how many fields to associate with categories. The very ugly alternative is to have multiple categories listed in one field, separated by some delimiter. Regardless of the single-table alternative you choose, it's a lot better to use a separate table. You have much more flexibility overall.

Inner/Outer Joins

Previously, I showed an example of a JOIN similar to this:

SELECT * FROM Subscriptions,Categories WHERE URL=FeedGuid

An alternative syntax uses the word JOIN, like so:

SELECT * From Subscriptions JOIN Categories ON Subscriptions.URL = Categories.FeedUrl

There is more than one kind of JOIN. By default, all JOINS are INNER JOINS, so the only reason to use INNER is to make it clear to the next developer what you are doing. The problem comes when you want to get results from a search that includes records from the Subscriptions table, even if they are not referenced in the Feeds table. With the previous JOIN, none of those records would show up. In this case, you can use a LEFT OUTER JOIN to accomplish this.

To illustrate a JOIN, consider the following two tables:

Table 8.1. Subscriptions Table
Name	URL
Choate.info	http://www.choate.info/Blog/rss.xml
REAL Software	http://www.realsoftware.com/rss.xml

Table 8.2. Categories Table
FeedGuid	Category
Choate.info/Blog/rss.xml	REALbasic
Choate.info/Blog/rss.xml	Cocoon
http://w3c.org/	HTML

If I were to do a LEFT JOIN on the previous two tables, I would get results that included only choate.info. If I wanted to include all records from the Subscriptions table, whether or not the record had been assigned a category, I could use a LEFT OUTER JOIN:

[View full width]
SELECT * From Subscriptions LEFT OUTER JOIN Categories ON Subscriptions.URL = Categories .FeedUrl

Now, if I wanted to include all categories whether or not a record in the Subscriptions table matched, I could do a RIGHT OUTER JOIN:

[View full width]
SELECT * From Subscriptions RIGHT OUTER JOIN Categories ON Subscriptions.URL = Categories .FeedUrl

Primary Key

A primary key is a field used as the basis for JOINS. As such, it has certain characteristics, the most important of which is that it has to be unique. Because the primary key is unique, it serves as a unique identifier for that particular record, so that when you search for a particular value for the primary key, you will get only one record back. When you are joining a table with a primary key with another table, the other table must have what is called a foreign key. This is a field whose value will reference the primary key of the first table. Unlike primary keys, foreign keys aren't a special type of field, and they do not have to be unique.

Trigger

A trigger is an action that is automatically taken by the database when a certain condition occursusually the insertion, deletion, or updating of a record. If you have a database with multiple tables and the records in each table relate to records in other tables, deleting a record in one table becomes a more complex process. In database parlance, you need to maintain referential integrity. In other words, if you have one record that refers to a second record and the second record gets deleted, you need to do something with the first record because it is now pointing to nothing. That may or may not be what you want. Either way, triggers are one way to do this because you can create a trigger that says when record x gets deleted in table one, then delete record x1 in table two.

Example:

CREATE TRIGGER delete_refs DELETE ON Subscriptions   BEGIN     DELETE Categories WHERE FeedUrl = old.URL;   END;

Now, whenever you delete a record in the Subscriptions table, any references to that record in the Categories table are deleted as well.