Section 3.1. Types of Data Structures

3.1 Types of Data Structures

In this chapter, we plan to discuss the following three major types of data structures:

Lists
Stacks and queues
Arrays and matrices

Lists, stacks, and queues are all linear data structures; the term linear referring to the fact that, conceptually, you're dealing with a set of elements arranged in the form of a line. The use of SQL is well-suited to such structures because they map easily onto the relational table paradigm. Arrays and matrices, on the other hand, are multidimensional data structures. You can use SQL to work with arrays and matrices, but it's better-suited for linear structures.

3.1.1 Lists

Lists are the most common linear data structure in computing. In contrast to other programming languages, lists are the easiest type of data structure to implement in SQL. A list is a sequence of elements with a known order. A list is unlimited in size and can shrink or grow on demand. Elements can be added to a list, removed from a list, and you can query to see whether a given element is in a list.

As you can see, the properties of a list are very similar to the properties of a SQL table, with the exception that a list has an order to it. That is why lists are so easy to implement. SQL tables are also unlimited in size, can be queried, and allow for entries to be added (inserted) and removed (deleted).

In this chapter we use a SQL table with an index to define a list:

 CREATE TABLE List (    Id INTEGER,     ... )

A list in SQL is, therefore, a table with at least two attributes. One attribute is an index identifying the order of each element in the list. Any other attributes are data-carrying attributes, meaning that they hold the actual data in the list.

When you implement a list in SQL, it's very helpful if your list index is arithmetically increasing. By arithmetically, we mean that your list index should increase by 1 as new elements are appended to the list. We are going to assume in this chapter's recipes that our lists have arithmetically increasing indices.

All basic list operations are directly implementable in SQL. INSERT is used to add a new element to a list, DELETE is used to remove an element from a list, and SELECT is used to query for elements by value or by index. These operations are so straightforward that we won't show recipes for them in this chapter.

Lists are of particular interest in SQL, since they allow for pattern matching and for cumulative aggregation queries. The most important pattern-matching queries are for finding regions , runs, and sequences. Having said that, let's take a look at what those terms mean.

3.1.1.1 Cumulative aggregation queries

A cumulative aggregation query is similar to a running aggregate query, with only one difference. A running aggregate is the type produced by the GROUP BY clause in conjunction with an aggregate function, and it generates one aggregate value for each group of elements. Cumulative aggregates, on the other hand, include all elements from the beginning of the list. For example, you might produce a report showing sales in January, sales in January and February, sales in January through March, and so forth. While running aggregates are used mainly in statistics, cumulative aggregates are used mainly in combinatorial problems.

3.1.1.2 Regions

A region is a continuous pattern in a list in which all values of the elements are the same. A typical usage of such a pattern is when you look for empty slots in a list. Let's say that you have a list where elements can be either empty or full slots. If you are looking for five sequential empty slots, you are actually looking for a region of that size. The most typical example is when you build a warehouse application and you have a triple-size container. To store it, you need to find an empty slot that could fit three containers of regular size. Therefore, you are looking for a region of size three.

3.1.1.3 Runs

A run is a continuous pattern in which values are increasing monotonically. Each value is larger than the previous one. For example, say that you have a list of consecutively taken measurements for a volcano. In a given period of time, you may find that the temperature of the volcano were consistently rising . The measurements in that period of time, each representing a higher temperature than the one previous, would be considered a run.

3.1.1.4 Sequences

A sequence is a run in which values form an arithmetic progression with a difference of 1. Each value is exactly one increment larger than the previous value. With respect to the volcano example that we used to describe a run, a series of temperature measurements in which each measurement was exactly 1 degree higher than the previous would represent a sequence.

3.1.2 Stacks and Queues

When building a software system, there is often a requirement to serialize data that is being processed . In such cases, the following special linear structures are particularly useful:

Stack
Queue
Priority queue

3.1.2.1 Stacks

A stack is a special kind of list in which all addition and removal of elements takes place on one side. A stack is a last in, first out (LIFO) structure. The LIFO concept is one that accountants and computer programmers are equally likely to understand. Each element that is added to a stack is put on top of all the others. Only the topmost item is accessible at any given time, and items in a stack are removed from the top down.

The position of an element in the stack is maintained with its index. The top of the stack is the element with the highest index. To find the highest index, a simple query is needed:

 SELECT MAX(Id) FROM Stack

Typical operations on a stack are the following:

PUSH: To add a new element onto a stack
POP: To remove the top element from the stack
TOP: To find the topmost element

3.1.2.2 Queues

A queue is a list to which items are added at the top and from which items are removed at the bottom. Thus, a queue implements a first in, first out (FIFO) mechanism. In all other aspects, a queue is exactly the same as a stack. Queues are probably the most used data structure of all because they provide a robust serialization mechanism.

Typical operations on queues are the following:

TOP: To find the top element
ENQUEUE: To add a new element to the queue
DEQUEUE: To remove an element from the queue

In a multiclient application, you can use a queue as a temporary repository for data. For example, several users might collect data through forms on their computers and store the data for each form in a queue. On the server side, you might have one or more interface applications that DEQUEUE and process the sets of data ENQUEUED by the client applications. Serialization is achieved because all client applications place their data into the queue, and all server-side applications pull that data out in the same order in which it was placed into the queue.

3.1.2.3 Priority queues

A priority queue is a special kind of a queue in which the top element ” the element that is next in line to be DEQUEUED ” is not the oldest one. A priority queue uses a priority property to find the top candidate, thus implementing a first in, priority out (FIPO) structure. No matter what the input order, items are DEQUEUED in priority order. The easiest mechanism for implementing a priority queue is to order the elements in the queue by the value of one of their properties. For example, given a queue of scheduling data, you might prioritize based on time, with the most imminent events taking priority over those that are farther out into the future. As you'll see, it is very easy in SQL to implement priority queues once you have the basic mechanisms for stacks and queues in place.

3.1.3 Arrays and Matrices

As opposed to linear data structures, multidimensional data structures are not very natural to SQL. In this chapter, we discuss two types of multidimensional data structure:

Arrays
Matrices

It is possible to implement such structures in SQL, but their manipulation soon becomes inefficient, and it is probably worthwhile considering an external implementation where feasible . If you need to perform an operation on arrays or matrices and if you are using MS SQL 2000, consider the TABLE datatype. It provides a convenient way to work with arrays; however, it is limited only to stored procedures and can be used only as temporary storage.

3.1.3.1 Arrays

An array is a multidimensional data structure that, rather than having just one value, can have many values associated with it. An array can have one or more dimensions. A one-dimensional array allows you to store a list of values. A two-dimensional array allows you to store a grid of values, essentially a list of lists. A three-dimensional array allows you to store a cube of values, and so forth. SQL programmers typically must make some compromises when working with arrays to achieve reasonable performance. The most widely used implementation of an array is to have one column in a table store the index of dimension in an array and have other columns store values for the element in question. In other words, each row in an array table stores the coordinates of one array element together with its values.

One of the biggest dangers when working with arrays is to break an array's structure. Therefore, you have to build some mechanism into your implementation to ensure that your array structures remain consistent. Otherwise, operations with arrays are fairly simple. The addition of a new element is as simple as an INSERT with new coordinates. Likewise, the removal of an element is as simple as executing a DELETE.

3.1.3.2 Matrices

A matrix is a special case of array. Matrices have only two dimensions, must be finite, and their indices must arithmetically increase (no gaps). We show operations on matrices to demonstrate that SQL can be used for algebraic operations. However, it is unlikely that you will have much use for matrices in business applications. In the recipes, we show you how to perform arithmetic operations on matrices, how to transpose matrices, how to print matrices, and more.

When working with matrices, SQL can be useful if you store a large number of small matrices in one table. An advantage over other languages is that SQL can easily perform an operation on many elements of a matrix. Other programming languages require you to use a FOR loop or some other kind of looping mechanism to step through all elements of a matrix to perform the operation.