Vectors Versus Linked Lists | Working with Arrays and HashTables

One is usually picked over the other based on the specific type of application being written, how much data it needs to store, and how quickly it needs to be able to retrieve it. For the sake of speaking the same vocabulary, let's look at both these storage mechanisms in brief.

Vectors

A vector is a contiguous block of memory that contains successive pieces of data at regular intervals. Easily the most ubiquitous example of a vector is the string variable (char* or char[]), which contains a sequence of characters (bytes) one after the next.

char foo[4] = "bar";

Here, foo[0] contains the letter 'b'; immediately afterward, you would find the letter 'a' in foo[1] and so on ending up with a NULL character '' at foo[3].

Almost as common, pointers to other structures are stored in vectors as well such as in the zval vector you saw last chapter while using the zend_get_parameters_array_ex() function. There, you saw var_dump() declare a zval*** function variable and then allocate space to store the zval** pointers, which would ultimately come from the zend_get_parameters_ex() call.

zval ***args = safe_emalloc(ZEND_NUM_ARGS(), sizeof(zval**), 0);

Similar to accessing characters in strings, the var_dump() implementation used args[i] to pass individual zval** elements, in turn, to the php_var_dump() internal function.

The biggest advantage to vectors is the speed with which individual elements can be accessed at runtime. A variable reference such as args[i] is quickly calculated as being the data at the address pointed to by args plus i times the size of args' data type. Storage space for this index structure is allocated and freed in a single, efficient call.

Linked Lists

Another common approach to data storage is the linked list. With a linked list, every data element is a struct with at least two properties: A pointer to the next item in the list, and some piece of actual data. Consider the following hypothetical data structure:

typedef struct _namelist namelist;
struct {
 struct namelist *next;
 char *name;
} _namelist;

An application using this data struct might have a variable declared as

static namelist *people;

The first name in the list can be found by checking the name property of the people variable: people->name; the second name is accessed by following the next property: people->next->name, and then people->next->next->name, and so on until next is NULL meaning that no more names exist in the list. More commonly, a loop might be used to iterate through such a list:

void name_show(namelist *p)
{
 while (p) {
 printf("Name: %s
", p->name);
 p = p->next;
 }
}

Such lists are very handy for FIFO chains where new data is added to the end of a list as it comes in, leaving another branch or thread to handle consuming the data:

static namelist *people = NULL, *last_person = NULL;
void name_add(namelist *person)
{
 person->next = NULL;
 if (!last_person) {
 /* No one in the list yet */
 people = last_person = person;
 return;
 }
 /* Append new person to the end of the list */
 last_person->next = person;

 /* Update the list tail */
 last_person = person;
}
namelist *name_pop(void)
{
 namelist *first_person = people;
 if (people) {
 people = people->next;
 }
 return first_person;
}

New namelist structures can be shifted in and popped out of this list as many times as necessary without having to adjust the structure's size or block copy elements between positions.

The form of the linked list you just saw is singly linked, and while it has some interesting features, it also has some serious weaknesses. Given a pointer to one item in the linked list, it becomes difficult to cut that element out of the chain and ensure that the prior element will be properly linked to the next one.

In order to even know what the prior element was, it's necessary to iterate through the entire list until the element to be removed is found within the next property of a given temp element. On large lists this can present a significant investment in CPU time. A simple and relatively inexpensive solution to this problem is the doubly linked list.

With the doubly linked list, every element gets an additional pointer value indicating the location of the previous element:

typedef struct _namelist namelist;
struct {
 namelist *next, *prev;
 char *name;
} _namelist;

When an element is added to a doubly linked list, both of these pointers are updated accordingly:

void name_add(namelist *person)
{
 person->next = NULL;
 if (!last_person) {
 /* No one in the list yet */
 people = last_person = person;
 person->prev = NULL;
 return;
 }
 /* Append new person to the end of the list */
 last_person ->next = person;
 person->prev = last_person;

 /* Update the list tail */
 last_person = person;
}

So far you haven't seen any advantage to this, but now imagine you have an arbitrary namelist record from somewhere in the middle of the people list and you want to remove it. In the singly linked list you'd need to do something like the following:

void name_remove(namelist *person)
{
 namelist *p;
 if (person == people) {
 /* Happens to be the first person in the list */
 people = person->next;
 if (last_person == person) {
 /* Also happens to be the last person */
 last_person = NULL;
 }
 return;
 }
 /* Search for prior person */
 p = people;
 while (p) {
 if (p->next == person) {
 /* unlink */
 p->next = person->next;
 if (last_person == person) {
 /* This was the last element */
 last_person = p;
 }
 return;
 }
 p = p->next;
 }
 /* Not found in list */
}

Now compare that code with the simpler approach found in a doubly linked list:

void name_remove(namelist *person)
{
 if (people == person) {
 people = person->next;
 }
 if (last_person == person) {
 last_person = person->prev;
 }
 if (person->prev) {

 person->prev->next = person->next;
 }
 if (person->next) {
 person->next->prev = person->prev;
 }
}

Rather than a long, complicated loop, delinking this element from the list requires only a simple set of reassignments wrapped in conditionals. A reverse of this process also allows elements to be inserted at arbitrary points in the list with the same improved efficiency.

HashTablesThe Best of Both Worlds

Although you'll quite likely use vectors or linked lists in a few places in your application, there exists one more type of collection that you'll end up using even more: The HashTable.

A HashTable is a specialized form of a doubly linked list that adds the speed and efficiency of vectors in the form of lookup indices. HashTables are used so heavily throughout the Zend Engine and the PHP Core that an entire subset of the Zend API is devoted to handling these structures.

As you saw in Chapter 2, "Variables from the Inside Out," all userspace variables are stored in HashTables as zval* pointers. In later chapters you'll see how the Zend Engine uses HashTables to store userspace functions, classes, resources, autoglobal labels, and other structures as well.

To refresh from Chapter 2, a Zend Engine HashTable can literally store any piece of data of any size. Functions, for example, are stored as a complete structure. Autoglobals are smaller elements of just a few bytes, whereas other structures such as variables and PHP5 class definitions are simply stored as pointers to other structs located elsewhere in memory.

Further into this chapter you'll look at the function calls that make up the Zend Hash API and how you can use these methods in your extensions.

The PHP Life Cycle

Variables from the Inside Out

Memory Management

Setting Up a Build Environment

Your First Extension

Returning Values

Accepting Parameters

Working with Arrays and HashTables

The Resource Data Type

PHP4 Objects

PHP5 Objects

Startup, Shutdown, and a Few Points in Between

INI Settings

Accessing Streams

Implementing Streams

Diverting the Stream

Configuration and Linking

Extension Generators

Setting Up a Host Environment

Advanced Embedding

Appendix A. A Zend API Reference

Appendix B. PHPAPI

Appendix C. Extending and Embedding Cookbook

Appendix D. Additional Resources