Abstract Data Types (ADTs)

I l @ ve RuBoard

In programming, you try to match the data type to the needs of a programming problem. For instance, you would use the int type to represent the number of shoes you own and the float or double type to represent your average cost per pair of shoes. In the movie examples, the data formed a list of items, each of which consisted of a movie name (a C string) and rating (an int ). No basic C type matches that description, so we defined a structure to represent individual items, and then we devised a couple of methods for tying together a series of structures to form a list. In essence, we used C's capabilities to design a new data type that matched our needs, but we did so unsystematically. Now we'll take a more systematic approach to defining types.

What constitutes a type? A type specifies two kinds of information: a set of properties and a set of operations. For instance, the int type's property is that it represents an integer value and, therefore, shares the properties of integers. The allowed arithmetic operations are changing the sign, adding two ints , subtracting two ints , multiplying two ints , dividing one int by another, and taking the modulus of one int with respect to another. When you declare a variable to be an int , you're saying that these and only these operations can affect it.

Integer Properties

Behind the C int type is a more abstract concept, that of the integer . Mathematicians can, and do, define the properties of integers in a formal abstract manner. For instance, if N and M are integers, then N + M = M + N, or for every two integers N and M, there is an integer S, such that N + M = S. If N + M = S and if N + Q = S, M = Q. You can think of mathematics as supplying the abstract concept of the integer and of C as supplying an implementation of that concept. For example, C provides a means of storing an integer and of performing integer operations such as addition and multiplication. Note that providing support for arithmetic operations is an essential part of representing integers. The int type would be much less useful if all you could do was store a value but not use it in arithmetic expressions. Note also that the implementation doesn't do a perfect job of representing integers. For instance, there are an infinity of integers, but a 2-byte int can represent only 65,536 of them; don't confuse the abstract idea with a particular implementation.

Suppose you want to define a new data type. First, you have to provide a way to store the data, perhaps by designing a structure. Second, you have to provide ways of manipulating the data. For instance, consider the films2.c program (refer to Listing 17.2). It has a linked set of structures to hold the information and supplies code for adding information and displaying information. This program, however, doesn't do these things in a way that makes it clear we were creating a new type. What should we have done?

Computer science has developed a very successful way to define new data types. It's a three-step process that moves from the abstract to the concrete:

Provide an abstract description of the type's properties and of the operations you can perform on the type. This description shouldn't be tied to any particular implementation. It shouldn't even be tied to a particular programming language. Such a formal abstract description is called an abstract data type (ADT).
Develop a programming interface that implements the ADT. That is, indicate how to store the data and describe a set of functions that perform the desired operations. In C, for instance, you might supply a structure definition along with prototypes for functions to manipulate the structures. These functions play the same role for the user -defined type that C's built-in operators play for the fundamental C types. Someone who wants to use the new type will use this interface for her or his programming.
Write code to implement the interface. This step is essential, of course, but the programmer using the new type need not be aware of the details of the implementation.

Let's work through an example to see how this process works. Because you've already invested some effort into the movie listing example, let's redo it using the new approach.

Getting Abstract

Basically, all you need for the movie project is a list of items. Each item contains a movie name and a rating. You need to be able to add new items to the end of the list, and you need to be able to display the contents of the list. Let's call the abstract type that will handle these needs a list . What properties should a list have? Clearly, a list should be able to hold a sequence of items. That is, a list can hold several items, and these items are arranged in some kind of order, so you can speak of the first item in a list or of the second item or of the last item. Next , the list type should support operations such as adding an item to the list. Here are some useful operations:

Initializing a list to empty.
Adding an item to the end of a list.
Determining whether the list is empty.
Determining whether the list is full.
Determining how many items are in the list.
Visiting each item in a list to perform some action, such as displaying the item.

You don't need any further operations for this project, but a more general list of operations for lists might include the following:

Inserting an item anywhere in the list.
Removing an item from the list.
Retrieving an item from the list (list left unaltered).
Replacing one item in the list with another.
Searching for an item in the list.

The informal, but abstract, definition of a list, then, is that it is a data object capable of holding a sequence of items and to which you can apply any of the preceding operations. This definition doesn't state what kind of items can be stored in the list. It doesn't specify whether an array or a linked set of structures or some other data form should be used to hold the items. It doesn't dictate what method to use, for example, to find the number of elements in a list. These matters are all details left to the implementation.

To keep the example simple, adopt a simplified list as the abstract data type, one that embodies only the features needed for the movie project. Here's a summary of the type:

Type Name:

Simple List

Type Properties:

Can hold a sequence of items.

Type Operations:

Initialize list to empty.

Determine whether list is empty.

Determine whether list is full.

Determine number of items in the list.

Add item to end of list.

Traverse list, processing each item in list.

The next step is to develop a C-language interface for the simple list ADT.

Building an Interface

The interface for the simple list has two parts . The first part describes how the data will be represented, and the second part describes functions that implement the ADT operations. For instance, there will be functions for adding an item to a list and for reporting the number of items in the list. The interface design should parallel the ADT description as closely as possible. Therefore, it should be expressed in terms of some general Item type instead of in terms of some specific type, such as int or struct film . One way to do this is to use C's typedef facility to define Item as the needed type:

 #define TSIZE 45   /* size of array to hold title  */ struct film {    char title[TSIZE];    int rating; }; typedef struct film Item;

Then you can use the Item type for the rest of the definitions. If you later want a list of some other form of data, you can redefine the Item type and leave the rest of the interface definition unchanged.

Having defined Item , you now have to decide how to store items of that type. This step really belongs to the implementation stage, but making a decision now makes the example easier to follow. The linked structure approach worked pretty well in the films2.c program, so let's adapt it as shown here:

 typedef struct node {    Item item;    struct node * next; } Node; typedef Node * List;

In a linked list implementation, each link is called a node . Each node contains information that forms the contents of the list along with a pointer to the next node. To emphasize this terminology, we've used the tag name node for a node structure, and we've used typedef to make Node the type name for a struct node structure. Finally, to manage a linked list, we need a pointer to its beginning, and we've used typedef to make List the name for a pointer of this type. Therefore, the declaration

 List movies;

establishes movies as a pointer suitable for referring to a linked list.

Is this the only way to define the List type? No. For instance, you could incorporate a variable to keep track of the number of entries:

 typedef struct list {    Node * head;   /* pointer to head of list        */    int size;      /* number of entries in list      */ } List;           /* alternative definition of list */

You could add a second pointer to keep track of the end of the list. Later, you'll see an example that does that. For now, let's stick to the first definition of a List type. The important point is that you should think of the declaration

 List movies;

as establishing a list, not as establishing a pointer to a node or as establishing a structure. The exact data representation of movies is an implementation detail that should be invisible at the interface level.

For example, a program should initialize the head pointer to NULL when starting out, but you should not use code like this:

 movies = NULL;

Why not? Because later you might find you like the structure implementation of a List type better, and that would require the following initializations:

 movies.next = NULL; movies.size = 0;

Anyone using the List type shouldn't have to worry about such details. Instead, they should be able do something along the following lines:

 InitializeList(movies);

Programmers need to know only that they should use the InitializeList() function to initialize a list. They don't have to know the exact data implementation of a List variable. This is an example of data hiding , the art of concealing details of data representation from the higher levels of programming.

To guide the user, you can supply a function prototype along these lines:

 /* operation:        initialize a list                  */ /* preconditions:    plist points to a list             */ /* postconditions:   the list is initialized to empty   */ void InitializeList(List * plist);

There are three points you should notice. First, the comments outline preconditions , that is, conditions that should hold before the function is called. Here, for instance, you need a list to initialize. Second, the comments outline postconditions , that is, conditions that should hold after the function executes. Finally, the function uses a pointer to a list instead of a list as its argument, so this would be the function call:

 InitializeList(&movies);

The reason is that C passes arguments by value, so the only way a C function can alter a variable in the calling program is by using a pointer to that variable. Here the restrictions of the language make the interface deviate slightly from the abstract description.

The C way to tie all the type and function information into a single package is to place the type definitions and function prototypes (including precondition and postcondition comments) in a header file. This file should supply all the information a programmer needs to use the type. Listing 17.3 shows a header file for the simple list type. It defines a particular structure as the Item type, and then defines Node in terms of Item and List in terms of Node . The functions representing list operations then use Item types and List types as arguments. If the function needs to modify an argument, it uses a pointer to the corresponding type instead of using the type directly. The file capitalizes each function name as a way of marking it as part of an interface package. Also, the file uses the # ifndef technique discussed in Chapter 16, "The C Preprocessor and the C Library," to protect against multiple inclusions of a file.

Listing 17.3 The `list.h` interface header file.

 /* list.h -- header file for a simple list type */ #ifndef _LIST_H_ #define _LIST_H_ /* program-specific declarations */ #define TSIZE      45    /* size of array to hold title  */ struct film {    char title[TSIZE];    int rating; }; /* general type definitions */ typedef struct film Item; typedef enum boolean {False, True} BOOLEAN; typedef struct node {    Item item;    struct node * next; } Node; typedef Node * List; /* function prototypes */ /* operation:        initialize a list                          */ /* preconditions:    plist points to a list                     */ /* postconditions:   the list is initialized to empty           */ void InitializeList(List * plist); /* operation:        determine if list is empty                 */ /* preconditions:    l is an initialized list                   */ /* postconditions:   function returns True if list is empty     */ /*                   and returns False otherwise                */ BOOLEAN EmptyList(List l); /* operation:        determine if list is full                  */ /* preconditions:    l is an initialized list                   */ /* postconditions:   function returns True if list is full      */ /*                   and returns False otherwise                */ BOOLEAN FullList(List l); /* operation:        determine number of items in list          */ /* preconditions:    l is an initialized list                   */ /* postconditions:   function returns number of items in list   */ unsigned int ListItems(List l); /* operation:        add item to end of list                    */ /* preconditions:    item is an item to be added to list        */ /*                   plist points to an initialized list        */ /* postconditions:   if possible, function adds item to end     */ /*                   of list and returns True; otherwise the    */ /*                   function returns False                     */ BOOLEAN AddItem(Item item, List * plist); /* operation:        apply a function to each item in list      */ /* preconditions:    l is an initialized list                   */ /*                   pfun points to a function that takes an    */ /*                   Item argument and has no return value      */ /* postcondition:    the function pointed to by pfun is         */ /*                   executed once for each item in the list    */ void Traverse (List l, void (* pfun)(Item item) ); #endif

One of the prototypes in the header file is a bit more complex than the others.

 /* operation:        apply a function to each item in list      */ /* preconditions:    l is an initialized list                   */ /*                   pfun points to a function that takes an    */ /*                   Item argument and has no return value      */ /* postcondition:    the function pointed to by pfun is         */ /*                   executed once for each item in the list    */ void Traverse (List l, void (* pfun)(Item item) );

The argument pfun is a pointer to a function. In particular, it is a pointer to a function that takes an item as an argument and that has no return value. As you might recall from Chapter 14, "Structures and Other Data Forms," you can pass a pointer to a function as an argument to a second function, and the second function can then use the pointed-to function. Here, for instance, you can let pfun point to a function that displays an item. The Traverse() function would then apply this function to each item in the list, thus displaying the whole list.

Using the Interface

Our claim is that you should be able to use this interface to write a program without knowing any further details, for example, without knowing how the functions are written. Let's write a new version of the movie program right now before we write the supporting functions. Because the interface is in terms of List and Item types, the program should be phrased in those terms. Here's a pseudocode representation of one possible plan:

 Create a List variable. Create an Item variable. Initialize the list to empty. While the list isn't full and while there's more input:     Read the input into the Item variable.     Add the item to the end of the list. Visit each item in the list and display it.

The program shown in Listing 17.4 follows this basic plan, with some error-checking. Note how it makes use of the interface described in the list.h file (refer to Listing 17.3). Also note that the listing has code for a showmovies() function that conforms to the prototype required by Traverse() . Therefore, the program can pass the pointer showmovies to Traverse() so that Traverse() can apply the showmovies() function to each item in the list. (Recall that the name of a function is a pointer to the function.)

Listing 17.4 The `films3.c` program.

 /* films3.c -- using an ADT-style linked list */ #include <stdio.h> #include <stdlib.h>    /* prototype for exit() */ #include "list.h"      /* defines List, Item   */ void showmovies(Item item); int main(void) {    List movies;    Item temp;    InitializeList(&movies);    if (FullList(movies))    {       fprintf(stderr,"No memory available! Bye!\n");       exit(1);    }    puts("Enter first movie title:");    while (gets(temp.title) != NULL && temp.title[0] != `   /* films3.c -- using an ADT-style linked list */ #include <stdio.h> #include <stdlib.h> /* prototype for exit() */ #include "list.h" /* defines List, Item */ void showmovies(Item item); int main(void) { List movies; Item temp; InitializeList(&movies); if (FullList(movies)) { fprintf(stderr,"No memory available! Bye!\n"); exit(1); } puts("Enter first movie title:"); while (gets(temp.title) != NULL && temp.title[0] != `\0') { puts("Enter your rating <0-10>:"); scanf("%d", &temp.rating); while( getchar () != `\n') continue; if (AddItem(temp, &movies)== False) { fprintf(stderr,"Problem allocating memory\n"); break; } if (FullList(movies)) { puts("The list is now full."); break; } puts("Enter next movie title (empty line to stop):"); } if (EmptyList(movies)) printf("No data entered. "); else { printf ("Here is the movie list:\n"); Traverse(movies, showmovies); } printf("Bye!\n"); return 0; } void showmovies(Item item) { printf("Movie: %s Rating: %d\n", item.title, item.rating); }  
 ')    {       puts("Enter your rating <0-10>:");       scanf("%d", &temp.rating);       while(getchar() != `\n')          continue;       if (AddItem(temp, &movies)== False)       {          fprintf(stderr,"Problem allocating memory\n");          break;       }       if (FullList(movies))       {          puts("The list is now full.");          break;       }       puts("Enter next movie title (empty line to stop):");    }    if (EmptyList(movies))       printf("No data entered. ");    else    {       printf ("Here is the movie list:\n");       Traverse(movies, showmovies);    }    printf("Bye!\n");    return 0; } void showmovies(Item item) {       printf("Movie: %s  Rating: %d\n", item.title,          item.rating); }

Implementing the Interface

Of course, you still have to implement the List interface. The C approach is to collect the function definitions in a file called list.c . The complete program, then, consists of three files: list.h , which defines the data structures and provides prototypes for the user interface, list.c , which provides the function code to implement the interface, and films3.c , which is a source code file that applies the list interface to a particular programming problem. Listing 17.5 shows one possible implementation of list.c . To run the program, you must compile both films3.c and list.c and link them. (You might want to review the discussion in Chapter 9, "Functions," on compiling multiple-file programs.) Together, the files list.h, list.c , and films3.c constitute a complete program (see Figure 17.5).

Figure 17.5. The three part of a program package.

Listing 17.5 The `list.c` implementation file.

 /* list.c -- functions supporting list operations */ #include <stdio.h> #include <stdlib.h> #include "list.h" /* local functions */ static void CopyToNode(Item item, Node * pnode); /* interface functions */ /* set the list to empty   */ void InitializeList(List * plist) {    * plist = NULL; } /* returns true if list is empty */ BOOLEAN EmptyList(List l) {    if (l == NULL)       return True;    else       return False; } /* returns true if list is full */ BOOLEAN FullList(List l) {    Node * pt;    BOOLEAN full;    pt = (Node *) malloc(sizeof(Node));    if (pt == NULL)       full = True;    else       full = False;    free(pt);    return full; } /* returns number of nodes */ unsigned int ListItems(List l) {    unsigned int count = 0;    while (l != NULL)    {       ++count;       l = l->next;   /* set l to next node */    }     return count; } /* creates node to hold item and adds it to the end of */ /* the list pointed to by plist (slow implementation)  */ BOOLEAN AddItem(Item item, List * plist) {    Node * pnew;    Node * scan = *plist;  (Node *) malloc(sizeof(Node));    if (pnew == NULL)       return False;  /* quit function on failure  */    CopyToNode(item, pnew);    pnew->next = NULL;    if (scan == NULL)      /* empty list, so place */       * plist = pnew;     /* pnew at head of list */    else    {       while (scan->next != NULL)          scan = scan->next;  /* find end of list  */       scan->next = pnew;     /* add pnew to end   */    }    return True; } /* visit each node and execute function pointed to by pfun */ void Traverse (List l, void (* pfun)(Item item) ) {    while (l != NULL)    {       (*pfun)(l->item);   /* apply function to item in list */       l = l->next;    } } /* copies an item into a node */ static void CopyToNode(Item item, Node * pnode) {    pnode->item = item;  /* structure copy */ }

Program Notes

The list.c file has many interesting points. For one, it illustrates when you might use static storage class functions. As described in Chapter 13, "Storage Classes and Program Development," static functions are known only in the file where they are defined. When implementing an interface, you might find it convenient sometimes to write auxiliary functions that aren't part of the official interface. For instance, the example uses a function CopyToNode() to copy a type Item value to a type Item variable. Because this function is part of the implementation but not part of the interface, we hid it in the list.c file by making it static. Now, examine the other functions.

The InitializeList() function initializes a list to empty. In our implementation, that means setting a type List variable to NULL . As mentioned earlier, this requires passing a pointer to the List variable to the function.

The EmptyList() function is quite simple, but it does depend on the list variable being set to NULL when the list is empty. Therefore, it's important to initialize a list before first using the EmptyList() function. Also, if you were to extend the interface to include deleting items, you should make sure the deletion function resets the list to empty when the last item is deleted. Because this function doesn't alter the list, you don't need to pass a pointer as an argument, so the argument is a List instead of a pointer to a List .

With a linked list, the size of the list is limited by the amount of memory available. The FullList() function tries to allocate enough space for a new item. If it fails, the list is full. If it succeeds, it has to free the memory it just allocated so that it is available for a real item.

The ListItems() function uses the usual linked-list algorithm to traverse the list, counting items as it goes:

 unsigned int ListItems(List l) {    unsigned int count = 0;    while (l != NULL)    {       ++count;       l = l->next;   /* set l to next node */    }     return count; }

The AddItem() function is the most elaborate of the group :

 BOOLEAN AddItem(Item item, List * plist) {    Node * pnew;    Node * scan = *plist;    pnew = (Node *) malloc(sizeof(Node));    if (pnew == NULL)       return False;           /* quit function on failure */    CopyToNode(item, pnew);    pnew->next = NULL;    if (scan == NULL)          /* empty list, so place     */       * plist = pnew;         /* pnew at head of list     */    else    {       while (scan->next != NULL)          scan = scan->next;   /* find end of list         */       scan->next = pnew;      /* add pnew to end          */    }    return True; }

The first thing the AddItem() function does is allocate space for a new node. If this succeeds, the function uses CopyToNode() to copy the item to the node. Then it sets the next member of the node to NULL . This, recall, indicates that the node is the last node in the linked list. Finally, after creating the node and assigning the correct values to its members , the function attaches the node to the end of the list. If the item is the first item added to the list, the program sets the head pointer to the first item. (Remember, AddItem() is called with the address of the head pointer as its second argument, so * plist is the value of the head pointer.) Otherwise, the code marches through the linked list until it finds the item having its next member set to NULL . That node is currently the last node, so the function resets its next member to point to the new node.

Good programming practice dictates that you call FullList() before trying to add an item to the list. However, a user might fail to observe this dictate, so AddItem() checks for itself whether malloc() has succeeded. Also, it's possible a user might do something else to allocate memory between calling FullList() and calling AddItem() , so it's best to check whether malloc() worked.

Finally, the Traverse() function is similar to the ListItems() function with the addition of applying a function to each item in the list:

 void Traverse (List l, void (* pfun)(Item item) ) {    while (l != NULL)    {       (*pfun)(l->item);      /* apply function to item in list */       l = l->next;    } }

Recall that l->item represents the data stored in a node, and the l->next identifies the next node in the linked list. For example, the function call

 Traverse(movies, showmovies);

applies the showmovies() function to each item in the list.

Contemplating Your Work

Take a little time now to evaluate what the ADT approach has done for you. First, compare Listing 17.2 with Listing 17.4. Both programs use the same fundamental method (dynamic allocation of linked structures) to solve the movie listing problem, but Listing 17.2 exposes all the programming plumbing, putting malloc() and prev->next in public view. Listing 17.4, on the other hand, hides these details and expresses the program in a language that relates directly to the tasks . That is, it talks about creating a list and adding items to the list, not about calling memory functions or resetting pointers. In short, Listing 17.4 expresses the program in terms of the problem to be solved , not in terms of the low-level tools needed to solve the problem. The ADT version is oriented to the end user's concerns and is much easier to read.

Next, the list.h and list.c files together constitute a reusable resource. If you need another simple list, just haul out these files. Suppose you need to store an inventory of your relatives: names , relationships, addresses, and phone numbers . First, you would go to the list.h file and redefine the Item type:

 typedef struct itemtag {    char fname[14];    char lname [24];    char relationship[36];    char address [60];    char phonenum[20]; }  Item;

Next...well, that's all you have to do in this case because all the simple list functions are defined in terms of the Item type. In some cases, you would also have to redefine the CopyToNode() function. For example, if an item were an array, you couldn't copy it by assignment.

Another important point is that the user interface is defined in terms of abstract list operations, not in terms of some particular set of data representation and algorithms. This leaves you free to fiddle with the implementation without having to redo the final program. For instance, the current AddItem() function is a bit inefficient because it always starts at the beginning of the list, and then searches for the end. You can fix this problem by keeping track of the end of the list. For instance, you can redefine the List type this way:

 typedef struct list {    Node * head;      /* points to head of list */    Node * end;       /* points to end of list  */ } List;

Of course, you would then have to rewrite the list-processing functions using this new definition, but you wouldn't have to change a thing in Listing 17.4. This sort of isolating implementation from the final interface is particularly useful for large programming projects. It's called data hiding because the detailed data representation is hidden from the final user.

Note that this particular ADT doesn't even force you to implement the simple list as a linked list. Here's another possibility:

 #define MAXSIZE 100 typedef struct list {    Item entries[MAXSIZE];   /* array of items          */    int items;               /* number of items in list */ } List;

Again, this would require rewriting the list.c file, but the program using the list doesn't need to be changed.

I l @ ve RuBoard

Abstract Data Types (ADTs)