3.4 Elementary List Processing

Linked lists bring us into a world of computing that is markedly different from that of arrays and simple classes. With arrays and classes, we save an item in memory and later refer to it by name (or by index) in much the same manner as we might put a piece of information in a file drawer or an address book; with linked lists, the manner in which we save information makes it more difficult to access but easier to rearrange. Working with data that are organized in linked lists is called list processing.

When we use arrays, we are susceptible to program bugs involving out-of-bounds array accesses. The most common bug that we encounter when using linked lists is a similar bug where we reference an undefined object. Another common mistake is to use a reference that we have changed unknowingly. One reason that this problem arises is that we may have multiple references to the same node without necessarily realizing that that is the case. Program 3.8 avoids several such problems by using a circular list that is never empty, so that each link always refers to a well-defined node, and each link can also be interpreted as referring to the list.

Developing correct and efficient code for list-processing applications is an acquired programming skill that requires practice and patience to develop. In this section, we consider examples and exercises that will increase our comfort with working with list-processing code. We shall see numerous other examples throughout the book, because linked structures are at the heart of some of our most successful algorithms.

As mentioned in Section 3.3, we use a number of different conventions for the first and final links in a list. We consider some of them in this section, even though we adopt the policy of reserving the term linked list to describe the simplest situation.

Definition 3.3 A linked list is either a null link or a link to a node that contains an item and a link to a linked list.

This definition is more restrictive than Definition 3.2, but it corresponds more closely to the mental model that we have when we write list-processing code. Rather than exclude all the other various conventions by using only this definition, and rather than provide specific definitions corresponding to each convention, we let both stand, with the understanding that it will be clear from the context which type of linked list we are using.

Program 3.9 is an implementation of a simple list-processing task, reversing the order of the nodes on a list. It takes a linked list as an parameter, and returns a linked list comprising the same nodes, but with the order reversed. Figure 3.9 shows the change that the method makes for each node in its main loop. Such a diagram makes it easier for us to check each statement of the program to be sure that the code changes the links as intended, and programmers typically use these diagrams to understand the operation of list-processing implementations.

Figure 3.9. List reversal

To reverse the order of a list, we maintain a pointer `r` to the portion of the list already processed, and a pointer `y` to the portion of the list not yet seen. This diagram shows how the pointers change for each node in the list. We save a pointer to the node following `y` in `t`, change `y`'s link to point to `r`, and then move `r` to `y` and `y` to `t`.

graphics/03fig09.gif

Program 3.9 List reversal

This method reverses the links in a list, returning a pointer to the final node, which then points to the next-to-final node, and so forth, with the link in the first node of the original list set to null. To accomplish this task, we need to maintain links to three consecutive nodes in the list.

 static Node reverse(Node x)    { Node t, y = x, r = null;      while (y != null)        { t = y.next; y.next = r; r = y; y = t; }      return r;    }

One of the most common operations that we perform on lists is to traverse them: We scan through the items on the list sequentially, performing some operation on each. For example, if x refers to the first node of a list, the final node has a null link, and visit is a method that takes an item as a parameter, then we might write

 for (Node t = x; t != null; t = t.next) visit(t.item);

to traverse the list. This loop (or its equivalent while form) is as ubiquitous in list-processing programs as is the corresponding loop of the form for (int i = 0; i < N; i ++) in array-processing programs.

Program 3.10 illustrates the implementation of three other basic list-processing tasks: it builds a list from a sequence of numbers on standard input, rearranges the nodes of the list to put the items in sorted order, and prints out the sorted sequence. As we discuss in Chapter 6, the expected running time of this program is proportional to N², so the program is not useful for large N. Beyond this observation, we defer discussing the sort aspect of this program to Chapter 6, because we shall see a great many methods for sorting in Chapters 6 through 10. Our purpose now is to present the implementation as an example of a list-processing application.

The lists in Program 3.10 illustrate another commonly used convention: We maintain a dummy node called a head node at the beginning of each list. We ignore the item field in a list's head node but maintain its link as a reference to the node containing the first item in the list. The program uses two lists: one to collect the random input in the first loop, and the other to collect the sorted output in the second loop. Figure 3.10 diagrams the changes that Program 3.10 makes during one iteration of its main loop. We take the next node off the input list, find where it belongs in the output list, and link it into position.

Figure 3.10. Linked-list sort

This diagram depicts one step in transforming an unordered linked list (referred to by `a`) into an ordered one (referred to by `b`), using insertion sort. We take the first node of the unordered list, keeping a pointer to it in `t` (top). Then, we search through `b` to find the first node `x` with `x.next.item > t.item` (or `x.next = null`), and insert `t` into the list following `x` (center). These operations reduce the length of `a` by one node, and increase the length of `b` by one node, keeping `b` in order (bottom). Iterating, we eventually exhaust `a` and have the nodes in order in `b`.

graphics/03fig10.gif

Program 3.10 List insertion sort

This code builds a linked list with one number from standard input per node (create method), then rearranges the nodes so that the numbers appear in order on the list (sort method), then prints the numbers out in sorted order (print method). To accomplish the sort, it maintains two lists, an input (unsorted) list and an output (sorted) list. On each iteration of the loop, it removes a node from the input and inserts it into position in the output. The code is simplified by the use of head nodes for each list that contain the links to the first nodes on the lists.

 class ListSortExample    {      static class Node        { int val; Node next;          Node(int v, Node t) { val = v; next = t; }        }      static Node create()        { Node a = new Node(0, null);          for (In.init(); !In.empty(); )            a.next = new Node(In.getInt(), a.next);          return a;        }      static Node sort(Node a)        { Node t, u, x, b = new Node(0, null);          while (a.next != null)            {              t = a.next; u = t.next; a.next = u;              for (x = b; x.next != null; x = x.next)                if (x.next.val > t.val) break;              t.next = x.next; x.next = t;            }          return b;        }      static void print(Node h)        { for (Node t = h.next; t != null; t = t.next)            Out.println(t.val + ""); }      public static void main(String[] args)        { print(sort(create())); }    }

The primary reason to use the head node at the beginning becomes clear when we consider the process of adding the first node to the sorted list. This node is the one in the input list with the smallest item, and it could be anywhere on the list. We have three options:

Duplicate the for loop that finds the smallest item and start with a one-node list in the same manner as in Program 3.8.
Test whether the output list is empty every time that we wish to insert a node.
Use a dummy head node whose link points to the first node on the list, as in the given implementation.

The first option is inelegant and requires extra code; the second is also inelegant and requires extra time. Incidentally, the concise body of the main method in Program 3.8 provides an example of what is known as functional programming: it consists entirely of method invocations (function calls).

The use of a head node does incur some cost (the extra node), and we can avoid the head node in many common applications. For example, we can also view Program 3.9 as having an input list (the original list) and an output list (the reversed list), but we do not need to use a head node in that program because all insertions into the output list are at the beginning. We shall see still other applications that are more simply coded when we use a dummy node, rather than a null link, at the tail of the list. There are no hard-and-fast rules about whether or not to use dummy nodes the choice is a matter of style combined with an understanding of effects on performance. Good programmers enjoy the challenge of picking the convention that most simplifies the task at hand. We shall see several such tradeoffs throughout this book.

For reference, a number of options for linked-list conventions are laid out in Table 3.1; others are discussed in the exercises. In all the cases in Table 3.1, we use a reference head to refer to the list, and we maintain a consistent stance that our program manages references to nodes, using the given code for various operations. Instantiating nodes and filling them with information is the same for all the conventions. Robust methods implementing the same operations would have extra code to check for error conditions (for example, every use of new should be enclosed in a try-catch block, as in Program 3.5). The purpose of the table is to expose similarities and differences among the various options.

To this point, our programs have implemented the linked-list abstraction with code that directly manipulates data fields (items and links) in nodes, relying on programming conventions to ensure that the lists themselves have the desired structure. An alternative is to define a data type for the lists themselves, making the conventions an explicit part of the implementation. This approach frees client programs from tedious low-level operations. For example, Program 3.11 is a class that implements the circular-list abstraction, and Program 3.12 is our Josephus-election program (Program 3.8) recast as a client program that uses this class. The CircularList class has the responsiblity of making sure that the list is always a proper circular list, and the client program works with high-level operations such as "insert a new node into the list" rather than low-level operations such as assigning specific values to links.

Table 3.1. Head and tail conventions in linked lists
This table gives implementations of basic list-processing operations with four commonly used conventions. This type of code is used in simple applications where the list-processing code is inline.
Circular, never empty
first insert:	`head.next = head;`
insert t after x:	`t.next = x.next; x.next = t;`
remove after x:	`x.next = x.next.next;`
traversal loop:	t = head; do { ... t = t.next; } while (t != head);
test if one item:	`if (head.next == head)`
Head reference, null tail
initialize:	`head = null;`
insert t after x:	if (x == null) { head = t; head.next = null; } else { t.next = x.next; x.next = t; }
remove after x:	`t = x.next; x.next = t.next;`
traversal loop:	`for (t = head; t != null; t = t.next)`
test if empty:	`if (head == null)`
Dummy head node, null tail
initialize:	head = new Node(); head.next = null;
insert t after x:	`t.next = x.next; x.next = t;`
remove after x:	`t = x.next; x.next = t.next;`
traversal loop:	`for (t = head.next; t != null; t = t.next)`
test if empty:	`if (head.next == null)`
Dummy head and tail nodes
initialize:	head = new Node(); z = new Node(); head.next = z; z.next = z;
insert t after x:	`t.next = x.next; x.next = t;`
remove after x:	`x.next = x.next.next;`
traversal loop:	`for (t = head.next; t != z; t = t.next)`
test if empty:	`if (head.next == z)`

Program 3.11 Circular-list class

This class implements basic operations on circular linked lists. Its purpose is to allow clients to manipulate such lists and to ensure that they do so without dependence upon implementation details. Clients can insert a new node with a given value after a given node in a list (and, by convention, create a one-node list if the first parameter is null)and remove the node following a given node (remove has no effect if the list has only one node). The accessor methods next and val provide the values of fields to the clients; their use gives us the freedom to change the implementation (see Chapter 4).

 class CircularList    {      static class Node        { int val; Node next;          Node(int v) { val = v; }        }      Node next(Node x)        { return x.next; }      int val(Node x)        { return x.val; }      Node insert(Node x, int v)        { Node t = new Node(v);          if (x == null) t.next = t;          else { t.next = x.next; x.next = t; }          return t;        }      void remove(Node x)        { x.next = x.next.next; }    }

We consider a completely different class implementation with the same interface in Section 3.5. This example is yet another illustration of the client-interface-implementation scenario that we use throughout the book. We can use any class implementation that has the same methods without changing Program 3.12 at all. Identifying the important operations that we use in a computation and encapsulating them all in a single class has two advantages:

Program 3.12 Solving the Josephus problem with circular lists

This program for the Josephus problem is an example of a client program that utilizes the circular-list class of Program 3.13. The low-level details of maintaining the list structure are left to the class implementation.

 class JosephusY    {      public static void main(String[] args)        { int N = Integer.parseInt(args[0]);          int M = Integer.parseInt(args[1]);          CircularList L = new CircularList();          CircularList.Node x = null;          for (int i = 1; i <= N; i++)            x = L.insert(x, i);          while (x != L.next(x))            {              for (int i = 1; i < M; i++)                x = L.next(x);              L.remove(x);            }          Out.println("Survivor is " + L.val(x));        }    }

We can work at a higher level of abstraction in the client.
We consider different concrete implementations of critical operations and test their effectiveness.

In this case, our purpose is to expose underlying mechanisms, not find more efficient implementations.

Some programmers prefer to encapsulate all operations on data structures such as linked lists by defining methods for every operation that a client might need in classes like Program 3.11. Indeed, as just illustrated and as we shall see in Chapter 4, the Java class mechanism makes it easy to do so. However, that extra layer of abstraction sometimes masks the fact that just a few low-level operations are involved. In this book, when we are implementing higher-level interfaces, we usually write low-level operations on linked structures directly in order to clearly expose the essential details of our algorithms and data structures. We shall see many examples in Chapter 4.

By adding more links, we can add the capability to move backward through a linked list. For example, we can support the operation "find the item before a given item" by using a doubly linked list in which we maintain two links for each node: one (prev) to the item before, and another (next) to the item after. With dummy nodes or a circular list, we can ensure that x, x.next.prev, and x.prev.next are the same for every node in a doubly linked list. Figures 3.11 and 3.12 show the basic link manipulations required to implement remove, insert after, and insert before, in a doubly linked list. Note that, for remove, we do not need extra information about the node before it (or the node after it) in the list, as we did for singly linked lists that information is contained in the node itself.

Figure 3.11. Deletion in a doubly linked list

In a doubly linked list, a reference to a node is sufficient information for us to be able to remove it, as diagrammed here. Given `t`, we set `t.next.prev` to `t.prev` (center) and `t.prev.next` to `t.next` (bottom).

graphics/03fig11.gif

Figure 3.12. Insertion in a doubly linked list

To insert a node into a doubly linked list, we need to set four references, whether we want to insert the new node after a given node (diagrammed here) or before a given node. We insert a given node `t` after another given node `x` by setting `t.next` to `x.next` and `x.next.prev` to `t`(center), and then setting `x.next` to `t` and `t.prev` to `x` (bottom).

graphics/03fig12.gif

Indeed, the primary significance of doubly linked lists is that they allow us to remove a node when the only information that we have about that node is a link to it. Typical situations are when the link is passed as an parameter in a method invocation, and when the node has other links and is also part of some other data structure. Providing this extra capability doubles the space needed for links in each node and doubles the number of link manipulations per basic operation, so doubly linked lists are not normally used unless specifically called for. We defer considering detailed implementations to a few specific situations where we have such a need for example, in Section 9.5.

We use linked lists throughout this book, first for basic ADT implementations (see Chapter 4), then as components in more complex data structures. For many programmers, linked lists provide the first exposure to an abstract data structure that is under their direct control. They represent an essential tool for our use in developing the high-level abstract data structures that we need for a host of important problems, as we shall see.

Exercises

3.35 Write a method that moves the largest item on a given list to be the final node on the list.

3.36 Write a method that moves the smallest item on a given list to be the first node on the list.

3.37 Write a method that rearranges a linked list to put the nodes in even positions after the nodes in odd positions in the list, preserving the relative order of both the evens and the odds.

3.38 Implement a code fragment for a linked list that exchanges the positions of the nodes after the nodes referenced by two given links t and u.

3.40 Write a method that takes two parameters a link to a list and an object with a method that takes a link as an parameter and removes all items on the given list for which the method returns a nonzero value.

3.41 Solve Exercise 3.40, but make copies of the nodes that pass the test and return a link to a list containing those nodes, in the order that they appear in the original list.

3.42 Implement a version of Program 3.9 that uses a head node.

3.43 The create() method in Program 3.10 builds a list with the numbers in the nodes in the list appearing in the reverse of the order in which they appear in standard input. Give an implementation of that method that preserves the order.

3.44 Implement a version of Program 3.10 that does not use head nodes.

3.45 Implement a method that exchanges two given nodes on a doubly linked list.

3.47 Give an entry for Table 3.1 for a circular list that has a dummy node, which serves as both head and tail.

Top