Java 1.4.2 Style Collections | Java For Artists: The Art, Philosophy, And Science Of Object-Oriented Programming

Java Collections Framework Overview

The Java collections framework provides interfaces, implementation classes, algorithms, and infrastructure code designed to make the manipulation of collections of objects safe, reliable, and easy. This section describes the purpose and organization of the Java collections framework. It also discusses the benefits you can expect to derive from utilizing the collections framework in your Java programs.

The Purpose Of The Collections Framework

Suppose for a moment the Java collections framework did not exist. Every Java programmer who needed to manage a collection of objects would have to create custom-designed classes like DynamicArray to satisfy their needs. These custom-designed classes would have different interface methods, questionable performance characteristics, and dubious track records from lack of rigorous and proper testing. Somewhere along the line one or more enterprising developers would offer up their custom-designed classes for use by other programmers. Some of these APIs might be open source and others might be offered as commercial products. Some would perform better than others and all might have unique features not found in the other APIs. Java programmers would be faced with learning one or more collections APIs with an eye towards getting best-of-breed performance. There would be no guarantees regarding the interoperability between different APIs. The life of the Java programmer would be a mess indeed!

The Java collections framework directly addresses these issues by providing a comprehensive API for collections management. Java developers learn one API and their collection manipulation code is guaranteed to work across deployment platforms. Custom classes can be built by implementing the required collection interface and following a few simple rules. Java developers can write more reliable code because the collections framework has been subjected to rigorous testing by the entire Java developer community. Life is good!

Framework Organization

The Java collections framework is organized into four primary areas:

a set of core collection interfaces
a set of general purpose implementation classes
ready-made algorithms for collection manipulation
infrastructure code to facilitate collection manipulation

To gain full benefit from the collections framework you need to know the following things:

the basic characteristics of each core interface and the methods supported by each
the general purpose implementation classes and their fundamental differences
how to manipulate collections using the supplied algorithms
how objects are ordered in collections
how to manipulate collections using iterators

Core Collections Interfaces

Figure 17-4 shows the core collections interface hierarchy for the Java 1.4.2 platform. As you can see from the diagram there are six core interfaces grouped into two distinct inheritance hierarchies: one with the Collection interface as the root, and the other with the Map interface as the root. Another point illustrated in figure 17-4 is that a Map is not a Collection per se, however, the Map interface does declare methods that return different views of a Map as a Collection object. A Map whose elements have been converted to a Collection can then be manipulated accordingly.

image from book
Figure 17-4: Java 1.4.2 Collections Framework Core Interface Hierarchy

Table 17-1 lists the general characteristics of each Java collections framework core interface. An understanding of each interface’s characteristics will help you select the right type of collection to suit your programming needs.

Table 17-1: Core Collection Interface Characteristics
Interface	Characteristic
Collection	The Collection interface is the base interface of all collections. Its purpose is to provide a unified way to manipulate collection objects. Some of the methods declared by the Collection interface are optiona l. An optional method may, or may not, be implemented by a general purpose collection implementation class. As with all collection classes, you should consult the API documentation to learn which methods are supported. A Collection represents a group of objects referred to as its elements. The elements of a collection can be accessed and manipulated via an Iterator.
List	The List interface represents a collection whose elements are arranged sequentially. Elements can be inserted into the list at any location. Duplicate elements are usually allowed. List elements can be accessed and manipulated via a ListIterator. A ListIterator is different from an ordinary Iterator in that it provides the means to traverse the list forwards and backwards.
Set	The Set interface represents a collection that contains no duplicate elements.
SortedSet	The SortedSet interface represents a Set whose Iterator will traverse its elements according to their natural ordering. Insertions into a sorted set may take longer but retrievals will occur within predictable time.
Map	The Map interface represents an object that stores and retrieves an element (value) whose location within the map is determined by a key. A Map cannot contain duplicate values and a key can map to only one value. Note that a Map is not a Collection although it does provide methods that return a Collection object that contains the map’s elements. The Collection object returned in this fashion can be manipulated accordingly.
SortedMap	The SortedMap interface represents a Map whose elements are stored according to their natural ordering. Insertions into a SortedMap may take longer but retrievals will occur within predictable time.

General Purpose Implementation Classes

The Java collections framework provides a set of general purpose implementation classes ready for immediate use in your programs. These classes implement the core collection interfaces, so you have, at your disposal, lists, sets, sorted sets, maps and sorted maps.

There are generally two or more flavors of each implementation class. Each flavor is based upon a fundamental underlying data structure such as an array, linked list, or a tree. (Here I use the words flavor and subtype synonymously.) To better understand how to choose the right collection class to suit your needs you must understand the general performance characteristics of each of these underlying data structures.

Array Performance Characteristics

As you know already from reading chapter 8, an array is a contiguous collection of homogeneous elements. You can have arrays of primitive types or arrays of references to objects. The general performance issues to be aware of regarding arrays concern inserting new elements into the array at some position prior to the last element, accessing elements, and searching for particular values within the array.

When a new element is inserted into an array at a position other than the end, room must be made at that index location for the insertion to take place by shifting the remaining references one element to the right. This series of events is depicted in figures 17-5 through 17-7.

image from book
Figure 17-5: Array of Object References Before Insertion

image from book
Figure 17-6: New Reference to be Inserted at Array Element 3 (index 2)

image from book
Figure 17-7: Array After New Reference Insertion

Referring to figures 17-5 through 17-7 — an array of object references contains references that may point to an object or to null. In this example array elements 1 through 4 (index values 0 through 3) point to objects while the remaining array elements point to null.

A reference insertion is really just an assignment of the value of the reference being inserted to the reference residing at the target array element. To accommodate the insertion, the values contained in references located to the right of the target element must be reassigned one element to the right. (i.e., They must be shifted to the right.) It is this shifting action which can cause a performance hit when inserting elements into an array-based collection. If the insertion triggers the array growth mechanism, then you’ll receive a double performance hit. The insertion performance penalty, measured in time, grows with the length of the array. Element retrieval, on the other hand, takes place fairly quickly because of the way array element addresses are computed.

Linked List Performance Characteristics

A linked list is a data structure whose elements stand alone in memory. (And may indeed be located anywhere in the heap!) Each element is linked to another by a reference. Unlike the elements of an array, which are ordinary references, each linked list node is a complex data structure that contains a reference to the previous node in the list, the next node in the list, and a reference to an object payload as figure 17-8 illustrates.

image from book
Figure 17-8: Linked List Node Organization

So, whereas an array’s elements are always located one right after the other in memory, and thus their memory addresses can be quickly calculated, the linked list’s elements can be, and usually are, scattered in memory hither and yonder. The nice thing about linked lists is that element insertions can take place fairly quickly because no element shifting is required. Figures 17-9 through 17-11 show the sequence of events for a circular linked list node insertion.

image from book
Figure 17-9: Linked List Before New Element Insertion

image from book
Figure 17-10: New Reference Being Inserted Into Second Element Position

image from book
Figure 17-11: References of Previous, New, and Next List Elements Must Be Manipulated

Referring to figures 17-9 through 17-11 — a linked list can contain one or more non-contiguous nodes. A node insertion requires the rewiring of the references involved. This includes setting the previous and next references on the new node in addition to resetting the affected references of its adjacent list nodes. If this looks complicated guess what? It is! And if you take a data structures class you’ll get the chance to create a linked list from scratch!

image from book
Figure 17-12: Linked List Insertion Complete

HashTable Performance Characteristics

A hashtable is an array whose elements can point to a series of nodes. Structurally, as you’ll see, a hashtable is a cross between an array and a one-way linked list. In an ordinary array elements are inserted by index value. If there are potentially many elements to insert the array space required to hold all the elements would be correspondingly large as well. This may result in wasted memory space. The hashtable addresses this problem by reducing the size of the array used to point to its elements and assigning each element to an array location based on a hash function as figure 17-13 illustrates.

image from book
Figure 17-13: A Hash Function Transforms a Key Value into an Array Index

Referring to figure 17-13 — the purpose of the hash function is to transform the key value into a unique array index value. However, sometimes two unique key values will translate to the same index value. When this happens a collision is said to have occurred. This problem is resolved by chaining together nodes that share the same hashtable index as is shown in figure 17-14.

image from book
Figure 17-14: Hash Table Collisions are Resolved by Linking Nodes Together

The benefits of a hashtable include lower initial memory overhead and relatively fast element insertions. On the other hand, if too many insertion collisions occur, the linked elements must be traversed to insert new elements or to retrieve existing elements. List traversal extracts a performance penalty.

Red-Black Tree Performance Characteristics

Collections that have the word “tree” in their name implement the SortedSet or SortedMap interface and automatically sort their elements upon insertion. These automatically sorting collections are implemented using an underlying data structure known as a red-black tree. A red-black tree is a type of binary search tree with a self-balancing characteristic. Tree nodes have an additional data element, color, that can be set to either red or black. The data elements of a red-black tree node are shown in figure 17-15.

image from book
Figure 17-15: Red-Black Tree Node Data Elements

Insertions into a red-black tree are followed by a self-balancing operation that ensures all leaf nodes are the same number of black nodes away from the root node. Figure 17-16 shows the state of a red-black tree after inserting the integer values 1 through 9 in the following insertion order: 9, 3, 5, 6, 7, 2, 8, 4, 1. (Red nodes shown lightly shaded.)

image from book
Figure 17-16: Red-Black Tree After Inserting Integer Values 9, 3, 5, 6, 7, 8, 4, 1

Referring to figure 17-16 — the numbers appearing to the left of each node represent the height of the tree in black nodes. The primary benefit associated with a red-black tree is the generally overall good node search performance that can be obtained regardless of the number of nodes the tree contains. However, because the tree reorders itself with each insertion, an insertion into a tree that contains lots of nodes will incur a performance penalty.

Think of it in terms of a clean room versus a messy room. You can store things really fast in a messy room because you just throw your stuff anywhere. Finding things in a messy room takes some time. You may have to look high and low before finding what you desire. Storing things in a clean room, conversely, takes a little while longer, but when you need something you can get to it in a hurry!

Mapping An Implementation Class To Its Underlying Data Structure

Now that you have a better idea of the performance characteristics associated with the fundamental data structures used in the Java collections framework general purpose implementation classes you need to know how to correlate the implementation class to its underlying data structure. That way you will know what performance characteristics to expect based on the class you choose.

The general purpose implementation class names are formulated by combining the name of the underlying data structure (i.e., Array, Linked, Hash, or Tree) followed by the name of the interface the class implements. (i.e., List, Set, or Map). For example, a HashSet is a Set whose underlying data structure is a hashtable; a TreeSet is a Set whose underlying data structure is implemented as a red-black tree.

Algorithms

The Java collections framework provides a ready-made set of algorithms you can use to manipulate collections. These algorithms are implemented as static methods in the Collections class. (That’s Collections with an ‘s’.) The Collections class provides methods to sort and search collections as well as find the maximum and minimum element values. The full range of Collections class functionality is too large to treat fully here, however, I recommend you explore the methods of the Collections class and get a feel for what they can do for you.

The Arrays class provides static methods to sort, search, and manipulate arrays. (I introduced you to the functionality of the Arrays class formally in chapter 8.) The Collection interface declares a method named toArray() that returns a collection’s elements as an array of Objects or as an array of a specified type. Once in the form of an array, the elements can be manipulated using the static methods provided by the Arrays class.

Infrastructure

The Java collections framework provides several classes and interfaces whose purpose is to help you manipulate collection elements (Iterator and ListIterator), create collection-aware classes (Comparable and Comparator), and catch collection-specific exceptions (UnsupportedOperationException and ConcurrentModificationException). Because these classes and interfaces are used across the entire collections framework they are referred to as infrastructure.

Quick Review

The purpose of the Java collections framework is to provide Java programmers with a unified, comprehensive way to manipulate collections of objects in their programs across deployment platforms. The Java collections framework is organized into four primary areas: 1) core collection interfaces, 2) general purpose implementation classes, 3) algorithms, and 4) infrastructure.

The core collection interfaces consist of Collection, List, Set, SortedSet, Map, and SortedMap. Map and Sorted-Map are not collections but have methods that return their elements in the form of a Collection object. The general purpose implementation classes employ arrays, linked lists, hashtables, and red-black trees as their underlying data structures.

Array elements can be quickly retrieved but insertions into the array at element positions other than the end can extract a performance penalty because element values to the right of the insertion must be copied to the right.

Linked lists overcome the problems associated with array element insertion, however, the linked list must be traversed to find a desired element. List traversal extracts a performance penalty. Hashtables are a cross between an array and a linked list.

Hashtable element storage locations are assigned based on the results of a hash function. Hashtable collisions are resolved by linking together elements that share the same hashtable element location. Hashtables can save storage space but a poor hash function may result in frequent hashtable collisions which will result in a list traversal performance penalty.

Red-black trees store their elements in sorted order upon insertion. Element insertion times grow with tree size. However, tree element access time is good overall regardless of tree size.

The Collections class contains static methods that implement collection manipulation algorithms. The Arrays class contains static methods that can be used to manipulate arrays. A collection’s elements can be converted into an array and in this form can be manipulated by the static methods of the Arrays class.

Infrastructure classes and interfaces are employed across the entire collections framework.