9.5 Minimal Public Interfaces for Reusable Components


In 1989 I constructed a commercial C++ class library along with two other developers. We first discussed the classes we would include in the library and came up with the usual assortment of data structures, mathematical entities, collection classes, and primitive data type mimics. Since we were shipping source code, we wanted our code to look consistent to the users of the library. This started us on the subject of coding standards and guidelines. One of the topics we explored was the construction of a minimal public interface that all classes in the library would implement [14]. We discussed the types of operations we wanted, as well as their signatures and abstract behavior. I believe that a minimal public interface should be established if classes are to be reused. The minimal public interface gives users of a collection of reusable classes a basis for understanding the collection's architecture. Users come to expect a minimal functionality from anything they use in the collection.

A Minimal Public Interface for All Reusable Classes

Constructor.

All classes that have data members should have a constructor (initialization) message that initializes that data. In addition, the class should be defined in such a way that it is not possible for users of the class to create objects in an invalid state, namely, a state for which one or more methods of the class are unprepared. Some languages provide better support for this constraint than others. Languages that have automatic calls to constructors (e.g., C++) are particularly useful in guaranteeing that the user building an object of the class has passed through one of the constructors of that class.
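As a minimal sketch of this guideline, the hypothetical Temperature class below (not part of the original library) can only be built through a constructor that rejects values its methods are unprepared for. The class name, the Kelvin representation, and the choice to throw an exception are all assumptions made for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical class for illustration only.
class Temperature {
public:
    // The only way to build a Temperature is through this constructor,
    // so every object that exists has passed the validity check.
    explicit Temperature(double kelvin) : kelvin_(kelvin) {
        if (kelvin < 0.0) {
            throw std::invalid_argument("temperature below absolute zero");
        }
    }

    double kelvin() const { return kelvin_; }
    double celsius() const { return kelvin_ - 273.15; }

private:
    double kelvin_;  // private, so users cannot bypass the constructor
};
```

Because there is no default constructor and the data member is private, a user cannot obtain a Temperature object holding a negative Kelvin value.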

Destructor.

In general, only classes that need to clean up a portion of their object require a destructor or cleanup function. However, often a class that does not require this cleanup is extended in a way that does require cleanup. For this reason, it is common to add empty destructors for extensibility.
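In C++ this guideline is often expressed as an empty virtual destructor in the base class. The sketch below assumes two hypothetical classes, Shape and SampledShape. The base class needs no cleanup today, but the extension allocates memory and relies on the base destructor being virtual so that its own cleanup runs.

```cpp
#include <cassert>
#include <memory>

// Hypothetical base class: nothing to clean up, but the empty virtual
// destructor leaves room for extensions that do need cleanup.
class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const { return 0.0; }
};

// Hypothetical extension that owns heap memory and therefore needs cleanup.
class SampledShape : public Shape {
public:
    // 'destroyed' is a counter supplied by the caller so the cleanup can be observed.
    explicit SampledShape(int& destroyed)
        : destroyed_(destroyed), samples_(new double[16]()) {}
    SampledShape(const SampledShape&) = delete;             // copying is not needed here
    SampledShape& operator=(const SampledShape&) = delete;
    ~SampledShape() override {
        delete[] samples_;
        ++destroyed_;
    }

private:
    int& destroyed_;
    double* samples_;
};
```

Destroying a SampledShape through a pointer to Shape runs the derived cleanup only because the base destructor is virtual. Without it, the behavior would be undefined.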

Copying Objects.

The notion of all classes knowing how to make a copy of their objects seems reasonable. There is a bit of a problem in languages that distinguish between containment by value and containment by reference (e.g., C++). In these languages, what do we mean by "copying"? Consider the Point objects shown in Figure 9.8. They have an example of containment by reference (the color field). Is P2 a copy of P1? Is P3 a copy of P1? What's the relationship between P2 and P3?

Figure 9.8. Shallow versus deep copying.


Before answering any of our questions, we need to consider the fact that containment by reference implies two different types of copying. These are called deep copy and shallow copy. A deep copy of an object is a copy of the entire structure, not just copies of pointers. The original object and its deep copy do not share any memory space. A shallow copy of an object is a copy of the first-level data members. If one of the data members is a pointer, then only the pointer value is copied, not the structure associated with it. The original object and its shallow copy share memory space.

Many real-world examples illustrate this distinction. If you go into a restaurant and tell the waiter/waitress that you want the meal that the person at the next table is eating, you are implying that you want a deep copy of that meal. A shallow copy of the meal would have the waiter/waitress put your fork in the other person's plate and walk away. In this application, a shallow copy of the meal is not adequate. On the other hand, if you are flying on a plane, you would hope that all of the air traffic controllers have a shallow copy of the air traffic map. It would not do any good for each air traffic controller to have a deep copy. Such a situation would imply that if air traffic controller X adds a new plane to the airport's air space, air traffic controllers Y and Z will not see it. This application clearly calls for shallow copying of the airport's air space.

In the example of Figure 9.8, P2 is a shallow copy of P1 because it shares the memory for the color string. P3 is a deep copy of P1 because it has its own memory for the color string. Technically speaking, P3 is also a deep copy of P2 in that they are structurally equivalent but do not share memory. It is important to note that it is not possible to tell whether P1 is a shallow copy of P2 or P2 is a shallow copy of P1. That information is lost in this implementation of shallow copy. Some implementations add flags to the objects in order to distinguish the original from its shallow copies. Others use a technique called reference counting to make the difference between the two meaningless. We will explore the reference-counting technique later in this chapter.
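The P1, P2, and P3 relationships can be sketched in C++ as follows. The Point layout (two integer coordinates plus a color pointer) and the helper names shallow_copy and deep_copy are assumptions made for illustration. The text describes only the color field, and memory ownership is deliberately left to the caller to keep the sketch short.

```cpp
#include <cassert>
#include <cstring>

// Assumed layout: coordinates held by value, color held by reference.
struct Point {
    int x;
    int y;
    char* color;
};

// Shallow copy: first-level members only, so the pointer value is copied
// and both objects refer to the same color string.
Point shallow_copy(const Point& p) {
    return p;
}

// Deep copy: the color string itself is duplicated, so the original and
// the copy share no memory. The caller must delete[] the new string.
Point deep_copy(const Point& p) {
    Point q = p;
    q.color = new char[std::strlen(p.color) + 1];
    std::strcpy(q.color, p.color);
    return q;
}
```

With P1 built around a color string, shallow_copy(P1) plays the role of P2 and deep_copy(P1) plays the role of P3. Changing the characters of the color through P2 would be visible through P1, but not through P3.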

Assigning Objects.

If each class understands how to copy its objects, then it must understand what it means to assign one of its objects to another of its objects. The only question is whether to use shallow or deep copy. We chose deep copy because it is the least likely to cause side effects. In many languages, a shallow copy is the easiest to implement but brings some level of side effect, since the two objects go on sharing data. Users of the class should not need to be aware of whether or not it uses containment by reference in its implementation.
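One possible C++ rendering of deep-copy assignment is sketched below. The Point class, its accessors, and the copy-then-swap idiom are illustrative assumptions rather than the library's actual code. The point of the sketch is that the color is contained by reference internally, yet users never observe any sharing.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical Point whose color is contained by reference internally.
class Point {
public:
    Point(int x, int y, const std::string& color)
        : x_(x), y_(y), color_(new std::string(color)) {}

    // Copy construction duplicates the color string (deep copy).
    Point(const Point& other)
        : x_(other.x_), y_(other.y_), color_(new std::string(*other.color_)) {}

    // Assignment is also a deep copy: build a private duplicate, then swap it in.
    // The old color is released when tmp goes out of scope.
    Point& operator=(const Point& other) {
        Point tmp(other);
        swap(tmp);
        return *this;
    }

    ~Point() { delete color_; }

    void set_color(const std::string& c) { *color_ = c; }
    const std::string& color() const { return *color_; }

private:
    void swap(Point& o) {
        std::swap(x_, o.x_);
        std::swap(y_, o.y_);
        std::swap(color_, o.color_);
    }

    int x_;
    int y_;
    std::string* color_;  // implementation detail hidden from users
};
```

After an assignment such as b = a, later changes to a's color leave b untouched, which is the absence of side effects that motivated the choice of deep copy.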

Equality Testing.

Given the differences between shallow and deep copying of objects, what does it mean for an object to be equal to another object? Since there exist two methods of copying objects, there must also exist two methods to test equality. We call these equal and same. The equal method tests for structural equivalence, while the same method tests for memory sharing. In our example, all three point objects are equal to one another since structurally they are equivalent. Only P1 and P2 are the same since they share data. P3 is not the same as either P1 or P2.
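A minimal sketch of the two tests follows, written as free functions over an assumed Point layout. Treating "memory sharing" as "the color pointers are identical" is an interpretation made for this example, and the sketch assumes the color pointer is never null.

```cpp
#include <cassert>
#include <string>

// Assumed layout: color is contained by reference and is not owned by the Point.
struct Point {
    int x;
    int y;
    const std::string* color;  // assumed non-null
};

// equal: structural equivalence, comparing the referenced color by value.
bool equal(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y && *a.color == *b.color;
}

// same: memory sharing, comparing the references themselves.
bool same(const Point& a, const Point& b) {
    return a.color == b.color;
}
```

With P1 and P2 pointing at one "red" string and P3 pointing at a separate "red" string, equal holds for every pair, while same holds only for P1 and P2.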

Print.

All classes should have a method that knows how to print out their objects in some format; many choose ASCII text. The need for this operation goes back to action-oriented programming, where many developers had set and get operations for each of their data structures. Print methods are useful for debugging applications that use the class in question or for a minimal form of persistence (along with a parse method).

Parse.

All classes should have a parse method that knows how to initialize an object based on the output of the corresponding print method. Given a print method and a parse method that share output/input, the class can be said to have a form of minimal persistence. A user can tell an object of the class to print itself to a file and, at a later date, use the parse function to recreate that object from the description stored in the file. This is very useful for testing and debugging the use of a class in an application.
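The print/parse round trip might look like the following sketch. The Point class, the whitespace-separated text format, and the restriction that the color contains no whitespace are all assumptions chosen to keep the example short.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical class with matching print and parse methods.
class Point {
public:
    Point(int x = 0, int y = 0, const std::string& color = "none")
        : x_(x), y_(y), color_(color) {}

    // Writes "x y color" followed by a newline. Assumes color has no whitespace.
    void print(std::ostream& out) const {
        out << x_ << ' ' << y_ << ' ' << color_ << '\n';
    }

    // Reads the format produced by print. On failure the object is left
    // unchanged and false is returned.
    bool parse(std::istream& in) {
        int x = 0;
        int y = 0;
        std::string color;
        if (!(in >> x >> y >> color)) {
            return false;
        }
        x_ = x;
        y_ = y;
        color_ = color;
        return true;
    }

    bool equal(const Point& o) const {
        return x_ == o.x_ && y_ == o.y_ && color_ == o.color_;
    }

private:
    int x_;
    int y_;
    std::string color_;
};
```

Printing an object to a stream (a file, or a string stream in a test) and later parsing that text back yields an object that is equal to the original, which is the minimal persistence described above.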

Self-Test.

Brad Cox [15] published an analogy between hardware reuse and software reuse. Hardware used to cost the earth until we decided to build everything from a set of standard, well-defined components. Now software costs the earth. In order to make software development cheap, we need to define an equivalent set of standard, well-defined components from which all of our software can be built. There are many problems with this analogy, including the fact that the economic models of hardware and software development are different. Hardware costs are all paid for during the development phase. The cost of manufacturing is insignificant compared to the up-front investment. In the case of software, the cost of development is often a small percentage of the cost of the software over its lifetime. Maintenance and extensibility are often the expensive parts of the software lifecycle. In fact, cheap development costs often imply more expensive maintenance in the future, because designers did not take the time to install extensibility hooks into their software.

Having said this, we still borrowed a little of this analogy after reasoning that hardware failure does not imply searching the smallest details to detect the problem. If you were to turn on your PC and it failed, you would not immediately start worrying that you had lost your all-important data on the hard drive. Nor would you drag out an oscilloscope and start checking individual chips until you found a defective one. The PC provides board-level diagnostics to at least attempt to pinpoint the area of the PC causing the problem. We feel that each class should have an equivalent, component-level test mechanism, which we named self-test. The self-test method is a class-specific method (as opposed to all others in the interface, which are object methods) that builds several objects of the class and exercises the public interface, checking for correct results. Some developers pointed out that if the constructor of the class is flawed, we really cannot test much of the class's public interface. We agree. However, if you turn on your PC and nothing happens, you cannot really test much of the PC either. Even that little knowledge gives you useful information. You do not worry about your hard disk failing, or memory failing. You know to verify that the PC is plugged in, its switch is on, power is being supplied to the wall socket, etc. Likewise, if our self-test method fails outright, we know to check the constructor for problems. As in the PC analogy, there may or may not be many additional problems with the object in question.
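In C++, a class-specific method of this kind maps naturally onto a static member function. The Counter class below is a hypothetical stand-in, and returning a simple pass/fail flag is an assumption. The original library's self-test may well have reported more detail.

```cpp
#include <cassert>

// Hypothetical class carrying its own component-level test.
class Counter {
public:
    Counter() : count_(0) {}
    void increment() { ++count_; }
    void reset() { count_ = 0; }
    int value() const { return count_; }
    bool equal(const Counter& o) const { return count_ == o.count_; }

    // Class-level method: builds several objects, exercises the public
    // interface, and reports whether every result was as expected.
    static bool self_test() {
        Counter a;
        if (a.value() != 0) return false;        // constructor check comes first

        a.increment();
        a.increment();
        if (a.value() != 2) return false;

        Counter b = a;                            // copy
        if (!b.equal(a)) return false;

        b.reset();
        if (b.value() != 0) return false;
        if (a.value() != 2) return false;         // the copy must not affect the original
        return true;
    }

private:
    int count_;
};
```

A porting script could then call Counter::self_test() and its counterparts on the other classes, and tally how many pass on the new compiler or platform.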

This method proved invaluable in the porting of our library from one compiler/platform to another. It was reassuring to have information like, "33 of our 40 classes passed their self-test, but seven need further examination for portability problems." Of course, the self-test function is only as good as the person who wrote it. Like all good testing procedures, it should be written by someone other than the developer of the class. If the developer thought to test something, then he or she probably did not make that mistake in the development of that class.

Several recent publications have been critical of minimal public interfaces, describing them as misguided efforts [16,17]. The authors of this material demonstrate classes whose semantics seem to break under each of the items listed in this minimal interface. I will avoid this particular argument and refuse to get too religious about the above interface. I invite the reader to check the provided references and be the final judge. I will leave you with the thought that many of the users of the class library that motivated the minimal interface found it useful as a learning hook into the library. Having used one class, they knew something about all of the other classes. The fact that there exist exceptions to the support of such an interface doesn't bother me or, presumably, other users of minimal interfaces.
