Introduction to Object-Oriented Programming

Object-oriented programming (or OOP for short) is the dominant programming paradigm these days, having replaced the "structured," procedural programming techniques that were developed in the early 1970s. Java is totally object oriented, and it is impossible to program it in the procedural style that you may be most comfortable with. We hope this section, especially when combined with the example code supplied in the text and on the companion web site, will give you enough information about OOP to become productive with Java.

Let's begin with a question that, on the surface, seems to have nothing to do with programming: How did companies like Compaq, Dell, Gateway, and the other major personal computer manufacturers get so big, so fast? Most people would probably say they made generally good computers and sold them at rock-bottom prices in an era when computer demand was skyrocketing. But go further: how were they able to manufacture so many models so fast and respond to the changes that were happening so quickly?

Well, a big part of the answer is that these companies farmed out a lot of the work. They bought components from reputable vendors and then assembled them. They often didn't invest time and money in designing and building power supplies, disk drives, motherboards, and other components. This made it possible for the companies to produce a product and make changes quickly for less money than if they had done the engineering themselves.

What the personal computer manufacturers were buying was "prepackaged functionality." For example, when they bought a power supply, they were buying something with certain properties (size, shape, and so on) and a certain functionality (smooth power output, amount of power available, and so on). Compaq provides a good example of how effective this operating procedure is. When Compaq moved from engineering most of the parts in its machines to buying many of the parts, it dramatically improved its bottom line.

OOP springs from the same idea. Your program is made of objects, with certain properties and operations that the objects can perform. Whether you build an object or buy it might depend on your budget or on time. But, basically, as long as objects satisfy your specifications, you don't care how the functionality was implemented. In OOP, you only care about what the objects expose. So, just as computer manufacturers don't care about the internals of a power supply as long as it does what they want, most Java programmers don't care how an object is implemented as long as it does what they want.

Traditional structured programming consists of designing a set of procedures (or algorithms) to solve a problem. After the procedures were determined, the traditional next step was to find appropriate ways to store the data. This is why the designer of the Pascal language, Niklaus Wirth, called his famous book on programming Algorithms + Data Structures = Programs (Prentice Hall, 1975). Notice that in Wirth's title, algorithms come first, and data structures come second. This mimics the way programmers worked at that time. First, you decided how to manipulate the data; then, you decided what structure to impose on the data to make the manipulations easier. OOP reverses the order and puts data first, then looks at the algorithms that operate on the data.

The key to being most productive in OOP is to make each object responsible for carrying out a set of related tasks. If an object relies on a task that isn't its responsibility, it needs to have access to another object whose responsibilities include that task. The first object then asks the second object to carry out the task. This is done with a more generalized version of the procedure call that you are familiar with in procedural programming. (Recall that in the Java programming language these procedure calls are usually called method calls.)

In particular, an object should never directly manipulate the internal data of another object, nor should it expose data for other objects to access directly. All communication should be through method calls. By encapsulating object data, you maximize reusability, reduce data dependency, and minimize debugging time.
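As a minimal sketch of this rule, consider an order that needs to know whether a customer can pay. The Account and Order classes below, along with the methods canCover, charge, and pay, are hypothetical names chosen for illustration. The order never touches the account's balance field; it asks the account to do the work through method calls.

```java
// A minimal sketch; Account, Order, and their methods are hypothetical names.
class Account {
    private long balanceInCents; // internal data, hidden from other classes

    Account(long balanceInCents) {
        this.balanceInCents = balanceInCents;
    }

    // The account decides for itself whether it can cover an amount.
    boolean canCover(long amountInCents) {
        return amountInCents <= balanceInCents;
    }

    void charge(long amountInCents) {
        if (!canCover(amountInCents)) {
            throw new IllegalArgumentException("insufficient funds");
        }
        balanceInCents -= amountInCents;
    }

    long getBalanceInCents() {
        return balanceInCents;
    }
}

class Order {
    private final Account account;
    private final long totalInCents;
    private boolean paid;

    Order(Account account, long totalInCents) {
        this.account = account;
        this.totalInCents = totalInCents;
    }

    // Checking funds is not the order's job, so it asks the account.
    boolean pay() {
        if (paid || !account.canCover(totalInCents)) {
            return false;
        }
        account.charge(totalInCents);
        paid = true;
        return true;
    }

    boolean isPaid() {
        return paid;
    }
}

class DelegationDemo {
    public static void main(String[] args) {
        Account account = new Account(5_000);
        Order order = new Order(account, 3_000);
        System.out.println(order.pay());                 // true
        System.out.println(account.getBalanceInCents()); // 2000
    }
}
```

Because balanceInCents is private, the compiler rejects any attempt by Order to read or change it directly.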

Of course, just as with modules in a procedural language, you will not want an individual object to do too much. Both design and debugging are simplified when you build small objects that perform a few tasks, rather than building humongous objects with internal data that are extremely complex, with hundreds of procedures to manipulate the data.

The Vocabulary of OOP

You need to understand some of the terminology of OOP to go further. The most important term is the class, which you have already seen in the code examples of Chapter 3. A class is the template or blueprint from which objects are actually made. This leads to the standard way of thinking about classes: as cookie cutters. Objects are the cookies themselves. When you construct an object from a class, you are said to have created an instance of the class.

As you have seen, all code that you write in Java is inside a class. The standard Java library supplies several thousand classes for such diverse purposes as user interface design, dates and calendars, and network programming. Nonetheless, you still have to create your own classes in Java to describe the objects of the problem domains of your applications and to adapt the classes that are supplied by the standard library to your own purposes.

Encapsulation (sometimes called data hiding) is a key concept in working with objects. Formally, encapsulation is nothing more than combining data and behavior in one package and hiding the implementation of the data from the user of the object. The data in an object are called its instance fields, and the procedures that operate on the data are called its methods. A specific object that is an instance of a class will have specific values for its instance fields. The set of those values is the current state of the object. Whenever you invoke a method on an object, its state may change.

It cannot be stressed enough that the key to making encapsulation work is to have methods never directly access instance fields in a class other than their own. Programs should interact with object data only through the object's methods. Encapsulation is the way to give the object its "black box" behavior, which is the key to reuse and reliability. This means a class may totally change how it stores its data, but as long as it continues to use the same methods to manipulate the data, no other object will know or care.
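To make the "black box" idea concrete, here is a sketch with two hypothetical classes, PriceV1 and PriceV2, standing in for two successive versions of the same class. We assume nonnegative amounts. The versions store their data differently but offer the same methods, so code that calls only getDollars and getCents cannot tell them apart.

```java
// Version 1 stores the price as a single count of cents.
// (Hypothetical class; assumes nonnegative amounts.)
class PriceV1 {
    private final long totalCents;

    PriceV1(int dollars, int cents) {
        this.totalCents = dollars * 100L + cents;
    }

    int getDollars() {
        return (int) (totalCents / 100);
    }

    int getCents() {
        return (int) (totalCents % 100);
    }
}

// Version 2 stores dollars and cents in separate fields
// but keeps exactly the same methods.
class PriceV2 {
    private final int dollars;
    private final int cents;

    PriceV2(int dollars, int cents) {
        this.dollars = dollars + cents / 100; // carry whole dollars
        this.cents = cents % 100;
    }

    int getDollars() {
        return dollars;
    }

    int getCents() {
        return cents;
    }
}

class EncapsulationDemo {
    public static void main(String[] args) {
        PriceV1 before = new PriceV1(19, 99);
        PriceV2 after = new PriceV2(19, 99);
        // Callers see the same answers from both representations.
        System.out.println(before.getDollars() + "." + before.getCents()); // 19.99
        System.out.println(after.getDollars() + "." + after.getCents());   // 19.99
    }
}
```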

When you do start writing your own classes in Java, another tenet of OOP makes this easier: classes can be built by extending other classes. Java, in fact, comes with a "cosmic superclass" called Object. All other classes extend this class. You will see more about the Object class in the next chapter.

When you extend an existing class, the new class has all the properties and methods of the class that you extend. You supply new methods and data fields that apply to your new class only. The concept of extending a class to obtain another class is called inheritance. See the next chapter for details on inheritance.

Objects

To work with OOP, you should be able to identify three key characteristics of objects:

The object's behavior: what can you do with this object, or what methods can you apply to it?
The object's state: how does the object react when you apply those methods?
The object's identity: how is the object distinguished from others that may have the same behavior and state?

All objects that are instances of the same class share a family resemblance by supporting the same behavior. The behavior of an object is defined by the methods that you can call.

Next, each object stores information about what it currently looks like. This is the object's state. An object's state may change over time, but not spontaneously. A change in the state of an object must be a consequence of method calls. (If the object state changed without a method call on that object, someone broke encapsulation.)

However, the state of an object does not completely describe it, because each object has a distinct identity. For example, in an order-processing system, two orders are distinct even if they request identical items. Notice that the individual objects that are instances of a class always differ in their identity and usually differ in their state.
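The difference between state and identity shows up directly in Java code. In the following sketch, Order is a hypothetical class that stores item names as strings, which is a simplification. Two orders that request identical items have the same state, yet the == operator reports that they are different objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified order that records item names only.
class Order {
    private final List<String> itemNames = new ArrayList<>();

    void add(String itemName) {
        itemNames.add(itemName);
    }

    // Returns a copy so that callers cannot modify the internal list.
    List<String> getItemNames() {
        return new ArrayList<>(itemNames);
    }
}

class IdentityDemo {
    public static void main(String[] args) {
        Order first = new Order();
        first.add("Toaster");
        Order second = new Order();
        second.add("Toaster");

        // Same state: both orders request identical items.
        System.out.println(first.getItemNames().equals(second.getItemNames())); // true
        // Different identity: they are two distinct objects.
        System.out.println(first == second); // false

        // A second variable can refer to the same object.
        Order alias = first;
        System.out.println(alias == first); // true
    }
}
```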

These key characteristics can influence each other. For example, the state of an object can influence its behavior. (If an order is "shipped" or "paid," it may reject a method call that asks it to add or remove items. Conversely, if an order is "empty," that is, no items have yet been ordered, it should not allow itself to be shipped.)
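Here is one way the shipped and empty rules just described might look in code. This is only a sketch: the Order class, its Status values, and the choice to signal a rejected call with an IllegalStateException are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical order whose state determines which calls it accepts.
class Order {
    private enum Status { OPEN, SHIPPED }

    private final List<String> items = new ArrayList<>();
    private Status status = Status.OPEN;

    void add(String item) {
        // A shipped order rejects further changes.
        if (status == Status.SHIPPED) {
            throw new IllegalStateException("order already shipped");
        }
        items.add(item);
    }

    void ship() {
        // An empty order does not allow itself to be shipped.
        if (items.isEmpty()) {
            throw new IllegalStateException("cannot ship an empty order");
        }
        status = Status.SHIPPED;
    }

    boolean isShipped() {
        return status == Status.SHIPPED;
    }

    int getItemCount() {
        return items.size();
    }
}

class StateDemo {
    public static void main(String[] args) {
        Order order = new Order();
        order.add("Toaster");
        order.ship();
        try {
            order.add("Blender"); // rejected because of the current state
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Note that the state changes only inside add and ship, so every change is the consequence of a method call.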

In a traditional procedural program, you start the process at the top, with the main function. When designing an object-oriented system, there is no "top," and newcomers to OOP often wonder where to begin. The answer is, you first find classes and then you add methods to each class.

TIP

A simple rule of thumb in identifying classes is to look for nouns in the problem analysis. Methods, on the other hand, correspond to verbs.

For example, in an order-processing system, some of these nouns are:

Item
Order
Shipping address
Payment
Account

These nouns may lead to the classes Item, Order, and so on.

Next, look for verbs. Items are added to orders. Orders are shipped or canceled. Payments are applied to orders. With each verb, such as "add," "ship," "cancel," and "apply," you identify the one object that has the major responsibility for carrying it out. For example, when a new item is added to an order, the order object should be the one in charge because it knows how it stores and sorts items. That is, add should be a method of the Order class that takes an Item object as a parameter.
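The add example can be sketched as follows. The fields of Item and the decision to keep items sorted by name are assumptions made for illustration. The point is that the caller simply hands an Item to the order, and the order alone decides how to store and sort it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical noun-derived class.
class Item {
    private final String name;
    private final int priceInCents;

    Item(String name, int priceInCents) {
        this.name = name;
        this.priceInCents = priceInCents;
    }

    String getName() {
        return name;
    }

    int getPriceInCents() {
        return priceInCents;
    }
}

class Order {
    private final List<Item> items = new ArrayList<>();

    // The verb "add" becomes a method of the class in charge of it.
    void add(Item item) {
        items.add(item);
        // Storage and ordering are the order's own business
        // (sorting by name is an assumption of this sketch).
        items.sort(Comparator.comparing(Item::getName));
    }

    // Read-only view: callers can look at the items but not change them.
    List<Item> getItems() {
        return Collections.unmodifiableList(items);
    }
}

class NounVerbDemo {
    public static void main(String[] args) {
        Order order = new Order();
        order.add(new Item("Toaster", 2_999));
        order.add(new Item("Blender", 4_999));
        for (Item item : order.getItems()) {
            System.out.println(item.getName());
        }
    }
}
```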

Of course, the "noun and verb" rule is only a rule of thumb, and only experience can help you decide which nouns and verbs are the important ones when building your classes.

Relationships Between Classes

The most common relationships between classes are

Dependence ("uses a")
Aggregation ("has a")
Inheritance ("is a")

The dependence, or "uses a" relationship, is the most obvious and also the most general. For example, the Order class uses the Account class because Order objects need to access Account objects to check for credit status. But the Item class does not depend on the Account class, because Item objects never need to worry about customer accounts. Thus, a class depends on another class if its methods manipulate objects of that class.

TIP

Try to minimize the number of classes that depend on each other. The point is, if a class A is unaware of the existence of a class B, it is also unconcerned about any changes to B! (And this means that changes to B do not introduce bugs into A.) In software engineering terminology, you want to minimize the coupling between classes.

The aggregation, or "has-a" relationship, is easy to understand because it is concrete; for example, an Order object contains Item objects. Containment means that objects of class A contain objects of class B.

NOTE

Some methodologists view the concept of aggregation with disdain and prefer to use a more general "association" relationship. From the point of view of modeling, that is understandable. But for programmers, the "has-a" relationship makes a lot of sense. We like to use aggregation for a second reason: the standard notation for associations is less clear. See Table 4-1.

Table 4-1. UML Notation for Class Relationships
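Relationship	UML Connector
Inheritance	Solid line with a hollow triangle arrowhead
Interface Inheritance	Dashed line with a hollow triangle arrowhead
Dependency	Dashed line with an open arrowhead
Aggregation	Solid line with a hollow diamond at the containing class
Association	Plain solid line
Directed Association	Solid line with an open arrowhead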

The inheritance, or "is-a" relationship, expresses a relationship between a more special and a more general class. For example, a RushOrder class inherits from an Order class. The specialized RushOrder class has special methods for priority handling and a different method for computing shipping charges, but its other methods, such as adding items and billing, are inherited from the Order class. In general, if class A extends class B, class A inherits methods from class B but has more capabilities. (We describe inheritance more fully in the next chapter, in which we discuss this important notion at some length.)
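The RushOrder example might be sketched like this. The method names, the flat shipping rate, and the doubling of that rate for rush orders are all assumptions made for illustration. Details of the extends keyword and of overriding follow in the next chapter.

```java
// Hypothetical general class.
class Order {
    private int itemCount;

    void addItem() {
        itemCount++;
    }

    int getItemCount() {
        return itemCount;
    }

    // Assumed flat rate, in cents.
    int shippingChargeInCents() {
        return 500;
    }
}

// The more special class inherits addItem and getItemCount unchanged.
class RushOrder extends Order {
    private boolean prioritized;

    // A special method that plain orders do not have.
    void prioritize() {
        prioritized = true;
    }

    boolean isPrioritized() {
        return prioritized;
    }

    // A different way of computing shipping charges
    // (doubling the base rate is an assumption of this sketch).
    @Override
    int shippingChargeInCents() {
        return super.shippingChargeInCents() * 2;
    }
}

class InheritanceDemo {
    public static void main(String[] args) {
        RushOrder rush = new RushOrder();
        rush.addItem();   // inherited from Order
        rush.prioritize(); // defined in RushOrder only
        System.out.println(rush.getItemCount());          // 1
        System.out.println(rush.shippingChargeInCents()); // 1000
    }
}
```

Because a RushOrder "is an" Order, you can also store a RushOrder in a variable of type Order and still get the rush shipping charge.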

Many programmers use the UML (Unified Modeling Language) notation to draw class diagrams that describe the relationships between classes. You can see an example of such a diagram in Figure 4-1. You draw classes as rectangles, and relationships as arrows with various adornments. Table 4-1 shows the most common UML arrow styles.

Figure 4-1. A class diagram

NOTE

A number of tools are available for drawing UML diagrams. Several vendors offer high-powered (and high-priced) tools that aim to be the focal point of your development process. Among them are Rational Rose (http://www.ibm.com/software/awdtools/developer/modeler) and Together (http://www.borland.com/together). Another choice is the open source program ArgoUML (http://argouml.tigris.org). A commercially supported version is available from GentleWare (http://gentleware.com). If you just want to draw simple diagrams with a minimum of fuss, try out Violet (http://horstmann.com/violet).

OOP Contrasted with Traditional Procedural Programming Techniques

We want to end this short introduction to OOP by contrasting OOP with the procedural model that you may be more familiar with. In procedural programming, you identify the tasks to be performed and then you do the following:

Use a stepwise refinement process: Break the task to be performed into subtasks, and these into smaller subtasks until the subtasks are simple enough to be implemented directly (this is the top-down approach).
Write procedures to solve simple tasks and combine them into more sophisticated procedures until you have the functionality you want (this is the bottom-up approach).

Most programmers, of course, use a mixture of the top-down and bottom-up strategies to solve a programming problem. The rule of thumb for discovering procedures is the same as the rule for finding methods in OOP: look for verbs, or actions, in the problem description. The important difference is that in OOP, you first isolate the classes in the project. Only then do you look for the methods of the class. And there is another important difference between traditional procedures and OOP methods: each method is associated with the class that is responsible for carrying out the operation.

For small problems, the breakdown into procedures works very well. But for larger problems, classes and methods have two advantages. Classes provide a convenient clustering mechanism for methods. A simple web browser may require 2,000 procedures for its implementation, or it may require 100 classes with an average of 20 methods per class. The latter structure is much easier for a programmer to grasp. It is also much easier to distribute over a team of programmers. The encapsulation built into classes helps you here as well: classes hide their data representations from all code except their own methods. As Figure 4-2 shows, this means that if a programming bug messes up data, it is far easier to search for the culprit among the 20 methods that had access to that data item than among 2,000 procedures.

Figure 4-2. Procedural vs. OO programming

You may say that this doesn't sound much different from modularization. You have certainly written programs by breaking up the program into modules that communicate with each other through procedure calls only, not by sharing data. This (if well done) goes far in accomplishing encapsulation. However, in many programming languages, the slightest sloppiness in programming allows you to get at the data in another module; encapsulation is easy to defeat.

There is a more serious problem. While classes are factories for multiple objects with the same behavior, you cannot get multiple copies of a useful module. Suppose you have a module encapsulating a collection of orders, together with a spiffy balanced binary tree module to access them quickly. Now it turns out that you actually need two such collections, one for the pending orders and one for the completed orders. You cannot simply link the order tree module twice. And you don't really want to make a copy and rename all procedures for the linker to work! Classes do not have this limitation. Once a class has been defined, it is easy to construct any number of instances of that class type (whereas a module can have only one instance).
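The following sketch shows the two collections as two instances of a single class. OrderCollection and its methods are hypothetical names, and orders are reduced to an ID and a description. For the balanced tree we use java.util.TreeMap from the standard library, which is implemented as a red-black tree.

```java
import java.util.TreeMap;

// Hypothetical class wrapping a balanced tree of orders.
class OrderCollection {
    // TreeMap keeps its keys in a red-black tree, a balanced binary tree.
    private final TreeMap<Integer, String> ordersById = new TreeMap<>();

    void put(int orderId, String description) {
        ordersById.put(orderId, description);
    }

    String find(int orderId) {
        return ordersById.get(orderId);
    }

    String remove(int orderId) {
        return ordersById.remove(orderId);
    }

    int size() {
        return ordersById.size();
    }
}

class InstancesDemo {
    public static void main(String[] args) {
        // Two independent instances of the same class.
        OrderCollection pending = new OrderCollection();
        OrderCollection completed = new OrderCollection();

        pending.put(1001, "Toaster");
        pending.put(1002, "Blender");

        // Completing an order moves it from one collection to the other.
        completed.put(1001, pending.remove(1001));

        System.out.println(pending.size());   // 1
        System.out.println(completed.size()); // 1
    }
}
```

Each instance has its own tree, so nothing has to be copied, renamed, or linked twice.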

We have only scratched a very large surface. The end of this chapter has a short section on "Class Design Hints." For more information on understanding the OO design process, consult one of the many books on OO and UML. We like The Unified Modeling Language User Guide by Grady Booch, Ivar Jacobson, and James Rumbaugh (Addison-Wesley, 1999).