3.1 Building Blocks | Algorithms in Java, Parts 1-4 (3rd Edition) (Pts.1-4)

In this section, we review the primary low-level constructs that we use to store and process information in Java. All the data that we process on a computer ultimately decompose into individual bits, but writing programs that exclusively process bits would be tiresome indeed. Types allow us to specify how we will use particular sets of bits, and methods allow us to specify the operations that we will perform on the data. We use Java classes to describe the information that we process, to define the methods for processing them, and to make objects that actually hold the information. All of our data structures are comprised of objects and references to objects. In this section, we consider these basic Java mechanisms, in the context of presenting a general approach to organizing our programs. Our primary goal is to lay the groundwork for the development, in the rest of the chapter and in Chapters 4 and 5, of the higher-level constructs that will serve as the basis for most of the algorithms that we consider in this book.

We write programs that process information derived from mathematical or natural-language descriptions of the world in which we live; accordingly, computing environments need to provide built-in support for the basic building blocks of such descriptions numbers and characters. In Java, our programs are all built from just a few basic types of data:

Boolean values (booleans).
Characters (chars).
8-bit integers (bytes).
16-bit integers (shorts).
32-bit integers (ints).
64-bit integers (longs).
32-bit floating-point numbers (floats).
64-bit floating-point numbers (doubles).

It is customary to refer to these basic types by their Java names int, float, char, and so forth although we often use the generic terminology integer, floating-point number, and character, as well. We use data of type boolean to store the logical values true or false, usually to record facts about other data that will affect decision-making later in a computation. Characters are most often used in higher-level abstractions for example, to make words and sentences so we defer consideration of character data to Section 3.6. All of the other primitive types are used to represent numbers.

We use a fixed number of bits to represent numbers, so when we represent integers we are working with a specific range of values that depends on the number of bits that we use to represent them. Floating-point numbers approximate real numbers, and the number of bits that we use to represent them affects the precision with which we can approximate a real number. In Java, we trade space for accuracy by choosing from among the types int, long, short, or byte for integers and from among float or double for floating-point numbers. On most systems, these types correspond to underlying hardware rep-resentations, but the number of bits used for the representation, and therefore the range of values (in the case of integers or precision (in the case of floating-point numbers), is guaranteed for each type by Java. In this book, we normally use int and double.

In modern programming, we think of the type of the data more in terms of the needs of the program than the capabilities of the machine, primarily in order to make programs portable. Thus, for example, we think of a short as an object that can take on values between -32,768 and 32,767, instead of as a 16-bit object. Moreover, our concept of an integer includes the operations that we perform on them: addition, multiplication, and so forth.

Definition 3.1 A data type is a set of values and a collection of operations on those values.

Operations are associated with types, not the other way around. When we perform an operation, we need to ensure that its operands and result are of the correct type. Neglecting this responsibility is a common programming error. In some situations, Java performs implicit type conversions; in other situations, we use casts, or explicit type conversions. For example, if x and N are integers, the expression

 ((float) x) / N

includes both types of conversion: the (float) is a cast that converts the value of x to floating point; then an implicit conversion is performed for N to make both arguments of the divide operator floating point, according to Java's rules for implicit type conversion.

Many of the operations associated with standard data types (for example, the arithmetic operations) are built into the Java language. Other operations are found in the form of methods that are defined in standard Java libraries; still others take form in the Java methods that we define in our programs. That is, the concept of a data type is relevant not just to integer, floating point, and character primitive types. We often define our own data types, as an effective way of organizing our software. When we define a simple method in Java, we are effectively creating a new data type, with the operation implemented by that method added to the operations defined for the types of data represented by its parameters. Indeed, in a sense, each Java program is a data type a list of sets of values (primitive or other types) and associated operations (methods). This point of view is perhaps too broad to be useful, but we shall see that narrowing our focus to understand our programs in terms of data types is valuable.

Program 3.1 Method definition

To implement new operations on data in Java, we define methods in Java class, as illustrated here. Each Java program is a class that includes a definition of the method main, and this code also defines lg.

Each Java class is kept in a file with the same name as the class and a .java extension (LogTable.java in this case). Environments differ on the way that we compile or interpret and actually run programs: some have interactive interfaces and others respond to typed commands such as java LogTable .

The method lg implements a single-argument mathematical function: the integer binary logarithm function (see Section 2.3). In Java, we refer to the arguments as parameters and the value as the return value. A method may have any number of parameters but at most one return value. The method main takes a parameter (not used here) that contains information from the command line that was used to start the application and has no return value (see Appendix).

A method's definition begins with its signature, which defines the type of its return value, its name, and the types of its parameters. This information identifies the method and is needed by other methods in order to invoke the method, using objects of the proper type in place of each parameter. The invoking method can use the method in an expression, in the same way as it uses variables of the return-value type. Following the signature, enclosed in braces, is the Java code that implements the method. In a method definition, we name the parameters and express the computation in terms of those names, as if they were local variables. When the method is invoked, these variables are initialized with values supplied for each parameter by the invoking method and the method code is executed. The return statement ends execution of the method and provides the return value to the calling method.

 class LogTable    {      static int lg(int N)        { int i = 0;          while (N > 0) { i++; N/= 2; }          return i;        }      public static void main(String[] args)        {          for (int N = 1000; N <= 1000000000; N *= 10)            Out.println(lg(N) +""+N);        }    }

All Java programs are based upon its class mechanism. The simplest Java program is a class consisting of a single method named main, as in the programs in Chapter 1. The first step in organizing a large program is to break it into smaller pieces by defining other methods, as illustrated in Program 3.1. The second simplest Java program is a class consisting of several methods (one of which is named main) that invoke one another. This programming model is a powerful one of great utility, but it breaks down in large programs with complex relationships among the methods and the data they operate upon. In this section and in Chapter 4, we describe the process of building more complicated programs by defining data types and implementing them as classes.

One goal that we have when writing programs is to organize them such that they apply to as broad a variety of situations as possible. The reason for adopting such a goal is that it might put us in the position of being able to reuse an old program to solve a new problem, perhaps completely unrelated to the problem that the program was originally intended to solve. First, by taking care to understand and to specify precisely which operations a program uses, we can easily extend it to any type of data for which we can support those operations. Second, by taking care to understand and to specify precisely what a program does, we can add the abstract operation that it performs to the operations at our disposal in solving new problems.

We often want to build data structures that allow us to handle collections of data. The data structures may be huge, or they may be used extensively, so we are interested in identifying the important operations that we will perform on the data and in knowing how to implement those operations efficiently. Doing these tasks is taking the first steps in the process of incrementally building lower-level abstractions into higher-level ones; that process allows us to conveniently develop ever more powerful programs. The Java class mechanism provides us with direct support for doing so.

One property of classes is that they are aggregate types that we can use to define collections of data such that we can manipulate an entire collection as a unit, but can still refer to individual components of a given datum by name. Classes are not at the same level as primitive types such as int or float in Java, because the only operations that are defined for them (beyond referring to their components) are to create objects and to manipulate references to them. Creating an object of a given class is known as instantiation. Thus, we can use a class to define a new type of data, name variables, and pass those variables as parameters to methods, but we have to define specifically as methods any operations that we want to perform.

For example, when processing geometric data we might want to work with the abstract notion of points in the plane. Accordingly, we can write

 class Point { double x; double y; }

to indicate that we will use Point to create objects that consist of pairs of floating-point numbers. To instantiate objects, we use the Java new operator. For example, the code

 Point a = new Point(), b = new Point();

creates two Point objects and puts references to them in the variables a and b. We can then use the references to refer to individual members of an object by name. For example, the statements

 a.x = 1.0; a.y = 1.0; b.x = 4.0; b.y = 5.0;

set a to represent the point (1, 1) and b to represent the point (4, 5).

When we create a variable of type int or double or any of the primitive types, the name that we use is a synonym for the address of the memory location that contains the information. For all other objects, the name is a synonym for what is known in Java as a reference: it specifies the address of a memory location that contains the address of the memory location that contains the information. Figure 3.1 illustrates this extra level of indirection for our Point example. Referring to an object indirectly via a reference is often more convenient than referring directly to the object, and it can also be more efficient, particularly for large objects. We will see many examples of this advantage in Sections 3.3 through 3.7. Even more important, as we shall see, we can use references to structure our data in ways that support efficient algorithms for processing the data. References are the basis for many data structures and algorithms.

Figure 3.1. Object representation

This diagram depicts how Java represents simple objects that aggregate data, for the two objects of the `Point` class described in the text. Each object's data is stored in a contiguous block of memory, and programs access the data and manipulate it with references to the data. The reference to point `a` is the memory address of the chunk of memory where the values `1.0` and `1.0` are stored; the reference to point `b` is the memory address of the chunk of memory where the values `4.0` and `5.0` are stored. To access the data, the Java system must follow the reference, but in programs we use the object reference to mean both the reference and the object itself.

graphics/03fig01.gif

For example, references make it easy for us to pass objects as parameters to methods. The code

 double distance(Point a, Point b)    { double dx = a.x - b.x; dy = a.y - b.y;      return Math.sqrt(dx*dx + dy*dy); }

defines a method that computes the distance between two given points. It is easily coded because the references that are passed as parameters provide it with easy access to each point's data.

In C and many other languages, references are known as pointers, and they are commonly used. In Java programs, there is no distinction between the reference and the object, so we have no choice but to refer to objects through references. This feature of Java simplifies many aspects of our programs, but it comes at a cost. When objects are large, it is much more efficient to use pointers to manipulate them than to actually move data around, but when objects are small, there is a performance penalty to pay for following the pointer every time that we need to get at the data. Full treatment of the reasons that the designers of Java chose this approach is beyond the scope of this book, but we will again touch upon this issue when it is relevant (for example, when doing performance studies).

Thus, classes allow us to aggregate our data in typical applications. We can define a class that specifies all the types of the data that we want to collect, create objects of that type (with those types of data), and write code that refers to those objects via references to them. The Point class example just given is a simple one that aggregates two data items of the same type. In general, we can mix different types of data in classes. We shall be working extensively with such classes throughout the rest of this chapter.

Normally, we go one step further and use the Java class mechanism to define data types, not just aggregate data. That is, we also specify methods that define the operations that we will perform on the data. For example, Program 3.2 is a class that embodies our definition of a data type for points in the plane. It specifies that points are an aggregate type comprised of pairs of floating-point numbers and includes definitions of several operations that we might perform on the points.

Every class consists of a collection of data fields that define the set of values for its data type and a collection of methods that operate upon the data. We also refer to the data fields and the methods as the class members. Program 3.2 has two data fields (x and y) and six methods (two POINTs, r, theta, distance, and toString).

Program 3.2 Point class implementation

This code, kept in a file named Point.java, is a class implemention of a data type for points in the plane. It specifies that all points will have double values representing their x-and y-coordinates, and it defines six methods that programs can use to manipulate points. The first two are constructors: a client can create a new Point object via new, either with random coordinates (no parameters) or specified coordinates (two parameters).

The operations that programs can perform on Point objects (besides creating them) are to compute its polar coordinates, compute the distance to another point, and compute a string representation of the point (for use in printing its value).

 class Point    { double x, y;      Point()        { x = Math.random(); y = Math.random(); }      Point(double x, double y)        { this.x = x; this.y = y; }      double r()        { return Math.sqrt(x*x + y*y); }      double theta()        { return Math.atan2(y, x); }      double distance(Point p)        { double dx = x - p.x, dy = y - p.y;          return Math.sqrt(dx*dx + dy*dy);        }      public String toString()        { return "(" + x + "," + y + ")"; }    }

Methods that have the same name as the class and no return type are known as constructors. When a constructor is invoked via new, Java creates an object, then passes control to the constructor method, which generally initializes the data fields. For example, we use the code Point a = new Point(1.0, 1.0); to use the class in Program 3.2 to create an object a that represents the point (1, 1), Point b = new Point(4.0, 5.0); to create an object b that represents the point (4, 5),and Point c = new Point(); to create a new point with random coordinates. All other methods are invoked through point objects: for example, we can use the expression c.r() to compute the distance to the origin from c, or the expression a.distance(b) (or b.distance(a)) to compute the distance between a and b.

Using identical names for two different methods is known as overloading, and it is perfectly acceptable as long as the system can distinguish them by differences in their signatures. In Program 3.2, the constructor is overloaded the two Point methods differ because one has no parameters and the other has two. The types of parameters and the presence or absence of a return value also can serve to distinguish methods with identical names.

This style of programming, which is sometimes called object-oriented programming, is directly supported by the Java class construct. We may think of classes as mechanisms that not only allow us to aggregate data but also define operations on that data. There may be many different objects that belong to a class, but all of them are similar in that the set of values that their data members can take on is the same, and the collection of operations that can be performed on their data members is the same; in short, they are instances of the same data type. In object-oriented programming, we direct objects to process their member data (as opposed to having free methods processing the data stored in objects).

We could also, if desired, prepend the keyword static to the distance method implementation given above that takes two points as parameters and include it as a method in our Point class. The static designation is for methods that are associated with the class (as opposed to methods that are associated with class objects). They are invoked by using the class name instead of an object name. If we make this change, the expressions Point.distance(a, b), a.distance(b), and b.distance(a) all represent the same value, and each might be preferable to the others in specific programming contexts.

Each Java class definition is kept in a separate file with the same name as the class and a .java suffix. Any Java program can then use the class. For example, Programs 3.7 and 3.18 build upon the abstraction implemented by Program 3.2.

Program 3.3 Statistics of a sequence of random numbers

This program computes the average µ and standard deviation s of a sequence x₁,x₂,...,x_N of random nonnegative four-digit integers, following the mathematical definitions

Note that a direct implementation from the definition of s² requires one pass to compute the average and another to compute the sums of the squares of the differences between the members of the sequence and the average, but rearranging the formula makes it possible for us to compute s² in one pass through the data (without having to store any of it!).

 class PrintStats    {      public static void main(String[] args)        { int N = Integer.parseInt(args[0]);          double m = 0.0, s = 0.0;          for (int i = 0; i < N; i++)            { int x = (int) (Math.random()*10000);              double d = (double) x;              m += d/N; s += (d*d)/ N;            }          s = Math.sqrt(s - m*m);          Out.println("     Avg.: " + m);          Out.println("Std. dev.: " + s);        }    }

We use classes like Point to define data types whenever possible because they encapsulate the definition of the data type in a clear and direct manner. We normally take further steps to make sure that other programs can use points without having to make any assumption about how they are represented. Such mechanisms are the topic of Chapter 4, but we briefly introduce them next.

Program 3.3 implements a simple computation on four-digit numbers: computing the average and standard deviation of a long sequence of random values. It is interesting from an algorithmic point of view because it computes both statistics in one pass through the numbers. Computing the average in one pass is easy, but a naive implementation for the standard deviation that follows the mathematical definition would require storing all the numbers in order to be able to sum the squares of the deviation of each from the average. This difference means that Program 3.3 can compute the average and standard deviation of a huge sequence of numbers (limited only by the time it might take), while the naive implementation is limited by the amount of space available.

What changes do we need to make Program 3.3 work with other types of numbers (say, floating-point numbers in the unit interval) rather than with four-digit integers? There are a number of different options. For such a small program, the simplest is to make a copy of the file, then to change the two lines involving x to

 double d = Math.random();

Even for such a small program, this approach is inconvenient because it leaves us with two copies of the main program, and we will have to make sure that any later changes in that program are reflected in both copies.

In Java, an alternative approach is to define a separate data type for sequences of numbers, as follows:

 class NumberSeq    { public double next() { return Math.random(); } }

To use this class in Program 3.3, we insert the following code before the for loop:

 NumberSeq NS = new NumberSeq();

(to create an object that can give us sequences of number); and we replace the two lines involving x in Program 3.3 by

 double d = NS.next();

(to use that object to give us the numbers). These changes allow us to use Program 3.3 to test different kinds of numbers (by substituting different implementations of NumberSeq) without modifying it at all.

By recasting Program 3.3 in terms of the NumberSeq class, we significantly extend its potential utility. We can do the computation for 3-digit numbers, or 20-digit numbers, or floats, or any other type of number. We could even arrange for Program 3.3 to process items of type Point by defining how we want to extract values of type double for a sequence of points:

 class NumberSeq    { double next() { return (new Point()).r(); } }

If we use this implementation of NumberSeq with Program 3.3, then it generates random points and computes the average and standard deviation of their distance from the origin (see Exercise 3.11). Generally speaking, this approach is likely to extend the useful lifetime of a program. When some new circumstance (a new application, or perhaps a new compiler or computer) presents us with a new type of number with which we need to work, we can update our program just by changing the data type.

Conceptually, these changes amount to splitting the computation into three distinct parts:

An interface, which declares the methods to be used
An implementation of the methods in the interface
A client program that uses the methods in the interface to work at a higher level of abstraction

We may think of the interface as a definition of the data type. It is a contract between the client and the implementation. The client agrees to access the data only through the methods defined in the interface, and the implementation agrees to deliver the promised methods.

For the example just given, the interface is the classname NumberSeq and the methods NumberSeq() and next(). Changing the two lines in Program 3.3 involving x to use NumberSeq, as just described, makes it a client of this interface.

An implementation of this interface includes the Java code that defines the body of the methods. If we put another version of NumberSeq with different code for each of the methods in the file NumberSeq.java and compile it with a client, then we can get different behavior, without changing the client code at all.

If we were to try to do operations other than arithmetic operations, we would soon find the need to add more operations to the data type. For example, if we want to print the numbers, we need to implement a toString method. Whenever we strive to develop a data type based on identifying the operations of importance in a program, we need to strike a balance between the level of generality that we choose and the ease of implementation and use that results.

Often, we want to build a new data type by adding features to an existing one. To facilitate this process, Java provides the ability to define a class that extends another one. The data type defined by the extended class is defined by the members of the base class plus all the members in the extended class. The extended class can either define new members or redefine members from the base class. We say that the extended class inherits the members of the base class. For example, we could extend the Point class to give us a class where we can associate a string with each point, as follows:

 class LabeledPoint extends Point    {      String id;      void label(String name) { id = name; }      public String toString()        { return name + "("+x+","+y+")"; }    }

The extends keyword means that LabeledPoint inherits all the data fields and methods from Point. We add the new data field String and redefine the toString method without having to make another copy of (or even know the details of) the code for Point.

This ability to use previously defined high-level abstract operations, even for newly defined types, is one of the essential and distinctive features of Java programming. Extended classes provide a convenient way for programmers to build upon the work of others and are an integral feature of object-oriented programming systems. However, we use inheritance sparingly in this book because it makes parts of interfaces implicit, where we normally strive to make the interfaces that correspond to fundamental data types fully explicit.

There are other ways to support data types besides the client interface implementation scenario just described, but we will not dwell on distinctions among various alternatives because such distinctions are best drawn in a systems-programming context rather than in an algorithm-design context (see reference section). However, we do often make use of this basic design paradigm because it provides us with a natural way to substitute improved implementations for old ones and therefore to compare different algorithms for the same problem. Chapter 4 is devoted to this topic.

Object-oriented programming is based on our ability to directly define our own data types by associating data and precisely defining the operations on the data (and the data structures and algorithms that support them) with classes. Classes form the basis for our code throughout the book, but before considering our methodology in further detail (in Chapter 4), we need to consider a number of lower-level mechanisms for manipulating and associating data.

So far, we have primarily talked about defining individual pieces of information for our programs to process. In many instances, we are need to work with potentially huge collections of data, and we now turn to basic methods for doing so. In general, we use the term data structure to refer to a mechanism for organizing our information to provide convenient and efficient mechanisms for accessing and manipulating it. Many important data structures are based on one or both of the two elementary approaches that we shall consider in this chapter. We may use an array, where we organize objects in a fixed sequential fashion that is more suitable for access than for manipulation; or a list, where we organize objects in a logical sequential fashion that is more suitable for manipulation than for access.

Exercises

3.1 Define a class suitable for representing playing cards, including methods for testing whether two cards are in the same suit and whether one has a higher value than the other. Also include a toString() method.

3.2 Write a class that uses the Point class to implement a data type for triangles in the unit square. Include both a method for testing whether the triangle is a right triangle and a toString method for printing the coordinates of the vertices.

3.3 Write a client program that uses the data type in Program 3.2 for the following task: Read a sequence of points (pairs of floating-point numbers) from standard input, and find the one that is closest to the first.

3.4 Add a method to the point data type (Program 3.2) that determines whether or not three points are collinear, to within a numerical tolerance of 10^-4. Assume that the points are all in the unit square.

3.5 Give a NumberSeq implementation that allows you to use Program 3.3 (modified to be a NumberSeq client, as described in the text) to print the average and standard deviation of a sequence of random 4-digit integers.

3.6 Add to your class from Exercise 3.2 a method for computing the area of the triangle. Then write a myNUMBER implementation that allows you to use Program 3.3 (modified as described in the text) to generate random triangles and compute their average area.

3.7 Test the random-number generator on your system by generating N random integers between 0 and r - 1 with ((int) (1000*Math.random())) %r and computing the average and standard deviation for r = 10, 100, and 1000 and N = 10³, 10⁴, 10⁵, and 10⁶.

3.8 Test the random-number generator on your system by generating N random integers between 0 and r - 1 with ((int) (r*Math.random())) and computing the average and standard deviation for r = 10, 100, and 1000 and N = 10³, 10⁴, 10⁵, and 10⁶.

r = 2, 4, and 16.

3.10 Adapt Program 3.3 to be used for random bits (numbers that can take only the values 0 or 1).

3.11 Give an analytic expression for the expected distance to the origin of a random point in the unit square and compare with the result of using Program 3.3 (in the manner described in the text) to find the average and standard deviation of this quantity for N random points, for N = 10³, 10⁴, 10⁵, and 10⁶.

Top