Chapter 7: Generics and Collections | SCJP Sun Certified Programmer for Java 5 Study Guide (Exam 310-055) (Certification Press)

Generics are possibly the most talked about feature of Java 5. Some people love 'em, some people hate 'em, but they're here to stay. At their simplest, they can help make code easier to write, and more robust. At their most complex, they can be very, very hard to create, and maintain. Luckily, the exam creators stuck to the simple end of generics, covering the most common and useful features, and leaving out most of the especially tricky bits. Coverage of collections in this exam has expanded in two ways from the previous exam: the use of generics in collections, and the ability to sort and search through collections.

Certification Objective —Overriding hashCode() and equals() (Objective 6.2)

6.2 Distinguish between correct and incorrect overrides of corresponding hashCode and equals methods, and explain the difference between == and the equals method.

You're an object. Get used to it. You have state, you have behavior, you have a job. (Or at least your chances of getting one will go up after passing the exam.) If you exclude primitives, everything in Java is an object. Not just an object, but an Object with a capital O. Every exception, every event, every array extends from java.lang.Object. For the exam, you don't need to know every method in Object, but you will need to know about the methods listed in Table 7-1.

Table 7-1: Methods of Class Object Covered on the Exam
Method	Description
boolean equals (Object obj)	Decides whether two objects are meaningfully equivalent.
void finalize()	Called by garbage collector when the garbage collector sees that the object cannot be referenced.
int hashcode()	Returns a hashcode int value for an object, so that the object can be used in Collection classes that use hashing, including Hashtable, HashMap, and HashSet.
final void notify()	Wakes up a thread that is waiting for this object's lock.
final void notifyAll ()	Wakes up all threads that are waiting for this object's lock.
final void wait()	Causes the current thread to wait until another thread calls notify() or notifyAll() on this subject.
String toString()	Returns a "text representation" of the object.

Chapter 9 covers wait(), notify(), and notifyAll(). The finalize() method was covered in Chapter 3. So in this section we'll look at just the hashCode() and equals() methods. Oh, that leaves out toString(), doesn't it. Okay, we'll cover that right now because it takes two seconds.

The toString() Method Override toString() when you want a mere mortal to be able to read something meaningful about the objects of your class. Code can call toString() on your object when it wants to read useful details about your object. When you pass an object reference to the System.out.println() method, for example, the object's toString() method is called, and the return of toString() is shown in the following example:

 public class HardToRead {    public static void main (String [] args) {      HardToRead h = new HardToRead();      System.out.println(h);    } }

Running the HardToRead class gives us the lovely and meaningful,

 % Java HardToRead HardToRead@a47e0

The preceding output is what you get when you don't override the toString() method of class Object. It gives you the class name (at least that's meaningful) followed by the @ symbol, followed by the unsigned hexadecimal representation of the object's hashcode.

Trying to read this output might motivate you to override the toString() method in your classes, for example,

 public class BobTest {    public static void main (String[] args) {      Bob f = new Bob("GoBobGo", 19);      System.out.println(f);    } } class Bob {    int shoeSize;    String nickName;    Bob(String nickName, int shoeSize) {      this.shoeSize = shoeSize;      this.nickName = nickName;    }    public String toString() {       return ("I am a Bob, but you can call me " + nickName +               ". My shoe size is " + shoeSize);    } }

This ought to be a bit more readable:

 % Java BobTest I am a Bob, but you can call me GoBobGo. My shoe size is 19

Some people affectionately refer to toString() as the "spill-your-guts method," because the most common implementations of toString() simply spit out the object's state (in other words, the current values of the important instance variables). That's it for toString(). Now we'll tackle equals() and hashCode().

Overriding equals()

You learned about the equals() method in earlier chapters, where we looked at the wrapper classes. We discussed how comparing two object references using the == operator evaluates to true only when both references refer to the same object (because == simply looks at the bits in the variable, and they're either identical or they're not). You saw that the String class and the wrapper classes have overridden the equals() method (inherited from class Object), so that you could compare two different objects (of the same type) to see if their contents are meaningfully equivalent. If two different Integer instances both hold the int value 5, as far as you're concerned they are equal. The fact that the value 5 lives in two separate objects doesn't matter.

When you really need to know if two references are identical, use ==. But when you need to know if the objects themselves (not the references) are equal, use the equals() method. For each class you write, you must decide if it makes sense to consider two different instances equal. For some classes, you might decide that two objects can never be equal. For example, imagine a class Car that has instance variables for things like make, model, year, configuration—you certainly don't want your car suddenly to be treated as the very same car as someone with a car that has identical attributes. Your car is your car and you don't want your neighbor Billy driving off in it just because, "hey, it's really the same car; the equals() method said so." So no two cars should ever be considered exactly equal. If two references refer to one car, then you know that both are talking about one car, not two cars that have the same attributes. So in the case of a Car you might not ever need, or want, to override the equals() method. Of course, you know that isn't the end of the story.

What It Means If You don't Override equals()

There's a potential limitation lurking here: if you don't override a class's equals() method, you won't be able to use those objects as a key in a hashtable and you probably won't get accurate Sets, such that there are no conceptual duplicates.

The equals() method in class Object uses only the == operator for comparisons, so unless you override equals(), two objects are considered equal only if the two references refer to the same object.

Let's look at what it means to not be able to use an object as a hashtable key. Imagine you have a car, a very specific car (say, John's red Subaru Outback as opposed to Mary's purple Mini) that you want to put in a HashMap (a type of hashtable we'll look at later in this chapter), so that you can search on a particular car and retrieve the corresponding Person object that represents the owner. So you add the car instance as the key to the HashMap (along with a corresponding Person object as the value). But now what happens when you want to do a search? You want to say to the HashMap collection, "Here's the car, now give me the Person object that goes with this car." But now you're in trouble unless you still have a reference to the exact object you used as the key when you added it to the Collection. In other words, you can't make an identical Car object and use it for the search.

The bottom line is this: if you want objects of your class to be used as keys for a hashtable (or as elements in any data structure that uses equivalency for searching for—and/or retrieving—an object), then you must override equals() so that two different instances can be considered the same. So how would we fix the car? You might override the equals() method so that it compares the unique VIN (Vehicle Identification Number) as the basis of comparison. That way, you can use one instance when you add it to a Collection, and essentially re-create an identical instance when you want to do a search based on that object as the key. Of course, overriding the equals() method for Car also allows the potential that more than one object representing a single unique car can exist, which might not be safe in your design. Fortunately, the String and wrapper classes work well as keys in hashtables—they override the equals() method. So rather than using the actual car instance as the key into the car/owner pair, you could simply use a String that represents the unique identifier for the car. That way, you'll never have more than one instance representing a specific car, but you can still use the car—or rather, one of the car's attributes—as the search key.

Implementing an equals() Method

Let's say you decide to override equals() in your class. It might look like this:

 public class EqualsTest {   public static void main (String [] args) {      Moof one = new Moof(8);      Moof two = new Moof(8);      if (one.equals(two)) {         System.out.println("one and two are equal");      }    } } class Moof {   private int moofValue;   Moof(int val) {      moofValue = val;   }   public int getMoofValue() {      return moofValue;   }   public boolean equals(Object o) {     if ((o instanceof Moof) && (((Moof)o).getMoofValue()          == this.moofValue)) {       return true;     } else {        return false;     }   } }

Let's look at this code in detail. In the main() method of EqualsTest, we create two Moof instances, passing the same value 8 to the Moof constructor. Now look at the Moof class and let's see what it does with that constructor argument—it assigns the value to the moofvalue instance variable. Now imagine that you've decided two Moof objects are the same if their moofvalue is identical. So you override the equals() method and compare the two moofValues. It is that simple. But let's break down what's happening in the equals() method:

 1.  public boolean equals(Object o) { 2.    if ((o instanceof Moof) && (((Moof)o).getMoofValue()             == this.moofValue)) { 3.       return true; 4.      } else { 5.        return false; 6.      } 7.  }

First of all, you must observe all the rules of overriding, and in line 1 we are indeed declaring a valid override of the equals() method we inherited from Object.

Line 2 is where all the action is. Logically, we have to do two things in order to make a valid equality comparison.

First, be sure that the object being tested is of the correct type! It comes in polymorphically as type Object, so you need to do an instanceof test on it. Having two objects of different class types be considered equal is usually not a good idea, but that's a design issue we won't go into here. Besides, you'd still have to do the instanceof test just to be sure that you could cast the object argument to the correct type so that you can access its methods or variables in order to actually do the comparison. Remember, if the object doesn't pass the instanceof test, then you'll get a runtime ClassCastException. For example:

 public boolean equals(Object o) {    if (((Moof) o).getMoofValue() == this.moofValue){       // the preceding line compiles, but it's BAD!       return true;    } else {       return false;    } }

The (Moof)o cast will fail if o doesn't refer to something that IS-A Moof.

Second, compare the attributes we care about (in this case, just moofValue). Only the developer can decide what makes two instances equal. (For best performance, you're going to want to check the fewest number of attributes.)

In case you were a little surprised by the whole ((Moof)o).getMoofValue() syntax, we're simply casting the object reference, o, just-in-time as we try to call a method that's in the Moof class but not in Object. Remember, without the cast, you can't compile because the compiler would see the object referenced by o as simply, well, an Object. And since the Object class doesn't have a moofvalue() method, the compiler would squawk (technical term). But then as we said earlier, even with the cast, the code fails at runtime if the object referenced by o isn't something that's castable to a Moof. So don't ever forget to use the instanceof test first. Here's another reason to appreciate the short circuit && operator—if the instanceof test fails, we'll never get to the code that does the cast, so we're always safe at runtime with the following:

 if ((o instanceof Moof) && (((Moof)o).getMoofValue()       == this.moofValue)) {     return true; } else {     return false; }

So that takes care of equals()…

Whoa…not so fast. If you look at the Object class in the Java API spec, you'll find what we call a contract specified in the equals() method. A Java contract is a set of rules that should be followed, or rather must be followed if you want to provide a "correct" implementation as others will expect it to be. Or to put it another way, if you don't follow the contract, your code may still compile and run, but your code (or someone else's) may break at runtime in some unexpected way.

Exam Watch

Remember that the equals(), hashCode(), and toString() methods are all public.The following would not be a valid override of the equals() method, although it might appear to be if you don't look closely enough during the exam:

 class Foo { boolean equals(Object o) { } }

And watch out for the argument types as well. The following method is an overload, but not an override of the equals() method:

 class Boo { public boolean equals(Boo b) { } }

Exam Watch

Be sure you're very comfortable with the rules of overriding so that you can identify whether a method from Object is being overridden, overloaded, or illegally redeclared in a class.The equals() method in class Boo changes the argument from Object to Boo, so it becomes an overloaded method and won't be called unless it's from your own code that knows about this new, different method that happens to also be named equals().

The equals() Contract

Pulled straight from the Java docs, the equals() contract says

It is reflexive. For any reference value x, x.equals(x) should return true.
It is symmetric. For any reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
It is transitive. For any reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) must return true.
It is consistent. For any reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the object is modified.
For any non-null reference value x, x.equals(null) should return false.

And you're so not off the hook yet. We haven't looked at the hashCode() method, but equals() and hashCode() are bound together by a joint contract that specifies if two objects are considered equal using the equals() method, then they must have identical hashcode values. So to be truly safe, your rule of thumb should be, if you override equals(), override hashcode() as well. So let's switch over to hashcode() and see how that method ties in to equals().

Overriding hashCode()

Hashcodes are typically used to increase the performance of large collections of data. The hashcode value of an object is used by some collection classes (we'll look at the collections later in this chapter). Although you can think of it as kind of an object ID number, it isn't necessarily unique. Collections such as HashMap and HashSet use the hashcode value of an object to determine how the object should be stored in the collection, and the hashcode is used again to help locate the object in the collection. For the exam you do not need to understand the deep details of how the collection classes that use hashing are implemented, but you do need to know which collections use them (but, um, they all have "hash" in the name so you should be good there). You must also be able to recognize an appropriate or correct implementation of hashcode(). This does not mean legal and does not even mean efficient. It's perfectly legal to have a terribly inefficient hashcode method in your class, as long as it doesn't violate the contract specified in the Object class documentation (we'll look at that contract in a moment). So for the exam, if you're asked to pick out an appropriate or correct use of hashcode, don't mistake appropriate for legal or efficient.

Understanding Hashcodes

In order to understand what's appropriate and correct, we have to look at how some of the collections use hashcodes.

Imagine a set of buckets lined up on the floor. Someone hands you a piece of paper with a name on it. You take the name and calculate an integer code from it by using A is 1, B is 2, and so on, and adding the numeric values of all the letters in the name together. A given name will always result in the same code; see Figure 7-1.

image from book
Figure 7-1: A simplified hashcode example

We don't introduce anything random, we simply have an algorithm that will always run the same way given a specific input, so the output will always be identical for any two identical inputs. So far so good? Now the way you use that code (and we'll call it a hashcode now) is to determine which bucket to place the piece of paper into (imagine that each bucket represents a different code number you might get). Now imagine that someone comes up and shows you a name and says, "Please retrieve the piece of paper that matches this name." So you look at the name they show you, and run the same hashcode-generating algorithm. The hashcode tells you in which bucket you should look to find the name.

You might have noticed a little flaw in our system, though. Two different names might result in the same value. For example, the names Amy and May have the same letters, so the hashcode will be identical for both names. That's acceptable, but it does mean that when someone asks you (the bucket-clerk) for the Amy piece of paper, you'll still have to search through the target bucket reading each name until we find Amy rather than May. The hashcode tells you only which bucket to go into, but not how to locate the name once we're in that bucket.

Exam Watch

In real-life hashing, it's not uncommon to have more than one entry in a bucket. Hashing retrieval is a two-step process.

Find the right bucket (using hashcode())
Search the bucket for the right element (using equals()).

So for efficiency, your goal is to have the papers distributed as evenly as possible across all buckets. Ideally, you might have just one name per bucket so that when someone asked for a paper you could simply calculate the hashcode and just grab the one paper from the correct bucket (without having to go flipping through different papers in that bucket until you locate the exact one you're looking for). The least efficient (but still functional) hashcode generator would return the same hashcode (say, 42) regardless of the name, so that all the papers landed in the same bucket while the others stood empty. The bucket-clerk would have to keep going to that one bucket and flipping painfully through each one of the names in the bucket until the right one was found. And if that's how it works, they might as well not use the hashcodes at all but just go to the one big bucket and start from one end and look through each paper until they find the one they want.

This distributcd-across-the-buckets example is similar to the way hashcodes are used in collections. When you put an object in a collection that uses hashcodes, the collection uses the hashcode of the object to decide in which bucket/slot the object should land. Then when you want to fetch that object (or, for a hashtable, retrieve the associated value for that object), you have to give the collection a reference to an object that the collection compares to the objects it holds in the collection. As long as the object (stored in the collection, like a paper in the bucket) you're trying to search for has the same hashcode as the object you're using for the search (the name you show to the person working the buckets), then the object will be found. But…and this is a Big One, imagine what would happen if, going back to our name example, you showed the bucket-worker a name and they calculated the code based on only half the letters in the name instead of all of them. They'd never find the name in the bucket because they wouldn't be looking in the correct bucket!

Now can you see why if two objects are considered equal, their hashcodes must also be equal? Otherwise, you'd never be able to find the object since the default hashcode method in class Object virtually always comes up with a unique number for each object, even if the equals() method is overridden in such a way that two or more objects are considered equal. It doesn't matter how equal the objects are if their hashcodes don't reflect that. So one more time: If two objects are equal, their hashcodes must be equal as well.

Implementing hashCode()

What the heck does a real hashcode algorithm look like? People get their PhDs on hashing algorithms, so from a computer science viewpoint, it's beyond the scope of the exam. The part we care about here is the issue of whether you follow the contract. And to follow the contract, think about what you do in the equals() method. You compare attributes. Because that comparison almost always involves instance variable values (remember when we looked at two Moof objects and considered them equal if their int moofValues were the same?). Your hashcode() implementation should use the same instance variables. Here's an example:

 class HasHash {   public int x;   HasHash(int xVal) { x = xval; }   public boolean equals(Object o) {     HasHash h = (HasHash) o; // Don't try at home without                              // instanceof test     if (h.x == this.x) {       return true;     } else {       return false;     }   }   public int hashCode() { return (x * 17) ; } }

This equals() method says two objects are equal if they have the same x value, so objects with the same x value will have to return identical hashcodes.

Exam Watch

A hashCode() that returns the same value for all instances whether they're equal or not is still a legal—even appropriate—hashCode() method! For example,

 public int hashCode() { return 1492; }

This does not violate the contract. Two objects with an x value of 8 will have the same hashcode. But then again, so will two unequal objects, one with an x value of 12 and the other a value of -920. This hashCode() method is horribly inefficient, remember, because it makes all objects land in the same bucket, but even so, the object can still be found as the collection cranks through the one and only bucket—using equals()—trying desperately to finally, painstakingly, locate the correct object. In other words, the hashcode was really no help at all in speeding up the search, even though improving search speed is hashcode's intended purpose! Nonetheless, this one-hash-fits-all method would be considered appropriate and even correct because it doesn't violate the contract. Once more, correct does not necessarily mean good.

Typically, you'll see hashCode() methods that do some combination of ^-ing (XOR-ing) a class's instance variables (in other words, twiddling their bits), along with perhaps multiplying them by a prime number. In any case, while the goal is to get a wide and random distribution of objects across buckets, the contract (and whether or not an object can be found) requires only that two equal objects have equal hashcodes. The exam does not expect you to rate the efficiency of a hashCode() method, but you must be able to recognize which ones will and will not work (work meaning "will cause the object to be found in the collection").

Now that we know that two equal objects must have identical hashcodes, is the reverse true? Do two objects with identical hashcodes have to be considered equal? Think about it—you might have lots of objects land in the same bucket because their hashcodes are identical, but unless they also pass the equals() test, they won't come up as a match in a search through the collection. This is exactly what you'd get with our very inefficient everybody-gets-the-same-hashcode method. It's legal and correct, just slooooow.

So in order for an object to be located, the search object and the object in the collection must have both identical hashcode values and return true for the equals() method. So there's just no way out of overriding both methods to be absolutely certain that your objects can be used in Collections that use hashing.

The hashCode() Contract

Now coming to you straight from the fabulous Java API documentation for class Object, may we present (drum roll) the hashCode() contract:

Whenever it is invoked on the same object more than once during an execution of a Java application, the hashcode() method must consistently return the same integer, provided no information used in equals() comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(object) method, then calling the hashCode() method on each of the two objects must produce the same integer result.
It is NOT required that if two objects are unequal according to the equals(Java.lang.Object) method, then calling the hashCode() method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.

And what this means to you is…

Condition	Required	Not Required (ButAllowed)
x.equals(y) == true	x.hashCode() == y.hashCode()
x.hashCode() == y.hashcode()		x.equals(y) == true
x.equals(y) == false		No hashCode() requirements
x.hashCode() != y.hashCode()	x.equals(y) == false

So let's look at what else might cause a hashCode() method to fail. What happens if you include a transient variable in your hasheode() method? While that's legal (compiler won't complain), under some circumstances an object you put in a collection won't be found. As you know, serialization saves an object so that it can be reanimated later by deserializing it back to full objectness. But danger Will Robinson—remember that transient variables are not saved when an object is serialized. A bad scenario might look like this:

 class SaveMe implements Serializable{   transient int x;   int y;    SaveMe(int xVal, int yVal) {       x = xVal;       y = yval;    }   public int hashCode() {      return (x ^ y);      // Legal, but not correct to                           // use a transient variable   }   public boolean equals(Object o) {      SaveMe test = (SaveMe)o;      if (test.y == y && test.x == x) { // Legal, not correct        return true;      } else {        return false;      }   } }

Here's what could happen using code like the preceding example:

Give an object some state (assign values to its instance variables).
Put the object in a HashMap, using the object as a key.
Save the object to a file using serialization without altering any of its state.
Retrieve the object from the file through deserialization.
Use the deserialized (brought back to life on the heap) object to get the object out of the HashMap.

Oops. The object in the collection and the supposedly same object brought back to life are no longer identical. The object's transient variable will come back with a default value rather than the value the variable had at the time it was saved (or put into the HashMap). So using the preceding SaveMe code, if the value of x is 9 when the instance is put in the HashMap, then since x is used in the calculation of the hashcode, when the value of x changes, the hashcode changes too. And when that same instance of SaveMe is brought back from deserialization, x == 0, regardless of the value of x at the time the object was serialized. So the new hashcode calculation will give a different hashcode, and the equals() method fails as well since x is used to determine object equality.

Bottom line: transient variables can really mess with your equals() and hashcode() implementations. Keep variables non-transient or, if they must be marked transient, don't use then to determine hashcodes or equality.