Overriding rules for equals and hashcode functions


A common source of bugs is the failure to override the hashCode method. You must override hashCode in every class that overrides equals. Failure to do so will result in a violation of the general contract for Object.hashCode, which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.
Here is the contract, copied from the java.lang.Object specification:

  • Whenever it is invoked on the same object more than once during an execution of an application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

  • It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

The key provision that is violated when you fail to override hashCodeis the second one: Equal objects must have equal hash codes. Two distinct instances may be logically equal according to the class's equals method, but to the Object class's hashCode method, they're just two objects with nothing much in common. Therefore object's hashCode method returns two seemingly random numbers instead of two equal numbers as required by the contract.
For example, consider the following simplistic PhoneNumber class,

public final class PhoneNumber {
 private final short areaCode;
 private final short exchange;
 private final short extension;

 public PhoneNumber(int areaCode, int exchange, int extension) {
  rangeCheck(areaCode, 999, "area code");
  rangeCheck(exchange, 999, "exchange");
  rangeCheck(extension, 9999, "extension");
  this.areaCode = (short) areaCode;
  this.exchange = (short) exchange;
  this.extension = (short) extension;
 }

 private static void rangeCheck(int arg, int max, String name) {
  if (arg < 0 || arg > max)
   throw new IllegalArgumentException(name + ": " + arg);
 }

 public boolean equals(Object o) {
  if (o == this)
   return true;
  if (!(o instanceof PhoneNumber))
   return false;
  PhoneNumber pn = (PhoneNumber) o;
  return pn.extension == extension && pn.exchange == exchange
    && pn.areaCode == areaCode;
 }
 // No hashCode method!
 // Remainder omitted
}
             
Suppose you attempt to use this class with a HashMap:
 Map m = new HashMap();
 m.put(new PhoneNumber(408, 867, 5309), "Jenny");
At this point, you might expect m.get(new PhoneNumber(408, 867, 5309)) to return "Jenny", but it returns null. Notice that two PhoneNumber instances are involved: One is used for insertion into the HashMap, and a second, equal, instance is used for (attempted) retrieval. The PhoneNumber class's failure to override hashCode causes the two equal instances to have unequal hash codes, in violation of the hashCode contract. Therefore the get method looks for the phone number in a different hash bucket from the one in which it was stored by the put method. Fixing this problem is as simple as providing a proper hashCode method for the PhoneNumber class.
So what should a hashCode method look like? It's trivial to write one that is legal but not good. This one, for example, is always legal, but it should never be used:
// The worst possible legal hash function - never use!
   public int hashCode() { return 42; }
It's legal because it ensures that equal objects have the same hash code. It's atrocious because it ensures that every object has the same hash code. Therefore every object hashes to the same bucket, and hash tables degenerate to linked lists. Programs that should run in linear time run instead in quadratic time. For large hash tables, this is the difference between working and not working.

Please refer Hashcode function implementation to know how to write an effective hashcode function

No comments:

Post a Comment