Saturday, January 7, 2012

hashCode() and equals() in Java

Java object's hash code is important to the performance of hash tables and other data structures that store objects in groups ("buckets") based on their computed hash values. Good implementation of hashCode() must:
  • Follow its general contract[1]
  • Generate values that are evenly distributed for varied inputs to avoid clustering
Overriding equals() and hashCode() in your class needs to be considered together. In this article, we'll examine why.

Object Identity vs. Logical Identity

  • Object identity
    • For two object references x and y, two objects are equal if and only if x and y refer to the same object. This is the default behavior of java.lang.Object.equals().
  • Logical identity
    • For two value object references x and y, two value objects are equal if their values are identical even they may refer to different objects.

When to Override equals()


Your class needs to override the default object identity behavior if logical identity is its desired behavior and a superclass has not already overridden Object's default equals() implementation.

Override hashCode() If equals() Is Overridden


An object's hashCode() method digests the data stored in the object into a single hash value (a 32-bit signed integer). Hash value cannot be used as a
unique identifier for an object because hashCode() is not required to produce distinct integer results for unequal objects based on its contract. However, a good hash function tends to produce unequal hash values for unequal objects.

Always override hashCode() when you override equals(). Failure to do so will prevent your class from functioning properly in conjunction with all hash-based collections, including
  • HashMap
  • HashSet
  • Hashtable
    • Note that you should use HashMap instead of Hashtable for better performance.
The reason is that one of the general contract of hashCode dictates that:
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
If you override equals() by changing its behavior from object identity to logical identity, then your hashCode() must produce the same hash value for two logically identical objects. However, the default hashCode() is typically implemented by converting the internal address of the object into an integer. Therefore, the default hashCode() will return different hash values for two logically identical objects, which violates the general contract.

Queries on hash-based collections should run in linear time. If a bad hash function is used, it might run in quadratic time. For example, if a hashCode() always returns a fixed hash value, hash tables essentially degenerate to linked lists.

As with any general hashing function, collisions are possible. If that happens, objects will be put in groups (or buckets) based on their computed hash values.

Summary

  • If a class overrides equals(), it must override hashCode()
  • equals() and hashCode() must use the same set of significant fields of your object
    • You can exclude redundant fields which can be computed from other significant fields that are included in the computation
    • If you exclude redundant fields, treat them the same way in both methods
  • If two objects are equal, then their hashCode values must be equal as well
    • However, unequal objects need not produce distinct hash values
  • If the object is immutable, then hashCode is a candidate for caching and lazy initialization

Read More

  1. java.lang.Object
  2. Equals and Hash Code in Java
  3. Effective Java by Joshua Bloch
  4. Performance tuning articles
  5. Java Performance Tips

No comments: