Introduction
Sometime in a developer's life things don't work as expected. This example is not real-life, I discovered a few notices about how hashMap and hashSet works while reading the documentation of Java.As a good practice it is always mentioned to implement equals() in data-objects. As a side-effect you are gently forced to also implement hashCode(). In common IDEs it is usual to use the same fields to calculate
hashCode()
as used in equals()
. But is this reasonable and prudent?
Define a sample class
Let's start with a very simplified example. For brevity reasons the member-variable is public instead of private to grant access to it easily without getters and setters.
public class Item {
public int value;
@Override
public boolean equals(Object o) {
boolean result = false;
if (o instanceof Item) {
Item that = (Item) o;
result = (this.value == that.value);
}
return result;
}
@Override
public int hashCode() {
int hash = 7;
hash = 97 * hash + this.value;
return hash;
}
}
The class Item
only consists of a single attribute called value
. The equals()
-function compares the type and if the value is equal. The hashCode()
-function also uses the same attribute to calculate its value.
Putting an item into a HashSet
Now let's define a HashSet and put an item into it.
HashSet <Item> set = new HashSet<>();
Item item = new Item();
item.value = 1;
set.add(item);
System.out.println(set.contains(item));
Also as expected, the set contains the element. So we change the value of the item.
item.value++;
System.out.println(set.contains(item));
But what happend now? The item is not inside the HashSet! Even the element has the same reference as the inserted element. If you iterate over the set and compare its contained item with item
, they are equal.So what happened?
Look into the documentation of Java
If we have a look at the Java documentation of the HashSet-function contains() it says:"Returns true if this set contains the specified element. More formally, returns true if and only if this set contains an element e such that Objects.equals(o, e)."If we add the item again to the set via
set.add(item);
, the item should not be added as also claimed in the documentation of HashSet's add-function.
"Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that Objects.equals(e, e2). If this set already contains the element, the call leaves the set unchanged and returns false."
The cause
The cause of this problem is found in the documentation of hashCode. There it says:"Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application."In this example we did not obey that rule. We changed the hashCode by changing
value
, because it is recalculated every time we call it.
The solution
Since the equals-function says that two equal items must also return the same hashCode, we cannot calculate a hashCode when the object is created. That would violate the rule.The only chance we have to fix it, is to keep the object immutable or at least the attributes that were used to calculate the hashCode. Hence, the only candidates for attributes to calculate the hashCode are marked with
final
. In the worst case there is no invariant attribute, then the hashCode is a constant. In that case the data-structure HashSet is very slow in finding elements. In that case an implementation for the
Set
should be used that not depends on a hashCode.