Liam Miller-Cushon, April 2019
Correctly implementing equals()
and hashCode()
requires too much ceremony.
Implementations are time-consuming to write by hand and, worse, expensive to maintain. IDEs can help generate the initial code, but once generated that code needs to be read, and debugged, and maintained as the class changes. Over time these methods become a place for bugs to hide (see the appendix for real-world examples).
Consider all the boilerplate required to correctly implement equals()
and hashCode()
for a trivial Point
class:
class Point {
final int x;
final int y;
Point(int x, int y) {
this.x = x;
this.y = y;
}
@Override public boolean equals(Object other) {
if (!(other instanceof Point)) {
return false;
}
Point that = (Point) other;
return x == that.x && y == that.y;
}
@Override public int hashCode() {
return Objects.hash(x, y);
}
}
The goal of this proposal is to create a library solution to help produce readable, correct, and performant implementations of equals()
and hashCode()
.
A secondary use case is to provide new definitions of equivalence for existing types, which differ from those types’ equals()
and hashCode()
implementations.
The API will likely consist of an interface with a method for testing equality and computing hashCodes for instances of the type.
WARNING: All API details in examples are very much non-final, and included only to illustrate the idea.
interface Equivalence<T> {
boolean equivalent(T self, Object other);
int hash(T self);
}
With that hypothetical API, our Point
example might look something like this:
class Point {
int x;
int y;
private static final Equivalence<Point> EQ =
Equivalence.of(Point.class, p -> p.x, p -> p.y);
@Override public boolean equals(Object other) {
return EQ.equivalent(this, other);
}
@Override public int hashCode() {
return EQ.hash(this);
}
}
In the future we expect value classes like Point
to become records, but there will always be some cases where a class needs an equals
and hashCode
implementation and can’t be migrated to a record. This proposal is complementary to records, and would help avoid ever needing to hand-implement equals
and hashCode
.
There are some opportunities here to make the performance competitive to (or better than) hand-written implementations through language extensions or compiler support, but that is out of scope for this document. The goal is to determine what the best version of the feature we could realize purely as a library is.
It’s possible that a future version of the language will support ‘field references’, such as Foo::x
where x
is a field. The Equivalence API would dovetail nicely with that feature and might help provide motivation for it, but the details of that feature are out of scope for this document.
equals()
and hashCode()
, or focus solely on equals()
?An advantage of supporting both equals()
and hashCode()
is that implementing hashCode
and equals
avoids a common source of bugs. Implementations of hashCode
that consider more state than the corresponding equals
function are incorrect. Having a single canonical list of state to consider in both equals
and hashCode
isn’t just about avoiding boilerplate, it’s about
correctness.
(Being able to share the list of state used by equals
and hashCode
with a Comparator
implementation is potentially interesting; see “What’s the relationship to Comparator
?” below)
One option is for the API to always use Object.equals
and Object.hashCode
when testing the provided state. Another is to allow custom comparators to be associated with derived state, for example we might want to perform a case-insensitive comparison on a String
field.
private static final Equivalence<Point> EQ =
Equivalence.forClass(Person.class)
.andThen(Person::age)
.andThen(Person::name, CASE_INSENSITIVE_EQ); // also of type Equivalence
(Another example where custom comparisons will be helpful is with arrays, where we often want Arrays.deepEquals
and Object.deepHashCode
instead of Object.equals
and Object.hashCode
. Arrays are a common-enough special case that it’s natural to discuss preferential treatment in the API, which we do below.)
The state included in a correct implementation of hashCode
must be a subset of the state considered by equals
. Sometimes using only a proper subset in hashCode
produces a good-enough hash that is faster to compute.
Support for this feature falls out of custom comparisons discussed above. For example it might look something like:
private static final Equivalence<Point> EQ =
Equivalence.forClass(Point.class)
.andThen(Point::x)
.andThen(Point::y, Equivalence.using(Objects::equals, x -> 0));
We could consider adding more explicit support to the API for this-use case (for example: Equivalence.forClass(Point.class)andThen(Point::x).butNotHashing(Point::y)
, but we think that level of support is unnecessary. It’s an uncommon use-case (the best practice for hash functions is to avoid trivially constructible collisions), and it’s already possible without additions to the API.
The time-honoured tradition in many hashCode()
implementations has been to combine the hash of each piece of state using (x, y) -> 31 * x + y
. This is usually a fine choice; we don’t see a compelling reason to support customization.
Either way, we should absolutely not specify the exact behavior of the default hash reduce function, to allow future improvements to it.
(One slightly aggressive way to ensure that would be to have optional per-JVM-invocation randomization of the hash seed, which could be enabled for testing. Google has been randomizing hash iteration order in our tests for a few years with good results.)
instanceof
or getClass()
?There’s a choice between implementing equals
using instanceof
, or using getClass()
, or offering the choice of either. We want to try to avoid philosophical debates on which is more correct.
Fortunately we think there’s an easy way to decide this. instanceof
is the more permissive default, since the user can still perform an explicit getClass()
check in the Equivalence
chain, or as a guard before calling Equivalence.equals
, for example:
this.getClass() == o.getClass() && EQ.equivalent(this, o);
The reverse is not possible.
null
?equivalent(nonNull, null)
must safely return false
in order to be useful for implementing Object.equals()
, because of the symmetry requirement.
equivalent(null, null)
should probably return true
rather than throw, to be as consistent and unsurprising as possible.
Comparator
?There are some clear similarities: both Comparator
and Equivalence
allow extracting state from an instance (for use with compareTo
and equals
/hashCode
, respectively).
There are also some reasons why it may be necessary to approach them as completely separate APIs, and not have one be a generalization of the other.
Comparator
can provide half of an Equivalence
implementation–x.compareTo(y) == 0
is an OK implementation of equivalent
–but it can’t implement hashCode
. So making Comparator
extend Equivalence
would force it to throw UnsupportedOperationException
for hashCode
.
Alternatively, Equivalence
has some of the pieces of a Comparator
implementation, including the state that you might want to test in a comparison function. However having Equivalence
support comparison would create overlap with Comparator
, and the state you want to compare might only be a subset of the state considered by equals
and hashCode
.
A third approach would be to have a way to create both an Equivalence
and Comparator
at the same time, by providing a single list of state extractors. This might require having a common super-type, which adds complexity and could muddy the two concepts.
Two options that have been discussed so far are:
Equalator
, which references Comparator
Equivalence
, since an instance of this type is an equivalence relation.Our opinion is that coining a new word is unnecessary, when a well-known word exists already in the world of mathematics.
NOTE: This question is about default behavior; the ability to customize the equals
and hashCode
functions used for a specific field ensures it will always be possible to make an explicit choice.
There are at least two schools of thought here, each with strong arguments. This document isn’t going to settle the debate, just present the options.
An argument for detecting arrays and automatically using Arrays.{deepEquals,deepHashCode}
is that using Object.{equals,hashCode}
on arrays is effectively always a bug, so we’re doing users a favor by recognizing and fixing that bug. (Experience with static analysis to disallow calling Object.{hashCode,equals}
on arrays has shown this is effectively always the behavior users want.)
An argument against is that it complicates the mental model for arrays. No matter what, users need to learn that arrays use reference equality. When we do them a favor in this context only, we pollute that knowledge, which has consequences. Note that Kotlin chose this approach.
Should we try to avoid boxing/varargs overhead in Equivalence
, for example by having specialized interfaces like IntEquivalence
and overloads of the builder methods like andThenInt(IntFunction)
and andThenInt(IntFunction, IntEquivalence)
?
This might be necessary to get the desired performance in some situations. On the other hand, it adds significantly to the complexity of the API.
One alternative is to investigate translation strategies that achieve similar performance without surfacing that complexity in the Java API.
equivalent(T, Object)
or equivalent(T, T)
There are two candidates for the signature of the equivalent
function in Equivalence
: equivalent(T, Object)
, and equivalent(T, T)
.
Using equivalent(T, Object)
allows implementing Object#equals
with less ceremony. We anticipate a larger fraction of uses of Equivalence
being to implement Object#equals
than the fraction of Comparators
that implement compareTo
, which is an argument for biasing towards this use-case. (The ease of implementing Object#equals
is especially significant when used together with concise method bodies.)
public boolean equals(Object other) {
return EQ.equivalent(this, other);
}
or:
public boolean equals(Object other) = EQ::equivalent;
The advantage of equivalent(T, T)
is that uses other than implementing Object#equals
are cleaner, and provide some additional type-safety. It also avoids a decision about using getClass()
vs. instanceof
, since the type check and case will happen separately.
public boolean equals(Object other) {
return other instanceof Foo that && EQ.equivalent(this, that);
}
or:
public boolean equals(Object other) ->
other instanceof Foo that && EQ.equivalent(this, that);
Another option would be to use equivalent(T, T)
and convert to an Equivalence<Object>
before using it to implement Object.equals
, to avoid the use-case casts. (But then the awkwardness is at the declaration instead, and it sacrifices additional type-safety.)
just a sketch to illustrate the idea
interface Equivalence<T> {
boolean equivalent(T self, Object other);
int hash(T self);
static <T> Equivalence<T> of(Class<T> clazz, Function<T, ?>... decomposers) {
return new Equivalence<T>() {
public boolean equivalent(T self, Object other) {
if (!clazz.isInstance(other)) {
return false;
}
T that = clazz.cast(other);
return Arrays.stream(decomposers)
.allMatch(d -> Objects.equals(d.apply(self), d.apply(that)));
}
public int hash(T self) {
return Arrays.stream(decomposers)
.map(d -> Objects.hashCode(d.apply(self)))
.reduce(17, (x, y) -> 31 * x + y);
}
};
}
}
equals
and hashCode
implementationsWe have observed a wide array of bugs in implementations of equals
and hashCode
methods, often in the process of rolling out static analysis to prevent these issues.
Some of them seem obvious in hindsight, and unlikely to occur in code written by experienced Java developers, but invariably they have shown up in real-world code. One factor is that because equals
and hashCode
methods are seen as boilerplate they often aren’t reviewed with a lot of scrutiny, and bugs can creep in over time, especially as other parts of the class change.
Object.equals()
, but not hashCode()
. (The contract for Object.hashCode states that if two objects are equal, then calling the hashCode() method on each of the two objects must produce the same result. Implementing equals() but not hashCode() makes that unlikely to be the case.)==
may have been intended instead of this.equals(other)
.)a == that.a && b == that.a
.NullPointerException
when given a null
argument. (They should return false
instead.)ClassCastException
when given an argument of the wrong type. (They should return false
instead.)hashCode()
. (Hashes collide frequently, so this will lead to false positives.)hashCode()
that is not tested in the corresponding equals()
method. (Objects that are equal must have the same hashCode()
.)equals
and hashCode
implementations that use reference equality or hashCode for array members. (They likely intended value equality and hashCode.)JDK-4771660: (coll) Comparator, Comparable, Identity, and Equivalence
JDK-6270657: (coll) remove/contains and “Equators” other than .equals()
Apache Commons’ EqualsBuilder
is an existing library to help implement equals methods.
https://github.com/forax/exotic/blob/master/src/main/java/com.github.forax.exotic/com/github/forax/exotic/ObjectSupport.java
https://sourceforge.net/p/hsqldb/svn/HEAD/tree/base/trunk/src/org/hsqldb/lib/ObjectComparator.java