Equivalence

Liam Miller-Cushon, April 2019

Background

Correctly implementing `equals()` and `hashCode()` requires too much ceremony.

Implementations are time-consuming to write by hand and, worse, expensive to maintain. IDEs can help generate the initial code, but once generated that code needs to be read, and debugged, and maintained as the class changes. Over time these methods become a place for bugs to hide (see the appendix for real-world examples).

Consider all the boilerplate required to correctly implement `equals()` and `hashCode()` for a trivial `Point` class:

``````class Point {
final int x;
final int y;

Point(int x, int y) {
this.x = x;
this.y = y;
}

@Override public boolean equals(Object other) {
if (!(other instanceof Point)) {
return false;
}
Point that = (Point) other;
return x == that.x && y == that.y;
}

@Override public int hashCode() {
return Objects.hash(x, y);
}
}``````

Objective

The goal of this proposal is to create a library solution to help produce readable, correct, and performant implementations of `equals()` and `hashCode()`.

A secondary use case is to provide new definitions of equivalence for existing types, which differ from those types’ `equals()` and `hashCode()` implementations.

The API will likely consist of an interface with a method for testing equality and computing hashCodes for instances of the type.

WARNING: All API details in examples are very much non-final, and included only to illustrate the idea.

``````interface Equivalence<T> {
boolean equivalent(T self, Object other);
int hash(T self);
}``````

With that hypothetical API, our `Point` example might look something like this:

``````class Point {
int x;
int y;

private static final Equivalence<Point> EQ =
Equivalence.of(Point.class, p -> p.x, p -> p.y);

@Override public boolean equals(Object other) {
return EQ.equivalent(this, other);
}

@Override public int hashCode() {
return EQ.hash(this);
}
}``````

In the future we expect value classes like `Point` to become records, but there will always be some cases where a class needs an `equals` and `hashCode` implementation and can’t be migrated to a record. This proposal is complementary to records, and would help avoid ever needing to hand-implement `equals` and `hashCode`.

Non-goals of this document

There are some opportunities here to make the performance competitive to (or better than) hand-written implementations through language extensions or compiler support, but that is out of scope for this document. The goal is to determine what the best version of the feature we could realize purely as a library is.

It’s possible that a future version of the language will support ‘field references’, such as `Foo::x` where `x` is a field. The Equivalence API would dovetail nicely with that feature and might help provide motivation for it, but the details of that feature are out of scope for this document.

Requirements

Should the API support both `equals()` and `hashCode()`, or focus solely on `equals()`?

An advantage of supporting both `equals()` and `hashCode()` is that implementing `hashCode` and `equals` avoids a common source of bugs. Implementations of `hashCode` that consider more state than the corresponding `equals` function are incorrect. Having a single canonical list of state to consider in both `equals` and `hashCode` isn’t just about avoiding boilerplate, it’s about

correctness.

(Being able to share the list of state used by `equals` and `hashCode` with a `Comparator` implementation is potentially interesting; see “What’s the relationship to `Comparator`?” below)

Should the API support custom comparison functions?

One option is for the API to always use `Object.equals` and `Object.hashCode` when testing the provided state. Another is to allow custom comparators to be associated with derived state, for example we might want to perform a case-insensitive comparison on a `String` field.

``````private static final Equivalence<Point> EQ =
Equivalence.forClass(Person.class)
.andThen(Person::age)
.andThen(Person::name, CASE_INSENSITIVE_EQ); // also of type Equivalence``````

(Another example where custom comparisons will be helpful is with arrays, where we often want `Arrays.deepEquals` and `Object.deepHashCode` instead of `Object.equals` and `Object.hashCode`. Arrays are a common-enough special case that it’s natural to discuss preferential treatment in the API, which we do below.)

Facilitate omitting state from hashCode?

The state included in a correct implementation of `hashCode` must be a subset of the state considered by `equals`. Sometimes using only a proper subset in `hashCode` produces a good-enough hash that is faster to compute.

Support for this feature falls out of custom comparisons discussed above. For example it might look something like:

``````private static final Equivalence<Point> EQ =
Equivalence.forClass(Point.class)
.andThen(Point::x)
.andThen(Point::y, Equivalence.using(Objects::equals, x -> 0));``````

We could consider adding more explicit support to the API for this-use case (for example: `Equivalence.forClass(Point.class)andThen(Point::x).butNotHashing(Point::y)`, but we think that level of support is unnecessary. It’s an uncommon use-case (the best practice for hash functions is to avoid trivially constructible collisions), and it’s already possible without additions to the API.

Is the hash reduce function customizable?

The time-honoured tradition in many `hashCode()` implementations has been to combine the hash of each piece of state using `(x, y) -> 31 * x + y`. This is usually a fine choice; we don’t see a compelling reason to support customization.

Either way, we should absolutely not specify the exact behavior of the default hash reduce function, to allow future improvements to it.

(One slightly aggressive way to ensure that would be to have optional per-JVM-invocation randomization of the hash seed, which could be enabled for testing. Google has been randomizing hash iteration order in our tests for a few years with good results.)

Does equals use `instanceof` or `getClass()`?

There’s a choice between implementing `equals` using `instanceof`, or using `getClass()`, or offering the choice of either. We want to try to avoid philosophical debates on which is more correct.

Fortunately we think there’s an easy way to decide this. `instanceof` is the more permissive default, since the user can still perform an explicit `getClass()` check in the `Equivalence` chain, or as a guard before calling `Equivalence.equals`, for example:

``this.getClass() == o.getClass() && EQ.equivalent(this, o);``

The reverse is not possible.

What happens to `null`?

`equivalent(nonNull, null)` must safely return `false` in order to be useful for implementing `Object.equals()`, because of the symmetry requirement.

`equivalent(null, null)` should probably return `true` rather than throw, to be as consistent and unsurprising as possible.

What’s the relationship to `Comparator`?

There are some clear similarities: both `Comparator` and `Equivalence` allow extracting state from an instance (for use with `compareTo` and `equals`/`hashCode`, respectively).

There are also some reasons why it may be necessary to approach them as completely separate APIs, and not have one be a generalization of the other.

`Comparator` can provide half of an `Equivalence` implementation–`x.compareTo(y) == 0` is an OK implementation of `equivalent`–but it can’t implement `hashCode`. So making `Comparator` extend `Equivalence` would force it to throw `UnsupportedOperationException` for `hashCode`.

Alternatively, `Equivalence` has some of the pieces of a `Comparator` implementation, including the state that you might want to test in a comparison function. However having `Equivalence` support comparison would create overlap with `Comparator`, and the state you want to compare might only be a subset of the state considered by `equals` and `hashCode`.

A third approach would be to have a way to create both an `Equivalence` and `Comparator` at the same time, by providing a single list of state extractors. This might require having a common super-type, which adds complexity and could muddy the two concepts.

Design Questions

What is the name of the API?

Two options that have been discussed so far are:

1. `Equalator`, which references `Comparator`
2. `Equivalence`, since an instance of this type is an equivalence relation.

Our opinion is that coining a new word is unnecessary, when a well-known word exists already in the world of mathematics.

Should arrays be compared for content equality or reference equality?

NOTE: This question is about default behavior; the ability to customize the `equals` and `hashCode` functions used for a specific field ensures it will always be possible to make an explicit choice.

There are at least two schools of thought here, each with strong arguments. This document isn’t going to settle the debate, just present the options.

An argument for detecting arrays and automatically using `Arrays.{deepEquals,deepHashCode}` is that using `Object.{equals,hashCode}` on arrays is effectively always a bug, so we’re doing users a favor by recognizing and fixing that bug. (Experience with static analysis to disallow calling `Object.{hashCode,equals}` on arrays has shown this is effectively always the behavior users want.)

An argument against is that it complicates the mental model for arrays. No matter what, users need to learn that arrays use reference equality. When we do them a favor in this context only, we pollute that knowledge, which has consequences. Note that Kotlin chose this approach.

Specialized custom comparisons

Should we try to avoid boxing/varargs overhead in `Equivalence`, for example by having specialized interfaces like `IntEquivalence` and overloads of the builder methods like `andThenInt(IntFunction)` and `andThenInt(IntFunction, IntEquivalence)`?

This might be necessary to get the desired performance in some situations. On the other hand, it adds significantly to the complexity of the API.

One alternative is to investigate translation strategies that achieve similar performance without surfacing that complexity in the Java API.

`equivalent(T, Object)` or `equivalent(T, T)`

There are two candidates for the signature of the `equivalent` function in `Equivalence`: `equivalent(T, Object)`, and `equivalent(T, T)`.

Using `equivalent(T, Object)` allows implementing `Object#equals` with less ceremony. We anticipate a larger fraction of uses of `Equivalence` being to implement `Object#equals` than the fraction of `Comparators` that implement `compareTo`, which is an argument for biasing towards this use-case. (The ease of implementing `Object#equals` is especially significant when used together with concise method bodies.)

``````public boolean equals(Object other) {
return EQ.equivalent(this, other);
}``````

or:

``public boolean equals(Object other) = EQ::equivalent;``

The advantage of `equivalent(T, T)` is that uses other than implementing `Object#equals` are cleaner, and provide some additional type-safety. It also avoids a decision about using `getClass()` vs. `instanceof`, since the type check and case will happen separately.

``````public boolean equals(Object other) {
return other instanceof Foo that && EQ.equivalent(this, that);
}``````

or:

``````public boolean equals(Object other) ->
other instanceof Foo that && EQ.equivalent(this, that);``````

Another option would be to use `equivalent(T, T)` and convert to an `Equivalence<Object>` before using it to implement `Object.equals`, to avoid the use-case casts. (But then the awkwardness is at the declaration instead, and it sacrifices additional type-safety.)

Appendix

Straw implementation

just a sketch to illustrate the idea

``````interface Equivalence<T> {

boolean equivalent(T self, Object other);

int hash(T self);

static <T> Equivalence<T> of(Class<T> clazz, Function<T, ?>... decomposers) {
return new Equivalence<T>() {
public boolean equivalent(T self, Object other) {
if (!clazz.isInstance(other)) {
return false;
}
T that = clazz.cast(other);
return Arrays.stream(decomposers)
.allMatch(d -> Objects.equals(d.apply(self), d.apply(that)));
}

public int hash(T self) {
return Arrays.stream(decomposers)
.map(d -> Objects.hashCode(d.apply(self)))
.reduce(17, (x, y) -> 31 * x + y);
}
};
}
}``````

Examples of bugs in `equals` and `hashCode` implementations

We have observed a wide array of bugs in implementations of `equals` and `hashCode` methods, often in the process of rolling out static analysis to prevent these issues.

Some of them seem obvious in hindsight, and unlikely to occur in code written by experienced Java developers, but invariably they have shown up in real-world code. One factor is that because `equals` and `hashCode` methods are seen as boilerplate they often aren’t reviewed with a lot of scrutiny, and bugs can creep in over time, especially as other parts of the class change.

• Overriding `Object.equals()`, but not `hashCode()`. (The contract for Object.hashCode states that if two objects are equal, then calling the hashCode() method on each of the two objects must produce the same result. Implementing equals() but not hashCode() makes that unlikely to be the case.)
• Equals implementations that unconditionally recurse. (`==` may have been intended instead of `this.equals(other)`.)
• Comparing mis-matched pairs of fields or getters, e.g. `a == that.a && b == that.a`.
• Equals implementations that throw a `NullPointerException` when given a `null` argument. (They should return `false` instead.)
• Equals implementations that throw a `ClassCastException` when given an argument of the wrong type. (They should return `false` instead.)
• Implementation equals by delegating to `hashCode()`. (Hashes collide frequently, so this will lead to false positives.)
• Considering state in `hashCode()` that is not tested in the corresponding `equals()` method. (Objects that are equal must have the same `hashCode()`.)
• `equals` and `hashCode` implementations that use reference equality or hashCode for array members. (They likely intended value equality and hashCode.)
• Other bugs (which are out of scope for the proposal): usage errors like comparing two statically different types, or non-local errors with definitions (e.g. overriding equals and changing semantics, breaking substitutability)