This set of classes and benchmarks is intended to model the use of ClassValue and MethodHandle APIs as runtime support for future generic specialization.
The idea is to use a runtime “hook” such as a ClassValue or a bootstrap method to lazily produce customized versions of reusable methods. The methods themselves don’t change, but in each specialized context a customized optimization can take into account details such as the types of parameters, return values, receivers, and array elements.
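As a minimal sketch of such a hook, assuming only the standard java.lang.invoke API: a ClassValue computes a customized method handle (here just an array element getter) the first time an array class is seen, and returns the cached handle afterward. The names LazyHookDemo, ARRAY_LOAD and SPINS are hypothetical and are not part of this code base.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical runtime "hook": a ClassValue that lazily spins a customized
// method handle the first time a given array class is requested.
public class LazyHookDemo {
    // Counts how many times computeValue actually ran (illustration only).
    static final AtomicInteger SPINS = new AtomicInteger();

    static final ClassValue<MethodHandle> ARRAY_LOAD = new ClassValue<MethodHandle>() {
        @Override
        protected MethodHandle computeValue(Class<?> arrayType) {
            SPINS.incrementAndGet();
            // The handle's type is (arrayType,int)componentType,
            // e.g. (int[],int)int for int[].class, so no boxing is involved.
            return MethodHandles.arrayElementGetter(arrayType);
        }
    };

    public static void main(String[] args) throws Throwable {
        int[] ints = {10, 20, 30};
        MethodHandle load = ARRAY_LOAD.get(int[].class);  // first request: spins
        int x = (int) load.invokeExact(ints, 1);          // exact type (int[],int)int
        ARRAY_LOAD.get(int[].class);                      // cached: no second spin
        ARRAY_LOAD.get(String[].class);                   // new class: one more spin
        System.out.println(x + " " + SPINS.get());        // prints "20 2"
    }
}
```

The laziness matters because only element types actually used at run time ever get a customized handle.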
This example code uses many small subclasses inheriting from a common superclass and reusing the methods of that superclass. The superclass has bootstrap-like functionality which “spins up” method handles for specific list element types, including primitive types.
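The superclass/subclass shape can be sketched in miniature as follows. All names here (SpeciesDemo, BaseList, IntList, StringList, LOADERS) are hypothetical stand-ins, not the actual classes: each species overrides nothing but its element type, while the shared get() asks a ClassValue for a per-element-type method handle, including one for the primitive int.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Array;

// Hypothetical miniature of "common superclass + many small subclasses".
public class SpeciesDemo {
    // The superclass owns all reusable methods.
    static abstract class BaseList<T> {
        final Object elementArray;  // an int[], String[], ... depending on the species
        BaseList(Object elementArray) { this.elementArray = elementArray; }

        // Each species (subclass) reports its element type; nothing else is overridden.
        abstract Class<?> elementType();

        // Bootstrap-like functionality: one loader per element type, spun lazily.
        private static final ClassValue<MethodHandle> LOADERS = new ClassValue<MethodHandle>() {
            @Override
            protected MethodHandle computeValue(Class<?> elemType) {
                Class<?> arrayType = Array.newInstance(elemType, 0).getClass();
                MethodHandle getter = MethodHandles.arrayElementGetter(arrayType);
                // Erase to the uniform shape (Object,int)Object; for a primitive
                // element type this adapter boxes the loaded value.
                return getter.asType(MethodType.methodType(Object.class, Object.class, int.class));
            }
        };

        @SuppressWarnings("unchecked")
        public T get(int i) {
            try {
                return (T) (Object) LOADERS.get(elementType()).invokeExact(elementArray, i);
            } catch (RuntimeException | Error e) {
                throw e;
            } catch (Throwable t) {
                throw new IllegalStateException(t);
            }
        }
    }

    // Species: tiny subclasses that only pin down the element type.
    static final class IntList extends BaseList<Integer> {
        IntList(int... a) { super(a); }
        @Override Class<?> elementType() { return int.class; }
    }

    static final class StringList extends BaseList<String> {
        StringList(String... a) { super(a); }
        @Override Class<?> elementType() { return String.class; }
    }

    public static void main(String[] args) {
        IntList ints = new IntList(1, 2, 3);
        StringList strs = new StringList("a", "b");
        System.out.println(ints.get(2) + " " + strs.get(0));  // prints "3 a"
    }
}
```

Because the shared handle is erased to (Object,int)Object, the int species pays for boxing on every get() unless the JIT can see through the whole chain.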
Since this example works apart from Valhalla, there are no inline types, but it would be very natural to add specialized subclasses for inline types wherever primitive specializations occur in this example.
It is expected that in a future version of generic specialization, there would be only a single specializable generic class containing methods (instead of the present superclass), and a number of “species” automatically derived from that single class (instead of the present subclasses).
On the simple benchmark included, the performance of the subclasses varies widely. Hand-specialization (that is, manual copy/paste/edit) yields about a 10x speedup; conversely, the use of “automagic” ClassValue and MethodHandle mechanisms slows things down by almost 10x. This is almost certainly due to a lack of constant folding and subsequent specialization and/or inlining. Here is the hot method (in BasicCustomList, the common source of implementations):
    public T get(int i) {
        return elementTypeMethods().arrayLoadT(elementArray, i);
    }
    //where
    TypeMethods<T> elementTypeMethods() {
        return (TypeMethods) TYPE_METHODS.get(elementType());  // ClassValue lookup
    }
    //and
    class TypeMethods<T> { ...
        final MethodHandle arrayLoadT;
        T arrayLoadT(Object a, int i) {
            try {
                return (T) arrayLoadT.invoke(a, i);  // polymorphic (inexact) invocation
            } catch (Throwable ex) {
                throw throwElseAssert(ex);
            }
        }
    }
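For contrast, a hand-specialized species (the manual copy/paste/edit approach credited above with roughly a 10x gain) might look like the hypothetical IntCustomList below. It is a sketch, not the benchmark's actual class: the array field is statically typed int[], so the getInt path has no ClassValue lookup, no method handle call, and no boxing.

```java
// Hypothetical hand-specialized species: the shared get() copied and
// edited for int elements, so the array type is statically known.
public class HandSpecializedDemo {
    static final class IntCustomList {
        private final int[] elementArray;  // statically typed: no casts, no method handles

        IntCustomList(int... elements) { this.elementArray = elements.clone(); }

        int size() { return elementArray.length; }

        // A plain array load; nothing to constant-fold or inline first.
        int getInt(int i) { return elementArray[i]; }

        // A generic-looking entry point can still exist, but it boxes the result.
        Integer get(int i) { return elementArray[i]; }
    }

    // A hot loop written against the specialized type uses the unboxed path.
    static long sum(IntCustomList list) {
        long total = 0;
        for (int i = 0; i < list.size(); i++) {
            total += list.getInt(i);
        }
        return total;
    }

    public static void main(String[] args) {
        IntCustomList list = new IntCustomList(1, 2, 3, 4);
        System.out.println(sum(list));  // prints 10
    }
}
```

The cost of this approach is duplication: every species needs its own edited copy, which is exactly what ClassValue and MethodHandle customization is meant to avoid.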
In order for this to go fast, several things need to happen:
1. The self-reflective query this.elementType() needs to be constant-folded. This is simple, as long as we “trust” whatever final variable is doing the bookkeeping. (See JDK-8233873: final field values should be trusted as constant.)
2. The lazy method handle spinning, done by elementTypeMethods, needs to be constant-folded, as long as the ClassValue has a binding for the desired element type. (See JDK-8238260: ClassValue.get should be optimized like MutableCallSite.getTarget.)
3. The method handle call (hidden in helpers such as arrayLoadT above, or unboxT) should also constant-fold. This is another place where final fields need to be trusted (JDK-8233873 again).
4. The polymorphic invocation (MethodHandle::invoke) must be customized to the expected type, as if the user had guessed the correct static type and used exact invocation (MethodHandle::invokeExact). Efficient exact invocation of constant method handles is a solved problem.
5. If the element type in question is a primitive, redundant boxing and unboxing need to be removed. (This assumes that the hot loop “knows” that the list values are a specific primitive. This is true in our benchmark, and can also be assumed in the future, when customization is already being applied to the hot loop.) Currently, box elimination depends sensitively on exhaustive inlining of related code; it often fails in the presence of out-of-line calls. This may need more work.
6. In the unlikely case where everything inlines correctly, from the benchmark (or real-world application) through a hot loop to a call to a known species (subclass) of BasicCustomList, the previous steps are probably enough to get performance comparable to hand-customized classes. But if there are out-of-line calls, more work needs to be done.
6a. If the containing loop is reached by an out-of-line call, it needs to be customized. If that loop is in a virtual method (say, in the java.util.stream runtime), customization should apply to whatever customizing container is executing the loop; hypothetically this could be a self-customizing stream.
6b. If the containing loop for some reason needs to call the hot method out-of-line, then the hot method should be customized, and there must be a hot path to reach it quickly. (See steps 1-4 above.)
7. The v-table entry for a customized method (from 6a or 6b) should itself be customized (especially in case 6b), so that hot calls reach the customized code more directly (as a side effect of v-table dispatch). The shared code as originally defined in the superclass should be reserved for cold paths or fallbacks from customized code.
8. The calling sequence for an out-of-line call to customized code should be adjusted to avoid boxing when parameters and/or return values are customized to primitives or inline objects.
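Steps 4 and 5 can be illustrated at the source level with the hypothetical ExactInvokeDemo below. It shows only the difference in method types, not the JIT optimizations the steps call for. The erased loader has type (Object,int)Object, so invoke must adapt the call site, and semantically the int result is boxed and then unboxed. The customized loader has the exact type (int[],int)int and is called with invokeExact, which throws WrongMethodTypeException for any call site whose static type does not match.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.WrongMethodTypeException;

// Hypothetical contrast between inexact (invoke) and exact (invokeExact) calls.
public class ExactInvokeDemo {
    // Erased, shared-code shape (Object,int)Object: asType adds an adapter
    // that casts the array argument and boxes the int result.
    static final MethodHandle GENERIC_LOAD =
        MethodHandles.arrayElementGetter(int[].class)
            .asType(MethodType.methodType(Object.class, Object.class, int.class));

    // The same loader customized to the exact primitive shape (int[],int)int.
    static final MethodHandle INT_LOAD = MethodHandles.arrayElementGetter(int[].class);

    static int loadGeneric(Object a, int i) throws Throwable {
        // invoke adapts (Object,int)Object to the call site (Object,int)int,
        // unboxing the Integer produced inside GENERIC_LOAD.
        return (int) GENERIC_LOAD.invoke(a, i);
    }

    static int loadExact(int[] a, int i) throws Throwable {
        // invokeExact: the call-site type (int[],int)int must match exactly.
        return (int) INT_LOAD.invokeExact(a, i);
    }

    public static void main(String[] args) throws Throwable {
        int[] data = {7, 8, 9};
        System.out.println(loadGeneric(data, 0) + loadExact(data, 2));  // prints 16
        try {
            // Guessing the wrong static type fails fast instead of adapting.
            long bad = (long) INT_LOAD.invokeExact(data, 1);
        } catch (WrongMethodTypeException e) {
            System.out.println("exact mismatch rejected");
        }
    }
}
```

The exact call is what a customized species could emit once it “knows” its element type; the inexact call is what shared code must fall back on.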