CustomizationExample README

John Rose

August, 2020

This set of classes and benchmarks is intended to model the use of ClassValue and MethodHandle APIs as runtime support for future generic specialization.

The idea is to use a runtime “hook” such as a ClassValue or a bootstrap method to lazily produce customized versions of reusable methods. The methods themselves don’t change but, in each specialized context, a customized optimization can take into account details such as types of parameters, return values, receivers, and array elements.

This example code uses many small subclasses inheriting from a common superclass and reusing the methods of that superclass. The superclass has bootstrap-like functionality which “spins up” method handles for specific list element types, including primitive types.
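The “spin up” step can be sketched with a ClassValue keyed by element type. This is a minimal illustration under stated assumptions, not the code in BasicCustomList: the class name ElementLoaders and its members are hypothetical, and only standard java.lang.invoke and java.lang.reflect calls are used.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Array;

// Hypothetical helper for illustration; not part of this example's sources.
final class ElementLoaders {
    // One array-load handle per element type, computed lazily on first use.
    static final ClassValue<MethodHandle> ARRAY_LOAD = new ClassValue<MethodHandle>() {
        @Override
        protected MethodHandle computeValue(Class<?> elementType) {
            // For int.class this produces a handle of type (int[],int)int.
            Class<?> arrayType = Array.newInstance(elementType, 0).getClass();
            return MethodHandles.arrayElementGetter(arrayType);
        }
    };

    // Generic entry point: the call site has type (Object,int)Object, so
    // MethodHandle.invoke adapts the handle and boxes primitive results.
    static Object load(Class<?> elementType, Object array, int i) {
        try {
            return ARRAY_LOAD.get(elementType).invoke(array, i);
        } catch (Throwable ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

With this sketch, ElementLoaders.load(int.class, new int[] {10, 20, 30}, 1) returns a boxed Integer 20, and repeated lookups for the same element type return the same cached handle.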

Since this example works apart from Valhalla, there are no inline types, but it would be very natural to add specialized subclasses for inline types wherever primitive specializations occur in this example.

It is expected that in a future version of generic specialization, there would be only a single specializable generic class containing methods (instead of the present superclass), and a number of “species” automatically derived from that single class (instead of the present subclasses).

On the simple benchmark included, the performance of the subclasses varies widely. Hand-specialization (that is, manual copy/paste/edit) gains about a 10x speedup; conversely the use of “automagic” ClassValue and MethodHandle mechanisms slows things down by almost 10x. This is almost certainly due to a lack of constant folding and subsequent specialization and/or inlining. Here is the hot method (in BasicCustomList, the common source of implementations):

    public T get(int i) {
        return elementTypeMethods().arrayLoadT(elementArray, i);
    }
    //where
    TypeMethods<T> elementTypeMethods() {
        return (TypeMethods) TYPE_METHODS.get(elementType());
    }
    //and
    class TypeMethods<T> { ...
        final MethodHandle arrayLoadT;
        T arrayLoadT(Object a, int i) {
            try {
                return (T) arrayLoadT.invoke(a, i);
            } catch (Throwable ex) {
                throw throwElseAssert(ex);
            }
        }
    }
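For contrast, a hand-specialized counterpart of the same accessor might look like the following. This is only a sketch of the copy/paste/edit style of specialization; the class name IntCustomList and its members are hypothetical and are not taken from the benchmark sources.

```java
// Hypothetical hand-specialized list for int elements; illustrative only.
final class IntCustomList {
    private final int[] elementArray;

    IntCustomList(int... elements) {
        this.elementArray = elements.clone();
    }

    int size() {
        return elementArray.length;
    }

    // Direct array load: no ClassValue lookup, no MethodHandle call, no boxing.
    int get(int i) {
        return elementArray[i];
    }

    // A simple hot loop over the specialized accessor.
    static long sum(IntCustomList list) {
        long total = 0;
        for (int i = 0; i < list.size(); i++) {
            total += list.get(i);
        }
        return total;
    }
}
```

Here the element type is fixed in the source, so the JIT has nothing to fold: get compiles to a plain bounds-checked array load.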

For the shared get method above to go fast, several things need to happen:

1. The self-reflective query this.elementType() needs to constant fold. This is simple, as long as we “trust” whatever final variable is doing the bookkeeping. (See JDK-8233873: final field values should be trusted as constant.)
2. The lazy method handle spinning, done by elementTypeMethods, needs to be constant-folded, as long as the ClassValue has a binding for the desired element type. (See JDK-8238260: ClassValue.get should be optimized like MutableCallSite.getTarget.)
3. The method handle call (hidden in arrayLoadT) should also constant fold. This is another place where final fields need to be trusted (JDK-8233873 again).
4. The polymorphic invocation (MethodHandle::invoke) must be customized to the expected type, as if the user had guessed the correct static type and used exact invocation (MethodHandle::invokeExact). Efficient exact invocation of constant method handles is a solved problem.
5. If the element type in question is a primitive, redundant boxing and unboxing need to be removed. (This assumes that the hot loop “knows” that the list values are a specific primitive. This is true in our benchmark, and can be assumed also in the future when customization is already being applied to the hot loop.) Currently, box elimination depends sensitively on an exhaustive inlining of related code; it often fails in the presence of out-of-line calls. This may need more work.
6. In the unlikely case where everything inlines correctly, from the benchmark (or real-world application) through a hot loop to a call to a known species (subclass) of BasicCustomList, then the previous steps are probably enough to get performance comparable to hand-customized classes. But if there are out-of-line calls, then more work needs to be done.

6a. If the containing loop is reached by an out-of-line call, it needs to be customized. If that loop is in a virtual method (say, in the java.util.stream runtime), customization should apply to whatever customizing container is executing the loop; hypothetically this could be a self-customizing stream.

6b. If the containing loop for some reason needs to call the hot method out-of-line, then the hot method should be customized, and there must be a hot path to reach it quickly. (See steps 1-4 above.)

7. The v-table entry for a customized method (from 6a or 6b) should itself be customized (especially in case 6b), so that hot calls reach the customized code more directly (as a side-effect of v-table dispatch). The shared code as originally defined in the superclass should be reserved only for cold paths or fallbacks from customized code.
8. The calling sequence for an out-of-line call to customized code should be adjusted to avoid boxing, if parameters and/or return values are customized to primitives or inline objects.
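The difference between generic and exact invocation of a constant method handle, and the boxing involved for primitives, can be illustrated with a small sketch. The class ExactVsGeneric and its method names are hypothetical; the handle is held in a static final field on the assumption that the JIT treats such fields as constants, which is the kind of trust that JDK-8233873 asks for with instance final fields.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

// Hypothetical illustration; not part of this example's sources.
final class ExactVsGeneric {
    // Constant handle of type (int[],int)int.
    static final MethodHandle INT_LOAD = MethodHandles.arrayElementGetter(int[].class);

    // Generic invocation: the call site has type (Object,int)Object, so the
    // handle is adapted on the fly and the int result is boxed to Integer.
    static Object loadGeneric(Object array, int i) {
        try {
            return INT_LOAD.invoke(array, i);
        } catch (Throwable ex) {
            throw new IllegalStateException(ex);
        }
    }

    // Exact invocation: the call site has type (int[],int)int, which matches
    // the handle's type exactly, so no adaptation or boxing is requested.
    static int loadExact(int[] array, int i) {
        try {
            return (int) INT_LOAD.invokeExact(array, i);
        } catch (Throwable ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

Both methods read the same element, but loadGeneric hands back an Integer that the caller must unbox, while loadExact stays in primitive int throughout. Customizing the generic call site to the exact type, and then removing the leftover box, is what the numbered steps above ask the JIT to do automatically.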