Generating BMH$Species classes with a jlink plugin
==================================================

With the Indify String Concat (ISC) JEP (http://openjdk.java.net/jeps/280) come various strategies for generating very efficient string concatenation code in Java. However, the strategy with the best throughput potential also has a bit of a startup issue, since it pulls in more of the java.lang.invoke machinery and depends more heavily on indy to function.

For example, this little program:

public class HelloConcat {

    public static String value = "Concat!";
    public static int i = 17;
    public static float f = 17.0f / 5.0f;
    public static double d = 17.0 / 5.0;
    public static boolean b = true;
    public static String value2 = " Still here?";

    public static void main(String... args) throws Exception {
        System.out.println("Hello " + value);
        System.out.println("int: " + i);
        System.out.println("bool: " + b);
        System.out.println("float: " + f);
        System.out.println("double: " + d);
        System.out.println("Hello " + value + " " + value2);
    }
}

... run with:

java -Djava.lang.invoke.stringConcat=MH_INLINE_SIZED_EXACT -XX:+UseParallelGC HelloConcat

... takes about 298ms on my machine. Ouch!

(Granted: this is a dual-socket machine where a simple Hello World barely gets anywhere near 100ms without tuning heavily for startup, e.g., pinning to one socket, and jigsaw alone adds about 60ms at the time of writing.)

Well, if we run with -Xlog:classload we notice these:

[0,188s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L source: jrt:/java.base
[0,213s][info][classload] java.lang.invoke.BoundMethodHandle$Species_LL source: __JVM_DefineClass__
[0,233s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L3 source: __JVM_DefineClass__
[0,238s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L4 source: __JVM_DefineClass__
[0,245s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L5 source: __JVM_DefineClass__
[0,249s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L6 source: __JVM_DefineClass__
[0,252s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L6I source: __JVM_DefineClass__
[0,257s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L6II source: __JVM_DefineClass__
[0,260s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L6IIL source: __JVM_DefineClass__
[0,290s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L7 source: __JVM_DefineClass__
[0,292s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L8 source: __JVM_DefineClass__
[0,295s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L9 source: __JVM_DefineClass__
[0,297s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L10 source: __JVM_DefineClass__
[0,300s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L10I source: __JVM_DefineClass__
[0,304s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L10II source: __JVM_DefineClass__
[0,306s][info][classload] java.lang.invoke.BoundMethodHandle$Species_L10IIL source: __JVM_DefineClass__

All except $Species_L are generated at runtime, and spinning up such classes is a known startup cost we have to pay to get applications using indy up and running. It also seems MH_INLINE_SIZED_EXACT is hungry for those Species classes.

These classes are generated dynamically because we don't know beforehand which ones a given program will need, and for some purposes, like embedded, we don't want to generate a lot of classes statically for footprint reasons.
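To make it a bit more concrete why java.lang.invoke gets pulled in at all: under JEP 280, javac compiles each String concatenation expression into an invokedynamic call bootstrapped by java.lang.invoke.StringConcatFactory. Here's a minimal sketch (the class and variable names are mine) that invokes that bootstrap by hand for the "Hello " + value case; with MH_INLINE_SIZED_EXACT the resulting call site target is assembled from method handles, which is where those BoundMethodHandle$Species classes come in:

import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

public class ConcatBootstrapSketch {
    public static void main(String... args) throws Throwable {
        // Roughly what the indy call site emitted for "Hello " + value
        // links to; \u0001 in the recipe marks one dynamic argument.
        CallSite cs = StringConcatFactory.makeConcatWithConstants(
                MethodHandles.lookup(),
                "concat",                                          // name is arbitrary for this bootstrap
                MethodType.methodType(String.class, String.class), // (String)String
                "Hello \u0001");
        String s = (String) cs.getTarget().invokeExact("Concat!");
        System.out.println(s); // Hello Concat!
    }
}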
Now, with jigsaw come many things, for example a way to generate things at link time using jlink plugins. Here's an experimental patch that adds such a plugin, allowing us to generate those BoundMethodHandle$Species classes at link time, with some degree of flexibility:

http://cr.openjdk.java.net/~redestad/scratch/bmh_species_gen.01/

jlink plugins like this run when generating the runnable image, e.g., the JDK itself, and can be turned on or off, or configured for specific purposes, depending on what's best for a specific deployment (a rough usage sketch follows at the end of this post).

Well, does it help? Using our little HelloConcat program to "benchmark" the difference:

time for i in {1..100}; do java -Djava.lang.invoke.stringConcat=MH_INLINE_SIZED_EXACT -XX:+UseParallelGC HelloConcat > /dev/null; done

Before:

real    0m29.823s
user    1m25.412s
sys     0m10.836s

After:

real    0m28.492s
user    1m14.540s
sys     0m9.996s

So this simple plugin gives us roughly a 13ms improvement in wallclock time per run and spends about 15% fewer cycles overall. Not bad, I guess, but there's still room for improvement. I think continuing to expand this plugin approach to cover more of java.lang.invoke could get us much closer to the goal of delivering a lot of cool, new high-performance features in JDK 9 while at the same time improving startup.
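For completeness, here's the rough usage sketch promised above. The image name is made up, and the option that actually enables the experimental plugin comes from the patch itself, so it isn't spelled out here (jlink --list-plugins shows what a given build offers); --module-path, --add-modules and --output are standard jlink options:

# Build a runtime image from the patched JDK; add the plugin's own option
# (from the patch) to pre-generate the Species classes into java.base.
$JDK/bin/jlink --module-path $JDK/jmods \
    --add-modules java.base \
    --output image-with-species

# Then point the same "benchmark" loop at the generated image's launcher:
time for i in {1..100}; do
    image-with-species/bin/java \
        -Djava.lang.invoke.stringConcat=MH_INLINE_SIZED_EXACT \
        -XX:+UseParallelGC HelloConcat > /dev/null
done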