-*- mode: outline -*- Fast and Decisive Class Checking in the Hotspot VM John Rose & Cliff Click, 8/2001 * Definitions DEF 0: A check is __decisive__ if it always produces an answer in a bounded number of CPU operations. A check is __semi-decisive__ if it always produces an answer without a VM call. DEF 1: A __klass__ is (the Hotspot VM representation of) any non-null reference type (Java class, interface, or array type). DEF 2: A klass T is a __primary supertype__ iff T is a loaded proper class, or an array of a primary supertype, or an array of primitive values. Note that this T is called a primary supertype independently of any subtypes it may have. Under this definition, even a final class can be called a primary supertype. We say "T is a primary supertype of S" iff T is a primary supertype and T is a supertype of S. Likewise for "T a secondary supertype of S". DEF 3: A klass T is a __secondary supertype__ iff T is a loaded interface, or an array of a secondary supertype. As before, such a T is intrinsically a secondary supertype, without reference to any subtypes it may have. When we wish to observe T with reference to some subtype S, we say "T is a secondary supertype of S." DEF 4: The __super depth__ of any klass S is zero if S has no primary supertypes, and otherwise is one plus the number of primary supertypes of S (exclusive of S). DEF 5: Given a type S with primary supertypes, the __direct primary supertype__ T of S is a "minimal" primary supertype of S. By "minimal" we mean that any other primary supertype T' of S is also a supertype of T. Note that if S has any primary supertypes, the direct primary supertype T exists and is unique. Thus the set of all primary supertypes forms a tree under the subtype relation. Therefore, "super depth" is tree depth within the primary supertype relation. * Examples Object, Object[], Object[][], int[], int[][], String, and String[] are primary supertypes. Comparable, Comparable[], and Comparable[][] are secondary supertypes. The super depth of Object is 0, of String and Number is 1, and of Integer is 2. The super depth of Comparable (or any interface) is 1. (In our VM, currently, every interface inherits directly from Object.) The primary supertypes of Integer are Object and Number. Its secondary supertypes are Comparable and Serializable. The primary supertypes of Integer[][] are (by increasing super depth): Object, Object[], Object[][], Number[][] The secondary supertypes of Integer[][] are (in no particular order): Cloneable, Serializable, Cloneable[], Serializable[], Comparable[][], Serializable[][] * Hotspot VM Data Structures The Klass::super_depth accessor of every klass S reports S's super depth. The Klass::_primary_supers field of a klass S provides a list of all primary supertypes of S. Its format will be described below. Its order, which is highly significant, is by increasing super depth. (The first element, if any, is always the Object klass.) The Klass::_secondary_supers field of a klass S provides a list of all secondary supertypes of S. It points to an objArray of klasses. An integer Klass::_super_check_offset will be added. It is described below. (These fields are new, after 1.4.) * Checking Primary Supertypes, Trial Version Thus, one can decisively determine if a test klass T is a supertype of an object klass S, by checking S's _primary_supers and _secondary_supers lists. Moreover, T is not in S's _primary_supers if it is a secondary supertype, and T is not on S's _secondary_supers list is a primary supertype. This means that, by inspecting T, we only need to seach one of S's lists. Moreover, if T is a primary supertype, its position on S's _primary_supers is determined by T's super depth, and without reference to S. For example, if T's super depth is 0 (i.e., T is java.lang.Object), then T will be the first element in S's _primary_supers, if it is there at all. Or, if T's super depth is 1 (e.g., T is java.lang.Throwable), T will be the second in S's _primary_supers. Now consider the operation S->super_of_depth(D) which produces the unique element of S's _primary_supers whose super depth is D. (Define the operation to produce NULL for D > S->super_depth().) Then the following formula is true for every klass S and primary supertype T: S->is_subtype_of(T) := (T == S->super_of_depth(T->super_depth())) These insights would allow us to reorganize S's list of primary super into an array indexed by super depth, leading immediately to a fast decisive test for all primary supertypes: S->super_of_depth(D) := D <= S->super_depth()? S->_primary_supers[D]: NULL This technique does not map perfectly to a few machine instructions. It includes an undesirable range check, and also the layout of Hotspot klasses does not allow us to place a variable-length array of supertypes at a fixed offset in the Klass layout. Also, it does not help much in the case where T is a secondary supertype. (It happens to produce the right answer for secondaries only if T==S.) * Checking Primary Supertypes, Production Version We can adjust the previous algorithm ensure that checking an unknown type S against a known primary supertype T can be done with one memory access, by making the _primary_supers an embedded array within the Klass layout, and giving it a statically fixed size (SUPER_LIMIT, e.g., 8). Then we can enjoy a one-load implementation of super_of_depth: S->super_of_depth(D) := S->_primary_supers[D] In effect, we wish to remove the need for range checks and floating arrays by putting a limit on the super depth of all klasses. This is easily done, by adjusting the definitions of primary and secondary supertypes: DEF 2': A klass is an __restricted primary supertype__ iff it is a loaded proper class, or an array of a primary supertype, or an array of primitive values, and also that klass's restricted super depth is less than SUPER_LIMIT. DEF 3': A klass is a __restricted secondary supertype__ iff it is a loaded interface, or an array of a secondary supertype, or any klass of restricted super depth SUPER_LIMIT. DEF 4': The __restricted super depth__ of a klass S is zero if S has no restricted primary supertypes, and otherwise is one plus the number of restricted primary supertypes of S (exclusive of S). Note that the definition of "restricted super depth" is recursively involved with the first two definitions, and that no klass ever has a restricted super depth larger than SUPER_LIMIT. All examples given above also hold true for the corresponding "restricted" definitions, assuming that SUPER_LIMIT is at least 4. By using these restricted definitions, we can constrain the size of the Klass::_primary_supers array. The type checks against restricted primary types will go faster because of the removal of indirections and range checks. Overflows in the primary array (because you are too deep in the type heirarchy) go into the secondary array. Type checks against very deep classes will be slow. * Checking Secondary Supertypes In order to quickly check secondary supertypes, we could introduce a global numbering on them, as IBM does with interfaces. It is easy to envision a decisive type check based on a global type numbering which indexes a two-dimensional bit table. However, this is difficult to do well in the presence of dynamic loading and unloading of types. A simpler technique which performs a linear search on the Klass::_secondary_supers list is slower, but still semi-decisive, because we can do a short linear search inline, without visiting the VM. It is also still reasonably fast, given the typically short length of supertype lists. (Note that creation of VM calls is a significant compile-time cost. Thus, semi-decisive techniques are usually preferable over non-decisive ones, even if the VM calls are dynamically very rare.) If we prepend cache checks to the linear search, we can avoid the search in most cases. A secondary supertype check, with a one-element supertype cache, looks like this: S->has_secondary_supertype(T) := { if (S->_secondary_super_cache == T) return true; for (int i = 0; i < S->_secondary_supers->_length; i++) if (S->_secondary_supers->at(i) == T) { S->_secondary_super_cache = T; return true; } return false; } * Combining the Checks The last problem to solve is quickly determining whether a given test klass is a primary or secondary supertype. This might be done by means of a bit in the header of the klass, etc. The best approach is to force the super depth property of restricted secondary supertypes to contain a distinguished value (SUPER_LIMIT). Note that super depth is only useful in primary supertypes, as a parameter to super_of_depth, and for restricted supertypes it is always less than SUPER_LIMIT. A further variation this approach gives the best code. Define a new field Klass::_super_check_offset, which holds the bytewise offset of a klassOop field relative to the base of the Klass layout. For a restricted primary supertype T, the _super_check_offset is the offset of Klass::_primary_supers[T->super_depth()]. For a restricted secondary supertype T, the _super_check_offset is the offset of Klass::_secondary_super_cache. This means that most type checks can be carried out in two memory references, without regard to whether the test type is primary or secondary: S->is_subtype_of(T) := { int off = T->_super_check_offset; klassOop sup = *(klassOop*)( (address)S + off ); if (sup == T) return true; // usually wins if (off != &Klass::_secondary_super_cache) return false; // always taken for primary supertype T if (S == T) return true; // S is not in its own super lists if (S->search_for_secondary(T)) { S->_secondary_super_cache = T; return true; } else return false; } Note that the second test determines whether the first test was decisive or not. This combined test is decisive for primary supertypes, in two memory references. Depending on the effectiveness of the _secondary_super_cache, it usually produces an answer in the same two memory references for secondary supertypes also. The code shape of this test is very easy to optimize. If T is constant, then its _super_check_offset may be constant folded, as can the second test. This is true in most cases, and allows most type checks to obtain an answer in one memory reference. Note that the first two tests are independent, and may be reordered if the CPU allows a compare to be issued concurrently with a branch on a previous compare: ld [%T + &Klass::_super_check_offset], %X1 cmp %X1, &Klass::_secondary_super_cache ld [%S + %X1], %X2 # %X1, %X2 can be the same temp reg bne done cmp %T, %X2 be done cmp %T, %S be done nop # slow case ... done: bne check_failed nop It is quite natural to place the _secondary_super_cache near the zero-th element of the _primary_supers array. If the _super_check_offset field is also placed near the array, then the entire check will use a minimum number of memory cache lines. The _not_super_cache, if any, should be located nearby also. * More On Secondary Supertype Arrays The final test in the code shape above performs a linear search over a second supertype array. It may be augmented by further cache checks, such as additional super type caches, or the two-word subtype cache in the 1.4 Hotspot VM. Note that the 1.4 subtype cache is independent of the supertype cache described above. (If the secondary supertype list is represented as an array contiguous with a multi-word supertype cache, it would be possible to search both together in the same loop. However, this would prevent sharing of secondary supertype lists, and the list as a whole could not be treated as a mutable cache, because MT programs could cause elements to be lost.) For the sake of the instanceof operator, we may desire to speed negative results on secondary supertypes by including a field in every klass S Klass::_not_super_cache, which contains the last secondary supertype that failed to be a subtype of S. When compiling this code, the final search_for_secondary operation may be hidden in a single graph node, and implemented with hand-written assembly instructions. Since it is uncommon, there is no need to worry about optimizing that code. Secondary supertype arrays (Klass::_secondary_supers) are immutable and hence may be shared between klasses with the same supertype lists. Arrays always have the same two-element list of secondary supertypes. We make sharing more likely among subclasses by refusing to put the klass itself on its own supertype lists, even though that requires an extra compare-and-branch in the shape of the type-checking code, to cover the case of T==S. * Comparisons with Jalapeno Compared with IBM's published claims on their experimental VM, our scheme is better in that it: + supports decisive checks for common array types, + handles all checks, usually, in one to two memory operations, + works well with class loading and unloading, and + is at least semi-decisive in all cases (no VM calls at all). Our scheme does not include decisive interface checks, but even the Jalapeno checks were slightly indecisive, in that an initial VM call is required to fill in the "maybe" entries in their trit table. * Checking Against Non-Constant Types Most checks compare a non-constant reference against a statically determined type, so that the "_super_check_offset" value is a compile time constant. This is true of checkcast and instanceof operations. The reflective method Class.isInstance can also produce checks against non-constant types. The compiler implements it as an intrinsic. In the absence of a constant class operand, the intrinsic code will produce a decisive answer in two memory references, for most classes and arrays, and not much more than two in most other cases. A Java reference array variable is polymorphic, in that the variable can point to various kinds of arrays. This leads to the infamous "store check" requirement, where every reference stored into an array must be checked against the element type of the array. This is inconvenient, because the element type may vary at runtime. The Jalapeno paper notes that nearly all array variables are monomorphic, in the sense that they hold (at runtime) references to objects of exactly the statically determined array type, and not a subtype thereof. This means that when an element is stored to an array, the array may be verified (in one memory reference) to be of exactly the expected type, and then the array element may be tested against the expected element type, as if by a checkcast. Thus, most array store checks can optimistically verify an element type in one memory reference. In those rare cases where the optimistic technique fails, because an array variable is polymorphic, the element type to be tested is non-constant, and must be loaded (in two memory references) from the array object. In this case, two more memory references will usually finish the array store check. * Conclusion We have described a type-checking mechanism which usually succeeds in one memory reference for instanceof and checkcast, one or two memory references for array store checks, and two memory references for Class.isInstance. For most proper classes and many array types (i.e., all restricted primary supertypes), the mechanism is fully decisive after one memory reference, when compiling instanceof or checkcast. The memory cost for new super lists and caches is 30-40 bytes per klass compared with 8 bytes for the 1.4 mechanism. The _secondary_supers array is almost identical with the existing _transitive_interfaces array (in instanceKlass), and can subsume it. The compilation costs are favorable, in that there are no slow paths with their associated call structure and register spills. The new type-checking mechanism is easy to implement in the Hotspot runtime. * Appendix: An Experiment with Spec The following numbers were derived by instrumenting the runtime and running the VM in -Xint mode with all type checks redirected to the runtime. (I.e., the interpreter's subtype cache logic was disabled.) The exact run was: java -Xint SpecApplication _200_check _202_jess _213_javac _228_jack Bottom line: Optimal FastSuperclassLimit for spec is 5. Robustly optimal setting is 2 or 3 more: 7 or 8. These numbers tell me that the 1-element secondary subtype cache is important: It hits on 6.5% of all queries, or 99.9% of all secondary queries. Basically it makes the cost of searching the secondary array disappear completely. Given that even in the best case a secondary array search requires two loads in addition to the first two loads, this means that the secondary cache saves at least 7% of the total cost of type checking. The cache also adds a margin of robustness to the performance of the VM, which is important since we do not know all the applications that will be performance sensitive. The "=secondary" case is not important to performance, but allows for (an unknown amount of) footprint reduction, by enabling sharing of more secondary arrays. The idea is that a class might be its own secondary supertype, but it can share the secondary supertypes array with its own superclass. It might be possible to get rid of the "=secondary" case with a small cost to footprint. We haven't measured footprint effects yet. (We haven't measured JBB effects yet.) Key: FastSuperclassLimit = # elements in primary supertype array subtype query = # type checks by interpreter (aastore, i'of, c'cast) subtype primary = # checks against 1ary super w/ true outcome subtype !primary = # checks against 1ary super w/ false outcome subtype cache = # checks against 2ary super w/ cache hit, true outcome subtype =secondary = # checks against 2ary super == self type, true outcome subtype !cache1 = # checks against 2ary w/ hit in negative cache #1 subtype !cache2 = # checks against 2ary w/ hit in negative cache #2 subtype secondary = # linear searches of 2ary array w/ true outcome subtype !secondary = # linear searches of 2ary array w/ false outcome subtype statistics: (FastSuperclassLimit=1) subtype query = 12504552 = 100.0% subtype primary = 1464914 = 11.7% subtype !primary = 0 = 0.0% subtype cache = 8640060 = 69.1% subtype =secondary = 587122 = 4.7% subtype !cache1 = 557221 = 4.5% subtype !cache2 = 432263 = 3.5% subtype secondary = 460346 = 3.7% subtype !secondary = 362626 = 2.9% (Note that with FastSuperclassLimit=1 the only "primary" superclass is java/lang/Object itself. Therefore, the 11.7% primary hits in this case are exactly all tests against Object itself. Since the javac compiler never emits instanceof or checkcast operations against that type, it follows that 11.7% of all type checking operations, in this experiment, were due to aastores to Object arrays.) subtype statistics: (FastSuperclassLimit=2) subtype query = 12504552 = 100.0% subtype primary = 8839736 = 70.7% subtype !primary = 377148 = 3.0% subtype cache = 1508841 = 12.1% subtype =secondary = 581350 = 4.6% subtype !cache1 = 539204 = 4.3% subtype !cache2 = 309926 = 2.5% subtype secondary = 222515 = 1.8% subtype !secondary = 125832 = 1.0% subtype statistics: (FastSuperclassLimit=3) subtype query = 12504552 = 100.0% subtype primary = 9668198 = 77.3% subtype !primary = 654151 = 5.2% subtype cache = 828912 = 6.6% subtype =secondary = 577967 = 4.6% subtype !cache1 = 451748 = 3.6% subtype !cache2 = 194151 = 1.6% subtype secondary = 77365 = 0.6% subtype !secondary = 52060 = 0.4% subtype statistics: (FastSuperclassLimit=4) subtype query = 12504552 = 100.0% subtype primary = 10016074 = 80.1% subtype !primary = 1188958 = 9.5% subtype cache = 804298 = 6.4% subtype =secondary = 287054 = 2.3% subtype !cache1 = 157502 = 1.3% subtype !cache2 = 4907 = 0.0% subtype secondary = 45016 = 0.4% subtype !secondary = 743 = 0.0% (The negative caches were experimental. These numbers indicate that they are completely unused by SpecApplication. Hypothetical applications which do if/then/else logic on interface types will want them.) subtype statistics: (FastSuperclassLimit=5) subtype query = 12504552 = 100.0% subtype primary = 10341031 = 82.7% subtype !primary = 1351410 = 10.8% subtype cache = 810323 = 6.5% subtype =secondary = 750 = 0.0% subtype !cache1 = 0 = 0.0% subtype !cache2 = 0 = 0.0% subtype secondary = 338 = 0.0% subtype !secondary = 700 = 0.0% subtype statistics: (FastSuperclassLimit=6) subtype query = 12504552 = 100.0% subtype primary = 10341349 = 82.7% subtype !primary = 1351431 = 10.8% subtype cache = 810275 = 6.5% subtype =secondary = 546 = 0.0% subtype !cache1 = 0 = 0.0% subtype !cache2 = 0 = 0.0% subtype secondary = 272 = 0.0% subtype !secondary = 679 = 0.0% subtype statistics: (FastSuperclassLimit=7) subtype query = 12504552 = 100.0% subtype primary = 10341667 = 82.7% subtype !primary = 1351451 = 10.8% subtype cache = 810202 = 6.5% subtype =secondary = 307 = 0.0% subtype !cache1 = 0 = 0.0% subtype !cache2 = 0 = 0.0% subtype secondary = 266 = 0.0% subtype !secondary = 659 = 0.0% subtype statistics: (FastSuperclassLimit=8) subtype query = 12504552 = 100.0% subtype primary = 10341709 = 82.7% subtype !primary = 1351457 = 10.8% subtype cache = 810202 = 6.5% subtype =secondary = 265 = 0.0% subtype !cache1 = 0 = 0.0% subtype !cache2 = 0 = 0.0% subtype secondary = 266 = 0.0% subtype !secondary = 653 = 0.0% subtype statistics: (FastSuperclassLimit=9) subtype query = 12504552 = 100.0% subtype primary = 10341709 = 82.7% subtype !primary = 1351457 = 10.8% subtype cache = 810202 = 6.5% subtype =secondary = 265 = 0.0% subtype !cache1 = 0 = 0.0% subtype !cache2 = 0 = 0.0% subtype secondary = 266 = 0.0% subtype !secondary = 653 = 0.0% subtype statistics: (FastSuperclassLimit=10) [fixpoint; identical to limit=9 case]