This document describes the enhancements to the classfile format that are required to support type-specialization. As described in [1], the classfile format, in its current state, does not preserve enough type information to allow specialization of generic classes at runtime. To overcome this problem, the valhalla javac compiler [2] can decorate a specializable class with additional information, in the form of the bytecode attributes shown below, so that a relatively mechanical on-demand class specialization process can be defined.
Covers the new layered structure of the TypeVariablesMap attribute.
Covers new erasure_index field in the TypeVariablesMap attribute.
TypeVariablesMap attribute
The first thing a specializer runtime might need to know is which type-variables have been marked with the special modifier any in the corresponding source code. Since the source code is subject to type-erasure, all type information involving type-parameters is lost, meaning that an any type-variable is turned into an ordinary type-variable whose bound is simply Object. To make up for this information loss, we define a bytecode attribute, namely TypeVariablesMap, which stores all source-related flags associated with any given type-variable. The structure of this attribute is given below:
TypeVariablesMap_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u1 entries_length;
    {
        u2 owner_idx;
        u1 tvars_length;
        {
            u1 flags;
            u2 erasure_idx;
        } tvars_info[tvars_length];
    } entries_info[entries_length];
}
Here, entries_length denotes the number of type-variable mappings in this class/method declaration (at most 255 mappings are supported, since the field is a u1); each mapping is associated with a given owner, that is, the declaration that the type-variables in this mapping belong to. For this purpose, owner_idx points to a constant pool entry of kind CONSTANT_Utf8_info containing the string-based representation of the owning declaration (either a method or a class; see the example below). Each mapping contains tvars_length type-variables, where each type-variable T is associated with an 8-bit flags field (flags), in which only one bit is currently used (0 denotes a standard type-variable, 1 denotes an any type-variable), and with an index (erasure_idx) to a constant pool entry of kind CONSTANT_Utf8_info containing the erased signature of T. Consider the following source:
class Outer<any T> {
    <any Z> void m() {
        class Inner<any U> { }
    }
}
The above program generates two classfiles, one for the toplevel class Outer and one for the local class Inner. Let's look at the TypeVariablesMap attribute for Inner:
TypeVariablesMap:
  LOuter$1Inner;:
    Tvar  Flags  Erased bound
    U     [ANY]  Ljava/lang/Object;
  LOuter;::m()V:
    Tvar  Flags  Erased bound
    Z     [ANY]  Ljava/lang/Object;
  LOuter;:
    Tvar  Flags  Erased bound
    T     [ANY]  Ljava/lang/Object;
Note how the TypeVariablesMap attribute for Inner defines mappings for both the current and the enclosing type-variables (mappings are sorted from innermost to outermost). This allows for fast type-variable lookups; the alternative would have been to rely on the existing InnerClasses and EnclosingMethod attributes, which would require jumping between different classfiles.
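The layout described above can be decoded mechanically. The following is a minimal sketch, assuming the attribute body (the bytes that follow attribute_length) has already been extracted from the classfile; the names TypeVariablesMapReader, Entry and TvarInfo are hypothetical and are not part of the valhalla prototype. It also assumes that the single flag bit in use is the lowest-order bit, which the text does not state explicitly.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the body of a TypeVariablesMap attribute.
final class TypeVariablesMapReader {

    /** One type-variable: its flags byte and the constant pool index of its erased signature. */
    static final class TvarInfo {
        final int flags;
        final int erasureIdx;

        TvarInfo(int flags, int erasureIdx) {
            this.flags = flags;
            this.erasureIdx = erasureIdx;
        }

        /** Assumption: the 'any' marker is stored in the lowest-order bit. */
        boolean isAny() {
            return (flags & 1) != 0;
        }
    }

    /** One mapping: the constant pool index of the owner plus its type-variables. */
    static final class Entry {
        final int ownerIdx;
        final List<TvarInfo> tvars;

        Entry(int ownerIdx, List<TvarInfo> tvars) {
            this.ownerIdx = ownerIdx;
            this.tvars = tvars;
        }
    }

    /** Reads the attribute body; entries come back in file order (innermost owner first). */
    static List<Entry> read(byte[] body) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(body))) {
            int entriesLength = in.readUnsignedByte();        // u1 entries_length
            List<Entry> entries = new ArrayList<>();
            for (int i = 0; i < entriesLength; i++) {
                int ownerIdx = in.readUnsignedShort();        // u2 owner_idx
                int tvarsLength = in.readUnsignedByte();      // u1 tvars_length
                List<TvarInfo> tvars = new ArrayList<>();
                for (int j = 0; j < tvarsLength; j++) {
                    int flags = in.readUnsignedByte();        // u1 flags
                    int erasureIdx = in.readUnsignedShort();  // u2 erasure_idx
                    tvars.add(new TvarInfo(flags, erasureIdx));
                }
                entries.add(new Entry(ownerIdx, tvars));
            }
            return entries;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Resolving owner_idx and erasure_idx to strings requires the constant pool of the enclosing classfile, which is left out of this sketch.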
BytecodeMapping attribute
The runtime specializer needs to know which opcodes in the erased classfile need to be specialized; for instance, if the erased classfile performs an aload instruction, and the local variable has type any T in the source code, the specializer might need, for example, to replace the aload with an iload. To allow this rewriting in a straightforward fashion, we introduce an additional bytecode attribute, namely BytecodeMapping, which stores the bytecode offsets of all specializable opcodes in a given method. Extra type information is also stored in this attribute, so that the original (unerased) type information can be reconstructed by the specializer. The structure of this attribute is given below:
BytecodeMapping_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 mapping_length;
    {
        u2 bc_offset;
        u2 cp_idx;
    } mappings[mapping_length];
}
Here, mapping_length denotes the number of mappings in this attribute; the mappings are stored in an array (mappings) of size mapping_length, where each mapping is a tuple of two elements: a bytecode offset (bc_offset) and an index to a constant pool entry of kind CONSTANT_Utf8_info (cp_idx). The cp_idx field is crucial to retrieve unerased type-information associated with a given opcode; this information might be required by the specializer in order to emit correct opcodes/constant pool entries in the specialized classfiles. An overview of the possible specializable opcodes, along with the type information associated with them, is given in the following table (in this table we use the term 'type' to denote an unerased type signature):
opcode | category | Utf8 value |
---|---|---|
aloadXX | 1 | local variable type |
astoreXX | 1 | top-of-stack element type |
aaload | 1 | array element type |
aastore | 1 | array element type |
areturn | 1 | enclosing method return type |
dupXX | 1 | top-of-stack element type |
if_acmpXX | 1 | top-of-stack element type |
new | 2 | class type |
anewarray | 2 | array type |
multianewarray | 2 | array type |
ldc | 2 | class literal type |
checkcast | 2 | cast type |
instanceof | 2 | instanceof type |
XXfield | 3 | instantiated field descriptor |
invokeXX | 3 | instantiated method descriptor |
As can be seen, specializable opcodes are divided into three main categories. Opcodes in the first category (such as aload) can be specialized only if the associated unerased type is either an any type-variable or an array type whose element type is an any type-variable; opcodes in the second category can be specialized if the associated unerased type is a class type where at least one type-parameter is an any type-variable (or an array thereof).
In the third category we find all opcodes associated with member access (field access/method call). Such opcodes are specializable only if the unerased selector type is a class type where at least one type-parameter is an any type-variable (or an array thereof). Note that, as the specializer might need to emit specialized constant pool entries, the associated Utf8 entry needs to store information about both the unerased member owner type and the unerased member type (after all relevant type-substitution has occurred). The two signatures (owner and member type) are concatenated using the symbol :: (see the example in the following section).
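The BytecodeMapping layout given earlier in this section is a flat list of (bc_offset, cp_idx) pairs, so decoding it is straightforward. The following is a minimal sketch, assuming the attribute body (the bytes that follow attribute_length) is already available as a byte array; the class name BytecodeMappingReader is hypothetical and does not come from the valhalla prototype.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reader for the body of a BytecodeMapping attribute.
final class BytecodeMappingReader {

    /** Returns a map from bytecode offset (bc_offset) to constant pool index (cp_idx). */
    static Map<Integer, Integer> read(byte[] body) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(body))) {
            int count = in.readUnsignedShort();         // u2 number of mappings
            Map<Integer, Integer> mappings = new TreeMap<>();
            for (int i = 0; i < count; i++) {
                int bcOffset = in.readUnsignedShort();  // u2 bc_offset
                int cpIdx = in.readUnsignedShort();     // u2 cp_idx
                mappings.put(bcOffset, cpIdx);
            }
            return mappings;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A sorted map is used here only so that a specializer walking the method body in offset order can look up each instruction cheaply; any other map would do.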
In the following sections we present some examples to show how the BytecodeMapping attribute is used in practice. Some of those examples are based upon a slightly simplified version of the Box class in [1] given below:
class Box<any T> {
    T t;
    T get() { return t; }
}
aload, astore
The following generates two bytecode mappings (one for aload, one for astore) both pointing to the signature TT;.
<any T> void test(T t0) {
    t0 = t0;
}
Here's the relevant javap output:
<T extends java.lang.Object> void test(T);
descriptor: (Ljava/lang/Object;)V
flags:
Code:
stack=1, locals=2, args_size=2
0: aload_1
1: astore_1
2: return
BytecodeMapping:
Code_idx Signature
0: TT;
1: TT;
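To illustrate how a mapping entry could drive the rewriting of a category 1 opcode, the sketch below works on instruction mnemonics rather than raw opcode bytes. It assumes the type-variable is instantiated with a primitive type given as a descriptor character (Z, B, C, S, I, J, F or D); the class name Category1Rewriter is hypothetical. It covers only loads, stores, returns and array accesses; dup and if_acmpXX need type-dependent instruction sequences, and long/double instantiations also change local variable slot numbering, none of which is handled here.

```java
// Hypothetical helper: picks the primitive counterpart of an 'a'-prefixed instruction.
final class Category1Rewriter {

    /** Returns the specialized mnemonic, or the input unchanged if it is not handled here. */
    static String specialize(String mnemonic, char prim) {
        boolean arrayOp = mnemonic.equals("aaload") || mnemonic.equals("aastore");
        boolean simpleOp = mnemonic.startsWith("aload")
                || mnemonic.startsWith("astore")
                || mnemonic.equals("areturn");
        if (!arrayOp && !simpleOp) {
            return mnemonic;
        }
        // Replace the leading 'a' with the prefix for the primitive type.
        return prefix(prim, arrayOp) + mnemonic.substring(1);
    }

    /**
     * Local variables and return values of boolean, byte, char and short are
     * handled with int instructions; array elements have dedicated prefixes.
     */
    static char prefix(char prim, boolean arrayOp) {
        switch (prim) {
            case 'I': return 'i';
            case 'J': return 'l';
            case 'F': return 'f';
            case 'D': return 'd';
            case 'Z':
            case 'B': return arrayOp ? 'b' : 'i';
            case 'C': return arrayOp ? 'c' : 'i';
            case 'S': return arrayOp ? 's' : 'i';
            default:
                throw new IllegalArgumentException("not a primitive descriptor: " + prim);
        }
    }
}
```

With T instantiated to int, the two mapped instructions of the example above, aload_1 and astore_1, would become iload_1 and istore_1 under this scheme.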
aaload, aastore
The following generates (among others) two bytecode mappings (one for aaload, one for aastore) both pointing to the signature TT;.
<any T> void test(T[] tarr, T t) {
    t = tarr[0];
    tarr[0] = t;
}
Here's the relevant javap output:
<T extends java.lang.Object> void test(T[], T);
descriptor: ([Ljava/lang/Object;Ljava/lang/Object;)V
flags:
Code:
stack=3, locals=3, args_size=3
0: aload_1
1: iconst_0
2: aaload
3: astore_2
4: aload_1
5: iconst_0
6: aload_2
7: aastore
8: return
BytecodeMapping:
Code_idx Signature
2: TT;
3: TT;
6: TT;
7: TT;
areturn
The following generates (among others) a bytecode mapping for areturn pointing to the signature TT;.
<any T> T test(T t) {
    return t;
}
Here's the relevant javap output:
<T extends java.lang.Object> T test(T);
descriptor: (Ljava/lang/Object;)Ljava/lang/Object;
flags:
Code:
stack=1, locals=2, args_size=2
0: aload_1
1: areturn
BytecodeMapping:
Code_idx Signature
0: TT;
1: TT;
dup
The following generates (among others) a bytecode mapping for dup pointing to the signature TT;.
<any T> void test(T t1, T t2, T t3) {
    t1 = (t2 = t3);
}
Here's the relevant javap output:
<T extends java.lang.Object> void test(T, T, T);
descriptor: (Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)V
flags:
Code:
stack=2, locals=4, args_size=4
0: aload_3
1: dup
2: astore_2
3: astore_1
4: return
LineNumberTable:
line 4: 0
line 5: 4
BytecodeMapping:
Code_idx Signature
0: TT;
1: TT;
2: TT;
3: TT;
if_acmpne, if_acmpeq
The following generates (among others) two bytecode mappings (one for if_acmpne, one for if_acmpeq) both pointing to the signature TT;.
<any T> void test(T t1, T t2) {
    boolean b1 = t1 == t2;
    boolean b2 = t1 != t2;
}
Here's the relevant javap output:
<T extends java.lang.Object> void test(T, T);
descriptor: (Ljava/lang/Object;Ljava/lang/Object;)V
flags:
Code:
stack=2, locals=5, args_size=3
0: aload_1
1: aload_2
2: if_acmpne 9
5: iconst_1
6: goto 10
9: iconst_0
10: istore_3
11: aload_1
12: aload_2
13: if_acmpeq 20
16: iconst_1
17: goto 21
20: iconst_0
21: istore 4
23: return
BytecodeMapping:
Code_idx Signature
0: TT;
1: TT;
2: TT;
11: TT;
12: TT;
13: TT;
new
The following generates (among others) a bytecode mapping for new pointing to the signature LBox<TT;>;.
<any T> void test(T t) {
    new Box<T>();
}
Here's the relevant javap output:
<T extends java.lang.Object> void test(T);
descriptor: (Ljava/lang/Object;)V
flags:
Code:
stack=2, locals=2, args_size=2
0: new #2 // class Box
3: dup
4: invokespecial #3 // Method Box."<init>":()V
7: pop
8: return
BytecodeMapping:
Code_idx Signature
0: LBox<TT;>;
4: LBox<TT;>;::()V
anewarray, multianewarray
The following generates two bytecode mappings (one for anewarray, one for multianewarray) each pointing to the corresponding unerased array signature - [TZ; and [[TZ;, respectively.
<any Z> void test() {
    Z[] arr1 = new Z[2];
    Z[][] arr2 = new Z[2][4];
}
Here's the relevant javap output:
<Z extends java.lang.Object> void test();
descriptor: ()V
flags:
Code:
stack=2, locals=3, args_size=1
0: iconst_2
1: anewarray #2 // class java/lang/Object
4: astore_1
5: iconst_2
6: iconst_4
7: multianewarray #3, 2 // class "[[Ljava/lang/Object;"
11: astore_2
12: return
BytecodeMapping:
Code_idx Signature
1: [TZ;
7: [[TZ;
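Unerased signatures such as [TZ; or LBox<TT;>; only become useful to a specializer once each type-variable reference is replaced by a concrete type. The following is a minimal sketch of such a substitution over the signature string; the class name SignatureSubstitution is hypothetical, and the shape of the resulting string (for instance LBox<I>;) is purely illustrative, since this document does not define how specialized signatures are spelled. The scanner is not a full signature parser: it only skips class and inner-class names so that a T inside a name is not mistaken for a type-variable.

```java
import java.util.Map;

// Hypothetical helper: replaces type-variable references of the form T<name>; in a signature.
final class SignatureSubstitution {

    /** Type-variables without a binding are copied through unchanged. */
    static String substitute(String sig, Map<String, String> bindings) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < sig.length()) {
            char c = sig.charAt(i);
            if (c == 'T') {
                // Type-variable reference: 'T', identifier, ';'
                int end = sig.indexOf(';', i);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated type-variable in " + sig);
                }
                String name = sig.substring(i + 1, end);
                String bound = bindings.get(name);
                out.append(bound != null ? bound : sig.substring(i, end + 1));
                i = end + 1;
            } else if (c == 'L' || c == '.') {
                // Class or inner-class name: copy verbatim up to '<' or ';'.
                do {
                    out.append(sig.charAt(i++));
                } while (i < sig.length() && sig.charAt(i) != '<' && sig.charAt(i) != ';');
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

For the example above, binding Z to the descriptor I turns [TZ; into [I and [[TZ; into [[I, which are the descriptors of int[] and int[][].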
ldc
The following generates a bytecode mapping for ldc pointing to the signature LBox<TT;>;.
<any T> void test() {
    Class<?> c = Box<T>.class;
}
Here's the relevant javap output:
<T extends java.lang.Object> void test();
descriptor: ()V
flags:
Code:
stack=1, locals=2, args_size=1
0: ldc #2 // class Box
2: astore_1
3: return
BytecodeMapping:
Code_idx Signature
0: LBox<TT;>;
checkcast, instanceof
The following generates two bytecode mappings (one for checkcast, one for instanceof) both pointing to the signature LBox<TZ;>;.
<any Z> void test() {
    Object o = (Box<Z>)null;
    boolean b = (o instanceof Box<Z>);
}
Here's the relevant javap output:
<Z extends java.lang.Object> void test();
descriptor: ()V
flags:
Code:
stack=1, locals=3, args_size=1
0: aconst_null
1: checkcast #2 // class Box
4: astore_1
5: aload_1
6: instanceof #2 // class Box
9: istore_2
10: return
LineNumberTable:
line 4: 0
line 5: 5
line 6: 10
BytecodeMapping:
Code_idx Signature
1: LBox<TZ;>;
6: LBox<TZ;>;
getfield, invokevirtual
The following generates (among others) two bytecode mappings (one for getfield, one for invokevirtual) each pointing to the corresponding unerased member descriptor - LBox<TZ;>;::TZ; and LBox<TZ;>;::()TZ;, respectively.
<any Z> void test(Box<Z> bz) {
    Z z = bz.t;
    z = bz.get();
}
Here's the relevant javap output:
<Z extends java.lang.Object> void test(Box<Z>);
descriptor: (LBox;)V
flags:
Code:
stack=1, locals=3, args_size=2
0: aload_1
1: getfield #2 // Field Box.t:Ljava/lang/Object;
4: astore_2
5: aload_1
6: invokevirtual #3 // Method Box.get:()Ljava/lang/Object;
9: astore_2
10: return
LineNumberTable:
line 4: 0
line 5: 5
line 6: 10
BytecodeMapping:
Code_idx Signature
1: LBox<TZ;>;::TZ;
4: TZ;
6: LBox<TZ;>;::()TZ;
9: TZ;
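Category 3 entries such as LBox<TZ;>;::()TZ; pack two signatures into one Utf8 string, so a specializer has to split them before it can rebuild the member reference. The following is a minimal sketch, assuming the first occurrence of :: is always the separator between owner and member; the class name MemberSignature is hypothetical.

```java
// Hypothetical holder for the two halves of a category 3 Utf8 entry.
final class MemberSignature {
    final String owner;   // unerased owner type, e.g. LBox<TZ;>;
    final String member;  // unerased field type or method descriptor

    private MemberSignature(String owner, String member) {
        this.owner = owner;
        this.member = member;
    }

    /** Splits "owner::member" at the first "::" (assumed to be the separator). */
    static MemberSignature parse(String utf8) {
        int sep = utf8.indexOf("::");
        if (sep < 0) {
            throw new IllegalArgumentException("missing '::' separator: " + utf8);
        }
        return new MemberSignature(utf8.substring(0, sep), utf8.substring(sep + 2));
    }

    /** A method descriptor starts with '('; anything else is treated as a field type. */
    boolean isMethod() {
        return member.startsWith("(");
    }
}
```

Applied to the two category 3 entries above, the getfield entry yields owner LBox<TZ;>; with member TZ;, and the invokevirtual entry yields the same owner with member ()TZ;.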