An L-type is a reference to a named class or interface type. (Its spelling in bytecode will begin with capital L, hence the name. It could more properly be called R-type, for a reference.)
An array type is a reference to an array, whose element type may be any field type. Array types are reference types but not L-types, in the terms of this document. In narrative we can say “array of T” or “T[]” for short. Thus “array of L-Foo” or “L-Foo[]” is an array type whose element type is the L-type of Foo.
For any given named class or interface Foo we say that L-Foo is the L-type of Foo. When the type is obviously not a value type we may omit the “L-” prefix, thus Object and Object[] for L-Object and L-Object[].
As dictated by today’s JVMS, any variable of L-type will accept null.
A Q-type is a reference to a value instance. All value instances have a named class, which is incorporated into the spelling of the Q-type. (For regularity we say “reference to a value”, but the reference is not detectable to the user; there are no identity or aliasing relations between values.)
For any given named value class Val we say that Q-Val is the Q-type of Val.
In addition to referring to value instances, variables of Q-types are never null. We say that Q-types are non-nullable, whereas L-types are nullable.
The JVM carefully tracks the distinction between L-types and Q-types and ensures that a Q-type variable will never contain a null, either by assignment or with an initial value. As usual, any L-type variable is allowed to contain null if a null is assigned, or present as an initial default value.
In addition, a well-formed Q-type name must incorporate the spelling of a class which resolves (in the context of uttering the Q-type) into a value class declaration. Thus, Q-Object and Q-Comparable and Q-String are not well-formed type names. The JVM makes sure to load and validate classes used in Q-types before those types are allowed to affect the JVM’s processing of data.
The value set of an L-type is all references to objects whose classes “appropriately match” the name of the L-type. Appropriate matching depends in detail on context and use of the L-type, but for classes (both object and value classes), appropriate matching means that the L-type can only refer to objects whose classes or super classes are named (perhaps after context-specific resolution of the name) by the L-type. (As is well-known, L-types of interfaces are not as constraining and do not entail a subtype match.)
The value set of a Q-type is all references to instances of the value class named by the Q-type (perhaps after context-specific resolution of the name). The value set of any Q-type is therefore identical to the value set of the L-type of the same name, except that the Q-type does not include null while the L-type does.
Nullability or non-nullability, therefore, is most directly a property of a value set, and by extension a container which accommodates a particular value set, or a type assigned to such a container. Nullability or non-nullability is never a property of any particular value of a container. It is a syntax error to speak of things like a “nullable object” or “non-nullable value”. Thus, nullability applies to variables, to containers of values, not individual values.
Variables of Q-type are said to reject null. The details of null rejection are complex and subject to revision, but the intended effect is for attempts to pass null into a Q-type variable to elicit a NullPointerException (NPE). For example:
Assigning one Q-type variable to another Q-type variable does not require any special processing, assuming the Q-types are of the same class. Again assuming identical value classes, assigning a Q-type variable to an L-type variable is also an empty operation, since the value set of the Q-type is a subset of that of the corresponding L-type. But assigning an L-type variable to a Q-type variable requires an explicit check of some sort, and may elicit an NPE if the check fails.
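The three assignment cases can be modeled in ordinary Java. This is a minimal sketch for illustration, not a description of how the JVM stores values: the class QVar is a hypothetical stand-in for a Q-type variable, and Objects.requireNonNull plays the role of the explicit check that an L-to-Q assignment needs.

```java
import java.util.Objects;

// Hypothetical model of a Q-type variable: a container whose value set
// excludes null. Illustration only; not JVM code.
final class QVar<T> {
    private T value;

    // A Q-type variable never starts out null; the caller supplies a
    // non-null initial value (standing in for the value class's default).
    QVar(T initial) {
        this.value = Objects.requireNonNull(initial, "Q-type variable rejects null");
    }

    T get() {
        return value;
    }

    // Q-to-Q assignment: the source can never hold null, so no check is needed.
    void assignFromQ(QVar<? extends T> source) {
        this.value = source.get();
    }

    // Q-to-L assignment: a plain read, since every Q value is a valid L value.
    T readAsL() {
        return value;
    }

    // L-to-Q assignment: the source may hold null, so an explicit check is
    // required, and a failed check elicits an NPE.
    void assignFromL(T source) {
        this.value = Objects.requireNonNull(source, "Q-type variable rejects null");
    }
}
```

Only the L-to-Q path pays for a check; the other two directions are plain copies.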
A bytecode descriptor is a string (backed by a CONSTANT_Utf8) which when parsed denotes one or more types for methods, fields, and the like.
A Q-descriptor is a string (backed by a CONSTANT_Utf8) which when parsed denotes a Q-type. There is always enough contextual information to validate that the class named by the Q-type is in fact a value class.
A Q-descriptor is spelled by replacing the leading letter 'L' in the corresponding L-descriptor with the letter 'Q'.
DESCRIPTOR[L-Val] := 'L' + CLASSFILE_NAME[Val] + ';'
DESCRIPTOR[Q-Val] := 'Q' + CLASSFILE_NAME[Val] + ';'
DESCRIPTOR[rank-N-array-of-L-Val] := ('[')**N DESCRIPTOR[L-Val]
DESCRIPTOR[rank-N-array-of-Q-Val] := ('[')**N DESCRIPTOR[Q-Val]
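The DESCRIPTOR rules translate directly into a few string functions. This is a sketch: the class Descriptors and its methods are hypothetical, and CLASSFILE_NAME is assumed to be the usual internal form of a binary class name, with '.' replaced by '/'.

```java
// Hypothetical helpers mirroring the DESCRIPTOR rules above.
final class Descriptors {
    // Assumed CLASSFILE_NAME: internal form, e.g. "pkg.Val" -> "pkg/Val".
    static String classfileName(String binaryName) {
        return binaryName.replace('.', '/');
    }

    // DESCRIPTOR[L-Val] := 'L' + CLASSFILE_NAME[Val] + ';'
    static String lDescriptor(String binaryName) {
        return "L" + classfileName(binaryName) + ";";
    }

    // DESCRIPTOR[Q-Val] := 'Q' + CLASSFILE_NAME[Val] + ';'
    static String qDescriptor(String binaryName) {
        return "Q" + classfileName(binaryName) + ";";
    }

    // DESCRIPTOR[rank-N-array-of-X] := ('[')**N DESCRIPTOR[X]
    static String arrayDescriptor(int rank, String elementDescriptor) {
        if (rank < 1) {
            throw new IllegalArgumentException("rank must be positive");
        }
        return "[".repeat(rank) + elementDescriptor;
    }
}
```

For example, qDescriptor("pkg.Val") gives "Qpkg/Val;", and arrayDescriptor(2, qDescriptor("pkg.Val")) gives "[[Qpkg/Val;".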
Using this spelling has implications for parsing descriptors. In particular, existing code may assume that a descriptor requires a managed reference if and only if its first character is 'L' or '['. With the new spelling of Q-types, a descriptor requires a managed reference if and only if it begins with 'L', 'Q', or '['.
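A descriptor walker shows where the new letter matters. In this hypothetical sketch (DescriptorParser is not an existing API, and the input is assumed well formed), 'Q' is handled exactly like 'L': the walker skips to the terminating ';'. A walker that only recognized 'L' would read "Qpkg/Val;" as a one-letter type and lose its place.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical descriptor walker updated for Q-descriptors.
final class DescriptorParser {
    // Updated rule: 'L', 'Q', or '[' means the type needs a managed reference.
    static boolean requiresManagedReference(String fieldDescriptor) {
        char c = fieldDescriptor.charAt(0);
        return c == 'L' || c == 'Q' || c == '[';
    }

    // Splits the parameter list of a method descriptor such as
    // "(ILpkg/Foo;Qpkg/Val;[Qpkg/Val;)V" into individual field descriptors.
    static List<String> parameterDescriptors(String methodDescriptor) {
        List<String> params = new ArrayList<>();
        int i = 1; // skip '('
        while (methodDescriptor.charAt(i) != ')') {
            int start = i;
            while (methodDescriptor.charAt(i) == '[') {
                i++; // array dimensions
            }
            char c = methodDescriptor.charAt(i);
            if (c == 'L' || c == 'Q') {
                // Named type: runs up to and including the ';'.
                i = methodDescriptor.indexOf(';', i) + 1;
            } else {
                i++; // primitive: a single letter such as 'I' or 'J'
            }
            params.add(methodDescriptor.substring(start, i));
        }
        return params;
    }
}
```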
Although we are introducing a new descriptor letter, we are not introducing a new fundamental type to the JVM. (Previous versions of the Valhalla design did do this; the present design is intended to be a refinement which is relatively simpler and better factored.) A Q-type is not different in any way from the corresponding L-type, except that the Q-type rejects nulls and the L-type allows them.
In particular, the value stored inside a Q-type or L-type variable is the same kind of JVM entity, regardless of which mode of variable (Q or L) it is stored in.
Q-types appear in the verifier type system and are verified as follows:
Note that the only allowed superclass of a value type is Object, which means that the middle rule is really just Q-Val <: L-Object.
If the JVM were to support cross-kind array subtype relations (such as Q-Val[] <: L-Object[]), then the verifier would allow additional assignability relations of the same form. Failing that, the verifier rules for array assignment are similar to those for primitive array types like int[]. Thus, neither int[] nor Q-Val[] is a subtype of Object[].
Since L-Val <: L-Object, it is already the case, due to existing verifier rules, that L-Val[] <: L-Object[] <: L-Object.
It is not logically necessary for the verifier to allow subtyping relations between arrays of Q-type and other array types. Such relations may be added later, if necessary. See below for examples.
The null verifier type is not assignable to any Q-type.
The type array-of-L-Val is, as usual, a subtype of array-of-Object and of array-of-I for any interface I implemented by the value class Val.
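The verifier relations stated in this section can be collected into one small assignability model. This is a hypothetical sketch, not the verifier's algorithm: VerifierModel and its record types are illustrative names, the class hierarchy is supplied as a map, and Q-Val is assumed to be assignable to the L-type of each interface Val implements, by analogy with L-Val. It also simplifies the verifier's looser treatment of interface types and omits the Cloneable and Serializable supertypes of arrays.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of verifier assignability with Q-types.
final class VerifierModel {
    interface VType {}
    record NullType() implements VType {}
    record LType(String name) implements VType {}
    record QType(String name) implements VType {}
    record ArrayType(VType element) implements VType {}

    private static final String OBJECT = "java/lang/Object";

    // Hypothetical hierarchy: class name -> proper supertypes (superclasses
    // and interfaces). Object is implicitly a supertype of every class.
    private final Map<String, Set<String>> supertypes;

    VerifierModel(Map<String, Set<String>> supertypes) {
        this.supertypes = supertypes;
    }

    private boolean classSubtype(String sub, String sup) {
        return sub.equals(sup) || sup.equals(OBJECT)
                || supertypes.getOrDefault(sub, Set.of()).contains(sup);
    }

    boolean isAssignable(VType from, VType to) {
        if (from instanceof NullType) {
            // null fits any L-type or array type, but no Q-type.
            return to instanceof LType || to instanceof ArrayType;
        }
        if (from instanceof QType q) {
            if (to instanceof QType q2) {
                return q.name().equals(q2.name());
            }
            // Q-Val <: L-Val, Q-Val <: L-Object, and (assumed) Q-Val <: L-I.
            return to instanceof LType l && classSubtype(q.name(), l.name());
        }
        if (from instanceof LType l) {
            // L-to-Q is never allowed; bytecode must use checkcast instead.
            return to instanceof LType l2 && classSubtype(l.name(), l2.name());
        }
        if (from instanceof ArrayType a) {
            // Every array, flat or not, is still an Object.
            if (to instanceof LType l) {
                return l.name().equals(OBJECT);
            }
            if (to instanceof ArrayType b) {
                // No cross-kind array subtyping: a Q array relates only to itself.
                if (a.element() instanceof QType || b.element() instanceof QType) {
                    return a.element().equals(b.element());
                }
                return isAssignable(a.element(), b.element());
            }
        }
        return false;
    }
}
```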
As noted above in the case of the verifier’s type system, it is not logically necessary for the type array-of-Q-Val to be a subtype of any other array type, such as array-of-L-Val or Object[]. Such a subtype relation, of the form array-of-Q-Val <: array-of-L-X (for any X where Val <: X), may be called a cross-kind array subtype relation. Such relations add some complexity to the JVM, although they may perhaps add useful features to the end-user experience. We will try to leave out these features at first.
Whether or not any cross-kind array subtype relations are supported, the aaload and aastore bytecodes (as well as arraylength, of course) will be overloaded to support array-of-Q-Val objects. The verifier will allow aaload to apply to both kinds of arrays, even if it does not allow assignment between the two kinds of arrays. The situation here is similar to the support of boolean[] and byte[] arrays. These two “small array types” are not related as verifier types, but are both served by baload and bastore. JVM implementors may choose to accelerate aaload and aastore for array-of-Q-Val operands by rewriting their opcodes at class load time, so that the two type-dependent behaviors are more distinctly separated.
As it is, with no cross-kind relations, the user model for array-of-Q-Val is quite similar to array-of-int, in that an int[] array cannot be worked on under the Object[] type. This seems natural: a value “works like an int”, so a flat value array should work like an int array. A non-flat value array (type L-Val[]) is, by contrast, an array of boxed values, and it works like Integer[]. As such it is a subtype of Object[].
As a future option, if user model experiments motivate the extra complexity, the array type Q-Val[], unlike the type int[], may become a subtype of Object[] and X[], where X is any interface implemented by Val. The effect of this would be that aaload and aastore instructions would graduate to be dynamically polymorphic, rather than statically overloaded across array kinds.
As a similar future option, again if user model experiments motivate extra JVM complexity, the type Q-Val[] could be made a subtype of L-Val[], unlike the non-relation between int[] and Integer[].
The effect on aaload and aastore would be about the same as in the subtype relation Q-Val[] <: L-Object[]. In fact Q-Val[] <: L-Val[] implies Q-Val[] <: L-Object[] by transitivity, but not vice versa.
In any case, if an array of Q-type is ever viewed, via cross-kind subtyping, as an array of L-type (such as L-Object[]), nulls will always be dynamically rejected if storing a null is attempted. The exception thrown is NullPointerException (NPE), analogously to ArrayStoreException (ASE).
As is the case today, no array conversions involving Q-types will ever create a new instance of an array. Array conversions are always “view in place” operations.
It is an option to disconnect arrays of Q-types from any subtyping relations with arrays of L-types. This would lead to the anomaly that Q-Val <: L-Val but not Q-Val[] <: L-Val[], a violation of the usual (and useful) rule that Java arrays are covariantly typed.
Separately, it is an option to disallow arrays of type L-Val[], on the grounds that flat arrays of values are useful (like int[]), but non-flat arrays of values are not so useful (like Integer[]).
It seems likely that an upcoming experimental version of the JVM will allow both kinds of arrays, but not allow “cross-kind” array covariance, such as from Q-Val[] to L-Object[]. This behavior emulates the current behavior of int[] relative to Object[], so it may be tolerable to users to continue to keep those array types separate. The JVM implementation costs are likely to be much smaller if flattened arrays are never presented to aaload and aastore bytecodes, and disallowing “cross-kind” array subtyping keeps that desirable isolation.
A CONSTANT_Class entry in the constant pool can mention a Q-type as well as an L-type. The rule for deriving a Q-type name is approximately the same as with array types: just use the descriptor as if it were a class name. The encoding rules are not symmetric, but they are unambiguous, since class names never contain a semicolon ';'.
TYPENAME[L-Val] := CLASSFILE_NAME[Val]
TYPENAME[Q-Val] := DESCRIPTOR[Q-Val]
TYPENAME[rank-N-array-of-L-Val] := DESCRIPTOR[rank-N-array-of-L-Val]
TYPENAME[rank-N-array-of-Q-Val] := DESCRIPTOR[rank-N-array-of-Q-Val]
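Because class names never contain ';', a CONSTANT_Class string can be classified by looking at its first and last characters. The hypothetical ClassConstant class below sketches that decision under the TYPENAME rules, assuming the string has already passed the JVM's structural checks.

```java
// Hypothetical classifier for the string in a CONSTANT_Class entry,
// following the TYPENAME rules above. Assumes structurally valid input.
final class ClassConstant {
    enum Kind { L_CLASS, Q_TYPE, ARRAY }

    static Kind classify(String typeName) {
        if (typeName.startsWith("[")) {
            return Kind.ARRAY;   // array descriptor, as today
        }
        if (typeName.startsWith("Q") && typeName.endsWith(";")) {
            return Kind.Q_TYPE;  // Q-descriptor: the trailing ';' is the tell
        }
        return Kind.L_CLASS;     // plain class name, e.g. "pkg/Val"
    }

    // Recovers CLASSFILE_NAME[Val] from a Q-type constant such as "Qpkg/Val;".
    static String valueClassName(String typeName) {
        if (classify(typeName) != Kind.Q_TYPE) {
            throw new IllegalArgumentException("not a Q-type: " + typeName);
        }
        return typeName.substring(1, typeName.length() - 1);
    }
}
```

A plain class name that happens to start with Q, such as "Qux", stays an L-type name, since only a Q-descriptor ends in ';'.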
We are aware that we are somewhat abusing the original intent of the CONSTANT_Class_info constant pool entry, by adding Q-descriptors to its repertoire as well as the already-entrenched array descriptors. The alternative would be to specify a new constant pool type, not inconveniently overloaded with ever-more-baroque spellings for more things that are not classes. However, this would complicate the JVM specification in other ways; on balance, overloading pre-existing constant pool tags seems a simpler way to go.
Although it would seem excessively subtle to rely on a trailing semicolon to change the interpretation of a string in a CONSTANT_Class entry, we observe that today’s JVMs already detect such trailing semicolons as part of structural checking during class file parsing. Thus, no new passes are required to detect Q-types in the constant pool.
If JVM implementations wish, they can choose to internally rewrite constant pool tags to distinguish Q-type constants from L-type constants, when the required structural checks detect that a Q-type has been mentioned.
For use in reflection, there is a distinction between Q-types and L-types, something like the one today between primitives and their wrappers. Thus, each loaded value class is represented by an L-mirror and a Q-mirror.
Calling Object.getClass on a value instance produces the L-mirror (for maximum compatibility). Calling Class.forName on a legacy class name produces the L-mirror (for similar reasons).
Calling Class.getName on a Q-mirror should produce something different from the value class’s name. What should it produce? For guidance on this and many other questions, we look to our design principle, “codes like a class, works like an int”, observing that although java.lang.Integer is the name of the box of an int value, the name of the mirror which represents the unboxed value is some ad hoc string that doesn’t refer to a real class. Thus, Class.getName on a Q-mirror for class pkg.Val should return something like pkg.Val/val or something else that is obviously not a real class name, yet is related to the value class Val.
One might also ask whether Class.forName on any particular string should produce the Q-mirror of a value class. Appealing to “works like an int”, we say that this is not the case. Instead, there is a programmatic way to get a Q-mirror. We will add API points to Class which allow conversion between L-mirrors and Q-mirrors. These are likely to be:
Class.asValueType => given either the Q-mirror or L-mirror, yields the Q-mirror
Class.asBoxType => given either the Q-mirror or L-mirror, yields the L-mirror
For mirrors of primitives and their box types, these access points may usefully return the corresponding alternative mirror, as if a primitive were a value type and its box type were that value type’s box type.
For mirrors of classes which are neither value types nor primitive box types, these access points should yield null or throw, or (in the case of asBoxType) return the same class mirror.
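No such methods exist on java.lang.Class today, and Q-mirrors cannot be created in current Java, so the sketch below models only the primitive and box case described above. The static helpers asValueType and asBoxType on the hypothetical class MirrorModel stand in for the proposed Class methods; for unrelated classes this sketch picks one of the listed options, returning null from asValueType and the same mirror from asBoxType.

```java
import java.util.Map;

// Hypothetical stand-ins for the proposed Class.asValueType / Class.asBoxType,
// restricted to primitives ("Q-mirrors") and their boxes ("L-mirrors").
final class MirrorModel {
    private static final Map<Class<?>, Class<?>> BOX_OF = Map.of(
            int.class, Integer.class, long.class, Long.class,
            short.class, Short.class, byte.class, Byte.class,
            char.class, Character.class, boolean.class, Boolean.class,
            float.class, Float.class, double.class, Double.class);

    // Given either mirror of a pair, yield the primitive one.
    static Class<?> asValueType(Class<?> c) {
        if (BOX_OF.containsKey(c)) {
            return c; // already the primitive mirror
        }
        for (Map.Entry<Class<?>, Class<?>> e : BOX_OF.entrySet()) {
            if (e.getValue() == c) {
                return e.getKey(); // box -> primitive
            }
        }
        return null; // neither primitive nor box: this sketch yields null
    }

    // Given either mirror of a pair, yield the box; other classes map to themselves.
    static Class<?> asBoxType(Class<?> c) {
        Class<?> box = BOX_OF.get(c);
        return box != null ? box : c;
    }
}
```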
For arrays, calling Class.getComponentType on the mirror of an array of Q-type produces a Q-mirror, thereby reifying the flat, null-rejecting character of such arrays.
When reflecting on the fields and methods of a class, the distinctions in the original classfile between L-descriptors and Q-descriptors surface regularly as L-mirrors and Q-mirrors, something like today’s distinction between int.class and Integer.class, thus again upholding the principle of “works like an int”.
It is a simplifying move in the JVM to have Q-Val <: L-Val, since this means that there is only one carrier type (references) with slightly differing rules for nullability of the two “type modes”. The language, however, may choose to make a stronger distinction between kinds, disallowing subtyping relations between “a value” and “that value’s box”. This might be done to preserve aspects of the existing model for primitives, where int and Integer are related types, but neither is a subtype of the other.
When a source compiler determines that a type name denotes an L-type, it emits an L-descriptor and/or L-name; likewise with Q-types. It’s that simple. This means that most occurrences of value types in class files are likely to contain Q instead of L.
Because of the JVM’s simple descriptor matching rules, a method name may be overloaded to take differing combinations of L-types and Q-types on the same classes. There is no attempt (at present) to auto-bridge between methods of distinct but similar descriptors. The close similarity of Q-types and L-types makes such bridging relatively easy to consider in the future.
All bytecodes which operate on L-types (either as L-names or L-descriptors) operate also on the corresponding Q-types, unless there is a logical reason to forbid such an operation in the case of a particular bytecode. Such cases include:
new (new-instance)
It is likely that both flattened and unflattened (nullable) arrays will be supported for value types. (This is assumed above.) In that case, the array creation bytecodes, when applied to array-of-Q-type, produce flattened arrays, while the same bytecodes, applied to corresponding L-types, produce unflattened (nullable) arrays.
As noted above, the verifier type system does not allow L-types to be assigned to Q-types. The checkcast bytecode performs the assignment in that reverse direction, at the cost of a runtime check, and throws an NPE if null is detected. This is a regular, conservative extension of the previous role of checkcast.
Before a Q-descriptor can be used (for some contextual purpose), it must be validated as well-formed, ensuring that the class named by the descriptor is loaded (or can now be loaded), and that the loaded class in fact is a value class.
The cases where this is done are:
If a class declares any non-static field whose type descriptor is a Q-descriptor, the class named by the Q-descriptor is loaded, if necessary, and the JVM ensures that it is in fact a value type. This loading happens during the same linkage phase that the containing class loads its super class and interfaces, and checks for circularities along super-type dependencies.
In addition, the JVM detects if loading this field value class (i.e., the class named by the Q-descriptor for non-static field) directly or indirectly triggers loading of the current class. If there is a circularity in these loading requirements, the JVM cancels the class load and instead throws a linkage error for the circularity. This prevents value types from directly or indirectly including non-static fields of their own type.
If a class declares a method whose return type or parameter type descriptor is a Q-descriptor, or if the class declares a static field whose descriptor is a Q-descriptor, then the class named by that descriptor is loaded (as necessary) and validated as a value class. This loading must occur before the first linkage of a bytecode which refers to the declared method or field (or may refer to the declared method by an override relation). The loading of classes named by Q-descriptors may be done earlier, at the discretion of the JVM. However, except for non-static fields, any early loading must not elicit class circularity errors.
The preparation phase of class linking will typically allocate space for static fields and method dispatch tables. In doing so, it must not require that Q-types of static fields or methods are loaded, because that could lead to unwanted circularity errors. When a class is linked, it may safely load the Q-types of static fields or methods, and perform any final configuration of static variables or calling sequences.
It is likely that JVM implementations will use hidden indirections for static variables of Q-types, so that they can be initialized to placeholder values (null) at preparation time, and later at link time load the Q-types and allocate the correct amount of storage for each Q-typed static variable. Since the statics of a class cannot be accessed until its initialization has begun, it is safe to initialize a Q-typed static (to its default value, of course) during the linking phase of the class declaring the static.
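The circularity check for non-static Q-typed fields amounts to cycle detection over a containment graph. In this hypothetical sketch, each class maps to the value classes named by the Q-descriptors of its non-static fields, and a depth-first search reports a cycle. The error type is an assumption: the text says only that a linkage error is thrown, so the sketch uses ClassCircularityError, an existing LinkageError subclass.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the circularity check for non-static Q-typed fields.
final class FieldCycleCheck {
    // qFields: class name -> value classes named by its non-static Q-typed fields.
    static void checkNoCycle(String start, Map<String, List<String>> qFields) {
        visit(start, qFields, new ArrayDeque<>(), new HashSet<>());
    }

    private static void visit(String cls, Map<String, List<String>> qFields,
                              Deque<String> inProgress, Set<String> done) {
        if (done.contains(cls)) {
            return; // already checked along another path
        }
        if (inProgress.contains(cls)) {
            // cls would directly or indirectly contain itself: cancel the load.
            throw new ClassCircularityError("Q-typed field cycle through " + cls);
        }
        inProgress.push(cls);
        for (String fieldClass : qFields.getOrDefault(cls, List.of())) {
            visit(fieldClass, qFields, inProgress, done);
        }
        inProgress.pop();
        done.add(cls);
    }
}
```

Static fields and method signatures are deliberately absent from the map, since the text allows them to name the containing class without circularity errors.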
Some bytecodes use symbolic references, stored in the containing class’s constant pool, to refer to types, fields, and methods in various classes. Such bytecodes may be called “nominal” or “symbolic referencing”, because they make use of names, and those names must be resolved during a linking phase before the bytecode is first executed. Nominal bytecodes which refer to fields and methods that use Q-types are discussed above. There are a few additional nominal bytecodes, such as instanceof, checkcast, and anewarray, which may also refer to Q-types. These bytecodes, when they resolve their symbolic references, must always validate that each resolved Q-type refers to a loaded, valid value class.
The instanceof and checkcast bytecodes perform dynamic type checks. These bytecodes may operate on either Q-types or L-types, and they adjust their behavior accordingly. The behavior on L-types is already defined by the JVMS. The behavior on Q-types is slightly more restrictive in the case of checkcast: if a null value is presented to checkcast of a Q-type, an NPE is thrown. In all other cases, the behaviors of checkcast and instanceof are the same on L-types and Q-types.
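The difference between the two kinds of checkcast comes down to a single flag. In this hypothetical sketch, DynamicChecks is not JVM code, the parameter rejectsNull stands for "the resolved constant pool entry names a Q-type", and java.lang.Class.isInstance supplies the ordinary subtype test.

```java
// Hypothetical model of checkcast and instanceof on L-types and Q-types.
final class DynamicChecks {
    static Object checkcast(Object value, Class<?> target, boolean rejectsNull) {
        if (value == null) {
            if (rejectsNull) {
                throw new NullPointerException("checkcast to a Q-type rejects null");
            }
            return null; // L-type: null passes, as today
        }
        if (!target.isInstance(value)) {
            throw new ClassCastException(
                    value.getClass().getName() + " cannot be cast to " + target.getName());
        }
        return value;
    }

    // instanceof already answers false for null, so the same code serves both
    // kinds and no null-rejection flag is needed.
    static boolean instanceOf(Object value, Class<?> target) {
        return value != null && target.isInstance(value);
    }
}
```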
A third bytecode, aastore, also incorporates dynamic type checks as part of its so-called array store check. In that case, the Q-type or L-type is the element type, as dynamically derived from the array being stored into. If the array element type is a Q-type, it’s a flat array, and an NPE is thrown as part of the array store check if a null value is presented to aastore.
In all three cases of dynamic type checks, the JVM will need to add an extra bit somewhere which tells the type-checking bytecode whether nulls are to be rejected. In the case of arrays, the bit must be derived from the array, probably cached near its element class. In the case of the nominal instructions instanceof and checkcast, the bit can be derived when the instruction is linked, perhaps in the linkage state associated with the resolved constant pool entry.
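For arrays, the null-rejection bit lives with the array rather than with the instruction. The hypothetical ModelArray below keeps a rejectsNull flag beside its element class and consults it in its aastore method. It illustrates the order of the checks only; flat storage itself cannot be expressed in plain Java, and the default value supplied for Q-type arrays is an assumption of the model.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical model of an array that caches the "rejects null" bit near its
// element class, as suggested above, plus an aastore that consults it.
final class ModelArray {
    final Class<?> elementClass;
    final boolean rejectsNull; // true for an array of Q-type (flat array)
    private final Object[] slots;

    // L-type arrays start out full of nulls; Q-type arrays are filled with a
    // caller-supplied non-null default value.
    ModelArray(Class<?> elementClass, boolean rejectsNull, int length, Object defaultValue) {
        this.elementClass = elementClass;
        this.rejectsNull = rejectsNull;
        this.slots = new Object[length];
        if (rejectsNull) {
            Arrays.fill(slots, Objects.requireNonNull(defaultValue));
        }
    }

    Object aaload(int index) {
        return slots[index];
    }

    // Bounds check first, then the array store check: NPE for null into a
    // Q-type array, ArrayStoreException for a mistyped value, as today.
    void aastore(int index, Object value) {
        Objects.checkIndex(index, slots.length);
        if (value == null) {
            if (rejectsNull) {
                throw new NullPointerException("flat array rejects null");
            }
        } else if (!elementClass.isInstance(value)) {
            throw new ArrayStoreException(value.getClass().getName());
        }
        slots[index] = value;
    }
}
```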