Q-Types in L-World 10

Valhalla Working Group, Burlington, September 2018

Types in narrative specification

An L-type is a reference to a named class or interface type. (Its spelling in bytecode will begin with capital L, hence the name. It might more properly be called an R-type, for “reference”.)

An array type is a reference to an array, whose element type may be any field type. Array types are reference types but not L-types, in the terms of this document. In narrative we can say “array of T” or “T[]” for short. Thus “array of L-Foo” or “L-Foo[]” is an array type whose element type is the L-type Foo.

For any given named class or interface Foo we say that L-Foo is the L-type of Foo. When the type is obviously not a value type we may omit the “L-” prefix, thus Object and Object[] for L-Object and L-Object[].

As dictated by today’s JVMS, any variable of L-type will accept null.

Here come Q-types

A Q-type is a reference to a value instance. All value instances have a named class, which is incorporated into the spelling of the Q-type. (For regularity we say “reference to a value”, but the reference is not detectable to the user; there are no identity or aliasing relations between values.)

For any given named value class Val we say that Q-Val is the Q-type of Val.

In addition to referring to value instances, variables of Q-types are never null. We say that Q-types are non-nullable, whereas L-types are nullable.

The JVM carefully tracks the distinction between L-types and Q-types and ensures that a Q-type variable will never contain a null, either by assignment or with an initial value. As usual, any L-type variable is allowed to contain null if a null is assigned, or present as an initial default value.

In addition, a well-formed Q-type name must incorporate the spelling of a class which resolves (in the context of uttering the Q-type) into a value class declaration. Thus, Q-Object and Q-Comparable and Q-String are not well-formed type names. The JVM makes sure to load and validate classes used in Q-types before those types are allowed to affect the JVM’s processing of data.

Value sets

The value set of an L-type is all references to objects whose classes “appropriately match” the name of the L-type. Appropriate matching depends in detail on context and use of the L-type, but for classes (both object and value classes), appropriate matching means that the L-type can only refer to objects whose classes or superclasses are named (perhaps after context-specific resolution of the name) by the L-type. (As is well-known, L-types of interfaces are not as constraining and do not entail a subtype match.)

The value set of a Q-type is all references to instances of the value class named by the Q-type (perhaps after context-specific resolution of the name). The value set of any Q-type is therefore identical to the value set of the L-type of the same name, except that the Q-type does not include null while the L-type does include null.

Nullability or non-nullability, therefore, is most directly a property of a value set, and by extension of a container which accommodates a particular value set, or of a type assigned to such a container. Nullability or non-nullability is never a property of any particular value in a container. It is a syntax error to speak of things like a “nullable object” or “non-nullable value”. Thus, nullability applies to variables (containers of values), not to individual values.

Null rejection

Variables of Q-type are said to reject null. The details of null rejection are complex and subject to revision in detail, but the intended effect is for attempts to pass null into a Q-type variable to elicit a NullPointerException (NPE). For example:

If an array has Q-type elements, storing a null will elicit NPE.
If a field has a Q-type, storing a null will elicit NPE.
A checkcast instruction of Q-type applied to a null elicits an NPE.
If a method parameter has a Q-type, passing a null will elicit NPE between the caller’s method invocation instruction and callee method entry, if the caller is untyped (reflective or method handle).
If a method return has a Q-type, returning a null will elicit NPE between the callee’s method return instruction and caller method resumption, if the caller is untyped (reflective or method handle).

Assigning one Q-type variable to another Q-type variable does not require any special processing, assuming the Q-types are of the same class. Again, assuming identical value classes, assigning a Q-type variable to an L-type variable is also an empty operation, since the value set of the Q-type is a subset of the value set of the corresponding L-type. But assigning an L-type variable to a Q-type variable requires an explicit check of some sort, and may elicit an NPE if the check fails.
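To make the asymmetry concrete, here is a minimal Java sketch that models the two variable modes as source-level containers. The classes NullRejection, LVar and QVar are hypothetical illustrations. They are not JVM or Valhalla API. The real JVM does not wrap variables this way. It simply checks for null at the points listed above.

```java
import java.util.Objects;

// Hypothetical source-level model of L-mode and Q-mode variables for one value class.
// The JVM does not wrap variables like this; it only checks for null where required.
public class NullRejection {
    // Nullable container, like a variable of type L-Val.
    static final class LVar<T> {
        T value; // may hold null, e.g. as its initial default value

        void set(T v) {
            value = v; // no check needed
        }
    }

    // Non-nullable container, like a variable of type Q-Val.
    static final class QVar<T> {
        private T value;

        QVar(T initial) {
            // A Q-type variable starts with a real (default) value, never null.
            value = Objects.requireNonNull(initial);
        }

        void set(T v) {
            // Storing into a Q-type variable rejects null with an NPE;
            // the old value is left in place if the check fails.
            value = Objects.requireNonNull(v, "null stored into Q-type variable");
        }

        T get() {
            return value;
        }
    }

    // Q -> L: no check, since the Q value set is a subset of the L value set.
    static <T> void qToL(QVar<T> from, LVar<T> to) {
        to.set(from.get());
    }

    // L -> Q: an explicit null check, which may elicit an NPE.
    static <T> void lToQ(LVar<T> from, QVar<T> to) {
        to.set(from.value);
    }
}
```

The only check in the model sits on the L-to-Q path. That is where a JVM would place its checkcast or equivalent null test.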

Types in bytecode descriptors

A bytecode descriptor is a string (backed by a CONSTANT_Utf8) which when parsed denotes one or more types for methods, fields, and the like.

A Q-descriptor is a string (backed by a CONSTANT_Utf8) which when parsed denotes a Q-type. There is always enough contextual information to validate that the class named by the Q-type is in fact a value class.

Spellings

A Q-descriptor is spelled by replacing the leading letter 'L' in the corresponding L-descriptor with the letter 'Q'.

DESCRIPTOR[L-Val] := 'L' + CLASSFILE_NAME[Val] + ';'
DESCRIPTOR[Q-Val] := 'Q' + CLASSFILE_NAME[Val] + ';'
DESCRIPTOR[rank-N-array-of-L-Val] := ('[')**N DESCRIPTOR[L-Val]
DESCRIPTOR[rank-N-array-of-Q-Val] := ('[')**N DESCRIPTOR[Q-Val]
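The spelling rules above can be sketched as a few Java string helpers. This is an illustration only. The class Descriptors and its method names are hypothetical. Class names are assumed to already be in classfile form, such as pkg/Val.

```java
// Hypothetical helpers spelling descriptors per the DESCRIPTOR[...] rules.
// Class names are assumed to be in classfile form, e.g. "pkg/Val".
public class Descriptors {
    static String lDescriptor(String classfileName) {
        return "L" + classfileName + ";";
    }

    static String qDescriptor(String classfileName) {
        return "Q" + classfileName + ";";
    }

    // Rank-N array: N leading '[' characters before the element descriptor.
    static String arrayOf(int rank, String elementDescriptor) {
        if (rank < 1) throw new IllegalArgumentException("rank must be at least 1");
        return "[".repeat(rank) + elementDescriptor;
    }
}
```

For example, arrayOf(2, qDescriptor("pkg/Val")) spells a rank-2 array of Q-Val as [[Qpkg/Val;. Only the leading letter differs from the L-spelling.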

Using this spelling has implications for parsing descriptors. In particular, existing code may conclude that a descriptor requires a managed reference if and only if its first character is 'L' or '['. With the new spelling of Q-types, a descriptor requires a managed reference if and only if it begins with 'L', 'Q', or '['.
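Here is a sketch of the old and new “does this need a managed reference?” tests. It assumes a well-formed, non-empty field descriptor. The names DescriptorKinds, isReferenceLegacy and isReference are hypothetical.

```java
// Hypothetical checks on the first character of a non-empty field descriptor.
public class DescriptorKinds {
    // Pre-Valhalla rule: only 'L' and '[' descriptors need a managed reference.
    static boolean isReferenceLegacy(String descriptor) {
        char c = descriptor.charAt(0);
        return c == 'L' || c == '[';
    }

    // Rule with Q-types: 'Q' descriptors need a managed reference as well.
    static boolean isReference(String descriptor) {
        char c = descriptor.charAt(0);
        return c == 'L' || c == 'Q' || c == '[';
    }
}
```

A tool that still uses the first test would treat Qpkg/Val; as something other than a reference. That is exactly the parsing hazard described above.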

Although we are introducing a new descriptor letter, we are not introducing a new fundamental type to the JVM. (Previous versions of the Valhalla design did do this; the present design is intended to be a refinement which is relatively simpler and better factored.) A Q-type is not different in any way from the corresponding L-type, except that the Q-type rejects nulls and the L-type allows them.

In particular, the value stored inside a Q-type or L-type variable is the same kind of JVM entity, regardless of which mode of variable (Q or L) it is stored in.

Types in the verifier

Q-types appear in the verifier type system and are verified as follows:

A Q-type Q-Val is assignable to the corresponding L-Val
A Q-type Q-Val is assignable to an L-type for any superclass of Val
A Q-type Q-Val is assignable to the L-type of any interface

Note that the only allowed superclass of a value type is Object, which means that the middle rule is really just Q-Val <: L-Object.

If the JVM were to support cross-kind array subtype relations (such as Q-Val[] <: L-Object[]), then the verifier would allow additional assignability relations, of the same form. Failing that, the verifier rules for array assignment are similar to those for primitive array types like int[]. Thus, neither int[] nor Q-Val[] is a subtype of Object[].

Since L-Val <: L-Object, it is already the case, due to existing verifier rules, that L-Val[] <: L-Object[] <: L-Object.

It is not logically necessary for the verifier to allow subtyping relations between arrays of Q-type and other array types. Such relations may be added later, if necessary. See below for examples.

The null verifier type is not assignable to any Q-type.
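The rules in this section can be collected into a small model of verifier assignability. This is a hedged sketch, not the JVMS algorithm. The type records and the set of interface names are hypothetical. Class-to-class subtyping among L-types other than Object is omitted. Interfaces are treated permissively, as the existing verifier does. The model encodes the following:

- Q-Val <: L-Val and Q-Val <: L-Object.
- The null type is not assignable to a Q-type.
- There are no cross-kind array relations.

```java
import java.util.Set;

// Hypothetical model of verifier assignability with Q-types.
// Not the JVMS algorithm: L-to-L class subtyping other than to Object is omitted.
public class VerifierModel {
    interface VType {}
    record LType(String name) implements VType {}        // L-Foo
    record QType(String name) implements VType {}        // Q-Val, Val a value class
    record ArrayType(VType element) implements VType {}  // T[]
    record NullType() implements VType {}                // the verifier's null type

    private final Set<String> interfaceNames; // assumed class-hierarchy facts

    VerifierModel(Set<String> interfaceNames) {
        this.interfaceNames = interfaceNames;
    }

    boolean isAssignable(VType from, VType to) {
        if (from.equals(to)) return true;
        if (from instanceof NullType) {
            // null fits any reference type except a Q-type
            return !(to instanceof QType);
        }
        if (to instanceof LType target) {
            if (target.name().equals("java/lang/Object")) return true;        // Q-Val <: L-Object, T[] <: L-Object
            if (interfaceNames.contains(target.name())) return true;          // interfaces are not checked
            if (from instanceof QType q) return q.name().equals(target.name()); // Q-Val <: L-Val
            return false; // other L-to-L subtyping omitted in this sketch
        }
        if (to instanceof ArrayType toArr && from instanceof ArrayType fromArr) {
            // No cross-kind array relations: a Q-element array relates only to itself.
            if (fromArr.element() instanceof QType || toArr.element() instanceof QType) return false;
            return isAssignable(fromArr.element(), toArr.element()); // usual covariance
        }
        return false;
    }
}
```

In this model Q-Val[] is still assignable to L-Object, because every array is an object. It is not assignable to L-Object[], which mirrors the int[] analogy.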

Arrays

For a value class Val, the type array-of-L-Val is, as usual, a subtype of Object[] and of X[] for any interface X implemented by Val.

As noted above in the case of the verifier’s type system, it is not logically necessary for the type array-of-Q-Val to be a subtype of any other array type, such as array-of-L-Val or Object[]. Such a subtype relation, of the form array-of-Q-Val <: array-of-L-X (for any X where Val <: X) may be called a cross-kind array subtype relation. Such relations add some complexity to the JVM, although they may perhaps add useful features to the end-user experience. We will try to leave out these features at first.

Whether or not any cross-kind array subtype relations are supported, the aaload and aastore bytecodes (as well as arraylength of course) will be overloaded to support array-of-Q-Val objects. The verifier will allow aaload to apply to both kinds of arrays, even if it does not allow assignment between the two kinds of arrays. The situation here is similar to the support of boolean[] and byte[] arrays. These two “small array types” are not related as verifier types, but are both served by baload and bastore. JVM implementors may choose to accelerate aaload and aastore for array-of-Q-Val operands by rewriting their opcodes at class load time, so that the two type-dependent behaviors are more distinctly separated.

As it is, with no cross-kind relations, the user model for array-of-Q-Val is quite similar to array-of-int, in that an int[] array cannot be worked on under the Object[] type. This seems natural: a value “works like an int”, so a flat value array should work like an int array. A non-flat value array (type L-Val[]) is, by contrast, an array of boxed values, and it works like Integer[]. As such it is a subtype of Object[].

As a future option, if user model experiments motivate the extra complexity, the array type Q-Val[], unlike the type int[], may become a subtype of Object[] and X[], where X is any interface implemented by Val. The effect of this would be that aaload and aastore instructions would graduate to be dynamically polymorphic, rather than statically overloaded across array kinds.

As a similar future option, again if user model experiments motivate extra JVM complexity, the type Q-Val[] could be made a subtype of L-Val[], unlike the non-relation between int[] and Integer[]. The effect on aaload and aastore would be about the same as in the subtype relation Q-Val[] <: L-Object[]. In fact Q-Val[] <: L-Val[] implies Q-Val[] <: L-Object[] by transitivity, but not vice versa.

In any case, if an array of Q-type is ever viewed, via cross-kind subtyping, as an array of L-type (such as L-Object[]), nulls will always be dynamically rejected if storing a null is attempted. The exception thrown is NullPointerException (NPE), analogously to ArrayStoreException (ASE).

As is the case today, no array conversions involving Q-types will ever create a new instance of an array. Array conversions are always “view in place” operations.

It is an option to disconnect arrays of Q-types from any subtyping relations with arrays of L-types. This would lead to an anomaly that Q-Val <: L-Val but not Q-Val[] <: L-Val[], a violation of the usual (and useful) rule that Java arrays are covariantly typed.

Separately, it is an option to disallow arrays of type L-Val[], on the grounds that flat arrays of values are useful (like int[]), but non-flat arrays of values are not so useful (like Integer[]).

It seems likely that an upcoming experimental version of the JVM will allow both kinds of arrays, but not allow “cross-kind” array covariance, such as from Q-Val[] to L-Object[]. This behavior emulates the current behavior of int[] relative to Object[], so it may be tolerable to users to continue to keep those array types separate. The JVM implementation costs are likely to be much smaller if flattened arrays are never presented to aaload and aastore bytecodes, and disallowing “cross-kind” array subtyping keeps that desirable isolation.

Q-types as class Q-names

A CONSTANT_Class entry in the constant pool can mention a Q-type as well as an L-type. The rule for deriving the class name of a Q-type is approximately the same as for array types: just use the descriptor as if it were a class name. The encoding rules are not symmetric, but they are unambiguous, since class names never contain a semicolon ';'.

TYPENAME[L-Val] := CLASSFILE_NAME[Val]
TYPENAME[Q-Val] := DESCRIPTOR[Q-Val]
TYPENAME[rank-N-array-of-L-Val] := DESCRIPTOR[rank-N-array-of-L-Val]
TYPENAME[rank-N-array-of-Q-Val] := DESCRIPTOR[rank-N-array-of-Q-Val]

We are aware that we are somewhat abusing the original intent of the CONSTANT_Class_info constant pool entry, by adding Q-descriptors to its repertoire as well as the already-entrenched array descriptors. The alternative would be to specify a new constant pool type, not inconveniently overloaded with ever-more-baroque spellings for more things that are not classes. However, this would complicate the JVM specification in other ways; on balance overloading pre-existing constant pool tags seems a simpler way to go.

Although it would seem excessively subtle to rely on a trailing semicolon to change the interpretation of a string in a CONSTANT_Class entry, we observe that today’s JVMs already detect such trailing semicolons as part of structural checking during class file parsing. Thus, no new passes are required to detect Q-types in the constant pool.
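The TYPENAME encoding and the trailing-semicolon test can be sketched as follows. The class and method names are hypothetical. The classify method assumes that array descriptors have already passed the usual structural checks elsewhere.

```java
// Hypothetical encoder/classifier for CONSTANT_Class strings under the TYPENAME rules.
public class ClassConstantNames {
    enum Kind { L_CLASS, Q_TYPE, ARRAY }

    // TYPENAME[L-Val]: the plain classfile name.
    static String typeNameOfL(String classfileName) {
        return classfileName;
    }

    // TYPENAME[Q-Val]: the full Q-descriptor, trailing ';' included.
    static String typeNameOfQ(String classfileName) {
        return "Q" + classfileName + ";";
    }

    // Class names never contain ';', so a non-array name of the form Q...; is a Q-type.
    static Kind classify(String name) {
        if (name.startsWith("[")) return Kind.ARRAY;
        if (name.length() > 2 && name.startsWith("Q") && name.endsWith(";")) return Kind.Q_TYPE;
        if (name.indexOf(';') >= 0) throw new IllegalArgumentException("malformed class name: " + name);
        return Kind.L_CLASS;
    }
}
```

A class that merely happens to be named Quux is still classified as an L-type. The decision rests on the trailing semicolon, not on the leading letter.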

If JVM implementations wish, they can choose to internally rewrite constant pool tags to distinguish Q-type constants from L-type constants, when the required structural checks detect that a Q-type has been mentioned.

Q-types as class Q-mirrors

For use in reflection, there is a distinction between Q-types and L-types, something like there is today between primitives and their wrappers. Thus, each loaded value class is represented by an L-mirror and a Q-mirror.

Calling Object.getClass on a value instance produces the L-mirror (for maximum compatibility). Calling Class.forName on a legacy class name produces the L-mirror (for similar reasons).

Calling Class.getName on a Q-mirror should produce something different from the value class’s name. What should it produce? For guidance on this and many other questions, we look to our design principle, “codes like a class, works like an int”. Although java.lang.Integer is the name of the box of an int value, the mirror which represents the unboxed value (int.class) has an ad hoc name, “int”, that doesn’t refer to a real class. Thus, Class.getName on a Q-mirror for class pkg.Val should return something like pkg.Val/val or something else that is obviously not a real class name, yet is related to the value class Val.

One might also ask whether Class.forName on any particular string should produce the Q-mirror of a value class. Appealing to “works like an int”, we say that this is not the case. Instead, there is a programmatic way to get a Q-mirror. We will add API points to Class which allow conversion between L-mirrors and Q-mirrors. These are likely to be:

Class.asValueType => given either the Q-mirror or L-mirror, yields the Q-mirror
Class.asBoxType => given either the Q-mirror or L-mirror, yields the L-mirror

For mirrors of primitives and their box types, these access points may usefully return the corresponding alternative mirror, as if a primitive were a value type and its box type were that value type’s box type.

For mirrors of classes which are neither value types nor primitive box types, these access points should either yield null or throw, or (in the case of asBoxType) return the same class mirror.
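Value classes cannot be written in today’s Java, so the following sketch uses the primitive/box analogy from the paragraphs above. The helpers asValueType and asBoxType here are hypothetical static stand-ins for the proposed Class methods, and they cover only primitive and box mirrors. For other classes they pick one of the options just listed: asValueType throws, and asBoxType returns the same mirror.

```java
import java.util.Map;

// Hypothetical stand-ins for the proposed Class.asValueType / Class.asBoxType,
// using primitive mirrors as "Q-mirrors" and box mirrors as "L-mirrors".
public class MirrorConversions {
    private static final Map<Class<?>, Class<?>> BOX_OF = Map.of(
            int.class, Integer.class, long.class, Long.class,
            short.class, Short.class, byte.class, Byte.class,
            char.class, Character.class, boolean.class, Boolean.class,
            float.class, Float.class, double.class, Double.class);

    // Given either mirror, yield the primitive ("value") mirror; throw for other classes.
    static Class<?> asValueType(Class<?> c) {
        if (BOX_OF.containsKey(c)) return c;
        for (Map.Entry<Class<?>, Class<?>> e : BOX_OF.entrySet()) {
            if (e.getValue() == c) return e.getKey();
        }
        throw new IllegalArgumentException(c.getName() + " has no value-type mirror");
    }

    // Given either mirror, yield the box mirror; other classes map to themselves.
    static Class<?> asBoxType(Class<?> c) {
        return BOX_OF.getOrDefault(c, c);
    }
}
```

For comparison, int.class.getName() returns "int". That is an ad hoc name rather than a loadable class name, which is the model suggested above for naming Q-mirrors.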

For arrays, calling Class.getComponentType on the mirror of an array of Q-type produces a Q-mirror, thereby reifying the flat, null-rejecting character of such arrays.

When reflecting on the fields and methods of a class, the distinctions in the original classfile between L-descriptors and Q-descriptors surface regularly as L-mirrors and Q-mirrors, something like today’s distinction between int.class and Integer.class, thus again upholding the principle of “works like an int”.

Translation strategy

It is a simplifying move in the JVM to have Q-Val <: L-Val, since this means that there is only one carrier type (references) with slightly differing rules for nullability of the two “type modes”. The language, however, may choose to make a stronger distinction between kinds, disallowing subtyping relations between “a value” and “that value’s box”. This might be done to preserve aspects of the existing model for primitives, where int and Integer are related types, but neither is a subtype of the other.

When a source compiler determines that a type name denotes an L-type, it emits an L-descriptor and/or L-name; likewise with Q-types. It’s that simple. This means that most occurrences of value types in class files are likely to contain Q instead of L.

Because of the JVM’s simple descriptor matching rules, a method name may be overloaded to take differing combinations of L-types and Q-types on the same classes. There is no attempt (at present) to auto-bridge between methods of distinct but similar descriptors. The close similarity of Q-types and L-types makes such bridging relatively easy to consider in the future.

Operations

All bytecodes which operate on L-types (either as L-names or L-descriptors) operate also on the corresponding Q-types, unless there is a logical reason to forbid such an operation in the case of a particular bytecode. Such cases include:

the bytecode new (new-instance)

It is likely that both flattened and unflattened (nullable) arrays will be supported for value types. (This is assumed above.) In that case, the array creation bytecodes, when applied to array-of-Q-type, produce flattened arrays, while the same bytecodes, applied to corresponding L-types, produce unflattened (nullable) arrays.

As noted above, the verifier type system does not allow L-types to assign to Q-types. The checkcast bytecode reverses the assignment, at the cost of a runtime check, and NPE if null is detected. This is a regular, conservative extension of the previous role of checkcast.

Validation of Q-names

Before a Q-descriptor can be used (for some contextual purpose), it must be validated as well-formed, ensuring that the class named by the descriptor is loaded (or can now be loaded), and that the loaded class in fact is a value class.

The cases where this is done are:

definition of Q-typed class fields
definition of Q-typed method parameters or return values
execution of nominal bytecodes which refer to Q-descriptors or Q-names

If a class declares any non-static field whose type descriptor is a Q-descriptor, the class named by the Q-descriptor is loaded, if necessary, and the JVM ensures that it is in fact a value type. This loading happens during the same linkage phase that the containing class loads its super class and interfaces, and checks for circularities along super-type dependencies.

In addition, the JVM detects if loading this field value class (i.e., the class named by the Q-descriptor for non-static field) directly or indirectly triggers loading of the current class. If there is a circularity in these loading requirements, the JVM cancels the class load and instead throws a linkage error for the circularity. This prevents value types from directly or indirectly including non-static fields of their own type.
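The circularity check amounts to a reachability test over the graph of non-static Q-typed fields. In this sketch, the map from class names to the value classes of their non-static Q-typed fields is a hypothetical stand-in for information the JVM would gather while loading. Static fields are deliberately left out of the map, as the text requires.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical circularity check over non-static Q-typed fields.
public class FlatFieldCycles {
    // qFields maps a class name to the value classes named by its non-static
    // Q-typed fields (static fields are excluded: they never cause this error).
    static boolean hasCircularity(String start, Map<String, List<String>> qFields) {
        Deque<String> worklist = new ArrayDeque<>(qFields.getOrDefault(start, List.of()));
        Set<String> visited = new HashSet<>();
        while (!worklist.isEmpty()) {
            String c = worklist.poll();
            if (c.equals(start)) return true; // loading start would require start again
            if (visited.add(c)) {
                worklist.addAll(qFields.getOrDefault(c, List.of()));
            }
        }
        return false;
    }
}
```

A value class Line with two Point fields passes this check. A value class that holds a non-static field of its own Q-type, directly or through another value class, is rejected.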

If a class declares a method whose return type or parameter type descriptor is a Q-descriptor, or if the class declares a static field whose descriptor is a Q-descriptor, then the class named by that descriptor is loaded (as necessary) and validated as a value class. This loading must occur before the first linkage of a bytecode which refers to the declared method or field (or may refer to the declared method by an override relation). The loading of classes named by Q-descriptors may be done earlier, at the discretion of the JVM. However, except for non-static fields, any early loading must not elicit class circularity errors.

The preparation phase of class linking will typically allocate space for static fields and method dispatch tables. In doing so, it must not require that Q-types of static fields or methods are loaded, because that could lead to unwanted circularity errors. When a class is linked, it may safely load the Q-types of static fields or methods, and perform any final configuration of static variables or calling sequences.

It is likely that JVM implementations will use hidden indirections for static variables of Q-types, so that they can be initialized to placeholder values (null) at preparation time, and later at link time load the Q-types and allocate the correct amount of storage for each Q-typed static variable. Since the statics of a class cannot be accessed until its initialization has begun, it is safe to initialize a Q-typed static (to its default value, of course) during the linking phase of the class declaring the static.

Some bytecodes use symbolic references, stored in the containing class’s constant pool, to refer to types, fields, and methods in various classes. Such bytecodes may be called “nominal” or “symbolic referencing”, because they make use of names, and those names must be resolved during a linking phase before the bytecode is first executed. Nominal bytecodes which refer to fields and methods, which use Q-types, are discussed above. There are a few additional nominal bytecodes, such as instanceof, checkcast, and anewarray, which may also refer to Q-types. These bytecodes, when they resolve their symbolic references, must always validate that each Q-type resolved refers to a loaded, valid value class.

Dynamic type checks

The instanceof and checkcast bytecodes perform dynamic type checks. These bytecodes may operate on either Q-types or L-types, and they adjust their behavior accordingly. The behavior on L-types is already defined by the JVMS. The behavior on Q-types is slightly more restrictive in the case of checkcast. If a null value is presented to checkcast of a Q-type, an NPE is thrown. In all other cases, the behaviors of checkcast and instanceof are the same on L-types and Q-types.

A third bytecode, aastore, also incorporates dynamic type checks as part of its so-called array store check. In that case, the Q-type or L-type is the element type, as dynamically derived from the array being stored into. If the array element type is a Q-type, it’s a flat array, and NPE is thrown as part of the array store check, if a null value is presented to aastore.

In all three cases of dynamic type checks, the JVM will need to add an extra bit somewhere which tells the type-checking bytecode whether nulls are to be rejected. In the case of arrays, the bit must be derived from the array, probably cached near its element class. In the case of the nominal instructions instanceof and checkcast, the bit can be derived when the instruction is linked, perhaps in the linkage state associated with the resolved constant pool entry.
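The three dynamic checks, each consulting a null-rejection bit, might be modeled as below. This is a source-level sketch under the following assumptions:

- ResolvedType is a hypothetical stand-in for a resolved constant pool entry plus its extra bit.
- Ordinary Java classes stand in for value classes.
- The flat parameter of aastore stands in for the bit cached near the array’s element class.

```java
import java.util.Objects;

// Hypothetical model of the three dynamic type checks and their null-rejection bit.
// Ordinary Java classes stand in for value classes.
public class DynamicChecks {
    // A resolved type operand plus the extra bit: true for a Q-type, false for an L-type.
    record ResolvedType(Class<?> klass, boolean rejectsNull) {}

    // instanceof: null is never an instance, for Q-types and L-types alike.
    static boolean instanceOf(Object x, ResolvedType t) {
        return x != null && t.klass().isInstance(x);
    }

    // checkcast: null passes an L-type check but elicits NPE for a Q-type.
    static Object checkCast(Object x, ResolvedType t) {
        if (x == null) {
            if (t.rejectsNull()) throw new NullPointerException("checkcast to Q-type");
            return null;
        }
        if (!t.klass().isInstance(x)) throw new ClassCastException(x.getClass().getName());
        return x;
    }

    // aastore: here the bit comes from the array; flat (Q-element) arrays reject null.
    static void aastore(Object[] array, boolean flat, int index, Object value) {
        Objects.checkIndex(index, array.length);
        if (value == null && flat) throw new NullPointerException("null stored into flat array");
        array[index] = value; // Java's own store check may throw ArrayStoreException here
    }
}
```

Only checkcast and aastore consult the bit. instanceof gives the same answer for both modes, since null is never an instance of anything.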