Q-Types in L-World 10

John Rose, September 2018

Types in narrative specification

An L-type is a reference to a named class or interface type. (Its spelling in bytecode will begin with capital L, hence the name. It could more properly be called R-type, for a reference.)

(An array type is a reference to an array, whose element type may be any field type. Array types are reference types but not L-types, in the terms of this document. In narrative we can say “array of T” or “rank N array of T”, thus “array of Q-Val” is an array type whose element type is a Q-type.)

For any given named class or interface Foo we say that L-Foo is the L-type of Foo.

As dictated by today’s JVMS, any variable of L-type will accept null.

A Q-type is a reference to a value instance. All value instances have a named class, which is incorporated into the spelling of the Q-type. (For regularity we say “reference to a value”, but the reference is not detectable to the user; there are no identity or aliasing relations between values.)

For any given named value class Val we say that Q-Val is the Q-type of Val.

A well-formed Q-type name must incorporate the spelling of a class which resolves (in the context of uttering the Q-type) into a value class declaration. Thus, Q-Object and Q-Comparable and Q-String are not well-formed type names.

Value sets

The value set of an L-type is all references to objects whose classes “appropriately match” the name of the L-type. Appropriate matching depends in detail on context and use of the L-type, but for classes (both object and value classes), appropriate matching means that the L-type can only refer to objects whose classes or superclasses are named (perhaps after context-specific resolution of the name) by the L-type. (As is well known, L-types of interfaces are not as constraining and do not entail a subtype match.)

The value set of a Q-type is all references to instances of the value class named by the Q-type (perhaps after context-specific resolution of the name). The value set of any Q-type is therefore identical to the value set of the L-type of the same name, except that the Q-type does not include null while the L-type does include null.

Null rejection

Variables of Q-type are said to reject null. The details of null rejection are subject to revision, but the intended effect is for attempts to pass null into a Q-type variable to elicit NullPointerException. For example:

Types in bytecode descriptors

A bytecode descriptor is a string (backed by a CONSTANT_Utf8) which when parsed denotes one or more types for methods, fields, and the like.

A Q-descriptor is a string (backed by a CONSTANT_Utf8) which when parsed denotes a Q-type. There is always enough contextual information to check that the class named by the Q-type is in fact a value class.

Types in the verifier

Q-types appear in the verifier type system and are verified as follows:

There may be corresponding verifier assignabilities for arrays of Q-type, or not (see below). It is not logically necessary for the verifier to allow subtyping relations between arrays of Q-type and other types (other than simple L-Object, the super of all arrays).

The null verifier type is not assignable to any Q-type.

Spellings

A Q-descriptor is spelled by appending a fixed string, called Q_SUFFIX, to the corresponding L-descriptor.

SPELLING[Q-Val] := SPELLING[L-Val] + Q_SUFFIX
SPELLING[rank-N-array-of-Q-Val] := SPELLING[rank-N-array-of-L-Val] + Q_SUFFIX

The Q_SUFFIX is defined as the four characters /$Q;. This string is chosen to be unambiguously distinct from any related spelling, and also to be somewhat future-proof, anticipating other likely extensions of the descriptor syntax.
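The two spelling rules above can be sketched as plain string manipulation. This is only an illustration: the class name QDescriptors, the helper names, and the sample class pkg/Val are hypothetical, and the check that the named class really is a value class (which needs resolution context) is left out.

```java
final class QDescriptors {
    // Q_SUFFIX as defined in the text: the four characters '/', '$', 'Q', ';'.
    static final String Q_SUFFIX = "/$Q;";

    // Hypothetical helper: spells a Q-descriptor from an L-descriptor,
    // or an array-of-Q descriptor from an array-of-L descriptor.
    static String toQDescriptor(String lDescriptor) {
        // Skip any leading '[' characters (the array rank) before validating.
        int rank = 0;
        while (rank < lDescriptor.length() && lDescriptor.charAt(rank) == '[') {
            rank++;
        }
        String element = lDescriptor.substring(rank);
        boolean looksLikeL = element.length() > 2
                && element.startsWith("L")
                && element.endsWith(";")
                && !element.endsWith(Q_SUFFIX);
        if (!looksLikeL) {
            throw new IllegalArgumentException("not an L-descriptor: " + lDescriptor);
        }
        // Both rules in the text reduce to the same append.
        return lDescriptor + Q_SUFFIX;
    }

    // Hypothetical helper: a purely syntactic test for the suffix.
    static boolean isQDescriptor(String descriptor) {
        return descriptor.endsWith(Q_SUFFIX);
    }

    public static void main(String[] args) {
        System.out.println(toQDescriptor("Lpkg/Val;"));
        System.out.println(toQDescriptor("[[Lpkg/Val;"));
    }
}
```

Because the suffix is appended after the terminating semicolon of the L-descriptor, the array rank prefix is untouched and the same append serves both scalar and array descriptors.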

Arrays

The type array-of-Q-type is a subtype of array of Object and any interface implemented by the value class of the Q-type.

The type array-of-L-type is, as usual, a subtype of array of Object and any interface implemented by the value class of the L-type.

Optionally, an array-of-Q-type can be a subtype of array-of-L-type. This is based on the observation that Q-Val <: L-Val (because of value sets), and that in Java array types are covariant.

In any case, when an array of Q-type is viewed as an array of L-type (such as L-Object), any attempt to store a null will be dynamically rejected. The exception is NullPointerException (NPE), analogous to the ArrayStoreException (ASE) thrown today on an ill-typed store.
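The analogy with the existing array store check can be seen in today's Java, and the proposed null rejection can be emulated on top of it. In this sketch the class ArrayStoreChecks, both method names, and the backingIsQArray flag are hypothetical; the flag merely stands in for the JVM knowing that the backing array is an array of Q-type, which ordinary Java code cannot express.

```java
final class ArrayStoreChecks {
    // Today's behavior: a covariant view checks the element type on every store.
    static String storeToday(Object[] view, Object value) {
        try {
            view[0] = value;
            return "stored";
        } catch (ArrayStoreException e) {
            return "ArrayStoreException";
        }
    }

    // Emulation only: the boolean flag is an assumption standing in for
    // "the backing array is an array of Q-type".
    static String storeEmulated(Object[] view, Object value, boolean backingIsQArray) {
        try {
            if (backingIsQArray) {
                // The proposed dynamic check: nulls are rejected with NPE.
                java.util.Objects.requireNonNull(value, "null stored into array of Q-type");
            }
            view[0] = value;
            return "stored";
        } catch (NullPointerException e) {
            return "NullPointerException";
        } catch (ArrayStoreException e) {
            return "ArrayStoreException";
        }
    }

    public static void main(String[] args) {
        System.out.println(storeToday(new Integer[1], "text"));
        System.out.println(storeEmulated(new Integer[1], null, true));
    }
}
```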

As is the case today, no array conversions involving Q-types will ever create a new instance of an array. Array conversions are always “view in place” operations.

It is an option to disconnect arrays of Q-types from any subtyping relations with arrays of L-types. This would lead to an anomaly that Q-Val <: L-Val but not Q-Val[] <: L-Val[], a violation of the usual (and useful) rule that Java arrays are covariantly typed.

Separately, it is an option to disallow arrays of type L-Val[], on the grounds that flat arrays of values are useful (like int[]), but non-flat arrays of values are not so useful (like Integer[]).

It seems likely that an upcoming experimental version of the JVM will allow both kinds of arrays, but not allow “cross-kind” array covariance, such as from Q-Val[] to L-Object[]. This behavior emulates the current behavior of int[] relative to Object[], so it may be tolerable to users to continue to keep those array types separate. The JVM implementation costs are likely to be much smaller if flattened arrays are never presented to aaload and aastore bytecodes, and disallowing “cross-kind” array subtyping keeps that desirable isolation.
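The existing separation between int[] and Object[] that this paragraph appeals to can be observed directly in current Java. The class and method names below are hypothetical; the behavior shown is that of today's language, not of any Q-type implementation.

```java
final class ArrayKinds {
    // True when the given array can be viewed in place as an Object[].
    static boolean viewableAsObjectArray(Object array) {
        return array instanceof Object[];
    }

    public static void main(String[] args) {
        // A flat primitive array is not an Object[] ...
        System.out.println(viewableAsObjectArray(new int[1]));
        // ... while an array of wrapper references is.
        System.out.println(viewableAsObjectArray(new Integer[1]));
    }
}
```

Under the experimental behavior described above, Q-Val[] would stand to L-Object[] roughly as int[] stands to Object[] here.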

Q-types as class Q-names

A CONSTANT_Class entry in the constant pool can mention a Q-type as well as an L-type. The rule of deriving a Q-type is approximately the same as with descriptors: Append semicolon, then Q_SUFFIX. In neither case is the introductory letter L used; the name starts with a package prefix in Q-names as well as L-names. (The semicolon is not strictly necessary, but may help simplify mappings between class names and descriptors.)
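A small sketch may make the naming rule, and the reason for the extra semicolon, more concrete. The class QNames, its helpers, and the sample name pkg/Val are hypothetical, and the text itself calls the rule only approximate, so this should be read as one plausible interpretation.

```java
final class QNames {
    // Q_SUFFIX as defined in the Spellings section.
    static final String Q_SUFFIX = "/$Q;";

    // Q-name: the class name, then a semicolon, then Q_SUFFIX. No leading 'L'.
    static String toQName(String className) {
        return className + ";" + Q_SUFFIX;
    }

    // Conventional L-descriptor for a class name.
    static String lDescriptorOf(String className) {
        return "L" + className + ";";
    }

    // Q-descriptor per the Spellings section: L-descriptor plus Q_SUFFIX.
    static String qDescriptorOf(String className) {
        return lDescriptorOf(className) + Q_SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(toQName("pkg/Val"));
        System.out.println(qDescriptorOf("pkg/Val"));
    }
}
```

With the semicolon included, the Q-descriptor is just the letter L followed by the Q-name, mirroring the way an L-descriptor is the letter L followed by the class name and a semicolon.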

Q-types as class Q-mirrors

For use in reflection, there is a distinction between Q-types and L-types, something like there is today between primitives and their wrappers. Thus, each loaded value class is represented by an L-mirror and a Q-mirror.

Calling Object.getClass on a value instance (probably) produces the L-mirror (for maximum compatibility). Calling Class.forName on a legacy class name produces the L-mirror (for similar reasons). It is probably useful that Class.forName on a suffixed legacy name should produce the Q-mirror.

Calling Class.getComponentType on the mirror of an array of Q-type produces a Q-type, thereby reifying the flat, null-rejecting character of such arrays.

When reflecting on the fields and methods of a class, the distinctions in the original classfile between L-descriptors and Q-descriptors surface regularly as L-mirrors and Q-mirrors, something like today’s distinction between int.class and Integer.class.
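The existing primitive/wrapper analogy for mirrors can be checked with today's reflection API. Only standard java.lang.Class methods are used; the class MirrorAnalogy and its helper are hypothetical names, and no Q-mirror API is assumed.

```java
final class MirrorAnalogy {
    // Returns the element mirror of an array mirror, as Class.getComponentType does.
    static Class<?> elementMirror(Class<?> arrayMirror) {
        if (!arrayMirror.isArray()) {
            throw new IllegalArgumentException("not an array mirror: " + arrayMirror);
        }
        return arrayMirror.getComponentType();
    }

    public static void main(String[] args) {
        // The flat array reports the primitive mirror, the reference array the wrapper mirror.
        System.out.println(elementMirror(int[].class));
        System.out.println(elementMirror(Integer[].class));
        // The two mirrors are distinct objects for closely related types.
        System.out.println(int.class == Integer.class);
    }
}
```

By analogy, the mirror of an array of Q-type would report a Q-mirror as its component type, while the corresponding array of L-type would report the L-mirror.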

Translation strategy

It is a simplifying move in the JVM to have Q-Val <: L-Val, since this means that there is only one carrier type (references) with slightly differing rules for nullability of the two “type modes”. The language, however, may choose to make a stronger distinction between kinds, disallowing subtyping relations between “a value” and “that value’s box”. This might be done to preserve aspects of the existing model for primitives, where int and Integer are related types, but neither is a subtype of the other.

When a source compiler determines that a type name denotes an L-type, it emits an L-descriptor and/or L-name; likewise with Q-types. It’s that simple. This means that most occurrences of value types in class files are likely to contain the Q_SUFFIX.

Because of the JVM’s simple descriptor matching rules, a method name may be overloaded to take differing combinations of L-types and Q-types on the same classes. There is no attempt (at present) to auto-bridge between methods of distinct but similar descriptors. The close similarity of Q-types and L-types makes such bridging relatively easy to consider in the future.
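Descriptor-based overloading already works this way for int and Integer, which gives a rough picture of how L-type and Q-type overloads could coexist. The sketch below uses the standard java.lang.invoke.MethodType API to print descriptors; the class OverloadAnalogy and its methods are hypothetical, and the primitive/wrapper pair is only an analogy for a Q-type/L-type pair.

```java
final class OverloadAnalogy {
    // Two overloads that differ only in "flat" versus "boxed" parameter type.
    static String take(int x) {
        return "primitive";
    }

    static String take(Integer x) {
        return "wrapper";
    }

    // The JVM method descriptor for a one-argument method returning String.
    static String descriptorOf(Class<?> parameterType) {
        return java.lang.invoke.MethodType
                .methodType(String.class, parameterType)
                .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Distinct descriptors mean the JVM treats these as unrelated methods.
        System.out.println(descriptorOf(int.class));
        System.out.println(descriptorOf(Integer.class));
        System.out.println(take(1));
        System.out.println(take(Integer.valueOf(1)));
    }
}
```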

Operations

All bytecodes which operate on L-types (either as L-names or L-descriptors) operate also on the corresponding Q-types, unless there is a logical reason to forbid such an operation in the case of a particular bytecode. Such cases include:

It is likely that both flattened and unflattened (nullable) arrays will be supported for value types. (This is assumed above.) In that case, the array creation bytecodes, when applied to array-of-Q-type, produce flattened arrays, while the same bytecodes, applied to corresponding L-types, produce unflattened (nullable) arrays.

As noted above, the verifier type system does not allow L-types to assign to Q-types. The checkcast bytecode reverses the assignment, at the cost of a runtime check, and NPE if null is detected. This is a regular, conservative extension of the previous role of checkcast.
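The extended role of checkcast can be emulated in ordinary Java as a cast preceded by a null check. This is a sketch under assumptions: the class NullRejectingCast and its method are hypothetical, and String stands in for a value class purely so the example runs on a current JVM.

```java
final class NullRejectingCast {
    // Emulates checkcast to a Q-type: NPE on null, ClassCastException on a type mismatch.
    static <T> T castRejectingNull(Class<T> type, Object ref) {
        java.util.Objects.requireNonNull(ref, "null cannot be cast to a Q-type");
        return type.cast(ref);
    }

    public static void main(String[] args) {
        System.out.println(castRejectingNull(String.class, "a value"));
        // For comparison, today's cast to an L-type lets null through unchanged.
        System.out.println(String.class.cast(null));
        try {
            castRejectingNull(String.class, null);
        } catch (NullPointerException e) {
            System.out.println("NPE: " + e.getMessage());
        }
    }
}
```

The only difference from the existing cast is the added null check, which is why the text can describe the change as a conservative extension.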