# Minimal Value Types
#### April 2017: Minimal Edition _(v. 0.4)_
#### John Rose, Brian Goetz
_"What we do in the shadows reveals our true values."_
## Background
In the three years since the [first public proposal [values]][values]
of value types, there have been [vigorous discussions
[valhalla-dev]][valhalla-dev] of how to get there, and vigorous
prototyping in the Java compiler, classfile format, and VM. The goal
has been to unify primitives, references, and values in a common
platform that supports efficient generic, object-oriented programming.
Much of the discussion has [concentrated on generic specialization
[goetz-jvmls15]][goetz-jvmls15], as a way of implementing full
parametric polymorphism in Java and the JVM. This concentration has
been intentional and fruitful, since it exposes all the ways in which
primitives fail to align sufficiently with references, and forces us
to expand the bytecode model. After solving for `List<int>`, it will
be simpler to manage `List<Complex<int>>`.
Other discussions have concentrated on [details of value semantics
[valsem-0411]][valsem-0411] and specific tactics [implementing
[simms-vbcs]][simms-vbcs] new bytecodes which work with values. A few
experiments have employed value-like APIs to perform useful tasks like
[vectorizing loops [graves-jvmls16]][graves-jvmls16].
Most recently, at the JVM Language Summit (2016), and at the Valhalla
EG meeting that week, we got repeated calls for an early-access
version of value types that would be suitable for vector, Panama, and
GPU experiments. This document outlines a subset of experimental
value type support in the JVM (and to a smaller degree, language and
libraries), that would be suitable for early adopters. And we have
been prototyping many of these ideas in recent months.
Looking back, it is reasonable to estimate that there have been many
thousands of engineer-hours devoted to mapping out this complex
future. Now is the time to take this vision and choose a first
version, a sort of "hello world" system for value types.
The present document proposes a minimized but viable subset of
value-type functionality with the following goals:
* simple to implement in the HotSpot JVM (the reference implementation)
* does not constrain future developments for the Java language or VM
* usable by power-users for early experimentation and prototyping
* minimum changes to the JVM classfile format
* use of such changes can be firewalled in experimental areas only
* users can develop value-using code with standard toolchains
Our non-goals are complementary to the goals:
* does not support all known-good language constructs for value types
* does not commit to a Java language syntax or even a bytecode design
* does not enable Java programmers to write value types in Java code
* does not propose a final bytecode format
* will not be deployed for general use (not initially, and likely never)
* does not require vertically integrated upgrade of developer toolchain
In other words, before releasing our values to the full light of day,
we will prototype with them in the shady area between armchair
speculation and public specification. Such a prototype, though
limited, is far from useless. It will allow us to experiment with
various approaches to the design and implementation of value types.
We can also discard approaches as needed! We can also begin to make
better estimates of performance and usability, as power-users (most of
whom will work closely with the designers and implementors) exercise
various early use cases.
## Features
The specific features of our minimum (but viable) support for value
types can be summarized as follows:
* A few **value-capable classes** (`Int128`, etc.) from which the VM
may create associated **derived value types**.
* **Descriptor** syntax ("`Q`-types") for describing new value types
in class-files.
* Enhanced **constants** in the constant pool, to interoperate with
these descriptors.
* A small set of **bytecode instructions** (`vload`, etc.) for moving value types
between JVM locals and stack.
* Limited **reflection** for value types (similar to `int.class`).
* **Boxing** and unboxing, to represent values (like primitives) in
terms of Java's universal `Object` type.
* Method handle **factories** to provide access to value operations
(member access, etc.)
The value-capable classes can be developed in today's toolchains
as _standard POJO classes_. In this mode of use,
standard Java source code, including generic classes and methods, will
be able to refer to values only in their boxed form. However, both
method handles and specially-generated bytecodes will be able to work
with values in their native, unboxed form.
This work relates to the JVM, not to the language. Therefore
non-goals include:
* Syntax for defining or using value types directly from Java code.
* Specialized generics in Java code which can store or process
unboxed values (or primitives).
* Library value types or evolved versions of value-based classes
like `java.util.Optional`.
* Access to value types from arbitrary modules. (Typically,
value-capable classes will not be exported.)
Given the slogan _"codes like a class, works like an int,"_ which
captures the overall vision for value types, this minimal set will
deliver something more like _"works like an int, if you can catch
one with a box or a handle"_.
By limiting the scope of this work, we believe useful experimentation
can be enabled in a production JVM much earlier than if the entire
value-type stack were delivered all at once.
The support for the new JVM-level features will allow
immediate prototyping of new language features and tools
which can make _direct_ use of those features. But this
minimal project does not _depend_ on such language features
or tools.
The rest of this document goes into the proposed features in detail.
## Value-capable classes
A class may be marked with a special annotation `@DeriveValueType` (or
perhaps an attribute). A class with this marking is called a
_value-capable class_ (or _VCC_ for short), meaning it can be endowed
with an associated _derived value type_ (or _DVT_), beyond the class
type itself.
The use of this annotation will be restricted in some manner, probably
unlocked by a command line option, and associated with some sort of
[incubator module [JEP-11]][JEP-11].
Example:

    @jvm.internal.value.DeriveValueType
    public final class DoubleComplex {
      public final double re, im;
      private DoubleComplex(double re, double im) {
        this.re = re; this.im = im;
      }
      ... // toString/equals/hashCode, accessors, math functions, etc.
    }

The semantics of the marked class will be the same as if the annotation
were not present. But, the annotation will enable the JVM, _in addition_,
to consider the marked value-capable class as a source for an associated
derived value type.
The super-class of a value-capable class must be `Object`.
(This is similar to the full proposal, in which super-classes
are disallowed.)
A class marked as value-capable must qualify as [value-based][],
because its instances will serve as boxes for values of the associated
value type. In particular, the class, and all its fields, must be
marked `final`, and constructors must be private.
A class marked as value-capable must not use any of the methods
provided on `Object` on any instance of itself, since that would
produce indeterminate results on a boxed version of a value. The
`equals`, `hashCode`, and `toString` methods must be replaced
completely, with no call via `super` to `Object` methods.
As an exception, the `getClass` method may be used freely; it behaves
as if it were replaced in the value-capable class by a
constant-returning method.
As with all value-based classes, the other object methods (`clone`,
`finalize`, `wait`, `notify`, and `notifyAll`) should not be used
with the value-capable class. (This is left to the user to enforce
manually. In the full proposal we may find ways to enforce it more
automatically.)
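As a rough illustration of these rules, here is a minimal sketch of a value-based class in plain Java whose `equals`, `hashCode`, and `toString` depend only on field values and never delegate to `Object`. The `@DeriveValueType` annotation is left out because it is not available in standard JDKs, and the factory name `of` is an assumption made for this sketch.

```java
// Sketch only: the @DeriveValueType annotation is omitted because it is not
// part of any standard JDK. The factory name `of` is an assumption.
final class DoubleComplex {
    public final double re, im;

    private DoubleComplex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Private constructors force clients through a static factory.
    public static DoubleComplex of(double re, double im) {
        return new DoubleComplex(re, im);
    }

    // Field-based equality; never falls back to Object.equals (identity).
    @Override
    public boolean equals(Object x) {
        if (!(x instanceof DoubleComplex)) return false;
        DoubleComplex that = (DoubleComplex) x;
        return Double.compare(re, that.re) == 0
            && Double.compare(im, that.im) == 0;
    }

    // Field-based hash; never calls super.hashCode() (identity hash).
    @Override
    public int hashCode() {
        return 31 * Double.hashCode(re) + Double.hashCode(im);
    }

    // Field-based text; never calls super.toString().
    @Override
    public String toString() {
        return "DoubleComplex(" + re + ", " + im + ")";
    }

    public static void main(String[] args) {
        DoubleComplex a = DoubleComplex.of(1.0, 2.0);
        DoubleComplex b = DoubleComplex.of(1.0, 2.0);
        System.out.println(a.equals(b)); // equal by fields, not by identity
        System.out.println(a);
    }
}
```

Because nothing here observes object identity, two boxes holding the same field values are indistinguishable to client code, which is the property a box for a value needs.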
In summary, the JVM will make the following structural checks on a
value-capable class:
* The class must be marked `final`.
* The class is a proper class (not an interface).
* The superclass must be `Object`.
* All non-static fields must be `final`.
* The class must override the `Object` methods `equals`, `hashCode`, and `toString`.
* The class must not override the `Object` methods `clone` or `finalize`.
These structural checks are performed when the JVM derives the
DVT from the VCC. The phasing of that derivation is discussed below.
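The structural checks listed above can be approximated from ordinary Java with core reflection. The sketch below is only an illustration of the rules, not how a JVM would implement them (a JVM would check the parsed class file directly). The class name `VccChecks`, the method `violations`, and the two sample classes are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reports which structural rules a class breaks.
final class VccChecks {
    private VccChecks() {}

    static List<String> violations(Class<?> c) {
        List<String> out = new ArrayList<>();
        if (c.isInterface()) out.add("must be a proper class, not an interface");
        if (!Modifier.isFinal(c.getModifiers())) out.add("class must be final");
        if (c.getSuperclass() != Object.class) out.add("superclass must be Object");
        for (Field f : c.getDeclaredFields()) {
            int m = f.getModifiers();
            if (!Modifier.isStatic(m) && !Modifier.isFinal(m)) {
                out.add("non-static field " + f.getName() + " must be final");
            }
        }
        if (!declares(c, "equals", Object.class)) out.add("must override equals");
        if (!declares(c, "hashCode")) out.add("must override hashCode");
        if (!declares(c, "toString")) out.add("must override toString");
        if (declares(c, "clone")) out.add("must not override clone");
        if (declares(c, "finalize")) out.add("must not override finalize");
        return out;
    }

    // True if the class itself (not a super-class) declares the method.
    private static boolean declares(Class<?> c, String name, Class<?>... params) {
        try {
            c.getDeclaredMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Sample class that satisfies every rule.
    static final class GoodPoint {
        final int x, y;
        private GoodPoint(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodPoint && ((GoodPoint) o).x == x && ((GoodPoint) o).y == y;
        }
        @Override public int hashCode() { return 31 * x + y; }
        @Override public String toString() { return "(" + x + ", " + y + ")"; }
    }

    // Sample class that breaks several rules (not final, mutable field, ...).
    static class BadPoint {
        int x;
    }

    public static void main(String[] args) {
        System.out.println(violations(GoodPoint.class));
        System.out.println(violations(BadPoint.class));
    }
}
```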
Apart from the above restrictions, a value-capable class can do any of
the things normal value-based classes do, such as define constructors,
methods, fields, and nested types, implement interfaces, and define
type variables on itself or its methods. There is no particular
restriction on the types of fields.
As we will see, the derived value type will contain _only fields_. It
will contain the same set of fields as the value-capable class from
which it is derived. But the JVM will give it no methods,
constructors, nested types, or super-types. (In the full proposal, of
course, value types will "code like a class" and support all of those
features.)
> Note that a value-capable class compiled using the standard javac
compiler will be unable to express "inline sub-value" fields which
themselves are value types; the most it will be able to do is request
fields of their associated reference types ("L-types"). An upgraded
version of javac may well be able to define sub-value fields, of true
"Q-types". Such a version of javac will give programmers the ability
to work directly with value types, bypassing the trick of deriving
a separate derived value type from a value-capable class. Thus,
if you see a field in a value-capable class which itself is typed
as a value-capable class, it is probably an error, an unintentional
boxing of an intended inline sub-value.
Here is a larger example of a value-capable class which defines
a "super-long" derived value type:

    @DeriveValueType
    final class Int128 implements Comparable<Int128> {
      private final long x0, x1;
      private Int128(long x0, long x1) { ... }
      public static Int128 zero() { ... }
      public static Int128 from(int x) { ... }
      public static Int128 from(long x) { ... }
      public static Int128 from(long hi, long lo) { ... }
      public static long high(Int128 i) { ... }
      public static long low(Int128 i) { ... }
      // possibly array input/output methods
      public static boolean equals(Int128 a, Int128 b) { ... }
      public static int hashCode(Int128 a) { ... }
      public static String toString(Int128 a) { ... }
      public static Int128 plus(Int128 a, Int128 b) { ... }
      public static Int128 minus(Int128 a, Int128 b) { ... }
      // more arithmetic ops, bit-shift ops
      public int compareTo(Int128 i) { ... }
      public boolean equals(Int128 i) { ... }
      public int hashCode() { ... }
      public boolean equals(Object x) { ... }
      public String toString() { ... }
    }

[Similar types [Long2.java]][Long2.java] have been used in a loop
vectorization prototype. This example has been defined in a prototype
version of the `java.lang` package. But value-capable types defined
as part of this minimal proposal will _not_ appear in any standard
API. Their visibility is likely to be controlled using features of
the module system, such as incubator modules.
> Initial value-capable classes are likely to be extensions of numeric
types like `long`. As such they should have a standard and consistent
set of arithmetic and bitwise operations. There is no such set
codified at present, and creating one is beyond the scope of the
minimal set. Eventually we will need to create a set of interfaces
that captures the common operation structure between numeric
primitives and numeric values.
#### Splitting the value type from the object type
When the JVM loads a value-capable class, it may either eagerly derive
a derived value type for it, or else set a flag on the class and
arrange to create the derived value type on demand.
(The latter is recommended.)
> (Note: The minimal proposal may leave this ordering undetermined. In
the full version of value types, the question is moot, since the
derived value type and the value-capable class are identical.)
The value-capable class itself is not changed at all at load time. It
remains an ordinary "POJO" for a value-based class.
The corresponding derived value type is created as a
copy of the value-capable class, but with these crucial differences:
* The derived value type is marked as a value-type.
* The derived value type is given a new name derived from the value-capable class.
* All super-types from the value-capable class are removed.
* All methods and constructors from the value-capable class are removed.
* Non-static fields from the value-capable class are retained without change.
The name given to the DVT is hidden by the implementation. In all
cases the name of the VCC is used to refer to both types, and enough
contextual information is always present to resolve any ambiguity.
In bytecode descriptors the letters `Q` and `L` are used to make
the distinction, so we call the DVT a Q-type and the VCC an L-type.
The creation of the DVT must happen at some point after the loading of
the VCC, and before the first creation of an instance of the DVT.
This is enforced in the semantics of particular instructions which
trigger the initialization of the DVT in much the same way that
some instructions today (such as `getstatic` or `new`) trigger
the initialization of a normal class. Details will be given later.
Start again with our example of `DoubleComplex`:

    @jvm.internal.value.DeriveValueType
    public final class DoubleComplex {
      public final double re, im;
      ...
      double realPart() { return re; }
    }

When the JVM decides to synthesize a derived value type for
`DoubleComplex` it makes a fresh copy with all class members stripped
out except the two `double` fields. Crucially, the JVM uses internal
magic to make the synthetic class into a value-type, not an object
type.
Inside the JVM, the resulting derived value type looks something like
this:

    @jvm.internal.value.DeriveValueType
    public final class L-DoubleComplex {
      public final double re, im;
      ...
      double realPart() { return $value.re; }
    }
    public static __ByValue class Q-DoubleComplex {
      public final double re, im;
    }

The hypothetical `__ByValue` keyword notes where values are defined in
place of references. Until much work has been done up and down the
stack, such a thing cannot be directly specified in source code, but
it is perfectly reasonable and useful to perform at class-load time.
Note that the derived value type has no constructors. Normally this
would be a problem, since object classes are required by the JVM to
have at least one constructor. The JVM permits this in the case of
the derived value type. (Such a constraint is not necessary with
value types in general, but that story is too long to tell here.)
In any case, the derived value type will "borrow" the constructors
of the value-capable class, as we will see in the next section.
> This design may be called "box-first", in that the JVM loads the
box-type only, and somehow creates the value-type as a side effect.
We will end up with a more natural "value-first" design, but the
present box-first design puts the fewest constraints on tools which
read and write class-files, including the JVM and javac. So the
box-first awkwardness is the correct choice, at first.
#### Boxing, unboxing, and borrowing
The JVM internally arranges boxing and unboxing operations to convert
between the value-capable class and its derived value type. The semantics
of these operations are simple field-wise copies between the two types.
This obviously is well-defined because the field lists are identical.
The synthetic unbox operation allows the derived value type to make
indirect use of the constructors of the value-capable class. The
programmer can create a box using a constructor, and unbox it to get
the desired constructed value. The JVM just copies the fields out of
the box and discards the box. (In the full proposal, value types have
real constructors, and don't need to borrow them from their boxes.)
The synthetic box operation allows the derived value type to make
indirect use of the methods of the value-capable class. The
programmer can temporarily box a value, and invoke any of the methods
of the value-capable class, throwing away the box when the method
returns. Since the box has a short lifetime, it is likely that the
JVM can optimize it away, at least for simple methods. (In the full
proposal, value types have real methods, and don't need to borrow them
from boxes. Instead, the boxes can borrow their methods from the
values.)
Note that the synthetic box operation creates a new instance of the
value-capable class _without running a constructor_. Normally this is
a problem, but in this case the two classes are so closely linked that
it is safe to assume that any value was created (in the first place)
by unboxing a properly-constructed box. Thus, a constructor gets the
first word, for any particular value. The pattern of unboxing and
boxing is similar to the pattern of serialization and deserialization.
In both patterns, the second operation bypasses normal object
construction.
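The borrowing pattern can be imitated in plain Java with two ordinary classes, which may help make the data flow concrete. This is a sketch under loud assumptions: `Value` merely stands in for the derived value type (a real DVT has no constructor and is not a heap object), and `box` here calls the box constructor, whereas the synthetic box operation described above would bypass it. All names are hypothetical.

```java
// Plain-Java imitation of box/unbox as field-wise copies.
final class BoxingSketch {
    private BoxingSketch() {}

    // Stand-in for the value-capable class (the box, or L-type).
    static final class Box {
        final double re, im;
        private Box(double re, double im) { this.re = re; this.im = im; }
        static Box of(double re, double im) { return new Box(re, im); }
        double realPart() { return re; }
    }

    // Stand-in for the derived value type: fields only, no behavior.
    // Assumption: a real DVT would have no constructor at all.
    static final class Value {
        final double re, im;
        Value(double re, double im) { this.re = re; this.im = im; }
    }

    // Unbox: copy the fields out of the box and forget the box.
    static Value unbox(Box b) { return new Value(b.re, b.im); }

    // Box: copy the fields into a fresh box.
    // Assumption: the real operation would not run a constructor.
    static Box box(Value v) { return new Box(v.re, v.im); }

    public static void main(String[] args) {
        // Borrow the constructor: build a box, then unbox it.
        Value v = unbox(Box.of(3.0, 4.0));
        // Borrow a method: box temporarily, call, discard the box.
        System.out.println(box(v).realPart());
    }
}
```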
The synthetic box operation also allows the derived value type to make
indirect use of the interfaces of the value-capable class. Again, if
a derived value type must be passed somewhere that expects an
interface, the programmer can simply box it and pass a reference
to the box. (In the full proposal, we intend to provide ways for
values to be operated on via interface types, without any visible
boxing. This will require some careful work defining how values
and interfaces work together directly.)
Finally, since static methods and static fields are not copied
into the derived value type, the programmer can only access them
from the original value-capable class, the box.
### Scoping of these features
A crucial part of being able to provide an experimental release is the
ability to mark features as experimental and subject to change. While
the ideas expressed in this document are reasonably well baked, it is
entirely foreseeable that they might change between an experimental
release and a full Valhalla release.
Within a single version of the JVM, the experimental features are
further restricted to classes loaded into the JVM's initial module
layer, or a module selected by a command line option, and are otherwise
ignored. These modules are called _value-capable modules_.
In addition, the class-file format features may be enabled only
in class files of a given major and minor version, such as 53.1.
In that case, the JVM class loader would ensure that classes of
that version were loaded only into value-capable modules, and
then consult only the version number when validating and loading
the experimental extended features proposed here. It is possible
that some minor versions will be used _only_ for experimental
features, and _never_ appear in production specifications.
Any use of any part of any feature of this prototype must originate
from a class in a value-capable module. The JVM is free to detect and
reject attempts from non-value-capable modules. Annotations like
`@DeriveValueType` may be silently ignored.
However, a prototype implementation of this specification may omit
checks for such usage, and seem to work (or at least, fail to throw a
suitable error). Any such non-rejection would be a bug, not an
invitation.
## Value descriptors
In value-capable modules, the class-file descriptor language is
extended to include Q-types, which directly denote unboxed
value types. The descriptor syntax is "`Q`_BinaryName_`;`", where
_BinaryName_ is the internal form of the VCC name. (The internal
form substitutes slashes for dots.) In fact, the class name must be
that of the value type derived from a value-capable class.
By comparison, a standard reference type descriptor is called an
L-type. For a value-capable class _C_, we may speak of both the
Q-type and the L-type of _C_. Note that usage of L-types is not
correlated in any way with usage of Q-types. For example, they can
appear together in method types, in arbitrary mixtures.
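To make the descriptor syntax concrete, the following sketch builds Q-type and L-type descriptor strings from a class name and combines them into a method descriptor. It is plain string manipulation, not a JDK API; the helper names and the class name `com.example.DoubleComplex` are hypothetical.

```java
// Hypothetical string helpers for the descriptor forms described above.
final class Descriptors {
    private Descriptors() {}

    // Internal form: slashes substituted for dots.
    static String internalName(String className) {
        return className.replace('.', '/');
    }

    // "Q" BinaryName ";" denotes the unboxed value type.
    static String qType(String className) {
        return "Q" + internalName(className) + ";";
    }

    // "L" BinaryName ";" denotes the ordinary reference type.
    static String lType(String className) {
        return "L" + internalName(className) + ";";
    }

    // Method descriptor: parameter descriptors in parentheses, then the return.
    static String method(String returnDesc, String... paramDescs) {
        return "(" + String.join("", paramDescs) + ")" + returnDesc;
    }

    public static void main(String[] args) {
        String c = "com.example.DoubleComplex"; // hypothetical class name
        System.out.println(qType(c));
        // Q-types and L-types may be mixed freely in one method type.
        System.out.println(method(qType(c), lType(c), "D"));
    }
}
```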
A Q-type descriptor may appear as the type of a class field defined in
a value-capable module. But the same descriptor may not appear in a
field reference (`CONSTANT_Fieldref`) for that field (even in a
value-capable module), when that reference is used by one of the four
instructions in the `getfield` family.
> (Method handle factories, described below, will support field loads
and updates, in both values and objects. In this proposal, the field
instructions themselves are unchanged.)
A Q-type descriptor may appear as an array element type in a class of
a value-capable module. (Again, this is only in a value-capable
module, and probably in a specific experimental class-file version.
Let's stop repeating this, since the limitation has already
been set down as a blanket statement.) There are no bytecodes for
creating, reading, or writing such arrays, but the prototype makes
method handles available for these functions.
A field or array of a Q-type is initialized to the _default value_ of
that value type, rather than null. This default value is defined
(at least for now) as a value all of whose fields are themselves of
default value. Such a default may be obtained from a suitable method
handle, such as the `MethodHandles.empty` combinator.
> (In other words, default values are built up by combining the existing
type-specific default values of `null`, `false`, `\0`, `0`, and `0.0`.
All Java heap variables are initialized to these zero data, including
values. User-defined defaults are unlikely, or at least far in the future.)
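The existing zero defaults can already be observed through method handles in a standard JDK (9 or later), using `MethodHandles.zero` and `MethodHandles.empty`. The sketch below only covers today's primitive and reference types; how these combinators would treat a Q-type is not shown and is not something a standard JDK supports. The wrapper class and method names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Observes the per-type zero defaults that default values are built from.
final class ZeroDefaults {
    private ZeroDefaults() {}

    // Default of a type, via a constant-returning handle (boxed for printing).
    static Object zeroOf(Class<?> type) {
        try {
            return MethodHandles.zero(type).invoke();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    // Same default, via a no-argument "empty" handle with that return type.
    static Object emptyResult(Class<?> returnType) {
        try {
            return MethodHandles.empty(MethodType.methodType(returnType)).invoke();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(zeroOf(int.class));
        System.out.println(zeroOf(boolean.class));
        System.out.println(zeroOf(String.class));
        System.out.println(emptyResult(double.class));
    }
}
```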
A Q-type descriptor may appear as the parameter or return type of a
method defined in a class file. As described below, the verifier
enforces the corresponding stacked value for such a parameter or
return value to match the Q-type (not the corresponding L-type or any
other type).
Any method reference (a constant tagged `CONSTANT_Methodref` or
`CONSTANT_InterfaceMethodref`) may mention Q-types in its descriptor.
After resolution of such a constant, the definition
of such a method may not be native, and must use new bytecodes to
work directly with the Q-typed values.
Likewise, a `CONSTANT_Fieldref` constant may mention a Q-type in its
descriptor.
Note that the Java language does not provide any direct way to mention
Q-types in class files. However, bytecode generators may mention such
types and work with them. It is also likely that work in the Valhalla
project will create experimental language features to allow source
code to work with Q-types.
## Constant pool and instruction linkage
Since our value types will have names and members like reference
types, but are distinct from all reference types, it is necessary to
extend some constant pool structures to interoperate with Q-types.
Naturally, as a result of extending descriptor syntax, method and
field descriptors can mention Q-types. Doing this requires no
additional format changes in the constant pool.
However, some occurrences of types in the constant pool mention "raw"
class names, without the normal descriptor envelope characters (`L`
before and `;` after). Specifically, a `CONSTANT_Class` constant
directly links (after resolution) to a loaded class file, so the
mirror that pertains to it is clearly the principal mirror for
the class. What is a class-file to do if it needs to mention
a secondary mirror?
When used with the `ldc` or `ldc_w` bytecodes, or as a bootstrap
method static argument, a `CONSTANT_Class` beginning with an escaped
descriptor resolves to the principal `Class` mirror for the resolved
class. The secondary mirrors can be created, if desired, by using the
general `CONSTANT_Dynamic` form (a separate proposal [JDK-8177279][]).
When used as the class component of a `CONSTANT_Methodref` or
`CONSTANT_Fieldref` constant, a `CONSTANT_Class` will always denote
the classfile itself, without "tilting" toward any particular "view".
The bytecode which _uses_ the field or method reference will
determine whether the receiver is a Q-type or L-type. This
"tilt" of the bytecode can be described as a _mode_, so that
the standard `getfield` instruction is an L-mode instruction,
while the `vgetfield` instruction is a Q-mode instruction.
> (Note: This design of "agnostic" field and method references implies
that the JVM is willing to cache sufficient resources in the constant
pool entry for a `CONSTANT_Fieldref` to serve both `getfield` and
`vgetfield` instructions, which means either duplicate space or else
strong alignment between the memory layouts of boxed L-values and
buffered Q-values of the same class. A similar point goes for method
references. This alignment can have a deep influence on the
implementation of the JVM, making Q-type and L-type values internally
similar to each other, except for their storage discipline and
identity management. This seems to be a good thing, especially if
U-types are added to the picture at some point in the future.)
In the case of the `MethodHandles.Lookup` API, the distinction between
Q-type and L-type (and hence invocation mode) will be carried by the
`Class` mirror passed as the first argument to the API call, such as
`Lookup.findGetter`. If `findGetter` is passed the secondary mirror
for a Q-type, it will return a field getter from a by-value receiver.
If it is passed the primary mirror (the L-type) then of course it will
return a field getter for the by-reference object receiver, exactly as
today.
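For reference, this is what the L-type case looks like with today's `MethodHandles.Lookup.findGetter`: the `Class` passed as the first argument selects the receiver type of the resulting getter. The Q-type case would differ only in which mirror is passed, but that cannot be shown on a standard JDK. The class `GetterSketch` and its nested `Point` are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

// Field getter for a by-reference receiver, exactly as available today.
final class GetterSketch {
    private GetterSketch() {}

    static final class Point {
        public final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static double readX(Point p) {
        try {
            // The first argument (the class mirror) fixes the receiver type:
            // the handle has type (Point)double.
            MethodHandle getX =
                MethodHandles.lookup().findGetter(Point.class, "x", double.class);
            return (double) getX.invokeExact(p);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    static double demo() {
        return readX(new Point(1.5, 2.5));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```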
### Restrictions on Q-mode method calls
When performing a Q-mode method call (with a Q-type receiver), none of the
methods of `java.lang.Object` may appear; the JVM or method handle
runtime may require special filtering logic to enforce this.
In other words, Q-types do not inherit from `Object`. Instead,
they will either define their own methods which replace the
`Object` methods, similar to the rules for value-based classes,
or avoid `Object` methods altogether.
As an exception, the `Object.getClass` method may be permitted, but it
must return the principal `Class` mirror, corresponding to the VCC.
> The theory here is that `getClass` reports the class-mirror for the
file loaded to define the object's type. This theory could change.
These restrictions apply to method handles obtained from Q-types,
and to `vinvoke` instructions (if they are supported).
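The kind of filtering logic mentioned above might look roughly like the following, expressed over core reflection. This is a guess at the shape of such a filter, not the prototype's actual code: it rejects any method whose declaring class is `Object`, except `getClass`. The class and method names are hypothetical.

```java
import java.lang.reflect.Method;

// Hypothetical filter: which resolved methods may be called in Q-mode?
final class QModeFilter {
    private QModeFilter() {}

    static boolean permitted(Method m) {
        // Methods replaced by the class itself are fine.
        if (m.getDeclaringClass() != Object.class) return true;
        // Of the methods inherited from Object, only getClass is tolerated.
        return m.getName().equals("getClass");
    }

    // Convenience overload that resolves a public method by name first.
    static boolean permitted(Class<?> c, String name, Class<?>... params) {
        try {
            return permitted(c.getMethod(name, params));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // String overrides hashCode, so the resolved method is String's own.
        System.out.println(permitted(String.class, "hashCode"));
        // notify is inherited from Object and would be rejected.
        System.out.println(permitted(String.class, "notify"));
        System.out.println(permitted(String.class, "getClass"));
    }
}
```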
### JVM changes to support Q-types
Q-types, like other type descriptor types, can be mentioned in many
places. The basic list is:
* method and field definitions (UTF8 references in the `method_info`
and `field_info` structures)
* method and field symbolic references (a UTF8 component of
`CONSTANT_NameAndType`)
* type names (UTF8 references in `CONSTANT_Class` constants)
* array component types (after left bracket `[`) in any descriptor
* types in verifier stack maps (via a new item code for Q-types)
* an operand (a `CONSTANT_Class`) of some bytecodes (described below)
The JVM might use invisible boxing of Q-types to simplify the
prototyping of many execution paths. This of course works against a
key value proposition of values, the flattening of data in the heap.
The minimal model requires special processing of Q-types in
array elements and object (or value) fields, at least enough special
processing to initialize such fields to the default value of the
Q-type, which is not (and cannot be) the default `null` of an L-type.
So when the class loader loads a class whose fields are Q-types, it
must resolve and load the classes of those Q-types, and
acquire enough information about the Q-type definition to lay out the
new class which contains the Q-type field. This information includes
at least the size of the type, and may eventually include alignment and
a map of managed references contained in the Q-type.
If a value type contains only primitive fields, then arrays of the
value type must be supported, with full flattening of the values
within the arrays.
> (This proposal supports so-called "flattened arrays", whose
elements are value structures laid out end-to-end. A minimized form
of this proposal _may_ omit support for some or all types of
flattened, value-bearing arrays. For example, even if
fields of value types may contain a mix of primitives, references,
and/or sub-values, arrays containing such mixed values may be more
difficult to implement than arrays containing values with only
primitive fields; such arrays _may_ be omitted from early
implementations. API points which create such arrays are allowed,
temporarily, to throw errors instead of returning. Caveat emptor.)
Flattened arrays, when supported, must be created with
a component type which is a Q-type. They
will differ from arrays of corresponding L-types just as
`Integer[].class` differs from `int[].class`. Likewise, the
super-type of a value-bearing array will (like an `int[]`) be `Object`
only, and not a different array type. Such arrays will not convert
to any other array type, and must be manipulated by explicitly obtained
method handles.
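The layout idea behind a flattened array can be imitated today by packing the fields of each element end-to-end into one primitive array. The sketch below does this by hand for a two-`double` element; it illustrates the memory layout only and says nothing about how the JVM would actually implement Q-type arrays. The class `FlatComplexArray` is hypothetical.

```java
// Hand-flattened array of (re, im) pairs: re0, im0, re1, im1, ...
final class FlatComplexArray {
    private static final int FIELDS = 2; // doubles per element
    private final double[] data;

    FlatComplexArray(int length) {
        // Every element starts as the all-zero default, never null.
        data = new double[FIELDS * length];
    }

    int length() { return data.length / FIELDS; }

    double re(int i) { return data[FIELDS * i]; }

    double im(int i) { return data[FIELDS * i + 1]; }

    void set(int i, double re, double im) {
        data[FIELDS * i] = re;
        data[FIELDS * i + 1] = im;
    }

    public static void main(String[] args) {
        FlatComplexArray a = new FlatComplexArray(3);
        a.set(1, 3.0, 4.0);
        System.out.println(a.re(1) + " " + a.im(1));
        System.out.println(a.re(0) + " " + a.im(0)); // untouched default element
    }
}
```

Contrast this with an array of boxes, where each element is a reference to a separately allocated object and the initial element value is `null`.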
In today's JVM, when a class is first instantiated using `new`, its
initialization is triggered (unless it already occurred via another
instruction like `getstatic`). The initialization recursively triggers
the initialization of the super-class, much as the earlier loading of
the class triggered the loading of the super-class. When a value type
is embedded in a class file, it acts (with respect to phasing of class
loading and initialization) much like a super-type.
In particular, since a Q-type field in an object is a valid (default)
value of the Q-type from the first instant of the enclosing object's
existence, it follows that the class of the Q-type field must be
initialized _before_ the enclosing object is allocated. This
requirement is similar enough to the super-class initialization
requirement that they can be treated as instances of the same
phenomenon: When a class is initialized (resp. loaded), its
_dependencies_ must first be initialized (resp. loaded), where the
dependencies include both the super-class and the classes of any
values embedded in the sub-class layout. (Viewed another way, it is
as if the super-class occupies a large anonymous value-type field in
the sub-class.) By ensuring timely initialization of classes
of inlined fields, we can enforce the invariant that class
methods cannot operate on class instances (either pure values
or objects) until the class initialization has been triggered.
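The super-class half of this dependency rule is observable in any current JVM. The small demonstration below records the order in which static initializers run when a sub-class is first instantiated; the analogous ordering for classes of embedded Q-type fields is what the text proposes and cannot be shown on a standard JDK. The class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Shows that a dependency (the super-class) is initialized first.
final class InitOrderDemo {
    private InitOrderDemo() {}

    static final List<String> EVENTS = new ArrayList<>();

    static class Base {
        static { EVENTS.add("Base"); }
    }

    static class Derived extends Base {
        static { EVENTS.add("Derived"); }
    }

    static List<String> run() {
        new Derived(); // first instantiation triggers initialization
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```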
In this minimal proposal, the DVT must be initialized before its first
value is created, which means an object or array is created that
contains such a value, or else a `vdefault` or `vunbox` instruction
completes. The DVT depends on the VCC, so DVT initialization must, in
its turn, trigger VCC initialization.
Note that the `vunbox` instruction might relax these rules in practice,
since the input is necessarily a live VCC instance, so the only thing
left to do is extract the DVT and observe that its initialization is a
no-op, since it cannot contain any code.
## Value bytecodes
The following new bytecode instructions are added:
* `vload` pushes a value (a Q-type) from a local onto the stack.
* `vstore` pops a value (a Q-type) from the stack into a local.
* `vreturn` pops a value (a Q-type) from the stack and returns it
from the current method.
* `vbox` and `vunbox` convert between corresponding Q-types and L-types
* `vaload` and `vastore` access elements of "flat" arrays of Q-types
* `vdefault` pushes onto the stack the unique default value of a given Q-type
* `vgetfield` pops a Q-type and pushes a field selected from the Q-type
* `vwithfield` pops a Q-type and a selected field value and pushes an updated Q-type
Values are stored in single locals, not groups of locals as with
`long` and `double` (which consume pairs of locals).
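Since values have final fields, `vwithfield` does not mutate anything: it produces a new value that differs in one field. In source-level Java the closest familiar idiom is a "wither" method on an immutable class, sketched below. This is an analogy only; the class `LongPair` and its methods are hypothetical and involve ordinary heap objects, not Q-types.

```java
// Immutable pair with "wither" methods, as an analogy for vwithfield.
final class LongPair {
    final long hi, lo;

    private LongPair(long hi, long lo) { this.hi = hi; this.lo = lo; }

    static LongPair of(long hi, long lo) { return new LongPair(hi, lo); }

    // Like vwithfield: take a value and a field value, yield an updated value.
    LongPair withHi(long newHi) { return new LongPair(newHi, lo); }

    LongPair withLo(long newLo) { return new LongPair(hi, newLo); }

    public static void main(String[] args) {
        LongPair p = LongPair.of(1L, 2L);
        LongPair q = p.withLo(9L);
        System.out.println(p.hi + "," + p.lo); // original is unchanged
        System.out.println(q.hi + "," + q.lo);
    }
}
```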
The format of these instructions is TBD. Some of them must include an
operand field which describes the type of value being manipulated.
The field manipulation instructions require a `CONSTANT_Fieldref`.
Certainly `vbox`, `vunbox`, and `vdefault` require an explicit type
operand field.
The JVM may use Q-type resolution to acquire information about the
Q-type's size and alignment requirements, so as to properly "pack" it
into the interpreter stack frame. Or the JVM may simply use boxed or
buffered representations (the corresponding value-capable L-types, or
some internal heap or stack type) and ignore sizing information.
It seems likely that we can omit the type operands for the data
movement instructions. If we can observe that the JVM interpreter
must use an internally uniform "carrier type" for all value types on
the stack, we can simply require that this carrier type be
self-describing, and then there is no need to reaffirm the exact value
type in the data movement instructions.
Since the `invokevirtual`, `invokespecial`, and `invokeinterface`
instructions are L-mode instructions, they cannot invoke methods on
Q-values.
Method handles, `invokestatic` and
`invokedynamic` will always allow bytecode to invoke methods on
Q-types, and this is sufficient for a start. Such a method handle may
in fact internally box up the Q-type and run the corresponding L-type
method, but this is a tactic that can be improved and optimized in
Java support libraries, without pervasive cuts to the interpreter.
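
The boxing tactic can be sketched with today's method handle API, using `int` and `Integer` as stand-ins for a Q-type and its value-capable L-type. This is only an analogy under that assumption; the class name `BoxedReceiverDemo` is invented here.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class BoxedReceiverDemo {
    // Finds Integer.toString() (a method on the box type) and adapts it so
    // that callers pass an unboxed int; asType inserts the boxing step.
    static MethodHandle toStringOnInt() throws ReflectiveOperationException {
        MethodHandle onBox = MethodHandles.lookup()
            .findVirtual(Integer.class, "toString", MethodType.methodType(String.class));
        // onBox has type (Integer)String; the adapted handle has type (int)String.
        return onBox.asType(MethodType.methodType(String.class, int.class));
    }

    static String call(int x) {
        try {
            return (String) toStringOnInt().invokeExact(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(42)); // prints 42
    }
}
```

The caller never sees the box. Whether the box is actually allocated is left to the method handle runtime and the JIT, which is the kind of freedom the paragraph above relies on.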
>
(Note: It seems certain that the full proposal will have a Q-mode
invoker instruction, `vinvoke`, which invokes methods on Q-types
without boxing. For U-types, a U-mode `uinvoke` instruction will
similarly operate on dynamically tagged receiver values which may be
either Q-values or L-values. This extra mode may seem like an
extravagance, but it seems necessary for processing algorithms on
generic variables that crop up when using interface or type variables
that can range over both pure values and object references.)
### Verifier interactions
When setting up the entry state for a method, if a Q-type appears in
the method's argument descriptors, the verifier notes that the Q-type
(not the L-type!) is present in the corresponding local at entry.
When returning from a method, if the method return type is a Q-type,
the same Q-type must be present at the top of the stack.
When performing an invocation (in any mode), the stack must contain
matching Q-types at the positions corresponding to any Q-types in the
argument descriptors of the method reference. After the invocation,
if the return type descriptor was a Q-type, the stack will be found
to contain that Q-type at the top.
As with the primitive types `int` and `float`, a Q-type will not
convert to any other verification type than itself, or the
verification super-types `oneWord` or `top`. This affects matching of
values at method calls, and also at control flow merge points.
Q-types do not convert to L-types, not even their boxes or the
supertypes (`Object`, interfaces) of their L-types.
Besides `vload`, `vstore`, `vreturn`, and the `invoke` family, the
only bytecodes guaranteed to produce or consume Q-type operands are
`pop`, `pop2`, `swap`, and the `dup` family. More bytecodes may be
added over time. The verifier enforces proper handling of Q-types.
The `vaload` and `vastore` instructions work just like the pre-existing
array instructions. Given a uniform carrier type, there is no need for
them to reaffirm the Q-type they operate on. That type can always be
extracted from the array itself.
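
Existing Java arrays are already self-describing in this sense. As a small illustration (the helper name `elementTypeOf` is invented here, and ordinary arrays stand in for flat Q-type arrays), the element type can be recovered from the array object alone:

```java
final class SelfDescribingArrayDemo {
    // Recovers the element type from the array object alone, with no
    // type operand supplied by the caller.
    static Class<?> elementTypeOf(Object array) {
        Class<?> c = array.getClass();
        if (!c.isArray()) {
            throw new IllegalArgumentException("not an array: " + c);
        }
        return c.getComponentType();
    }

    public static void main(String[] args) {
        System.out.println(elementTypeOf(new int[2]));    // int
        System.out.println(elementTypeOf(new String[0])); // class java.lang.String
    }
}
```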
The `vgetfield` instruction has access control similar to the existing
`getfield` instruction. If a field is public in some value type, any
class can read that field from a value of that type.
But the `vwithfield` instruction has tight access control regardless
of the field's access. Only a class with private access to the value
type is allowed to perform field replacement. This restriction is
analogous to that on `putfield` for a final field, which is only
allowed in the class defining the field, and in fact in constructors
of that class. Because the VCC and DVT are two sides of the same
logical type, the JVM must allow the VCC to perform `vwithfield`
operations on its DVT. This will be done reflectively, using method
handles, unless the VCC somehow is gifted with compiled-in
`vwithfield` instructions.
>
(Note: Value field update instructions correspond approximately
to object field update instructions where the object fields
are final. The rules differ in detail. The JVM enforces a
rule that permits `putfield` on final fields only in constructors.
The Java language makes further restrictions, ensuring that each
such field must be set _exactly once_ along all paths to a normal
return from the constructor. The JVM does not (and cannot)
ensure these further constraints defined by the language,
and therefore allows any number of `putfield` instructions,
including zero, to occur on a final field in a constructor.
Likewise, the language will eventually ensure proper initialization
of value fields, but the JVM has no particular role to play,
except to restrict the use of `vwithfield` to private code.
The restriction is not identical to that for final fields,
since `vwithfield` will eventually find legitimate uses
outside of value constructors, such as "wither" methods.)
The reflective Lookup API will allow VCCs and DVTs to share access to
each other's private members and capabilities, just as they are shared
between nestmates today. This includes permission to use the
`vwithfield` instruction. Since DVTs have no methods, this
sharing is asymmetrical, but it is at any rate bidirectional.
>
(Note: A future revision of the JVM may support
explicit VM-level "nestmates" which have access to each other's
private fields and methods. In that revision, the `vwithfield`
instruction would be available to all nestmates of a given value
type. In other words, `vwithfield` is available within the
same "capsule" where `private` methods are available.)
The `vdefault` instruction would seem to be very "private" to a value
type, since it allows constructor-free creation of a value. But the
JVM gives default values a very peculiar status, since any array of a
given type is always pre-packed with that type's default values.
Therefore, there is actually nothing "private" about `vdefault`. Any
class can compute the default value of any Q-type at any time.
### Q-types and bytecodes
Other bytecodes which interact with Q-types are at least these:
* all invocation bytecodes: any argument or return value may be a
Q-type; the receiver (class component of `Methodref`) may not,
not even for static members
* `ldc` and `ldc_w` (of a Q-type, or perhaps a dynamically generated
constant)
* `anewarray`, `multianewarray` work on arrays of Q-type elements
These instructions may not be supported in the minimal version:
* `getfield`, `putfield`, `getstatic`, `putstatic` (of a Q-type value)
It is possible that the JVM will accept class files which define
Q-type fields but not allow the field access instructions to operate
on them. In that case, method handles will provide a workaround
for getting and setting the Q-typed fields of such class files.
(Note that such class files will have been produced by direct
bytecode spinning, or an enhanced non-standard version of javac.
VCCs will not contain Q-typed descriptors, since the standard
version of javac will never emit them.)
## Value type reflection
As we launch into a value type world, a fundamental change takes place
in the role of the reflective class mirror type, `java.lang.Class`.
Before value types, there is a strict one-to-one relation between
class files and `Class` mirrors. The only exception is primitive type
mirrors like `int.class`, which do not come from any class file. With
value types, one class file may correspond to more than one `Class`
mirror. Specifically, loading a value type will make available two
mirrors, one for the Q-type and one for the L-type. Their relation
will be something like that between `int.class` and `Integer.class`
but they will represent the same class, in two different views
or projections. (There may also be a U-type which subsumes both
views. And specializable classes may create an unlimited number
of derived specializations.)
Given all this, it seems useful to speak of the _principal_ or _proper
mirror_ for a class, which most directly represents the loaded class
file. Then there are its "helpers", the other class mirrors which
represent a secondary view or projection of the principal. We can
speak of these as _secondary_ or _improper mirrors_.
>
(Note: If we squint hard, we can then see that `Integer.class` acts
like a proper mirror for the improper mirror `int.class`. Perhaps
they can be converged in the future, so that they really do talk
about a common class.)
For minimal value types, we approach this brave new world by
distinguishing `Class` mirrors for the DVT and VCC, and treating the
VCC as the principal mirror, and the DVT as a secondary mirror.
The public class `jdk.experimental.value.ValueType`
(in an internal module) will contain all methods of the runtime
support for values in this initial prototype.
`ValueType` will contain the following
public methods for reflecting Q-types:
    public class ValueType {
        static boolean classHasValueType(Class<?> x);
        static ValueType forClass(Class<?> x);
        Class<?> valueClass();  // DVT, secondary mirror
        Class<?> boxClass();    // VCC, principal mirror
        ...
    }
The predicate `classHasValueType` is true if the argument represents
either a Q-type or (the L-type of) a value-capable class. The factory
`forClass` returns the descriptor of the Q-type for any type derived
from a value-capable class.
(If given any other
type, it throws `IllegalArgumentException`; users might want to test
with `classHasValueType` first to avoid the exception.)
The two accessors `valueClass` and `boxClass` return distinct
`java.lang.Class` objects for the Q-type and the original
(value-capable) L-type, respectively.
It is the case that `ValueType.forClass(vt.valueClass())` is the same
as `vt`, and likewise for
`boxClass`. Thus, any `Class` aspect of a value type can be used to
obtain its `ValueType` descriptor.
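
The round-trip property can be modeled with the primitive/wrapper pairs that exist today. The sketch below is hypothetical: `MirrorPair` is an invented stand-in for `ValueType`, with `int.class` playing the Q-type mirror and `Integer.class` playing the L-type mirror. It is not the prototype's implementation.

```java
import java.lang.invoke.MethodType;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for ValueType, keyed on primitive/wrapper pairs.
final class MirrorPair {
    private static final Map<Class<?>, MirrorPair> CACHE = new ConcurrentHashMap<>();

    private final Class<?> valueClass; // plays the Q-type (secondary) mirror
    private final Class<?> boxClass;   // plays the L-type (principal) mirror

    private MirrorPair(Class<?> valueClass, Class<?> boxClass) {
        this.valueClass = valueClass;
        this.boxClass = boxClass;
    }

    // MethodType.unwrap maps a wrapper to its primitive and leaves other types alone.
    private static Class<?> unwrap(Class<?> c) {
        return MethodType.methodType(c).unwrap().returnType();
    }

    private static Class<?> wrap(Class<?> c) {
        return MethodType.methodType(c).wrap().returnType();
    }

    static boolean classHasValueType(Class<?> c) {
        return c.isPrimitive() || unwrap(c).isPrimitive();
    }

    // Accepts either mirror of a pair and returns one canonical descriptor.
    static MirrorPair forClass(Class<?> c) {
        if (!classHasValueType(c)) {
            throw new IllegalArgumentException("no value type for " + c);
        }
        Class<?> prim = unwrap(c);
        return CACHE.computeIfAbsent(prim, p -> new MirrorPair(p, wrap(p)));
    }

    Class<?> valueClass() { return valueClass; }
    Class<?> boxClass()   { return boxClass; }

    public static void main(String[] args) {
        MirrorPair vt = forClass(Integer.class);
        System.out.println(forClass(vt.valueClass()) == vt); // true
        System.out.println(forClass(vt.boxClass()) == vt);   // true
    }
}
```

Either mirror leads back to the same descriptor, and a class outside the pairing is rejected with `IllegalArgumentException`, which is why a caller might test with `classHasValueType` first.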
The legacy lookup method `Class.forName` will continue to return the
`boxClass`, for reasons of compatibility. This condition may or may not
persist. (In the future, the source language construct `T.class` is
likely to produce something more natural to the source code type
assigned to `T`, under the slogan "works like an int".)
The secondary mirror for the Q-type will not support most meaningful
reflective queries, such as `getDeclaredMethods`. The reason for this
is that the DVT, as derived from the VCC, is as empty as possible.
It is likely that only fields will be visible in it via reflective
queries.
In any case, users must resort to the VCC (the `boxClass`) to examine
the members relevant to the value type. (This state of affairs may
well change when the two classes are merged.) This works as normal
because when the VCC is loaded it is treated as a POJO (plain old Java
object), and the DVT is extracted from the VCC without changing any
aspect of it.
Improper classes for Q-types may appear in reflective APIs wherever primitive
pseudo-types (like `int.class`) can appear. These APIs include both
core reflection (`Class` and the types in `java.lang.reflect`) and
also the newer APIs in `java.lang.invoke`, such as `MethodType` and
`MethodHandles.Lookup`. Constant pool constants that work with these
types can refer to Q-types as well as L-types, and the distinctions
are surfaced, reflectively, as suitable choices of `Class` objects
(either proper or improper).
>
(An improper class is also sometimes called a "crass",
where the "r" sound suggests that the thing exists only to reify a
distinction necessary at runtime. The main class is the thing
returned by `Class.forName`, and which represents a class file in 1-1
correspondence; a "crass" is anything else typed as `java.lang.Class`.
A [more principled approach to reflection
[cimadamore-refman]][cimadamore-refman] uses "type mirrors" of a
suitably refined interface type hierarchy.)
As always,
you can use the method handle APIs to create and manipulate arrays, load
and store fields, invoke methods, and obtain method handles.
The Q-type mirrors will, in general, operate much like the existing
primitive mirrors, instructing the JVM to pass a datum as a pure
value, rather than as a reference to a box.
Method handle transforms which change types (such as `asType`) will
support value-type boxing and unboxing just as they can express
primitive boxing and unboxing. Thus, the following code creates a
method handle which will box a `DoubleComplex` value into an object:
    Class<?> srcType = DoubleComplex.class;
    Class<?> qt = ValueType.forClass(srcType).valueClass();
    MethodHandle mh = identity(qt).asType(methodType(Object.class, qt));
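
Since `DoubleComplex` and `ValueType` are not available outside the prototype, the same shape can be tried on a current JDK with `int.class` in place of the Q-type mirror. This is a sketch by analogy only; the class name `BoxingHandleDemo` is invented.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class BoxingHandleDemo {
    // identity(int.class) has type (int)int; asType retypes the return as
    // Object, which makes the handle box its argument into an Integer.
    static final MethodHandle BOX_INT = MethodHandles.identity(int.class)
        .asType(MethodType.methodType(Object.class, int.class));

    static Object box(int x) {
        try {
            return (Object) BOX_INT.invokeExact(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        Object boxed = box(7);
        System.out.println(boxed.getClass().getName() + " " + boxed); // java.lang.Integer 7
    }
}
```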
Of course, the type-converting method `MethodHandle.invoke` will allow
users to work with method handles over Q-types, either in terms of
box types as supported by the current Java language, or (in suitable
bytecodes) more directly in terms of Q-types.
## Boxed values
Boxing is useful in order to gain interoperability between Q-types and
APIs which use `Object` references to perform generic services across
all types. Many tools (such as debuggers, loggers, and `println`
methods) assume a standard `Object` format for reporting arbitrary
data. The value-capable L-type of a Q-type (or, more generally,
whatever boxing mechanism ends up as the container for Q-types) serves
a useful role in the initial system, and (it seems probable) even in
the final system.
As noted before, instances of a value-capable class (which is an
L-type) serve, at first, as boxes for values of the corresponding Q-type.
The method
handle APIs allow box/unbox conversion operators to be surfaced as method
handles or applied implicitly for argument conversions.
The value-capable L-type also allows convenient specification of some
kinds of Q-type behaviors, such as `toString`, by writing them
directly as standard Java methods on the L-type. The method handle
lookup runtime will make up the difference between an unboxed receiver
(`this`) and boxed receiver, at negligible cost (since the HotSpot JVM
has a sufficient range of scalarization optimizations to remove the
extra boxing step).
However, we anticipate that the tightest loops will be constructed
to have unboxed data flowing along all hot paths. This means that
boxing is most useful, at this early point, at class-load time and
for peripheral operations like `println`.
Since the value-capable class is value-based, it is inappropriate to
synchronize on boxes, make distinctions on them by means of reference
equality comparisons, attempt to mutate their fields, or attempt to
treat a `null` reference as a point in the domain of the boxed type.
>
(These restrictions are likely to carry forward even if boxes have a
different form in the future.)
A future JVM _may_ assist in detecting (or even suppressing) some of
these errors, and it may provide additional optimizations in the presence
of such boxes (which do not require a full escape analysis).
However, such assistance or optimization appears to be unnecessary in
this minimal version of the design. Code which works with Q-types
will, by its very nature, be immune to such bugs, since Q-types are
non-synchronizable, non-mutable, non-nullable, and identity-agnostic.
## Value operator factories
Given the ability to invoke method handles that work with Q-types, all
other semantic features of value types can (temporarily) be accessed
solely through method handles. These include:
* Conversion routines (like box/unbox).
* Obtaining default Q-types.
* Constructing Q-types.
* Comparing Q-types.
* Calling methods defined on Q-types.
* Reading fields defined in Q-types.
* Updating fields defined in Q-types.
* Reading or writing fields (or array elements) whose types are Q-types.
* Constructing, reading, and writing arrays of Q-types.
The `MethodHandles.Lookup` and `MethodHandles` APIs will work on
Q-types (represented as `Class` objects), and surface methods which
can perform nearly all of these functions.
>
It is sometimes helpful to think of these operators as the missing
bytecodes. They can be used with the `invokedynamic` instruction to
emulate bytecodes that experimental translators may need. Eventually,
some form of many such bytecodes will "selectively sediment" down
into the JVM's repertoire of bytecodes, and usage of `invokedynamic`
for these purposes will decrease.
Pre-existing method handle API points will be adjusted as follows:
* `MethodType` factory methods will accept `Class` objects
representing Q-types, just as they accept primitive types today.
* `invoke`, `asType`, and `explicitCastArguments` will treat
Q-type/L-type pairs just as they treat primitive/wrapper pairs.
* `Lookup.in` will allow free conversion (without loss of privilege
modes) between Q-type/L-type pairs.
* Non-static lookups in Q-types will produce method handles which
take leading receiver parameters that are Q-types, not L-types.
* The `findVirtual` method of `Lookup` will expose all accessible
non-static methods on a Q-type, if the lookup class is a Q-type.
* The `findConstructor` method of `Lookup` will expose all accessible
constructors of the original value-capable class, for both the Q-type
and the legacy L-type. The return type of a method handle produced
by `findConstructor` will be identical with the lookup class, even
if it is a Q-type.
* The `findVirtual` and `findConstructor` methods may also perform
ad hoc pattern matching (TBD) on `private static` methods of the
original L-type, as a convention for imputing methods to the
Q-type alone. (This is useful if the imputed methods are
difficult or inconvenient to express as virtuals on the L-type.)
* The `identity` method handle factory method will accept Q-types.
* The `empty` method handle factory method will accept Q-types,
producing a method handle that returns the default value of the type.
* The array-processing method handle factories will accept Q-types,
producing methods for building, reading, and writing Q-type arrays.
(These include `arrayConstructor`, `arrayLength`, `arrayElementGetter`,
and `arrayElementSetter`, plus eventually the var-handle variants.
Note that some or all array factories may throw exceptions, if the
relevant array types are not supported.)
* All method handle transforms will accept method handles that work
with Q-types, just as they accept primitive types today.
>
(Yes, a value type method is obtained with `findVirtual`, despite the
fact that virtuality is not present on a `final` class. The poorer
alternatives are to co-opt `findSpecial`, or make a new API point
`findDirect` to carry the nice, fine distinction. Since Java is
already comfortable with the notion of "final virtual" methods, we
will continue with what we have.)
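
The "final virtual" convention is already visible in the current API. As an illustration (using `String`, a final class, as a stand-in for a value type; the class name `FinalVirtualDemo` is invented), `findVirtual` works on a class that cannot be overridden and surfaces the receiver as the leading parameter:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class FinalVirtualDemo {
    // String is final, so no overriding can occur, yet its instance
    // methods are still looked up with findVirtual.
    static int lengthOf(String s) {
        try {
            MethodHandle length = MethodHandles.lookup()
                .findVirtual(String.class, "length", MethodType.methodType(int.class));
            // The receiver shows up as the leading parameter: (String)int.
            return (int) length.invokeExact(s);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthOf("value")); // 5
    }
}
```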
Method handle lookup on L-types is likely to give parallel results,
although (depending on user model experiments) some methods on L-types
may be suppressed, on the grounds that box types should be relatively
"featureless" compared to their value types. Of course, a virtual
method of an L-type, when embodied as a method handle, will take
a receiver parameter which is an L-type, not a Q-type.
On the other hand, as noted above, we may well build ad hoc
conventions for binding private static L-type methods, via method
handles, as if they were constructors or non-static methods on the
Q-type. This would be a side contract between a translation strategy
and the method handle runtime, of which the JVM would be unaware. The
static methods, visible only to the method handle runtime, would
contain snippets of logic intended _only_ for use by the Q-type.
Eventually of course we will move all that sort of thing into
bytecodes and the JVM's hardwired resolution logic.
As value-based classes, value-capable classes are required to override
all relevant methods from `Object`. The derived Q-types do _not_ inherit
or respond to the standard methods of `Object`; they only respond to
the methods of `Object` (such as `toString`) for which they implement
matching signatures.
The following additional functions do not (_as yet_) fit in the
`MethodHandle` API, and so are placed in the runtime support class
`jdk.experimental.value.ValueType`.
`ValueType` will contain the following methods:
    public class ValueType {
        ...
        MethodHandle defaultValueConstant();
        MethodHandle substitutabilityTest();
        MethodHandle substitutabilityHashCode();
        MethodHandle findWither(Lookup lookup, Class<?> refc,
                                String name, Class<?> type);
    }
The `defaultValueConstant` method returns a method handle which takes
no arguments and returns a default value of that Q-type. It is
equivalent to (but probably more efficient than) creating a
one-element array of that value type and loading its element. This
method may be useful in implementing `MethodHandles.empty` and similar
combinators.
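
The equivalence with the one-element array can be spelled out using method handle factories that exist today. In this sketch, primitive and reference types stand in for Q-types, `MethodHandles.zero` stands in for `defaultValueConstant`, and the class and method names are invented for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Array;

final class DefaultValueDemo {
    // Builds a ()T handle the slow way: allocate a one-element T[] and load element 0.
    static MethodHandle defaultViaArray(Class<?> type) {
        Class<?> arrayType = Array.newInstance(type, 0).getClass();
        MethodHandle newArray = MethodHandles.insertArguments(
            MethodHandles.arrayConstructor(arrayType), 0, 1);          // ()T[]
        MethodHandle loadFirst = MethodHandles.insertArguments(
            MethodHandles.arrayElementGetter(arrayType), 1, 0);        // (T[])T
        return MethodHandles.filterReturnValue(newArray, loadFirst);   // ()T
    }

    // The direct route: a handle that yields the zero value of the type.
    static MethodHandle defaultDirect(Class<?> type) {
        return MethodHandles.zero(type);
    }

    static Object invoke(MethodHandle mh) {
        try {
            return mh.invoke();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(invoke(defaultViaArray(double.class))); // 0.0
        System.out.println(invoke(defaultDirect(double.class)));   // 0.0
    }
}
```

Both handles have the same type and yield the same result; the array route simply allocates along the way, which is the cost a dedicated default-value handle would avoid.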
The `substitutabilityTest` method returns a method handle which
compares two operands of the given Q-type for substitutability.
Specifically, fields are compared pairwise
for substitutability, and the result is the logical conjunction of all
the comparisons. Primitives and references are substitutable if and
only if they compare equal using the appropriate version of the Java
`==` operator, _except_ that floats and doubles are first converted to
their "raw bits" before comparison.
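
To make the raw-bits rule concrete, here is a hand-written sketch of what such a test might compute for a two-field value. The class `ComplexBits` is hypothetical and is written as an ordinary final class, since Q-types cannot be expressed in Java source.

```java
// Hypothetical two-field value, written as an ordinary final class.
final class ComplexBits {
    final double re;
    final double im;

    ComplexBits(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Field-by-field comparison; doubles are compared by their raw bits,
    // so the result is the conjunction of two long == tests.
    static boolean substitutable(ComplexBits a, ComplexBits b) {
        return Double.doubleToRawLongBits(a.re) == Double.doubleToRawLongBits(b.re)
            && Double.doubleToRawLongBits(a.im) == Double.doubleToRawLongBits(b.im);
    }

    public static void main(String[] args) {
        ComplexBits nan = new ComplexBits(Double.NaN, 0.0);
        System.out.println(substitutable(nan, new ComplexBits(Double.NaN, 0.0))); // true
        System.out.println(Double.NaN == Double.NaN);                              // false
        System.out.println(substitutable(new ComplexBits(0.0, 0.0),
                                         new ComplexBits(-0.0, 0.0)));             // false
        System.out.println(0.0 == -0.0);                                           // true
    }
}
```

The two cases where raw bits and `==` disagree are shown side by side: a NaN is substitutable for an identical NaN, while positive and negative zero are not substitutable for each other.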
Likewise, the `substitutabilityHashCode` method returns a method
handle which accepts a single operand of the given Q-type, and produces
a hash code which is guaranteed to be equal for two values of that
type if they are substitutable for each other, and is likely to be
different otherwise.
>
(It is an open question whether to expand the size of this hash code
to 64 bits. It will probably be defined, for starters, as a 32-bit
composition of the hash codes of the value type fields, using legacy
hash code values. The composition of sub-codes will probably use, at
first, a base-31 polynomial, even though that composition technique is
deeply suboptimal.)
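
A base-31 composition of legacy field hash codes might look like the following sketch for a value with two `double` fields. The class name `ComplexHash` and the field order are assumptions made for illustration; the actual definition is still open, as noted above.

```java
final class ComplexHash {
    // Base-31 polynomial over the per-field legacy hash codes:
    // h = 31 * hash(re) + hash(im), computed in 32-bit int arithmetic.
    static int hashCode(double re, double im) {
        int h = 0;
        h = 31 * h + Double.hashCode(re);
        h = 31 * h + Double.hashCode(im);
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hashCode(1.0, 2.0) == hashCode(1.0, 2.0)); // true
    }
}
```

Values that are substitutable have identical raw bits in every field, so they also have identical per-field hash codes and therefore the same composed hash, which is the guarantee required of `substitutabilityHashCode`.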
The `findWither` method works analogously to `Lookup.findSetter`,
except that the resulting method handle always creates a new value, a
full copy of the old value, except that the specified field is changed
to contain the new value. Since values have no identity, this is the
only logically possible way to update a field value.
In order to restrict the use of wither primitives, the `refc`
parameter and the lookup-class will be checked against the Q-type
of the `ValueType` itself; if they are not all the same Q-type, the
access will fail. The access restriction may be broadened later. A
value-type may of course define named wither methods that encapsulate
primitive wither actions. Eventually, a `vwithfield` bytecode might
be created to express field update directly, in which case the same
issues of access restriction must be addressed.
>
(The name _wither_ method does not mean a way to blight or shrivel
something--certainly a shady activity. It refers to a naming
convention for methods that perform functional update of record
values. Asking a complex number `c.withRe(0)` would return a new
pure-imaginary complex number. By contrast, `c.setRe(0)`, a call to a
_setter_ method, would seem to mutate the complex number, removing any
non-zero real component. Setter methods are appropriate to mutable
objects, while wither methods are appropriate to values. Note that a
method can in fact be a getter, setter, or wither method even if it
does not begin with one of those standard words. The eventual
conventions for value types may well discourage forms like `withRe(0)`
in favor of simply `re(0)`.)
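
The naming convention can be shown with a plain immutable class. This `Complex` class is a hypothetical illustration written in today's Java, not a value type, and its method names are chosen only to match the discussion above.

```java
// Hypothetical complex-number value, written as an immutable final class.
final class Complex {
    final double re;
    final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    // Wither: returns a full copy in which only the real part differs.
    Complex withRe(double newRe) {
        return new Complex(newRe, im);
    }

    // Wither for the imaginary part.
    Complex withIm(double newIm) {
        return new Complex(re, newIm);
    }

    @Override
    public String toString() {
        return re + " + " + im + "i";
    }

    public static void main(String[] args) {
        Complex c = new Complex(3.0, 4.0);
        Complex pureImaginary = c.withRe(0);
        System.out.println(c);             // 3.0 + 4.0i
        System.out.println(pureImaginary); // 0.0 + 4.0i
    }
}
```

The original `c` is untouched by `withRe`, which is the functional-update behavior that distinguishes a wither from a setter.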
It is likely that these methods in `ValueType` will eventually
become virtual methods of `Lookup` itself (if that is the leading
argument), else static methods of `MethodHandles`.
These methods are also candidates for direct expression as bytecodes,
just as many existing method handles [directly express
[bcbehavior]][bcbehavior] equivalent bytecode operations.
Here is a table that summarizes the new method handles and the
hypothetical bytecode behaviors for their operations upon value types.
The third column gives both the stack effect of a hypothetical
bytecode, and the type of the actual method handle. In this table,
almost any type can be a Q-type, a fact we emphasize with "`Q`"
prefixes. The "`QC`" type, in particular, stands for the value type
being operated upon. The composite `VT` stands for the
`ValueType` instance derived from the Q-type.
The type "`RC`", mentioned at the bottom of the list for field
accessors of Q-type fields in regular objects, is any normal L-type.
Note that Q-types ("works like an int!") can be read and written whole
from fields and array elements.
: Q-type method handles & behaviors
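+---------------------------------+-------------------------+------------------------+
| lookup expression               | possible bytecode       | stack effect / MH type |
+=================================+=========================+========================+
| `VT.defaultValueConstant()`     | "vdefault" `QC`         | `()` → `QC`            |
+---------------------------------+-------------------------+------------------------+
| `VT.substitutabilityTest()`     | ?                       | `(QC QC)` → `boolean`  |
+---------------------------------+-------------------------+------------------------+
| `VT.substitutabilityHashCode()` | ?                       | `(QC)` → `int/long`    |
+---------------------------------+-------------------------+------------------------+
| `L.findGetter(QC, f, QT)`       | "vgetfield" `QC.f:QT`   | `(QC)` → `QT`          |
+---------------------------------+-------------------------+------------------------+
| `VT.findWither(L, QC, f, QT)`   | "vwithfield" `QC.f:QT`  | `(QC, QT)` → `QC`      |
+---------------------------------+-------------------------+------------------------+
| `L.findVirtual(QC, m, (QA*)QT)` | "vinvoke" `QC.m(QA*)QT` | `(QC QA*)` → `QT`      |
+---------------------------------+-------------------------+------------------------+
| `L.findGetter(RC, f, QT)`       | "getfield" `RC.f:QT`    | `(RC)` → `QT`          |
+---------------------------------+-------------------------+------------------------+
| `L.findSetter(RC, f, QT)`       | "putfield" `RC.f:QT`    | `(RC, QT)` → `void`    |
+---------------------------------+-------------------------+------------------------+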
This table does not cover many method handles which merely copy around
Q-typed values, or load or store them from normal objects or arrays.
Such operations can appear in many places, including `findStaticGetter`,
`findStaticSetter`, `findVirtual`, `findStatic`,
`arrayElementSetter`, `identity`, `constant`, etc.
## Reminder: All this will change
The bytecodes and APIs described above are **not the final form** of
value types for Java. Code developed under this minimal proposal
will certainly have to be discarded and rewritten when the full
feature is created.
## Future work
This minimal proposal is by nature temporary and provisional. It
gives a necessary foundation for further work, rather than a final
specification. Some of the further work will be similarly provisional
in nature, but over time we will build on our successes and learn
from our mistakes, eventually creating a well-designed specification
that can take its place in the sun.
This present set of features that support value types will be
difficult to work with; this is intentional. The rest of this
document sketches a few additional features which may enable
experiments not practical or possible in the minimized proposal.
Any such features will be given their own supporting documentation
if they are pursued, so this last section may be safely skipped.
It may be of interest, however, to people who have noticed missing
features in the minimal values proposal.
### Denoting Q-types in Java source code
At a minimum, no language changes are needed to work with Q-types. A
combination of JVM hacks (value-capable classes), annotation-driven
classfile transformations, and direct bytecode generation are enough
to exercise interesting micro-benchmarks. Method handles supply a
useful alternative to direct bytecode generation, and they will be
made fully capable of working with Q-types (as described above).
Nevertheless, there is nothing like language support. It is likely
that very early experiments with `javac` will create simple ways to
refer to Q-types and create variables for them, directly in Java code
(subject to contextual restrictions, of course).
In particular, constructors for objects have a very different bytecode
shape than seemingly-equivalent constructors for value types. (The
syntax for Java object constructors is a perfectly fine notation for
value type constructors, as long as all fields are final.) It would
be reasonable for javac to take on the burden of byte-compiling both
versions of each constructor of a value-capable class.
Likewise, direct invocation of value type constructors, and direct
access of value type methods and fields, would be convenient to use
from Java source code, even if they had to be compiled to
invokedynamic calls, until bytecode support was completed.
### Q-replacement within value-capable classes
A value-capable class, compiled from Java source, may have additional
annotations (or perhaps attributes) on selected fields and methods
which cause the introduction of Q-types, as a bytecode-level
transformation when the value-capable class's file is loaded or
compiled.
Two transformations which seem useful may be called _Q-replacement_
and _Q-overloading_. The first deletes L-types and replaces them by
Q-types, while the second simply copies methods, replacing some or all
of the L-types in their descriptors by corresponding Q-types. This
set of ideas is tracked as [JDK-8164889][].
An alternative to annotation-driven Q-replacement would be an
experimental language feature allowing Q-types to be mentioned
directly in Java source. Such experiments are likely to happen
as part of Project Valhalla, and may happen early enough to make
transformation unnecessary.
### More bytecodes
The library method handle `defaultValueConstant` could be replaced by
a new `vdefault` bytecode, or by a prefixed `aconst_null` bytecode.
The library method handle `substitutabilityTest` could be replaced by
a new `vcmp` bytecode, or by a prefixed `if_acmpeq` bytecode.
The library method handle `findWither` could be replaced by a new
`vwithfield` bytecode.
The library method handle `findGetter` could be replaced by a suitably
enhanced `getfield` bytecode.
The library method handle `arrayConstructor` could be replaced by a
suitably enhanced `anewarray` or `multianewarray` bytecode.
The library method handle `arrayElementGetter` could be replaced by a
new `vaload` bytecode, or a prefixed `aaload` bytecode.
The library method handle `arrayElementSetter` could be replaced by a
new `vastore` bytecode, or a prefixed `aastore` bytecode.
The library method handle `arrayLength` could be replaced by a
suitably enhanced `arraylength` bytecode.
It is possible that the eventual "sweet spot" for this design is based
on a single set of _universal bytecodes_ (or prefixed macro-bytecodes)
that work symmetrically across references, values, and primitives, by
allowing them to mention any type descriptor, not just Q-types. Such
"universal bytecodes" deserve their own naming convention, as `uload`,
`ustore`, `uinvoke`, `ucmp`, `u2u`, etc. When a bytecode works only
on values, we use the `v*` naming convention.
Just as the descriptor type `I` is used in the JVM to carry ints,
shorts, booleans, chars, and bytes, and the descriptor type `L` (with
a class) is used to carry all kinds of L-types, so the descriptor type
`Q` (with a class) is used to carry all kinds of Q-types. In effect,
despite the five I-types and the infinity of L-types and Q-types,
there are really only three _carrier types_ in the JVM that handle all
that work. (There are also monomorphic carrier types for long, float,
and double.) It is clear that the L-type carrier is a single machine
word pointing into the heap, but the Q-type carrier is more
complicated, a data structure which locates the "payload" of some
value, but also describes its size and layout (at least for the GC).
In fact, it must also describe the value's class. In the end, the
Q-type carrier is really just a kind of typed locator to a buffer
which could be any place (Java heap, C heap, thread stack).
Aligning this internal data structure with the layout of Java objects
in the heap leads to possible optimizations, such as a _vbox_ or
_vunbox_ which simply retags a pointer. More importantly, it allows
the efficient creation of a U-type carrier, the ultimate JVM carrier
type which can efficiently transmit any Q-type or any L-type also,
simply by the flip of a tag bit.
If U-types turn out to be necessary, it is likely that further
U-mode instructions will be created. But it is also likely that,
in that case, the Q-mode and U-mode instructions could be largely
consolidated, simply by ensuring that the Q-type carrier and the
U-type carrier were one and the same. The verifier would continue
to track whether a Q-type was "pure value" or whether it had been
"polluted" by joining with a U-type. Logically U-types can carry
object references and nulls, while Q-types cannot, but the common
carrier type can simply model this by having a separate tagging
for reference values, to distinguish them from all Q-type values.
In the end, most of the Q-mode instructions can probably be promoted
to U-mode instructions, giving maximum type flexibility with
only a few extra bytecode points.
The minimal proposal does not need U-types, because it will not
experiment with genericity of values, and because all interface
calls will be (for now) pushed onto the boxed VCC, an L-type.
### More data structures
The minimal proposal may omit various corner cases implied by a free
cross-application of all of the features of value types. Filling in these
corner cases will be useful. In addition, limitations on value types
(and on primitives!) should be removed over time. The functional
increases may include, at various stages:
* Allow value types to declare fields of non-primitive type
(Q-types, L-types).
* Implement flattened arrays of all value types.
* Support atomic access to values (e.g., fields marked `volatile`).
* Provide annotations for super-alignment of values (beyond the
normal JVM-constrained alignment, which is usually 64 bits).
* Provide low-level ("unsafe") reflection of the details of JVM
layout of value types.
* Use value-type containers to represent foreign types (C `unsigned`)
or safe pointers (e.g., address plus bounds plus type plus scope).
* Use value-type containers to represent a numeric type tower.
* Optimize the performance and storage density of all of the above.
* Do something to detect or prevent egregious misuse of value-based
object types, such as `Integer` or Q-type boxes. Throw an error
if someone tries to synchronize them, for example.
* Support methods directly on value types.
* Support interface implementation directly on value types,
including (crucially) execution of interface default methods.
* Work out a way for
* Provide a safely tagged, disjoint union type ("U-types") that can
directly represent either a Q-type value or an L-type value.
* Create a standard shape for values, including object-like
protocols for printing, comparison, and hash-coding.
* Add specialized generics supporting Q-types--and primitives too.
(A big one!)
If value types are fully integrated with interfaces, a Q-type must
inherit `default` methods from its interface supertypes. This is a
key form of interoperability between values and generic algorithms and
data structures (like sorting and `TreeMap`). Making this work in the
minimal version requires boxing the value and running the default
method on the box, which may have unpleasant performance implications.
In a full implementation, the execution of default methods should be
optimized for each specific value type. Also, there should be a framework
for ensuring that the interface methods themselves are appropriate to
value-based types (no nulls or synchronization, limited `==`, etc.).
This requires further work beyond the scope of the minimal proposal.
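
To make the boxed path concrete, here is a sketch in ordinary Java, where every object is already a reference. The interface `Measured`, its default method `exceeds`, and the class `LongPair` are hypothetical names chosen for illustration. In the minimal version a value would reach `exceeds` or `TreeMap` only through a box like this one.

```java
import java.util.TreeMap;

final class DefaultOnBoxSketch {
    /** Hypothetical interface with a default method. */
    interface Measured {
        long magnitude();

        default boolean exceeds(Measured other) {
            return magnitude() > other.magnitude();
        }
    }

    /** Hypothetical value-capable class: final, with immutable fields. */
    static final class LongPair implements Measured, Comparable<LongPair> {
        final long lo;
        final long hi;

        LongPair(long lo, long hi) { this.lo = lo; this.hi = hi; }

        @Override public long magnitude() {
            return Math.abs(lo) + Math.abs(hi);
        }

        @Override public int compareTo(LongPair that) {
            int c = Long.compare(hi, that.hi);
            return c != 0 ? c : Long.compare(lo, that.lo);
        }
    }

    public static void main(String[] args) {
        // The receiver is an interface-typed reference, i.e. an L-type (the box).
        Measured a = new LongPair(1, 2);
        Measured b = new LongPair(0, 1);
        System.out.println(a.exceeds(b));

        // Generic data structures see the value only through its box.
        TreeMap<LongPair, String> map = new TreeMap<>();
        map.put(new LongPair(5, 1), "second");
        map.put(new LongPair(9, 0), "first");
        System.out.println(map.firstEntry().getValue());
    }
}
```

A full implementation would aim to run the same default method without materializing the box.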
It seems difficult to make a value-bearing array type `QT[]` be a
subtype of an interface-array type `I[]`, even if the interface `I` is
a supertype of the value type `QT`. Further work on JVM type
structure would be needed to make this happen. Interface types `I`
are firmly in the L-type camp, at present, and interface arrays are
subtypes of `Object[]`, and therefore would appear to be arrays of
references. But an array of value types `QT` cannot be (transparently)
treated as an array of references.
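
Today's primitives already show the same split, which may serve as an analogy: an array of boxes is an `Object[]` and an interface array, while a flat `int[]` is neither. The class and method names in this sketch are hypothetical; the `instanceof` behavior is standard Java.

```java
final class ArraySubtypingSketch {
    /** True if the argument is an array of references. */
    static boolean isReferenceArray(Object array) {
        return array instanceof Object[];
    }

    /** True if the argument can be viewed as an interface array, Comparable[]. */
    static boolean isComparableArray(Object array) {
        return array instanceof Comparable[];
    }

    public static void main(String[] args) {
        Integer[] boxed = {1, 2, 3}; // array of references to boxes
        int[] flat = {1, 2, 3};      // flattened array of primitives

        System.out.println(isReferenceArray(boxed));  // Integer[] is an Object[]
        System.out.println(isComparableArray(boxed)); // Integer implements Comparable
        System.out.println(isReferenceArray(flat));   // int[] is not an Object[]
    }
}
```

A flattened `QT[]` would sit in the position of `int[]` here, even if `QT` had interface supertypes.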
### Bridge-o-matic
In some cases, supplying Q-replaced API points in classes is just a
matter of providing suitable bridge methods. Bytecode transformers or
generators can avoid the need to specify the bodies of such bridge
methods if the bridges are (instead of bytecodes) endowed with
suitably organized bootstrap methods. This set of ideas has many
additional uses, including auto-generation of standard `equals`,
`hashCode`, and `toString` methods. It is tracked as [JDK-8164891][].
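
The flavor of this idea can be approximated with method handles: instead of writing method bodies per class, a generic routine links one getter per field and derives `equals` and `hashCode` from them. This is only a sketch of the general technique, not the mechanism tracked in the issue above; `BridgeSketch`, `Point`, `fieldGetters`, `generatedEquals`, and `generatedHashCode` are hypothetical names, and a real design would do this linkage in a bootstrap method rather than through reflection at call time.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class BridgeSketch {
    /** Hypothetical value-capable class with no hand-written equals or hashCode. */
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    /** Links one getter handle per instance field; stands in for bootstrap-time work. */
    static List<MethodHandle> fieldGetters(Class<?> type) {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        List<MethodHandle> getters = new ArrayList<>();
        try {
            for (Field f : type.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                    getters.add(lookup.unreflectGetter(f));
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return getters;
    }

    /** Field-wise equality; both arguments must be instances of the linked class. */
    static boolean generatedEquals(List<MethodHandle> getters, Object a, Object b) {
        try {
            for (MethodHandle g : getters) {
                if (!Objects.equals(g.invoke(a), g.invoke(b))) {
                    return false;
                }
            }
            return true;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    /** Field-wise hash, combined in the order of the getter list. */
    static int generatedHashCode(List<MethodHandle> getters, Object a) {
        try {
            int h = 1;
            for (MethodHandle g : getters) {
                h = 31 * h + Objects.hashCode(g.invoke(a));
            }
            return h;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        List<MethodHandle> getters = fieldGetters(Point.class);
        System.out.println(generatedEquals(getters, new Point(1, 2), new Point(1, 2)));
        System.out.println(generatedEquals(getters, new Point(1, 2), new Point(2, 1)));
    }
}
```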
### Natural (JVM-native) value-type classfiles
The minimal proposal starts with a backwards, box-first loading order,
loading POJO types and then deriving values and box types from those.
The box types (initially) are simply the POJO types and the value
types are structs internally derived by extracting the fields (only)
from the POJO type. Even methods intended _only_ for the value type
must be delivered, at first, as POJO methods.
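
The "extract the fields (only)" step can be imitated with reflection, as a loose analogy for what the JVM would do internally at load time. The names `DerivedLayoutSketch`, `Point`, and `derivedFields` are hypothetical. Note that `Class.getDeclaredFields` makes no promise about ordering, so the sketch does not rely on one.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class DerivedLayoutSketch {
    /** Hypothetical POJO from which a value struct would be derived. */
    static final class Point {
        static final Point ORIGIN = new Point(0, 0); // static state is not part of the value
        final int x;
        final int y;

        Point(int x, int y) { this.x = x; this.y = y; }

        int sum() { return x + y; } // methods stay on the POJO in this model
    }

    /** Lists the instance fields ("type name") that would make up the derived struct. */
    static List<String> derivedFields(Class<?> pojo) {
        List<String> fields = new ArrayList<>();
        for (Field f : pojo.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // only instance state is carried over
            }
            fields.add(f.getType().getName() + " " + f.getName());
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(derivedFields(Point.class));
    }
}
```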
This must change over time to a value-first load format, with value
types directly specified by modified class files, and boxes derived
during the load of the value type.
As the POJO-based load format becomes obsolete, the boxes themselves
may continue to be POJO-shaped versions of the value types, or they
may take a different form.
This change will be relatively expensive, because it will require that
all users of value types upgrade their tooling (compiler, debugger,
IDE, etc.) to support non-standard features exposed by the value-first
design. As noted above, some of the awkwardness of the minimal
proposal is caused by its care to _avoid_ features that would require
(or would only be useful with) changes to the tool-chain. These
features include:
* Value-first class file format (e.g., an `ACC_VALUE` bit in the header).
* Denotation in the source language of Q-types.
* Bytecodes for directly loading fields typed as Q-types and fields
within Q-types.
* Bytecodes for directly invoking methods of Q-types.
* Full integration of value types with arrays, interfaces, etc.
### Heisenboxes
As suggested above, L-types for values are value-based, and some
version of the JVM may attempt to enforce this in various ways, such
as the following:
* Synchronizing a boxed Q-type value may throw an exception like
`IllegalMonitorStateException`.
* Reference comparison (Java operator `==`, or the `acmp`
instruction) may report "true" on two equivalent boxed Q-type
values, even if the same comparison previously reported "false", or
"false" when it previously reported "true". Such variation
would of course be subject to the logic of substitutability of the
underlying Q-types. Two boxes that were once detected as equal
references would be permanently substitutable for each other.
* Attempts to reflectively store values into the fields of boxed
Q-type values may fail, even after `setAccessible` is called.
* Attempts to reflectively invoke the constructor for the box may
fail, even after `setAccessible` is called.
A box whose identity status is uncertain from observation to
observation is called a "heisenbox". To pursue the analogy, a
reference equality (`==`, `acmp`) observation of `true` for two
heisenboxes "collapses" them into the same object, since they are then
proven fully inter-substitutable, hence their Q-values are equivalent
also. Two copies of the reference can later decohere, reporting
inequality, despite the continued inter-substitutability of the boxed
values. The equality predicate could be investigated by wiring it to
a box containing Schrödinger's cat, with many puzzling and sad
results...
This set of ideas is tracked as [JDK-8163133][].
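
Existing value-based boxes such as `Integer` already hint at this behavior, and can serve as a small-scale analogy. In the sketch below (hypothetical names `HeisenboxSketch`, `substitutable`, `sameReference`), `equals` plays the role of the substitutability test, while `==` on boxes gives an answer that the platform only partly pins down: `Integer.valueOf` is documented to cache values from -128 to 127, and makes no identity promise outside that range.

```java
final class HeisenboxSketch {
    /** Substitutability of the boxed values: stable from observation to observation. */
    static boolean substitutable(Integer a, Integer b) {
        return a.equals(b);
    }

    /** Reference comparison of the boxes: not a reliable observation in general. */
    static boolean sameReference(Integer a, Integer b) {
        return a == b;
    }

    public static void main(String[] args) {
        Integer small1 = Integer.valueOf(127);
        Integer small2 = Integer.valueOf(127);
        Integer big1 = Integer.valueOf(1000);
        Integer big2 = Integer.valueOf(1000);

        // Within the documented cache range, the two boxes are the same object.
        System.out.println(sameReference(small1, small2));

        // Outside that range the result is not specified and may vary by JVM settings.
        System.out.println(sameReference(big1, big2));

        // The values themselves are substitutable regardless of box identity.
        System.out.println(substitutable(big1, big2));
    }
}
```

Code that relies only on `substitutable` is unaffected by whatever the JVM decides about box identity.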
## References
[values]:
[valhalla-dev]:
[goetz-jvmls15]:
[valsem-0411]:
[simms-vbcs]:
[graves-jvmls16]:
[value-based]:
[JEP-11]:
[Long2.java]:
[cimadamore-refman]:
[bcbehavior]:
[JDK-8164891]:
[JDK-8177279]:
[JDK-8164889]:
[JDK-8163133]:
[goetz-jvmls16]: