Minimal Value Types

April 2017: Minimal Edition (v. 0.3)

John Rose, Brian Goetz

“What we do in the shadows reveals our true values.”

Background

In the three years since the first public proposal [values] of value types, there have been vigorous discussions [valhalla-dev] of how to get there, and vigorous prototyping in the Java compiler, classfile format, and VM. The goal has been to unify primitives, references, and values in a common platform that supports efficient generic, object-oriented programming.

Much of the discussion has concentrated on generic specialization [goetz-jvmls15], as a way of implementing full parametric polymorphism in Java and the JVM. This concentration has been intentional and fruitful, since it exposes all the ways in which primitives fail to align sufficiently with references, and forces us to expand the bytecode model. After solving for List<int>, it will be simpler to manage List<Complex<int>>.

Other discussions have concentrated on details of value semantics [valsem-0411] and on specific tactics [simms-vbcs] for implementing new bytecodes that work with values. A few experiments have employed value-like APIs to perform useful tasks such as vectorizing loops [graves-jvmls16].

Most recently, at the JVM Language Summit (2016), and at the Valhalla EG meeting that week, we got repeated calls for an early-access version of value types that would be suitable for vector, Panama, and GPU experiments. This document outlines a subset of experimental value type support in the JVM (and to a smaller degree, language and libraries), that would be suitable for early adopters. And we have been prototyping many of these ideas in recent months.

Looking back, it is reasonable to estimate that there have been many thousands of engineer-hours devoted to mapping out this complex future. Now is the time to take this vision and choose a first version, a sort of “hello world” system for value types.

The present document proposes a minimized but viable subset of value-type functionality with the following goals:

Our non-goals are complementary to the goals:

In other words, before releasing our values to the full light of day, we will prototype with them in the shady area between armchair speculation and public specification. Such a prototype, though limited, is far from useless. It will allow us to experiment with various approaches to the design and implementation of value types, and to discard approaches as needed! We can also begin to make better estimates of performance and usability, as power-users (most of whom will work closely with the designers and implementors) exercise various early use cases.

Features

The specific features of our minimum (but viable) support for value types can be summarized as follows:

The value-capable classes can be developed in today’s toolchains as standard POJO classes. In this mode of use, standard Java source code, including generic classes and methods, will be able to refer to values only in their boxed form. However, both method handles and specially-generated bytecodes will be able to work with values in their native, unboxed form.

This work relates to the JVM, not to the language. Therefore non-goals include:

Given the slogan “codes like a class, works like an int,” which captures the overall vision for value types, this minimal set will deliver something more like “works like an int, if you can catch one with a box or a handle”.

By limiting the scope of this work, we believe useful experimentation can be enabled in a production JVM much earlier than if the entire value-type stack were delivered all at once.

The support for the new JVM-level features will allow immediate prototyping of new language features and tools which can make direct use of those features. But this minimal project does not depend on such language features or tools.

The rest of this document goes into the proposed features in detail.

Value-capable classes

A class may be marked with a special annotation @DeriveValueType (or perhaps an attribute). A class with this marking is called a value-capable class, meaning it can be endowed with an associated derived value type, beyond the class type itself.

The use of this annotation will be restricted in some manner, probably unlocked by a command line option, and associated with some sort of incubator module [JEP-11].

Example:

@jvm.internal.value.DeriveValueType
public final class DoubleComplex {
  public final double re, im;
  private DoubleComplex(double re, double im) {
    this.re = re; this.im = im;
  }
  ... // toString/equals/hashCode, accessors, math functions, etc.
}

The semantics of the marked class will be the same as if the annotation were not present. However, the annotation additionally enables the JVM to treat the marked value-capable class as the source of an associated derived value type.

The super-class of a value-capable class must be Object. (This is similar to the full proposal, in which super-classes are disallowed.)

A class marked as value-capable must qualify as value-based, because its instances will serve as boxes for values of the associated value type. In particular, the class, and all its fields, must be marked final, and constructors must be private.

A class marked as value-capable must not use any of the methods provided on Object on any instance of itself, since that would produce indeterminate results on a boxed version of a value. The equals, hashCode, and toString methods must be replaced completely, with no call via super to Object methods.

As an exception, the getClass method may be used freely; it behaves as if it were replaced in the value-capable class by a constant-returning method.

As with all value-based classes, the other object methods (clone, finalize, wait, notify, and notifyAll) should not be used with the value-capable class. (This is left to the user to enforce manually. In the full proposal we may find ways to enforce it more automatically.)
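As a rough illustration of these rules, here is DoubleComplex written as an ordinary value-based class. It is a sketch under stated assumptions: the @DeriveValueType annotation is omitted so that the code compiles with a standard javac, and the static factory of and the plus method are hypothetical additions. Note that equals, hashCode, and toString are computed entirely from the fields, with no call via super to the Object versions.

```java
// Value-based POJO sketch of DoubleComplex. The annotation is omitted so this
// compiles with standard javac; the factory "of" and "plus" are hypothetical.
final class DoubleComplex {
    public final double re, im;

    // Private constructor: instances come only from the static factory.
    private DoubleComplex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    static DoubleComplex of(double re, double im) {
        return new DoubleComplex(re, im);
    }

    DoubleComplex plus(DoubleComplex that) {
        return of(re + that.re, im + that.im);
    }

    // State-based equality; never relies on identity or on Object.equals.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof DoubleComplex)) return false;
        DoubleComplex that = (DoubleComplex) o;
        // Double.compare matches Double.equals: NaN equals NaN, 0.0 differs from -0.0.
        return Double.compare(re, that.re) == 0 && Double.compare(im, that.im) == 0;
    }

    // Hash derived from the fields only; no call to Object.hashCode.
    @Override
    public int hashCode() {
        return 31 * Double.hashCode(re) + Double.hashCode(im);
    }

    @Override
    public String toString() {
        return "(" + re + " + " + im + "i)";
    }
}
```

Because no method depends on identity, two boxes holding the same fields are interchangeable, which is what allows them to stand in for a single value.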

In summary, the JVM will make the following structural checks on a value-capable class:

These structural checks may be performed when the JVM creates a derived value type from the value-capable class, when the JVM loads the value-capable class, or at any other convenient moment.

Apart from the above restrictions, a value-capable class can do any of the things normal value-based classes do, such as define constructors, methods, fields, and nested types, implement interfaces, and define type variables on itself or its methods. There is no particular restriction on the types of fields.

As we will see, the derived value type will contain only fields. It will contain the same set of fields as the value-capable class from which it is derived. But the JVM will give it no methods, constructors, nested types, or super-types. (In the full proposal, of course, value types will “code like a class” and support all of those features.)

Note that a value-capable class compiled using the standard javac compiler will be unable to express “inline sub-value” fields which themselves are value types; the most it will be able to do is request fields of their associated reference types (“L-types”). An upgraded version of javac may well be able to define sub-value fields, of true “Q-types”. Such a version of javac will give programmers the ability to work directly with value types, bypassing the trick of deriving a separate value type from a value-capable class. Thus, if you see a field in a value-capable class which itself is typed as a value-capable class, it is probably an error, an unintentional boxing of an intended inline sub-value.

Here is a larger example of a value-capable class which defines a “super-long” derived value type:

@DeriveValueType
final class Int128 implements Comparable<Int128> {
  private final long x0, x1;
  private Int128(long x0, long x1) { ... }
  public static Int128 zero() { ... }
  public static Int128 from(int x) { ... }
  public static Int128 from(long x) { ... }
  public static Int128 from(long hi, long lo) { ... }
  public static long high(Int128 i) { ... }
  public static long low(Int128 i) { ... }
  // possibly array input/output methods
  public static boolean equals(Int128 a, Int128 b) { ... }
  public static int hashCode(Int128 a) { ... }
  public static String toString(Int128 a) { ... }
  public static Int128 plus(Int128 a, Int128 b) { ... }
  public static Int128 minus(Int128 a, Int128 b) { ... }
  // more arithmetic ops, bit-shift ops
  public int compareTo(Int128 i) { ... }
  public boolean equals(Int128 i) { ... }
  public int hashCode() { ... }
  public boolean equals(Object x) { ... }
  public String toString() { ... }
}

Similar types [Long2.java] have been used in a loop vectorization prototype. This example has been defined in a prototype version of the java.lang package. But value-capable types defined as part of this minimal proposal will not appear in any standard API. Their visibility is likely to be controlled using features of the module system, such as incubator modules.

Initial value-capable classes are likely to be extensions of numeric types like long. As such they should have a standard and consistent set of arithmetic and bitwise operations. There is no such set codified at present, and creating one is beyond the scope of the minimal set. Eventually we will need to create a set of interfaces that captures the common operation structure between numeric primitives and numeric values.
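To make the shape of such operations concrete, here is a runnable sketch of part of Int128, again without the annotation. It assumes (the text does not say) that x0 holds the high 64 bits and x1 the low 64 bits of a two's-complement value. Addition propagates a carry out of the low word, detected with an unsigned comparison; only a few of the listed methods are filled in.

```java
import java.math.BigInteger;

// Partial sketch of Int128 (no @DeriveValueType). Assumption: x0 is the high
// 64-bit word and x1 the low word of a two's-complement 128-bit integer.
final class Int128 implements Comparable<Int128> {
    private final long x0, x1;

    private Int128(long x0, long x1) { this.x0 = x0; this.x1 = x1; }

    public static Int128 from(long hi, long lo) { return new Int128(hi, lo); }
    public static Int128 from(long x) { return new Int128(x >> 63, x); } // sign-extend
    public static long high(Int128 i) { return i.x0; }
    public static long low(Int128 i) { return i.x1; }

    public static Int128 plus(Int128 a, Int128 b) {
        long lo = a.x1 + b.x1;
        // The low-word sum wrapped (unsigned) exactly when it is below an addend.
        long carry = Long.compareUnsigned(lo, a.x1) < 0 ? 1 : 0;
        return new Int128(a.x0 + b.x0 + carry, lo);
    }

    public static Int128 minus(Int128 a, Int128 b) {
        long lo = a.x1 - b.x1;
        // Borrow from the high word when a's low word is below b's (unsigned).
        long borrow = Long.compareUnsigned(a.x1, b.x1) < 0 ? 1 : 0;
        return new Int128(a.x0 - b.x0 - borrow, lo);
    }

    @Override public int compareTo(Int128 i) {
        int c = Long.compare(x0, i.x0);                      // high word: signed
        return c != 0 ? c : Long.compareUnsigned(x1, i.x1);  // low word: unsigned
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Int128)) return false;
        Int128 that = (Int128) o;
        return x0 == that.x0 && x1 == that.x1;
    }

    @Override public int hashCode() { return 31 * Long.hashCode(x0) + Long.hashCode(x1); }

    @Override public String toString() {
        // value = signed(x0) * 2^64 + unsigned(x1)
        return BigInteger.valueOf(x0).shiftLeft(64)
                .add(new BigInteger(Long.toUnsignedString(x1))).toString();
    }
}
```

For example, adding one to Long.MAX_VALUE yields 9223372036854775808, a result that overflows a long but fits comfortably in 128 bits.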

Splitting the value type from the object type

It is likely that an initial implementation in the JVM will work best if the value-capable class, and the value-type derived from it, get distinct names. (In the full proposal, there is no need for two types.)

When the JVM loads a value-capable class, it may either eagerly create the derived value type for it, or else set a flag on the class and arrange to create the derived value type on demand.

In either case, the value-capable class is not changed at all at load time. The corresponding derived value type is created as a copy of the value-capable class, but with these crucial differences:

The name given to the derived value type is implementation-dependent. The main constraint is that the two types are in the same package. Also, the synthetic name must not be likely to clash with user-written names, so there will be some dollar signs or other punctuation. The name must be a valid class name, so that tools can “spin” it into Q-type descriptors in dynamically generated class files.

There is no need to give a general rule for constructing the synthetic name. Reflective APIs (defined below) will allow it to be obtained programmatically, as needed.

Start again with our example of DoubleComplex:

@jvm.internal.value.DeriveValueType
public final class DoubleComplex {
  public final double re, im;
  ...
  double realPart() { return re; }
}

When the JVM decides to synthesize a derived value type for DoubleComplex it makes a fresh copy with all class members stripped out except the two double fields. Crucially, the JVM uses internal magic to make the synthetic class into a value-type, not an object type.

Let’s assume, for the moment, that the JVM will name the derived value type to make it look like an inner type of the value-capable class. In that case the resulting derived value type would look like this:

@jvm.internal.value.DeriveValueType
public final class DoubleComplex {
  public final double re, im;
  ...
  double realPart() { return re; }
}
public __ByValue class DoubleComplex$Value {  // hypothetical syntax
  public final double re, im;
}

The hypothetical __ByValue keyword marks where values are defined in place of references. Until much work has been done up and down the stack, such a class cannot be specified directly in source code, but it is perfectly reasonable and useful for the JVM to synthesize one at class-load time.

Note that the derived value type has no constructors. Normally this would be a problem, since object classes are required by the JVM to have at least one constructor. The JVM permits this in the case of the derived value type. (Such a constraint is not necessary with value types in general, but that story is too long to tell here.) In any case, the derived value type will “borrow” the constructors of the value-capable class, as we will see in the next section.

This design may be called “box-first”, in that the JVM loads the box-type only, and somehow creates the value-type as a side effect. We will end up with a more natural “value-first” design, but the present box-first design puts the fewest constraints on tools which read and write class-files, including the JVM and javac. So the box-first awkwardness is the correct choice, at first.

Boxing, unboxing, and borrowing

The JVM internally arranges boxing and unboxing operations to convert between the value-capable class and its derived value type. The semantics of these operations are simple field-wise copies between the two types. This obviously is well-defined because the field lists are identical.

The synthetic unbox operation allows the derived value type to make indirect use of the constructors of the value-capable class. The programmer can create a box using a constructor, and unbox it to get the desired constructed value. The JVM just copies the fields out of the box and discards the box. (In the full proposal, value types have real constructors, and don’t need to borrow them from their boxes.)

The synthetic box operation allows the derived value type to make indirect use of the methods of the value-capable class. The programmer can temporarily box a value, and invoke any of the methods of the value-capable class, throwing away the box when the method returns. Since the box has a short lifetime, it is likely that the JVM can optimize it away, at least for simple methods. (In the full proposal, value types have real methods, and don’t need to borrow them from boxes. Instead, the boxes can borrow their methods from the values.)

Note that the synthetic box operation creates a new instance of the value-capable class without running a constructor. Normally this is a problem, but in this case the two classes are so closely linked that it is safe to assume that any value was created (in the first place) by unboxing a properly-constructed box. Thus, a constructor gets the first word, for any particular value. The pattern of unboxing and boxing is similar to the pattern of serialization and deserialization. In both patterns, the second operation bypasses normal object construction.

The synthetic box operation also allows the derived value type to make indirect use of the interfaces of the value-capable class. Again, if a derived value type must be passed somewhere that expects an interface, the programmer can simply box it and pass a reference to the box. (In the full proposal, we intend to provide ways for values to be operated on via interface types, without any visible boxing. This will require some careful work defining how values and interfaces work together directly.)

Finally, since static methods and static fields are not copied into the derived value type, the programmer can only access them from the original value-capable class, the box.
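The borrowing pattern can be modeled in plain Java, with the caveat that in the proposal the JVM synthesizes the conversions itself. In this hypothetical sketch, DoubleComplexValue stands in for the derived value type, and the static helpers unbox and box play the roles of the synthetic operations. Because Java cannot allocate an object without running some constructor, box uses a separate private constructor that skips validation, modeling the constructor-free box operation; the validating constructor still gets the first word on every value.

```java
// Hypothetical stand-in for the derived value type: fields only. (Plain Java
// requires a constructor here; the real derived value type has none.)
final class DoubleComplexValue {
    final double re, im;
    DoubleComplexValue(double re, double im) { this.re = re; this.im = im; }
}

// The value-capable class, which doubles as the box.
final class DoubleComplex {
    final double re, im;

    // The "real" constructor: it validates, so it sees every value first.
    private DoubleComplex(double re, double im) {
        if (Double.isNaN(re) || Double.isNaN(im)) throw new IllegalArgumentException("NaN component");
        this.re = re;
        this.im = im;
    }

    // Models the constructor-free box operation: a plain field-wise copy.
    private DoubleComplex(DoubleComplexValue v) {
        this.re = v.re;
        this.im = v.im;
    }

    static DoubleComplex of(double re, double im) { return new DoubleComplex(re, im); }

    double abs() { return Math.hypot(re, im); }

    // Models unboxing: copy the fields out; the box can then be discarded.
    static DoubleComplexValue unbox(DoubleComplex box) { return new DoubleComplexValue(box.re, box.im); }

    // Models boxing: copy the fields into a fresh, short-lived box.
    static DoubleComplex box(DoubleComplexValue v) { return new DoubleComplex(v); }
}
```

A value is obtained by borrowing the constructor, as in DoubleComplex.unbox(DoubleComplex.of(3, 4)), and a method is borrowed by boxing temporarily, as in DoubleComplex.box(v).abs(), after which the box is garbage.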

Scoping of these features

A crucial part of being able to provide an experimental release is the ability to mark features as experimental and subject to change. While the ideas expressed in this document are reasonably well baked, it is entirely foreseeable that they might change between an experimental release and a full Valhalla release.

Within a single version of the JVM, the experimental features are further restricted to classes loaded into the JVM’s initial module layer, or into a module selected by a command line option, and are otherwise ignored. These modules are called value-capable modules.

In addition, the class-file format features may be enabled only in class files of a given major and minor version, such as 53.1. In that case, the JVM class loader would ensure that classes of that version were loaded only into value-capable modules, and then consult only the version number when validating and loading the experimental extended features proposed here. It is possible that some minor versions will be used only for experimental features, and never appear in production specifications.
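The version gate might look something like the following sketch, which reads the start of a class file (u4 magic, then u2 minor_version and u2 major_version, as laid out in the JVM specification) and compares it with the example version 53.1. The class and constant names are hypothetical, and a real loader would additionally check that the defining module is value-capable.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: detects the example experimental class-file version 53.1.
final class ClassFileVersion {
    static final int EXPERIMENTAL_MAJOR = 53;
    static final int EXPERIMENTAL_MINOR = 1;

    static boolean isExperimental(byte[] classFile) {
        ByteBuffer buf = ByteBuffer.wrap(classFile);       // big-endian, like class files
        if (buf.getInt() != 0xCAFEBABE) throw new IllegalArgumentException("bad magic");
        int minor = Short.toUnsignedInt(buf.getShort());   // u2 minor_version
        int major = Short.toUnsignedInt(buf.getShort());   // u2 major_version
        return major == EXPERIMENTAL_MAJOR && minor == EXPERIMENTAL_MINOR;
    }
}
```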

Any use of any part of any feature of this prototype must originate from a class in a value-capable module. The JVM is free to detect and reject attempts from non-value-capable modules. Outside value-capable modules, annotations like @DeriveValueType may be silently ignored.

However, a prototype implementation of this specification may omit checks for such usage, and seem to work (or at least, fail to throw a suitable error). Any such non-rejection would be a bug, not an invitation.

Value descriptors

In value-capable modules, the class-file descriptor language is extended to include so-called Q-types, which directly denote unboxed value types. The descriptor syntax is “QInternalName;”, where InternalName is the internal form of a class name. (The internal form substitutes slashes for dots.) In fact, the class name must be that of the value type derived from a value-capable class.

By comparison, a standard reference type descriptor may be called an L-type. For a value-capable class C, we may speak of both the Q-type and the L-type of C. Note that usage of L-types is not correlated in any way with usage of Q-types. For example, they can appear together in method types, in arbitrary mixtures.
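Since a Q-type descriptor has the same envelope shape as an L-type, a descriptor parser needs only one more case. The following hypothetical helper splits a method descriptor into its field-type tokens, treating Q exactly like L (including as an array element type). It is a sketch: it does not validate the class names themselves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical descriptor tokenizer that accepts the proposed "QInternalName;" form.
final class Descriptors {
    // Returns the index just past the field type that starts at 'start'.
    static int endOfFieldType(String d, int start) {
        int i = start;
        while (d.charAt(i) == '[') i++;                 // array dimensions, e.g. "[QFoo;"
        switch (d.charAt(i)) {
            case 'B': case 'C': case 'D': case 'F':
            case 'I': case 'J': case 'S': case 'Z':
                return i + 1;                           // primitive
            case 'L':                                   // reference type (L-type)
            case 'Q': {                                 // proposed value type (Q-type)
                int semi = d.indexOf(';', i);
                if (semi < 0) throw new IllegalArgumentException("unterminated: " + d);
                return semi + 1;
            }
            default:
                throw new IllegalArgumentException("bad type at " + i + " in " + d);
        }
    }

    // Index of the ')' that closes the parameter list.
    private static int endOfParams(String d) {
        if (d.isEmpty() || d.charAt(0) != '(') throw new IllegalArgumentException(d);
        int i = 1;
        while (d.charAt(i) != ')') i = endOfFieldType(d, i);
        return i;
    }

    static List<String> parameterTypes(String d) {
        List<String> out = new ArrayList<>();
        int close = endOfParams(d);
        for (int i = 1; i < close; ) {
            int end = endOfFieldType(d, i);
            out.add(d.substring(i, end));
            i = end;
        }
        return out;
    }

    static String returnType(String d) {
        String r = d.substring(endOfParams(d) + 1);
        if (!r.equals("V") && endOfFieldType(r, 0) != r.length()) throw new IllegalArgumentException(d);
        return r;
    }
}
```

For instance, the descriptor (Qpkg/Int128;ILjava/lang/Object;)Qpkg/Int128; mixes Q-types, a primitive, and an L-type freely, as the text allows.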

A Q-type descriptor may appear as the type of a class field defined in a value-capable module. But the same descriptor may not appear in a field reference (CONSTANT_Fieldref) for that field (even in a value-capable module) when that reference is used by any of the four field instructions (getfield, putfield, getstatic, and putstatic).

(Method handle factories, described below, will support field loads and updates, in both values and objects. In this proposal, the field instructions themselves are unchanged.)

A Q-type descriptor may appear as an array element type in a class of a value-capable module. (Again, this is only in a value-capable module, and probably in a specific experimental class-file version. Let’s stop repeating this, since the limitation has already been set down as a blanket statement.) There are no bytecodes for creating, reading, or writing such arrays, but the prototype makes method handles available for these functions.

A field or array of a Q-type is initialized to the default value of that value type, rather than null. This default value is defined (at least for now) as a value all of whose fields are themselves of default value. Such a default may be obtained from a suitable method handle, such as the MethodHandles.empty combinator.

(In other words, default values are built up by combining the existing type-specific default values of null, false, \0, 0, and 0.0. All Java heap variables are initialized to these zero data, including values. User-defined defaults are unlikely, or at least far in the future.)
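The default-value rule can be tried out with today's API. MethodHandles.empty (available since Java 9) returns a handle that ignores its arguments and returns the default value of its return type. The hypothetical Defaults helper below applies it field by field, which mirrors the definition of a value type's default as a value whose fields all hold their own defaults. Primitive results come back boxed because the handle is invoked through a generic call site.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper showing field-wise default values via MethodHandles.empty.
final class Defaults {
    // Hypothetical sample type with a mix of field kinds.
    static final class Sample {
        double re;
        double im;
        boolean flag;
        Object tag;
    }

    // Default value of any type, from a handle of type ()type.
    static Object defaultOf(Class<?> type) {
        MethodHandle mh = MethodHandles.empty(MethodType.methodType(type));
        try {
            return mh.invoke(); // generic invoke boxes primitives, e.g. 0 becomes Integer 0
        } catch (Throwable t) {
            throw new IllegalStateException(t); // not expected: the handle does no work
        }
    }

    // Default of each instance field, keyed by field name.
    static Map<String, Object> fieldDefaults(Class<?> c) {
        Map<String, Object> m = new HashMap<>();
        for (Field f : c.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                m.put(f.getName(), defaultOf(f.getType()));
            }
        }
        return m;
    }
}
```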

A Q-type descriptor may appear as the parameter or return type of a method defined in a class file. As described below, the verifier requires the corresponding stacked value for such a parameter or return value to match the Q-type (not the corresponding L-type or any other type).

Any method reference (a constant tagged CONSTANT_Methodref or CONSTANT_InterfaceMethodref) may mention Q-types in its descriptor. After resolution of such a constant, the definition of such a method may not be native, and must use new bytecodes to work directly with the Q-typed values.

Likewise, a CONSTANT_Fieldref constant may mention a Q-type in its descriptor.

Note that the Java language does not provide any direct way to mention Q-types in class files. However, bytecode generators may mention such types and work with them. It is also likely that work in the Valhalla project will create experimental language features to allow source code to work with Q-types.

Enhanced constants

Since our value types will have names and members like reference types, but are distinct from all reference types, it is necessary to extend some constant pool structures to interoperate with Q-types.

Naturally, as a result of extending descriptor syntax, method and field descriptors can mention Q-types. Doing this requires no additional format changes in the constant pool.

However, some occurrences of types in the constant pool mention “raw” class names, without the normal descriptor envelope characters (L before and ; after). Specifically, a CONSTANT_Class constant refers to such a raw class name, and is defined to produce (at present) an L-type with no provision for requesting the corresponding Q-type. What is a class-file to do if it needs to mention the Q-type?

There is a simple answer: Pick a character which is illegal as a prefix to class names, and use it as an escape prefix within the UTF8 string argument to a CONSTANT_Class constant. If the escape prefix is present, the rest of the UTF8 string is a descriptor, not a class name.

In order to preserve normalization of names, UTF8 strings for CONSTANT_Class constants may not begin with “;L” or “;[”.

(To avoid confusion between current forms of class names and these additional forms, there will only be one way to express any particular type as a CONSTANT_Class string. Therefore, the descriptor itself may not begin with L or [, since type names that begin with those characters are already expressible, today, as “raw” class names in a CONSTANT_Class constant. Otherwise, Class[";[Lfoo;"] and Class["[Lfoo;"] would mean the same thing, which is surely confusing.)

This minimal prototype adopts this answer, using semicolon ; (ASCII decimal code 59) as the escape character. Thus, the types int.class and void.class may now be obtained as class-file constants with the UTF8 strings “;I” and “;V”. The choice of semicolon is natural here, since a class name cannot contain a semicolon (unless it is an array type), and descriptor syntax is often found following semicolons in class files.

(Alternatively, we could repurpose CONSTANT_MethodType to resolve to a Class value, since it already takes a descriptor argument, albeit a method descriptor. But this seems more disruptive than extending CONSTANT_Class.)

The L-type and Q-type for the example Int128 can now be expressed as twin CONSTANT_Class constants, with UTF8 strings like “pkg/Int128” and “;Qpkg/Int128;” (where pkg is something like jdk/experimental/value).
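A hypothetical encoder for these strings might look like this sketch. It relies on Class.descriptorString (Java 12 and later) for primitives and void. Because a standard JVM has no Class object for a Q-type, value types are encoded from an internal name instead, and the normalization check rejects escaped strings that begin with L or [.

```java
// Hypothetical encoder for CONSTANT_Class UTF8 strings under the proposed escape rule.
final class ClassConstants {
    // UTF8 string for a CONSTANT_Class naming an existing (non-Q) type.
    static String encode(Class<?> c) {
        if (c.isPrimitive()) {
            return ";" + c.descriptorString();   // int -> ";I", void -> ";V"
        }
        // Classes use internal names; arrays already use descriptors, e.g. "[Ljava/lang/String;".
        return c.getName().replace('.', '/');
    }

    // UTF8 string for the Q-type of a value type, given its internal name.
    static String encodeValueType(String internalName) {
        return ";Q" + internalName + ";";
    }

    // Normalization: an escaped string may not start with ";L" or ";[".
    static boolean isNormalized(String utf8) {
        if (!utf8.startsWith(";")) return true;  // raw class name: not checked here
        return utf8.length() > 1 && utf8.charAt(1) != 'L' && utf8.charAt(1) != '[';
    }
}
```

With this sketch, the twin constants for Int128 would be "pkg/Int128" (from the box class) and encodeValueType("pkg/Int128"), which yields ";Qpkg/Int128;".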

When used with the ldc or ldc_w bytecodes, or as a bootstrap method static argument, a CONSTANT_Class beginning with an escaped descriptor resolves to the Class object for the given type (which, apart from a Q-type, must be a primitive type or void). The resolution process is similar to that applied to the descriptor components of a CONSTANT_MethodType constant.
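The resolution step can be approximated with an existing API. This sketch reuses MethodType.fromMethodDescriptorString by wrapping an escaped descriptor as the return type of a nullary method type, so it is resolved the same way a CONSTANT_MethodType component is. The resolver name is hypothetical, and Q-type escapes are refused because a standard JVM cannot resolve them.

```java
import java.lang.invoke.MethodType;

// Hypothetical resolver for CONSTANT_Class UTF8 strings, escaped or raw.
final class ClassConstantResolver {
    static Class<?> resolve(String utf8, ClassLoader loader) {
        if (utf8.startsWith(";")) {
            String desc = utf8.substring(1);
            if (desc.isEmpty() || desc.startsWith("L") || desc.startsWith("["))
                throw new IllegalArgumentException("not normalized: " + utf8);
            if (desc.startsWith("Q"))
                throw new UnsupportedOperationException("Q-types need a value-capable JVM: " + utf8);
            // Resolve as the return type of "()desc", like a CONSTANT_MethodType component.
            return MethodType.fromMethodDescriptorString("()" + desc, loader).returnType();
        }
        // Raw class name in internal form, or an array descriptor as used today.
        try {
            return Class.forName(utf8.replace('/', '.'), false, loader);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("cannot resolve " + utf8, e);
        }
    }
}
```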

When used as the class component of a CONSTANT_Methodref or CONSTANT_Fieldref constant, a CONSTANT_Class for a Q-type implies that the receiver will be a Q-type instead of the normal L-type. Eventually there may be bytecodes which use such member references directly. (These may be some vinvoke or vgetfield, or just an overloading of invokespecial and getfield.) For now, as noted below, such member references are limited to the specification of CONSTANT_MethodHandle constants.

Resolution of method constants

When resolving a CONSTANT_Methodref against a Q-type, none of the methods of java.lang.Object may appear; the JVM or method handle runtime may require special filtering logic to enforce this. In other words, Q-types do not inherit from Object. Instead, they will either define their own methods which replace the Object methods, similar to the rules for value-based classes, or avoid Object methods altogether.

As an exception, the Object.getClass method may be permitted, but it must return the Class constant corresponding to the loaded class-file.

The theory here is that getClass reports the class-mirror for the file loaded to define the object’s type. This theory could change.
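The filtering logic could be as simple as the following sketch, which uses ordinary reflection as a stand-in for the JVM's internal method resolution. It drops every public method declared by java.lang.Object except getClass, while keeping methods the class declares itself, such as its own equals and hashCode. The helper and the sample class are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical sketch of filtering Object methods out of Q-type method lookup.
final class QTypeMethodFilter {
    // Hypothetical sample value-capable class that replaces equals and hashCode.
    static final class Sample {
        public double re() { return 0.0; }
        @Override public boolean equals(Object o) { return o instanceof Sample; }
        @Override public int hashCode() { return 0; }
    }

    // Names of public methods visible on the Q-type: methods declared by Object
    // are removed, except getClass; overrides declared by the class are kept.
    static Set<String> visibleMethodNames(Class<?> c) {
        return Arrays.stream(c.getMethods())
                .filter(m -> m.getDeclaringClass() != Object.class || m.getName().equals("getClass"))
                .map(Method::getName)
                .collect(Collectors.toCollection(TreeSet::new));
    }
}
```

Applied to the sample class, this leaves equals, getClass, hashCode, and re, while toString, wait, notify, and notifyAll disappear.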

JVM changes to support Q-types

Q-types, like other type descriptor types, can be mentioned in many places. The basic list is:

The JVM might use invisible boxing of Q-types to simplify the prototyping of many execution paths. This of course works against a key value proposition of values, the flattening of data in the heap. In fact, the minimal model requires special processing of Q-types in array elements and object (or value) fields, at least enough special processing to initialize such fields to the default value of the Q-type, which is not (and cannot be) the default null of an L-type.

So when the class loader loads a class whose fields are Q-types, it must resolve (and perhaps load) the classes of those Q-types, and inquire enough information about the Q-type definition to lay out the new class which contains the Q-type field. This information includes at least the size of the type, and may eventually include alignment and a map of managed references contained in the Q-type.

(This proposal may support so-called “flattened arrays”, whose elements are value structures laid out end-to-end. A minimized form of this proposal may omit support for some or all types of flattened, value-bearing arrays. For example, even though the fields of value types may contain a mix of primitives, references, and/or sub-values, arrays containing such mixed values may be more difficult to implement than arrays containing values with only primitive fields; such arrays may be omitted from early implementations. API points which create such arrays are allowed, temporarily, to throw errors instead of returning. Caveat emptor.)

Flattened arrays, if supported, must be created with a component type which is a Q-type. They will differ from arrays of corresponding L-types just as Integer[].class differs from int[].class. Likewise, the super-type of a value-bearing array will (like an int[]) be Object only, and not a different array type. Such arrays will not convert to any other array type, and must be manipulated by explicitly obtained method handles.
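The analogy with int[] can be checked with reflection today. Every array class reports Object as its superclass, so the difference shows up in assignability: Integer[] is also an Object[], whereas int[] converts to neither Object[] nor Integer[]. Flattened Q-type arrays would presumably sit on the int[] side of this line. The helper name is hypothetical.

```java
// Hypothetical helper distinguishing covariant reference arrays from primitive arrays.
final class ArrayShapes {
    // True for arrays whose elements are references, which are assignable to Object[].
    static boolean isReferenceArray(Class<?> c) {
        return c.isArray() && Object[].class.isAssignableFrom(c);
    }
}
```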

Value bytecodes

The following new bytecode instructions are added:

Values are stored in single locals, not groups of locals as with long and double (which consume pairs of locals).

The format of these instructions is TBD. Some of them must include an operand field which describes the type of value being manipulated. The field manipulation instructions require a CONSTANT_Fieldref. Certainly vbox, vunbox, and vdefault require an explicit type operand field.

The JVM may use Q-type resolution to acquire information about the Q-type’s size and alignment requirements, so as to properly “pack” it into the interpreter stack frame. Or the JVM may simply use boxed or buffered representations (the corresponding value-capable L-types, or some internal heap or stack type) and ignore sizing information.

It seems likely that we can omit the type operands for the data movement instructions. If we can observe that the JVM interpreter must use an internally uniform “carrier type” for all value types on the stack, we can simply require that this carrier type be self-describing, and then there is no need to reaffirm the exact value type in the data movement instructions.

In the minimal prototype, the receiver of an invokevirtual or invokeinterface instruction may not be a Q-type, even though the constant pool structure can express this (by referring to a Q-type as the class component of a CONSTANT_Methodref). Method handles and invokedynamic will always allow bytecode to invoke methods on Q-types, and this is sufficient for a start. Such a method handle may in fact internally box up the Q-type and run the corresponding L-type method, but this is a tactic that can be improved and optimized in Java support libraries, without pervasive cuts to the interpreter.
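The boxing tactic can be sketched with existing method-handle combinators. In this hypothetical example, DoubleComplexValue stands in for the Q-type, and MethodHandles.filterArguments places a boxing step in front of the virtual method realPart, producing a handle that accepts the stand-in value directly. A real implementation would receive an actual Q-type, and could later optimize the box away.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical stand-in for the Q-type: fields only.
final class DoubleComplexValue {
    final double re, im;
    DoubleComplexValue(double re, double im) { this.re = re; this.im = im; }
}

// The value-capable class, whose methods are borrowed via a temporary box.
final class DoubleComplex {
    final double re, im;
    private DoubleComplex(double re, double im) { this.re = re; this.im = im; }
    double realPart() { return re; }
    // Models the synthetic box operation.
    static DoubleComplex box(DoubleComplexValue v) { return new DoubleComplex(v.re, v.im); }
}

final class ValueInvokers {
    // Handle of type (DoubleComplexValue)double: box the argument, then call realPart.
    static final MethodHandle REAL_PART_ON_VALUE;
    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle realPart = lookup.findVirtual(DoubleComplex.class, "realPart",
                    MethodType.methodType(double.class));              // (DoubleComplex)double
            MethodHandle box = lookup.findStatic(DoubleComplex.class, "box",
                    MethodType.methodType(DoubleComplex.class, DoubleComplexValue.class));
            REAL_PART_ON_VALUE = MethodHandles.filterArguments(realPart, 0, box);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static double realPartOf(DoubleComplexValue v) {
        try {
            return (double) REAL_PART_ON_VALUE.invokeExact(v);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

The caller only ever sees a handle typed on the value; whether a box is allocated underneath is an implementation detail of the support library.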

Verifier interactions

When setting up the entry state for a method, if a Q-type appears in the method’s argument descriptors, the verifier notes that the Q-type (not the L-type!) is present in the corresponding local at entry.

When returning from a method, if the method return type is a Q-type, the same Q-type must be present at the top of the stack.

When performing an invocation (in any mode), the stack must contain matching Q-types at the positions corresponding to any Q-types in the argument descriptors of the method reference. After the invocation, if the return type descriptor was a Q-type, the stack will be found to contain that Q-type at the top.

As with the primitive types int and float, a Q-type will not convert to any other verification type than itself, or the verification super-types oneWord or top. This affects matching of values at method calls, and also at control flow merge points. Q-types do not convert to L-types, not even their boxes or the supertypes (Object, interfaces) of their L-types.
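A toy model of this assignability rule follows. It is a deliberately simplified sketch, not the full verification type hierarchy of the JVM specification: reference subtyping is reduced to java/lang/Object, and all names are hypothetical. The point it captures is that a Q-type, like int, is assignable only to itself, oneWord, and top, and never to its box or to Object.

```java
import java.util.Objects;

// Hypothetical toy model of verification-type assignability with Q-types.
final class VerifierModel {
    enum Kind { TOP, ONE_WORD, INT, REFERENCE, QTYPE }

    static final class VType {
        final Kind kind;
        final String name; // class name for REFERENCE and QTYPE, else null
        VType(Kind kind, String name) { this.kind = kind; this.name = name; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof VType)) return false;
            VType t = (VType) o;
            return t.kind == kind && Objects.equals(t.name, name);
        }
        @Override public int hashCode() { return Objects.hash(kind, name); }
    }

    static final VType TOP = new VType(Kind.TOP, null);
    static final VType ONE_WORD = new VType(Kind.ONE_WORD, null);
    static final VType INT = new VType(Kind.INT, null);
    static VType ref(String name) { return new VType(Kind.REFERENCE, name); }
    static VType q(String name) { return new VType(Kind.QTYPE, name); }

    static boolean isAssignable(VType from, VType to) {
        if (from.equals(to) || to.kind == Kind.TOP) return true;
        switch (from.kind) {
            case INT:
            case QTYPE:
                // Like int, a Q-type widens only to oneWord (top was handled above).
                return to.kind == Kind.ONE_WORD;
            case REFERENCE:
                // Simplification: the only reference supertype modeled is java/lang/Object.
                return to.kind == Kind.ONE_WORD
                        || (to.kind == Kind.REFERENCE && to.name.equals("java/lang/Object"));
            default:
                return false;
        }
    }
}
```

At a control-flow merge, two different Q-types would therefore meet only at oneWord or top, neither of which can be used as a value afterward.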

Besides vload, vstore, vreturn, and the invoke family, the only bytecodes guaranteed to produce or consume Q-type operands are pop, pop2, swap, and the dup family. More bytecodes may be added over time. The verifier enforces proper handling of Q-types.

The vaload and vastore instructions work just like the pre-existing array instructions. Given a uniform carrier type, there is no need for them to reaffirm the Q-type they operate on. That type can always be extracted from the array itself.

The vgetfield instruction has access control similar to the existing getfield instruction. If a field is public in some value type, any class can read that field from a value of that type.

But the vwithfield instruction has tight access control regardless of the field’s access. Only a class with private access to the value type is allowed to perform field replacement. This restriction is analogous to that on putfield for a final field, which is only allowed in the class defining the field, and in fact in constructors of that class. Because the value-capable class (VCC) and the derived value type (DVT) are two sides of the same logical type, the JVM must allow the VCC to perform vwithfield operations on its DVT. This will be done reflectively, using method handles, unless the VCC somehow is gifted with compiled-in vwithfield instructions.

The reflective Lookup API will allow VCCs and DVTs to share access to each other’s private members and capabilities, just as they are shared between nestmates today. This includes permission to use the vwithfield instruction. (A future revision of the JVM may support explicit VM-level “nestmates” which have access to each other’s private fields and methods. In that revision, the vwithfield instruction would be available to all nestmates of a given value type.)

The vdefault instruction would seem to be very “private” to a value type, since it allows constructor-free creation of a value. But the JVM gives default values a very peculiar status, since any array of a given type is always pre-packed with that type’s default values. Therefore, there is actually nothing “private” about vdefault. Any class can compute the default value of any Q-type at any time.
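The two instructions can be modeled in plain Java as a private wither and a public default factory. In this hypothetical sketch, withRe and withIm return a copy with one field replaced, which is what vwithfield does, and they are private to mirror its access rule; defaultValue is public because any class may obtain the default. Building a value as a default followed by field replacements is one plausible way to construct values without constructors, not something this document specifies.

```java
// Hypothetical model of vdefault and vwithfield on a DoubleComplex value.
final class DoubleComplexValue {
    final double re, im;

    private DoubleComplexValue(double re, double im) { this.re = re; this.im = im; }

    // Models vdefault: public, since any class may compute the default value.
    static DoubleComplexValue defaultValue() { return new DoubleComplexValue(0.0, 0.0); }

    // Models vwithfield on re and im: a new value with one field replaced.
    // Private, mirroring vwithfield's requirement of private access to the value type.
    private DoubleComplexValue withRe(double newRe) { return new DoubleComplexValue(newRe, im); }
    private DoubleComplexValue withIm(double newIm) { return new DoubleComplexValue(re, newIm); }

    // One plausible constructor-free construction: default, then field replacements.
    static DoubleComplexValue of(double re, double im) {
        return defaultValue().withRe(re).withIm(im);
    }
}
```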

Q-types and bytecodes

Bytecodes which interact with Q-types are only these:

Many existing bytecodes take operands which are constant pool references, any of which might directly or indirectly refer to a Q-type. Unless specified otherwise, these bytecodes will reject occurrences of Q-types. They include:

In a fuller implementation of value types, some of these (but not all) are candidates for interoperation with Q-types.

It may seem odd to allow Q-types to move through method calls (via vload, etc.) and not through other instructions, such as getstatic. Minimizing the instruction requirements may get us to a prototype faster. Supporting Q-types as method arguments and returns is also complex, but once that bit is done, method handles can take over all other aspects of execution, including field access. In addition, the present minimal design has the virtue of not requiring the interpreter to ever deal with the flattened form of a value; it can use opaquely boxed pointers for function calling and never even care about the size or format of the values it is working with. But these restrictions are likely to go away quickly; see the “Future Work” section below.

Value type reflection

The public class jdk.experimental.value.ValueType (in an internal module) will contain all methods of the runtime support for values in this initial prototype.

ValueType will contain the following public methods for reflecting Q-types:

public class ValueType<T> {
  static boolean classHasValueType(Class<?> x);
  static <T> ValueType<T> forClass(Class<T> x);
  Class<T> valueClass();
  Class<T> boxClass();
  Class<T> sourceClass();
  ...
}

The predicate classHasValueType is true if the argument represents either a Q-type or (the L-type of) a value-capable class. The factory forClass returns the descriptor of the Q-type for any type derived from a value-capable class. (If given any other type, it throws IllegalArgumentException; users might want to test with classHasValueType first to avoid the exception.)

The two accessors valueClass and boxClass return distinct java.lang.Class objects for the Q-type and the original (value-capable) L-type, respectively.

The third accessor sourceClass returns the class corresponding to the loaded class which created the value class. In the initial proposal, this is the same as the boxClass, but future forms of value type support will certainly generate boxes differently. Deriving the L-type from the originally loaded value-capable class is a temporary expedient, not a permanent feature.

It is the case that ValueType.forClass(vt.sourceClass()) is the same as vt, if it returns normally, and likewise for valueClass and boxClass. Thus, any Class aspect of a value type can be used to obtain its ValueType descriptor.

(Note that the original value-capable class does not have special status with respect to this API; from the point of view of someone working with value types, it is merely the box class for the value. Eventually, value types will be directly defined by class files, and the box type will be derived indirectly. In that case, boxClass may be some other synthetic entity, and sourceClass and valueClass will be the same entity.)

The legacy lookup method Class.forName will continue to return the sourceClass, for reasons of compatibility. This condition may or may not persist. (In the future, the source language construct T.class is likely to produce something more natural to the source code type assigned to T, under the slogan “works like an int”.)

The class value returned from valueClass is distinct from (unequal to) the class returned from boxClass, or perhaps originally passed to forClass (e.g., from code which has no other access to Q-types). This class value directly reflects the Q-type just as a pseudo-class like int.class or void.class directly reflects a primitive type (or even void).
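The primitive types already show the same split between a pseudo-class and an ordinary loaded class. The sketch below uses only standard Java SE APIs (no ValueType), so it is an analogy rather than a demonstration of Q-types: int.class plays the role of valueClass and Integer.class the role of boxClass. The class name PseudoClassDemo and its helper methods are hypothetical.

```java
import java.lang.invoke.MethodType;

public class PseudoClassDemo {
    // Map a type to its box using MethodType.wrap(): int.class -> Integer.class.
    // Reference types are returned unchanged.
    static Class<?> boxOf(Class<?> t) {
        return MethodType.methodType(t).wrap().returnType();
    }

    // Map a wrapper type back to its primitive pseudo-class: Integer.class -> int.class.
    static Class<?> unboxOf(Class<?> t) {
        return MethodType.methodType(t).unwrap().returnType();
    }

    public static void main(String[] args) {
        System.out.println(int.class == Integer.class);           // false: two distinct mirrors
        System.out.println(int.class.isPrimitive());              // true: int.class is a pseudo-class
        System.out.println(Integer.TYPE == int.class);            // true: the box exposes its pseudo-class
        System.out.println(boxOf(int.class) == Integer.class);    // true
        System.out.println(unboxOf(Integer.class) == int.class);  // true
        System.out.println(boxOf(String.class) == String.class);  // true: references are left alone
    }
}
```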

Users should not rely on the Class mirror for a Q-type to be either a pseudo-class (like int.class) or a regular loaded class. When direct loading of value types is supported, Class mirrors for Q-types are likely to be identical with the sourceClass of the type, and to behave much as Class mirrors do on loaded class files today.

(Note: The use of pseudo-classes has precedent, with the primitive pseudo-classes like int.class. But it is not yet clear whether pseudo-classes for Q-types will be a permanent part of the design. For now, they are necessary to enable use of existing reflection mechanisms, such as MethodType objects to encode Q-types for the lookup of method handles.)

The reflective API for java.lang.Class and the sub-package java.lang.reflect will produce unspecified results on both the Q-types and L-types derived from a value-capable class, as a result of whatever expedient tactics are used to implement this initial minimal design. If the class-splitting transform is used at load time, it is likely that all the fields will be reflected on the Q-class, and everything else will stay on the L-class. But this is not a stable contract. Clarifying the behavior of these API points is future work.

To get stable results without specifying the behavior of legacy APIs, ValueType provides a number of temporary replacement API points:

public class ValueType<T> {
  ...
  // front-ends for Class reflection API:
  String getName();
  String getSimpleName();
  Class[] getInterfaces();
  Class[] getClasses(boolean declared, Lookup lookup);
  java.lang.reflect.Constructor[]
       getConstructors(boolean declared, Lookup lookup);
  java.lang.reflect.Method[]
       getMethods(boolean declared, Lookup lookup);
  java.lang.reflect.Field[]
       getFields(boolean declared, Lookup lookup);
}

These select reflective data from the source class, value class, or some other source as appropriate. These API points are self-explanatory as replacements for corresponding legacy API points in the Class mirrors for the Q-types and L-types. When the declared boolean is true, the Lookup argument may gate an appropriate security manager check; when it is false, the Lookup argument is ignored and may be null.

A member returned from getMethods or a similar API point is likely to have a getDeclaringClass which is one of the Class aspects of the ValueType, but even this is not guaranteed. Use with care. In general, these APIs should be used to determine what are valid inputs to the API of java.lang.invoke.MethodHandles.Lookup, and method handles derived from that API to perform actual work.

The core reflection APIs in java.lang.Class and java.lang.reflect for creating and manipulating object instances may be only partially implemented. In particular, core reflection operations to create new instances will probably fail or produce unpredictable results, as may attempts to work with Q-values (which will not necessarily be boxed under the L-type). The method handles provide more precise control over boxing, and will be correct before the core operations.

Classes for Q-types may appear in reflective APIs wherever primitive pseudo-types (like int.class) can appear. These APIs include both core reflection (Class and the types in java.lang.reflect) and also the newer APIs in java.lang.invoke, such as MethodType and MethodHandles.Lookup. Constant pool constants that work with these types can refer to Q-types as well as L-types, and the distinctions are surfaced, reflectively, as suitable choices of Class objects (either box or value).

It is undefined (in this proposal) how or whether legacy wrapper types (java.lang.Integer) or primitive pseudo-types (int.class) interact with the methods of ValueType.

(When pseudo-classes need to be distinguished from normal java.lang.Class objects, we can use the shorthand term “crass”, where the “r” sound suggests that the thing exists only to reify a distinction necessary at runtime. The main class is the thing returned by Class.forName, and which represents a class file in 1-1 correspondence; a “crass” is anything else typed as java.lang.Class. A more principled approach to reflection [cimadamore-refman] uses “type mirrors” of a suitably refined interface type hierarchy.)

You can use the method handle APIs to create and manipulate arrays, load and store fields, invoke methods, and obtain method handles.

Method handle transforms which change types (such as asType) will support value-type boxing and unboxing just as they can express primitive boxing and unboxing. Thus, the following code creates a method handle which will box a DoubleComplex value into an object:

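import static java.lang.invoke.MethodHandles.identity;
import static java.lang.invoke.MethodType.methodType;
Class<DoubleComplex> srcType = DoubleComplex.class;
Class<DoubleComplex> qt = ValueType.forClass(srcType).valueClass();
MethodHandle mh = identity(qt).asType(methodType(Object.class, qt)); // (Q)Object, boxes on return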
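As an analogy that runs on a standard JDK, the same identity-then-asType idiom boxes a primitive. In this sketch int stands in for the Q-type and Integer for its box; BoxingHandleDemo and boxer are hypothetical names, and nothing here touches the experimental ValueType class.

```java
import java.lang.invoke.MethodHandle;
import static java.lang.invoke.MethodHandles.identity;
import static java.lang.invoke.MethodType.methodType;

public class BoxingHandleDemo {
    // Build (prim)Object: identity on the primitive, then asType inserts the boxing step.
    static MethodHandle boxer(Class<?> prim) {
        return identity(prim).asType(methodType(Object.class, prim));
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle box = boxer(int.class);
        System.out.println(box.type());               // (int)Object
        Object o = (Object) box.invokeExact(42);      // exact call: int in, Object out
        System.out.println(o.getClass().getName());   // java.lang.Integer
        // The reverse conversion unboxes (and would throw on a null argument).
        MethodHandle unbox = identity(Object.class).asType(methodType(int.class, Object.class));
        System.out.println((int) unbox.invokeExact((Object) 7)); // 7
    }
}
```

A value-type version would differ only in the Class objects passed in: the Q-type mirror from valueClass instead of int.class.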

Of course, the type-converting method MethodHandle.invoke will allow users to work with method handles over Q-types, either in terms of box types as supported by the current Java language, or (in suitable bytecodes) more directly in terms of Q-types.

Boxed values

Boxing is useful in order to gain interoperability between Q-types and APIs which use Object references to perform generic services across all types. Many tools (such as debuggers, loggers, and println methods) assume a standard Object format for reporting arbitrary data. The value-capable L-type of a Q-type (or, more generally, whatever boxing mechanism ends up as the container for Q-types) serves a useful role in the initial system, and (it seems probable) even in the final system.

As noted before, instances of a value-capable class (which is an L-type) serve, at first, as boxes for values of the corresponding Q-type. The method handle APIs allow box/unbox conversion operators to be surfaced as method handles or applied implicitly for argument conversions.

The value-capable L-type also allows convenient specification of some kinds of Q-type behaviors, such as toString, by writing them directly as standard Java methods on the L-type. The method handle lookup runtime will make up the difference between an unboxed receiver (this) and boxed receiver, at negligible cost (since the HotSpot JVM has a sufficient range of scalarization optimizations to remove the extra boxing step).

However, we anticipate that the tightest loops will be constructed to have unboxed data flowing along all hot paths. This means that boxing is most useful, at this early point, at class-load time and for peripheral operations like println.

Since the value-capable class is value-based, it is inappropriate to synchronize on boxes, make distinctions on them by means of reference equality comparisons, attempt to mutate their fields, or attempt to treat a null reference as a point in the domain of the boxed type.

(These restrictions are likely to carry forward even if boxes have a different form in the future.)

A future JVM may assist in detecting (or even suppressing) some of these errors, and it may provide additional optimizations in the presence of such boxes (which do not require a full escape analysis).

However, such assistance or optimization appears to be unnecessary in this minimal version of the design. Code which works with Q-types will, by its very nature, be immune to such bugs, since Q-types are non-synchronizable, non-mutable, non-nullable, and identity-agnostic.

Value operator factories

Given the ability to invoke method handles that work with Q-types, all other semantic features of value types can (temporarily) be accessed solely through method handles. These include:

The MethodHandles.Lookup and MethodHandles APIs will work on Q-types (represented as Class objects), and surface methods which can perform nearly all of these functions.

It is sometimes helpful to think of these operators as the missing bytecodes. They can be used with the invokedynamic instruction to emulate bytecodes that experimental translators may need. Eventually, some form of many such bytecodes will “selectively sediment” down into the JVM’s repertoire of bytecodes, and usage of invokedynamic for these purposes will decrease.

Pre-existing method handle API points will be adjusted as follows:

(Yes, a value type method is obtained with findVirtual, despite the fact that virtuality is not present on a final class. The poorer alternatives are to co-opt findSpecial, or make a new API point findDirect to carry the nice, fine distinction. Since Java is already comfortable with the notion of “final virtual” methods, we will continue with what we have.)

Method handle lookup on L-types is likely to give parallel results, although (depending on user model experiments) some methods on L-types may be suppressed, on the grounds that box types should be relatively “featureless” compared to their value types. Of course, a virtual method of an L-type, when embodied as a method handle, will take a receiver parameter which is an L-type, not a Q-type.

On the other hand, as noted above, we may well build ad hoc conventions for binding private static L-type methods, via method handles, as if they were constructors or non-static methods on the Q-type. This would be a side contract between a translation strategy and the method handle runtime, of which the JVM would be unaware. The static methods, visible only to the method handle runtime, would contain snippets of logic intended only for use by the Q-type. Eventually of course we will move all that sort of thing into bytecodes and the JVM’s hardwired resolution logic.

As value-based classes, value-capable classes are required to override all relevant methods from Object. The derived Q-types do not inherit or respond to the standard methods of Object; they only respond to the methods of Object (such as toString) for which they implement matching signatures.

The following additional functions do not (as yet) fit in the MethodHandle API, and so are placed in the runtime support class jdk.experimental.value.ValueType.

ValueType will contain the following methods:

public class ValueType<T> {
  ...
  MethodHandle defaultValueConstant();
  MethodHandle substitutabilityTest();
  MethodHandle substitutabilityHashCode();
  MethodHandle findWither(Lookup lookup, Class<?> refc,
                          String name, Class<?> type);
}

The defaultValueConstant method returns a method handle which takes no arguments and returns the default value of that Q-type. It is equivalent to (but probably more efficient than) creating a one-element array of that value type and loading its sole element. This method may be useful in implementing MethodHandles.empty and similar combinators.
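The array-based equivalence can be written with standard method handle combinators. This is a sketch assuming Java 9 or later (for MethodHandles.arrayConstructor and MethodHandles.zero), and it uses ordinary primitive and reference types rather than Q-types; DefaultValueDemo and defaultViaArray are hypothetical names.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Array;

public class DefaultValueDemo {
    // Returns a ()T handle that allocates a one-element T[] and loads element 0.
    static MethodHandle defaultViaArray(Class<?> elementType) {
        Class<?> arrayType = Array.newInstance(elementType, 0).getClass();
        // (int)T[] with the length bound to 1, giving ()T[]
        MethodHandle newArray = MethodHandles.insertArguments(
                MethodHandles.arrayConstructor(arrayType), 0, 1);
        // (T[], int)T with the index bound to 0, giving (T[])T
        MethodHandle loadFirst = MethodHandles.insertArguments(
                MethodHandles.arrayElementGetter(arrayType), 1, 0);
        // ()T: allocate, then load the pre-filled default element
        return MethodHandles.filterReturnValue(newArray, loadFirst);
    }

    public static void main(String[] args) throws Throwable {
        for (Class<?> t : new Class<?>[] { int.class, double.class, boolean.class, String.class }) {
            Object viaArray = defaultViaArray(t).invoke();
            Object viaZero = MethodHandles.zero(t).invoke(); // the direct constant, for comparison
            System.out.println(t.getName() + " -> " + viaArray + " (zero: " + viaZero + ")");
        }
    }
}
```

The direct constant handle is the shape defaultValueConstant is described as providing; the array round trip is only the semantic reference point.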

The substitutabilityTest method returns a method handle which compares two operands of the given Q-type for substitutability. Specifically, fields are compared pairwise for substitutability, and the result is the logical conjunction of all the comparisons. Primitives and references are substitutable if and only if they compare equal using the appropriate version of the Java == operator, except that floats and doubles are first converted to their “raw bits” before comparison.
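A plain-Java sketch of the comparison rule, applied to a hypothetical DoubleComplex stand-in (an ordinary final class, not a Q-type). It shows why raw bits matter: == treats NaN as unequal to itself and treats 0.0 and -0.0 as equal, while the raw-bits comparison does the opposite in both cases.

```java
public class SubstitutabilityDemo {
    // Hypothetical stand-in for a value-capable class: final class, final fields.
    static final class DoubleComplex {
        final double re, im;
        DoubleComplex(double re, double im) { this.re = re; this.im = im; }
    }

    // Pairwise field comparison, combined with logical AND;
    // doubles are compared by their raw bits rather than by ==.
    static boolean substitutable(DoubleComplex a, DoubleComplex b) {
        return Double.doubleToRawLongBits(a.re) == Double.doubleToRawLongBits(b.re)
            && Double.doubleToRawLongBits(a.im) == Double.doubleToRawLongBits(b.im);
    }

    public static void main(String[] args) {
        DoubleComplex nan1 = new DoubleComplex(Double.NaN, 1.0);
        DoubleComplex nan2 = new DoubleComplex(Double.NaN, 1.0);
        System.out.println(nan1.re == nan2.re);         // false: NaN != NaN under ==
        System.out.println(substitutable(nan1, nan2));  // true: identical bits
        DoubleComplex pz = new DoubleComplex(0.0, 1.0);
        DoubleComplex nz = new DoubleComplex(-0.0, 1.0);
        System.out.println(pz.re == nz.re);             // true: 0.0 == -0.0
        System.out.println(substitutable(pz, nz));      // false: the sign bits differ
    }
}
```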

Likewise, the substitutabilityHashCode method returns a method handle which accepts a single operand of the given Q-type, and produces a hash code which is guaranteed to be equal for two values of that type if they are substitutable for each other, and is likely to be different otherwise.

(It is an open question whether to expand the size of this hash code to 64 bits. It will probably be defined, for starters, as a 32-bit composition of the hash codes of the value type fields, using legacy hash code values. The composition of sub-codes will probably use, at first, a base-31 polynomial, even though that composition technique is deeply suboptimal.)
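A sketch of the provisional hash described above, for the same kind of hypothetical two-field DoubleComplex stand-in. It assumes a seed of 1 (as java.util.Arrays.hashCode uses), which the text does not specify. Double.hashCode is derived from Double.doubleToLongBits; since values with identical raw bits also have identical doubleToLongBits, substitutable values always hash equally.

```java
import java.util.Arrays;

public class SubstitutabilityHashDemo {
    // Hypothetical stand-in for a value-capable class.
    static final class DoubleComplex {
        final double re, im;
        DoubleComplex(double re, double im) { this.re = re; this.im = im; }
    }

    // Base-31 polynomial over the legacy hash codes of the fields (seed 1 is an assumption).
    static int substitutabilityHashCode(DoubleComplex c) {
        int h = 1;
        h = 31 * h + Double.hashCode(c.re);
        h = 31 * h + Double.hashCode(c.im);
        return h;
    }

    public static void main(String[] args) {
        DoubleComplex a = new DoubleComplex(1.5, -2.0);
        DoubleComplex b = new DoubleComplex(1.5, -2.0);
        System.out.println(substitutabilityHashCode(a) == substitutabilityHashCode(b)); // true
        // For two double fields this coincides with Arrays.hashCode over the same values.
        System.out.println(substitutabilityHashCode(a) == Arrays.hashCode(new double[] { 1.5, -2.0 })); // true
    }
}
```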

The findWither method works analogously to Lookup.findSetter, except that the resulting method handle always creates a new value, a full copy of the old value, except that the specified field is changed to contain the new value. Since values have no identity, this is the only logically possible way to update a field value.
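The wither behavior can be emulated over an ordinary immutable class with existing Lookup methods, which may help picture the shape of the handle that findWither returns. This is only an illustration under stated assumptions: DoubleComplex is a hypothetical final class with package-private fields, and the handle is assembled from its constructor and getter rather than from any vwithfield support.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class WitherDemo {
    // Hypothetical immutable stand-in for a value type.
    static final class DoubleComplex {
        final double re, im;
        DoubleComplex(double re, double im) { this.re = re; this.im = im; }
    }

    // Emulated wither for field "re": (DoubleComplex old, double newRe)DoubleComplex.
    // It builds a fresh copy with only "re" replaced; the old object is never modified.
    static MethodHandle withReHandle() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle ctor = lookup.findConstructor(DoubleComplex.class,
                MethodType.methodType(void.class, double.class, double.class)); // (double,double)DoubleComplex
        MethodHandle getIm = lookup.findGetter(DoubleComplex.class, "im", double.class); // (DoubleComplex)double
        // (double newRe, DoubleComplex old)DoubleComplex: old.im feeds the constructor's second slot
        MethodHandle rebuild = MethodHandles.filterArguments(ctor, 1, getIm);
        // Reorder into the findSetter-like shape (DoubleComplex old, double newRe)DoubleComplex
        return MethodHandles.permuteArguments(rebuild,
                MethodType.methodType(DoubleComplex.class, DoubleComplex.class, double.class), 1, 0);
    }

    public static void main(String[] args) throws Throwable {
        DoubleComplex c = new DoubleComplex(3.0, 4.0);
        DoubleComplex d = (DoubleComplex) withReHandle().invokeExact(c, 0.0);
        System.out.println(d.re + " " + d.im); // 0.0 4.0
        System.out.println(c.re + " " + c.im); // 3.0 4.0 (unchanged)
    }
}
```

Compared with findSetter, which would have type (DoubleComplex, double)void, the wither returns the new value instead of updating the receiver.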

In order to restrict the use of wither primitives, the refc parameter and the lookup-class will be checked against the Q-type of the ValueType itself; if they are not all the same Q-type, the access will fail. The access restriction may be broadened later. A value-type may of course define named wither methods that encapsulate primitive wither actions. Eventually, a withfield bytecode might be created to express field update directly, in which case the same issues of access restriction must be addressed.

(The name wither method does not mean a way to blight or shrivel something–certainly a shady activity. It refers to a naming convention for methods that perform functional update of record values. Asking a complex number c.withRe(0) would return a new pure-imaginary complex number. By contrast, c.setRe(0), a call to a setter method, would seem to mutate the complex number, removing any non-zero real component. Setter methods are appropriate to mutable objects, while wither methods are appropriate to values. Note that a method can in fact be a getter, setter, or wither method even if it does not begin with one of those standard words. The eventual conventions for value types may well discourage forms like withRe(0) in favor of simply re(0).)

It is likely that these methods in ValueType will eventually become virtual methods of Lookup itself (if that is the leading argument), else static methods of MethodHandles.

These methods are also candidates for direct expression as bytecodes, just as many existing method handles directly express [bcbehavior] equivalent bytecode operations.

Here is a table that summarizes the new method handles and the hypothetical bytecode behaviors for their operations upon value types. The third column gives both the stack effect of a hypothetical bytecode, and the type of the actual method handle. In this table, almost any type can be a Q-type, a fact we emphasize with “Q” prefixes. The “QC” type, in particular, stands for the value type being operated upon. The composite VT<QC> stands for the ValueType instance derived from the Q-type.

The type “RC”, mentioned at the bottom of the list for field accessors of Q-type fields in regular objects, is any normal L-type. Note that Q-types (“works like an int!”) can be read and written whole from fields and array elements.

Q-type method handles & behaviors

lookup expression                  possible bytecode       stack effect / MH type
VT<QC>.defaultValueConstant()      “vdefault” QC           ()QC
VT<QC>.substitutabilityTest()      ?                       (QC QC)boolean
VT<QC>.substitutabilityHashCode()  ?                       (QC)int/long
L.findGetter(QC, f, QT)            “vgetfield” QC.f:QT     (QC)QT
VT<QC>.findWither(L, QC, f, QT)    “vwithfield” QC.f:QT    (QC QT)QC
L.findVirtual(QC, m, (QA*)QT)      “vinvoke” QC.m(QA*)QT   (QC QA*)QT
L.findGetter(RC, f, QT)            “getfield” RC.f:QT      (RC)QT
L.findSetter(RC, f, QT)            “putfield” RC.f:QT      (RC QT)void

This table does not cover many method handles which merely copy around Q-typed values, or load or store them from normal objects or arrays. Such operations can appear in many places, including findStaticGetter, findStaticSetter, findVirtual, findStatic, arrayElementSetter, identity, constant, and so on.

Future work

This minimal proposal is by nature temporary and provisional. It gives a necessary foundation for further work, rather than a final specification. Some of the further work will be similarly provisional in nature, but over time we will build on our successes and learn from our mistakes, eventually creating a well-designed specification that can take its place in the sun.

The present set of features that support value types will be difficult to work with; this is intentional. The rest of this document sketches a few additional features which may enable experiments not practical or possible in the minimized proposal.

Therefore, this last section may be safely skipped. Any such features will be given their own supporting documentation if they are pursued. It may be of interest, however, to people who have noticed missing features in the minimal values proposal.

Denoting Q-types in Java source code

At a minimum, no language changes are needed to work with Q-types. A combination of JVM hacks (value-capable classes), annotation-driven classfile transformations, and direct bytecode generation is enough to exercise interesting micro-benchmarks. Method handles supply a useful alternative to direct bytecode generation, and they will be made fully capable of working with Q-types (as described above).

Nevertheless, there is nothing like language support. It is likely that very early experiments with javac will create simple ways to refer to Q-types and create variables for them, directly in Java code (subject to contextual restrictions, of course).

In particular, constructors for objects have a very different bytecode shape than seemingly-equivalent constructors for value types. (The syntax for Java object constructors is a perfectly fine notation for value type constructors, as long as all fields are final.) It would be reasonable for javac to take on the burden of byte-compiling both versions of each constructor of a value-capable class.

Likewise, direct invocation of value type constructors, and direct access of value type methods and fields, would be convenient to use from Java source code, even if they had to be compiled to invokedynamic calls, until bytecode support was completed.

More constants

Additional enhancements to the constant pool may allow creation of constants derived from bootstrap methods. Such features are not in the scope of the present document. They are described in the OpenJDK RFE JDK-8161256. This RFE mentions the present enhancement of CONSTANT_Class.

If this RFE is implemented, it may be possible to delay a few of the steps described in this section, such as using Q-types as receiver types for CONSTANT_MethodHandles. The key requirement, in any case, is that invokedynamic instructions be able to refer to a full range of operations on Q-types, since the invokedynamic instructions are standing in as temporary placeholders for bytecodes we are not yet implementing.

Independently of user-bootstrapped constants, Q-types in the constant pool might be carried, most gracefully, by variations on the CONSTANT_Class constant. Right now, we choose to mangle type descriptors in CONSTANT_Class constants as an easy-to-implement place-holder, but the final design could introduce new constant pool types to carry the required distinctions.

For example, CONSTANT_Class could be kept as-is, and re-labeled CONSTANT_ReferenceType. Then, a new CONSTANT_Type constant could support arbitrary descriptors. (Perhaps it would have other substructure required by reified generic parameters, but that’s probably yet another kind of constant.) Or, a CONSTANT_ValueType tag could be introduced for symmetry with CONSTANT_ReferenceType, and some other way could be found for mentioning primitive pseudo-classes. (They are useful as parameters to BSMs.)

Q-replacement within value-capable classes

A value-capable class, compiled from Java source, may have additional annotations (or perhaps attributes) on selected fields and methods which cause the introduction of Q-types, as a bytecode-level transformation when the value-capable class’s file is loaded or compiled.

Two transformations which seem useful may be called Q-replacement and Q-overloading. The first deletes L-types and replaces them by Q-types, while the second simply copies methods, replacing some or all of the L-types in their descriptors by corresponding Q-types. This set of ideas is tracked as JDK-8164889.

An alternative to annotation-driven Q-replacement would be an experimental language feature allowing Q-types to be mentioned directly in Java source. Such experiments are likely to happen as part of Project Valhalla, and may happen early enough to make transformation unnecessary.

More bytecodes

The library method handle defaultValueConstant could be replaced by a new vdefault bytecode, or by a prefixed aconst_null bytecode.

The library method handle substitutabilityTest could be replaced by a new vcmp bytecode, or by a prefixed if_acmpeq bytecode.

The library method handle findWither could be replaced by a new vwithfield bytecode.

The library method handle findGetter could be replaced by a suitably enhanced getfield bytecode.

The library method handle arrayConstructor could be replaced by a suitably enhanced anewarray or multianewarray bytecode.

The library method handle arrayElementGetter could be replaced by a new vaload bytecode, or a prefixed aaload bytecode.

The library method handle arrayElementSetter could be replaced by a new vastore bytecode, or a prefixed aastore bytecode.

The library method handle arrayLength could be replaced by a suitably enhanced arraylength bytecode.

It is possible that the eventual “sweet spot” for this design is based on a single set of universal bytecodes (or prefixed macro-bytecodes) that work symmetrically across references, values, and primitives, by allowing them to mention any type descriptor, not just Q-types. Such “universal bytecodes” deserve their own naming convention, as uload, ustore, uinvoke, ucmp, u2u, etc. When a bytecode works only on values, we use the v* naming convention.

More data structures

The minimal proposal may omit various corner cases implied by a free cross-application of all of the features of value types. Filling in these corner cases will be useful. In addition, limitations on value types (and on primitives!) should be removed over time. The functional increases may include, at various stages:

If value types are fully integrated with interfaces, a Q-type must inherit default methods from its interface supertypes. This is a key form of interoperability between values and generic algorithms and data structures (like sorting and TreeMap). Making this work in the minimal version requires boxing the value and running the default method on the box, which may have unpleasant performance implications. In a full implementation, the execution of default methods should be optimized to each specific value type. Also, there should be a framework for ensuring that the interface methods themselves are appropriate to value-based types (no nulls or synchronization, limited ==, etc.). This requires further work beyond the scope of the minimal proposal.

It seems difficult to make a value-bearing array type QT[] be a subtype of an interface-array type I[], even if the interface I is a supertype of the value type QT. Further work on JVM type structure would be needed to make this happen. Interface types I are firmly in the L-type camp, at present, and interface arrays are arrays of references, hence subtypes of Object[], and therefore would appear to be arrays of references. But an array of value types QT cannot be (transparently) treated as an array of references.
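Primitive arrays already behave this way, which gives a runnable analogy under the “works like an int” slogan: an int[] is not an Object[], while an Integer[] (an array of boxes) is a subtype of both Object[] and Comparable[]. The class and method names below are hypothetical, and no Q-types are involved.

```java
public class ArrayCovarianceDemo {
    // True when the array holds references and can therefore be viewed as Object[].
    static boolean isReferenceArray(Object array) {
        return array instanceof Object[];
    }

    public static void main(String[] args) {
        System.out.println(isReferenceArray(new Integer[1]));  // true: an array of boxes
        System.out.println(isReferenceArray(new int[1]));      // false: a flat array of values
        // Array covariance follows the element type's supertypes, but only for references.
        System.out.println(Comparable[].class.isAssignableFrom(Integer[].class)); // true
        System.out.println(Object[].class.isAssignableFrom(int[].class));         // false
    }
}
```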

Bridge-o-matic

In some cases, supplying Q-replaced API points in classes is just a matter of providing suitable bridge methods. Bytecode transformers or generators can avoid the need to specify the bodies of such bridge methods if the bridges are (instead of bytecodes) endowed with suitably organized bootstrap methods. This set of ideas has many additional uses, including auto-generation of standard equals, hashCode, and toString methods. It is tracked as JDK-8164891.

Natural (JVM-native) value-type classfiles

The minimal proposal starts with a backwards, box-first loading order, loading POJO types and then deriving value types and box types from those. The box types (initially) are simply the POJO types and the value types are structs internally derived by extracting the fields (only) from the POJO type. Even methods intended only for the value type must be delivered, at first, as POJO methods.

This must change over time to a value-first load format, with value types directly specified by modified class files, and boxes derived during the load of the value type.

As the POJO-based load format becomes obsolete, the boxes themselves may continue to be POJO-shaped versions of the value types, or they may take a different form.

This change will be relatively expensive, because it will require that all users of value types upgrade their tooling (compiler, debugger, IDE, etc.) to support non-standard features exposed by the value-first design. As noted above, some of the awkwardness of the minimal proposal is caused by its care to avoid features that would require (or would only be useful with) changes to the tool-chain. These features include:

Heisenboxes

As suggested above, L-types for values are value-based, and some version of the JVM may attempt to enforce this in various ways, such as the following:

A box whose identity status is uncertain from observation to observation is called a “heisenbox”. To pursue the analogy, a reference equality (==, acmp) observation of true for two heisenboxes “collapses” them into the same object, since they are then proven fully inter-substitutable, hence their Q-values are equivalent also. Two copies of the reference can later decohere, reporting inequality, despite the continued inter-substitutability of the boxed values. The equality predicate could be investigated by wiring it to a box containing Schrödinger’s cat, with many puzzling and sad results…

This set of ideas is tracked as JDK-8163133.

References

[values]: http://cr.openjdk.java.net/~jrose/values/values.html
[valhalla-dev]: http://mail.openjdk.java.net/pipermail/valhalla-dev/
[goetz-jvmls15]: http://www.oracle.com/technetwork/java/jvmls2015-goetz-2637900.pdf
[valsem-0411]: http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2016-April/000118.html
[simms-vbcs]: http://mail.openjdk.java.net/pipermail/valhalla-dev/2016-June/001981.html
[graves-jvmls16]: http://youtu.be/Z2XgO1H6xPM?list=PLX8CzqL3ArzUY6rQAQTwI_jKvqJxrRrP_
[value-based]: http://docs.oracle.com/javase/8/docs/api/java/lang/doc-files/ValueBased.html
[JEP-11]: http://openjdk.java.net/jeps/11
[goetz-jvmls16]: http://www.youtube.com/watch?v=Tc9vs_HFHVo
[Long2.java]: http://hg.openjdk.java.net/panama/panama/jdk/file/70b3ceb485cf/src/java.base/share/classes/java/lang/Long2.java
[cimadamore-refman]: http://cr.openjdk.java.net/~mcimadamore/reflection-manifesto.html
[bcbehavior]: http://docs.oracle.com/javase/8/docs/api/java/lang/invoke/MethodHandles.Lookup.html#equiv
[JDK-8164891]: http://bugs.openjdk.java.net/browse/JDK-8164891
[JDK-8161256]: http://bugs.openjdk.java.net/browse/JDK-8161256
[JDK-8164889]: http://bugs.openjdk.java.net/browse/JDK-8164889
[JDK-8163133]: http://bugs.openjdk.java.net/browse/JDK-8163133