Value type companions, encapsulated

John Rose for Valhalla EG, July 2022 (ver 0.2)

Background

(We will start with background information. The new stuff comes afterward. Impatient readers can find a very quick summary of restrictions at the end.)

Affordances of C.ref

Every class or interface C comes with a companion type, the reference type C.ref derived from C which describes any expression (variable, return value, array element, etc.) whose values are either null or are instances of a concrete class derived from C.

We are not in the habit of distinguishing C.ref from C, but the distinction is there. For example, if we call Object::getClass on a variable of type C.ref we might not get C.class; we might even get a null pointer exception! Put another way, C as a class means a particular class declaration, while C.ref as a type means a variable which can refer to instances of class C or any subclass. Also C.ref can be null, which is of no class at all. One can view the result of Object::getClass as a type rather than a mere class, since the API of Class includes representation of types like int and C.val as well as classes. In any case, the fact that a class can now have two associated types requires a clearer distinction between classes and types.
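The class-versus-type distinction can already be observed in today's Java. The following is a minimal sketch, with Number standing in for C and the hypothetical helper names dynamicClassOf and throwsOnNull invented for illustration:

```java
final class RefTypeDemo {
    /** Returns the dynamic class of a value whose static type is the ref-type Number. */
    static Class<?> dynamicClassOf(Number n) {
        return n.getClass();  // throws NullPointerException when n is null
    }

    /** Reports whether Object::getClass fails on the given reference. */
    static boolean throwsOnNull(Number n) {
        try {
            n.getClass();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Number n = Integer.valueOf(42);
        // The static type is Number, but the class of the instance is a subclass.
        System.out.println(dynamicClassOf(n) == Integer.class);
        // A ref-type variable may also hold null, which has no class at all.
        System.out.println(throwsOnNull(null));
    }
}
```

The variable's type says only what it may refer to; the class is a property of whatever instance (if any) it currently refers to.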

We are so very used to working with reference types (for short, ref-types) that we sometimes forget all that they do for us in addition to their linkage to specific classes:

When I store a bunch of C objects into an object array or list, sort it, and then share it with another thread, I am using several of the above properties; if the other thread down-casts the items to C.ref and works on them it relies on those properties.

If I implement C as a doubly-linked list data structure or (alternatively) a value-based class with tree structure, I am using yet more of the above properties of references.

If my C object has a lot of state and I pass out many pointers to it, and perhaps compute and cache interesting values in its mutable fields, I am again relying on the special properties of references, as well as of identity classes (if fields are mutable).

By the way, in the JVM, variables of type C.ref (some of them at least) are associated not with C simply, but with the so-called L-descriptor spelled LC;. When we talk about C.ref we are usually talking about those L-descriptors in the JVM, as well.

I don’t need to think much about this portfolio of properties as I go about my work. But if they were to somehow fail, I would notice bugs in my code sooner or later.

One of the big consequences of this overall design is that I can write a class C which has full control over its instance states. If it is mutable, I can make its fields private and ensure that mutations occur only under appropriate locking conditions. Or if I declare it as a value-based class, I can ensure that its constructor only allows legitimate instances to be constructed. Under those conditions, I know that every single instance of my class will have been examined and accepted by the class constructor, and/or whatever factory and mutator methods I have created for it. If I did my job right, not even a race condition can create an invalid state in one of my objects.
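As a rough illustration of this kind of control, here is a sketch of a value-based class in today's Java. The class Percentage and its invariant are hypothetical, chosen only to show every instance passing through a checking factory:

```java
final class Percentage {
    private final int value;  // invariant: 0 <= value <= 100

    // Private constructor: clients cannot bypass the factory.
    private Percentage(int value) {
        this.value = value;
    }

    /** The only public way to obtain an instance; rejects illegitimate states. */
    static Percentage of(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        return new Percentage(value);
    }

    int value() {
        return value;
    }

    /** Derived instances are re-validated by going back through the factory. */
    Percentage plus(int delta) {
        return of(value + delta);
    }
}
```

Because the field is private and final, and the only constructor is private, every Percentage a client can hold has been examined and accepted by of.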

Any instance state of C which has been reached without being produced from a constructor, factory, mutator, or constant of C can be called non-constructed. Of course, inside a class any state whatever can be constructed, subject to the types of fields and so on. But the author of the class gets to decide which states are legitimate, and the decisions are enforced by access control at the boundaries of the encapsulation.

The author of an encapsulation determines whether the constant C.default is part of the public API or not. Therefore, the value of C.default is non-constructed only if C.val is privatized.

So if I code my class right, using access control to keep bad states away from my clients, my class’s external API will have no non-constructed states.

Reflection and serialization provide additional modes of access to a class’s API. The author of an encapsulation must be given control over these modes of access as well. (This is discussed further below.) If the author of C allows deserialization of C values not otherwise constructible via the public API, those values must be regarded as constructed, not non-constructed, but the API may also be regarded as poorly designed.

Costs of C.ref

In that case why have value types at all, if references are so powerful? The answer is that reference-based abstraction pays for its benefits with particular costs, costs that Java programmers do not always wish to pay:

The major alternative to references, as provided by Valhalla, is flat class instances, where instance fields are laid out immediately in their containers, in place of a pointer which points to them stored elsewhere. Neither alternative is always better than the other, which is why Java has both int and Integer types and their arrays, and why Valhalla will offer a corresponding choice for value classes.
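The existing int and Integer pair already shows both sides of this trade. A small sketch (the class and method names are made up for illustration):

```java
final class FlatVersusRef {
    /** A fresh flat element holds the all-zero value. */
    static int flatDefault() {
        int[] flat = new int[1];
        return flat[0];
    }

    /** A fresh reference element holds null, not an Integer. */
    static Integer refDefault() {
        Integer[] refs = new Integer[1];
        return refs[0];
    }

    /** Many references can share one heap copy of the same state. */
    static boolean sharesOneBox() {
        Integer boxed = Integer.valueOf(1_000_000);
        Integer[] refs = { boxed, boxed, boxed };
        return refs[0] == refs[2];
    }
}
```

The flat array stores its payload in place and starts out as zeroes; the reference array stores pointers, starts out as nulls, and can point many elements at a single box.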

Alternative affordances of C.val

Now, instances of a value class can be laid out flat in their containing variables. But they can also be “boxed” in the heap, for classic reference-based access. Therefore, a value class C has not one but two companion types associated with it, not only the reference companion C.ref but also the value companion C.val. Only value classes have value companions, naturally. The companion C.val is called a value type (or val-type for short), by contrast with any reference type, whether Object.ref or C.ref.

The two companion types are closely related and perform some of the same jobs:

For these jobs, it usually doesn’t matter which type companion does the work.

Specifically,

Despite the similarities, many properties of a value companion type are subtly different from any reference type:

The overall effect is that a C.val variable has a very specific concrete format, a flattened set of application-defined fields, often without added overhead from object headers and pointer chasing.

The JVM distinguishes C.val by giving it a different descriptor, a so-called Q-descriptor of the form QC;, and it also provides a so-called secondary mirror C.val.class which is similar to the built-in primitive mirrors like int.class.

As the Valhalla performance model notes, flattening may be expected but is not fully guaranteed. A C.val stored in an Object container is likely to be boxed on the heap, for example. But C.val instances created as bytecode temporaries, arguments, and return values are likely to be flattened into machine registers, and C.val fields and array elements (at least below certain size thresholds) are also likely to be flattened into heap words.

As a special feature, C.ref is potentially flattenable if C is a value class. There are additional terms and conditions for flattening C.ref, however. If C is not yet loaded, nothing can be done: Remember that reference types have full abstraction as one of their powers, and this means building data structures that can refer to them even before they are loaded. But a class file can request that the JVM “peek” at a class to see if it is a value class.

This request is conveyed via the Preload attribute defined in recent drafts of JEP 8277163 (Value Objects). If this request is acted on early enough (at the JVM’s discretion), then the JVM can choose to lay out some or all C.ref values as flattened C.val values plus a boolean or other sentinel value which indicates the null state.

If the JVM succeeds in flattening a C.ref variable, the JMM still requires that racing reads to such a variable will always return a consistent, safely published state. The atomicity or non-atomicity of the C.val companion type has no effect on the races possible to a C.ref variable. Thus, flattening a C.ref variable with a non-atomic value type is not simply a matter of adding a null channel field to a struct, if races are possible on that variable. Most machines today provide hardware atomicity only to 128 bits, so racing updates must probably be accomplished within the limits of 64- or 128-bit reads and writes, for a flattened C.ref. It seems likely that the heap buffering enjoyed by today’s value-based classes will also be the technique of choice in the future, at least for larger value classes, when their containers are in the heap. Since JVM stack and locals can never race, adjoining a null state for a C.ref value can be a simple matter of allocating another calling sequence register or stack slot, for an argument or return value.

Pitfalls of C.val

The advantages of value companion types imply some complementary disadvantages. Hopefully they are rarely significant, but they must sometimes be confronted.

The footprint issue shows up most strongly if you have many copies of the same C.val value; each copy will duplicate all the fields, as opposed to many copies of the same C.ref reference, which are likely to all point to a single heap location with one copy of all the fields.

Flat value size can also affect methods like Arrays.sort, which perform many assignments of the base type, and must move all fields on each assignment. If a C.val array has many words per element, then the costs of moving those words around may dominate a sort request. For array sorting there are ways to reduce such costs transparently, but it is still a “law of physics” that editing a whole data structure will have costs proportional to the size of the edited portions of the data structure, and C.ref arrays will often be somewhat more compact than C.val arrays. Programmers and library authors will have to use their heads when deciding between the new alternatives given by value classes.

But the last two pitfalls are hardest to deal with, because they both have to do with non-constructed states. These states are the all-zero state with the second-to-last pitfall, and (with the last pitfall) the state obtained by mixing two previous states by means of a pair of racing writes to the same mutable C.val variable in the heap. Unlike reference types, value types can be manipulated to create these non-constructed states even in well-designed classes.
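Both non-constructed states can be imitated in today's Java. The sketch below is only a model: it assumes a hypothetical two-field value with the invariant lo < hi, stores its fields in two separate slots to mimic a flattened non-atomic container, and replays one unlucky interleaving by hand instead of using real threads, so the outcome is deterministic.

```java
final class TearingSketch {
    // Flattened storage: the two fields sit in separate slots, initially all-zero.
    private final long[] slots = new long[2];

    // A whole-value store is really two independent field stores.
    void storeLo(long lo) { slots[0] = lo; }
    void storeHi(long hi) { slots[1] = hi; }

    long lo() { return slots[0]; }
    long hi() { return slots[1]; }

    /** The invariant a constructor of the hypothetical class would enforce. */
    static boolean isValid(long lo, long hi) {
        return lo < hi;
    }

    /** Writer A stores (1, 2) while writer B stores (10, 20); their halves interleave. */
    static TearingSketch replayUnluckyInterleaving() {
        TearingSketch cell = new TearingSketch();
        cell.storeLo(1);   // writer A, first half
        cell.storeLo(10);  // writer B, first half
        cell.storeHi(20);  // writer B, second half
        cell.storeHi(2);   // writer A, second half
        return cell;       // holds (10, 2), which neither writer stored
    }
}
```

A fresh cell holds (0, 0) and the replayed cell holds (10, 2); neither satisfies lo < hi, although both writers stored only valid values.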

Now, it may be that a public constructor (or factory) might be perfectly able to create a zero state or an arbitrary field combination, no strings attached. In that case, the class author is enforcing few or no invariants on the states of the value class. Many numeric classes, like complex numbers, are like this: Initialization to all-zeroes is no problem, and races between components are acceptable, compared to the costs of excluding races. The worst a race condition can ever do is create a state that is legitimately constructed via the class API. We can say that a class which is this permissive has no non-constructed states at all.

Such a class will sometimes choose to permit races in order to get faster loads and stores from fields and arrays. A similar choice is made by today’s C ABIs: When they define 128-bit complex numbers, they do not mandate 128-bit atomic loads and stores for them, even if the platform supports such stores. This allows compilers a more flexible choice between a pair of 64-bit memory operations (both of which are probably atomic but not mutually coherent) or a single 128-bit memory operation (which may or may not be atomic). The JVM is likely to prefer pairs of 64-bit operations for such accesses, if the class permits non-atomicity. If the class requires atomicity, the JVM is likely to use an extra layer of heap buffering, or else a somewhat slower (but properly atomic) 128-bit load or store on Intel or ARM.

(The reader may recall that early JVMs accepted races on the high and low halves of 64-bit integers as well; this is no longer a widespread issue, but bigger value types like complex raise the same issue again, and we need to provide class authors the same solution, if it fits their class.)

There are also some classes for which there are no good defaults, or for which a good default is definitely not the all-zero bit pattern. Authors of such types will often wish to make that bit pattern inaccessible to their clients and provide some factory or constant that gives the real default. We expect that such types will choose the C.ref companion, and rely on the extra null checks to ensure correct initialization.

Other classes may need to avoid other non-constructed values that may arise from data races, perhaps for reasons of reliability or security. This is a subtle trade-off; very few class authors begin by asking themselves about the consequences of data races on mutable members, and even fewer will ask about races on whole instances of value types, especially given that fields in value types are always immutable. For this reason, we will set safety as the default, so that a class (like complex numbers) which is willing to tolerate data races must declare its tolerance explicitly. Only then will the JVM drop the internal costs of race exclusion.

Whether to tolerate the all-zero bit pattern is a simpler decision. Still, it turns out to be useful to give a common single point of declarative control to handle all non-constructed states, both the default value of C.val and its mysterious data races.

So different encapsulation authors will want to make different choices. We will give them the means to make these choices. And (spoiler alert) we will make the safest choice be the default choice.

Privatization to the rescue

(Here are the important details about the encapsulation of value types. The impatient reader may enjoy the very quick summary of restrictions at the end of this document.)

In order to hide non-constructed states, the value companion C.val may be privatized by the author of the class C. A privatized value companion is effectively withdrawn from clients and kept private to its own class (and to nestmates). Inside the class, the value companion can be used freely, fully under control of the class author.

But untrusted clients are prevented from building uninitialized fields or arrays of type C.val. This prevents such clients from creating (either accidentally or purposefully) non-constructed states of type C.val. How privatization is declared and enforced is discussed in the rest of this document.

(To review, for those who skipped ahead, non-constructed states are those not created under control of the class C by constructors or other accessible API points. A non-constructed state may be either an uninitialized variable of C.val, or the result of a data race on a shared mutable variable of type C.val. The class itself can work internally with such values all day long, but we exclude external access to them by default.)

Atomicity as well

As a second tactic, a value class C may select whether or not the JVM enforces atomicity of all occurrences of its value companion C.val. A non-atomic value companion is subject to data races, and if it is not privatized, external code may misuse C.val variables (in arrays or mutable fields) to create non-constructed states via data races.

A value companion which is atomic is not subject to data races. This will be the default if the class C does not explicitly request non-atomicity. This gives safety by default and limits non-constructed states to only the all-zero initial value. The techniques to support this are similar to the techniques for implementing non-tearing of variables which are declared volatile; it is as if every variable of an atomic value type has some (not all) of the costs of volatility.

The JVM is likely to flatten such an atomic value only up to the largest available atomically settable memory unit, usually 128 bits. Values larger than that are likely to be boxed, or perhaps treated with some other expensive transactional technique. Containers that are immutable can still be fully flattened, since they are not subject to data races.

The behavior of an atomic C.val is aligned with that of C.ref. A reference to a value class C never admits data races on C’s fields. The reason for this is simple: A C.ref value is a C.val instance boxed on the heap in a single immutable box-class field of type C.val. (Actually, the JVM may partially or wholly flatten the representation of C.ref if it can get away with it; full flattening is likely for JVM locals and stack values, but any such secret flattening is undetectable by the user.) Since it is final all the way down (to C’s fields) any C.ref value is safely published without any possibility of data races. Therefore, an extra declaration of non-atomicity in C affects only the value companion C.val.
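The boxing argument can be sketched in today's Java. This is an illustration only, assuming a hypothetical immutable Pair payload with the invariant lo < hi; the volatile modifier is there so the reading thread observes updates promptly, while the absence of torn states follows from the final fields and the single-pointer store.

```java
final class BoxedCell {
    /** Immutable payload; every instance is checked on construction. */
    static final class Pair {
        final long lo;
        final long hi;

        Pair(long lo, long hi) {
            if (lo >= hi) throw new IllegalArgumentException("need lo < hi");
            this.lo = lo;
            this.hi = hi;
        }
    }

    // One pointer-sized slot: a store replaces the whole payload at once.
    private volatile Pair box = new Pair(0, 1);

    void store(long lo, long hi) { box = new Pair(lo, hi); }

    Pair load() { return box; }

    /** Two writers race on one cell while the caller checks the invariant. */
    static boolean staysConsistent(int rounds) {
        BoxedCell cell = new BoxedCell();
        Thread a = new Thread(() -> { for (int i = 0; i < rounds; i++) cell.store(1, 2); });
        Thread b = new Thread(() -> { for (int i = 0; i < rounds; i++) cell.store(10, 20); });
        a.start();
        b.start();
        boolean ok = true;
        for (int i = 0; i < rounds; i++) {
            Pair p = cell.load();  // one read yields one whole, constructed Pair
            ok &= p.lo < p.hi;
        }
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ok;
    }
}
```

However the writers interleave, a reader only ever sees a Pair that some constructor call produced, which is the property an atomic C.val is meant to share with C.ref.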

It seems that there are use cases which justify all four combinations of both choices (privatization and declared non-atomicity), although it is natural to try to boil down the size of the matrix.

It is logically possible, but there does not seem to be a need, for allowing a single class C to work with both kinds of arrays, atomic and non-atomic. (In principle, the dynamic typing of Java arrays would support this, as long as each array was configured at its creation.) The effect of this can be simulated by wrapping a non-atomic class C in another wrapper class WC which is atomic. Then C.val[] arrays are non-atomic and WC.val[] arrays are atomic, yet each kind of array can have the same “payload”, a repeated sequence of the fields of C.

Privatization in code

For source code and bytecode, privatization is enforced by performing access checks on names.

Privatization rules in the language

We will stipulate that a value class C always has a value companion type C.val, even if it is never declared or used. And we give the author of C some control over how clients may use the type C.val, in a manner roughly similar to nested member classes like C.M.

Specifically, the declaration of C always selects an access mode for its value companion C.val from one of the following three choices:

If C.val is declared private, then only nestmates of C may access C.val. If it is neither public nor private, only classes in the same runtime package as C may access it. If it is declared public, then any class that can access C may also access C.val.

As an independent choice, the declaration of C may select an atomicity for its value companion C.val from one of the following two choices:

If there is no explicit access declaration for C.val in the code of C, then C.val is declared private and atomic. That is, we set the default to the safest and most restrictive choice.

In source code, these declarations are applied to explicit occurrences of the type name C.val. The access modification of C.val is also transferred to the implicitly declared name C.default.

The syntax looks like this:

class C {
  //only one of the following lines may be specified
  //the first line is the default
  private value companion C.val;  //nestmates only
  value companion C.val;          //package-mates only
  public value companion C.val;   //all may access
  // the non-atomic modifier may be present:
  private non-atomic value companion C.val;
  public non-atomic value companion C.val;
  non-atomic value companion C.val;
}

When a type name C.val or an expression C.default is used by a class X, there are two access checks that occur. First, access from X to the class C is checked according to the usual rules of Java. If access to C is permitted, a second check is done if the companion is not declared public. If the companion is declared private, then X and C must be nestmates, or else access will fail. If the companion is neither public nor private, then X and C must be in the same package, or else access will fail.
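The two-step check can be written down as a small decision function. This is only a model of the rule as stated above, not compiler or JVM code; the enum and parameter names are invented for illustration.

```java
final class CompanionAccessModel {
    /** The three access modes a value class may select for C.val. */
    enum CompanionAccess { PRIVATE, PACKAGE, PUBLIC }

    /**
     * Models whether class X may name C.val or C.default.
     * classAccessible: result of the usual Java access check from X to C.
     * sameNest / samePackage: relationship between X and C.
     */
    static boolean mayUseCompanion(boolean classAccessible,
                                   CompanionAccess access,
                                   boolean sameNest,
                                   boolean samePackage) {
        if (!classAccessible) {
            return false;              // first check: access to C itself
        }
        switch (access) {              // second check: access to the companion
            case PUBLIC:
                return true;
            case PRIVATE:
                return sameNest;
            default:                   // PACKAGE
                return samePackage;
        }
    }
}
```

For example, a class in the same package as C but outside its nest passes the second check for a package-access companion and fails it for a private one.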

Example privatized value companion

Here is an example of a class which refuses to construct its default value, and which prevents clients from seeing that state:

class C {
  int neverzero;
  public C(int x) {
    if (x == 0)  throw new IllegalArgumentException();
    neverzero = x;
  }
  public void print() { System.out.println(this); }

  private value companion C.val;  //privatized (also the default)

  // some valid uses of C.val follow:
  public C.val[] flatArray() { return new C.val[]{ this }; }
  private static C.ref nonConstructedZero() {
    return (new C.val[1])[0];  //OK:  C.val private but available
  }
  public static C.ref box(C.val val) { return val; }  //OK param type
  public C.val unbox() { return this; }  //OK return type

  // valid use of private C.default, with Lookup negotiation
  public static
  C.ref defaultValue(java.lang.invoke.MethodHandles.Lookup lookup) {
    if (!lookup.in(C.class).hasFullPrivilegeAccess())
      return null;     //…or throw
    return C.default;  //OK: default for me and maybe also for thee
  }
}

// non-nestmate client:
class D {
  static void passByValue(C x) {
    C.ref ref = C.box(x);   //OK, although x is null-checked
    if (false)  C.box((C.ref) null);  //would throw NPE
    assert ref == x;
  }

  static Object useValue(C x) {
    x.unbox().print();   //OK, invoke method on C.val expression
    var xv = x.unbox();  //OK, although C.val is non-denotable
    xv.print();          //OK
    //> C.val xv = x.unbox();  //ERROR: C.val is private
    return xv;  //OK, originally from legitimate method of C
  }

  static Object arrays(C x) {
    var a = x.flatArray();
    //> C.val[] va = a;  //ERROR: C.val is private
    java.util.Arrays.toString(a);  //OK
    C.ref[] a2 = a;      //covariant array assignment
    C.ref[] na = new C.ref[1];
    //> na = new C.val[1];  //ERROR: C.val is private
    return a[0];  //constructed values only
  }
}

The above code shows how a privatized value companion can and cannot be used. The type name may never be mentioned. Apart from that restriction, client code can work with the value companion type as it appears in parameters, return values, local variables, and array elements. In this, a privatized companion behaves like other non-denotable types in Java.
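The Lookup negotiation used by defaultValue can be tried out in today's Java (14 or later, for hasFullPrivilegeAccess). In the sketch below, a string constant stands in for C.default, and the class names Guarded and Outsider are hypothetical:

```java
import java.lang.invoke.MethodHandles;

final class Guarded {
    // Stand-in for the privatized default state.
    private static final String DEFAULT_STATE = "all-zero";

    /** Hands out the guarded state only to callers that prove private access. */
    static String defaultStateFor(MethodHandles.Lookup lookup) {
        // Retargeting to Guarded drops private access unless the lookup
        // was created by Guarded or one of its nestmates.
        if (!lookup.in(Guarded.class).hasFullPrivilegeAccess()) {
            return null;  // or throw
        }
        return DEFAULT_STATE;
    }

    /** A call from inside the class, using its own full-power lookup. */
    static String fromInside() {
        return defaultStateFor(MethodHandles.lookup());
    }
}

final class Outsider {
    /** A call from an unrelated class in the same package. */
    static String fromOutside() {
        return Guarded.defaultStateFor(MethodHandles.lookup());
    }
}
```

Here fromInside receives the guarded state, while fromOutside receives null, because Outsider's lookup loses its private mode when retargeted to Guarded.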

Rationale: Note that a companion type is not a real class. Therefore it cannot appeal, precisely, to the existing provisions (in JLS or JVMS) for enforcing class accessibility. But because it is a type, and today nearly all types are classes (and interfaces), users have a right to expect that encapsulation of companion types will “feel like” encapsulation of type names. More precisely, users will hope to re-use their knowledge about how type name access works when reasoning about companion types. We aim to accommodate that hope. If it works, users won’t have to think very often about the class-vs-type distinction. That is also why the above design emulates pre-existing usage patterns for non-denotable types.

Privatization in translation

When a value class is compiled to a class file, some metadata is included to record the explicit declaration or implicit status of the value companion.

The access selection of C’s value companion (public, package, private) is encoded in the value_flags field of the ValueClass attribute of the class information in the class file of C.

The value_flags field (16 bits) has the following legitimate values:

Other values are rejected when the class file is loaded.

The choice of ACC_FINAL for this job is arbitrary. It basically means “please ensure safe publication of final fields of this class, even for fields inside flattened instances.” The race conditions of a non-atomic variable of type C.val are about the same as (are isomorphic to) the race conditions for the states reachable from a non-varying non-null variable of type MC.ref, where MC is a hypothetical identity class containing the same instance fields as C, but whose fields are not declared final. (Remember that C, being a value class, must have declared its fields final.) Omitting ACC_FINAL above means about the same as using the non-final fields of MC to store C.val states. Omitting ACC_FINAL is less safe for programmers, but much easier to implement in the JVM, since it can just peek and poke the fields retail, instead of updating the whole instance value in a wholesale transaction.

That is, if you see what I mean… ACC_VOLATILE would be another clever pun along the same lines, since a volatile variable of type long is one which suppresses tearing race conditions. But volatile means additional things as well. Other puns could be attempted with ACC_STATIC, ACC_STRICT, ACC_NATIVE, and more. John likes ACC_FINAL because of the JMM connection to final fields.

(JVM ISSUE #0: Can we kill the ACC_VALUE modifier bit? Do we really care that jlr.Modifiers kind-of wants to own the reflection of the contextual modifier value? Who are the customers of this modifier bit, as a bit? John doesn’t care about it personally, and thinks that if we are going to have an attribute we can get rid of the flag bit. One implementation issue with killing ACC_VALUE is that class attributes are processed very late during class loading, while class access-flags are processed very early. It may be easier to do some kinds of structural checks on the fly during class loading even before class attributes are processed. Yet this also seems like a poor reason to use a modifier bit.)

Perhaps some kind of “poetic justice” would be attained by replacing the outgoing and redundant ACC_SUPER bit with an incoming and largely-redundant ACC_IDENTITY bit in the same position in the access_flags item. That would allow everything else to go into a class attribute at the bottom of the class file, as suggested, and would be neutral in pressure on access flag bit positions.

(JVM ISSUE #1: What if the attribute is missing; do we reject the class file or do we infer value_flags=ACC_PRIVATE|ACC_FINAL? Let’s just reject the file.)

(JVM ISSUE #2: Is this ValueClass attribute really a good place to store the “atomic” bit as well? This attribute is a green-field for VM design, as opposed to the brown-field of modifier bits. The above language assumes the atomic bit belongs in there as well.)

A use of a value companion C.val, in any source file, is generally translated to a use of a Q-descriptor QC;:

Privatization is enforced for these uses only as much as is needed to ensure that classes cannot create uninitialized values, fields, and arrays.

If an access from bytecode to a privatized Q-descriptor fails, an exception is thrown; its type is IllegalAccessError, a subtype of IncompatibleClassChangeError. Generally speaking such an exception diagnoses an attempt by bytecode to make an access that would have been prevented by the static compiler, if the Java source program had been compiled together as a whole.

When a field of Q-descriptor type is declared in a class file, the descriptor is resolved early, before the class is linked, and that resolution includes an access check which will fail unless the class being loaded has access to C.val, as determined by loading C and inspecting its ValueClass attribute. These checks prevent untrusted clients of C from creating non-constructed zero values, in any of their fields.

The timing of these checks, on fields, is aligned with the internal logic of the JVM which consults the class file of C to answer other related questions about field types: (a) whether C is in fact a value class, and (b) what is the layout of C.val, in case the JVM wishes to flatten the value in a containing field. A third check, (c) whether the C.val companion is accessible, happens at the same time. This is early during class loading for non-static fields, and during class preparation for static fields.

Privatization is not enforced for non-field Q-descriptors, which occur in method and constructor signatures, and in state descriptions for the verifier. This is because mere use of Q-descriptors to describe pre-existing values cannot (by itself) expose non-constructed values, when those values are on stack or in locals.

This can happen invisibly at the source-code level as well. An API might be designed to return values of a privatized type from its methods or fields, and/or accept values of a privatized type into its methods, constructors, or fields. In general, the bytecode for a client of such an API will work with a mix of Q-descriptor and L-descriptor values.

The verifier’s type system uses field descriptor types, and thus can “see” both Q-descriptors and L-descriptors. Clients of a class with a privatized companion are likely to work mostly with L-descriptor values but may also have Q-descriptor values in locals and on stack.

When feeding an L-descriptor value to an API point that accepts a Q-descriptor, the verifier needs help to keep the types straight. In such cases, the bytecode compiler issues checkcast instructions to adjust types to keep the verifier happy, and in this case the operand of the checkcast would be of the form CONSTANT_Class["QC;"].

(JVM ISSUE #3: The Q/L distinction in the verifier helps the interpreter avoid extra dynamic null checks around putfield, putstatic, and the invoke instructions. This distinction requires an explicit bytecode to fix up Q/L mismatches; the checkcast bytecode serves this purpose. That means checkcast requires the ability to work with privatized types. It requires us to make the dynamic permission check when other bytecodes try to use the privatized type. All this seems acceptable, but we could try to make a different design in which CONSTANT_Class resolution fails immediately if it contains an inaccessible Q-descriptor. That design might require a new bytecode which does what checkcast does today on a Q-descriptor.)

Meanwhile, arrays are rich sources of non-constructed zero values. They appear in bytecode as follows:

Note that there are no static type annotations on array access instructions. The practical impact of this is that, if an array of a privatized type C.val is passed outside of C, then any values in that array become accessible outside of C. Moreover, if C.val is non-atomic, clients may be able to inflict data races on the array.

Thus, the best point of control over misuse of arrays is their creation, not their access. Array creation is controlled by CONSTANT_Class constant pool entries and their access checking. When an anewarray or multianewarray tries to create an array, the CONSTANT_Class constant pool entry it uses must be consulted to see if the element type is privatized and inaccessible to the current class, and IllegalAccessError thrown if that is the case.

All this leads to special rules for resolving an entry of the form CONSTANT_Class["QC;"]. When resolving such a constant, the class file for C is loaded, and C is access checked against the current class. (This is just what happens when CONSTANT_Class["C"] gets resolved.) Next, the ValueClass attribute for C is examined; it must exist, and if it indicates privatization of C.val, then access is checked for C.val against the current class.

If that access to a privatized companion would fail, no exception is thrown, but the constant pool entry is resolved into a special restricted state. Thus, a resolved constant pool entry of the form CONSTANT_Class["QC;"] can have the following states:

That last state happens when C is accessible but C.val is not.

Likewise, a constant pool entry of the form CONSTANT_Class["[QC;"] (or a similar form with more leading array brackets) can have three states, error, full resolution, and restricted resolution.

Pre-Valhalla CONSTANT_Class entries which do not mention Q-descriptors have only two resolved states, error and full resolution.

As required above, the checkcast bytecode treats full resolution and restricted resolution states the same.

But when the anewarray or multianewarray instruction is executed, it must throw an access error if its CONSTANT_Class is not fully resolved (either it is an error or is restricted). This is how the JVM prevents creation of arrays whose component type is an inaccessible value companion type, even if the class file does not correspond to correct Java source code.
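The three resolved states and the two ways bytecodes treat them can be summarized in a small model. This is a sketch of the rules as described here, not JVM code; it reduces the error state to an inaccessible C (resolution can also fail for other reasons, such as a missing ValueClass attribute), and all names are invented for illustration.

```java
final class QClassConstantModel {
    /** Resolved states of a CONSTANT_Class entry naming a Q-descriptor. */
    enum Resolution { ERROR, FULL, RESTRICTED }

    /** Models resolution from the two access checks performed on C and C.val. */
    static Resolution resolve(boolean classAccessible, boolean companionAccessible) {
        if (!classAccessible) {
            return Resolution.ERROR;       // same failure as for a plain class constant
        }
        return companionAccessible
                ? Resolution.FULL
                : Resolution.RESTRICTED;   // C is accessible but C.val is not
    }

    /** checkcast treats full and restricted resolution alike. */
    static boolean checkcastPermitted(Resolution r) {
        return r != Resolution.ERROR;
    }

    /** anewarray and multianewarray require full resolution. */
    static boolean arrayCreationPermitted(Resolution r) {
        return r == Resolution.FULL;
    }
}
```

In this model a client without access to C.val can still cast to the Q-descriptor type, but cannot create an array of it.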

Here are all the classfile constructs that could refer to a CONSTANT_Class constant in the restricted state, and whether they respect it (throwing IllegalAccessError):

Q-descriptors not in CONSTANT_Class constants are naturally immune to privatization restrictions. In particular, CONSTANT_MethodType constants can successfully refer to mirrors of privatized companions.

Uses of CONSTANT_Class constants which forbid Q-descriptors and their arrays are also naturally immune, since they will never encounter a constant resolved in the restricted state. These include new, aconst_init, the class sub-operands of CONSTANT_Methodref and its friends, exception catch-types, and various attributes like NestHost and InnerClasses: All of the above are allowed to refer only to proper classes, and not to their value companions or arrays.

Nevertheless, an aconst_init bytecode must throw an access error when applied to a class with an inaccessible privatized value companion. This is worth noting because the constant pool entry for aconst_init does not mention a Q-descriptor, unlike the array construction bytecodes.

Perhaps regular class constants of the form CONSTANT_Class["C"] would also benefit slightly from a restricted state, which would be significant only to the aconst_init bytecode, and ignored by all the above “naturally immune” usages. If a JVM implementation takes this option, the same access check would be performed and recorded for both CONSTANT_Class["C"] and CONSTANT_Class["QC;"], but would be respected only by aconst_init (for the former) and by anewarray and the other cases noted above (for the latter but not the former). On the other hand, this particular issue would become moot if aconst_init, like withfield, were restricted to the nest of its class, because then privatization would not matter.

The net effect of these rules, so far, is that neither source code nor class files can directly make uninitialized variables of type C.val, if the code or class file was not granted access to C.val via C. Specifically, fields of type C.val cannot be declared nor can arrays of type C.val[] be constructed.

This includes class files as correctly derived from valid source code or as “spun” by dodgy compilers or even as derived validly from old source code that has changed (and revoked some access).

Remember that new nestmates can be injected at runtime via the Lookup API, which checks access and then loads new code that enjoys the same access. The level of access depends in detail on the selection of ClassOption.NESTMATE (for nestmate injection) or not (for package-mate injection). The JVM uses common rules for these injected nestmates or package-mates and for normally compiled ones.

There are no restrictions on the use of C.ref, beyond the basic access restrictions imposed by the language and JVM on the name C. Access checks for regular references to classes and interfaces are unchanged throughout all of the above.

There are more holes to be plugged, however. It will turn out that arrays are once again a problem. But first let’s examine how reflection interacts with companion types and access control.

Privatization and APIs

Beyond the language there are libraries that must take account of the privatization of value companions. We start on the shared boundary between language and libraries, with reflection.

Reflecting privatization

Every companion type is reflected by a Java class mirror of type java.lang.Class. A Java class mirror also represents the class underlying the type. The distinction between the concept of class and companion type is relatively uninteresting, except for a value class C, which has two companion types and thus two mirrors.

In Java source code the expression C.class obtains the mirror for both C and its companion C.ref. The expression C.val.class obtains the mirror for the value companion, if C is a value class. Both expressions check access to C as a whole, and C.val.class also checks access to the value companion (if it was privatized).

But it is a generally recognized fact that Java class mirrors are less secure than the Java class types that the mirrors represent. It is easy to write code that obtains a mirror on a class C without directly mentioning the name C in source code. One can use reflective lookup to get such mirrors, and without even trying one may also “stumble upon” mirrors to inaccessible classes and companion types. Here are some simple examples:

Class<?> lookup() throws ClassNotFoundException {
  var name = "java.util.Arrays$ArrayList";
  //or name = "java.lang.AbstractStringBuilder";
  //> java.lang.invoke.MethodHandles.lookup().findClass(name);  //ERROR
  return Class.forName(name);  //OK!
}
Class<?> stumble1() {
  //> return java.util.Arrays.ArrayList.class;  //ERROR
  return java.util.Arrays.asList().getClass();  //OK!
}
Class<?> stumble2() {
  //> return java.lang.AbstractStringBuilder.class;  //ERROR
  return StringBuilder.class.getSuperclass();  //OK!
}
Class<?> stumble3() {
  //> return C.val.class;  //ERROR if C.val is privatized
  return C.ref.class.asValueType();  //OK!
}

Therefore, access checking class names is not and cannot be the whole story for protecting classes and their companion types from reflective misuse. If a mirror is obtained that refers to an inaccessible non-public class or privatized companion, the mirror will “defend itself” against illegal access by checking whether the caller has appropriate permissions. The same goes for method, constructor, and field mirrors derived from the class mirror: You can reflect a method but when you try to call it all of the access checks (including the check against the class) are enforced against you, the caller of the reflective API.
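This self-defense can be observed on a current JDK with ordinary classes. The sketch below reuses the mirror stumbled upon above (java.util.Arrays$ArrayList) and tries to call one of its constructors reflectively. The class name MirrorSelfDefense and the returned strings are hypothetical, and the example assumes a standard JDK in which that nested class has no constructor accessible outside its package.

```java
import java.lang.reflect.Constructor;
import java.util.Arrays;

// Hypothetical demo class; only the JDK calls are real API.
final class MirrorSelfDefense {
    static String tryConstruct() {
        // Stumble upon the mirror of a non-public class.
        Class<?> mirror = Arrays.asList().getClass();
        // Reflecting its declared constructors succeeds...
        Constructor<?> ctor = mirror.getDeclaredConstructors()[0];
        try {
            // ...but invoking one is access checked against this caller.
            ctor.newInstance((Object) new Object[0]);
            return "constructed";
        } catch (IllegalAccessException e) {
            return "denied";
        } catch (ReflectiveOperationException | RuntimeException e) {
            return "other failure: " + e;
        }
    }
}
```

Obtaining the mirror and the constructor object costs nothing; the access check is applied only when the caller tries to use them, and that is where a privatized companion would be protected as well.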

The checking of the caller has two possible shapes. Either a caller sensitive method looks directly at its caller, or the call is delegated through an API that requires negotiation with a MethodHandles.Lookup object that was previously checked against a caller.

Now, if a class C is accessible but its value companion C.val is privatized, all of C’s public methods and other API points are accessible (via both companion types), but access is restricted for those very specific operations that could create non-constructed instances (via a variable of companion type C.val). And this boils down to a limitation on array creation. If you cannot use either source code or reflection to create an array of type C.val[], then you cannot create the conditions necessary to build non-constructed instances.

Reflective APIs should be available to report the declared properties of value companions. It is enough to add the following two methods:

(Note that most reflective access checking should take care to work with the reference mirror, not the value mirror, as the modifier bits of the two mirrors might differ.)

Privatization and arrays

There are a number of standard API points for creating Java array objects. When they create arrays containing uninitialized elements, then a non-constructed default value can appear. Even when they create properly initialized arrays, if the type is declared non-atomic, then non-constructed states can be created by races.

The basic policy for all these API points is to conservatively limit the creation of arrays of type C.val[] if C.val is not public.

API ISSUE #1: Should we relax construction rules for zero-length arrays? This would add complexity but might be a friendly move for some use cases. A zero-length array can never expose non-constructed states. It may, however, serve as a misleading “witness” that some code has gained permission to work with flat arrays. It’s safer to disallow even zero-length arrays.

API ISSUE #2: What about public value companions of non-public inaccessible classes? In source code, we do not allow arrays of private classes to be made, or of their public value companions. Should we be more permissive in this case? We could specify that where a value companion has to be checked against a client, its original class gets checked as well; this would exclude some use cases allowed by the above language, which only takes effect if the companion is privatized. An extra check for a public companion seems like busy-work and a source of unnecessary surprises, though. Let’s not.

There are probably legitimate use cases for arrays of privatized types, with which the new restrictions on the above API points would interfere. So as a backup, we will make API adjustments to work with privatized array types, with an extra handshake to perform the access check (via either caller sensitivity or negotiation with an instance of MethodHandles.Lookup).
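The shape of such a handshake can be sketched with APIs that exist today. In the sketch below, the helper CheckedArrays.newArray is hypothetical, and the existing method Lookup::accessClass stands in for the proposed companion check: it checks access to the class itself, since current JDKs have no notion of a privatized companion.

```java
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Array;

// Hypothetical helper; accessClass is a stand-in for a companion access check.
final class CheckedArrays {
    // Creates an array only if the given lookup can access the component type.
    static Object newArray(MethodHandles.Lookup lookup, Class<?> componentType, int length) {
        try {
            lookup.accessClass(componentType);   // the negotiated access check
        } catch (IllegalAccessException e) {
            throw new IllegalAccessError(e.getMessage());
        }
        return Array.newInstance(componentType, length);
    }
}
```

A client passes its own MethodHandles.lookup() as proof of its access rights. A call such as CheckedArrays.newArray(MethodHandles.lookup(), String.class, 3) is expected to succeed, while passing the mirror of a non-public JDK class from another package is expected to fail with IllegalAccessError.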

Miscellaneous privatization checks

Besides newly-created or extended arrays, there are a few API points in java.lang.invoke which expose default values of reflectively determined types. Like the array creation methods, they must simply refuse to expose default values of privatized value companions.

To support reflective checks against array elements which may be privatized companion types, an internal method of the form jdk.internal.reflect.Reflection::verifyCompanionType may be defined. It will pass any reference type (regardless of class accessibility) and for a value companion it will check access of the companion (but not the class itself).

Building companion-safe APIs

The method Lookup::arrayConstructor gives enough of a “hook” to create all kinds of safe but friendly APIs in privileged JDK code. The methods in java.util could make use of this privileged API to quickly adapt their internal code to create arrays in cases where they are refused by the existing methods Array.newInstance and Arrays.copyOf.

For example, a checked method MethodHandles.Lookup::defaultValue(C) may be added to provide the default value C.default if its companion C.val is accessible. It will operate as if it first creates a one-element array of the desired type, and then loads the element.
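The “as if” behavior can be illustrated with today's reflection API, leaving out the access check. This is a sketch only: the class DefaultValues and the method defaultValueOf are hypothetical, and on a current JDK the result is a boxed zero for primitive types and null for reference types, since there are no value companions yet.

```java
import java.lang.reflect.Array;

// Hypothetical helper showing the "one-element array" semantics only.
final class DefaultValues {
    // Create a one-element array of the type, then load its single element.
    static Object defaultValueOf(Class<?> type) {
        Object oneElementArray = Array.newInstance(type, 1);
        return Array.get(oneElementArray, 0);
    }
}
```

This is exactly why such a method needs a check: whatever an uninitialized array element holds is what the caller receives, and for a privatized C.val that would be the non-constructed default.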

Or, a caller-sensitive method Class::defaultValue or Class::newArray could be added which checks the caller and returns the requested result. All such methods can be built on top of MethodHandles.Lookup.

In general, a library API may be designed to preserve some aspect of companion safety, as it allows untrusted code to work with arrays of privatized value type, while preventing non-constructed states of that type from being materialized. Each such safe and friendly API has to make a choice about how to prevent clients from creating non-constructed states, or perhaps how to allow clients to gain privilege to do so. Some points are worth remembering:

In the presence of a reconstruction capability, either in the language or in a library API or as provided by a single class, avoiding non-constructed instances includes allowing legitimate reconstruction requests; each legitimate reconstruction request must somehow preserve the intentions of the class’s designer. Reconstruction should act as if field values had been legitimately (from C’s API) extracted, transformed, and then again legitimately (to C’s API) rebuilt into an instance of C.

Serialization is an example of reconstruction, since field values can be edited in the wire format. Proposed with expressions for records are another example of reconstruction. The withfield bytecode is the primitive reconstruction operator, and must be restricted to nestmates of C since it can perform all physically possible field updates. Reconstruction operations defined outside of C must be designed with great care if they use elevated privileges beyond what C provides directly. Given the historically tricky nature of deserialization, more work is needed to consider what serialization of a C.val actually means and how it interacts with default reconstitution behaviours. One likely possibility is that wire formats should only work with C.ref types with proper construction paths (enforced by serialization), and leave conversion to C.val types to deserialization code inside the encapsulation of C.

JNI, like serialization, allows array creation that is hard to constrain with access checks. We have a choice of at least two positions on this. We could allow JNI full permission to create any kind of array, thus effectively allowing it “inside the nest” of any value class, as far as array construction goes. Or, we could say that JNI (like Arrays::copyOf) is absolutely forbidden to create uninitialized arrays of privatized value type. The latter is probably acceptable. As with other API points, programmers with a legitimate need to create flat privatized arrays can work around the limitations of the “nice” API points by using more complex ones that incorporate the necessary access checks.

Summary of user model

A value class C has a value companion C.val which denotes the null-hostile (zero-initialized) fully flattenable value type for C.

Like other type members of C, C.val can be declared with an access modifier (public or private or neither). It is therefore quite possible that clients of C might be prevented from using the companion type.

The operations on C.val are almost the same as the operations on plain C (C.ref), so a private C.val is usually not a burden.

Operations which are unique to C.val, and which therefore may be restricted to you, are:

Library routines which create empty flattenable arrays of C.val might not work as expected, when C.val is not public. You’ll have to find a workaround, such as:

If you looked closely at the code for C above, you might have noticed that it uses its private type C.val in its public API. This is allowed. Just be aware that null values will not flow through such API points. When you get a C.val value into your own code, you can work on it perfectly freely with the type C (which is C.ref).

If a value companion C.val is declared public, the class has declared that it is willing to encounter its own default value C.default coming from untrusted code. If it is declared private, only the class’s own nest can work with C.default. If the value companion is neither public nor private, the class has declared that it is willing to encounter its own default within its own package.

If a class has declared its companion non-atomic, it is willing to encounter states arising from data races (across multiple fields) in the same places it is willing to encounter its default value.

Summary of restrictions

From the implementation point of view, the salient task is restricting clients from illegitimately obtaining non-constructed values of C, if the author of C has asked for such restrictions. (Recall that a non-constructed value of C is one obtained without using C’s constructor or other public API.) Here are the generally enforced restrictions regarding a privatized type C.val:

Even so, let us suppose you are an accident-prone client of C. Ignoring the above restrictions, you might go about obtaining a non-constructed value of C in several ways, and there is an answer from the system in each case that stops you:

And there are a number of ways you might attempt to indirectly create an array of type C.val[]:

Using C.val or C.default directly is blocked if C privatizes its value companion, unless you are coding a nestmate or package-mate of C. These checks are applied both at compile time and when the JVM resolves names, so they apply equally to source code and bytecodes created by any means whatsoever.

There are no realistic restrictions on obtaining a mirror to a companion type C.val. (Accidental and casual direct use of C.val.class is prevented by access restrictions on the type name C.val. But there are many ways to get around this limitation.) Therefore any method or API which could violate the above generally enforced restrictions must perform an appropriate dynamic access check on behalf of its mirror argument.

Such a dynamic access check can be made negotiable by an appeal to caller sensitivity or a Lookup check, so a correctly configured call can avoid the restriction. For some simple methods (perhaps Arrays::copyOf or MethodHandles::zero) there is no negotiation. Depending on the use case, access failure can be worked around via a “negotiable” API point like Lookup::arrayConstructor.