Briefly put, types in L-world are ambiguous, leading to unhygienic
mixtures of value operations with reference operations, and
uncontrolled pollution from
nulls infecting value code.
This note explores a promising proposal for resolving the key ambiguity. It is a cleaner design than the ad hoc mechanisms tried so far. The resulting system would seem to allow more predictable and debuggable behavior, a stronger backward compatibility story, and better optimization.
In the L-world design for value types, the classfile type descriptor
syntax is left unchanged, and the pre-existing descriptor form
"LFoo;" is overloaded to denote value types as well as object types.
A previous design introduced new descriptors for value types of the
form "QFoo;", and possibly a union type
"UFoo;". This design
might be called Q-world. In comparison with Q-world, the L-world
design approach has two advantages–compatibility and migration–but
also one serious disadvantage: ambiguity.
L-world is backward compatible with tools that must parse classfile
descriptors, since it leaves descriptor syntax unchanged. There have
been no changes to this syntax in almost thirty years, and there is a
huge volume of code that depends on its stability. The HotSpot JVM
itself makes hundreds of distinct decisions based on descriptor syntax
which would need careful review and testing if they were to be adapted
to take account of a new descriptor type (such as "QFoo;").
Because of its backward compatibility, L-world also has a distinctly
simpler migration story than previous designs. Some value-based
classes, such as
LocalTime, have been engineered to
be candidates for migration to proper value types. We wish to allow
such a migration without recompiling the world or forcing programmers
to recode uses of the migrated types. It is very difficult to sustain
the illusion in Q-world that a value type can
be operated on in old code under the original object type
Ljava/util/Optional;, since the descriptors do not match and a
myriad of adapters must be spun (one for every mention of the wrong
descriptor). With L-world, we have the simpler problem (addressed in
this document) of keeping straight the meaning of L-descriptors
in each relevant context, whether freshly recompiled or legacy
code; this is a simpler problem than spinning adapters.
But not all is well in L-world. The compatibility of descriptors implies that, when a classfile must express a semantic distinction between a value type and an object type, it must be allowed to do so unambiguously, in a side channel outside of the descriptor.
Our first thought was, “well, just load all the value types and then you will know the list of them”. If we have a global registry of classes (such as the HotSpot system dictionary), nobody needs to express any additional distinctions, since everybody can just ask the registry which are the value types.
This simple idea has a useful insight, but it goes wrong in three ways. First, for some use cases such as classfile transformation, it might be difficult to find such a global registry; in some cases we might prefer to rely on local information in the classfile. We need a way for a classfile to encode, within itself, which types it is using as value types, so that all viewers of the classfile can make consistent decisions about what’s a value and what’s not.
Second, if we are running in the JVM, the global registry of value
types has to be built up by loading classfiles. In order for every
classfile that uses a value type to know its status, the classfile
that defines the value type must be loaded first. But there is no
way to totally order these constraints, since it is easy to create
circular dependencies between value types, either directly or
indirectly. (N.B. Well-foundedness rules for layout don’t eliminate
all the possible circularities.) And it won’t work to add more
initialization phases (“first load all the classfiles, then let them
all start asking questions about their contents”), because that would
require preloading a classfile for every potential value type
mentioned in some other classfile. That’s every name in every
"LFoo;" descriptor. Loading a file for every name mentioned
anywhere is very un-Java-like, and something that drastic would be
required in order to make correct decisions about value types.
That leads to the third problem, which comes from our desire to make a migration story. Some classfiles need to operate on value types as if they were object references. (Below, we will see details of how operations can differ between value and reference types.) This means that, if we are to support migration, we need a way for legacy classfiles to make a local decision to treat a given type as a reference type, for backward compatibility. Luckily, this is possible, but it requires a local indication in the classfile so the JVM can adjust certain operations.
A solution to these problems requires a way for each classfile to
declare how it intends to use each type that is a value type, and
(what is more) a way for legacy classfiles to peacefully interoperate
with migrated value types. We have experimented with various partial
solutions, such as adding an extra bit in a context where a value type
may occur, to let the JVM know that the classfile intends a value
type. (This is the famous
ACC_FLATTENABLE bit on fields.) But it
turns out that the number of places where value-ness is significant is
hard to limit to just a few spots where we can sprinkle a mode bit.
We need a comprehensive solution that can clearly and consistently
define a classfile’s (local) view of the status of each type it works
with, so that when the “value or reference?” question comes up, there
is a clear and consistent answer. We need to prevent the values and
the references from polluting each other; we need value type hygiene.
Value types can be thought of as simpler than reference types, because they lack two features of reference types:
identity: Two value types with the same immediate components are
indistinguishable, even if they were created by different code
paths. Objects, by contrast, “remember” when they were created,
and each object is a unique identity. Identities are
distinguished using the
acmp family of instructions, and Java’s == operator.
nullability: Any variable of any reference type can store the
null; in fact,
null is the initial value for fields and
array elements. So
null is one of the possible values of any
reference type, including
Object and all interfaces. By contrast,
null is not the value of any value type. Value type
variables are not nullable, because
null is a reference. (But
read on for an awkward exception.) The type Object can
represent all values and references. Casting an unknown operand of type
Object to a value type
Foo must succeed if in fact the
operand is of type
Foo, but a null
Object reference must never
successfully cast to a value type.
This strong distinction between values and references is inspired, in
part, by the design of Java’s primitive types, which also are identity
free and are not nullable. Every copy of the
int value 42 is
completely indistinguishable from every other copy, and you can’t cast
null to int (without a null pointer exception). We hope
eventually to unify value types and primitives, but even if this
never comes to pass, our design slogan for value types is, codes
like a class, works like an int.
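The int half of the slogan can be observed on today's JVM, with no value types involved. The following is a minimal sketch using only Integer and int; the class and method names are illustrative and do not come from any proposal.

```java
// Illustration only: uses today's Integer/int, not real value types.
class WorksLikeAnInt {
    // Unboxing is the implicit Integer-to-int conversion; it dereferences the box.
    static int unbox(Integer boxed) {
        return boxed;
    }

    // Casting an Object to int first casts to Integer and then unboxes.
    static int castToInt(Object operand) {
        return (int) operand;
    }

    // Reports whether running the action raises a NullPointerException.
    static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

A null box cannot be turned into an int by either route, which is the behavior a null-hostile value type is meant to imitate.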
By divesting themselves of identity and nullability, value types are able to enjoy new behaviors and optimizations akin to those of primitives, notably flattening in the heap and scalarization in compiled code.
To unlock these benefits, the JVM must treat values and references as operationally distinct. Some of these operational distinctions are quite subtle; some required months of discussion to elucidate, though soon (we hope) they will be obvious in hindsight.
Here is a partial list of cases where the JVM should be able to distinguish value types from reference types:
The checkcast operator for a value type might reject null (as well as rejecting instances of the wrong type).
The ldc of a dynamic constant of value type must not produce null (instead it must fail to link).
comparisons: The acmp operator family must not detect value type identities (since they are not present), so it must operate differently on values and references.
In some cases, the verifier might reject null mixing with value types.
This list can be tweaked to make it shorter, by adjusting the rules in ways that lessen the impact of ambiguity in type names. The list is also incomplete. (We will add to it later.) Each point of distinction is the subject of detailed design trade-offs, many of which we are sketching here.
Some of these distinctions can be pushed to instruction link time
(when resolved value classes may be present) or run time (when the
actual values are on stack). A dynamic check can make a final
decision, after all questions of value-ness are settled. This seems
to be a good decision for
acmp. The linkage of a
can fail on a value class, or a
checkcast instruction can reject
inspected as part of the dynamic execution of operations like
But this delaying tactic doesn’t always work. For example, field layout must be computed during class loading, which (as was seen above) is too early to use the supposed global list of value types.
Even if some check can be delayed, like the detection of an erroneous
new on a value type, we may well decide it is more useful (or
“hygienic”) to detect the error earlier, such as at verification time,
so that a broken program can be detected before it starts to run.
Also, some operations may be contextual, to support backward
compatibility. For example, checkcast may need to consult the local
classfile about whether to reject nulls, so that legacy code won’t
suddenly fail to verify or execute just because it mixes nulls with
(what it thought were) references. Basically, a “legacy checkcast”
should work correctly with nulls, while an “upgraded checkcast” should
probably reject nulls immediately, without requiring extra tests.
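The difference between a “legacy checkcast” and an “upgraded checkcast” can be modeled in plain Java. This is a hypothetical sketch, not HotSpot code: the class name, the use of a set of class names as the classfile's local view, and the choice of NullPointerException for the rejection are all assumptions made for illustration.

```java
import java.util.Set;

// Hypothetical model of a context-sensitive checkcast; not HotSpot code.
class ContextualCheckcast {
    // Names the current classfile declares as value types (its local view).
    private final Set<String> locallyDeclaredValueTypes;

    ContextualCheckcast(Set<String> locallyDeclaredValueTypes) {
        this.locallyDeclaredValueTypes = locallyDeclaredValueTypes;
    }

    Object checkcast(Object operand, Class<?> target) {
        if (operand == null) {
            // Upgraded view: the target is a declared value type, so null is rejected.
            if (locallyDeclaredValueTypes.contains(target.getName())) {
                throw new NullPointerException("null cast to value type " + target.getName());
            }
            // Legacy view: the target looks like a reference type, so null passes.
            return null;
        }
        // Wrong-type operands fail with ClassCastException in either view.
        return target.cast(operand);
    }
}
```

The same instruction therefore gives different answers for null depending only on what the enclosing classfile has declared.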
We will examine these points in more detail later, but now we need to examine how to contextualize information about value types.
What is to be done? The rest of this note will propose some solutions to the problem of value type hygiene, and specifically the problem of preventing nulls from mixing with values (“null hygiene”).
Both Remi Forax and Frederic Parain have proposed the idea
of having each classfile explicitly declare the list of all value
types that it is using. For the record, this author initially
resisted the idea as overkill: I was hoping to get away with a
few mode bits (like ACC_FLATTENABLE), but have since realized we need a more
aggressive treatment. Clean and tidy behavior from the JVM will make
it easier to implement clean and tidy support for value types in the
Throughout the processing of the classfile, the list can serve as a reliable local registry of decisions about values vs. references. First we will sketch the attribute, and then revisit the points above to see how the list may be used.
As proposed above, let us define a new attribute called ValueTypes,
which is simply a counted array of
CONSTANT_Class indexes. Each
indexed constant is loaded and checked to be a value type. The JVM
uses this list of locally declared value types for all further
decisions about value types, relative to the current class.
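A counted array of constant pool indexes is easy to read. The sketch below assumes a body layout of one u2 count followed by that many u2 indexes, by analogy with other classfile attributes; that layout, and the class and method names, are assumptions rather than part of any specification.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for the body of a ValueTypes attribute.
// Assumed layout: u2 number_of_value_types; u2 value_types[number_of_value_types];
class ValueTypesAttribute {
    // Returns the constant pool indexes; each is expected to name a CONSTANT_Class entry.
    static int[] readClassIndexes(byte[] attributeBody) {
        // ByteBuffer is big-endian by default, matching classfile byte order.
        ByteBuffer in = ByteBuffer.wrap(attributeBody);
        int count = Short.toUnsignedInt(in.getShort());
        int[] indexes = new int[count];
        for (int i = 0; i < count; i++) {
            indexes[i] = Short.toUnsignedInt(in.getShort());
        }
        return indexes;
    }
}
```

Checking that each indexed constant really is a value type is a separate step, which this sketch does not attempt.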
As a running example, let’s call the loaded class C.
C may be
any class, either an object or a value. The value types locally
declared by C we can call
Q, Q2, etc. These are exactly
the types which would get
Q descriptors in Q-world.
As an attribute,
ValueTypes is somewhat like the InnerClasses
attribute. Both list all classes, within the context of a particular
classfile, which need some sort of special processing. The
InnerClasses attribute includes additional data for informing the
special processing (including the breakdown of “binary names” into
outer and inner names, and extra modifier bits), but the ValueTypes
attribute only needs to mention the classes which are known to be
value types.
Already with the
ACC_FLATTENABLE bit we have successfully defined
logic that pre-loads a supposed value type, ensures that it is in
fact a value type, and then allows the JVM to use all of the necessary
properties of that value type to improve the layout of the current
class. The classes mentioned in
ValueTypes would be pre-loaded
similarly. In fact, the
ACC_FLATTENABLE bit is no longer needed,
since the JVM can simply flatten all fields whose type names are
mentioned in the local ValueTypes attribute.
We now come to the distinction between properly resolved classes
(CONSTANT_Class entries) and types named in descriptors. This
distinction is important to keep in mind. Once a proper class
K is resolved by
C, everything is known about it, and a
permanent link to
K goes into
C’s constant pool. The same is not
true of other type names that occur within field and method
descriptors. In order for
C to check whether its field type "LK;"
is a value type, it must not try to resolve
K. Instead it must look up
K by name in the list of locally declared value types.
Later on, when we examine verifier types and the components of method
descriptors, a similar by-name lookup will be necessary to decide
whether they refer to value types. Thus, there are two ways a type
can occur in a classfile and two ways to decide if it is a value type:
By resolving a proper constant
K and looking at the metadata, and by
matching a name
"LK;" against the local list. Happily, the answers
will be complete and consistent if all the queries look at the same list.
So a type name can be classified as a value type without resolution,
by looking for the same name in the names of the list of declared
value types. And this can be done even before any of the declared
value types is loaded. This means that any particular declared
value type might not need to be loaded until “hard data” is required
of it. A provisional determination of the value status of some Q
can be made very early before
Q’s classfile is actually located and
pre-loaded. That provisional answer might be enough to check some early
structural constraint. It seems reasonable to actually pre-load the
Q values lazily, and only when the JVM needs hard data about Q,
like its actual layout, or its actual supers.
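The by-name query needs nothing more than string matching against the local list. Here is a minimal sketch, assuming the list is held as a set of internal class names; the class and method names are hypothetical.

```java
import java.util.Set;

// Hypothetical by-name classifier: consults only the local list, never loads a class.
class LocalValueTypeNames {
    // Internal names taken from the local list, for example "java/util/Optional".
    private final Set<String> declaredNames;

    LocalValueTypeNames(Set<String> declaredNames) {
        this.declaredNames = declaredNames;
    }

    // True if the descriptor has the form "LK;" and K is on the local list.
    boolean isValueTypeDescriptor(String descriptor) {
        boolean isClassDescriptor = descriptor.length() >= 3
                && descriptor.startsWith("L")
                && descriptor.endsWith(";");
        if (!isClassDescriptor) {
            return false; // primitives and array descriptors are not matched here
        }
        String name = descriptor.substring(1, descriptor.length() - 1);
        return declaredNames.contains(name);
    }
}
```

Because no class is resolved, the answer is available as soon as the attribute has been parsed, which is what makes lazy pre-loading possible.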
What if an element of
ValueTypes turns out to be a reference type?
(Perhaps someone deployed a value-type version of
Optional but then
got cold feet; meanwhile
C is still using it under the impression it
is a value type.) There are two choices here, loose and strict,
either pretend the type wasn’t there anyway, or raise an error in the
loading of the current classfile. The strict behavior is safer; we
can loosen it later if we find a need. The case of an element failing
to load at all can be treated like the previous problem, either
loosely or strictly; strict is better all else being equal.
The strict treatment is also more in line with how to treat failed resolution of super-types, which are a somewhat similar kind of dependency: Super-types, like value types, are loaded as early as possible, and play a role in all phases of classfile loading, notably verification.
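The two policies differ only in what happens when a declared name fails the check. The sketch below is hypothetical: the predicate stands in for "load the class and inspect its metadata", and the use of IncompatibleClassChangeError for the strict failure is an assumption, not a decision recorded in this note.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical check of a classfile's declared value types against reality.
class DeclaredValueTypeCheck {
    enum Policy { STRICT, LOOSE }

    // Returns the declared names that are kept as value types for this classfile.
    static List<String> check(List<String> declared,
                              Predicate<String> isActuallyValueType,
                              Policy policy) {
        List<String> accepted = new ArrayList<>();
        for (String name : declared) {
            if (isActuallyValueType.test(name)) {
                accepted.add(name);
            } else if (policy == Policy.STRICT) {
                // Strict: loading of the current classfile fails.
                throw new IncompatibleClassChangeError(
                        name + " is declared as a value type but is not one");
            }
            // Loose: pretend the entry was never listed.
        }
        return accepted;
    }
}
```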
One corollary of making the list an attribute is that it can be easily
stripped, just like
InnerClasses or BootstrapMethods. Is this a
bug or a feature? In the case of
InnerClasses, stripping the
attribute doesn’t affect execution of the classfile but it does break
some Core Reflection queries. In the case of BootstrapMethods, the
structural constraints on dynamic constant pool constants will break,
and the classfile will promptly fail to load. The effect of removing the
ValueTypes attribute is probably somewhere in between. Because
L-world types are ambiguous, and because we specifically allow value
types to be used as references from legacy classfiles (for migration),
there’s always a way to “fake” enough reference behavior from a value
type in a classfile which doesn’t make special claims about it. So it
seems reasonable to allow
ValueTypes to be stripped, at least in
principle. In the worst case the classfile will fail to load, as in the
case of a stripped
BootstrapMethods, but the feature might actually
prove useful (say, for before-and-after migration testing).
Note that in principle a classfile generator could choose to ignore a value type, and treat it as a (legacy) reference type. Because of migration, the JVM must support at least some such moves, but such picking and choosing is not the center of our design. In particular, we do not want the same compilation unit to treat a type as a value type in one line of code, and a reference type in the next. This may come later, if we decide to introduce concepts of nullable values and/or value boxes, but we think we can defer such features.
So for now, classfiles may differ among themselves about which types are value types, but within a single classfile there is only one source of local truth about value types. (Locally-sourced, fresh, hygienic data!)
Very early during class loading, the JVM assigns an instance layout to
the new class
C. Before that point it must first load the declared
value types (Q,
Q2, …), and then recursively extract the layout
information from each one. There is no danger of circularity in this
because a value type instance cannot contain another instance of
itself, directly or indirectly.
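The recursive layout extraction can be pictured with a toy model that counts leaf fields after flattening. Everything here is an assumption made for illustration: classes are identified by simple names, each class carries its own local list of declared value types, and a "leaf" is any field that is not flattened.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of layout computation; real layouts also track sizes and alignment.
class FlattenedLayout {
    static final class ClassInfo {
        final List<String> instanceFieldTypes;
        final Set<String> declaredValueTypes; // this class's own local list

        ClassInfo(List<String> instanceFieldTypes, Set<String> declaredValueTypes) {
            this.instanceFieldTypes = instanceFieldTypes;
            this.declaredValueTypes = declaredValueTypes;
        }
    }

    // Counts the fields of className once locally declared value types are flattened.
    // Terminates because a value type cannot contain itself, directly or indirectly.
    static int countLeafFields(String className, Map<String, ClassInfo> loaded) {
        ClassInfo info = loaded.get(className);
        int leaves = 0;
        for (String fieldType : info.instanceFieldTypes) {
            if (info.declaredValueTypes.contains(fieldType)) {
                leaves += countLeafFields(fieldType, loaded); // flatten in place
            } else {
                leaves += 1; // primitive, or a reference (possibly a pseudo-reference)
            }
        }
        return leaves;
    }
}
```

A legacy class with the same field types but an empty local list gets no flattening at all, since it sees every field as a reference.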
Both non-static and static fields of value type make sense (because a value “works like an int”). But static fields interact differently with the loading process than non-static fields.
A static value type field has no enclosing instance, unless the JVM chooses to make one secretly. Therefore it doesn’t need to be flattened. The JVM can make an invisible choice about how to store a static value type field:
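Immutable buffering: the field holds an invisible pointer to a buffer. A putstatic instruction would put the new value in a different buffer and change the pointer.
Mutable buffering: the field holds an invisible pointer to a single buffer. A putstatic instruction would store the flattened value into the same buffer.
Full flattening: the field is stored flat in its container, with no separate buffer.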
The first option seems easiest, but the second might be more performant. The third is difficult due to bootstrapping concerns.
In fact, the same implementation options apply for non-statics as for
statics, but only the third one (full flattening) is desirable. The
first one (immutable buffering) may be useful as a fallback
implementation technique for special cases like jumbo values and
fields which are
volatile, and thus need to provide atomicity.
The root container for all of
C’s statics, in HotSpot, happens to be
the relevant Class mirror,
C.class. Presumably it’s a
good place to put the invisible pointers mentioned above.
A static field of value type
Q cannot make its initial value
until Q’s <clinit> method runs (or in the
case of re-entrant initialization, has at least started). Since
classes can circularly refer to instances of each other via static
fields, Q might return the favor and require materialization of
C’s default value.
The first time C requires
Q’s default value, if
Q has not been
initialized, Q’s <clinit> method should run. This may trigger
re-entry into the initializer for
C, so Q needs to get its act
together before it runs its
<clinit>, and immediately create its
own default value, storing it somewhere in
Q’s own metadata (or else the
Class mirror looks like a good spot). The invariant is that, before
Q’s class initializer can run one bytecode, the default value for
Q is created and registered for all time. Creating the default
value before the initializer runs is odd but harmless, as long as no
bytecode can actually access the default value without triggering
Q’s initialization.
This also implies that
C should create and register its own default
value (if it is a value type) before it runs its own
<clinit> method, lest
Q come back and ask
C for its default value.
The JVM may be required to bootstrap value-type statics as invisible
null pointers, which are inflated (invisibly by the
putstatic instructions) into appropriate buffers, after ensuring the
initialization of the value type class. But it seems possible that if
the previous careful sequencing is observed, there is no need to do
lazy inflation of nulls, which would simplify the code for those instructions.
C includes methods as well as fields, of course. A method
can receive or return a value type
Q simply by mentioning
Q as a
component of its method descriptor (as an L-descriptor "LQ;").
If a method
C.m()LD; mentions some type
D which is not on the
declared list, then that type
D will be treated, like always, as a
nullable, identity-bearing reference.
Interestingly, migration compatibility requires this to be the case
whether or not
D is in actual fact a value type. If
C is unaware of
D’s value-ness, the JVM must respect this, and
preserve the illusion that
D values are “just references, nothing to
see here, move along”. Perhaps
D is freshly upgraded to a value type and
C isn’t recompiled yet.
C should not be penalized
for this change, if at all possible.
This points to a core decision of the L-world design, that nearly all
of the normal operations on object references “just do the right
thing” when applied to value types. The two kinds of data use the
same descriptor syntax. Value types can be converted to
references, even though the resulting pseudo-reference does not expose
any identity (and will never be null). Ordinary reference instructions
operate on values just as well as references, and so on.
Basically, values in L-world routinely go around lightly disguised as
references, special pseudo-references which do not retain object
identity. As long as nobody looks closely, the fiction that they are
references is unbroken. If someone tries an identity-sensitive
instruction, the game is over, but we think those embarrassing moments
will be rare.
On the other hand, if a method
C.m()LQ; uses a locally-declared
value type, then the JVM has some interesting options. It may choose
to notice that the
Q-value is not nullable and has no identity. It can
adjust the calling sequence of
m to work with undisguised “naked
values”, which are passed on the stack, or broken into components for
transport across the method API. This would almost be a purely
invisible decision, except that naked values cannot be null, and so
such calling sequences are hostile to null. Again, it “works like an
int”. A null
Integer value will do just the same thing if you try
to pass it to an
int-receiving method. So we have to be prepared
for an occasional embarrassing NPE, when one party thinks a type is a
nullable reference type and the other party knows it’s a value type.
One might think that it is straightforward to assign a value-using method a calling sequence by examining the method signature and the locally declared value types of the declaring class. But in fact there are non-local constraints. Only static and private methods can easily be adjusted to work with naked values.
Unlike fields, methods can override similar methods in some
super-class S. This immediately leads to the possibility of C and
S differing as to the status of some type
X in the method’s
signature. If neither of the
ValueTypes lists of C or S mentions
X, then the classes are agreed that
X is an object type
(even if in truth it happens to be a value type). They can agree
to use a reference-based calling sequence for some
m that works with X.
If both lists mention some
Q, then both classes agree, and in fact
it must be a value type. They might be able to agree to use “naked
values” for the
Q type when calling the method. Or not: they still
have to worry about other supers that might have another opinion about Q.
What if
C doesn’t list Q but
S does, and they share a method that uses
Q? For example, what about C.m()LQ; overriding S.m()LQ;? In
that case, the JVM may have already set up
S.m to return its
result as a naked value. Probably this happened before
C was even
loaded. The code for
C.m will expect simply to return a normal
reference. In reality, it will be unconsciously holding a
JVM-assigned pseudo-reference to the buffered
Q-value. The JVM must
then unwrap the reference into a naked value to match the calling
sequence it assigned (earlier, before
C was loaded) to S.m. The
bottom line is that even though
C.m was loaded as a
reference-returning function, the JVM may secretly rewrite it to
return a naked value.
Since C.m returns a reference, it might choose to return null.
What happens then? The secretly installed adaptation logic cannot
extract the components of a buffer that doesn’t exist. A
NullPointerException must be thrown, at the point where C.m returns to a caller of
S.m, which has the greater knowledge that
Q is a value
type (hence non-nullable). It will be as if the return instruction of
C.m included a hidden null check.
Is such a hidden null check reasonable? One might explain that the
C code thinks (wrongly) it is working with boxes, while the S code
knows it is working with values. If the method were C.m()Integer
and it were overriding
S.m()int, then if C.m returns null,
the adapter that converts to
S.m()int must throw NPE during the
implicit conversion from Integer to
int. A value “works like an
int”, so the result must be similar with a value type. It is as if
the deficient class
C were working with boxes for
Q (indeed that’s
all it sees) while the knowledgeable class
S is working with true
values. The NPE seems justifiable in such terms, although there is no
visible adapter method to switch descriptors in this case.
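The Integer-to-int analogy can be run directly. In this sketch a Supplier<Integer> plays the part of the box-returning body and an IntSupplier plays the part of the int-returning API; the class and method names are illustrative only.

```java
import java.util.function.IntSupplier;
import java.util.function.Supplier;

// Illustration of an adapter whose implicit unboxing acts as a hidden null check.
class UnboxingAdapter {
    // Adapts a box-returning body to an int-returning API.
    static IntSupplier adapt(Supplier<Integer> boxReturningBody) {
        // The Integer result is unboxed on return; a null result throws NPE here.
        return () -> boxReturningBody.get();
    }
}
```

The body that returned null never sees the exception as its own fault; it surfaces in the adapter, on the boundary between the two views of the type.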
The situation is a little odd when looked at the following way: If you
view nullability as a privilege, then this privilege is enjoyed only
by deficient classes, ones that have not yet been recompiled to “see”
that the type
Q is a value type. Ignorant classes may pass null
back and forth through
Q APIs, all day long, until they pass it
through a class that knows
Q is a value. Then an
NPE will end
their streak of luck. Is using
null a privilege? Well, yes, but
remember also that if
Q started its career as an object type, it was
a value-based class, and such classes are documented as being
null-hostile. The null-passers were in a fool’s paradise.
What if C lists
Q as a value but
S doesn’t? Then the calling
sequence assigned when
S was loaded will use references, and these
references will in fact be pseudo-references to buffered Q-values (or
null, as just discussed). The knowledgeable method C.m
will never produce a
null through this API. The JVM will arrange
to properly clothe the
Q-value produced by
C.m into a buffer
whose pointer can be returned from S.m.
Class hierarchies can be much deeper than just C and S, and
overrides can occur at many levels on the way down. Frederic Parain
has pointed out that the net result seems to be that the first
(highest) class that declares a given method (with descriptor) also
gets to determine the calling sequence, which is then imposed on all
overrides through that class. This leads to a workable implementation
strategy, based on v-table packing. A class’s v-table is packed
during the “preparation” phase of class linking, just after loading and
before any subclass v-table is packed. The JVM knows, unambiguously,
whether a given v-table entry is new to a class, or is being
reaffirmed from a previous super-class (perhaps with an override,
perhaps just with an abstract). At this point, a new v-table slot can
be given a carefully selected internal calling sequence, which will
then be imposed on all overrides. An old v-table slot will have the
super’s calling sequence imposed on it. In this scheme, the
interpreter and compiler must examine both the method descriptor and
some metadata about the v-table slot when performing a virtual call.
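The "first declarer wins" rule can be sketched as a small simulation of v-table packing. This is a hypothetical model, not HotSpot's data structure: slots are keyed by method name plus descriptor, each method is assumed to mention a single type of interest, and the two calling sequences are reduced to an enum.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of per-slot calling sequences fixed by the first declaring class.
class VTableSketch {
    enum CallingSequence { NAKED_VALUE, REFERENCE }

    // Slot key (name plus descriptor) to the calling sequence imposed on that slot.
    private final Map<String, CallingSequence> slots = new HashMap<>();

    // Packs a table for a class, given its super-class table (null for a root class),
    // its methods (slot key to the type mentioned in the signature), and its local
    // list of declared value types.
    static VTableSketch pack(VTableSketch superTable,
                             Map<String, String> methodToSignatureType,
                             Set<String> declaredValueTypes) {
        VTableSketch table = new VTableSketch();
        if (superTable != null) {
            table.slots.putAll(superTable.slots); // old slots keep the super's choice
        }
        for (Map.Entry<String, String> method : methodToSignatureType.entrySet()) {
            CallingSequence chosen = declaredValueTypes.contains(method.getValue())
                    ? CallingSequence.NAKED_VALUE
                    : CallingSequence.REFERENCE;
            // Only a slot that is new to this class gets this class's choice.
            table.slots.putIfAbsent(method.getKey(), chosen);
        }
        return table;
    }

    CallingSequence sequenceFor(String slotKey) {
        return slots.get(slotKey);
    }
}
```

In this model an overriding class's own opinion about Q never changes the slot; it only determines which adaptation the JVM must apply to that class's method body.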
A method coming in “sideways” from an interface is harder to manage. It is reasonable to treat such a method as “owned” by the first proper class that makes a v-table entry for it. But that only works for one class hierarchy; the same method might show up in a different hierarchy with incompatible opinions about value types in the method signature. It appears that interface default methods, if not class methods, must be prepared to use more than one kind of calling sequence, in some cases. It is as if, when a class uses a default method, it imports that method and adjusts the method’s calling sequence to agree with that class’s hierarchy.
Often an interface default method is completely new to a class hierarchy. In that case, the interface can choose the calling sequence, and this is likely to provide more coherent calling sequences for that API point.
These complexities will need watching as value types proliferate and begin to show up in interface-based APIs.
Let us assume that, if the verifier sees a value type, it should flag all invalid uses of that value type immediately, rather than wait for execution.
(This assumption can be relaxed, in which case many points in this section can be dropped. We may also try to get away with implementing as few of these checks as possible, saving them for a later release.)
When verifying a method, the verifier tracks and checks types by name,
mostly. Sometimes it pre-loads classes to see the class hierarchy.
With the ValueTypes attribute, there is no need to pre-load value
classes; the symbolic method is sufficient.
The verifier type system needs a way to distinguish value types from
regular object types. To keep the changes small, this distinction can
be expressed as a local predicate on type names called isValueType,
implemented by referring to
ValueTypes. In this way, the
StackMapTable attribute does not need any change at all. Nor does
the verifier type system need a change: value types go under the
Reference categories, despite the fact that value types
are not object types, and values are not references.
The verifier rules need to consult
isValueType at some points. The
assignability rules for
null must be adjusted to exclude value types:
isAssignable(null, class(X, _)) :- not(isValueType(X)).
This one change triggers widespread null rejection: wherever a value
type is required, the verifier will not allow a
null to be on the stack. If
null is on the stack and
Q is a value type, the
following will be rejected as a consequence of the above change:
putfield or putstatic to a field of type Q
areturn to a return type Q
any invoke passing null to a parameter of type Q
any invoke passing null to a receiver of type Q (but this is rare)
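The adjusted assignability rule can be mirrored in Java to see which cases it rejects. This is a hypothetical model of the verifier's type check: only the null case follows the rule above, and class-to-class assignability is crudely approximated by name equality, which the real verifier does not do.

```java
import java.util.Set;

// Hypothetical model of the null assignability rule; not the JVMS verifier.
class NullHygieneVerifier {
    // A verification type: either the special null type or a named class type.
    static final class VType {
        static final VType NULL = new VType(null);
        final String className; // null only for the verifier's null type

        private VType(String className) {
            this.className = className;
        }

        static VType ofClass(String name) {
            return new VType(name);
        }
    }

    private final Set<String> locallyDeclaredValueTypes;

    NullHygieneVerifier(Set<String> locallyDeclaredValueTypes) {
        this.locallyDeclaredValueTypes = locallyDeclaredValueTypes;
    }

    boolean isValueType(String name) {
        return locallyDeclaredValueTypes.contains(name);
    }

    boolean isAssignable(VType from, VType to) {
        if (from == VType.NULL) {
            // isAssignable(null, class(X, _)) :- not(isValueType(X)).
            return !isValueType(to.className);
        }
        return from.className.equals(to.className); // simplification, see lead-in
    }
}
```

Each rejected case in the list above reduces to one failed isAssignable query with null as the source type and Q as the target.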
Given comprehensive null blocking (along other paths also), the
implementation of the putfield (or
withfield) instruction could go
ahead and pull a buffered value off the stack without first checking for
null. If the verifier does not actually reject such nulls,
the dynamic behavior of the bytecodes themselves should, to prevent
null pollution from spreading.
The verifier rules for
checkcast only check that the
input type is an object reference of some sort. More narrow type
checks are performed at runtime. A null may be rejected dynamically
by these instructions, but the verifier logic does not need to track
nulls for them.
The verifier rules for
invokespecial have special cases for <init>
methods, but these do not need special treatment, since such calls
will fail to link when applied to a value type receiver.
The verifier could reject reference comparisons between value types and
other operands (including
null, other value types, and reference
types). This would look something like an extra pair of constraints
after the main assertion that two references are on the stack:
instructionIsTypeSafe(if_acmpeq(Target), Environment, _Offset, StackFrame,
                      NextStackFrame, ExceptionStackFrame) :-
    canPop(StackFrame, [reference, reference], NextStackFrame),
+   not( canPop(StackFrame, [_, class(X, _)], _), isValueType(X) ),
+   not( canPop(StackFrame, [class(X, _), _], _), isValueType(X) ),
    targetIsTypeSafe(Environment, NextStackFrame, Target),
    exceptionStackFrame(StackFrame, ExceptionStackFrame).
(The JVMS doesn’t use any such
not operator. The actual Prolog
changes would be more complex, perhaps requiring a
target type instead of
This point applies equally to if_acmpne.
This doesn’t seem to be worth while, although it might be
interesting to try to catch javac bugs this way. In any case, such
comparisons are guaranteed to return
false in L-world, and will
optimize quickly in the JIT.
In a similar vein, the verifier could reject monitorenter and
monitorexit instructions when they apply to value types:
instructionIsTypeSafe(monitorenter, _Environment, _Offset, StackFrame,
                      NextStackFrame, ExceptionStackFrame) :-
    canPop(StackFrame, [reference], NextStackFrame),
+   not( canPop(StackFrame, [class(X, _)], _), isValueType(X) ),
    exceptionStackFrame(StackFrame, ExceptionStackFrame).
A new or putfield could be quickly rejected if it applies to a value type:
instructionIsTypeSafe(new(CP), Environment, Offset, StackFrame,
                      NextStackFrame, ExceptionStackFrame) :-
    StackFrame = frame(Locals, OperandStack, Flags),
    CP = class(X, _),
+   not( isValueType(X) ),
    ...

instructionIsTypeSafe(putfield(CP), Environment, _Offset, StackFrame,
                      NextStackFrame, ExceptionStackFrame) :-
    CP = field(FieldClass, FieldName, FieldDescriptor),
+   not( isValueType(FieldClass) ),
    ...
Likewise, withfield could be rejected by the verifier if applied to a non-value type.
The effect of any or all of these verifier rule changes (if we choose
to implement them) would be to prevent local code from creating a
null and accidentally putting it somewhere a value type belongs, or
from accidentally applying an identity-sensitive operation to an
operand known statically to be a value type. These rules only work
when a sharp verifier type unambiguously reports an operand as null
or as a value type.
Nulls must also be rejected, and value types detected, when they are
hidden, at verification time, under looser types like
Protecting local code from outside
nulls must also be done
Omitting all of these rules will simply shift the responsibility for
null rejection and value detection fully to dynamic checks at
execution time, but such dynamic checks must be implemented in any
case, so the verifier’s help is mainly an earlier error check,
especially to prevent null pollution inside of a single stack frame.
For that reason, the only really important verifier change is the
isAssignable adjustment, mentioned first.
The dynamic checks which back up or replace the other verifier checks will be discussed shortly.
We need to discuss the awkward situation of
null being passed as a
value type, and value types being operated on as objects, by legacy
classfiles. One legacy classfile can dump null values into surprising
places, even if all the other classfiles are scrupulous about
We will also observe some of the effects of having value types “invade” a legacy classfile which expects to apply identity-sensitive operations to them.
By “legacy classfile” we of course mean classfiles which lack
ValueTypes attributes, and which may attempt to misuse value types
in some way. (Force of habit: it’s strong.) We also can envision
half-way cases where a legacy classfile has a ValueTypes attribute
which is not fully up to date. In any case, there is a type Q which
is not locally declared as a value type, by the legacy class C.
The first bad thing that can happen is that C declares a field of
type Q. This field will be formatted as a reference field, even
though the field type is a value type. Although we implementors might
grumble a bit, the JVM will have to arrange to use pseudo-pointers to
represent values stored in that field. (It’s as if the field were
volatile, or not flattenable for some other reason.) That wasn’t too
bad, but look what’s in the field to start with: It’s a null. That
means that any legitimate operation on this initial value will throw an
NPE. Of course, the writer of C knew Q as a value-based class,
so the initial null will be discarded and replaced by a suitable
non-null value, before anything else happens.
What if C makes a mistake, and passes a null to another class A
which does know Q is a value? At that point we have a choice, as
with the verifier’s null rejection, whether to do more work to detect
the problem earlier, or whether to let the
null flow through and
eventually cause an
NPE down the line. Recall that if an API point
gets a calling sequence which recognizes that
Q is a value type, it
will probably unbuffer the value, throwing
NPE immediately if C
makes a mistake. This is good, because that’s the earliest we could
hope to flag the mistake. But if the method accepts the boxed form of
Q, then the
null will sneak in, skulk around in the callee’s stack
frame, and maybe cause an error later.
Meanwhile, if the JVM tries to optimize the callee, it will have to
limit its optimizations somewhat, because the argument value is
nullable (even if only ever by mistake). To cover this case, it may
be useful to define that method entry to a method that knows about
Q is null-hostile, even if the calling sequence for some reason
allows references. This means that, at function entry, every known
value type parameter is null-checked. This needs to be an official
rule in the JVM, not just an optimization for the JIT, in order for
the JIT to use it.
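As a rough illustration of null-hostile method entry, the following Java sketch hand-codes the check that the rule above would have the JVM perform on every known value-type parameter. The names Point and NullHostileEntry are hypothetical, Point is an ordinary final class standing in for a value type Q, and the explicit Objects.requireNonNull call only emulates what would be an implicit JVM check.

```java
// Hypothetical stand-in for a value type Q (an ordinary final class here).
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

final class NullHostileEntry {
    // Emulates the proposed rule by hand: the value-type parameter is
    // null-checked at method entry, before any of the body runs.
    static int sumCoordinates(Point p) {
        java.util.Objects.requireNonNull(p, "value-type parameter p must not be null");
        return p.x + p.y;
    }

    public static void main(String[] args) {
        System.out.println(sumCoordinates(new Point(3, 4)));
        try {
            sumCoordinates(null); // what a legacy caller might do by mistake
        } catch (NullPointerException e) {
            System.out.println("rejected at entry: " + e.getMessage());
        }
    }
}
```

Because the check is part of the method's contract rather than an optimization, code after the check may assume the parameter is non-null on every path.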
What if our
C returns a
null value to a caller who intends to use
it as a value? That won’t go well either, but unless we detect the
null aggressively, it might rattle around for a while, disrupting
optimization, before producing an inscrutable error. (“Where’d that
null come from??”). The same logic applies as with arguments: When
null is returned from a method call that purports to return Q,
this can only be from a legacy file, and the calling sequences were
somehow not upgraded. In that case, the JVM needs to mandate a null
check on every method invocation which is known to return a value
type.
The same point also applies if another class A, which knows Q as a
value type, happens to load a null from one of C’s fields. The
C field is formatted as a reference, and thus can hand A a null, but
A must refuse to see it, and throw NPE. The getfield instruction, if
it is pointed at a legacy non-flattened field, will need to null-check
the value loaded from the field.
C is allowed to store null all day
long into its own fields (and fields of other benighted legacy classes
that it may be friends with). Thus, the getfield and putfield
instructions link to slightly different behavior, not only based on
the format of the field, but also based on “who’s asking”. Code in
C is allowed to witness
nulls in its
Q fields, but code in A
(upgraded) is not allowed to see them, even though it’s the same
getfield to the same symbolic reference. Happily, fields are not
shared widely across uncoordinated classfiles, so this is a corner
case mainly for testers to worry about.
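The "who's asking" distinction can be sketched in plain Java. This is only a model under stated assumptions: LegacyC and GetfieldSketch are hypothetical names, and the boolean parameter stands in for whether the accessing classfile locally declares Q as a value type, something a real JVM would decide when linking the getfield.

```java
// Hypothetical legacy class C: its Q-typed field is laid out as a plain reference.
final class LegacyC {
    Object qField; // starts out null, as any reference field does
}

final class GetfieldSketch {
    // Emulates a linked getfield on LegacyC.qField. The boolean stands in for
    // whether the accessing class locally declares Q as a value type.
    static Object getQ(LegacyC holder, boolean accessorDeclaresQ) {
        Object loaded = holder.qField;
        if (accessorDeclaresQ && loaded == null) {
            throw new NullPointerException("null found in legacy field of value type Q");
        }
        return loaded; // legacy accessors may still witness null
    }

    public static void main(String[] args) {
        LegacyC c = new LegacyC();
        System.out.println(getQ(c, false)); // legacy view of the field
        try {
            getQ(c, true); // upgraded view of the same field
        } catch (NullPointerException e) {
            System.out.println("upgraded accessor: " + e.getMessage());
        }
    }
}
```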
What if C stores a
null into somebody else’s
Q field, or into an
element of a
Q array? In that case,
C must throw an immediate
NPE; there’s no way to reformat someone else’s data structure,
C may be.
What if C gets a null value from somewhere and casts it to Q?
Should checkcast throw NPE (as it should in a classfile where
Q is known to be a value type)? For compatibility, the answer is
“no”; old code needs to be left undisturbed if possible. After all,
C believes it has a legitimate need for
nulls, and won’t be fixed
until it is recompiled and its programmer fixes the source code.
That’s about it for
null. If the above dynamic checks are
implemented, then legacy classfiles will be unable to disturb upgraded
classfiles with surprise null values. The goal mentioned above of
blocking null on all paths is fulfilled by blocking null
across API calls (which might have a legacy class on one end), and by
ensuring that nulls never mix with values, locally within a single
stack frame.
There are a few other things
C’s could do to abuse
Legacy code needs to be prevented immediately from making any of the
putfield to a field of
Q value should throw
Happily, the above rules are not specific to legacy code but apply uniformly everywhere.
A final mistake is executing an
acmp instruction on a value type.
Again, this is possible everywhere, not just in legacy files, even if
the verifier tries to prevent the obvious occurrences. There are
several options for
acmp on value types. The option which breaks
the least code and preserves the O(1) performance model of acmp is
to quickly detect a value type operand and just report false, even
if the JVM can tell, somehow, that it’s the same buffer containing the
same value, being compared to itself.
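A minimal Java model of this acmp option follows. It assumes a hypothetical marker interface, ValueLike, to flag value types (a real JVM would consult class metadata instead), and Money is an ordinary class standing in for a value type.

```java
// Hypothetical marker for value types; the real JVM would consult class metadata.
interface ValueLike {}

final class Money implements ValueLike {
    final long cents;
    Money(long cents) { this.cents = cents; }
}

final class AcmpSketch {
    // Emulates the cheapest acmp policy: any value operand makes the answer false.
    static boolean acmp(Object a, Object b) {
        if (a instanceof ValueLike || b instanceof ValueLike) {
            return false; // O(1): no field-by-field comparison, even when a == b
        }
        return a == b; // ordinary reference comparison
    }

    public static void main(String[] args) {
        Money m = new Money(500);
        Object o = new Object();
        System.out.println(acmp(m, m));       // false
        System.out.println(acmp(o, o));       // true
        System.out.println(acmp(null, null)); // true
    }
}
```

The first comparison reports false even though both operands are the same buffer, which is exactly the behavior the option trades for constant-time cost.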
All of these mistakes can be explained by analogy, supposing that the
legacy class C were working with a box type Integer where other
classes had been recoded to use int. All variables under C’s
control are nullable, but when it works with new code it sees only
int variables. Implicit conversions sometimes throw NPE, and acmp (or
monitorenter) operations on boxed Integer values yield
unspecific (or nonsensical) results.
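The analogy can be tried directly in today's Java. In this small sketch (BoxAnalogy is a hypothetical class name), the unboxing conversion plays the role of a null-hostile API boundary, and == on boxed Integer values plays the role of acmp on a value type.

```java
final class BoxAnalogy {
    // Implicit unboxing conversion: throws NullPointerException if boxed is null.
    static int unbox(Integer boxed) {
        return boxed;
    }

    // Reference comparison on boxes: the result depends on whether the two
    // boxes happen to be the same object, not only on their numeric values.
    static boolean sameBox(Integer a, Integer b) {
        return a == b;
    }

    public static void main(String[] args) {
        try {
            unbox(null); // "legacy" code handing null to int-based code
        } catch (NullPointerException e) {
            System.out.println("NPE on implicit conversion");
        }
        Integer a = Integer.valueOf(1000);
        Integer b = Integer.valueOf(1000);
        System.out.println(a.equals(b));   // true: same numeric value
        System.out.println(sameBox(a, b)); // unspecified: depends on box caching
    }
}
```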
Linked instructions which are clearly wrong should throw a
LinkageError of some type. Examples already given are new and
putfield on value types.
When a field reference of value type is linked it will have to
correctly select the behavior required by both the physical layout of
the field, and also the stance toward any possible
null if the field
is nullable. (As argued above, the stance is either lenient for
legacy code or strict for new code.)
getstatic linkage may elect to replace an invisible null with
a default value.
When an invoke is linked it will have to arrange to correctly
execute the calling sequence assigned to its method or its v-table.
Linkage of invokeinterface will be even more dynamic, since the
calling sequence cannot be determined until the receiver class is
known.
Linkage of dynamic constants in the constant pool must reject null
for value types. Value types can be determined either globally based
on the resolved constant type, or locally based on the ValueTypes
attribute associated with the constant pool in which the resolution
occurs.
Most of the required dynamic behaviors to support value type hygiene
have already been mentioned. Since values are identity-free and
non-nullable, the basic requirement is to avoid storing nulls in
value-type variables, and degrade gracefully when value types are
queried about their identities. A secondary requirement is to support
the needs of legacy code.
For null hygiene, the following points apply:
checkcast should reject null for locally declared value types, but not for others.
withfield instructions should do so dynamically. (Otherwise the other rules are sufficient to contain
aastore to a value type array (
Q) should reject null even if the array happens to use invisible indirections as an implementation tactic (say, for jumbo values). This is a purely dynamic behavior, not affected by the ValueTypes attribute.
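The array-store point above can be modeled with a small wrapper class. This is a sketch only: ValueArraySketch is a hypothetical name, the Object[] backing store imitates the "invisible indirections" tactic, and pre-filling the slots with a caller-supplied default value is an assumption about how such arrays would avoid initial nulls.

```java
final class ValueArraySketch<T> {
    private final Object[] slots; // indirections used as an implementation tactic

    // Assumption: every element starts as a non-null default value.
    ValueArraySketch(int length, T defaultValue) {
        java.util.Objects.requireNonNull(defaultValue, "default value");
        slots = new Object[length];
        java.util.Arrays.fill(slots, defaultValue);
    }

    // Emulates aastore into a value type array: null is rejected dynamically,
    // regardless of who the caller is.
    void store(int index, T value) {
        if (value == null) {
            throw new NullPointerException("null stored into value type array");
        }
        slots[index] = value;
    }

    @SuppressWarnings("unchecked")
    T load(int index) {
        return (T) slots[index]; // never null, given the checks above
    }
}
```

For example, `new ValueArraySketch<String>(2, "")` accepts `store(0, "x")` but throws on `store(1, null)`, leaving element 1 at its default.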
Linked field and invoke instructions need sufficient linkage metadata
to correctly flatten instance fields and use unboxed (and/or
null-hostile) calling sequences.
As discussed above, the acmp instruction must short-circuit on values.
This is a dynamic behavior, not affected by the ValueTypes attribute.
Generally speaking, any instruction that doesn’t refer to the constant
pool cannot have contextual behavior, because there is no place to
store metadata to adjust the behavior. The
areturn instruction is
an exception to this observation; it is a candidate for bytecode
rewriting to gate the extra null check for applicable methods.
Some adjustments may be needed for the various reflection APIs, in order to bring them into alignment with the changed bytecode.
Class.cast should be given a null-hostile partner Class.castValue, to emulate the updated checkcast.
Field should be given a dynamic withfield, and the Lookup API given a way to surface the corresponding MH.
Class.getValueTypes, to reflect the attribute, may be useful.
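To make the proposed cast partner concrete, here is a hedged Java sketch. Class.castValue does not exist in the JDK; the static helper below is a hypothetical emulation built on the real Class.cast and Objects.requireNonNull, and String is used only as a convenient stand-in type.

```java
final class CastValueSketch {
    // Hypothetical helper modeled on the proposed Class.castValue:
    // like Class.cast, but hostile to null.
    static <T> T castValue(Class<T> type, Object obj) {
        java.util.Objects.requireNonNull(obj, "null cannot be cast to a value type");
        return type.cast(obj); // may still throw ClassCastException, as Class.cast does
    }

    public static void main(String[] args) {
        System.out.println(String.class.cast(null)); // Class.cast lets null through
        System.out.println(castValue(String.class, "ok"));
        try {
            castValue(String.class, null);
        } catch (NullPointerException e) {
            System.out.println("castValue: " + e.getMessage());
        }
    }
}
```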
The details are complex, but the story as a whole becomes more intelligible when we require each classfile to locally declare its value types, and handle all values appropriately according to the local declaration.
Outside of legacy code, and at its boundaries, tight control of null values is feasible. Inside value-rich code, and across value-rich APIs, full optimization seems within reach.
Potential problems with ambiguity in L-world are effectively addressed
by a systematic side channel for local value type declarations,
assisting the interpretation of
L-type descriptors. This side
channel can be the