% A Field Guide to Java Field Types
% as told to the Valhalla Expert Group
% PROVISIONAL PROVISIONAL November, 2021 DRAFT DRAFT YMMV
# Oh, look, you found a Java field type (or are seeking one now). {#START}
> **Fine print** looks like this. You can skip it.
> The form of this Field Guide will be a series of questions,
like a flow-chart. You can see a picture of this flow
chart here:
> Because this Field Guide was written on the cheap, it has no
pictures. However, its author has drawn up a nice map of
the Zoo of Java Field Types for you in another place:
> There is no word on whether a Method Guide is in the
offing. Don't wait up.
# First, decide if it is a reference or primitive: {#reforprim}
Pick one of the following questions to answer...
* _nullable:_ [Can a variable of this type take `null` as a
value?](#isitnullable)
* _circular:_ [Does this type contain a field of its own type and/or
does it require lazy loading?](#isitcircular)
* _secure:_ [Must this type be securely constructed and/or safely
published?](#isitsecure)
* _polymorphic:_ [Is this type polymorphic, allowing values of more than
one subtype?](#isitpolymorphic)
Congratulations; if you answered any of the above questions
you may well have provided answers for them all.
## _nullable:_ Can a variable of this type take `null` as a value? {#isitnullable}
### Yes, it likes `null` ⇒ it's a [reference type](#ref) {#nullable}
If one can always assign `null` to a variable of this type,
then `null` is also the default initial value of this type
for fields and array elements. If this is the case you have
found a [reference type](#ref)!
### It rejects `null` ⇒ it's a [primitive type](#prim) {#antinull}
For any type that does not mix with `null`, the default
initial value is some sort of zero-like scalar or aggregate.
The fields of an aggregate default will be assigned
(recursively) to their own particular default values. If
this is the case you are looking at a [primitive
type](#prim).
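
Both default-value behaviors can already be observed in today's Java, where the only `null`-rejecting types are the legacy scalar primitives. The following is a minimal sketch; the class and member names are made up for illustration.

```java
// Illustrative only: class and member names are hypothetical.
class DefaultValueDemo {
    // Fields are deliberately left unassigned so they keep their defaults.
    static String refField;    // reference type: starts out as null
    static int intField;       // scalar primitive: starts out as 0
    static boolean boolField;  // scalar primitive: starts out as false

    // Returns the first element of a freshly allocated reference array.
    static Object firstOfNewRefArray() {
        String[] fresh = new String[4];
        return fresh[0];
    }

    // Returns the first element of a freshly allocated long array.
    static long firstOfNewLongArray() {
        long[] fresh = new long[4];
        return fresh[0];
    }

    public static void main(String[] args) {
        System.out.println(refField + " " + intField + " " + boolField);
        System.out.println(firstOfNewRefArray() + " " + firstOfNewLongArray());
    }
}
```
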
## _circular:_ Does it contain itself circularly? {#isitcircular}
Does the type need to contain an instance of itself, as one
of its fields? If so, it needs a circular reference to
itself.
> More subtly, is it part of a circle of types each of which
contains the next one in the circle? If so, at least one
type in the circle must have the power to refer to itself
circularly, even though the reference is indirect through
the other parties in the circle.
### Circular self-reference ⇒ must be a [reference type](#ref) {#circular}
If this type needs to refer to itself, the cycle must be
broken somewhere by a `null`. So this type is certainly a
[reference type](#ref).
> We don't want `null`-hostile types to be circular, since
the default value of such a type would denote an endless
loop through itself, like a hall of mirrors. While
theoretically possible, this is not a desirable language
feature.
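
A self-referential type in today's Java shows how `null` breaks the cycle. This sketch uses a hypothetical `Link` class; the chain only ends because the last `next` field is allowed to be `null`.

```java
// Hypothetical singly linked node; the names are illustrative.
final class Link {
    final int value;
    final Link next;   // self-reference: null marks the end of the chain

    Link(int value, Link next) {
        this.value = value;
        this.next = next;
    }

    // Walks the chain until it reaches the terminating null.
    static int length(Link head) {
        int n = 0;
        for (Link p = head; p != null; p = p.next) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Link list = new Link(1, new Link(2, new Link(3, null)));
        System.out.println(length(list));
    }
}
```
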
### A non-recursive tuple ⇒ could be a [primitive type](#prim) {#nonrecursive}
This type quickly bottoms out into fields which are scalar
primitives and references.
Although it could be a [reference type](#ref) (for other
reasons) you may have found a [primitive type](#prim). You
need to [try another question](#reforprim) to see if it
really is a reference.
## _secure:_ Must every instance come from a constructor execution? {#isitsecure}
Try this checklist, and see if you get any "yes" answers:
- Does the constructor validate important class-level
invariants?
- Would the default all-zero state of the object be
inconvenient if your clients got hold of it?
- If someone were to create data races on your object,
would you care, or would you simply say "don't do that"?
If you said "yes", you are looking for secure construction.
> If your object is mutable you might also be looking for
synchronization; hold that thought for later.
### Secure construction is required ⇒ it's a [reference type](#ref) {#secure}
It must never be the case that an instance of this type can
ever be observed, by any thread, which was not produced by a
successful exit from that type's constructor.
To avoid leaking the all-zero default value of this type,
arrays of this type must be initialized to `null`. Thus,
this condition is really [nullability](#nullable) in
disguise.
> If the default value of this type were not `null` it would
have to be a tuple of zero or `null` scalars. For such a
type, clients would have to tolerate that default value, and
methods might have to include validity-checking logic to
check for that default value.
> We are consciously avoiding the complexity of user-defined
defaults, as that would create new kinds of initialization
operations in the JVM beyond simply stamping down zero bits.
You are certainly chasing a [reference type](#ref) here.
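
As a rough illustration in today's Java, consider a hypothetical `Ratio` class whose invariant (a non-zero denominator) would be violated by an all-zero instance. Because `Ratio` is a reference type, a fresh array holds `null` rather than an unvalidated `Ratio`.

```java
// Hypothetical class with a validating constructor; names are illustrative.
final class Ratio {
    final int num;
    final int den;

    Ratio(int num, int den) {
        // The all-zero state (den == 0) would break this invariant.
        if (den == 0) {
            throw new IllegalArgumentException("denominator must be non-zero");
        }
        this.num = num;
        this.den = den;
    }

    // Reports whether the constructor refuses the given arguments.
    static boolean rejects(int num, int den) {
        try {
            new Ratio(num, den);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Ratio[] slots = new Ratio[2];
        // Array elements start as null, so no unconstructed Ratio is visible.
        System.out.println(slots[0] == null);
        System.out.println(rejects(1, 0));
    }
}
```
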
### (Should you be thinking about data races?) {#mayberaces}
Races can subvert the apparent security of a class's
constructors, separately from the effects of non-`null`
default values. This complicated subject is discussed
[further elsewhere](#races).
A type may explicitly shift the burden of synchronizing
against races onto its clients, as may be seen with
[`ArrayList`] or [`HashMap`], whose documentation says
something like **Note that this implementation is not
synchronized**. Or, as in the case of iterators for those
collections, a type may simply permit undefined behavior
from data races, relying on lower-level guarantees from the
JVM to prevent crashes.
[`ArrayList`]:
[`HashMap`]:
Such a type's constructor logic may check against everyday
bugs and edge cases. This level of validation, while not
resistant to races, is often acceptable. If the type you
are looking at works like this, you may indeed still be
looking at an [extended primitive type](#exprim).
### All-zero default, maybe races ⇒ must be a [primitive type](#prim) {#loose}
We might ask ourselves, why would we _not_ want enforcement
of constructor security and the atomicity of variables?
Well, defending a bunch of fields against default values and
data races might have a cost in performance or footprint,
since race protection often uses extra indirections or side
locks or other fancy machinery.
> For example, we probably don't care that the real and
imaginary components of a complex number can race relative
to each other. Race conditions on complex numbers can
produce numeric errors, but (in most cases) having a mix of
real and imaginary components from different threads does
not seem any worse than having a mix of whole values from
those threads. We might feel differently if the components
represented, instead, parts of a cryptographic key.
> Loading and storing each of a group of loosely aggregated
values is probably always going to be faster than loading or
storing the group as an atomic whole. Probably complex
numbers should not try to resist races, even if
cryptographic keys should be more resistant.
> And if you need race protection later on, you can wrap the
whole thing in a type which _does_ prevent struct tearing.
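
One way to picture such a wrapper in today's Java is to publish an immutable snapshot through a single reference. This is only a sketch under that assumption; `ComplexSnapshot` and `ComplexCell` are hypothetical names, not part of any proposal.

```java
// Hypothetical names throughout; plain Java, not Valhalla syntax.
final class ComplexSnapshot {
    final double re;
    final double im;

    ComplexSnapshot(double re, double im) {
        this.re = re;
        this.im = im;
    }
}

final class ComplexCell {
    // A single reference is read or written as a unit, so a reader never
    // pairs the real part of one update with the imaginary part of another.
    private volatile ComplexSnapshot current = new ComplexSnapshot(0.0, 0.0);

    void set(double re, double im) {
        current = new ComplexSnapshot(re, im);
    }

    ComplexSnapshot get() {
        return current;
    }

    public static void main(String[] args) {
        ComplexCell cell = new ComplexCell();
        cell.set(1.5, -2.0);
        ComplexSnapshot seen = cell.get();
        System.out.println(seen.re + " " + seen.im);
    }
}
```

The price is the extra indirection and allocation that the surrounding text warns about, which is why the loose form may be preferable when tearing is tolerable.
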
So if you only need a bunch of loosely coordinated field
values, with a constructor performing "best efforts"
validation, you might want to use an [extended primitive
type](#exprim).
And of course, if you don't need a "smart" constructor,
simple records, arrays, or bare scalar primitives are all
good options to consider.
## _polymorphic:_ Is it a polymorphic type? {#isitpolymorphic}
Can values of this type possess distinct subtypes?
> There might be only one such subtype, but that would count
as distinct from _this_ type, the polymorphic supertype.
### Distinct subtypes ⇒ must be a [reference type](#ref) {#polymorphic}
If this type needs to refer to values of other types, then
this type is certainly a [reference type](#ref). Only
references are polymorphic.
> Some polymorphic types are `interfaces` or `abstract`
classes, so that there are no actual instances of that exact
type. But a class that is neither `abstract` nor `final`
can have instances of its own exact type, _and_ is also
polymorphic, since the type can refer to subclasses as well.
`Object` is famously both concrete and polymorphic. It acts
like an "honorary interface". You can also make an
instance of plain old `Object`, though this is less useful
than one might think.
> A primitive can implement polymorphism indirectly by
including polymorphic component fields. Or it can simulate
polymorphism, alongside the Java type system, by internally
encoding a type tag and providing cast-like extraction
operations for the various possible alternative types.
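
The type-tag approach can be sketched in today's Java with an ordinary `final` class. Everything here (`IntOrText`, the tag constants, the `asInt` and `asText` methods) is a hypothetical illustration of the idea, not a proposed API.

```java
// Hypothetical tagged value: one final class simulating two alternatives.
final class IntOrText {
    private static final int TAG_INT = 0;
    private static final int TAG_TEXT = 1;

    private final int tag;          // internal type tag
    private final int intPart;      // meaningful only when tag == TAG_INT
    private final String textPart;  // meaningful only when tag == TAG_TEXT

    private IntOrText(int tag, int intPart, String textPart) {
        this.tag = tag;
        this.intPart = intPart;
        this.textPart = textPart;
    }

    static IntOrText ofInt(int v) {
        return new IntOrText(TAG_INT, v, null);
    }

    static IntOrText ofText(String s) {
        return new IntOrText(TAG_TEXT, 0, s);
    }

    boolean isInt() {
        return tag == TAG_INT;
    }

    // Cast-like extraction: fails if the tag does not match.
    int asInt() {
        if (tag != TAG_INT) {
            throw new ClassCastException("not an int");
        }
        return intPart;
    }

    // Cast-like extraction for the other alternative.
    String asText() {
        if (tag != TAG_TEXT) {
            throw new ClassCastException("not text");
        }
        return textPart;
    }
}
```
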
### No subtypes ⇒ could be a [primitive type](#prim) {#invariant}
A type allows no subtypes when its class is declared
`final`, explicitly or implicitly. Such a type can be an
[extended primitive type](#exprim) (which is implicitly
declared `final`) or a [reference type](#ref) such as a
`final` class or an `enum`.
You need to [try another question](#reforprim) to see if it
really is a reference.
> A legacy [scalar primitive](#scalarprim) has hardwired rules
that purport to define subtypes and supertypes. But `int`
is a subtype of `long` only in the sense that there are
specially defined conversions between the respective value
sets.
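
A small check in today's Java makes the distinction visible: an `int` converts to a `long`, but a boxed `int` is not an instance of `Long`. The class and method names below are illustrative.

```java
// Illustrative names; shows conversion, not subtyping, between int and long.
class WideningDemo {
    // An int placed in an Object variable is boxed as an Integer.
    static boolean boxedIntIsLong(int i) {
        Object boxed = i;
        return boxed instanceof Long;  // Integer is not a subclass of Long
    }

    // A widening primitive conversion turns the int value into a long value.
    static long widen(int i) {
        return i;
    }

    public static void main(String[] args) {
        System.out.println(boxedIntIsLong(42));
        System.out.println(widen(42));
    }
}
```
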
# _PRIM:_ It's a primitive, not a reference {#prim}
Having ruled out references, now you know you are looking at
a primitive type! One more quick question:
## Is this primitive defined by a class-like body? {#isitprimclass}
### No class, no fields ⇒ a legacy [scalar primitive](#scalarprim) {#scalarprim}
If it has no fields, it's one of eight legacy types like
`int`. Here you have one of the few denizens of the ancient
island of [scalar primitives](#scalarprim). No class, no
fields, no header word. Like Robinson Crusoe, it's
[primitive as can be].
> Every data structure, when viewed as a large graph of
aggregates, bottoms out (ignoring reference cycles) in some
mix of scalar primitives and/or `null` references. When
viewed as a local block of data (such as might be found in
the JVM heap), every data structure bottoms out in a mix of
scalar primitives and references, where each reference is
either `null` or points to some other block of data.
Visualized as a pointer, a reference is also a kind of
scalar. In this view, scalars--both primitives and
references--are the building blocks of Java data structures,
which combine scalars using classes and arrays.
[primitive as can be]:
# _EX-PRIM:_ It's an extended primitive type {#exprim}
This is an _extended primitive type_. Although it codes
like a class, it works more like an `int`.
With its class hat on, it has constructors, fields, methods,
and supers. The language forces the class to be `final` as
well as all of its fields. In fact, it will act like a
[value-based class], with extra enforcement of the rules
that apply to such classes.
With its primitive hat on, it has no object identity, but
rather acts like a loose bundle of variables, the instance
fields (non-`static` fields) declared by its class.
The default value of an extended primitive type is simply
the individual default values of its fields. This type
cannot directly represent the `null` reference. Also, it
cannot contain fields of its own type, either directly or
indirectly.
> Although the class has a constructor, which may attempt to
enforce class invariants, the constructor has no ability to
reject the all-zero default value. Sorry, Mr. Constructor.
> Data races among the component fields (or among the 32-bit
words of `long` or `double` fields) can cause [struct
tearing] on heap variables of this type, leading to
paradoxical values becoming visible, which might have been
rejected by constructor logic.
Depending on context, you might prefer to think of instances
of an extended primitive type behaving like a loose group of
scalar values (old-school primitives and reference pointers)
running around inside the computer in any of these forms:
- flattened into a group of scalars laid out in a single
array element or field
- flattened into a group of registers
holding an argument, return value, or local
- broken down into a group of scalar IR nodes
- viewed via a pointer, a group of scalars safely buffered
together in a block on the heap (like an `int` boxed as
an `Object`)
All of these representations except the last are generically
called "flattened" or "scalarized". The last one is called
"buffered" or sometimes "boxed".
Whether or not you think of any of these representations,
the VM might be doing it behind your back.
### (Wait, isn't this stuff already done by Escape Analysis?) {#ea}
No. Escape analysis looks good in theory but in practice it
can only speculate, and so the wins it gives are unreliable,
provisional, and limited. Extended primitives and pure
objects provide a new contract that is a reliable,
permanent, and global basis for a wider class of
optimizations.
> The flattening of objects into scalar IR nodes is the
triumphant result of the very famous "Escape Analysis" based
optimizations. The reasoning is that if an object does not
"escape" from a known set of uses in a sea of IR nodes, then
its heap allocation can be deferred or elided. This is very
cool when it happens, but it doesn't happen often enough.
Pure reference and extended primitive types are defined in
such a way that the heroics of Escape Analysis, per se, are
not needed, once the JIT or JVM recognizes one of those
types. The best Escape Analysis can do is to guess that all
_known_ uses of a type are compatible with flattening and
scalarization. But, in practice, Escape Analysis always
fails to rule out some impending future use of an object
that is incompatible with the flattening and scalarization;
it must therefore be tentative and conservative, always ready
to roll back its decisions. Meanwhile, our new Valhalla
types have flattening and scalarization "baked in" from the
start. There is no need for global analysis heroics, and
the benefits of flattening appear uniformly everywhere: In
arguments and return values, on the heap, and in the IR
itself. Escape Analysis never flattens with such
consistency, except in the most feverish dreams of
theorists.
### Would you like it in a box? ⇒ as a [pure object](#noid) {#boxed}
Primitives are shape-shifters, so you may observe them in
either of two forms, the `int`-like flattened form and the
object-like reference form. Yes, after all this work
identifying a primitive type, you might find it scuttling
away to go be a reference after all.
So every primitive declaration `P` defines an extended primitive
type (named `P`). But there's more; the declaration also defines
a pure object reference type `P.ref`. (Syntax is TBD; think
`P.ref`, `P.box`, or perhaps even the deeply tantalizing
`P?`.) Thus, no primitive type _is_ a reference type, but
every primitive type _has_ a companion reference type.
> It is not certain whether we want to say that `P` is an
object type per se. And if not, `P` might not even be a
class. We might be looking at a world of classes,
interfaces, and primitives, all of them with class-like
bodies, but all of them distinct kinds of declaration.
Casting a primitive to `Object` repackages it as a reference
to a pure object, keeping its value the same. (That is,
each and every field value is preserved, and the class stays
the same too.) We use the old term "boxing" for this,
although there is no separate box _class_.
> There may be slightly different rules for legacy
primitives like `int`, which have their own old family
relations to `Integer` and the other wrappers.
Casting a reference to a primitive first checks that the
reference is not null, and is in fact of the correct
primitive type, and then just returns the primitive. Again,
we use the old term "unboxing" for this, although there is
no separate box to recycle.
> Buffered values are not exactly boxes in the sense of Java
autoboxing, because the JVM manages the transformation, not
the language. The JVM might accommodate boxing and unboxing
requests by buffering and unbuffering the value through the
JVM's garbage collected heap. But because the box is
virtually non-existent, the JVM can often keep packaging
waste to an environmentally friendly minimum.
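
The checks performed on the way back from a reference can be approximated today with the legacy wrapper `Integer`, used here only as a stand-in since the rules for extended primitives are still provisional. In this sketch (hypothetical class and method names) the cast rejects the wrong class and the unboxing step rejects `null`.

```java
// Stand-in sketch using the legacy Integer wrapper; names are illustrative.
class UnboxingChecks {
    // Tries to turn a reference back into an int and reports what happened.
    static String unbox(Object ref) {
        try {
            // The cast checks the class; the unboxing conversion checks for null.
            int value = (Integer) ref;
            return "ok:" + value;
        } catch (NullPointerException e) {
            return "null";
        } catch (ClassCastException e) {
            return "wrong type";
        }
    }

    public static void main(String[] args) {
        System.out.println(unbox(5));
        System.out.println(unbox(null));
        System.out.println(unbox("five"));
    }
}
```
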
When viewed as a reference `P.ref`, a primitive `P`
continues to represent the same value set, except that the
reference can also be set to `null`. The reference can of
course be assigned to a polymorphic type like `Object`.
> Even if an extended primitive is buffered on the heap, it
does not gain [object identity](#idobj) but rather
behaves like an identity-free [pure object](#noid).
In particular, the `==` operator (and `acmp` bytecode)
performs field-wise equality checks as needed to see if two
references to the same extended primitive type actually
refer to the same value. Thus, the rules for equality are
the same whether or not the compared primitives are buffered
in the JVM heap, or whether one buffered value is being
compared to itself or two distinct buffered values are
compared to see if they are field-wise equal.
### You may like them in a tree? {#primtrees}
An extended primitive can bend the rules against [circular
self-reference](#circular) by referring to itself, from one
of its own fields, by means of its own reference type. Such
a reference can never be circular (try it!), but it can
allow the primitive type to represent trees and DAGs. Thus,
if you would like your primitive in a tree, you must also
consent to have it in a box in a tree.
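
Today's Java offers a rough analogy: a class whose fields are all `final` and whose children are supplied to the constructor. In ordinary code a child must exist before its parent does, so no node ends up reachable from itself, yet the same subtree can be shared. The `Tree` class below is a hypothetical sketch of that shape.

```java
// Hypothetical immutable tree node; today's Java, identity and all.
final class Tree {
    final int value;
    final Tree left;   // null means "no child"
    final Tree right;  // null means "no child"

    Tree(int value, Tree left, Tree right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // Counts node visits; a shared subtree is counted once per path to it.
    static int visits(Tree t) {
        return t == null ? 0 : 1 + visits(t.left) + visits(t.right);
    }

    public static void main(String[] args) {
        Tree leaf = new Tree(1, null, null);
        Tree dag = new Tree(2, leaf, leaf);  // the same leaf used twice: a DAG
        System.out.println(visits(dag));
    }
}
```
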
> In principle, a tree of primitives could be the backing
storage of a `List`, `Set`, or `Map` instance. If that were
the case, then the `==` operator would perform structural
checks on the elements, keys, or values of the instance.
You may like them; you will see. (Thank you, Dr. Seuss.)
# _REF:_ It's a reference type {#ref}
By whatever route you got here, you have arrived at a
reference type. Most types in Java are references, because
reference types have so many useful properties.
Next, quickly rule out whether this type is a very special
classless reference type.
## Is this reference type defined by a class-like body? {#isitclassy}
### A length field and components ⇒ it's an [array](#array) {#array}
Oh, look, you found an [array type](#array). (Hoo-array!)
If you'd like to investigate its component type, you'll have
to [start again at the top.](#reforprim)
Every other reference type, as well as every [extended
primitive](#exprim), enjoys the benefits of a class-like
declaration.
## Is this reference type non-concrete? {#isitconcrete}
That is, are all values of this type instances of some
_other_ type?
As a squirrely corner case, we define by fiat that `Object`
is an "honorary interface". If you make an instance, as
with `new Object()`, we shall pretend that the instance is
of some subtype of the interface `Object`, a very
uninteresting concrete class which looks a lot like
`Object`.
### Not concrete ⇒ abstract polymorphic. {#polymorphicagain}
We already encountered [polymorphic types](#polymorphic)
above in our quest to discover reference types. If a
reference type cannot refer to instances of its own exact
type, it must be polymorphic.
> As a corner case, it might also be an empty type with no
instances. The type `Void` is such a class. The only
legitimate value of such a reference variable is `null`.
For most practical purposes, an empty type works the same as
an abstract polymorphic type.
We briefly note that abstract polymorphic types come in
several flavors:
- interfaces
- `Object` (when it is used as an "honorary interface")
- generic type variables, as in `<T>`
- `abstract` classes
But you can stop asking questions now. We have no more
guidance today regarding non-concrete types. The rest of
this guide will help you recognize different kinds of
concrete, class-based reference types.
## Next, does it have an object identity? {#isitidobj}
Answer one of the following questions about your
concrete reference type:
* _mutable:_ [Are instances of this type mutable and/or
synchronizable?](#isitmutable)
* _ever new:_ [Does `new` always make a fresh instance?](#isitevernew)
* _fast cmp:_ (For performance tweakers only.) [Must the equality
operator `==` compile to a single comparison
instruction?](#isitfastcmp)
## _mutable:_ Can you mutate or synchronize it? {#isitmutable}
### Mutable or `synchronized` member ⇒ must be an [identity object reference](#idobj) {#mutable}
OK, so it has a non-`final` field, or a `synchronized`
method, or it is in some other way open to side effects. In
short, it is a _mutable_ object. Object identity is a
necessary aspect of the bookkeeping of mutations.
This classic [identity object class](#idobj) equips each of
its instances with a separate object identity, to help
organize its side effects.
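
A tiny example shows why the bookkeeping needs identity: a side effect made through one reference must be visible through every alias of the same object, and invisible through any other instance. `Tally` is a hypothetical class used only for this sketch.

```java
// Hypothetical mutable class; names are illustrative.
final class Tally {
    private int count;  // non-final field: mutable state

    // synchronized locks on this particular object's identity
    synchronized void increment() {
        count++;
    }

    synchronized int count() {
        return count;
    }

    public static void main(String[] args) {
        Tally a = new Tally();
        Tally alias = a;            // same identity as a
        Tally other = new Tally();  // different identity
        a.increment();
        System.out.println(alias.count() + " " + other.count());
    }
}
```
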
> Even if a class is completely empty, if it also allows
subclasses (is not declared `final`) and is not marked
`abstract` in an appropriate manner, the JVM must assume
that any instance under this class could ultimately contain
mutable state. Therefore, unless explicitly declared
"pure", a class must incorporate object identity, so that
any of its subclasses that eventually require object
identity can inherit it.
> Interfaces do not force object identity on their subtypes.
Any interface permits subtypes which are free of object
identity, except in the special condition that the interface
is a subtype of the special marker type `IdentityObject`.
### No mutation or synchronization ⇒ could be a [pure object](#noid) {#immutable}
It might be a [pure object](#noid), or it might have been
coded--for whatever reason--as a non-pure object, that is, an
[identity object](#idobj). You might need to [try another
question](#ref) to see whether it is pure or not.
## _ever new:_ Does `new` always make a fresh instance, with a new pointer? {#isitevernew}
### `new` makes fresh object identities ⇒ it's an [identity object](#idobj) {#evernew}
Did you observe that every `new` expression makes a freshly
allocated instance, every time? Even if you keep handing
the same arguments to the constructor? That's because an
object of this type remembers not only its field values, but
also its own _object identity_, which was uniquely assigned
when it was constructed.
You have identified a classic [identity object](#idobj)
type, which is a kind of reference type.
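
In today's Java every class instance behaves this way, which makes the observation easy to reproduce. In the sketch below (the `Pixel` class is hypothetical), two `new` expressions with identical arguments yield references that `==` tells apart, while a hand-written `equals` compares the fields.

```java
// Hypothetical identity class in today's Java; names are illustrative.
final class Pixel {
    final int x;
    final int y;

    Pixel(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Field-wise comparison, written by hand.
    @Override
    public boolean equals(Object o) {
        return o instanceof Pixel && ((Pixel) o).x == x && ((Pixel) o).y == y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }

    // Two new expressions with the same arguments give distinct identities.
    static boolean freshEachTime() {
        Pixel a = new Pixel(1, 2);
        Pixel b = new Pixel(1, 2);
        return a != b;
    }

    public static void main(String[] args) {
        System.out.println(freshEachTime());
        System.out.println(new Pixel(1, 2).equals(new Pixel(1, 2)));
    }
}
```
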
### Two `new` expressions, same value ⇒ must be a [pure object](#noid) {#twonewoneobj}
The secret X-factor that makes a `new` expression return a
distinct value every time is [object identity](#idobj).
That X-factor is turned off for [pure objects](#noid).
## _fast cmp:_ Is the equality operator (`==`) a simple pointer compare? {#isitfastcmp}
Usually this aspect of reference processing is in the noise,
but sometimes a highly observant programmer might wish for
one behavior over the other. A single pointer comparison is
probably faster in some microbenchmarks, while the fieldwise
comparison mandated for pure objects (including primitives)
might take a few more cycles.
### The `==` operator is a pointer compare ⇒ must be an [identity object](#idobj) {#fastcmp}
Only [identity objects](#idobj) get to perform quick and simple
comparison on their heap pointers.
> Usually an identity object has a unique home location on
the JVM heap. In that case, the address of the home
location is a handy key for comparing object identities.
But even then, "quick and simple comparison" is only an
approximate idea. The GC might possibly be messing around
with forwarding pointers under the covers. And the JIT
often removes comparisons completely, when it proves the
answer in advance.
### The `==` operator does more work ⇒ must be a [pure object](#noid) {#broadcmp}
The definition of a [pure object](#noid) requires extra
work from the `==` operator to examine all the relevant
field values.
Only pure objects (including primitives) perform `==`
comparison on their field values. They can't use their heap
pointers, since the same value might be present in several
physical locations, including non-heap locations not
reachable by a pointer.
Primitives and pure objects never possess object identity.
As a benefit of this, primitives and pure objects can be
copied around freely, and the JVM doesn't need to keep track
of which heap block (if any) the fields of the primitive or
pure object were first stored into.
# _IDENTITY:_ It's an identity object type {#idobj}
Before Valhalla, every object, no matter how immutable, had
its own identity. This is still a very common case.
Object identity unlocks the ability to have mutable fields
or synchronization. When present, it ensures that every
`new` expression returns a new reference value, distinct
from any previous reference that may also be observable.
> Reasons to reach for object identity have been observed in
passing above. They include mutation, synchronization,
reliably distinct values (from constructors), and quick
and simple comparison.
> There are many use cases for object identity, even beyond
those just listed, to the point it seems reasonable to leave
it turned on all the time, as many languages do. But it has
large costs in the end. In most cases, object identity is
incompatible with flattening. But flattening pays off
handsomely on modern hardware, because pointer chasing is
now expensive, but a pointer, once it is chased, delivers
several words (per cache line) of loosely coupled scalar
values.
> An object identity affects an object like an extra field
storing a hypothetical _serial number_, with a different
value for each different object. (You might even think of
it as being stored in the object's header, although it's
more properly identified as the _address_ of the header.)
The serial number must be tracked carefully through all uses
of the object. This makes it very hard to lift the object
out of the heap and flatten it into registers, or into the
body of a containing object. The best that can be done is
to speculate or prove that the serial number is never, ever
observed, but [this technique quickly runs out of
steam](#ea).
# _PURE:_ It's a pure object, a class-based aggregation of fields {#noid}
A pure object is a Java object whose identity depends only
on the value of its fields, which in fact must be `final`.
Pure objects can be defined directly as classes, or
indirectly as the reference box types derived from [extended
primitives](#exprim).
A pure object type is a balanced compromise between plain
primitives and classic identity-laden reference types. It
provides a useful trade-off point for programmers.
- It is securely constructed and safely publishable via a reference.
- It is nullable (has no mysterious "all zeros" default value).
- Its comparison operation (`==`) works fieldwise, avoiding object identity.
- It is flattenable and scalarizable like a primitive.
Pure objects can be viewed as an upgrade to the concept
of a [value-based class].
When a pure object reference is derived from a primitive
value, it is not securely constructed, except in the limited
sense that the implicit "constructor" which converts a bunch
of scalar fields (with possibly broken invariants) to the
box type captures those field values once and safely and
stably publishes them forever after, whether they were valid
or not. Thus, once a primitive is boxed, that particular
pure object will never be subject to races.
## Equality vs. identity
In a way that may seem surprising at first, the special
capabilities of pure objects depend on their interaction
with the lowly equality operator `==`.
Two pure objects `x`, `y` (both of the same class) differ if
and only if at least one of their fields `f` _differs
somehow_ (`x.f != y.f`). That's pretty simple and
reasonable.
It also differs radically from equality of classic
identity-laden object references, which simply determines
(for better or worse) whether the two references point to an
object which was created by a single `new` expression.
> For pure object fields, the precise definition of "differs
somehow" (`x.f != y.f`) considers the [physical
bits][Float::floatToIntBits] of `float` and `double` fields,
so two pure objects with `NaN` in a field can be equal
(unlike the `NaN` compared to itself), unless the two `NaN`s
have different detail bits. Similar caveat for
negative-zero floating values. See [documentation for
`Float::equals`][Float::equals].
[Float::floatToIntBits]:
[Float::equals]:
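
The difference between numeric `==` and a bitwise comparison of `float` values can be checked with the existing `Float` methods. This sketch uses `Float.floatToIntBits`, which collapses every `NaN` to one canonical bit pattern; `Float.floatToRawIntBits` is the variant that preserves `NaN` detail bits. Which of the two a pure-object comparison would follow is not settled by this example, and the helper name `sameBits` is made up.

```java
// Illustrative helper; only java.lang.Float methods are used.
class FloatFieldEquality {
    // Compares two float values by bit pattern, the way Float.equals does,
    // rather than with the numeric == operator.
    static boolean sameBits(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        System.out.println(Float.NaN == Float.NaN);           // numeric comparison
        System.out.println(sameBits(Float.NaN, Float.NaN));   // bitwise comparison
        System.out.println(0.0f == -0.0f);                    // numeric comparison
        System.out.println(sameBits(0.0f, -0.0f));            // bitwise comparison
    }
}
```
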
> So comparing two values simply distributes a bitwise
comparison across the fields, since there is no object
identity to observe. (Recursive reference processing might
disturb this simple story.) Because of this it is now
possible for two `new` expressions to produce the same
value. The job of a pure object constructor is to produce a
configuration of values valid for the declaring class; it is
not to produce a completely new object identity.
> If two pure objects `x`, `y` both have a reference field
`r`, their equality comparison will require inspection of
the values of the field `x.r`, `y.r`. If the contents of
those two field variables are numerically equal, well and
good. But if both fields point to a pure object, of the
same class, then the JVM must recursively examine the fields
of _those_ objects, looking for some `f` where `x.r.f !=
y.r.f`. After all, even if `x.r` and `y.r` are physically
distinct pure objects, if they contain the same values,
field-wise, then we must find that `x.r == y.r`. In
principle this recursion could go on for a while.
Only because of its treatment of `==`, a pure object is
_freely copyable_. That means if the JVM (or its JIT or GC)
detects that two copies of a pure object are identical, or
constructs one copy from an original, either copy can serve
as a replacement for the other copy. Copying is a physical
process managed by the JVM, and it is impossible for the
Java programmer to detect, because of the semantics of `==`.
> A freely copyable type has no object identity or "home
location" on the heap. Therefore it makes no sense to try
to mutate its parts or synchronize on it. The best you can
do, regarding mutation, is to make a new instance of the
same type, with adjusted field values. This entails a trip
through the constructor, or a return to the original default
value, if it's a primitive.
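
The "new instance with adjusted field values" pattern is already familiar from immutable classes in today's Java. A minimal sketch, with a hypothetical `Money` class and `withCents` method:

```java
// Hypothetical immutable class; names are illustrative.
final class Money {
    final long cents;
    final String currency;

    Money(long cents, String currency) {
        if (currency == null) {
            throw new NullPointerException("currency");
        }
        this.cents = cents;
        this.currency = currency;
    }

    // "Mutation" by construction: returns a new value, leaves this one alone.
    Money withCents(long newCents) {
        return new Money(newCents, currency);
    }

    public static void main(String[] args) {
        Money before = new Money(100, "USD");
        Money after = before.withCents(250);
        System.out.println(before.cents + " " + after.cents);
    }
}
```

Note that the adjusted value takes its trip through the constructor, so any validation there runs again.
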
# _BONUS #1:_ A mini-guide to representations on the heap
Instead of exploring the abundant garden of Java types,
suppose you are creeping around in the underbrush of the JVM
internals, looking at representations of these types. Here
is a way to figure out, sometimes, what you are looking at:
## Is it a non-pointer? ⇒ Probably a primitive scalar
There are a limited number of ways you can represent a
`byte` value, and computers have marked preferences for how
to do it.
> There is an outside chance that a small numeric value
might in fact represent a reference type (such as an
`enum`), if the JVM can prove that there is a small,
statically enumerable set of instances of that type. It's
the JVM's secret whether it does this or not.
> Also, `null` is usually a zero word. But see below.
> By the way, when Java wants to work with native addresses
(outside the JVM heap) it usually carries them around in
`long` values. For safety these ugly things are often
stored out of sight in private fields of reference types.
## Is it a JVM heap pointer? ⇒ Perhaps a plain reference
The reference might refer to:
- an identity object defined by a class
- an array (which is also an identity object)
- a pure object
- a buffered primitive (which acts like a pure object)
## Is it a group of all of the above? ⇒ Something was flattened
A lone scalar (pointer or non-pointer) might also be the
only field of an extended primitive after the primitive has
been flattened away.
In fact, if you see two or more scalars in some sort of
loose aggregation, they might collectively be the fields
remaining after their containing primitive has been
flattened away.
Similarly, one or more scalars might be the fields of a pure
object type, after it has been flattened away. In the case
of pure object types, if the JVM flattens them, it may be
obligated to figure out a way to distinguish the value
`null` from the other proper values of the type.
## Is it nothing at all? ⇒ Could be an empty primitive
If your value seems to be nowhere at all, consider if it
might be a pure object type or extended primitive type with
no fields: Representing such a value requires zero bits.
## Is it zero? ⇒ Could be a default value: `0`, `null`, `false`
Down in the weeds it's hard to tell a zero from a `null`.
> It is fairly likely (though not certain) that every
initial value, whether a `null` reference or a zero or
`false`, is represented by an all-zero bit pattern
somewhere. Much like the C `calloc` function, the JVM often
prefers to "stamp down" zero bits when it allocates a new
chunk of heap.
> One exception to this rule (`null` is a zero pointer)
_might be_ that a pure object with a single reference field
_could be_ flattened to a single pointer word. If the word
is zero, the pure object reference itself is `null`. The
state in which the object exists but its field is `null`
must then be represented by some non-zero value, such as
`0x0001`, as long as this other ("sentinel") value is
distinct from all possible proper references that might be
stored in the field. Such a trick might help optimize
`java.util.Optional`, if it were migrated to a pure
reference class.
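
To make the sentinel idea concrete, here is a minimal sketch that models a flattened word as a Java `long`. Everything in it is an assumption made for illustration: the class `FlatOptionalWord`, its method names, and the rule that modelled addresses are 8-byte aligned (which is what keeps `0x0001` distinct from every proper reference). It does not describe what any real JVM does.

```java
// Hypothetical model of a one-field pure object flattened to a single word.
final class FlatOptionalWord {
    // Word value meaning "the pure object reference itself is null".
    static final long NULL_REF = 0L;
    // Sentinel meaning "the object exists, but its one field is null".
    static final long EMPTY_SENTINEL = 1L;

    // Encodes an existing object whose field holds payloadAddress
    // (0 stands for a null field). Addresses are assumed 8-byte aligned.
    static long encodePresent(long payloadAddress) {
        if (payloadAddress == 0L) {
            return EMPTY_SENTINEL;
        }
        if (payloadAddress % 8 != 0) {
            throw new IllegalArgumentException("unaligned address in model");
        }
        return payloadAddress;
    }

    // True when the word stands for a null reference to the pure object.
    static boolean isNullReference(long word) {
        return word == NULL_REF;
    }

    // Recovers the field value (0 for a null field) from a non-null word.
    static long decodeField(long word) {
        if (word == NULL_REF) {
            throw new NullPointerException("pure object reference is null");
        }
        return word == EMPTY_SENTINEL ? 0L : word;
    }

    public static void main(String[] args) {
        long empty = encodePresent(0L);
        long full = encodePresent(4096L);
        System.out.println("empty word = " + empty + ", full word = " + full);
    }
}
```

The point of the model is only that three states (null reference, present with null field, present with a real reference) fit in one word without an extra tag bit.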
## Is it inconsistently flattened? ⇒ Let JVM be JVM
The JVM has no obligation to use the "obvious"
representation every time for some particular type. And if
it comes up with a clever representation, it is not
permanently obligated to use _that_ one either.
> For example, the JVM could choose to use heap pointers to
buffered primitives or pure objects when data races must be
controlled. But when data races are not a problem (values
on the stack or in registers, `final` variables, etc.) the
JVM could choose to flatten those variables. Or not.
# _BONUS #2:_ More about races {#races}
Under some circumstances, merely having a constructor does
not prevent the creation of invalid objects, because the
object's substructure is complicated enough to allow data
races. This point holds independently of whether the
object's all-zero default value (if not `null`) would also
pass muster through the constructor.
According to the [Java Memory Model FAQ], a data race
happens when one thread reads a variable, another thread
writes the same variable, and there is not enough
synchronization between the threads to determine whether or
not the read should observe the write. As one might expect,
there are more gory details, but that's enough for the
present purpose.
[Java Memory Model FAQ]:
Races can create invalid object states, causing a composite
Java type to lose its integrity, as designed by its author.
Data races on the type's components can cause unrelated
values to appear in one object as if they were the related
work of a single thread. This can happen in three ways:
- A classic Java object might independently update two of
its mutable (non-`final`) fields in separate races. The
`synchronized` keyword is the classic remedy for this.
- Mutable `long` and `double` variables might update their
high and low halves in separate races. In such cases we
say the variable as a whole gets a "[non-atomic
treatment]." No other legacy scalar primitives are
allowed to do this. In practice, modern JVMs treat
`long` and `double` atomically as well.
- A mutable variable which is an [extended primitive](#extprim)
might undergo [struct tearing], if the primitive has two
or more fields, and the JVM finds it inconvenient to
package those fields atomically. Two sub-fields of an
aggregate extended primitive can "race apart" if a
variable holding an aggregate of that type is subject to
a data race.
In the presence of data races, the author of the type would
be unable to prove the security of the type's encapsulation
and its intended invariants. A bug in client logic (a
mistake or a sinister attack) could use data races to mix
together the field values of two unrelated valid object
states to create a possibly invalid third state, a
non-constructed hybrid of the first two.
These risks are present for a classic object with
non-`final` fields, and also for an extended primitive, when
it is _contained_ in a non-`final` field or an array
element.
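
As a rough illustration of such a hybrid, the sketch below replays one possible interleaving in a single thread, so the outcome is deterministic. The class `MutableRange`, its invariant `lo <= hi`, and the split `writeLo`/`writeHi` steps are all hypothetical, invented here so the replay can pause between the two field writes that an unsynchronized update performs.

```java
// Hypothetical classic object; its author intends the invariant lo <= hi.
final class MutableRange {
    private int lo;
    private int hi;

    MutableRange(int lo, int hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("lo > hi");
        }
        this.lo = lo;
        this.hi = hi;
    }

    // An unsynchronized update amounts to two separate field writes.
    // They are separate methods here only so the replay can stop between them.
    void writeLo(int newLo) { lo = newLo; }
    void writeHi(int newHi) { hi = newHi; }

    int lo() { return lo; }
    int hi() { return hi; }
}

final class TearingReplay {
    // Replays, in one thread, an interleaving a real race could produce.
    // Returns the {lo, hi} pair seen by the "reader".
    static int[] replay() {
        MutableRange r = new MutableRange(1, 2);  // valid state A
        r.writeLo(10);                            // "writer" starts moving to B = (10, 20)
        int[] seen = { r.lo(), r.hi() };          // "reader" looks in between
        r.writeHi(20);                            // "writer" finishes
        return seen;
    }

    public static void main(String[] args) {
        int[] seen = replay();
        System.out.println("reader saw lo=" + seen[0] + " hi=" + seen[1]);
    }
}
```

The reader's pair takes `lo` from state B and `hi` from state A, a combination no constructor call ever approved.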
The risk of data races can be addressed in several ways.
- Only share primitive values across threads via `final`
fields or [_references_ to primitive types](#box). A
primitive value of type `P` cannot be torn by races, if
the value is stored in a `final` field, or used via a
reference.
- Require client code to ensure that an affected mutable
object or mutable primitive value is never processed by
more than one thread at a time, using correct
synchronization to order thread accesses.
- Avoid using mutable objects and primitive values
completely, preferring a [value-based class], especially
a [pure object](#noid). The `final` fields of such
objects make races impossible.
- Mark primitive fields `volatile`. This enforces atomic
treatment of the primitive, but will usually have extra
costs, to implement the specific concurrency effects
that `volatile` guarantees.
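
Two of these remedies can be sketched in today's Java, which has no extended primitives yet. The example below is an assumption-laden stand-in: `Range` is an ordinary final class with `final` fields playing the part of a value-based class, and `RangeHolder` shares it across threads through a `volatile` reference. All names are hypothetical.

```java
// Immutable stand-in for a value-based class: every field is final.
final class Range {
    private final int lo;
    private final int hi;

    Range(int lo, int hi) {
        if (lo > hi) {
            throw new IllegalArgumentException("lo > hi");
        }
        this.lo = lo;
        this.hi = hi;
    }

    int lo() { return lo; }
    int hi() { return hi; }
}

final class RangeHolder {
    // Readers see either the old Range or the new one, never a mix,
    // because an update replaces the whole reference at once.
    private volatile Range current = new Range(0, 0);

    void set(int lo, int hi) { current = new Range(lo, hi); }
    Range get() { return current; }
}

final class SafeSharingDemo {
    // Runs a writer and a reader concurrently; returns true if the
    // reader never observed a Range that violates lo <= hi.
    static boolean run() {
        RangeHolder holder = new RangeHolder();
        boolean[] sawInvalid = { false };
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                holder.set(i, i + 1);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                Range r = holder.get();
                if (r.lo() > r.hi()) {
                    sawInvalid[0] = true;
                }
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();  // join() makes the reader's flag write visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return !sawInvalid[0];
    }

    public static void main(String[] args) {
        System.out.println("all observed ranges valid: " + run());
    }
}
```

The cost is an allocation per update and a `volatile` access per read, which is the kind of overhead a flattened primitive is meant to avoid.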
[value-based class]:
[non-atomic treatment]:
[struct tearing]: