% A Field Guide to Java Field Types
% as told to the Valhalla Expert Group
% PROVISIONAL PROVISIONAL November, 2021 DRAFT DRAFT YMMV

# Oh, look, you found a Java field type (or are seeking one now). {#START}

> **Fine print** looks like this. You can skip it.

> The form of this Field Guide will be a series of questions, like a flow-chart. You can see a picture of this flow chart here:

> Because this Field Guide was written on the cheap, it has no pictures. However, its author has drawn up a nice map of the Zoo of Java Field Types for you in another place:

> There is no word on whether a Method Guide is in the offing. Don't wait up.

# First, decide if it is a reference or primitive: {#reforprim}

Pick one of the following questions to answer...

* _nullable:_ [Can a variable of this type take `null` as a value?](#isitnullable)
* _circular:_ [Does this type contain a field of its own type and/or does it require lazy loading?](#isitcircular)
* _secure:_ [Must this type be securely constructed and/or safely published?](#isitsecure)
* _polymorphic:_ [Is this type polymorphic, allowing values of more than one subtype?](#isitpolymorphic)

Congratulations; if you answered any of the above questions you may well have provided answers for them all.

## _nullable:_ Can a variable of this type take `null` as a value? {#isitnullable}

### Yes, it likes `null` ⇒ it's a [reference type](#ref) {#nullable}

If one can always assign `null` to a variable of this type, then `null` is also the default initial value of this type for fields and array elements. If this is the case you have found a [reference type](#ref)!

### It rejects `null` ⇒ it's a [primitive type](#prim) {#antinull}

For any type that does not mix with `null`, the default initial value is some sort of zero-like scalar or aggregate. The fields of an aggregate default will be assigned (recursively) to their own particular default values. If this is the case you are looking at a [primitive type](#prim).

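
The nullable test can be tried out in today's Java with a few unassigned fields and freshly allocated arrays. The following is a minimal sketch, not part of the original guide; the class name `DefaultValues` and its field names are made up for illustration, and it only covers the legacy types that exist today (a reference type and a few scalar primitives).

```java
// Illustrative sketch; the class and field names are hypothetical.
final class DefaultValues {
    // None of these fields is ever assigned, so each keeps its default initial value.
    static String refField;    // reference type: starts out null
    static int intField;       // scalar primitive: starts out 0
    static boolean boolField;  // scalar primitive: starts out false

    public static void main(String[] args) {
        // Array elements get the same defaults as fields.
        String[] refs = new String[2];
        double[] nums = new double[2];

        System.out.println(refField);   // prints "null"
        System.out.println(intField);   // prints "0"
        System.out.println(boolField);  // prints "false"
        System.out.println(refs[0]);    // prints "null"
        System.out.println(nums[0]);    // prints "0.0"
    }
}
```

A variable that starts out `null` points toward a reference type; one that starts out as a zero-like value points toward a primitive.
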
## _circular:_ Does it contain itself circularly? {#isitcircular}

Does the type need to contain an instance of itself, as one of its fields? If so, it needs a circular reference to itself.

> More subtly, is it part of a circle of types each of which contains the next one in the circle? If so, at least one type in the circle must have the power to refer to itself circularly, even though the reference is indirect through the other parties in the circle.

### Circular self-reference ⇒ must be a [reference type](#ref) {#circular}

If this type needs to refer to itself, the cycle must be broken somewhere by a `null`. So this type is certainly a [reference type](#ref).

> We don't want `null`-hostile types to be circular, since the default value of such a type would denote an endless loop through itself, like a hall of mirrors. While theoretically possible, this is not a desirable language feature.

### A non-recursive tuple ⇒ could be a [primitive type](#prim) {#nonrecursive}

This type quickly bottoms out into fields which are scalar primitives and references. Although it could be a [reference type](#ref) (for other reasons) you may have found a [primitive type](#prim). You need to [try another question](#reforprim) to see if it really is a reference.

## _secure:_ Must every instance come from a constructor execution? {#isitsecure}

Try this checklist, and see if you get any "yes" answers:

- Does the constructor validate important class-level invariants?
- Would the default all-zero state of the object be inconvenient if your clients got hold of it?
- If someone were to create data races on your object, would you care, or would you simply say "don't do that"?

If you said "yes", you are looking for secure construction.

> If your object is mutable you might also be looking for synchronization; hold that thought for later.

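
To make the first two checklist items concrete, here is a minimal sketch in today's Java. The class `PositiveCount` is hypothetical and not from the guide; it is an ordinary reference class whose constructor rejects the all-zero state, and whose array elements therefore start out as `null` rather than as an unvalidated zero instance.

```java
// Illustrative sketch; PositiveCount is a hypothetical class, not from the guide.
final class PositiveCount {
    private final int count;

    PositiveCount(int count) {
        // Class-level invariant: the all-zero state is not a valid PositiveCount.
        if (count < 1) {
            throw new IllegalArgumentException("count must be positive: " + count);
        }
        this.count = count;
    }

    int count() {
        return count;
    }

    // Returns true if the constructor refuses the given argument.
    static boolean rejects(int count) {
        try {
            new PositiveCount(count);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(rejects(0));                    // prints "true"
        System.out.println(new PositiveCount(3).count());  // prints "3"

        // A fresh array holds null references, never an unvalidated zero instance.
        PositiveCount[] slots = new PositiveCount[4];
        System.out.println(slots[0]);                      // prints "null"
    }
}
```

If the answer to "would a client be hurt by a `PositiveCount` holding zero?" is yes, the type wants secure construction.
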
### Secure construction is required ⇒ it's a [reference type](#ref) {#secure}

It must never be the case that an instance of this type can ever be observed, by any thread, which was not produced by a successful exit from that type's constructor.

To avoid leaking the all-zero default value of this type, arrays of this type must be initialized to `null`. Thus, this condition is really [nullability](#nullable) in disguise.

> If the default value of this type were not `null` it would have to be a tuple of zero or `null` scalars. For such a type, clients would have to tolerate that default value, and methods might have to include validity-checking logic to check for that default value.

> We are consciously avoiding the complexity of user-defined defaults, as that would create new kinds of initialization operations in the JVM beyond simply stamping down zero bits.

You are certainly chasing a [reference type](#ref) here.

### (Should you be thinking about data races?) {#mayberaces}

Races can subvert the apparent security of a class's constructors, separately from the effects of non-`null` default values. This complicated subject is discussed [further elsewhere](#races).

A type may explicitly shift the burden of synchronizing against races onto its clients, as may be seen with [`ArrayList`] or [`HashMap`], whose documentation says something like **Note that this implementation is not synchronized**. Or, as in the case of iterators for those collections, a type may simply permit undefined behavior from data races, relying on lower-level guarantees from the JVM to prevent crashes.

[`ArrayList`]:
[`HashMap`]:

Such a type's constructor logic may check against everyday bugs and edge cases. This level of validation, while not resistant to races, is often acceptable. If the type you are looking at works like this, you may indeed still be looking at an [extended primitive type](#exprim).

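
As a small illustration of a type that shifts the synchronization burden onto its clients, the sketch below shows a client supplying that synchronization for an `ArrayList` by wrapping it with `Collections.synchronizedList`. The class name `ClientSideSync` and the helper `fillFromTwoThreads` are hypothetical; the wrapper is just one of several ways a client might take on the job.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch; the class and method names are hypothetical.
final class ClientSideSync {
    // Two threads append perThread elements each; returns the final size.
    static int fillFromTwoThreads(List<Integer> list, int perThread) {
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                list.add(i);
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }

    public static void main(String[] args) {
        // The client, not ArrayList, supplies the synchronization.
        List<Integer> guarded = Collections.synchronizedList(new ArrayList<>());
        System.out.println(fillFromTwoThreads(guarded, 1000)); // prints "2000"
    }
}
```

Passing a bare `ArrayList` to the same helper would be a data race, and the resulting size (or exception) would be anyone's guess.
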
### All-zero default, maybe races ⇒ must be a [primitive type](#prim) {#loose}

We might ask ourselves, why would we _not_ want enforcement of constructor security and the atomicity of variables? Well, defending a bunch of fields against default values and data races might have a cost in performance or footprint, since race protection often uses extra indirections or side locks or other fancy machinery.

> For example, we probably don't care that the real and imaginary components of a complex number can race relative to each other. Race conditions on complex numbers can produce numeric errors, but (in most cases) having a mix of real and imaginary components from different threads does not seem any worse than having a mix of whole values from those threads. We might feel differently if the components represented, instead, parts of a cryptographic key.

> Loading and storing each of a group of loosely aggregated values is probably always going to be faster than loading or storing the group as an atomic whole. Probably complex numbers should not try to resist races, even if cryptographic keys should be more resistant.

> And if you need race protection later on, you can wrap the whole thing in a type which _does_ prevent struct tearing.

So if you only need a bunch of loosely coordinated field values, with a constructor performing "best efforts" validation, you might want to use an [extended primitive type](#exprim). And of course, if you don't need a "smart" constructor, simple records, arrays, or bare scalar primitives are all good options to consider.

## _polymorphic:_ Is it a polymorphic type? {#isitpolymorphic}

Can values of this type possess distinct subtypes?

> There might be only one such subtype, but that would count as distinct from _this_ type, the polymorphic supertype.

### Distinct subtypes ⇒ must be a [reference type](#ref) {#polymorphic}

If this type needs to refer to values of other types, then this type is certainly a [reference type](#ref).
Only references are polymorphic.

> Some polymorphic types are interfaces or `abstract` classes, so that there are no actual instances of that exact type. But a class that is neither `abstract` nor `final` can have instances of its own exact type, _and_ is also polymorphic, since the type can refer to subclasses as well. `Object` is famously both concrete and polymorphic. It acts like an "honorary interface". You can also make an instance of plain old `Object`, though this is less useful than one might think.

> A primitive can implement polymorphism indirectly by including polymorphic component fields. Or it can simulate polymorphism, alongside the Java type system, by internally encoding a type tag and providing cast-like extraction operations for the various possible alternative types.

### No subtypes ⇒ could be a [primitive type](#prim) {#invariant}

A type allows no subtypes when its class is declared `final`, explicitly or implicitly. Such a type can be an [extended primitive type](#exprim) (which is implicitly declared `final`) or a [reference type](#ref) such as a `final` class or an `enum`. You need to [try another question](#reforprim) to see if it really is a reference.

> A legacy [scalar primitive](#scalarprim) has hardwired rules that purport to define subtypes and supertypes. But `int` is a subtype of `long` only in the sense that there are specially defined conversions between the respective value sets.

# _PRIM:_ It's a primitive, not a reference {#prim}

Having ruled out references, now you know you are looking at a primitive type! One more quick question:

## Is this primitive defined by a class-like body? {#isitprimclass}

### No class, no fields ⇒ a legacy [scalar primitive](#scalarprim) {#scalarprim}

If it has no fields, it's one of eight legacy types like `int`. Here you have one of the few denizens of the ancient island of [scalar primitives](#scalarprim). No class, no fields, no header word. Like Robinson Crusoe, it's [primitive as can be].

> Every data structure, when viewed as a large graph of aggregates, bottoms out (ignoring reference cycles) in some mix of scalar primitives and/or `null` references. When viewed as a local block of data (such as might be found in the JVM heap), every data structure bottoms out in a mix of scalar primitives and references, where each reference is either `null` or points to some other block of data. Visualized as a pointer, a reference is also a kind of scalar. In this view, scalars--both primitives and references--are the building blocks of Java data structures, which combine scalars using classes and arrays.

[primitive as can be]:

# _EX-PRIM:_ It's an extended primitive type {#exprim}

This is an _extended primitive type_. Although it codes like a class, it works more like an `int`.

With its class hat on, it has constructors, fields, methods, and supers. The language forces the class to be `final` as well as all of its fields. In fact, it will act like a [value-based class], with extra enforcement of the rules that apply to such classes.

With its primitive hat on, it has no object identity, but rather acts like a loose bundle of variables, the instance fields (non-`static` fields) declared by its class. The default value of an extended primitive type is simply the individual default values of its fields.

This type cannot directly represent the `null` reference. Also, it cannot contain fields of its own type, either directly or indirectly.

> Although the class has a constructor, which may attempt to enforce class invariants, the constructor has no ability to reject the all-zero default value. Sorry, Mr. Constructor.

> Data races among the component fields (or among the 32-bit words of `long` or `double` fields) can cause [struct tearing] on heap variables of this type, leading to paradoxical values becoming visible, which might have been rejected by constructor logic.

Depending on context, you might prefer to think of instances of an extended primitive type behaving like a loose group of scalar values (old-school primitives and reference pointers) running around inside the computer in any of these forms:

- flattened into a group of scalars laid out in a single array element or field
- flattened into a group of registers holding an argument, return value, or local
- broken down into a group of scalar IR nodes
- viewed via a pointer, a group of scalars safely buffered together in a block on the heap (like an `int` boxed as an `Object`)

All of these representations except the last are generically called "flattened" or "scalarized". The last one is called "buffered" or sometimes "boxed". Whether or not you think of any of these representations, the VM might be doing it behind your back.

### (Wait, isn't this stuff already done by Escape Analysis?) {#ea}

No. Escape analysis looks good in theory but in practice it can only speculate, and so the wins it gives are unreliable, provisional, and limited. Extended primitives and pure objects provide a new contract that is a reliable, permanent, and global basis for a wider class of optimizations.

> The flattening of an object into scalar IR nodes is the triumphant result of the very famous "Escape Analysis" based optimizations. The reasoning is that if an object does not "escape" from a known set of uses in a sea of IR nodes, then its heap allocation can be deferred or elided. This is very cool when it happens, but it doesn't happen often enough. Pure reference and extended primitive types are defined in such a way that the heroics of Escape Analysis, per se, are not needed, once the JIT or JVM recognizes one of those types. The best Escape Analysis can do is to guess that all _known_ uses of a type are compatible with flattening and scalarization.
But, in practice, Escape Analysis always fails to rule out some impending future use of an object that is incompatible with the flattening and scalarization; it must therefore be tentative and conservative, always ready to roll back its decisions. Meanwhile, our new Valhalla types have flattening and scalarization "baked in" from the start. There is no need for global analysis heroics, and the benefits of flattening appear uniformly everywhere: In arguments and return values, on the heap, and in the IR itself. Escape Analysis never flattens with such consistency, except in the most feverish dreams of theorists.

### Would you like it in a box? ⇒ as a [pure object](#noid) {#boxed}

Primitives are shape-shifters, so you may observe them in either of two forms, the `int`-like flattened form and the object-like reference form. Yes, after all this work identifying a primitive type, you might find it scuttling away to go be a reference after all.

So every primitive declaration `P` defines an extended primitive type (named `P`). But there's more; the declaration also defines a pure object reference type `P.ref`. (Syntax is TBD; think `P.ref`, `P.box`, or perhaps even the deeply tantalizing `P?`.) Thus, no primitive type _is_ a reference type, but every primitive type _has_ a companion reference type.

> It is not certain whether we want to say that `P` is an object type per se. And if not, `P` might not even be a class. We might be looking at a world of classes, interfaces, and primitives, all of them with class-like bodies, but all of them distinct kinds of declaration.

Casting a primitive to `Object` repackages it as a reference to a pure object, keeping its value the same. (That is, each and every field value is preserved, and the class stays the same too.) We use the old term "boxing" for this, although there is no separate box _class_.

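
The proposed `P`/`P.ref` syntax cannot be run today, but the legacy `int` gives a rough feel for the cast-to-`Object` move. The sketch below is only an analogy under that assumption: unlike the extended primitives described above, `int` does have a separate box class (`Integer`), and the name `BoxingSketch` is made up for illustration.

```java
// Illustrative sketch using the legacy int/Integer pair; names are hypothetical.
final class BoxingSketch {
    // Casting a primitive to Object boxes it, keeping its value the same.
    static Object box(int x) {
        return (Object) x;
    }

    public static void main(String[] args) {
        Object boxed = box(42);
        System.out.println(boxed.getClass().getName()); // prints "java.lang.Integer"
        System.out.println(boxed.equals(42));           // prints "true"
        System.out.println((int) boxed == 42);          // prints "true"
    }
}
```
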
> There may be slightly different rules for legacy primitives like `int`, which have their own old family relations to `Integer` and the other wrappers.

Casting a reference to a primitive first checks that the reference is not null, and is in fact of the correct primitive type, and then just returns the primitive. Again, we use the old term "unboxing" for this, although there is no separate box to recycle.

> Buffered values are not exactly boxes in the sense of Java autoboxing, because the JVM manages the transformation, not the language. The JVM might accommodate boxing and unboxing requests by buffering and unbuffering the value through the JVM's garbage collected heap. But because the box is virtually non-existent, the JVM can often keep packaging waste to an environmentally friendly minimum.

When viewed as a reference `P.ref`, a primitive `P` continues to represent the same value set, except that the reference can also be set to `null`. The reference can of course be assigned to a polymorphic type like `Object`.

> Even if an extended primitive is buffered on the heap, it does not gain [object identity](#idobj) but rather behaves like an identity-free [pure object](#noid). In particular, the `==` operator (and `acmp` bytecode) performs field-wise equality checks as needed to see if two references to the same extended primitive type actually refer to the same value. Thus, the rules for equality are the same whether or not the compared primitives are buffered in the JVM heap, or whether one buffered value is being compared to itself or two distinct buffered values are compared to see if they are field-wise equal.

### You may like them in a tree? {#primtrees}

An extended primitive can bend the rules against [circular self-reference](#circular) by referring to itself, from one of its own fields, by means of its own reference type. Such a reference can never be circular (try it!), but it can allow the primitive type to represent trees and DAGs.

Thus, if you would like your primitive in a tree, you must also consent to have it in a box in a tree.

> In principle, a tree of primitives could be the backing storage of a `List`, `Set`, or `Map` instance. If that were the case, then the `==` operator would perform structural checks on the elements, keys, or values of the instance.

You may like them; you will see. (Thank you, Dr. Seuss.)

# _REF:_ It's a reference type {#ref}

By whatever route you got here, you have arrived at a reference type. Most types in Java are references, because reference types have so many useful properties.

Next, quickly rule out whether this type is a very special classless reference type.

## Is this reference type defined by a class-like body? {#isitclassy}

### A length field and components ⇒ it's an [array](#array) {#array}

Oh, look, you found an [array type](#array). (Hoo-array!) If you'd like to investigate its component type, you'll have to [start again at the top.](#reforprim)

Every other reference type, as well as every [extended primitive](#exprim), enjoys the benefits of a class-like declaration.

## Is this reference type non-concrete? {#isitconcrete}

That is, are all values of this type instances of some _other_ type?

As a squirrely corner case, we define by fiat that `Object` is an "honorary interface". If you make an instance, as with `new Object()`, we shall pretend that the instance is of some subtype of the interface `Object`, a very uninteresting concrete class which looks a lot like `Object`.

### Not concrete ⇒ abstract polymorphic. {#polymorphicagain}

We already encountered [polymorphic types](#polymorphic) above in our quest to discover reference types. If a reference type cannot refer to instances of its own exact type, it must be polymorphic.

> As a corner case, it might also be an empty type with no instances. The type `Void` is such a class. The only legitimate value of such a reference variable is `null`.
For most practical purposes, an empty type works the same as an abstract polymorphic type.

We briefly note that abstract polymorphic types come in several flavors:

- interfaces
- `Object` (when it is used as an "honorary interface")
- generic type variables, as in `>`
- `abstract` classes

But you can stop asking questions now. We have no more guidance today regarding non-concrete types. The rest of this guide will help you recognize different kinds of concrete, class-based reference types.

## Next, does it have an object identity? {#isitidobj}

Answer one of the following questions about your concrete reference type:

* _mutable:_ [Are instances of this type mutable and/or synchronizable?](#isitmutable)
* _ever new:_ [Does `new` always make a fresh instance?](#isitevernew)
* _fast cmp:_ (For performance tweakers only.) [Must the equality operator `==` compile to a single comparison instruction?](#isitfastcmp)

## _mutable:_ Can you mutate or synchronize it? {#isitmutable}

### Mutable or `synchronized` member ⇒ must be an [identity object reference](#idobj) {#mutable}

OK, so it has a non-`final` field, or a `synchronized` method, or it is in some other way open to side effects. In short, it is a _mutable_ object. Object identity is a necessary aspect of the bookkeeping of mutations. This classic [identity object class](#idobj) equips each of its instances with a separate object identity, to help organize its side effects.

> Even if a class is completely empty, if it also allows subclasses (is not declared `final`) and is not marked `abstract` in an appropriate manner, the JVM must assume that any instance under this class could ultimately contain mutable state. Therefore, unless explicitly declared "pure", a class must incorporate object identity, so that any of its subclasses that eventually require object identity can inherit it.

> Interfaces do not force object identity on their subtypes.
Any interface permits subtypes which are free of object identity, except in the special condition that the interface is a subtype of the special marker type `IdentityObject`.

### No mutation or synchronization ⇒ could be a [pure object](#noid) {#immutable}

It might be a [pure object](#noid) or it might have been coded--for whatever reason--as a non-pure object, that is, an [identity object](#idobj). You might need to [try another question](#ref) to see whether it is pure or not.

## _ever new:_ Does `new` always make a fresh instance, with a new pointer? {#isitevernew}

### `new` makes fresh object identities ⇒ it's an [identity object](#idobj) {#evernew}

Did you observe that every `new` expression makes a freshly allocated instance, every time? Even if you keep handing the same arguments to the constructor? That's because an object of this type remembers not only its field values, but also its own _object identity_, which was uniquely assigned when it was constructed.

You have identified a classic [identity object](#idobj) type, which is a kind of reference type.

### Two `new` expressions, same value ⇒ must be a [pure object](#noid) {#twonewoneobj}

The secret X-factor that makes each `new` expression return a distinct value every time is [object identity](#idobj). That X-factor is turned off for [pure objects](#noid).

## _fast cmp:_ Is the equality operator (`==`) a simple pointer compare? {#isitfastcmp}

Usually this aspect of reference processing is in the noise, but sometimes a highly observant programmer might wish for one behavior over the other. A single pointer comparison is probably faster in some microbenchmarks, while the fieldwise comparison mandated for pure objects (including primitives) might take a few more cycles.

### The `==` operator is a pointer compare ⇒ must be an [identity object](#idobj) {#fastcmp}

Only [identity objects](#idobj) get to perform quick and simple comparison on their heap pointers.

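
The "ever new" and "fast cmp" tests can both be run against a classic class in today's Java. The sketch below is illustrative only; `IdentityDemo`, `Celsius`, and `sameFields` are hypothetical names. It shows an immutable class that still behaves as an identity object: two `new` expressions with the same argument yield references that `==` tells apart, even though a hand-written field-wise comparison finds them equal.

```java
// Illustrative sketch; all names here are hypothetical.
final class IdentityDemo {
    // A classic identity class: immutable, yet every instance has its own identity.
    static final class Celsius {
        final int degrees;

        Celsius(int degrees) {
            this.degrees = degrees;
        }
    }

    // Field-wise comparison, written out by hand.
    static boolean sameFields(Celsius a, Celsius b) {
        return a.degrees == b.degrees;
    }

    public static void main(String[] args) {
        Celsius x = new Celsius(20);
        Celsius y = new Celsius(20);
        System.out.println(x == y);           // prints "false": two identities
        System.out.println(x == x);           // prints "true": same identity
        System.out.println(sameFields(x, y)); // prints "true": same field values
    }
}
```

For a pure object, as described in this guide, the first comparison would be expected to come out the same way as the third.
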
> Usually an identity object has a unique home location on the JVM heap. In that case, the address of the home location is a handy key for comparing object identities. But even then, "quick and simple comparison" is only an approximate idea. The GC might possibly be messing around with forwarding pointers under the covers. And the JIT often removes comparisons completely, when it proves the answer in advance.

### The `==` operator does more work ⇒ must be a [pure object](#noid) {#broadcmp}

The definition of a [pure object](#noid) requires extra work from the `==` operator to examine all the relevant field values.

Only pure objects (including primitives) perform `==` comparison on their field values. They can't use their heap pointers, since the same value might be present in several physical locations, including non-heap locations not reachable by a pointer.

Primitives and pure objects never possess object identity. As a benefit of this, primitives and pure objects can be copied around freely, and the JVM doesn't need to keep track of which heap block (if any) the fields of the primitive or pure object were first stored into.

# _IDENTITY:_ It's an identity object type {#idobj}

Before Valhalla, every object, no matter how immutable, had its own identity. This is still a very common case.

Object identity unlocks the ability to have mutable fields or synchronization. When present, it ensures that every `new` expression returns a new reference value, distinct from any previous reference that may also be observable.

> Reasons to reach for object identity have been observed in passing above. They include mutation, synchronization, reliably distinct values (from constructors), and quick and simple comparison.

> There are many use cases for object identity, even beyond those just listed, to the point it seems reasonable to leave it turned on all the time, as many languages do. But it has large costs in the end. In most cases, object identity is incompatible with flattening.
But flattening pays off handsomely on modern hardware, because pointer chasing is now expensive, but a pointer, once it is chased, delivers several words (per cache line) of loosely coupled scalar values.

> An object identity affects an object like an extra field storing a hypothetical _serial number_, with a different value for each different object. (You might even think of it as being stored in the object's header, although it's more properly identified as the _address_ of the header.) The serial number must be tracked carefully through all uses of the object. This makes it very hard to lift the object out of the heap and flatten it into registers, or into the body of a containing object. The best that can be done is to speculate or prove that the serial number is never, ever observed, but [this technique quickly runs out of steam](#ea).

# _PURE:_ It's a pure object, a class-based aggregation of fields {#noid}

A pure object is a Java object whose identity depends only on the value of its fields, which in fact must be `final`. Pure objects can be defined directly as classes, or indirectly as the reference box types derived from [extended primitives](#exprim).

A pure object type is a balanced compromise between plain primitives and classic identity-laden reference types. It provides a useful trade-off point for programmers.

- It is securely constructed and safely publishable via a reference.
- It is nullable (has no mysterious "all zeros" default value).
- Its comparison operation (`==`) works fieldwise, avoiding object identity.
- It is flattenable and scalarizable like a primitive.

Pure objects can be viewed as an upgrade to the concept of a [value-based class].

When a pure object reference is derived from a primitive value, it is not securely constructed, except in the limited sense that the implicit "constructor" which converts a bunch of scalar fields (with possibly broken invariants) to the box type captures those field values once and safely and stably publishes them forever after, whether they were valid or not. Thus, once a primitive is boxed, that particular pure object will never be subject to races.

## Equality vs. identity

In a way that may seem surprising at first, the special capabilities of pure objects depend on their interaction with the lowly equality operator `==`.

Two pure objects `x`, `y` (both of the same class) differ if and only if at least one of their fields `f` _differs somehow_ (`x.f != y.f`). That's pretty simple and reasonable. It also differs radically from equality of classic identity-laden object references, which simply determines (for better or worse) whether the two references point to an object which was created by a single `new` expression.

> For pure object fields, the precise definition of "differs somehow" (`x.f != y.f`) considers the [physical bits][Float::floatToIntBits] of `float` and `double` fields, so two pure objects with `NaN` in a field can be equal (unlike the `NaN` compared to itself), unless the two `NaN`s have different detail bits. Similar caveat for negative-zero floating values. See [documentation for `Float::equals`][Float::equals].

[Float::floatToIntBits]:
[Float::equals]:

> So comparing two values simply distributes a bitwise comparison across the fields, since there is no object identity to observe. (Recursive reference processing might disturb this simple story.)

Because of this it is now possible for two `new` expressions to produce the same value. The job of a pure object constructor is to produce a configuration of values valid for the declaring class; it is not to produce a completely new object identity.

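
The floating-point fine print above is easy to check with today's `Float` API. The following sketch is illustrative; `FloatBitsDemo` and `sameBits` are hypothetical names. One assumption deserves a loud label: the helper uses `Float.floatToIntBits`, which collapses every `NaN` to a single canonical bit pattern, so it cannot tell apart `NaN`s with different detail bits. If those bits are meant to matter, `Float.floatToRawIntBits` would be the call to reach for instead.

```java
// Illustrative sketch; the class and helper names are hypothetical.
final class FloatBitsDemo {
    // Compares two floats by their bit patterns, as Float::equals does.
    // Assumption: floatToIntBits is used, so all NaNs compare as the same value.
    static boolean sameBits(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        float nan = Float.NaN;
        System.out.println(nan == nan);                     // prints "false"
        System.out.println(sameBits(nan, nan));             // prints "true"
        System.out.println(0.0f == -0.0f);                  // prints "true"
        System.out.println(sameBits(0.0f, -0.0f));          // prints "false"
        System.out.println(Float.valueOf(nan).equals(nan)); // prints "true"
    }
}
```

So the numeric `==` and the bitwise comparison disagree in both directions: `NaN` differs from itself numerically but not bitwise, while positive and negative zero are numerically equal but bitwise distinct.
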
> If two pure objects `x`, `y` both have a reference field `r`, their equality comparison will require inspection of the values of the field `x.r`, `y.r`. If the contents of those two field variables are numerically equal, well and good. But if both fields point to a pure object, of the same class, then the JVM must recursively examine the fields of _those_ objects, looking for some `f` where `x.r.f != y.r.f`. After all, even if `x.r` and `y.r` are physically distinct pure objects, if they contain the same values, field-wise, then we must find that `x.r == y.r`. In principle this recursion could go on for a while.

Only because of its treatment of `==`, a pure object is _freely copyable_. That means if the JVM (or its JIT or GC) detects that two copies of a pure object are identical, or constructs one copy from an original, either copy can serve as a replacement for the other copy. Copying is a physical process managed by the JVM, and it is impossible for the Java programmer to detect, because of the semantics of `==`.

> A freely copyable type has no object identity or "home location" on the heap. Therefore it makes no sense to try to mutate its parts or synchronize on it. The best you can do, regarding mutation, is to make a new instance of the same type, with adjusted field values. This entails a trip through the constructor, or a return to the original default value, if it's a primitive.

# _BONUS #1:_ A mini-guide to representations on the heap

Instead of exploring the abundant garden of Java types, suppose you are creeping around in the underbrush of the JVM internals, looking at representations of these types. Here is a way to figure out, sometimes, what you are looking at:

## Is it a non-pointer? ⇒ Probably a primitive scalar

There are a limited number of ways you can represent a `byte` value, and computers have marked preferences for how to do it.

> There is an outside chance that a small numeric value might in fact represent a reference type (such as an `enum`), if the JVM can prove that there is a small, statically enumerable set of instances of that type. It's the JVM's secret whether it does this or not.

> Also, `null` is usually a zero word. But see below.

> By the way, when Java wants to work with native addresses (outside the JVM heap) it usually carries them around in `long` values. For safety these ugly things are often stored out of sight in private fields of reference types.

## Is it a JVM heap pointer? ⇒ Perhaps a plain reference

The reference might refer to:

- an identity object defined by a class
- an array (which is also an identity object)
- a pure object
- a buffered primitive (which acts like a pure object)

## Is it a group of all of the above? ⇒ Something was flattened

A lone scalar (pointer or non-pointer) might also be the only field of an extended primitive after the primitive has been flattened away.

In fact, if you see two or more scalars in some sort of loose aggregation, they might collectively be the fields remaining after their containing primitive has been flattened away.

Similarly, one or more scalars might be the fields of a pure object type, after it has been flattened away. In the case of pure object types, if the JVM flattens them, it may be obligated to figure out a way to distinguish the value `null` from the other proper values of the type.

## Is it nothing at all? ⇒ Could be an empty primitive

If your value seems to be nowhere at all, consider if it might be a pure object type or extended primitive type with no fields: Representing such a value requires zero bits.

## Is it zero? ⇒ Could be a default value: `0`, `null`, `false`

Down in the weeds it's hard to tell a zero from a `null`.

> It is fairly likely (though not certain) that every initial value, whether a `null` reference or a zero or `false`, is represented by an all-zero bit pattern somewhere.
Much like the C `calloc` function, the JVM often prefers to "stamp down" zero bits when it allocates a new chunk of heap.

> One exception to this rule (that `null` is a zero pointer) _might be_ a pure object with a single reference field. Such an object _could be_ flattened to a single pointer word, where a zero word means that the pure object reference itself is `null`. The state in which the field is `null` would then be represented by some non-zero value, such as `0x0001`, as long as this other ("sentinel") value is distinct from all possible proper references that might be stored in the field. Such a trick might help optimize `java.util.Optional`, if it were migrated to a pure reference class.

## Is it inconsistently flattened? ⇒ Let JVM be JVM

The JVM has no obligation to use the "obvious" representation every time for some particular type. And if it comes up with a clever representation, it is not permanently obligated to use _that_ one either.

> For example, the JVM could choose to use heap pointers to buffered primitives or pure objects when data races must be controlled. But when data races are not a problem (values on the stack or in registers, `final` variables, etc.) the JVM could choose to flatten those variables. Or not.

# _BONUS #2:_ More about races {#races}

Under some circumstances, merely having a constructor does not prevent the creation of invalid objects, because the object's substructure is complicated enough to allow data races. This point holds independently of whether the object's all-zero default value (if not `null`) would also pass muster through the constructor.

According to the [Java Memory Model FAQ], a data race happens when one thread reads a variable, another thread writes the same variable, and there is not enough synchronization between the threads to determine whether or not the read should observe the write. As one might expect there are more gory details, but that's enough for the present purpose.
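
The definition above can be made concrete with a small sketch in today's Java. Everything in it (the class `DataRaceSketch`, the field `shared`, the two methods) is a hypothetical name chosen for illustration, not something taken from the FAQ. The first method reads a plain field while another thread may be writing it, with nothing to order the two accesses; the second uses `Thread.join()` to supply the missing ordering.

```java
class DataRaceSketch {
    // Hypothetical shared variable, deliberately neither volatile nor lock-protected.
    static int shared;

    // Racy: nothing orders this read against the writer thread's store,
    // so the read is allowed to observe either 0 or 42.
    static int racyRead() {
        shared = 0;
        Thread writer = new Thread(() -> shared = 42);
        writer.start();
        int seen = shared;      // data race with the write in the other thread
        join(writer);
        return seen;
    }

    // Ordered: join() returns only after the writer has finished, and the
    // writer's actions happen-before that return, so the read must see 42.
    static int orderedRead() {
        shared = 0;
        Thread writer = new Thread(() -> shared = 42);
        writer.start();
        join(writer);
        return shared;
    }

    // Small helper so callers need not handle InterruptedException.
    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ordered read: " + orderedRead());
        System.out.println("racy read:    " + racyRead());
    }
}
```

The ordered read always reports 42. The racy read may report either value from one run to the next, which is exactly the uncertainty the definition describes.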
[Java Memory Model FAQ]:

Races can create invalid object states, causing a composite Java type to lose its integrity, as designed by its author. Data races on the type's components can cause unrelated values to appear in one object as if they were the related work of a single thread. This can happen in three ways:

- A classic Java object might independently update two of its mutable (non-`final`) fields in separate races. The `synchronized` keyword is the classic remedy for this.
- Mutable `long` and `double` variables might update their high and low halves in separate races. In such cases we say the variable as a whole gets a "[non-atomic treatment]." No other legacy scalar primitives are allowed to do this. In practice, modern JVMs treat `long` and `double` atomically as well.
- A mutable variable which is an [extended primitive](#extprim) might undergo [struct tearing], if the primitive has two or more fields, and the JVM finds it inconvenient to package those fields atomically. Two sub-fields of an aggregate extended primitive can "race apart" if a variable holding an aggregate of that type is subjected to a data race.

In the presence of data races, the author of the type would be unable to prove the security of the type's encapsulation and its intended invariants. A bug in client logic (a mistake or a sinister attack) could use data races to mix together the field values of two unrelated valid object states to create a possibly invalid third state, a non-constructed hybrid of the first two.

These risks are present for a classic object with non-`final` fields, and also for an extended primitive, when it is _contained_ in a non-`final` field or an array element.

The risk of data races can be addressed in several ways.

- Only share primitive values across threads via `final` fields or [_references_ to primitive types](#box). A primitive value of type `P` cannot be torn by races, if the value is stored in a `final` field, or used via a reference.
- Require client code to ensure that an affected mutable object or mutable primitive value is never processed by more than one thread at a time, using correct synchronization to order thread accesses.
- Avoid using mutable objects and primitive values completely, preferring a [value-based class], especially a [pure object](#noid). The `final` fields of such objects make races impossible.
- Mark primitive fields `volatile`. This enforces atomic treatment of the primitive, but will usually have extra costs, to implement the specific concurrency effects that `volatile` guarantees.

[value-based class]:

[non-atomic treatment]:

[struct tearing]:
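
Two of the remedies listed above, `final` fields and `volatile` variables, can be sketched together in today's Java. This is only an illustration under stated assumptions: the syntax for pure objects and extended primitives is still provisional, so an ordinary class with `final` fields stands in for one, and the names `ImmutableRangeSketch`, `Range`, `shiftedBy`, and `countTornReads` are all hypothetical. The invariant being protected is that `hi - lo` is always 10.

```java
class ImmutableRangeSketch {
    // Stand-in for a value-based class: all fields final, invariant checked
    // in the constructor, "mutation" done by making a new instance.
    static final class Range {
        private final int lo;
        private final int hi;

        Range(int lo, int hi) {
            if (lo > hi) throw new IllegalArgumentException("lo > hi");
            this.lo = lo;
            this.hi = hi;
        }

        int lo() { return lo; }
        int hi() { return hi; }

        // Returns an adjusted copy instead of changing this object.
        Range shiftedBy(int delta) { return new Range(lo + delta, hi + delta); }
    }

    // The one mutable variable is a volatile reference, so each read yields
    // a single, fully constructed Range.
    private static volatile Range current = new Range(0, 10);

    // Runs one writer against one reader and counts reads whose two fields
    // do not belong together.
    static int countTornReads(int iterations) {
        current = new Range(0, 10);
        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                current = current.shiftedBy(1);
            }
        });
        writer.start();
        int torn = 0;
        for (int i = 0; i < iterations; i++) {
            Range r = current;                  // one read, one whole object
            if (r.hi() - r.lo() != 10) torn++;
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return torn;
    }

    public static void main(String[] args) {
        System.out.println("torn reads: " + countTornReads(100_000)); // prints "torn reads: 0"
    }
}
```

The reader never sees a hybrid of two states, because the only thing that changes is which immutable `Range` the reference points at. If `lo` and `hi` were instead two separate non-`final` fields updated in place, the same reader loop would be permitted to observe a mismatched pair.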