Value-Based Frozen Arrays

John Rose (version 0.12)

Here is a detailed application of the rules for value-based classes to the concept of extending “classic” Java arrays with immutability. The basic idea is to define a method Arrays.freeze(a) (or a.freeze()) which produces an immutable copy of the array referenced by a. Like Arrays.copyOf(a, a.length) (or a.clone()), the operation does not change its operand a in any way.
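Frozen-ness is meant to be a property of each array instance, maintained by the JVM, so it cannot be reproduced exactly in today's Java. As a rough illustration only, the sketch below emulates freezing in a library: it copies the operand and records the copy in an identity-keyed set that stands in for a per-instance JVM bit. The class FrozenArrays and its methods freeze and isFrozen are hypothetical names, not a proposed API.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical library-level emulation of Arrays.freeze(a) for int[].
// A JVM implementation would mark the array itself; here an identity-keyed
// set stands in for that per-instance bit.
final class FrozenArrays {
    // Single-threaded sketch: entries are never removed, so frozen arrays are never reclaimed.
    private static final Set<Object> FROZEN =
        Collections.newSetFromMap(new IdentityHashMap<>());

    // Like a.clone(): reads every component into a fresh array and leaves a unchanged.
    static int[] freeze(int[] a) {
        int[] copy = a.clone();   // throws NullPointerException if a is null
        FROZEN.add(copy);
        return copy;
    }

    static boolean isFrozen(Object a) {
        return FROZEN.contains(a);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] f = freeze(a);
        a[0] = 42;                          // the operand remains mutable
        System.out.println(f[0]);           // 1: the frozen copy is unaffected
        System.out.println(isFrozen(a));    // false
        System.out.println(isFrozen(f));    // true
    }
}
```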

To quote from the JDK 8 definition, instances of value-based classes:

are final and [immutable] (though may contain references to mutable objects);
have implementations of [equals], [hashCode], and [toString] which are computed solely from the instance’s state and not from its identity or the state of any other object or variable;
make no use of identity-sensitive operations such as [acmp] reference equality (==) between instances, [idhash] identity hash code of instances, or [sync] synchronization on an instance’s intrinsic lock;
are considered equal solely based on equals(), not based on reference equality (==);
do not have accessible constructors, but are instead instantiated through [factory] methods which make no commitment as to the [identity] of returned instances;
are freely [substitutable] when equal, meaning that interchanging any two instances x and y that are equal according to equals() in any computation or method invocation should produce no visible change in behavior.

The term “value-based” is defined to apply evenly to whole classes. But in the case of frozen arrays, there is no whole class to call “value-based”. Rather, individual frozen arrays must be value-based instances of regular types like int[].

Note: The standard array types int[] and Object[] will, of course, never be value-based classes, since many of their instances are mutable. We could attempt to introduce new types for frozen (and/or mutable) arrays, along the lines of int @Frozen[] vs. int @Mutable[], but this appears to be needlessly disruptive to existing code bases.

Therefore, a few of the rules for value-based classes do not apply. Some (but not all) object arrays are frozen, but the following do not occur:

Fields of Object[] do not become final. (Obviously, since arrays don’t have fields.)
Codes that perform reference-equality tests on object array references do not become invalid.
The behavior of the equals, hashCode, and toString methods does not change for non-frozen arrays. For compatibility, those arrays continue to inherit those behaviors from Object.
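The compatibility constraint in the last point can be checked against current Java: two distinct arrays with equal contents are not equal under the inherited Object.equals, and their toString shows only the type and identity hash code, while the static java.util.Arrays helpers compare and print contents. This program uses only existing JDK behavior; under the proposal, only frozen arrays would switch to the content-based results.

```java
import java.util.Arrays;

// Current Java behavior of ordinary (non-frozen) arrays.
final class NonFrozenArrayMethods {
    public static void main(String[] args) {
        int[] x = {1, 2, 3};
        int[] y = {1, 2, 3};

        // Inherited from Object: based on identity, not contents.
        System.out.println(x.equals(y));                                // false
        System.out.println(x.toString());                               // "[I@" followed by a hex hash code

        // Content-based static helpers in java.util.Arrays.
        System.out.println(Arrays.equals(x, y));                        // true
        System.out.println(Arrays.hashCode(x) == Arrays.hashCode(y));   // true
        System.out.println(Arrays.toString(x));                         // [1, 2, 3]
    }
}
```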

[immutable] A regular array can have any of its components updated (using aastore, etc.), but a frozen array will instead throw an exception (FrozenArrayStoreException or the like).

[equals] Array types need an equals method which respects frozen-ness by consulting Arrays.equals. For compatibility, non-frozen arrays must continue to use reference equality.

[hashCode] Array types need a hashCode method which respects frozen-ness by consulting Arrays.hashCode. For compatibility, non-frozen arrays must continue to use the identity hash code.

[toString] Array types need a toString method which respects frozen-ness by consulting Arrays.toString. For compatibility, non-frozen arrays must continue to use the simple string produced by Object.toString.
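The three rules above amount to a dispatch on frozen-ness. The following sketch shows what the inherited methods might compute for an int[] receiver; ArrayObjectMethods, its freeze helper, and the identity-keyed registry are hypothetical stand-ins for JVM support, and only int[] is handled to keep it short. One design choice is an assumption of this sketch: a frozen array is equal only to another frozen array with the same contents, which keeps equals symmetric with the reference equality still used by non-frozen arrays.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch of equals/hashCode/toString for int[] receivers
// once frozen-ness exists. The registry emulates a per-instance JVM bit.
final class ArrayObjectMethods {
    private static final Set<Object> FROZEN =
        Collections.newSetFromMap(new IdentityHashMap<>());

    static int[] freeze(int[] a) {
        int[] copy = a.clone();
        FROZEN.add(copy);
        return copy;
    }

    static boolean arrayEquals(int[] self, Object other) {
        if (!FROZEN.contains(self)) {
            return self == other;                  // non-frozen: reference equality, as today
        }
        // Frozen: compare contents, but only against another frozen int[],
        // so that equals stays symmetric with the non-frozen case above.
        return other instanceof int[]
            && FROZEN.contains(other)
            && Arrays.equals(self, (int[]) other);
    }

    static int arrayHashCode(int[] self) {
        return FROZEN.contains(self)
            ? Arrays.hashCode(self)                // content-based, consistent with arrayEquals
            : System.identityHashCode(self);       // non-frozen: identity hash, as today
    }

    static String arrayToString(int[] self) {
        return FROZEN.contains(self)
            ? Arrays.toString(self)                // e.g. [1, 2, 3]
            : self.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(self));
    }

    public static void main(String[] args) {
        int[] f1 = freeze(new int[] {1, 2, 3});
        int[] f2 = freeze(new int[] {1, 2, 3});
        System.out.println(arrayEquals(f1, f2));                        // true
        System.out.println(arrayHashCode(f1) == arrayHashCode(f2));     // true
        System.out.println(arrayToString(f1));                          // [1, 2, 3]
        System.out.println(arrayEquals(new int[] {1}, new int[] {1}));  // false
    }
}
```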

[acmp] The acmp instruction (reference equality operator) could be modified for frozen arrays, but I believe this is a bridge too far. The value-based doc carefully avoids going there. Instead, the system must guide coders away from relying on identity comparisons on frozen arrays. JDK methods which operate on arrays must be decoupled from identity comparisons on them, as appropriate.

Note: There are many occurrences of reference equality checks in the JDK, but most are backed up by calls to Object.equals which cover up any indeterminacy in acmp that might be caused by value-based instance semantics. Some comparisons will be inherently problematic, and we will need to use static analysis tools (like FindBugs) to amend them.

[idhash] Calls to System.identityHashCode on a frozen array are just as problematic as pointer comparisons. The safest thing to do is throw an exception (UnsupportedOperationException) when a frozen array (or any value-based object) is encountered. This means that a few library types (like IdentityHashMap) will fail when presented with frozen arrays, and will need to be upgraded to support them.

Note: Alternatively, the call could return a hash code, either instance-based as before, or content-based from Arrays.hashCode. We would need to issue caveats that are parallel to the caveats on pointer comparison. As with reference equality, there may be some low-level uses for identity hash code even on value-based objects, although users are told to make no assumptions about its results.
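The three candidate behaviors for System.identityHashCode on a frozen array (throwing, answering instance-based as before, or answering content-based) can be contrasted in a few lines. The Policy enum and the checkedIdentityHashCode method are hypothetical, and the identity-keyed registry again stands in for a JVM frozen bit; note that the registry itself relies on the real identity hash, which is exactly the kind of library code the THROW option would break.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical variants of System.identityHashCode for frozen int[] arrays.
final class IdentityHashPolicies {
    enum Policy { THROW, INSTANCE_BASED, CONTENT_BASED }

    private static final Set<Object> FROZEN =
        Collections.newSetFromMap(new IdentityHashMap<>());

    static int[] freeze(int[] a) {
        int[] copy = a.clone();
        FROZEN.add(copy);
        return copy;
    }

    static int checkedIdentityHashCode(int[] a, Policy policy) {
        if (!FROZEN.contains(a) || policy == Policy.INSTANCE_BASED) {
            return System.identityHashCode(a);   // non-frozen arrays are always unaffected
        }
        if (policy == Policy.THROW) {
            throw new UnsupportedOperationException("identity hash code of a frozen array");
        }
        return Arrays.hashCode(a);               // CONTENT_BASED: equal frozen arrays agree
    }

    public static void main(String[] args) {
        int[] f1 = freeze(new int[] {4, 5});
        int[] f2 = freeze(new int[] {4, 5});
        System.out.println(checkedIdentityHashCode(f1, Policy.CONTENT_BASED)
                == checkedIdentityHashCode(f2, Policy.CONTENT_BASED));   // true
        try {
            checkedIdentityHashCode(f1, Policy.THROW);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```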

Note: Object.toString consults identityHashCode, so it is doubly bad for frozen arrays. In any case, using Arrays.toString for Object.toString calls on frozen arrays will encourage users to adopt frozen arrays, since Object.toString is nearly useless on non-frozen arrays.

[sync] You can’t synchronize on a frozen array; an attempt to do so will throw IllegalMonitorStateException (or the like). Alternatively, the synchronization could be displaced to a coarsened monitor shared by many or all frozen arrays.
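The coarsened-monitor alternative can be pictured as a lock-selection step in front of synchronized: every frozen array maps to one shared monitor, while other objects keep their own. The names monitorFor and FROZEN_MONITOR are hypothetical, and a real JVM would make this choice inside monitorenter and monitorexit rather than in user code. Sharing a monitor preserves mutual exclusion, at the cost of extra contention between unrelated users of frozen arrays.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch: synchronization on any frozen array is displaced
// to a single monitor shared by all frozen arrays.
final class FrozenMonitors {
    private static final Set<Object> FROZEN =
        Collections.newSetFromMap(new IdentityHashMap<>());
    private static final Object FROZEN_MONITOR = new Object();

    static int[] freeze(int[] a) {
        int[] copy = a.clone();
        FROZEN.add(copy);
        return copy;
    }

    // Chooses the monitor that synchronizing on o would actually use.
    static Object monitorFor(Object o) {
        return FROZEN.contains(o) ? FROZEN_MONITOR : o;
    }

    public static void main(String[] args) {
        int[] f1 = freeze(new int[] {1});
        int[] f2 = freeze(new int[] {2});
        int[] m = {3};
        System.out.println(monitorFor(f1) == monitorFor(f2));  // true: one shared monitor
        System.out.println(monitorFor(m) == m);                // true: ordinary arrays keep their own
        synchronized (monitorFor(f1)) {
            // Critical section guarded by the shared frozen-array monitor.
        }
    }
}
```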

[factory] Freezing a non-frozen array reads all of its components and preserves them permanently in a fresh immutable copy of the array.

[identity] The JVM is free to use caching or any other means to provide previously frozen arrays to satisfy new freezing requests, if the previously frozen arrays have the same (==) components. In particular, freezing an already-frozen array, or any copy thereof, can return the original frozen array.
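The freedom granted here permits a freeze that hands back a previously frozen array with the same components, and that returns an already-frozen operand unchanged. The sketch below memoizes frozen int[] copies by content, using a small hypothetical Key wrapper because arrays themselves hash by identity. CachingFreezer is a hypothetical name; a JVM might cache selectively or not at all, and a production cache would need weak references so that frozen arrays can still be collected.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical freeze that may return an earlier frozen array with equal contents.
final class CachingFreezer {
    private static final Set<Object> FROZEN =
        Collections.newSetFromMap(new IdentityHashMap<>());

    // Content-based key, so the HashMap compares arrays with Arrays.equals/hashCode.
    private static final class Key {
        final int[] a;
        Key(int[] a) { this.a = a; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(a, ((Key) o).a);
        }
        @Override public int hashCode() { return Arrays.hashCode(a); }
    }

    private static final Map<Key, int[]> CACHE = new HashMap<>();

    static int[] freeze(int[] a) {
        if (FROZEN.contains(a)) return a;           // re-freezing is the identity operation
        int[] cached = CACHE.get(new Key(a));
        if (cached != null) return cached;          // reuse an earlier frozen copy
        int[] copy = a.clone();
        FROZEN.add(copy);
        CACHE.put(new Key(copy), copy);             // key holds the immutable copy, never a itself
        return copy;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] f1 = freeze(a);
        int[] f2 = freeze(new int[] {1, 2, 3});
        System.out.println(f1 == f2);               // true: same components
        System.out.println(freeze(f1) == f1);       // true
        a[0] = 9;
        System.out.println(freeze(a) == f1);        // false: contents now differ
    }
}
```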

Note: Both expressions a.freeze() and a.clone() on all array types produce results with contents identical to the original. Unlike clone, freeze may return the same object more than once, as long as the contents are the same.

[substitutable] With the exception of reference equality and identity hash code (and classes like IdentityHashMap which use them), all operations on arrays treat frozen arrays of identical content as identical values. The JVM may perform optimizations that cause some reference-sensitive codes to produce unpredictable answers.

Note: Substitutability is the hardest part of the value-based contract to specify clearly. In the most extreme form of this rule, we could make the JIT and GC free to run around commoning up equivalent value-based objects, at any time. Getting the corner cases to behave well enough may require a complicated set of design compromises. For example, it might be best to amend IdentityHashMap with special handling of value-based objects; this suggests the need for a general query System.isFrozen, for library codes to use if they need to adjust to value-based instances.

Other observations:

[dimensionality] If an array has two or more dimensions, its frozen status is logically independent from the status of any of its sub-arrays. Thus, an assignment a[i][j]=x might fail because the component a[i] is null or because the sub-array a[i] is frozen, but it will not fail merely because the array a itself is frozen. A frozen array can contain non-frozen sub-arrays, and a non-frozen array can contain frozen sub-arrays. Also, individual sub-arrays can be frozen or non-frozen, independently of each other. On the other hand, it is plausible that if the language were (in the future) to support direct declaration of frozen arrays, the freezing would typically apply equally to all sub-arrays.
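A shallow freeze of a two-dimensional array illustrates this independence. In the hypothetical emulation below, freezing the outer array copies only its components, which are references to sub-arrays, so an element store a[i][j] = x still succeeds while a component store a[i] = row is rejected. ShallowFreeze, storeRow, storeElement, and FrozenArrayStoreException are illustrative names only.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical shallow freeze of a two-dimensional array.
final class ShallowFreeze {
    static final class FrozenArrayStoreException extends RuntimeException {
        FrozenArrayStoreException() { super("store into frozen array"); }
    }

    private static final Set<Object> FROZEN =
        Collections.newSetFromMap(new IdentityHashMap<>());

    // Freezes only the outer array: the copy shares (and does not freeze) the sub-arrays.
    static int[][] freeze(int[][] a) {
        int[][] copy = a.clone();
        FROZEN.add(copy);
        return copy;
    }

    static boolean isFrozen(Object a) { return FROZEN.contains(a); }

    // Stand-in for a[i] = row: only the outer array's status matters.
    static void storeRow(int[][] a, int i, int[] row) {
        if (i < 0 || i >= a.length) throw new ArrayIndexOutOfBoundsException(i);
        if (isFrozen(a)) throw new FrozenArrayStoreException();
        a[i] = row;
    }

    // Stand-in for a[i][j] = x: only the sub-array's status matters.
    static void storeElement(int[][] a, int i, int j, int x) {
        int[] row = a[i];
        if (j < 0 || j >= row.length) throw new ArrayIndexOutOfBoundsException(j); // NPE here if row is null
        if (isFrozen(row)) throw new FrozenArrayStoreException();
        row[j] = x;
    }

    public static void main(String[] args) {
        int[][] a = { {1, 2}, {3, 4} };
        int[][] f = freeze(a);
        storeElement(f, 0, 1, 99);          // allowed: f[0] is not frozen
        System.out.println(a[0][1]);        // 99, since f[0] and a[0] are the same sub-array
        try {
            storeRow(f, 0, new int[] {0});
        } catch (FrozenArrayStoreException e) {
            System.out.println("outer store rejected");
        }
    }
}
```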

Note: The Java Language Specification uses the term component to refer to a variable in an array which is reached by indexing the array once (e.g., a[i]). Such a variable is a sub-array of the array if the array has more than one dimension. The term element is reserved for a variable which is reached by indexing D times, where D is the rank of the array (e.g., a[i][j], assuming a has two dimensions).

[null] Since the null reference does not refer to an array, it cannot be frozen. Thus, an expression Arrays.freeze(a) is likely to elicit a NullPointerException when the operand is null, just as Arrays.copyOf does.

Note: In some contexts it will be reasonable to pretend that the result of freezing a null reference is the same (and unique) null reference. It is possible that if we introduce an operation Arrays.deepFreeze it will pass over null components (and perhaps any other non-array references) without changing them.
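One plausible reading of a deepFreeze operation is a recursive copy that freezes every array it reaches and passes nulls and non-array references through unchanged. The sketch below is hypothetical in both name and behavior, handles only Object[] and int[], and does not guard against arrays that contain themselves. Because a frozen outer array may still hold non-frozen sub-arrays, it always re-copies an Object[] rather than trusting its frozen status.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical deepFreeze: freezes arrays at every level, leaves other values alone.
final class DeepFreeze {
    private static final Set<Object> FROZEN =
        Collections.newSetFromMap(new IdentityHashMap<>());

    static boolean isFrozen(Object a) { return FROZEN.contains(a); }

    static Object deepFreeze(Object x) {
        if (x instanceof int[]) {
            int[] a = (int[]) x;
            if (isFrozen(a)) return a;           // a frozen leaf array is already deeply frozen
            int[] copy = a.clone();
            FROZEN.add(copy);
            return copy;
        }
        if (x instanceof Object[]) {
            // Always copied: even a frozen Object[] may hold non-frozen sub-arrays.
            Object[] copy = ((Object[]) x).clone();   // keeps the runtime component type
            for (int i = 0; i < copy.length; i++) {
                copy[i] = deepFreeze(copy[i]);
            }
            FROZEN.add(copy);                    // mark only after the components are filled in
            return copy;
        }
        return x;                                // null and non-array references pass through
    }

    public static void main(String[] args) {
        int[][] grid = { {1, 2}, null, {3} };
        int[][] f = (int[][]) deepFreeze(grid);
        System.out.println(isFrozen(f) && isFrozen(f[0]) && isFrozen(f[2]));  // true
        System.out.println(f[1] == null);                                     // true
    }
}
```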

[bytecode] The array store bytecodes (aastore, iastore, etc.) need to be adjusted to throw the appropriate exception if the operand array is frozen. This check must be coordinated with the pre-existing checks (null reference, array index range, reference store check). It seems reasonable to order the check after index range and before any other store check.
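The proposed check order can be written out as an interpreter-style routine for aastore: null reference, then index range, then the new frozen check, and only then the existing reference store check that raises ArrayStoreException. FrozenArrayStoreException, the registry, and the method names are hypothetical; a JVM would perform these checks in the bytecode implementation itself.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch of aastore with a frozen check inserted into the existing order.
final class AastoreSketch {
    static final class FrozenArrayStoreException extends RuntimeException {
        FrozenArrayStoreException() { super("store into frozen array"); }
    }

    private static final Set<Object> FROZEN =
        Collections.newSetFromMap(new IdentityHashMap<>());

    static <T> T[] freeze(T[] a) {
        T[] copy = a.clone();
        FROZEN.add(copy);
        return copy;
    }

    // Performs a[index] = value with the proposed check order.
    static void aastore(Object[] a, int index, Object value) {
        if (a == null) throw new NullPointerException();                  // 1. null reference
        if (index < 0 || index >= a.length)
            throw new ArrayIndexOutOfBoundsException(index);              // 2. index range
        if (FROZEN.contains(a)) throw new FrozenArrayStoreException();    // 3. frozen check
        Class<?> component = a.getClass().getComponentType();
        if (value != null && !component.isInstance(value))
            throw new ArrayStoreException(value.getClass().getName());    // 4. reference store check
        a[index] = value;
    }

    public static void main(String[] args) {
        String[] s = freeze(new String[] {"x"});
        try {
            aastore(s, 0, Integer.valueOf(1));   // wrong type AND frozen
        } catch (FrozenArrayStoreException e) {
            System.out.println("frozen check runs before the type check");
        }
    }
}
```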

Note: Since frozen-ness is a property of array instances, not array references, bytecodes which copy references (such as dup, aload_0, astore_1, etc.) do not affect frozen-ness. All references to the same array refer either to a frozen array or a non-frozen array.

[reflection] Reflective APIs must respect frozen-ness. (java.lang.reflect.Array.set needs to perform the same checks as the bytecode.)

[jni] Native APIs must respect frozen-ness. (There must be a way to protect against mutations from JNI code. The existing conventions for throwing errors are sufficient. The JNI support code must make the same checks as the bytecode.)

[unsafe] Any system codes that use Unsafe, such as deserialization and method handles, must be adjusted to respect frozen-ness. Unsafe is not documented as being able to “stomp” on object headers or metadata, so there is no documented way for Unsafe to affect the frozen-ness of an array. Using Unsafe to set elements of a frozen array will have unpredictable consequences.

[serialization] The effect of serialization on frozen arrays must be defined. It is likely that all deserialized arrays will be mutable clones, although an immutable array option might be attractive to some users.

[language] None of the present points about the JVM have any direct bearing on any changes to the Java language which might support frozen arrays. Strictly speaking, no changes at all are needed. Although the notation a.freeze() appears to impinge either on the language or the class Object, it could be restated as a static method Arrays.freeze(a).

[debugging] In order to assess the viability of converting existing codes to use frozen instances, it may be desirable to implement a JVM mode which can assist the user in detecting and diagnosing code which violates the rules for value-based classes and instances. Specifically, dangerous uses of acmp and identityHashCode can be diagnosed.

[optimization] The system (and particularly the JIT) gets some extra freedom of action with frozen arrays, if value-based semantics are applied. Generally speaking, a chain of clone and freeze operations can be collapsed up to the oldest frozen operand. Of course, a double freeze can be the identity operation on the first frozen operand, but if intermediate non-frozen operands in such a chain are non-escaping and not modified, they can also be treated as frozen. The user model is that the sooner you freeze an array that won’t be further modified, the more optimizations the system can make. Also, re-freezing is desirable: It is cheap, and has the effect of narrowing the scope of any stray mutable copies of a copy-chain.

Note: Much of this logic is applicable to other legacy types, such as Integer or String. Experiments are required to assess whether these types could be made value-based, either fully, or (if public constructors are retained) on an instance-by-instance basis. See JEP 169 for more discussion.