State of Enhanced Enums

December 2018: (v. 0.1)

Maurizio Cimadamore

The goal of JEP 301 (Enhanced Enums) is to enhance the expressiveness of the enum construct in the Java Language in two ways: first, by allowing type-variables in enums (generic enums); secondly, by performing sharper type-checking for enum constants. The two improvements strengthen each other, as demonstrated in the following example (taken from the JEP):

enum Argument<X> { // declares generic enum
   STRING<String>(String.class), 
   INTEGER<Integer>(Integer.class), ... ;

   Class<X> clazz;

   Argument(Class<X> clazz) { this.clazz = clazz; }

   Class<X> getClazz() { return clazz; }
}

//uses sharper typing of enum constant
Class<String> cs = Argument.STRING.getClazz();

In this example, Argument is a generic enum. Its declaration introduces a type parameter (namely, X), which can be instantiated in the subsequent enum constant declarations - e.g. STRING is a constant for which X is replaced with the concrete type java.lang.String. A generic enum can refer to its type parameters in the same way as one would expect in a class declaration - here, the signatures of both the constructor and the getClazz method depend on X.

When using generic enum constants, we can see sharper typing in action: instead of conservatively typing the constant STRING as Argument, the compiler goes further, and obtains the type Argument<String>. This idiom is very powerful and effectively allows enums to be used as type tokens that can act as inputs during e.g. the type inference process.
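Generic enums do not compile in any released version of Java, so the closest approximation today is a final class with typed static constants. The following sketch is illustrative only (the class ArgumentTokens and the method convert are hypothetical). It shows how a constant typed as Argument<String> acts as a type token that drives inference at the call site.

```java
// Sketch: today's approximation of the generic Argument<X> enum, using a final
// class with typed constants. ArgumentTokens and convert are hypothetical names.
final class ArgumentTokens {

    static final class Argument<X> {
        static final Argument<String> STRING = new Argument<>(String.class);
        static final Argument<Integer> INTEGER = new Argument<>(Integer.class);

        private final Class<X> clazz;

        private Argument(Class<X> clazz) { this.clazz = clazz; }

        Class<X> getClazz() { return clazz; }
    }

    // X is inferred from the token, so callers need no explicit cast.
    static <X> X convert(Argument<X> arg, Object value) {
        return arg.getClazz().cast(value);
    }

    public static void main(String[] args) {
        Class<String> cs = Argument.STRING.getClazz(); // sharp type, as with the enum
        String s = convert(Argument.STRING, "hello");
        int n = convert(Argument.INTEGER, 42);
        System.out.println(cs.getSimpleName() + " " + s + " " + n); // prints "String hello 42"
    }
}
```

The price of this workaround is that the constants are no longer enum constants, so enum-specific features such as values(), valueOf, EnumSet and EnumMap are unavailable; generic enums would keep both the sharp typing and those features.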

JEP 301: A quick recap

The text of JEP 301 describes the changes required to support generic enums and sharp constant typing in great detail. In this section we summarize the key points of the analysis in JEP 301, to provide some context to the reader.

Binary compatibility

Let's consider the following code:

enum Test {
   A { void a() { } },
   B { void b() { } };
}

This code is currently translated by javac as follows:

/* enum */ class Test extends Enum<Test> {
   static Test A = new Test() { void a() { } };
   static Test B = new Test() { void b() { } };
}
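This translation can be observed in today's Java with a little reflection. The sketch below (class and method names are illustrative) checks that a constant with a class body is an instance of a distinct subclass of Test, while the public static field holding it is still declared with type Test.

```java
import java.lang.reflect.Field;

// Sketch: inspects how javac compiles enum constants that have class bodies.
final class EnumTranslation {

    enum Test {
        A { void a() { } },
        B { void b() { } };
    }

    // Each constant's runtime class is its own subclass of Test.
    static boolean constantsAreSubclasses() {
        return Test.A.getClass() != Test.class
            && Test.A.getClass().getSuperclass() == Test.class
            && Test.A.getClass() != Test.B.getClass();
    }

    // The static field holding the constant is still declared with type Test.
    static Class<?> fieldType(String name) throws NoSuchFieldException {
        Field f = Test.class.getField(name);
        return f.getType();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(constantsAreSubclasses());                 // true
        System.out.println(fieldType("A") == Test.class);             // true
        System.out.println(Test.A.getDeclaringClass() == Test.class); // true
    }
}
```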

If we allow sharper types for enum constants, one option would be to translate the code as follows:

/* enum */ class Test extends Enum<Test> {
   static Test$1 A = new Test() { void a() { } };
   static Test$2 B = new Test() { void b() { } };
}

This scheme, however, breaks binary compatibility; consider a legacy classfile containing the following instruction:

getstatic Test::A

This instruction will stop working if recompiling Test changes the types of the enum constant fields: a field reference in a classfile records the field's type as well as its name, so the legacy reference to a field A of type Test would no longer resolve. To accommodate that, JEP 301 proposes that, instead of sharpening the types of the (synthetic) fields holding the enum constants, the compiler should resort to synthetic casts in order to access sharp members. For instance, access to A::a can be achieved as follows:

getstatic Test::A
checkcast Test$1
invokevirtual Test$1::a

This highlights another issue: the names of the synthetic classes associated with the enum constants are very opaque, and prone to breaking binary compatibility, for example if the order in which the constants are declared in the enum is changed. To provide more robust binary compatibility guarantees, JEP 301 proposes using more transparent names, such as Test$A and Test$B, instead. The JEP text illustrates the impact of this alternate naming scheme on serialization and reflection; for the sake of brevity, we won't cover such details here.
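Since enum translation cannot be written directly in source, the following hand-written sketch approximates the proposed scheme with an ordinary abstract class. It is not javac output: the nested classes ConstA and ConstB are hypothetical stand-ins for Test$A and Test$B. The fields keep the type Test, and a sharp access is spelled out as a field read followed by a cast, mirroring the getstatic/checkcast/invokevirtual sequence above.

```java
// Sketch: a hand-written analogue of the proposed translation (not javac output).
final class SharpAccess {

    abstract static class Test {
        private Test() { }

        // Named, not anonymous: their binary names do not depend on declaration order.
        static final class ConstA extends Test {
            String a() { return "a"; }
        }

        static final class ConstB extends Test {
            String b() { return "b"; }
        }

        // The fields keep type Test, so existing references to them still resolve.
        static final Test A = new ConstA();
        static final Test B = new ConstB();
    }

    // A sharp access to A.a(), lowered by hand: read the field, cast, invoke.
    static String callA() {
        return ((Test.ConstA) Test.A).a();
    }

    static String callB() {
        return ((Test.ConstB) Test.B).b();
    }

    public static void main(String[] args) {
        System.out.println(callA() + callB()); // prints "ab"
    }
}
```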

Accessibility

As we have seen in the previous section, classfiles referring to sharp enum constant members need to reference the class in which such members are declared (e.g. for A::a, either Test$1 or Test$A). This poses some challenges when it comes to access control: if we try to access a public member of a constant declared in a public enum, we might encounter access errors, because the synthetic classes generated by javac to hold the enum constants are implicitly package-private. To avoid these accessibility issues, JEP 301 proposes that the synthetic classes generated by javac inherit the same visibility as the enum declaration in which they appear.
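The visibility gap can be observed with reflection in today's Java. In this sketch (names are illustrative), the enum Test is declared public, but the class generated for the body of constant A is not, so bytecode in another package that named that class directly, as a sharp member access would, would fail access checks.

```java
import java.lang.reflect.Modifier;

// Sketch: compares the visibility of a public enum with that of a constant body class.
final class ConstantVisibility {

    public enum Test {
        A { public void a() { } };
    }

    static boolean enumIsPublic() {
        return Modifier.isPublic(Test.class.getModifiers());
    }

    // The class javac generates for A's body is not public.
    static boolean constantClassIsPublic() {
        return Modifier.isPublic(Test.A.getClass().getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(enumIsPublic());          // true
        System.out.println(constantClassIsPublic()); // false
    }
}
```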

Source compatibility

The fact that enum constants are sharply typed can expose some source compatibility edge cases, as illustrated by the following expression:

List.of(Test.A)

Without the enhancements proposed in JEP 301, the type of this expression would simply be List<Test>. But if we apply sharp typing of enum constants, the expression will have type List<Test$1>, which is incompatible with the original type. The JEP stresses that, in most cases, target typing makes this a non-issue: when the expression appears in a context with a target type, such as an assignment to a List<Test> variable, inference takes that type into account and produces a compatible result. A corpus analysis carried out on the JDK showed that this was indeed not a big issue in terms of source compatibility.
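A plain subclass can stand in for a sharp constant type to show this edge case in today's Java (Java 10 or later, for var). In this sketch, Sub plays the role of Test$1 and Base the role of Test; both names are illustrative. With a target type, inference picks the wider type; without one, the sharper type leaks into the result.

```java
import java.util.List;

// Sketch: Sub stands in for a sharp constant type, Base for the enum type.
final class TargetTyping {

    static class Base { }
    static class Sub extends Base { }

    // The target type List<Base> drives inference: E is inferred as Base.
    static List<Base> withTarget() {
        return List.of(new Sub());
    }

    // With no target type, E is inferred from the argument alone: E = Sub.
    static List<Sub> withoutTarget() {
        var subs = List.of(new Sub());
        // List<Base> bases = subs; // would not compile: List<Sub> is not a List<Base>
        return subs;
    }

    public static void main(String[] args) {
        System.out.println(withTarget().size() + withoutTarget().size()); // prints 2
    }
}
```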

An experiment

All the features described in JEP 301 have been implemented and made available in the enhanced-enums amber branch. The size of the patch is relatively small (most of the changes are in the parser, in order to support generic syntax on enums). While all test runs have come back clean, we felt that such a change needed some real-world validation; for this reason, we tried to use generic enums in the context of the javac compiler.

One experiment we wanted to make was to see if enhanced enums could improve javac's own Option enum. This enum defines a bunch of constants modeling javac command line arguments (e.g. -classpath, -d, etc.). Most options, along with the values of their arguments, are simply stored into a Map<String, String>, where they can be looked up. Since option values are encoded as Strings, it's up to the client to parse these values in a suitable way.

With enhanced enums it should be possible to do better than this; more specifically, if enums supported generics, each option could specify a type argument - that is, the type of the value associated with that particular option.

So, an option with a plain string argument would be encoded as Option<String>, as follows:

D<String>("-d", ...)

An option for which multiple choices are available could instead be encoded using an enum set as a type argument - for instance:

G_CUSTOM<EnumSet<DebugOption>>("-g:",  ...)

where DebugOption would be defined as follows:

enum DebugOption {
     LINES, VARS, SOURCE;
}

So, instead of storing all options (and values) into a Map<String, String>, we could store them into a Map<Option, Object>. Then, we could turn the Options::get method from this:

public String get(String option) { ... }

into something like this:

public <Z> Z get(Option<Z> option) { ... }

Now clients would be able to access option values as follows:

boolean g_vars = options.get(Option.G_CUSTOM)
                        .contains(DebugOption.VARS);

Note how we raised the expressiveness level of the client, which no longer has to do parsing duties (and domain conversions). All things considered, this experiment looked like a reasonably comprehensive (and complex) real-world validation test for the feature.
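The Options::get idea can be approximated today by replacing the generic enum with a final class and using a heterogeneous map. This is a sketch under assumptions: TypedOptions, put, and the flag field are hypothetical and do not reflect javac's actual Options class. The type argument of each key determines the static type of the value retrieved, so the client needs neither parsing nor casts.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

// Sketch: Option is a final class rather than an enum, because enums cannot be
// generic today. TypedOptions and put() are hypothetical.
final class TypedOptions {

    enum DebugOption { LINES, VARS, SOURCE }

    static final class Option<Z> {
        static final Option<String> D = new Option<>("-d");
        static final Option<EnumSet<DebugOption>> G_CUSTOM = new Option<>("-g:");

        final String flag;

        private Option(String flag) { this.flag = flag; }
    }

    // Values keyed by option; the key's type argument fixes the value's static type.
    private final Map<Option<?>, Object> values = new HashMap<>();

    <Z> void put(Option<Z> option, Z value) {
        values.put(option, value);
    }

    @SuppressWarnings("unchecked")
    <Z> Z get(Option<Z> option) {
        // Safe as long as put() is the only way values enter the map.
        return (Z) values.get(option);
    }

    public static void main(String[] args) {
        TypedOptions options = new TypedOptions();
        options.put(Option.D, "out");
        options.put(Option.G_CUSTOM, EnumSet.of(DebugOption.LINES, DebugOption.VARS));

        boolean gVars = options.get(Option.G_CUSTOM).contains(DebugOption.VARS);
        String dir = options.get(Option.D);
        System.out.println(dir + " " + gVars); // prints "out true"
    }
}
```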

Dead end?

The experiment, as well as the discussion of its results, can be found here. As the name of this section implies, the results of this experiment were not as successful as we had hoped. As soon as we generified the javac Option enum, the compiler started to report many errors, most of which had to do with EnumSet and EnumMap being used in combination with the newly generified enum.

The crux of the issue can be summarized as follows: let R be a raw reference to some generic enum of the kind R<X>. Then the type EnumSet<R> is not a well-formed parameterized type. To see why that's the case, let's look at the declaration of the EnumSet class:

class EnumSet<E extends Enum<E>> { ... }

Here, the type parameter E has an f-bound. In concrete terms, to check well-formedness of the type term EnumSet<R>, we have to check that:

R <: [E:=R]Enum<E>

That is, the actual type-argument must conform to its declared bound. But if we follow that check, we obtain:

R <: [E:=R]Enum<E>
R <: Enum<R>
Enum <: Enum<R> (*)
false

The step marked (*) is where things go wrong: since R is a raw type, all its supertypes are erased as per JLS 4.8. This means that the supertype of R is not Enum<R>, as one would expect, but just the raw type Enum. This causes the subtyping check (and the type validation that depends on it) to fail.
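The same failure can be reproduced with ordinary generic classes. In this sketch (all names illustrative), FBound mimics EnumSet<E extends Enum<E>> and R<X> mimics a generic enum whose supertype mentions its own type parameter. A parameterization such as R<String> satisfies the f-bound, but raw R does not, because its supertype is erased to raw Comparable.

```java
// Sketch: FBound mimics EnumSet<E extends Enum<E>>; R<X> mimics a generic enum
// whose supertype, like Enum<R<X>>, mentions its own type parameter.
final class FBoundDemo {

    static final class FBound<E extends Comparable<E>> {
        E larger(E a, E b) {
            return a.compareTo(b) >= 0 ? a : b;
        }
    }

    static final class R<X> implements Comparable<R<X>> {
        final int rank;

        R(int rank) { this.rank = rank; }

        @Override
        public int compareTo(R<X> other) { return Integer.compare(rank, other.rank); }
    }

    static int pick() {
        // Fine: R<String> <: Comparable<R<String>>, so the bound is satisfied.
        FBound<R<String>> ok = new FBound<>();
        // FBound<R> bad; // rejected: raw R's supertype is erased to raw Comparable,
        //                // so R <: Comparable<R> does not hold
        return ok.larger(new R<>(1), new R<>(2)).rank;
    }

    public static void main(String[] args) {
        System.out.println(pick()); // prints 2
    }
}
```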

In other words, there is no way to express the type of a set of heterogeneous enum constants - in the aforementioned writeup we also show how using wildcards results in similar issues.

It appears we are doomed: if we can't make generic enums work with standard enum libraries such as EnumSet and EnumMap (and chances are that people have written libraries with similar generic signatures outside of the JDK too), there seems to be not much point in supporting the features described in JEP 301.

Raw types: friend or foe?

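The problem we hit in our experiment is, ultimately, a failure of raw types when it comes to source compatibility. More specifically, the typing rules for raw types (JLS 4.8) apply erasure very broadly: the supertypes of a raw type are the erasures of the supertypes of the corresponding generic type, and the types of its constructors, instance methods and non-static fields are erased too.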

While these rules were designed with an eye towards supporting legacy non-generic clients accessing generified libraries in a raw form, they did not take into consideration another (and possibly equally important) use case: parameterization of existing libraries already using generic types. Let's say that we have a non-generic class declaration:

class StringList extends ArrayList<String> {
    ...
}

A client would use the library in the usual way:

var sl = new StringList();
sl.add("Hello"); //ok
sl.add(42); //1 - ERROR
String s = sl.get(0); //2

Now, let's imagine that StringList is parameterized, as follows:

class StringList<X> extends ArrayList<String> {
    ...
}

That is, we have only added a type parameter, without changing the semantics of the class. Unfortunately, the typing rules for raw types are set up in such a way that this modification breaks clients in unexpected ways. For instance, a client would no longer receive an error in (1), since StringList::add now takes an Object, thanks to the erased supertype. Worse, a new error will be issued in (2), since the erased return type of StringList::get will be Object, which is incompatible with the expected type String. So, we have a case where simply adding an unused type parameter to an existing class is enough to break existing clients. These issues have already been reported in the past.
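The migration hazard can be checked directly. In this sketch, GenericStringList is a hypothetical name for StringList after adding the unused type parameter X; the markers (1) and (2) correspond to the client snippet above.

```java
import java.util.ArrayList;

// Sketch: StringList before migration, GenericStringList (hypothetical name) after.
final class RawMigration {

    static class StringList extends ArrayList<String> { }

    static class GenericStringList<X> extends ArrayList<String> { }

    static String beforeMigration() {
        var sl = new StringList();
        sl.add("Hello");
        // sl.add(42);    // (1) rejected: add expects a String
        return sl.get(0); // (2) fine: get returns String
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static Object afterMigration() {
        var sl = new GenericStringList(); // raw: its supertype is erased to ArrayList
        sl.add(42);                       // (1) now accepted: add takes Object
        // String s = sl.get(0);          // (2) now rejected: get returns Object
        return sl.get(0);
    }

    public static void main(String[] args) {
        System.out.println(beforeMigration()); // prints "Hello"
        System.out.println(afterMigration());  // prints "42"
    }
}
```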

Migration-friendly raw types

Some of the issues described in the previous section could be mitigated by restricting the way in which erasure is applied to the supertypes and members of a raw type. One way to do this would be, for instance, to only erase those supertypes and members whose types actually depend on some of the type variables introduced by the class. This way, if the class doesn't use its type parameters, or if clients of the class never touch members involving the new type parameters, source compatibility would be preserved.
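To make the difference between the two rules concrete, here is a toy model, assuming Java 17 for records, sealed interfaces and pattern matching. It only covers supertypes (members would be handled analogously), type terms are represented by two small records that do not resemble javac's internals, and all names are illustrative.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy model: today's erasure rule for the supertypes of a raw type versus the
// narrower rule sketched above. Not javac's representation; names are illustrative.
final class RawErasureModel {

    sealed interface Type permits TypeVar, ClassType { }

    record TypeVar(String name) implements Type {
        @Override public String toString() { return name; }
    }

    record ClassType(String name, List<Type> args) implements Type {
        ClassType(String name, Type... args) { this(name, List.of(args)); }

        @Override public String toString() {
            return args.isEmpty() ? name
                : args.stream().map(Object::toString)
                      .collect(Collectors.joining(", ", name + "<", ">"));
        }
    }

    // True if t mentions any of the given type variables.
    static boolean mentions(Type t, Set<String> vars) {
        if (t instanceof TypeVar v) {
            return vars.contains(v.name());
        }
        return ((ClassType) t).args().stream().anyMatch(a -> mentions(a, vars));
    }

    // Erasure of a class type: drop its type arguments.
    static ClassType erase(ClassType c) {
        return new ClassType(c.name());
    }

    // Today (JLS 4.8): every supertype of a raw type is erased; classParams is unused.
    static ClassType currentRule(ClassType supertype, Set<String> classParams) {
        return erase(supertype);
    }

    // Alternative: erase only supertypes that mention the class's own type parameters.
    static ClassType proposedRule(ClassType supertype, Set<String> classParams) {
        return mentions(supertype, classParams) ? erase(supertype) : supertype;
    }

    public static void main(String[] args) {
        // class StringList<X> extends ArrayList<String>
        ClassType listSuper = new ClassType("ArrayList", new ClassType("String"));
        System.out.println(currentRule(listSuper, Set.of("X")));  // ArrayList
        System.out.println(proposedRule(listSuper, Set.of("X"))); // ArrayList<String>

        // enum Argument<X>: supertype Enum<Argument<X>> versus Enum<Argument>
        ClassType selfSuper = new ClassType("Enum", new ClassType("Argument", new TypeVar("X")));
        ClassType rawSuper = new ClassType("Enum", new ClassType("Argument"));
        System.out.println(proposedRule(selfSuper, Set.of("X"))); // Enum
        System.out.println(proposedRule(rawSuper, Set.of("X")));  // Enum<Argument>
    }
}
```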

Of course, such an approach would not be free of its own source incompatibilities; eyeballing the existing tests, we have identified some potential issues that could arise if we were to make the typing rules for raw types more precise. The following sections show some snippets illustrating them.

Cast conversion

Cast conversion is one of the areas that would be affected by a sharpening of the typing rules, as illustrated by the following example:

interface Super<P> {}

interface S<B> extends Super<Integer> {}
class T<B> implements Super<Double> {}

...
S s = ...
T t = (T)s;

The cast in the last line currently compiles, and the cast itself raises no warning: since S and T are raw types, their supertypes are erased, so the compiler has no parameterized supertypes to compare. But if we make the type analysis sharper, by avoiding erasure for the supertypes of S and T (which, note, do not depend on the type parameters introduced in these classes), we would find that S and T, even if raw, are provably distinct, as they derive from different parameterizations of the generic Super interface; the cast would then be rejected.
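A runnable version of the example (names as above, plus a hypothetical class U implementing S) shows that today the raw cast is accepted at compile time and any mismatch only surfaces at run time, as a ClassCastException.

```java
// Sketch: the raw cast compiles today; U is a hypothetical class implementing S.
final class RawCastDemo {

    interface Super<P> { }
    interface S<B> extends Super<Integer> { }
    static class T<B> implements Super<Double> { }

    static class U implements S<String> { }

    @SuppressWarnings("rawtypes")
    static boolean rawCastFailsAtRunTime() {
        S s = new U();
        try {
            T t = (T) s; // accepted at compile time: the raw supertypes are erased
            return t == null;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // By contrast, javac rejects the parameterized cast below, because
    // Super<Integer> and Super<Double> are provably distinct:
    // static T<String> sharp(S<String> s) { return (T<String>) s; }

    public static void main(String[] args) {
        System.out.println(rawCastFailsAtRunTime()); // prints true
    }
}
```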

Functional interfaces

Sharper typing of raw types can also affect the notion of functional interface, as shown in the code below:

interface Sup<X> {
    boolean m(X x);
}

interface Sub<X> extends Sup<String> {
    boolean m(String s);
}

Sub<Object> so = (s) -> true; //OK
Sub sr = (s) -> true; //ERROR

Again, we have a situation in which recursive erasure of the supertypes of a raw type leads to the type Sub not being considered a functional interface: the raw type has two abstract members with incompatible signatures, m(Object) and m(String), respectively. Tweaking the above typing rules would allow Sub to be used as a functional interface here.
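Today only the parameterized form compiles. This sketch (class name illustrative) keeps the raw lambda as a comment; for Sub<Object>, the only abstract method is m(String), so the lambda parameter is inferred as String.

```java
// Sketch: the functional-interface example, runnable for the parameterized case.
final class LambdaTargets {

    interface Sup<X> { boolean m(X x); }

    interface Sub<X> extends Sup<String> { boolean m(String s); }

    static boolean parameterized(String input) {
        // Sub<Object> has a single abstract method, m(String), so s is a String.
        Sub<Object> so = s -> s.isEmpty();
        // Sub sr = s -> true; // rejected today: raw Sub has m(Object) and m(String)
        return so.m(input);
    }

    public static void main(String[] args) {
        System.out.println(parameterized(""));  // prints true
        System.out.println(parameterized("x")); // prints false
    }
}
```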

Heap pollution

Last, we will consider an example in which too broad erasure leads to missing errors in cases where heap pollution is 100% guaranteed:

interface Box<X> {
    X get();
}

class SubBox<X> implements Box<String> { 
    String s;
    SubBox(String s) { this.s = s; }

    public String get() { return s; }
}


SubBox foo = new SubBox("Hello");
Box<Integer> bi = foo; //1
Integer i = bi.get(); //ALWAYS throws ClassCastException!

Here, SubBox implements Box<String>. Therefore, any attempt to assign a SubBox instance to Box<Integer> is flawed, regardless of whether raw types are involved: under the hood, SubBox::get will always return a String, and any attempt to obtain anything else will fail with a runtime exception. Today, (1) is accepted with only an unchecked warning; making the typing of raw types more precise would instead give a compile-time error in (1), which seems a reasonable consequence.
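The runtime failure can be confirmed with a runnable variant of the example above (the wrapper class name is illustrative): the assignment in (1) compiles with only an unchecked warning, and the cast the compiler inserts for bi.get() throws.

```java
// Sketch: a runnable variant of the heap pollution example.
final class HeapPollution {

    interface Box<X> {
        X get();
    }

    static class SubBox<X> implements Box<String> {
        private final String s;

        SubBox(String s) { this.s = s; }

        public String get() { return s; }
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean failsAtRunTime() {
        SubBox foo = new SubBox("Hello");
        Box<Integer> bi = foo; // (1) accepted today with an unchecked warning
        try {
            Integer i = bi.get(); // the compiler-inserted cast to Integer fails here
            return i == null;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAtRunTime()); // prints true
    }
}
```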

Back to generic enums: a way forward?

As mentioned previously, the root of the issues we have encountered when experimenting with generic enums can be traced back to the fact that the supertype of a raw enum type R is not Enum<R>, but just Enum. We have seen how this is a consequence of the (excessively?) broad erasure applied by the compiler when dealing with raw types.

Assuming we can revisit some of the rules regarding type-checking of raw types, an alternate story is possible: let's first show again the code for the Argument generic enum (for a full listing please refer to the beginning of this document):

enum Argument<X> {
   STRING<String>(String.class), 
   INTEGER<Integer>(Integer.class), ... ;

   ...
}

One question worth asking is: what is the direct supertype of Argument<X>? If the supertype is Enum<Argument<X>>, then no change to type-checking of raw types can help us - since the supertype mentions a type parameter defined in the very enum Argument<X>, the language has no choice but to erase that supertype.

But what if the supertype was chosen in a more appropriate (but less precise) way, e.g. just Enum<Argument>? This way, we would have the following subtyping relationships (for the sharp types of the two constants STRING and INTEGER):

Argument<String> <: Enum<Argument> //STRING
Argument<Integer> <: Enum<Argument> //INTEGER

Since the supertype now does not depend on any type parameter, it would not be erased under more precise typing rules such as the ones described in the previous section. So, let us try to prove well-formedness of the type EnumSet<Argument> under the new rules:

Argument <: [E:=Argument]Enum<E>
Argument <: Enum<Argument>
Enum<Argument> <: Enum<Argument>
true

This time, since no erasure occurred when accessing the supertype of the raw enum type Argument, the subtyping derivation was able to complete normally. We finally have a way to create EnumSet and EnumMap of generic enums.
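Subtyping relationships of this shape already hold today for parameterized types of an ordinary class whose supertype does not mention its type parameter. In this sketch (names illustrative), Arg<X> implements Comparable<Arg>, playing the role of Enum<Argument>, so instances with different type arguments share a common supertype. What today's rules still reject is the raw case, which is exactly what the more precise erasure rules would change.

```java
import java.util.List;

// Sketch: Arg<X> mimics a generic enum whose supertype, Comparable<Arg>,
// does not mention X, just as Enum<Argument> would not.
final class SharedSupertype {

    @SuppressWarnings("rawtypes")
    static final class Arg<X> implements Comparable<Arg> {
        final int ordinal;

        Arg(int ordinal) { this.ordinal = ordinal; }

        @Override
        public int compareTo(Arg other) { return Integer.compare(ordinal, other.ordinal); }
    }

    @SuppressWarnings("rawtypes")
    static List<Comparable<Arg>> constants() {
        Comparable<Arg> string = new Arg<String>(0);   // Arg<String> <: Comparable<Arg>
        Comparable<Arg> integer = new Arg<Integer>(1); // Arg<Integer> <: Comparable<Arg>
        // A bound such as E extends Comparable<E> with E = raw Arg still fails
        // today, because raw Arg's supertype is erased to raw Comparable.
        return List.of(string, integer);
    }

    public static void main(String[] args) {
        System.out.println(constants().size()); // prints 2
    }
}
```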

Synthetic enum members

In order to maximize compatibility, synthetic enum members such as values and valueOf also have to be defined in terms of raw types; for instance, in the case of the Argument enum shown above, we would have:

class /*enum*/ Argument<X> extends Enum<Argument> {
    public static Argument[] values() { ... }
    public static Argument valueOf(String name) { ... }
}

Since these methods are meant to operate on heterogeneous collections of constants, the use of raw types is the most appropriate answer, which ensures smooth interoperability with other APIs such as streams:

Arrays.stream(Argument.values())
        .filter(a -> Number.class.isAssignableFrom(a.getClazz()))
        .collect(Collectors.toCollection(() -> EnumSet.noneOf(Argument.class)));

This works correctly and returns an expression of type EnumSet<Argument>, as expected.
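For comparison, the same pipeline compiles today with a non-generic enum. In this sketch, the DOUBLE constant is an illustrative addition, and clazz has to be typed Class<?>: that lost precision on each constant is what generic enums would restore, while the collection step stays the same.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.stream.Collectors;

// Sketch: the stream pipeline with a non-generic enum, which compiles today.
final class NumericArguments {

    enum Argument {
        STRING(String.class), INTEGER(Integer.class), DOUBLE(Double.class);

        private final Class<?> clazz;

        Argument(Class<?> clazz) { this.clazz = clazz; }

        Class<?> getClazz() { return clazz; }
    }

    static EnumSet<Argument> numeric() {
        return Arrays.stream(Argument.values())
                .filter(a -> Number.class.isAssignableFrom(a.getClazz()))
                .collect(Collectors.toCollection(() -> EnumSet.noneOf(Argument.class)));
    }

    public static void main(String[] args) {
        System.out.println(numeric()); // prints [INTEGER, DOUBLE]
    }
}
```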

Disable raw warnings for generic enums

As we have seen, raw types are a useful way to think about an enum as a heterogeneous collection of constants (which might have conflicting static types). As such, we propose to relax the raw types warning check provided by javac, so that usages of raw enums such as Argument will not result in a lint warning. There is a precedent here, as the compiler already disables raw type warnings for raw references to the type java.lang.Class, as these are very common (and, to some degree, unavoidable) when operating with Java reflection.

To erase or not to erase?

We have seen how our ability to rescue generic enums relies on being able to come up with more precise rules to deal with erasure in the members and supertypes of raw types. But we have also seen how altering these rules can lead to source compatibility issues. On the one hand, the more precise rules will reject certain operations (as in the cast and heap pollution examples) that are accepted today only because raw types are involved; on the other hand, there are instances (as with functional interfaces) where the updated rules can make previously invalid programs compile without errors.

There is a question of how broadly we want to affect typing rules for raw types. One option is to alter the rules for all type references; another, more conservative, option would be to use the refined rules only for enum raw type references (on the basis that generic enums and raw types will be frequently used together). The latter path has the clear advantage of avoiding all kinds of source compatibility issues, but it is less consistent: refactoring a generic enum into a class might lead to surprises.

Too raw?

Language purists might frown at the use of raw types described in this document: if Foo<X> is a generic enum, shouldn't a type such as Foo<?> be preferred over the raw type reference Foo? While in principle we could tweak the supertype of a generic enum to be Enum<Foo<?>> instead of Enum<Foo>, doing so would quickly run out of gas when considering examples such as the stream method chain shown above. The problem is that enum APIs such as EnumSet and EnumMap rely on type tokens, expressed as class literals. But the typing rule for class literals also involves raw types: e.g. the type of List.class is Class<List>, not Class<List<?>>. This mismatch would lead to errors very quickly if we went down the wildcards path for generic enums while leaving class literal typing unchanged. Of course, with deeper type-system fixes it could be possible to retcon class literal typing, although this path is not free of compatibility challenges. For these reasons, we believe that the choice of raw types we made here is ultimately consistent with the language and APIs we have now.
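The class literal typing rule can be seen in a small sketch (names illustrative): the type of List.class mentions the raw List, and EnumSet.noneOf infers its element type from the class literal it receives. For a generic enum Foo<X>, Foo.class would likewise have type Class<Foo>, which is why raw enum types fit these token-based APIs.

```java
import java.util.EnumSet;
import java.util.List;

// Sketch: class literals carry raw types into token-based APIs.
final class ClassLiteralTypes {

    enum Color { RED, GREEN }

    @SuppressWarnings("rawtypes")
    static Class<List> listToken() {
        Class<List> token = List.class;      // the type of List.class uses raw List
        // Class<List<?>> wild = List.class; // rejected: incompatible types
        return token;
    }

    static EnumSet<Color> none() {
        // noneOf(Class<E>) infers E from the class literal's type, Class<Color>.
        return EnumSet.noneOf(Color.class);
    }

    public static void main(String[] args) {
        System.out.println(listToken() == List.class); // prints true
        System.out.println(none().isEmpty());          // prints true
    }
}
```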