Lambda Specification, Part G: Type Inference

Navigation: Overview - Part A - Part B - Part C - Part D - Part E - Part F - Part G - Part H - Part J
Sections: 18 - 18.1 - 18.1.1 - 18.1.2 - 18.1.3 - 18.2 - 18.2.1 - 18.2.1.1 - 18.2.1.2 - 18.2.2 - 18.2.3 - 18.2.4 - 18.2.5 - 18.3 - 18.4 - 18.5 - 18.5.1 - 18.5.2 - 18.5.3 - 18.5.4 - 18.5.5 - 15.12.2.7 - 15.12.2.8
Version 0.6.1. Copyright © 2012 Oracle America, Inc. Legal Notice.

Summary

A variety of compile-time analyses require reasoning about types that are not yet known. Principal among these are generic method applicability testing and generic method invocation type inference. In general, we refer to the process of reasoning about unknown types as type inference.

At a high level, type inference consists of reducing compatibility assertions about expressions or types, called constraint formulas, to a set of bounds on and dependencies between inference variables. For example, given inference variable α, it might be determined that the constraint formula String[] <: α[] reduces to the lower bound α :> String. As bounds and dependencies are inferred, they are incorporated into a bound set. Ultimately, this bound set is resolved to produce an instantiation for each of the inference variables.

To infer a generic method's applicability, we assert that the arguments to the method are compatible with their corresponding formal parameter types, and reduce these constraints to a bound set. We then check that there exists a resolution of the resulting bounds.

To infer the type of a generic method invocation, we re-use the inferred bound set from method applicability testing, augmenting it with the results of asserting the compatibility of the method's return type with its target type. We also assert that any exceptions thrown by lambda expression bodies are contained by the throws clauses of their target descriptors. After reduction, we resolve the inference variables and use them as type arguments to determine the invocation type of the method.

Type inference is also used to determine a target function descriptor for a lambda expression that is assigned to a wildcard-parameterized functional interface type, and also to determine the validity of an unchecked conversion.

The following details of reduction are especially noteworthy:

When a generic method invocation or diamond constructor invocation appears as the argument to another invocation, the target type of the nested invocation is the targeted formal parameter type. If the target type contains inference variables, the inference variables and bounds from the nested invocation are "lifted" into the outer inference analysis, and dependencies between the outer and inner inference variables are inferred. Ultimately, information from an outermost invocation can be "pushed down" to nested invocations via these dependencies.

When a lambda expression appears as the argument to a generic invocation, and its target type includes inference variables, we proceed by:

A method or constructor reference appearing as the argument to a generic invocation is handled in much the same way.

18 Type Inference [New]

A variety of compile-time analyses require reasoning about types that are not yet known. Principal among these are generic method applicability testing (18.5.1) and generic method invocation type inference (18.5.2). In general, we refer to the process of reasoning about unknown types as type inference.

At a high level, type inference can be decomposed into three processes:

These processes interact closely: reduction can trigger incorporation; incorporation may lead to further reduction; and resolution may cause further incorporation.

We elaborate in the following sections.

Discussion and motivation:
  1. This section introduces a new chapter to replace the current specification's definition of type argument inference in 15.12.2.7 and 15.12.2.8.

    In comparison to JLS 7, important changes to inference include:

    • Adding support for lambda expressions and method references as method invocation arguments.
    • Generalizing to define inference in terms of poly expressions, which may not have well-defined types until after inference is complete. This has the notable effect of improving inference for nested generic method and diamond constructor invocations.
    • Describing how inference is used to handle wildcard-parameterized functional interface target types.
    • Clarifying the distinction between invocation applicability testing (which involves only the invocation arguments) and invocation type inference (which incorporates a target type).
    • Delaying resolution of all inference variables, even those with lower bounds, until invocation type inference, in order to get better results.
    • Improving inference behavior for interdependent (or self-dependent) variables.
    • Eliminating bugs and potential sources of confusion. This revision more carefully and precisely handles the distinction between specific conversion contexts and subtyping, and describes reduction by paralleling the corresponding non-inference relations. Where there are intentional departures from the non-inference relations, these are explicitly identified as such.
    • Laying a foundation for future evolution: enhancements to or new applications of inference will be easier to integrate into the specification.
  2. When reading this material for the first time, section 18.1 is useful for establishing the proper vocabulary; but then it may be useful for context to skip ahead to 15.12.2.2 and subsequent sections, then jump back to 18.5, and save the details of the algorithm (18.2, 18.3, and 18.4) for last.

18.1 Concepts and Notation [New]

This section defines inference variables, constraint formulas, bounds, dependencies, and bound sets, as the terms will be used throughout this chapter. It also presents the notation we will use.

18.1.1 Inference Variables [New]

Inference variables are meta-variables for types—that is, they are special names that allow us to reason about types abstractly. To distinguish them from type variables, we represent inference variables with Greek letters, principally α.

We'll loosely use the term "type" in this chapter to include type-like syntax that contains inference variables. We use the term proper type to exclude such "types" that mention inference variables. Assertions that involve inference variables are assertions about every proper type that can be produced by replacing each inference variable with a proper type.

18.1.2 Constraint Formulas [New]

Constraint formulas are assertions of compatibility or subtyping that may involve inference variables. The formulas may take one of the following forms:

A constraint formula of the form Expression → T may be set aside during reduction to be revisited at a later time. In this context, the formula is called a delayed constraint formula and is expressed as ⟨Expression → T⟩.

Examples of constraint formulas:

18.1.3 Bound Sets [New]

During the inference process, a set of bounds on inference variables and dependencies between them is maintained.

A single bound has one of the following forms, where T is a proper type:

A single dependency has one of the following forms, where S mentions at least one inference variable (but is not itself an inference variable):

Bounds and dependencies are distinct entities, although they have similar structures—a dependency is not a bound, nor is a bound a dependency.

An important intermediate result of inference is a bound set, B. We use the term set here as shorthand for one of:

It is sometimes convenient to refer to an empty bound set with the literal true; this is merely out of convenience, and the two are interchangeable.

In the context of a particular bound set, the upper bounds of an inference variable α are all proper types T such that the bound set contains α <: T; similarly, the lower bounds of α are all proper types T such that the bound set contains α :> T. When a variable has a bound of the form α = T, the variable is considered resolved, and its instantiation is T.

Examples of bound sets:

When inference begins, a bound set is typically generated from a list of type parameter declarations P1, ..., Pp and associated inference variables α1, ..., αp. Such a bound set is constructed as follows. For each l (1 ≤ l ≤ p):

Reduction works with a "current" bound set; when this set is not false, each inference variable of interest has at least one bound.

The incorporation process, which maintains the current bounds, guarantees the following properties:

Discussion and motivation:
  1. The strategy described here of decomposing a TypeBound into individual types separated by &, allowing some components to be treated as bounds and others as dependencies, is a slight enhancement to the JLS 7 behavior, which treats the bound as a monolithic intersection type.

18.2 Reduction [New]

Reduction is the process by which a set of constraint formulas (18.1.2) is simplified to produce a bound set (18.1.3). An initial bound set, describing known bounds and dependencies on inference variables of interest, is provided as input; this acts as the "current" bound set, which is periodically updated as further bounds are inferred.

Each constraint formula is considered in turn. The rules in this section outline how the formula is reduced to one or both of:

The results of a reduction step are always soundness-preserving: if an inference variable instantiation satisfies the reduced constraints and bounds, it will also satisfy the original constraint. On the other hand, reduction is not completeness-preserving: there may exist inference variable instantiations that satisfy the original constraint but do not satisfy a reduced constraint or bound. This is due to inherent limitations of the algorithm, along with a desire to avoid undue complexity. One effect is that there are expressions for which type argument inference fails to find a solution, but that can be well-typed if the programmer explicitly inserts appropriate types.

Discussion and motivation:
  1. The reduction rules defined in JLS 7 are neither soundness- nor completeness-preserving. The reduction rules described here improve upon this by guaranteeing soundness and reducing the set of invocations which inference is unable to handle. The rules are also, of course, expanded to support the new language features.

    Unsoundness in JLS 7 arises, among other things, out of the fact that false and true results are treated interchangeably—neither has any impact on the current bound set. It is not a fundamental type safety problem, because the results of inference are checked for validity before an expression is considered well-typed. But it has the unfortunate effect of covering up bugs: if the spec misses an important constraint, there's not a hard line that says that's "wrong." The claim of soundness here raises the bar: if the rules miss a constraint that is necessary to guarantee that the initial constraint formula is true, that is a failing of the rules, and it must be corrected to make the spec internally consistent.

    In addition, a soundness-preserving approach avoids the potential confusion that could arise if, say, inference unsoundly ignored certain difficult-to-analyze lambda expression arguments. (The effect might be that inference gets "lucky" in guessing valid type arguments in one scenario, only to fail when a seemingly unrelated adjustment changes the results.)

18.2.1 Expression Compatibility Constraints [New]

A constraint formula of the form Expression → T is reduced as follows: [jsr335-18.2.1-10]

Discussion and motivation:
  1. By treating nested generic method invocations as poly expressions, we improve the behavior of inference for nested invocations. For example, the following is illegal in Java 7 but legal in Java 8:
    ProcessBuilder b = new ProcessBuilder(Collections.emptyList());
    // Constructor expects a List<String>

    When both the outer and the nested invocation require inference, the problem is more difficult. For example:

    List<String> ls = new ArrayList<>(Collections.emptyList());

    Our approach is to "lift" the bounds inferred for the nested invocation (simply { α <: Object } in the case of emptyList here) into the outer inference process (in this case, trying to infer β where the constructor is for type ArrayList<β>). We also infer dependencies between the nested inference variables and the outer inference variables (the constraint List<α>Collection<β> would reduce to the dependency α = β). In this way, resolution of the inference variables in the nested invocation can wait until additional information can be inferred from the outer invocation (based on the assignment target, β = String).

18.2.1.1 Lambda Expression Compatibility [New]

A constraint formula of the form LambdaExpression → T, where T mentions at least one inference variable, is reduced as follows, following the compatibility rules outlined in 15.27.3. [jsr335-18.2.1.2-10]

If T is an inference variable, the constraint reduces to the delayed constraint ⟨LambdaExpression → T⟩. [jsr335-18.2.1.2-12]

Otherwise, if T is not a functional interface type (9.8), the constraint reduces to false. [jsr335-18.2.1.2-14]

Otherwise, a target function descriptor for the expression must be determined, as described in 15.27.3. If no valid descriptor can be found, the constraint reduces to false. [jsr335-18.2.1.2-20]

Otherwise, if the lambda parameters' types are to be inferred, and at least one of the descriptor's parameter types is not a proper type, then the constraint reduces to the delayed constraint ⟨LambdaExpression → T⟩. [jsr335-18.2.1.2-22]

Otherwise, the congruence of the LambdaExpression with the target function descriptor is asserted as follows: [jsr335-18.2.1.2-30]

Discussion and motivation:
  1. The key piece of information to derive from a compatibility constraint involving a lambda expression is the set of bounds on inference variables appearing in the target descriptor's return type. This is crucial, because functional interfaces are often generic, and many methods operating on these types are generic, too.

    In the simplest case, a lambda expression may simply provide a lower bound for an inference variable:

    <T> List<T> makeThree(Factory<T> factory) { ... }
    String s = makeThree(() -> "abc").get(2);
    

    In more complex cases, a result expression may be a poly expression—perhaps even another lambda expression—and so the inference variable might be passed through multiple constraint formulas with different target types before a bound is produced.

    Most of the work described in this section precedes assertions about the result expressions; its purpose is to derive the lambda expression's function descriptor, and to check for expressions that are clearly disqualified from compatibility.

    We do not attempt to produce bounds on inference variables that appear in the target descriptor's throws clause. This is because exception containment is not part of compatibility (15.27.3)—in particular, it must not influence method applicability (18.5.1). However, we do get bounds on these variables later, because invocation type inference (18.5.2) produces exception containment constraint formulas (18.2.5).

  2. If the target type is an inference variable, or if the target descriptor parameter types contain inference variables, we produce a delayed constraint (18.1.2). When more information about the target type can be inferred, a substitution can be performed eliminate the blocking inference variables and allow further reduction. For example:
    <T> T id(T arg) { return arg; }
    Runnable r = id(() -> { System.out.println("hi"); });
    

    During applicability inference for id, it is impossible to determine what kind of functional interface is meant to be represented by the lambda expression. So a delayed constraint is produced:

    () -> { System.out.println("hi"); } → α⟩

    Later, when the target type of the id invocation is examined, we can resolve α = Runnable, and then perform substitution to get the new constraint formula:

    () -> { System.out.println("hi"); } → Runnable

18.2.1.2 Method and Constructor Reference Compatibility [New]

A constraint formula of the form MethodReference → T, where T mentions at least one inference variable, is reduced as follows, following the compatibility rules outlined in 15.28.1. [jsr335-18.2.1.3-10]

If T is neither a functional interface type (9.8) nor an inference variable, or if T is a functional interface type but does not have a function descriptor, the constraint reduces to false. [jsr335-18.2.1.3-12]

Otherwise, if T is an inference variable or if at least one of T's function descriptor parameter types is not a proper type, then the constraint reduces to the delayed constraint ⟨MethodReference → T⟩. [jsr335-18.2.1.3-14]

Otherwise, if there does not exist a compile-time declaration for the method reference, as defined in 15.28.1, the constraint reduces to false. [jsr335-18.2.1.3-16]

If a compile-time declaration can be found there are three cases: [jsr335-18.2.1.3-40]

A constraint formula of the form ConstructorReference → T, where T mentions at least one inference variable, is reduced as follows, following the compatibility rules outlined in 15.28.1. [jsr335-18.2.1.3-60]

If T is neither a functional interface type (9.8) nor an inference variable, or if T is a functional interface type but does not have a function descriptor, the constraint reduces to false. [jsr335-18.2.1.3-62]

Otherwise, if T is an inference variable or if at least one of T's function descriptor parameter types is not a proper type, then the constraint reduces to the delayed constraint ⟨ConstructorReference → T⟩. [jsr335-18.2.1.3-64]

Otherwise, if there does not exist a compile-time declaration for the constructor reference, as defined in 15.28.1, the constraint reduces to false. [jsr335-18.2.1.3-66]

If a compile-time declaration can be found there are two cases: [jsr335-18.2.1.3-70]

Discussion and motivation:
  1. Delayed constraints can be produced for method and constructor references just as they were for lambda expressions (18.2.1.2).
  2. The strategy used to determine a return type for a generic referenced method follows the same pattern as for generic method invocations (18.2.1). This may involve "lifting" bounds into the outer context and inferring dependencies between the two sets of inference variables.

18.2.2 Type Compatibility Constraints [New]

A constraint formula of the form S → T is reduced as follows: [jsr335-18.2.2-10]

Discussion and motivation:
  1. To do: we need to allow for unchecked conversion here. We may not be able to handle it in full generality, since unchecked conversion itself relies on inference (18.5.5); the situation is similar to a nested generic method invocation.
  2. Boxing T to T' is not completeness-preserving: for example, if T were long, S might be instantiated to Integer, which could be unboxed and then widened to long. We avoid this problem is most cases by giving special treatment to inference-variable return types that we know are already constrained to be certain boxed primitive types. See 18.5.2.

18.2.3 Subtyping Constraints [New]

To do: this section mimics 15.12.2.7, describing how subtyping assertions reduce to inference variable bounds and dependencies.

18.2.4 Type Equality Constraints [New]

To do: this section mimics 15.12.2.7, describing how type equality assertions reduce to inference variable bounds and dependencies.

18.2.5 Checked Exception Constraints [New]

To do: the purpose of this section is to derive bounds on inference variables that appear in the throws clause of a functional interface descriptor. It mimics the checking of lambda expressions and method references above, deriving a target descriptor and then comparing the body's uncaught checked exceptions to the descriptor's throws clause. It is also necessary to recursively traverse parenthesized expressions, conditional expressions, and return expressions of lambdas, because these may contain lambda expressions or method references that can influence inference variables.
Discussion and motivation:
  1. Constraints on checked exceptions are handled separately from constraints on return types, because return type compatibility influences applicability of methods (18.5.1), while exceptions only influence the invocation type after overload resolution is complete (18.5.2). This could be simplified by including exception compatibility in the definition of lambda expression compatibility, but this would lead to possibly-surprising cases in which exceptions that can be thrown by a lambda body change overload resolution.
  2. The handling of the case in which more than one inference variable appears in a descriptor's throws clause is not completeness-preserving. Either variable may, on its own, satisfy the constraint that each checked exception be declared, but we can't be sure which one is intended. So, for predictability, we constrain them both.

18.3 Incorporation [New]

To do: incorporation is the process by which new bounds and dependencies are added to a bound set. It guarantees that the bounds are consistent, and propagates information between inference variables according to their dependencies.

18.4 Resolution [New]

To do: resolution is the process by which instantiations of inference variables are determined. It mimics 15.12.2.7 and 15.12.2.8, with some enhancements.

18.5 Uses of Inference [New]

Using the inference processes defined above, the following analyses are performed at compile-time.

18.5.1 Invocation Applicability Inference [New]

Given a method invocation that provides no explicit type arguments, the following process, modeled after the applicability rules in 15.12.2.2, 15.12.2.3, & 15.12.2.4, determines whether a potentially-applicable generic method m is applicable: [jsr335-18.5.1-10]

To do: we need to do something here about the varargs accessibility constraint.
Discussion and motivation:
  1. Consider the following method invocation and assignment:
    List<Number> ln = Arrays.asList(1, 2.0);

    A most-specific applicable method for the invocation must be identified as described in 15.12. The only potentially-applicable method (15.12.2.1) is declared as follows:

    public static <T> List<T> asList(T... a)

    Trivially (because of its arity), this method is neither applicable by strict invocation (15.12.2.2) nor applicable by loose invocation (15.12.2.3). But since there are no other candidates, in a third phase the method is checked for applicability as a variable-arity method.

    The initial bound set, B, is a trivial upper bound for a single inference variable, α:

    { α <: Object }

    The initial constraint formula set is as follows:

    { 1 → α, 2.0 → α }

    These are reduced to a new bound set, B':

    { α <: Object, α :> Integer, α :> Double }

    Then, to test whether the method is applicable, we attempt to resolve these bounds. We succeed, producing the rather complex instantiation

    α = Number & Comparable<? extends Number & Comparable<?>>

    We have thus demonstrated that the method is applicable; since no other candidates exist, it is the most-specific applicable method. The type of the method, and its compatibility with the target type in the assignment, is not determined until further inference can occur, as described in the next section.

18.5.2 Invocation Type Inference [New]

Consider a method invocation that provides no explicit type arguments, and a corresponding most-specific applicable generic method m. The following process is used to infer the invocation type (15.12.2.6) of the chosen method. [jsr335-18.5.2-10]

To do: we need to figure out how this interacts with unchecked warnings and the resulting erasure of the invocation type (15.12.2.6).
To do: Should R be captured when targeting S?
To do: Need to describe how delayed constraints in B'' are reduced before proceeding to resolve all the inference variables.
Discussion and motivation:
  1. It is important to note that two "rounds" of inference are involved in finding the type of a method invocation. This is necessary to allow a target type to influence the type of the invocation without allowing it to influence the choice of an applicable method. The first round produces a bound set and tests that a resolution exists, but does not commit to that resolution. The second round reduces additional constraints and then performs a second resolution, this time "for real."

    The distinction between these two rounds of inference was confused in previous iterations of the JLS (compare the discussion of inference in 15.12.2.2 vs. 15.12.2.6; also see 15.12.2.8, which makes ambiguous use of the term "now").

  2. Let's revisit the example from the previous section:
    List<Number> ln = Arrays.asList(1, 2.0);

    The most-specific applicable method is declared as follows:

    public static <T> List<T> asList(T... a)

    In order to complete type-checking of the method invocation, we must determine whether it is compatible with its target type, List<Number>.

    The bound set used to demonstrate applicability in the previous section, B', was:

    { α <: Object, α :> Integer, α :> Double }

    The new constraint formula set is as follows:

    { 1throws α, 2.0throws α, List<α>List<Number> }

    The throws constraints are trivially true, because they do not involve lambda expressions or method references. The assignability constraint produces an equality bound for α, which is included in the new bound set, B'':

    { α <: Object, α :> Integer, α :> Double, α = Number }

    These bounds are trivially resolved:

    α = Number

    Finally, we perform a substitution on the declared return type of asList to determine that the invocation expression has type List<Number>; clearly, this is compatible with the target type.

    Note that this strategy for inference is different than JLS 7, which would have instantiated α based on its lower bounds (before even considering the invocation's target type), as we did in the previous section. This would result in a type error, since the resulting type is not a subtype of List<Number>.

18.5.3 Functional Interface Parameterization Inference [New]

Where a lambda expression with explicit parameter types P1, ..., Pn targets a wildcard-parameterized functional interface, F<A1, ..., Am>, one or more of the wildcards in T may be instantiated according to the following process: [jsr335-18.5.3-20]

Discussion and motivation:
  1. In order to determine the descriptor of a wildcard-parameterized functional interface, we have to "instantiate" the wildcard type arguments with specific types. The "default" approach is to simply replace the wildcards with their bounds, as described in 9.8. But this produces spurious errors in cases in which a lambda expression has explicit parameter types that do not correspond to the wildcard bounds. For example:
    Predicate<? super Integer> p = (Number n) -> n.equals(23);

    The lambda expression in this example is a Predicate<Number>, which is a subtype of Predicate<? super Integer> but not Predicate<Integer>. The analysis in this section is used to infer that Number is an appropriate choice for the type argument to Predicate.

    While the analysis is described in terms of inference, Since the only constraints used for inference are equality constraints, the analysis amounts to a simple pattern match on the parameter types.

  2. The analysis here, while described in terms of general type inference, is intentionally quite simple. The only constraints are equality constraints, which means that reduction amounts to simple pattern matching.

    A more powerful strategy might also infer constraints from the body of the lambda expression. But, given possible interactions with inference for surrounding and/or nested generic method invocations, this would introduce a lot of extra complexity.

18.5.4 More Specific Method Inference [New]

This section must describe how inference is used to test whether a method m1 is more specific than generic method m2 given argument expressions exp1, ..., expn.

18.5.5 Unchecked Conversion Inference [New]

The following demonstrates how inference might be used to implement checking for a legal unchecked conversion (the check is left unspecified in JLS 7). There are currently no cross-references mandating the use of this strategy from, for example, Chapter 5; for now, this section is just illustrative.

Unchecked conversions (5.1.9) are typically followed by widening reference conversions (5.1.5). When this occurs, the type resulting from the unchecked conversion is not expressed in the program, and must be inferred as follows.

Let S be the raw type to be converted, and let T be the type expected after widening reference conversion. Then:

Discussion and motivation:
  1. This section addresses an existing specification bug: when an unchecked conversion is followed by widening, the implementation is left to figure out what the unchecked conversion produces, without any help from the specification. This has led to inconsistent behavior between compiler implementations. It turns out that this problem is a perfect fit for type inference.

15.12.2.7 Inferring Type Arguments Based on Actual Arguments [Modified]

Compare JLS 15.12.2.7

This section is deleted; it has been replaced by Chapter 18, principally 18.2 and 18.5.1.

15.12.2.8 Inferring Unresolved Type Arguments [Modified]

Compare JLS 15.12.2.8

This section is deleted; it has been replaced by Chapter 18, principally 18.3, 18.4, and 18.5.2.