State of the Lambda

THIS DOCUMENT HAS BEEN SUPERSEDED BY VERSION 4 AND IS PROVIDED FOR HISTORICAL CONTEXT ONLY

Brian Goetz, 10 October 2010

This is an updated proposal to add lambda expressions (informally, "closures") to the Java programming language. This sketch is built on the straw-man proposal made by Mark Reinhold in December 2009 and the previous iteration posted in July 2010.

1. Background; SAM types

The Java programming language already has a form of closures: anonymous inner classes. There are a number of reasons these are considered imperfect closures, primarily:

  1. Bulky syntax
  2. Inability to capture non-final local variables
  3. Transparency issues surrounding the meaning of return, break, continue, and 'this'
  4. No nonlocal control flow operators

It is not a goal of Project Lambda to address all of these issues. The current draft addresses (1) quite substantially, ameliorate (2) by allowing the compiler to infer finality (allowing capture of effectively final local variables), and ameliorate (3) by making 'this' within a lambda expression be lexically scoped. It is unlikely that we will go further at this time (e.g., we do not intend to address nonlocal flow control at all, nor allow arbitrary capture of mutable variables.)

The standard way for Java APIs to define callbacks is to use an interface representing the callback method, such as:

public interface CallbackHandler { 
    public void callback(Context c);
}

The CallbackHandler interface has a useful property: it has a single abstract method. Many common interfaces and abstract classes have this property, such as Runnable, Callable, EventHandler, or Comparator. We call these classes SAM types. This property, SAM-ness, is a structural property identified by the compiler, as is not represented in the type system.

The biggest pain point for anonymous inner classes is bulkiness. To call a method taking a CallbackHandler, one typically creates an anonymous inner class:

foo.doSomething(new CallbackHandler() { 
                    public void callback(Context c) { 
                        System.out.println("pippo");
                    }
                });

The anonymous inner class here is what some might call a "vertical problem": five lines of source code to encapsulate a single statement.

2. Lambda expressions

Lambda expressions are anonymous functions, aimed at addressing the "vertical problem" by replacing the machinery of anonymous inner classes with a simpler mechanism. One way to do that would be to add function types to the language, but this has several disadvantages:

So, we have instead chosen to take the path of "use what you know" -- since existing libraries use SAM types extensively, we leverage the notion of SAM types and use lambda expressions to make it easier create instances of callback objects.

Here are some examples of lambda expressions:

#{ -> 42 }

#{ int x -> x + 1 }

The first expression takes no arguments, and returns the integer 42; the second takes a single integer argument, named x, and returns x+1.

Lambda expressions are delimited by #{ } and have argument lists, the arrow token ->, and a lambda body. For nilary lambdas the arrow token can be omitted, meaning the first example can be shortened to

#{ 42 }

The method body can either be a single expression or an ordinary list of statements (like a method body). In the single-expression form, no return or semicolon is needed. These simplification rules are based on the expectation that many lambda expressions will be quite small, like the examples above, and the "horizontal noise" such as the return keyword become a substantial overhead in those cases.

3. SAM conversion

One can describe a SAM type by its return type, parameter types, and checked exception types. Similarly, one can describe the type of a lambda expression by its return type, parameter types, and exception types.

Informally, a lambda expression e is convertible-to a SAM type S if an anonymous inner class that is a subtype of S and that declares a method with the same name as S's abstract method and a signature and return type corresponding to the lambda expressions signature and return type would be considered assignment-compatible with S.

The return type and exception types of a lambda expression are inferred by the compiler; the parameter types may be explicitly specified or they may be inferred from the assignment context (see Target Typing, below.)

When a lambda expression is converted to a SAM type, invoking the single abstract method of the SAM instance causes the body of the lambda expression to be invoked.

For example, SAM conversion will happen in the context of assignment:

CallbackHandler cb = #{ Context c -> System.out.println("pippo") };

In this case, the lambda expression has a single Context parameter, has void return type, and throws no checked exceptions, and is therefore compatible with the SAM type CallbackHandler.

4. Target Typing

Lambda expressions can only appear in context where it will be converted to a variable of SAM type. These include assignment context, cast target context, and method invocation context. So the following examples are valid examples of statements using lambda expressions:

Runnable r = #{ System.out.println("Blah") };
Runnable r = (Runnable) #{ System.out.println("Blah") };
executor.submit( #{ System.out.println("Blah") } );

The following use of lambda expressions is forbidden because it does not appear in a SAM-convertible context:

Object o = #{ 42 };

In a method invocation context, the target type for a lambda expression used as a method parameter is inferred by examining the set of possible compatible method signatures for the method being invoked. This entails some additional complexity in method selection; ordinarily the types of all parameters are computed, and then the set of compatible methods is computed, and a most specific method is selected if possible. Inference of the target type for lambda-valued actual parameters happens after the types of the other parameters is computed but before method selection; method selection then happens using the inferred target types for the lambda-valued parameters.

The types of the formal parameters to the lambda expression can also be inferred from the target type of the lambda expression. So we can abbreviate our callback handler as:

CallbackHandler cb = #{ c -> System.out.println("pippo") };

as the type of the parameter c can be inferred from the target type of the lambda expression.

Allowing the formal parameter types to be inferred in this way furthers a desirable design goal: "Don't turn a vertical problem into a horizontal problem." We wish that the reader of the code have to wade through as little code as possible before arriving at the "meat" of the lambda expression.

The user can explicitly choose a target type by using a cast. This might be for clarity, or might be because there are multiple overloaded methods and the compiler cannot correctly chose the target type. For example:

executor.submit(((Callable<String>) #{ "foo" }));

If the target type is an abstract class, there is no place to put constructor arguments, so we cannot invoke other than the no-arg constructor. However, such cases always have the option to use inner classes.

5. Lambda bodies

In addition to the simplified expression form of a lambda body, a lambda body can also contain a list of statements, similar to a method body, with several differences: the break and continue statements are not permitted at the top level (break and continue are of course permitted within loops, but break and continue labels must be inside the lambda body). The "return" statment can be used in a multi-statement lambda expression to indicate the value of the lambda expression; this is a "local" return. The type of a multi-statement lambda expression is inferred by unifying the types of the values returned by the set of return statements. As with method bodies, every control path through a multi-statement lambda expression must either return a value (or void) or throw an exception.

6. Local variable capture

The current rules for capturing local variables of enclosing contexts in inner classes are quite restrictive; only final variables may be captured. For lambda expressions (and for consistency, probably inner class instances as well), we relax these rules to also allow for capture of effectively final local variables. (Informally, a local variable is effectively final if making it final would not cause a compilation failure; this can be considered a form of type inference.)

It is our intent to not permit capture of mutable local variables. The reason is that idioms like this:

int sum = 0;
list.forEach(#{ e -> sum += e.size(); });

are fundamentally serial; it is quite difficult to write lambda bodies like this that do not have race conditions. Unless we are willing to enforce (preferably statically) that such lambdas not escape their capturing thread, such a feature may well cause more trouble than it solves.

7. Lexical scoping

Unlike inner classes, lambda expressions are lexically scoped, meaning that the body of a lambda expression are scoped just like a code block in the enclosing environment, with local variables for each formal parameter. The 'this' variable (and any associated OuterClassName.this variables) has the same meaning as it does immediately outside the lambda expression.

Lambda bodies that reference either the 'this' variables or members of enclosing instances (those that are implicitly qualified with a 'this' variable, in which case the references are treated as if they used the appropriate 'this' variable) are treated as capturing the appropriate 'this' variable as per capture of effectively final local variables.

This has a potentially beneficial implication for memory management; where inner class instances always hold a strong reference to their enclosing instance, lambdas that do not capture members from the enclosing instance do not have this behavior. This characteristic of inner class instances can often be a source of memory leaks (the so-called lapsed listener problem). This risk is reduced by the lexical scoping of lambda bodies.

The move to lexical scoping introduces a complication in creating self-referential lambda expressions; in some cases it is desirable to create lambdas such as the following:

Timer timer = ...
timer.schedule(
    #{ 
        if (somethingHappened())
            // cancel the timer task that this lambda represents
            someRefToTimerTask.cancel();
        else
            System.out.println("foo");
    });

We cannot refer to the TimerTask to which the lambda is being converted, since it is anonymous. Instead, we will update the definitely assigned/unassigned rules to allow the above code to be refactored as:

Timer timer = ...
final TimerTask t = #{ 
    if (somethingHappened())
        // cancel the timer task that this lambda represents
        t.cancel();
    else
        System.out.println("foo");
});
timer.schedule(t);

Under current DA/DU rules, the reference to 't' in the lambda body would be illegal. However, the compiler can verify that the body cannot be executed (and therefore 't' cannot be evaluated) until the assignment to 't' completes. The DA/DU rules will be updated to allow this situation. (The same issue would come up with lambda expressions that wanted to recurse.)

8. Method references

SAM conversion allows us to take an anonymous method body and treat it as if it were a SAM type. It is often desirable to do the same with an existing method (such as when a class has multiple methods that are signature-compatible with Comparable.compareTo().)

Method references are expressions which have the same treatment as lambda expressions (i.e., they can only be SAM-converted), but instead of providing a method body they refer to a method of an existing class or object instance.

For example, consider a Person class that can be sorted by name or by age:

class Person { 
    private final String name;
    private final int age;

    public static int compareByAge(Person a, Person b) { ... }

    public static int compareByName(Person a, Person b) { ... }
}

Person[] people = ...
Arrays.sort(people, #Person.compareByAge);

Here, the expression #Person.compareByAge can be considered sugar for a lambda expression whose formal argument list is copied from the method Person.compareByAge, and whose body calls Person.compareByAge (though the actual implementation is more efficient than this.) This lambda expression will then get SAM-converted to a Comparator.

If the method being referenced is overloaded, it can be disambiguated by providing a list of argument types:

Arrays.sort(people, #Person.compareByAge(Person, Person));

Instance methods can be referenced as well, by providing a receiver variable:

Arrays.sort(people, #comparatorHolder.comparePersonByAge);

In this case, the implicit lambda expression would capture a final copy of the "comparatorHolder" reference and the body would invoke the comparePersonByAge using that as the receiver.

We will likely restrict the forms that the receiver can take, rather than allowing arbitrary object-valued expressions like "#foo(bar).moo", when capturing instance method references.

One syntactic alternatives that is under consideration include placing the '#' symbol in place of the dot (making '#' an infix rather than prefix token).

9. Extension methods

A separate document on virtual extension methods proposes our strategy for extending existing interfaces with virtual extension methods.

10. Exception transparency

A separate document on exception transparency proposes our strategy for amending generics to allow abstraction over thrown checked exception types.

11. Putting it together

The features described here are not an arbitrary combination of features, but instead designed to work together. Take the typical example of sorting a collection. Today we write:

Collections.sort(people, new Comparator<Person>() {
    public int compare(Person x, Person y) {
        return x.getLastName().compareTo(y.getLastName());
    }
});

This is a very verbose way to write "sort people by last name"!

With lambda expressions, we can make this expression more concise:

Collections.sort(people, 
                 #{ Person x, Person y -> x.getLastName().compareTo(y.getLastName()) });

However, while more concise, it is not any more abstract; it still burdens the programmer with the need to do the actual comparison (which is even worse when the sort key is a primitive.) Small changes to the libraries can help here, such as introducing a sortyBy method which takes a function that extracts a (Comparable) sort key:

interface Extractor<T,U> { public U extract(T t); }

public void<T, U extends Comparable<? super U>>
    sortBy(Collection<T> collection, Extractor<T, U> extractor);

Collections.sortBy(people, #{ Person p -> p.getLastName() });

Even this can be made less verbose, by allowing the compiler to infer the type of the lambda formal:

Collections.sortBy(people, #{ p -> p.getLastName() });

The lambda in the above expression is simply a forwarder for the existing method getLastName(). We can use method references to reuse the existing method rather than creating a new lambda:

Collections.sortBy(people, #Person.getLastName);

The use of an anciallary method like Collections.sortBy() is undesirable; not only is it more verbose, but it is less object-oriented and undermines the value of the Collection interface since users can't easily discover the static sortBy() method when inspecting the documentation for Collection. Extension methods address this problem:

people.sortBy(#Person.getLastName);

Which reads like the problem statement in the first place: sort the people collection by last name.