State of the Lambda

THIS DOCUMENT HAS BEEN SUPERSEDED AND IS PROVIDED FOR HISTORICAL CONTEXT ONLY

December 2011

4th edition

This is an informal overview of the enhancements to the Java programming language specified by JSR 335 and implemented in the OpenJDK Lambda Project. It refines the previous iteration posted in October 2010. A formal description of some of the language changes may be found in the Early Draft Specification for the JSR; an OpenJDK Developer Preview is also available. Additional design documents—in particular a more detailed examination of default methods—can be found at the OpenJDK project page. As all of these artifacts are works-in-progress, there may be inconsistencies between them, which will be resolved by the time the spec and implementation are finalized.

The high-level goal of Project Lambda is to allow programming patterns that require modeling code as data to be convenient and idiomatic in Java. The principal new language features include:

Lambda expressions (informally, "closures" or "anonymous methods")
Expanded target typing
Method and constructor references
Default methods

These are described and illustrated below.

1. Background

Java is, primarily, an object-oriented programming language. In both object-oriented and functional languages, basic values can dynamically encapsulate program behavior: object-oriented languages have objects with methods, and functional languages have functions. This similarity may not be obvious, however, because Java objects tend to be relatively heavyweight: instantiations of separately-declared classes wrapping a handful of fields and many methods.

Yet it is not uncommon for some objects to essentially encode nothing more than a function. In a typical use case, a Java API defines an interface, informally called a "callback interface," and expects a user to provide an instance of the interface when invoking the API. For example:

public interface ActionListener { 
    void actionPerformed(ActionEvent e);
}

Rather than declaring a class that implements ActionListener for the sole purpose of allocating it once at an invocation site, a user typically instantiates the implementing class inline, anonymously:

button.addActionListener(new ActionListener() { 
  public void actionPerformed(ActionEvent e) { 
    ui.dazzle(e.getModifiers());
  }
});

Many useful libraries rely on this pattern. It is particularly important for parallel APIs, in which the code to execute must be expressed independently of the thread in which it will run. The parallel-programming domain is of special interest, because as CPU makers focus their efforts on improving performance through a proliferation of cores, serial APIs are limited to a shrinking fraction of available processing power.

Given the increasing relevance of callbacks and other functional-style idioms, it is important that modeling code as data in Java be as lightweight as possible. In this respect, anonymous inner classes are imperfect for a number of reasons, primarily:

(1) Bulky syntax
(2) Confusion surrounding the meaning of names and this
(3) Inflexible class-loading and instance-creation semantics
(4) Inability to capture non-final local variables
(5) Inability to abstract over control flow

This project addresses many of these issues. It eliminates (1) and (2) by introducing new, much more concise expression forms with local scoping rules, sidesteps (3) by defining the semantics of the new expressions in a more flexible, optimization-friendly manner, and ameliorates (4) by allowing the compiler to infer finality (allowing capture of effectively final local variables).

However, it is not a goal of this project to address all the problems of inner classes. Neither arbitrary capture of mutable variables (4) nor nonlocal control flow (5) is within this project's scope, but such features may be revisited in a future iteration of the language.

2. Functional interfaces

The anonymous inner class approach, despite its limitations, has the nice property of fitting very cleanly into Java's type system: a function value with an interface type. This is convenient for a number of reasons: interfaces are already an intrinsic part of the type system; they naturally have a runtime representation; and they carry with them informal contracts expressed by Javadoc comments, such as an assertion that an operation is commutative.

The interface ActionListener, used above, has just one method. Many common callback interfaces have this property, such as Runnable and Comparator. We'll give all interfaces that have just one method a name: functional interfaces. (These were previously called SAM Types.)

Nothing special needs to be done to declare an interface as functional—the compiler identifies it as such based on its structure. (This identification process is a little more complex than just counting method declarations. For example, an interface might inherit methods from multiple parents that logically represent the same method, or it might redundantly declare a method that is automatically provided by the class Object, like toString.)
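As a minimal sketch of this structural rule, assuming a Java SE 8 or later compiler: the interface Labeled below inherits what is logically the same method from two parents and also redeclares toString, yet it still has just one method for our purposes, so a lambda expression can implement it. All of the names here are hypothetical and chosen only for illustration.

```java
class FunctionalShapeDemo {
    // Two hypothetical parents that declare what is logically the same method.
    interface HasLabel { String label(); }
    interface AlsoHasLabel { String label(); }

    // Still functional: label() is inherited as a single method, and
    // toString() is already provided by Object, so it is not counted.
    interface Labeled extends HasLabel, AlsoHasLabel {
        String toString();
    }

    static String labelOf(Labeled l) {
        return l.label();
    }

    public static void main(String[] args) {
        Labeled l = () -> "functional";
        System.out.println(labelOf(l));
    }
}
```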

An alternative (or complementary) approach to function types, suggested by some early proposals, would have been to introduce a new, structural function type. A type like "function from a String and an Object to an int" might be expressed as (String,Object)->int. This idea was considered and rejected, at least for now, due to several disadvantages:

It would add complexity to the type system and further mix structural and nominal types.
It would lead to a divergence of library styles—some libraries would continue to use callback interfaces, while others would use structural function types.
The syntax could be unwieldy, especially when checked exceptions were included.
It is unlikely that there would be a runtime representation for each distinct function type, meaning developers would be further exposed to and limited by erasure. For example, it would not be possible (perhaps surprisingly) to overload methods m(T->U) and m(X->Y).

So, we have instead chosen to take the path of "use what you know"—since existing libraries use functional interfaces extensively, we codify and leverage this pattern.

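To illustrate, here are some of the functional interfaces in Java SE 7 that are well-suited for being used with the new language features; the examples that follow illustrate the use of a few of them.

java.lang.Runnable
java.util.concurrent.Callable
java.security.PrivilegedAction
java.util.Comparator
java.io.FileFilter
java.beans.PropertyChangeListener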

3. Lambda expressions

The biggest pain point for anonymous inner classes is bulkiness. They have what we might call a "vertical problem": the ActionListener instance from section 1 uses five lines of source code to encapsulate a single statement.

Lambda expressions are anonymous methods, aimed at addressing the "vertical problem" by replacing the machinery of anonymous inner classes with a lighter-weight mechanism.

Here are some examples of lambda expressions:

(int x, int y) -> x + y

() -> 42

(String s) -> { System.out.println(s); }

The first expression takes two integer arguments, named x and y, and returns x+y. The second takes no arguments and returns the integer 42. The third takes a string and prints it to the console, returning nothing.

The general syntax consists of an argument list, the arrow token ->, and a body. The body can either be a single expression or a statement block. In the expression form, the body is simply evaluated and returned. In the block form, the body is evaluated like a method body—a return statement returns control to the caller of the anonymous method; break and continue are illegal at the top level, but are of course permitted within loops; and if the body produces a result, every control path must return something or throw an exception.
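The two body forms can be compared side by side in a small sketch, assuming a Java SE 8 or later compiler. The interface IntOp and the constant names are hypothetical; they exist only so that the lambda expressions have a target type.

```java
class LambdaBodyDemo {
    // Hypothetical functional interface used only for this illustration.
    interface IntOp { int apply(int x, int y); }

    // Expression form: the body is evaluated and its value is returned.
    static final IntOp SUM = (int x, int y) -> x + y;

    // Block form: evaluated like a method body, so every control path
    // must either return a value or throw an exception.
    static final IntOp SAFE_DIVIDE = (int x, int y) -> {
        if (y == 0) {
            throw new IllegalArgumentException("division by zero");
        }
        return x / y;
    };

    public static void main(String[] args) {
        System.out.println(SUM.apply(2, 3));
        System.out.println(SAFE_DIVIDE.apply(7, 2));
    }
}
```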

The syntax is optimized for the common case in which a lambda expression is quite small, as illustrated above. For example, the expression-body form eliminates the need for a return keyword, which could otherwise represent a substantial syntactic overhead relative to the size of the expression.

It is also expected that lambda expressions will frequently appear in nested contexts, such as the argument to a method invocation or the result of another lambda expression. To minimize noise in these cases, unnecessary delimiters are avoided. However, for situations in which it is useful to set the entire expression apart, it can be surrounded with parentheses, just like any other expression.

Here are some examples of lambda expressions appearing in statements:

FileFilter java = (File f) -> f.getName().endsWith(".java");

String user = doPrivileged(() -> System.getProperty("user.name"));

new Thread(() -> {
  connectToService();
  sendNotification();
}).start();

4. Target typing

Note that the name of a functional interface is not part of the lambda expression syntax. So what kind of object does a lambda expression represent? Its type is inferred from the surrounding context. For example, the following lambda expression is an ActionListener:

ActionListener l = (ActionEvent e) -> ui.dazzle(e.getModifiers());

An implication of this approach is that the same lambda expression can have different types in different contexts:

Callable<String> c = () -> "done";

PrivilegedAction<String> a = () -> "done";

In the first case, the lambda expression () -> "done" represents an instance of Callable. In the second case, the same expression represents an instance of PrivilegedAction.

The compiler is responsible for inferring the type of each lambda expression. It uses the type expected in the context in which the expression appears; this type is called the target type. A lambda expression can only appear in a context that has a target type.

Of course, no lambda expression will be compatible with every possible target type. The compiler checks that the types used by the lambda expression are consistent with the target type's method signature. That is, a lambda expression is compatible with a target type T if all of the following conditions hold:

T is a functional interface type
The lambda expression has the same number of parameters as T's method, and those parameters' types are the same
Each expression returned by the lambda body is compatible with T's method's return type
Each exception thrown by the lambda body is allowed by T's method's throws clause
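A sketch of these conditions in practice, assuming a Java SE 8 or later compiler. The helper readGreeting is hypothetical; it is declared to throw a checked exception so that the last condition becomes visible. The incompatible cases are left as comments because they do not compile.

```java
import java.util.concurrent.Callable;

class CompatibilityDemo {
    // Hypothetical helper that declares a checked exception.
    static String readGreeting() throws Exception {
        return "hello";
    }

    static String callIt() {
        // Compatible: no parameters, the body yields a String, and
        // Callable.call() declares 'throws Exception'.
        Callable<String> c = () -> readGreeting();
        try {
            return c.call();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Not compatible: Runnable.run() has no throws clause, so the checked
        // exception thrown by readGreeting() is not allowed.
        // Runnable r = () -> readGreeting();

        // Not compatible: an int is not compatible with the return type String.
        // Callable<String> bad = () -> 42;

        System.out.println(callIt());
    }
}
```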

Since a functional interface target type already "knows" what types the lambda expression's formal parameters should have, it is often unnecessary to repeat them. The use of target typing often allows the lambda parameters' types to be inferred:

Comparator<String> c = (s1, s2) -> s1.compareToIgnoreCase(s2);

In addition, when there is just one parameter whose type is inferred (a very common case), the parentheses surrounding a single parameter name are optional:

FileFilter java = f -> f.getName().endsWith(".java");

button.addActionListener(e -> ui.dazzle(e.getModifiers()));

These enhancements further a desirable design goal: "Don't turn a vertical problem into a horizontal problem." We want the reader of the code to have to wade through as little syntax as possible before arriving at the "meat" of the lambda expression.

Lambda expressions are not the first Java expressions to have context-dependent types: generic method invocations and "diamond" constructor invocations, for example, are similarly type-checked based on an assignment's target type.

List<String> ls = Collections.emptyList();
List<Integer> li = Collections.emptyList();

Map<String,Integer> m1 = new HashMap<>();
Map<Integer,String> m2 = new HashMap<>();

5. Contexts for target typing

We stated earlier that lambda expressions can only appear in contexts that have target types. So, what contexts have target types?

Variable declarations
Assignments
Return statements
Array initializers
Method or constructor arguments
Lambda expression bodies
Conditional expressions ?:
Cast expressions

In the first three cases, the target type is simply the type being assigned to or returned.

Comparator<String> c;
c = (String s1, String s2) -> s1.compareToIgnoreCase(s2);

public Runnable toDoLater() {
  return () -> {
    System.out.println("later");
  };
}

Array initializer contexts are like assignments, except that the "variable" is an array component and its type is derived from the array's type.

runAll(new Callable<String>[]{ ()->"a", ()->"b", ()->"c" });

In the method argument case, things are more complicated: the target type is determined by two other language features, overload resolution and type argument inference. For each potentially-applicable method, the compiler determines whether the lambda expression is compatible with the corresponding target type, and also infers any type arguments. Once the best method declaration is chosen, that declaration provides the actual target type for the expression.

void invoke(Runnable r) { r.run(); }
<T> T invoke(Callable<T> c) { return c.call(); }

String s = invoke(() -> "done"); // invoke(Callable)

(If the choice of a best method declaration is ambiguous, casts can provide a workaround; see the discussion below.)
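To make the chosen overload observable, here is a runnable variation of the invoke example, assuming a Java SE 8 or later compiler. Unlike the declarations above, both hypothetical overloads return a String tag naming the interface that was selected, and the Callable overload wraps the checked exception so that callers need no throws clause.

```java
import java.util.concurrent.Callable;

class OverloadDemo {
    static String invoke(Runnable r) {
        r.run();
        return "Runnable";
    }

    static <T> String invoke(Callable<T> c) {
        try {
            return "Callable:" + c.call();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The body "done" produces a value, so only invoke(Callable) applies.
        System.out.println(invoke(() -> "done"));

        // A block body that returns nothing is only compatible with Runnable.
        System.out.println(invoke(() -> { System.out.println("side effect"); }));
    }
}
```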

Lambda expressions themselves provide target types for their bodies, in this case by deriving that type from the outer target type. This makes it convenient to write functions that return other functions:

Callable<Runnable> c = () -> () -> { System.out.println("hi"); };

Similarly, conditional expressions can "pass down" a target type from the surrounding context:

Callable<Integer> c = flag ? (() -> 23) : (() -> 42);

Finally, cast expressions provide a mechanism to explicitly provide a lambda expression's type if none can be conveniently inferred from context:

// Illegal: Object o = () -> { System.out.println("hi"); };
Object o = (Runnable) () -> { System.out.println("hi"); };

Casts are also useful to help resolve ambiguity when a method declaration is overloaded with unrelated functional interface types.
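A sketch of the disambiguating cast, assuming a Java SE 8 or later compiler. The interfaces Producer and Source and the overloaded method run are hypothetical; they stand in for any pair of unrelated functional interfaces with the same shape.

```java
class CastDemo {
    // Two unrelated, hypothetical functional interfaces with the same shape.
    interface Producer { String produce(); }
    interface Source { String read(); }

    static String run(Producer p) { return "Producer:" + p.produce(); }
    static String run(Source s) { return "Source:" + s.read(); }

    public static void main(String[] args) {
        // run(() -> "x");  // expected to be rejected: the call is ambiguous

        // A cast names the intended target type explicitly.
        System.out.println(run((Producer) () -> "x"));
        System.out.println(run((Source) () -> "x"));

        // A cast also supplies a target type where the context has none.
        Object o = (Runnable) () -> System.out.println("hi");
        ((Runnable) o).run();
    }
}
```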

The expanded role of target typing in the compiler is not limited to lambda expressions: generic method invocations and "diamond" constructor invocations can also take advantage of target types wherever they are available. The following declarations are illegal in Java SE 7 but valid under JSR 335:

List<String> ls =
  Collections.checkedList(new ArrayList<>(), String.class);

Set<Integer> si = flag ? Collections.singleton(23)
                       : Collections.emptySet();

6. Lexical scoping

Determining the meaning of names (and this) in inner classes is significantly more difficult and error-prone than when classes are limited to the top level. Inherited members—including methods of class Object—can accidentally shadow outer declarations, and unqualified references to this always refer to the inner class itself.

Lambda expressions are much simpler: they do not inherit any names from a supertype, nor do they introduce a new level of scoping. Instead, they are lexically scoped, meaning names in the body are interpreted just as they are in the enclosing environment (with the addition of new names for the lambda expression's formal parameters). As a natural extension, the this keyword and references to its members have the same meaning as they would immediately outside the lambda expression.

To illustrate, the following program prints "Hello, world!" twice to the console:

public class Hello {
  Runnable r1 = () -> { System.out.println(this); };
  Runnable r2 = () -> { System.out.println(toString()); };

  public String toString() { return "Hello, world!"; }

  public static void main(String... args) {
    new Hello().r1.run();
    new Hello().r2.run();
  }
}

The equivalent using anonymous inner classes would instead, perhaps to the programmer's surprise, print something like Hello$1@5b89a773 and Hello$2@537a7706.
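The contrast can be seen in one class, as in the following sketch, which assumes a Java SE 8 or later compiler. The interface Text and the field names are hypothetical; the interface returns a String rather than printing so that the two results can be compared directly.

```java
class ScopeDemo {
    // Hypothetical functional interface used only for this illustration.
    interface Text { String get(); }

    // In a lambda body, toString() is resolved in the enclosing ScopeDemo
    // instance, exactly as it would be outside the lambda expression.
    final Text fromLambda = () -> toString();

    // In an anonymous inner class, toString() refers to the inner instance,
    // which inherits the implementation from Object.
    final Text fromInnerClass = new Text() {
        public String get() { return toString(); }
    };

    @Override
    public String toString() { return "Hello, world!"; }

    public static void main(String[] args) {
        ScopeDemo demo = new ScopeDemo();
        System.out.println(demo.fromLambda.get());
        System.out.println(demo.fromInnerClass.get());
    }
}
```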

Of course, if this inside a lambda expression refers to the enclosing class, it cannot also be used to refer to the function value described by the lambda expression. It is not usually necessary to do so, but on some occasions—say, when defining a recursive function—it is important.

Fortunately, by simply refining the "initialized-before-use" analysis that determines when variables can be referenced, the compiler can permit a lambda expression to mention the variable to which it is assigned.

final Runnable r = () -> {
  // This reference to 'r' is legal:
  if (!allDone) { workQueue.add(r); }
  else { displayResults(); }
};

// For contrast:
// final Object[] objs =
  // This reference to 'objs' is illegal (it's uninitialized):
  // { "x", 23, objs };

When a lambda expression appears in any other context, such as a return expression, there is no way for it to refer to itself. The proper approach in such cases is to name the object with a variable declaration and replace the original expression with a variable reference.

Consistent with the lexical-scoping approach, and following the pattern set by other local parameterized constructs like for loops and catch clauses, the parameters of a lambda expression must not shadow any local variables in the enclosing context.

7. Variable capture

The compiler check for references to local variables of enclosing contexts in inner classes (captured variables) is quite restrictive in Java SE 7: an error occurs if the captured variable is not declared final. We can relax this restriction—for both lambda expressions and inner classes—by also allowing the capture of effectively final local variables.

Informally, a local variable is effectively final if its initial value is never changed—in other words, declaring it final would not cause a compilation failure.

Callable<String> helloCallable(String name) {
  String hello = "Hello";
  return () -> (hello + ", " + name);
}
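The following sketch shows both sides of the rule, assuming a Java SE 8 or later compiler. It uses java.util.function.Supplier, which comes from the released Java SE 8 library rather than from this document, and the method names are hypothetical.

```java
import java.util.function.Supplier;

class CaptureDemo {
    static Supplier<String> greeter(String name) {
        String hello = "Hello";      // never reassigned: effectively final
        return () -> hello + ", " + name;
    }

    static Supplier<Integer> counter() {
        int count = 0;
        count++;                     // reassigned: not effectively final
        // return () -> count;       // would not compile

        // Copying the value into a final variable makes it capturable.
        final int snapshot = count;
        return () -> snapshot;
    }

    public static void main(String[] args) {
        System.out.println(greeter("World").get());
        System.out.println(counter().get());
    }
}
```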

References to this—including implicit references through unqualified field references or method invocations—are, essentially, references to a final local variable. Lambda bodies that contain such references capture the appropriate instance of this. In other cases, no reference to this is retained by the object.

This has a beneficial implication for memory management. Inner class instances always hold a strong reference to their enclosing instance, which can often be a source of memory leaks (the so-called lapsed listener problem). Lambdas that do not capture members from the enclosing instance do not hold a reference to it.

It is our intent to prohibit capture of mutable local variables. The reason is that idioms like this:

int sum = 0;
list.forEach(e -> { sum += e.size(); });

are fundamentally serial; it is quite difficult to write lambda bodies like this that do not have race conditions. Unless we are willing to enforce—preferably at compile time—that such a function cannot escape its capturing thread, this feature may well cause more trouble than it solves.

A better approach is to up-level the computation and allow the libraries to manage the coordination between threads; in this example, the user might invoke a reduce method in place of forEach:

int sum = list.map(e -> e.size())
              .reduce(0, (a, b) -> a+b);

The reduce operation takes a base value (in case the list is empty) and an operator (here, addition), and computes the following expression:

0 + list[0] + list[1] + list[2] + ...

Reduction can be done with other operations as well, such as minimum, maximum, product, etc., and if the operator is associative, the reduction is easily parallelized. So, rather than supporting an idiom that is fundamentally sequential and prone to data races (mutable accumulators), instead we choose to provide library support to express accumulations in a more parallelizable and less error-prone way.

Common operations (such as addition) can use method references to make reductions more readable:

int sum = list.map(e -> e.size())
              .reduce(0, Integer::plus);
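The library methods used in this section (map and reduce directly on a list, and Integer::plus) are provisional. As a runnable sketch of the same reduction, the following uses the API as it was eventually released in Java SE 8: List.stream(), Stream.map, Stream.reduce, and Integer::sum. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ReduceDemo {
    static int totalLength(List<String> words) {
        // The mutable-accumulator idiom is rejected by the compiler:
        // int sum = 0;
        // words.forEach(w -> { sum += w.length(); });

        // Reduction: start from the base value 0 and combine with addition.
        return words.stream()
                    .map(String::length)
                    .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Arrays.asList("a", "bb", "ccc")));
    }
}
```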

8. Method references

Lambda expressions allow us to define an anonymous method and treat it as an instance of a functional interface. It is often desirable to do the same with an existing method.

Method references are expressions which have the same treatment as lambda expressions (i.e., they require a target type and encode functional interface instances), but instead of providing a method body they refer to a method of an existing class or object.

For example, consider a Person class that can be sorted by name or by age. (The following example denotes a method reference using the syntax ClassName::methodName; this syntax is provisional.)

class Person { 
  private final String name;
  private final int age;

  public static int compareByAge(Person a, Person b) { ... }
  public static int compareByName(Person a, Person b) { ... }
}

Person[] people = ...
Arrays.sort(people, Person::compareByAge);

Here, the expression Person::compareByAge can be considered shorthand for a lambda expression whose formal parameter list is copied from Comparator<Person>.compare and whose body calls Person.compareByAge (though the actual implementation need not be identical).

Because the functional interface method's parameter types act as arguments in an implicit method invocation, the referenced method signature is allowed to manipulate the parameters—via widening, boxing, grouping as a variable-arity array, etc.—just like a method invocation.

interface Block<T> { void run(T arg); }

Block<Integer> b1 = System::exit;   // void exit(int status)
Block<String[]> b2 = Arrays::sort;  // void sort(Object[] a)
Block<String> b3 = MyProgram::main; // void main(String... args)
Runnable r = MyProgram::main;       // void main(String... args)
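Because System::exit is awkward to run in a demonstration, the following sketch shows the same kinds of adaptation with harmless methods, assuming a Java SE 8 or later compiler. The methods recordLong and recordAll and the log list are hypothetical; they only record how each call arrived.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AdaptationDemo {
    interface Block<T> { void run(T arg); }

    static final List<String> log = new ArrayList<>();

    static void recordLong(long value) {
        log.add("long:" + value);
    }

    static void recordAll(String... parts) {
        log.add("parts:" + Arrays.toString(parts));
    }

    static List<String> demo() {
        log.clear();
        // Integer argument is unboxed to int, then widened to long.
        Block<Integer> unboxedAndWidened = AdaptationDemo::recordLong;
        // A single String is grouped into a one-element variable-arity array.
        Block<String> singleVararg = AdaptationDemo::recordAll;
        // No arguments at all: the variable-arity array is empty.
        Runnable noVarargs = AdaptationDemo::recordAll;

        unboxedAndWidened.run(7);
        singleVararg.run("a");
        noVarargs.run();
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```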

9. Kinds of method references

The examples in the previous section all use static methods. There are actually three different kinds of method references, each with slightly different syntax:

A static method
An instance method of a particular object
An instance method of an arbitrary object of a particular type

For a static method reference, as illustrated in the previous section, the class to which the method belongs precedes the :: delimiter.

For a reference to an instance method of a particular object, that object precedes the delimiter:

class ComparisonProvider {
    public int compareByName(Person p1, Person p2) { ... }
    public int compareByAge(Person p1, Person p2) { ... }
}
...
Arrays.sort(people, comparisonProvider::compareByName);

Here, the implicit lambda expression would capture the comparisonProvider variable and the body would invoke compareByName using that variable as the receiver.

The ability to reference the method of a specific object provides a convenient way to convert between different functional interface types:

Callable<Path> c = ...
PrivilegedAction<Path> a = c::call;

For a reference to an instance method of an arbitrary object, the type to which the method belongs precedes the delimiter, and the invocation's receiver is the first parameter of the functional interface method:

Arrays.sort(names, String::compareToIgnoreCase);

Here, the implicit lambda expression uses its first parameter as the receiver and its second parameter as the compareToIgnoreCase argument.

If the class of the instance method is generic, its type parameters can be provided before the :: delimiter or, in most cases, inferred from the target type.

Note that the syntax for a static method reference might also be interpreted as a reference to an instance method of a class. The compiler determines which is intended by attempting to identify an applicable method of each kind (noting that the instance method has one less argument).

For all forms of method references, method type arguments are inferred as necessary, or they can be explicitly provided following the :: delimiter.
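The three kinds can be placed next to one another in a short sketch, assuming a Java SE 8 or later compiler. Function and Supplier come from the java.util.function package of the released Java SE 8 library, not from this document; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.Function;
import java.util.function.Supplier;

class MethodRefKinds {
    static String describe() {
        // 1. Static method: the class precedes the delimiter.
        //    Roughly equivalent to s -> Integer.parseInt(s).
        Function<String, Integer> parse = Integer::parseInt;

        // 2. Instance method of a particular object: the object precedes
        //    the delimiter and is captured as the receiver.
        String greeting = "hello";
        Supplier<Integer> length = greeting::length;

        // 3. Instance method of an arbitrary object of a type: the first
        //    parameter is the receiver, roughly (a, b) -> a.compareToIgnoreCase(b).
        Comparator<String> ignoringCase = String::compareToIgnoreCase;

        String[] names = { "bob", "Alice", "carol" };
        Arrays.sort(names, ignoringCase);
        return parse.apply("42") + " " + length.get() + " " + Arrays.toString(names);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```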

10. Constructor references

Constructors can be referenced in much the same way as static methods by using the name new:

SocketImplFactory factory = MySocketImpl::new;

If a class has multiple constructors, the target type's method signature is used to select the best match in the same way that a constructor invocation is resolved.

In order to create a new instance of an inner class, an additional enclosing instance parameter is required. For a constructor reference, this extra parameter may either be implicitly provided by an enclosing this at the site of the reference, or it may be the functional interface method's first parameter (in the same way that the first parameter for a method reference may act as an instance method's receiver).

class Document {
  class Cursor { ... }

  // The enclosing instance, 'this', is implicit:
  Factory<Cursor> cursorFactory = Cursor::new;

  // The enclosing instance is the Mapper's parameter:
  static Mapper<Document, Cursor> DOC_TO_CURSOR = Cursor::new;
}

No syntax supports explicitly providing an enclosing instance parameter at the site of the constructor reference.

If the class to instantiate is generic, type arguments can be provided after the class name. If the constructor itself is generic, these type arguments can follow the :: token.
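A sketch of constructor selection by target type, assuming a Java SE 8 or later compiler. It is limited to top-level classes and does not cover the inner-class forms described above. Function and Supplier are taken from the released java.util.function package rather than from this document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

class ConstructorRefDemo {
    static String demo() {
        // The target method takes no parameters, so ArrayList() is selected.
        Supplier<List<String>> emptyList = ArrayList::new;

        // The target method takes a String, so StringBuilder(String) is selected.
        Function<String, StringBuilder> fromText = StringBuilder::new;

        // Type arguments for a generic class may follow the class name.
        Supplier<List<Integer>> explicit = ArrayList<Integer>::new;

        List<String> names = emptyList.get();
        names.add(fromText.apply("abc").reverse().toString());
        return names + " " + explicit.get().size();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```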

11. Default methods

Lambda expressions and method references add a lot of expressiveness to the Java language, but the key to really achieving our goal of making code-as-data patterns "convenient and idiomatic" is to complement these new features with libraries tailored to take advantage of them.

Adding new functionality to existing libraries is somewhat difficult in Java SE 7. In particular, interfaces are essentially set in stone once they are published. The purpose of default methods (sometimes referred to as virtual extension methods or defender methods) is to enable interfaces to be evolved in a compatible manner after their initial publication.

To illustrate, the standard collections API obviously ought to provide new lambda-friendly operations. For example, the removeAll method could be generalized to remove any of a collection's elements for which an arbitrary property held, where the property was expressed as an instance of a functional interface Predicate. But where would this new method be defined? We can't add an abstract method to the Collection interface—many existing implementations wouldn't know about the change. We could make it a static method in the Collections utility class, but that would relegate these new operations to a sort of second-class status.

Instead, default methods provide a more object-oriented way to add concrete behavior to an interface. These are a new kind of method: an interface method can either be abstract, as usual, or declare a default implementation.

interface Iterator<E> {
  boolean hasNext();
  E next();
  void remove();

  void skip(int i) default {
    for (; i > 0 && hasNext(); i--) next();
  }
}

Given the above definition of Iterator, all classes that implement Iterator would inherit a skip method. From a client's perspective, skip is just another virtual method provided by the interface. Invoking skip on an instance of a class that implements Iterator but does not provide a body for skip has the effect of invoking the default implementation: calling hasNext and next up to a certain number of times. If a class wants to override skip with a better implementation (by advancing a private cursor directly, for example, or incorporating an atomicity guarantee), it is free to do so.
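The default syntax shown above is provisional. In Java SE 8 as released, default is written as a modifier before the return type, and the sketch below uses that form so that it compiles. Since java.util.Iterator has no skip method, the sketch declares a hypothetical subinterface SkippingIterator; Wrapper and Countdown are likewise hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

class DefaultMethodDemo {
    // Hypothetical subinterface carrying the default method.
    interface SkippingIterator<E> extends Iterator<E> {
        default void skip(int n) {
            for (; n > 0 && hasNext(); n--) next();
        }
    }

    // Inherits the default skip() without mentioning it.
    static final class Wrapper<E> implements SkippingIterator<E> {
        private final Iterator<E> delegate;
        Wrapper(Iterator<E> delegate) { this.delegate = delegate; }
        public boolean hasNext() { return delegate.hasNext(); }
        public E next() { return delegate.next(); }
    }

    // Overrides skip() by adjusting its private cursor directly.
    static final class Countdown implements SkippingIterator<Integer> {
        private int remaining;
        Countdown(int start) { remaining = start; }
        public boolean hasNext() { return remaining > 0; }
        public Integer next() {
            if (remaining <= 0) throw new NoSuchElementException();
            return remaining--;
        }
        @Override
        public void skip(int n) {
            remaining = Math.max(0, remaining - Math.max(0, n));
        }
    }

    static String thirdOf(String... items) {
        SkippingIterator<String> it = new Wrapper<>(Arrays.asList(items).iterator());
        it.skip(2);
        return it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        System.out.println(thirdOf("a", "b", "c"));
        Countdown c = new Countdown(5);
        c.skip(2);
        System.out.println(c.next());
    }
}
```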

When one interface extends another, it can add, change, or remove the default implementations of the superinterface's methods. To remove a default, the clause default none; is used. (The keyword none here is context-dependent; in every other context, none is still interpreted as an identifier, not a keyword.)

12. Inheritance of default methods

Default methods are inherited just like other methods; in most cases, the behavior is just as one would expect. However, in a few special circumstances, some explanation is called for.

First, when an interface redeclares a method of one of its supertypes—that is, it repeats the method's signature without mentioning a default—the default, if any, is inherited from the overridden declaration. Redeclaration is a common documentation practice, and we would not want the mere mention of a method that is already implicitly a member to have surprising side-effects.

Second, when a class's or interface's supertypes provide multiple methods with the same signature, the inheritance rules attempt to resolve the conflict. Two basic principles drive these rules:

Class method declarations are preferred to interface defaults. This is true whether the class method is concrete or abstract. (Hence the default keyword: default methods are a fallback if the class hierarchy doesn't say anything.)
Methods that are already overridden by other candidates are ignored. This circumstance can arise when supertypes share a common ancestor. Say the Collection and List interfaces provided different defaults for removeAll; in the following implements clause, the List declaration would have priority over the Collection declaration inherited by Queue:

class LinkedList<E> implements List<E>, Queue<E>

In the event that two independently-defined defaults conflict, or a default method conflicts with a default none method, the programmer must explicitly override the supertype methods. Often, this amounts to picking the preferred default. An enhanced syntax for super supports the invocation of a particular superinterface's default implementation:

interface Robot extends Artist, Gun {
  void draw() default { Artist.super.draw(); }
}

The name preceding super must refer to a direct superinterface that defines or inherits a default for the invoked method. This form of method invocation is not restricted to simple disambiguation—it can be used just like any other invocation, in both class and interface bodies.
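Both principles, and the explicit override, can be exercised in a small sketch that assumes a Java SE 8 or later compiler and the released placement of the default modifier. Here draw returns a String instead of void so that the selected implementation is visible; Base and Painter are hypothetical additions.

```java
class ConflictDemo {
    interface Artist { default String draw() { return "draws a picture"; } }
    interface Gun    { default String draw() { return "draws a weapon"; } }

    // Two independently defined defaults conflict, so Robot must override
    // draw(); here it picks the Artist implementation.
    interface Robot extends Artist, Gun {
        @Override
        default String draw() { return Artist.super.draw(); }
    }

    // Class method declarations are preferred to interface defaults.
    static class Base {
        public String draw() { return "draws from the class"; }
    }
    static class Painter extends Base implements Artist { }

    public static void main(String[] args) {
        System.out.println(new Robot() { }.draw());
        System.out.println(new Painter().draw());
    }
}
```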

13. Putting it together

The language features for Project Lambda were designed to work together. To illustrate, we'll consider the task of sorting a list of people by last name.

Today we write:

Collections.sort(people, new Comparator<Person>() {
    public int compare(Person x, Person y) {
        return x.getLastName().compareTo(y.getLastName());
    }
});

This is a very verbose way to write "sort people by last name"!

With lambda expressions, we can make this expression more concise:

Collections.sort(people, 
                 (Person x, Person y) -> x.getLastName().compareTo(y.getLastName()));

However, while more concise, it is not any more abstract; it still burdens the programmer with the need to do the actual comparison (which is even worse when the sort key is a primitive). Small changes to the libraries can help here, such as introducing a comparing method, which takes a function for mapping each value to a sort key and returns an appropriate comparator:

public static <T, U extends Comparable<? super U>>
    Comparator<T> comparing(Mapper<T, ? extends U> mapper) { ... }

interface Mapper<T,U> { public U map(T t); }

Collections.sort(people, Collections.comparing((Person p) -> p.getLastName()));

And this can be shortened by allowing the compiler to infer the type of the lambda parameter, and importing the comparing method via a static import:

Collections.sort(people, comparing(p -> p.getLastName()));

The lambda in the above expression is simply a forwarder for the existing method getLastName. We can use method references to reuse the existing method in place of the lambda expression:

Collections.sort(people, comparing(Person::getLastName));

Finally, the use of an ancillary method like Collections.sort is undesirable for many reasons: it is more verbose; it can't be specialized for each data structure that implements List; and it undermines the value of the List interface since users can't easily discover the static sort method when inspecting the documentation for List. Default methods provide a more object-oriented solution for this problem:

people.sort(comparing(Person::getLastName));

This also reads much more like the problem statement in the first place: sort the people list by last name.

If we add a default method reverseOrder() to Comparator, which produces a Comparator that uses the same sort key but in reverse order, we can just as easily express a descending sort:

people.sort(comparing(Person::getLastName).reverseOrder());

Note that default methods in a functional interface don't count against its limit of one abstract method, so Comparator is still a functional interface despite having the default reverseOrder() method.
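The library names used in this section (Collections.comparing, Mapper, and reverseOrder on Comparator) are provisional. As a runnable sketch of the final form, the following uses the equivalents from the released Java SE 8 library: the default method List.sort, the static method Comparator.comparing, and the default method Comparator.reversed. The Person class and the helper sortedLastNames are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortDemo {
    static final class Person {
        private final String lastName;
        Person(String lastName) { this.lastName = lastName; }
        String getLastName() { return lastName; }
    }

    static List<String> sortedLastNames(boolean descending, String... lastNames) {
        List<Person> people = new ArrayList<>();
        for (String name : lastNames) people.add(new Person(name));

        // A method reference supplies the sort key; reversed() is a default
        // method, so Comparator remains a functional interface.
        Comparator<Person> byLastName = Comparator.comparing(Person::getLastName);
        people.sort(descending ? byLastName.reversed() : byLastName);

        List<String> result = new ArrayList<>();
        for (Person p : people) result.add(p.getLastName());
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sortedLastNames(false, "Smith", "Adams", "Jones"));
        System.out.println(sortedLastNames(true, "Smith", "Adams", "Jones"));
    }
}
```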