This is an informal overview of the enhancements to the Java programming language specified by JSR 335 and implemented in the OpenJDK Lambda Project. It refines the previous iteration posted in October 2010. A formal description of some of the language changes may be found in the Early Draft Specification for the JSR; an OpenJDK Developer Preview is also available. Additional design documents—in particular a more detailed examination of default methods—can be found at the OpenJDK project page. As all of these artifacts are works-in-progress, there may be inconsistencies between them, which will be resolved by the time the spec and implementation are finalized.
The high-level goal of Project Lambda is to allow programming patterns that require modeling code as data to be convenient and idiomatic in Java. The principal new language features include:
- Lambda expressions
- Expanded target typing
- Method and constructor references
- Default methods
These are described and illustrated below.
Java is, primarily, an object-oriented programming language. In both object-oriented and functional languages, basic values can dynamically encapsulate program behavior: object-oriented languages have objects with methods, and functional languages have functions. This similarity may not be obvious, however, because Java objects tend to be relatively heavyweight: instantiations of separately-declared classes wrapping a handful of fields and many methods.
Yet it is not uncommon for some objects to essentially encode nothing more than a function. In a typical use case, a Java API defines an interface, informally called a "callback interface," and expects a user to provide an instance of the interface when invoking the API. For example:
public interface ActionListener {
    void actionPerformed(ActionEvent e);
}
Rather than declaring a class that implements ActionListener for the sole purpose of allocating it once at an invocation site, a user typically instantiates the implementing class inline, anonymously:
button.addActionListener(new ActionListener() {
    public void actionPerformed(ActionEvent e) {
        ui.dazzle(e.getModifiers());
    }
});
Many useful libraries rely on this pattern. It is particularly important for parallel APIs, in which the code to execute must be expressed independently of the thread in which it will run. The parallel-programming domain is of special interest, because as CPU makers focus their efforts on improving performance through a proliferation of cores, serial APIs are limited to a shrinking fraction of available processing power.
Given the increasing relevance of callbacks and other functional-style idioms, it is important that modeling code as data in Java be as lightweight as possible. In this respect, anonymous inner classes are imperfect for a number of reasons, primarily:
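1. Bulky syntax
2. Confusion surrounding the meaning of names and this
3. Inflexible class-loading and instance-creation semantics
4. Inability to capture non-final local variables
5. Inability to abstract over control flow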
This project addresses many of these issues. It eliminates (1) and (2) by introducing new, much more concise expression forms with local scoping rules, sidesteps (3) by defining the semantics of the new expressions in a more flexible, optimization-friendly manner, and ameliorates (4) by allowing the compiler to infer finality (allowing capture of effectively final local variables).
However, it is not a goal of this project to address all the problems of inner classes. Neither arbitrary capture of mutable variables (4) nor nonlocal control flow (5) is within this project's scope, but such features may be revisited in a future iteration of the language.
The anonymous inner class approach, despite its limitations, has the nice property of fitting very cleanly into Java's type system: a function value with an interface type. This is convenient for a number of reasons: interfaces are already an intrinsic part of the type system; they naturally have a runtime representation; and they carry with them informal contracts expressed by Javadoc comments, such as an assertion that an operation is commutative.
The interface ActionListener, used above, has just one method. Many common callback interfaces have this property, such as Runnable and Comparator. We'll give all interfaces that have just one method a name: functional interfaces. (These were previously called SAM Types.)
Nothing special needs to be done to declare an interface as functional: the compiler identifies it as such based on its structure. (This identification process is a little more complex than just counting method declarations. For example, an interface might inherit methods from multiple parents that logically represent the same method, or it might redundantly declare a method that is automatically provided by the class Object, like toString.)
An alternative (or complementary) approach to function types, suggested by some early proposals, would have been to introduce a new, structural function type. A type like "function from a String and an Object to an int" might be expressed as (String,Object)->int. This idea was considered and rejected, at least for now, due to several disadvantages:
m(T->U) and m(X->Y).
So, we have instead chosen to take the path of "use what you know": since existing libraries use functional interfaces extensively, we codify and leverage this pattern.
To illustrate, here are some of the functional interfaces in Java SE 7 that are well-suited for being used with the new language features; the examples that follow illustrate the use of a few of them.
java.lang.Runnable
java.util.concurrent.Callable
java.security.PrivilegedAction
java.util.Comparator
java.io.FileFilter
java.nio.file.PathMatcher
java.lang.reflect.InvocationHandler
java.beans.PropertyChangeListener
java.awt.event.ActionListener
javax.swing.event.ChangeListener
The biggest pain point for anonymous inner classes is bulkiness. They have what we might call a "vertical problem": the ActionListener instance from section 1 uses five lines of source code to encapsulate a single statement.
Lambda expressions are anonymous methods, aimed at addressing the "vertical problem" by replacing the machinery of anonymous inner classes with a lighter-weight mechanism.
Here are some examples of lambda expressions:
(int x, int y) -> x + y
() -> 42
(String s) -> { System.out.println(s); }
The first expression takes two integer arguments, named x and y, and returns x+y. The second takes no arguments and returns the integer 42. The third takes a string and prints it to the console, returning nothing.
The general syntax consists of an argument list, the arrow token ->, and a body. The body can either be a single expression or a statement block. In the expression form, the body is simply evaluated and returned. In the block form, the body is evaluated like a method body: a return statement returns control to the caller of the anonymous method; break and continue are illegal at the top level, but are of course permitted within loops; and if the body produces a result, every control path must return something or throw an exception.
The syntax is optimized for the common case in which a lambda expression is quite small, as illustrated above. For example, the expression-body form eliminates the need for a return keyword, which could otherwise represent a substantial syntactic overhead relative to the size of the expression.
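As a minimal sketch of the two body forms, the following assumes the lambda syntax as it eventually shipped in Java SE 8, along with the java.util.function interfaces from that release (which are not part of this document). The class and field names are hypothetical.

```java
import java.util.function.Function;
import java.util.function.IntBinaryOperator;
import java.util.function.Supplier;

public class LambdaForms {
    // Expression body: the value of x + y is returned implicitly.
    public static final IntBinaryOperator ADD = (int x, int y) -> x + y;

    // No parameters, expression body.
    public static final Supplier<Integer> ANSWER = () -> 42;

    // Block body: evaluated like a method body, so 'return' is explicit and
    // every control path must return a value or throw.
    public static final Function<String, Integer> SAFE_LENGTH = s -> {
        if (s == null) {
            return 0;
        }
        return s.length();
    };

    public static void main(String[] args) {
        System.out.println(ADD.applyAsInt(2, 3));         // 5
        System.out.println(ANSWER.get());                 // 42
        System.out.println(SAFE_LENGTH.apply("lambda"));  // 6
    }
}
```

The block form costs a pair of braces and a return keyword, which is why the expression form is preferred whenever the body is a single expression.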
It is also expected that lambda expressions will frequently appear in nested contexts, such as the argument to a method invocation or the result of another lambda expression. To minimize noise in these cases, unnecessary delimiters are avoided. However, for situations in which it is useful to set the entire expression apart, it can be surrounded with parentheses, just like any other expression.
Here are some examples of lambda expressions appearing in statements:
FileFilter java = (File f) -> f.getName().endsWith(".java");
String user = doPrivileged(() -> System.getProperty("user.name"));
new Thread(() -> {
    connectToService();
    sendNotification();
}).start();
Note that the name of a functional interface is not part of the lambda expression syntax. So what kind of object does a lambda expression represent? Its type is inferred from the surrounding context. For example, the following lambda expression is an ActionListener:
ActionListener l = (ActionEvent e) -> ui.dazzle(e.getModifiers());
An implication of this approach is that the same lambda expression can have different types in different contexts:
Callable<String> c = () -> "done";
PrivilegedAction<String> a = () -> "done";
In the first case, the lambda expression () -> "done" represents an instance of Callable. In the second case, the same expression represents an instance of PrivilegedAction.
The compiler is responsible for inferring the type of each lambda expression. It uses the type expected in the context in which the expression appears; this type is called the target type. A lambda expression can only appear in a context that has a target type.
Of course, no lambda expression will be compatible with every possible target type. The compiler checks that the types used by the lambda expression are consistent with the target type's method signature. That is, a lambda expression is compatible with a target type T if all of the following conditions hold:
throws clause
Since a functional interface target type already "knows" what types the lambda expression's formal parameters should have, it is often unnecessary to repeat them. The use of target typing often allows the lambda parameters' types to be inferred:
Comparator<String> c = (s1, s2) -> s1.compareToIgnoreCase(s2);
In addition, when there is just one parameter whose type is inferred (a very common case), the parentheses surrounding a single parameter name are optional:
FileFilter java = f -> f.getName().endsWith(".java");
button.addActionListener(e -> ui.dazzle(e.getModifiers()));
These enhancements further a desirable design goal: "Don't turn a vertical problem into a horizontal problem." We want the reader of the code to have to wade through as little syntax as possible before arriving at the "meat" of the lambda expression.
Lambda expressions are not the first Java expressions to have context-dependent types: generic method invocations and "diamond" constructor invocations, for example, are similarly type-checked based on an assignment's target type.
List<String> ls = Collections.emptyList();
List<Integer> li = Collections.emptyList();
Map<String,Integer> m1 = new HashMap<>();
Map<Integer,String> m2 = new HashMap<>();
We stated earlier that lambda expressions can only appear in contexts that have target types. So, what contexts have target types?
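- Variable declarations
- Assignments
- Return statements
- Array initializers
- Method or constructor arguments
- Lambda expression bodies
- Conditional expressions (?:)
- Cast expressions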
In the first three cases, the target type is simply the type being assigned to or returned.
Comparator<String> c;
c = (String s1, String s2) -> s1.compareToIgnoreCase(s2);
public Runnable toDoLater() {
    return () -> {
        System.out.println("later");
    };
}
Array initializer contexts are like assignments, except that the "variable" is an array component and its type is derived from the array's type.
runAll(new Callable<String>[]{ ()->"a", ()->"b", ()->"c" });
In the method argument case, things are more complicated: the target type is determined by two other language features, overload resolution and type argument inference. For each potentially-applicable method, the compiler determines whether the lambda expression is compatible with the corresponding target type, and also infers any type arguments. Once the best method declaration is chosen, that declaration provides the actual target type for the expression.
void invoke(Runnable r) { r.run(); }
<T> T invoke(Callable<T> c) throws Exception { return c.call(); }
String s = invoke(() -> "done"); // invoke(Callable)
(If the choice of a best method declaration is ambiguous, casts can provide a workaround; see the discussion of cast expressions below.)
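The following sketch shows how the shape of a lambda body steers overload resolution. It assumes the rules as finalized in Java SE 8; the class name, the compute helper, and the returned labels are hypothetical, and the Callable overload swallows the checked exception only to keep the example short.

```java
import java.util.concurrent.Callable;

public class OverloadDemo {
    // Each overload reports which one was selected.
    public static String invoke(Runnable r) {
        r.run();
        return "Runnable";
    }

    public static <T> String invoke(Callable<T> c) {
        try {
            c.call();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return "Callable";
    }

    // Hypothetical helper that returns a value.
    public static int compute() {
        return 1;
    }

    public static void main(String[] args) {
        // The body is a value, not a statement, so only Callable is compatible.
        System.out.println(invoke(() -> "done"));
        // The block returns nothing, so only Runnable is compatible.
        System.out.println(invoke(() -> { System.out.println("working"); }));
        // A cast supplies the target type explicitly.
        System.out.println(invoke((Runnable) () -> compute()));
    }
}
```

The cast in the last call is the workaround for cases where the compiler's choice is ambiguous or simply not the one the programmer wants.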
Lambda expressions themselves provide target types for their bodies, in this case by deriving that type from the outer target type. This makes it convenient to write functions that return other functions:
Callable<Runnable> c = () -> () -> { System.out.println("hi"); };
Similarly, conditional expressions can "pass down" a target type from the surrounding context:
Callable<Integer> c = flag ? (() -> 23) : (() -> 42);
Finally, cast expressions provide a mechanism to explicitly provide a lambda expression's type if none can be conveniently inferred from context:
// Illegal: Object o = () -> { System.out.println("hi"); };
Object o = (Runnable) () -> { System.out.println("hi"); };
Casts are also useful to help resolve ambiguity when a method declaration is overloaded with unrelated functional interface types.
The expanded role of target typing in the compiler is not limited to lambda expressions: generic method invocations and "diamond" constructor invocations can also take advantage of target types wherever they are available. The following declarations are illegal in Java SE 7 but valid under JSR 335:
List<String> ls =
    Collections.checkedList(new ArrayList<>(), String.class);
Set<Integer> si = flag ? Collections.singleton(23)
                       : Collections.emptySet();
Determining the meaning of names (and this) in inner classes is significantly more difficult and error-prone than when classes are limited to the top level. Inherited members, including methods of class Object, can accidentally shadow outer declarations, and unqualified references to this always refer to the inner class itself.
Lambda expressions are much simpler: they do not inherit any names from a supertype, nor do they introduce a new level of scoping. Instead, they are lexically scoped, meaning names in the body are interpreted just as they are in the enclosing environment (with the addition of new names for the lambda expression's formal parameters). As a natural extension, the this keyword and references to its members have the same meaning as they would immediately outside the lambda expression.
To illustrate, the following program prints "Hello, world!" twice to the console:
public class Hello {
    Runnable r1 = () -> { System.out.println(this); };
    Runnable r2 = () -> { System.out.println(toString()); };

    public String toString() { return "Hello, world!"; }

    public static void main(String... args) {
        new Hello().r1.run();
        new Hello().r2.run();
    }
}
The equivalent using anonymous inner classes would instead, perhaps to the programmer's surprise, print something like Hello$1@5b89a773 and Hello$2@537a7706.
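To make the contrast concrete, here is a small sketch that places a lambda and an anonymous inner class side by side. It assumes Java SE 8 semantics and uses java.util.function.Supplier so that the result can be inspected as a value; the class and field names are hypothetical.

```java
import java.util.function.Supplier;

public class ScopeDemo {
    // In a lambda, 'this' means the enclosing ScopeDemo instance.
    public final Supplier<String> fromLambda = () -> this.toString();

    // In an anonymous inner class, 'this' means the anonymous instance,
    // whose toString() is inherited from Object.
    public final Supplier<String> fromInnerClass = new Supplier<String>() {
        @Override
        public String get() {
            return this.toString();
        }
    };

    @Override
    public String toString() {
        return "Hello, world!";
    }

    public static void main(String[] args) {
        ScopeDemo demo = new ScopeDemo();
        System.out.println(demo.fromLambda.get());      // Hello, world!
        System.out.println(demo.fromInnerClass.get());  // a name such as ScopeDemo$1 plus a hash
    }
}
```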
Of course, if this inside a lambda expression refers to the enclosing class, it cannot also be used to refer to the function value described by the lambda expression. It is not usually necessary to do so, but on some occasions (say, when defining a recursive function) it is important.
Fortunately, by simply refining the "initialized-before-use" analysis that determines when variables can be referenced, the compiler can permit a lambda expression to mention the variable to which it is assigned.
final Runnable r = () -> {
    // This reference to 'r' is legal:
    if (!allDone) { workQueue.add(r); }
    else { displayResults(); }
};

// For contrast:
// final Object[] objs =
//     // This reference to 'objs' is illegal (it's uninitialized):
//     { "x", 23, objs };
When a lambda expression appears in any other context, such as a return
expression, there is no way for it to refer to itself. The proper
approach in such cases is to name the object with a variable declaration
and replace the original expression with a variable reference.
Consistent with the lexical-scoping approach, and following the pattern set by other local parameterized constructs like for loops and catch clauses, the parameters of a lambda expression must not shadow any local variables in the enclosing context.
The compiler check for references to local variables of enclosing contexts in inner classes (captured variables) is quite restrictive in Java SE 7: an error occurs if the captured variable is not declared final. We can relax this restriction, for both lambda expressions and inner classes, by also allowing the capture of effectively final local variables.
Informally, a local variable is effectively final if its initial value is never changed; in other words, declaring it final would not cause a compilation failure.
Callable<String> helloCallable(String name) {
    String hello = "Hello";
    return () -> (hello + ", " + name);
}
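A further sketch, assuming the effectively-final rule as it shipped in Java SE 8: each loop iteration declares fresh locals that are never reassigned, so each lambda may capture them without a final modifier. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class CaptureDemo {
    public static List<Supplier<String>> greeters(List<String> names) {
        List<Supplier<String>> result = new ArrayList<>();
        for (String name : names) {
            // Assigned once and never changed, so effectively final.
            String greeting = "Hello, " + name;
            result.add(() -> greeting);
            // Uncommenting the next line would reassign 'greeting', and the
            // lambda above would no longer compile:
            // greeting = greeting.toUpperCase();
        }
        return result;
    }

    public static void main(String[] args) {
        for (Supplier<String> g : greeters(Arrays.asList("Ada", "Grace"))) {
            System.out.println(g.get());
        }
    }
}
```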
References to this, including implicit references through unqualified field references or method invocations, are, essentially, references to a final local variable. Lambda bodies that contain such references capture the appropriate instance of this. In other cases, no reference to this is retained by the object.
This has a beneficial implication for memory management: while inner class instances always hold a strong reference to their enclosing instance, lambdas that do not capture members from the enclosing instance do not hold a reference to it. This characteristic of inner class instances can often be a source of memory leaks (the so-called lapsed listener problem).
It is our intent to prohibit capture of mutable local variables. The reason is that idioms like this:
int sum = 0;
list.forEach(e -> { sum += e.size(); });
are fundamentally serial; it is quite difficult to write lambda bodies like this that do not have race conditions. Unless we are willing to enforce—preferably at compile time—that such a function cannot escape its capturing thread, this feature may well cause more trouble than it solves.
A better approach is to up-level the computation and allow the libraries to manage the coordination between threads; in this example, the user might invoke a reduce method in place of forEach:
int sum = list.map(e -> e.size())
              .reduce(0, (a, b) -> a+b);
The reduce operation takes a base value (in case the list is empty) and an operator (here, addition), and computes the following expression:
0 + list[0] + list[1] + list[2] + ...
Reduction can be done with other operations as well, such as minimum, maximum, and product, and if the operator is associative, the reduction is easily parallelized. So, rather than supporting an idiom that is fundamentally sequential and prone to data races (mutable accumulators), we instead choose to provide library support to express accumulations in a more parallelizable and less error-prone way.
Common operations (such as addition) can use method references to make reductions more readable:
int sum = list.map(e -> e.size())
              .reduce(0, Integer::plus);
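For readers who want to run the reduction, the sketch below uses the java.util.stream API as it shipped in Java SE 8. That API is an assumption relative to this document: it differs from the provisional list.map(...) and Integer::plus shown above, with stream() and Integer::sum taking their place. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class ReduceDemo {
    // Sums the lengths of the strings without a mutable accumulator.
    public static int totalLength(List<String> words) {
        return words.stream()
                    .map(w -> w.length())
                    .reduce(0, (a, b) -> a + b);
    }

    // The same reduction, using method references for both steps.
    public static int totalLengthWithMethodRefs(List<String> words) {
        return words.stream()
                    .map(String::length)
                    .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "bb", "ccc");
        System.out.println(totalLength(words));                // 6
        System.out.println(totalLengthWithMethodRefs(words));  // 6
    }
}
```

Because addition is associative and 0 is its identity, the library is free to split the work across threads without changing the result.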
Lambda expressions allow us to define an anonymous method and treat it as an instance of a functional interface. It is often desirable to do the same with an existing method.
Method references are expressions which have the same treatment as lambda expressions (i.e., they require a target type and encode functional interface instances), but instead of providing a method body they refer to a method of an existing class or object.
For example, consider a Person class that can be sorted by name or by age. (The following example denotes a method reference using the syntax ClassName::methodName; this syntax is provisional.)
class Person {
    private final String name;
    private final int age;

    public static int compareByAge(Person a, Person b) { ... }
    public static int compareByName(Person a, Person b) { ... }
}
Person[] people = ...
Arrays.sort(people, Person::compareByAge);
Here, the expression Person::compareByAge can be considered shorthand for a lambda expression whose formal parameter list is copied from Comparator<Person>.compare and whose body calls Person.compareByAge (though the actual implementation need not be identical).
Because the functional interface method's parameter types act as arguments in an implicit method invocation, the referenced method signature is allowed to manipulate the parameters—via widening, boxing, grouping as a variable-arity array, etc.—just like a method invocation.
interface Block<T> { void run(T arg); }
Block<Integer> b1 = System::exit; // void exit(int status)
Block<String[]> b2 = Arrays::sort; // void sort(Object[] a)
Block<String> b3 = MyProgram::main; // void main(String... args)
Runnable r = MyProgram::main; // void main(String... args)
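The adaptations described above can be observed with a few hypothetical methods. This is a sketch under the assumption that method references behave as finalized in Java SE 8; Block is redeclared so the example is self-contained, and record, join, and the constants are invented for illustration.

```java
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;

public class AdaptationDemo {
    // Same shape as the Block interface in the text.
    public interface Block<T> { void run(T arg); }

    public static long lastRecorded = -1;

    // Takes a primitive long.
    public static void record(long value) { lastRecorded = value; }

    // Variable-arity method.
    public static String join(String... parts) { return String.join("+", parts); }

    // Integer argument: unboxed to int, then widened to long.
    public static final Block<Integer> RECORD = AdaptationDemo::record;
    // Zero, one, or two arguments are grouped into the String[] parameter.
    public static final Supplier<String> NONE = AdaptationDemo::join;
    public static final Function<String, String> ONE = AdaptationDemo::join;
    public static final BinaryOperator<String> TWO = AdaptationDemo::join;

    public static void main(String[] args) {
        RECORD.run(7);
        System.out.println(lastRecorded);          // 7
        System.out.println(NONE.get().isEmpty());  // true
        System.out.println(ONE.apply("a"));        // a
        System.out.println(TWO.apply("a", "b"));   // a+b
    }
}
```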
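The examples in the previous section all use static methods. There are actually three different kinds of method references, each with slightly different syntax:
- A static method
- An instance method of a particular object
- An instance method of an arbitrary object of a particular type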
For a static method reference, as illustrated in the previous section, the class to which the method belongs precedes the :: delimiter.
For a reference to an instance method of a particular object, that object precedes the delimiter:
class ComparisonProvider {
    public int compareByName(Person p1, Person p2) { ... }
    public int compareByAge(Person p1, Person p2) { ... }
}
...
Arrays.sort(people, comparisonProvider::compareByName);
Here, the implicit lambda expression would capture the comparisonProvider variable and the body would invoke compareByName using that variable as the receiver.
The ability to reference the method of a specific object provides a convenient way to convert between different functional interface types:
Callable<Path> c = ...
PrivilegedExceptionAction<Path> a = c::call;
For a reference to an instance method of an arbitrary object, the type to which the method belongs precedes the delimiter, and the invocation's receiver is the first parameter of the functional interface method:
Arrays.sort(names, String::compareToIgnoreCase);
Here, the implicit lambda expression uses its first parameter as the receiver and its second parameter as the compareToIgnoreCase argument.
If the class of the instance method is generic, its type parameters can be provided before the :: delimiter or, in most cases, inferred from the target type.
Note that the syntax for a static method reference might also be interpreted as a reference to an instance method of a class. The compiler determines which is intended by attempting to identify an applicable method of each kind (noting that the instance method has one fewer argument).
For all forms of method references, method type arguments are inferred as necessary, or they can be explicitly provided following the :: delimiter.
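The three kinds of method reference can be compared in one runnable sketch, assuming the :: syntax as it was finalized in Java SE 8. The compareByLength helper and the Descending class are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

public class MethodRefKinds {
    // Hypothetical static helper: orders strings by length.
    public static int compareByLength(String a, String b) {
        return Integer.compare(a.length(), b.length());
    }

    // Hypothetical class whose instance method is used through a bound reference.
    public static class Descending {
        public int compare(String a, String b) {
            return b.compareTo(a);
        }
    }

    // Sorts a copy so the caller's array is left untouched.
    public static String[] sorted(String[] input, Comparator<String> order) {
        String[] copy = input.clone();
        Arrays.sort(copy, order);
        return copy;
    }

    public static void main(String[] args) {
        String[] fruit = { "pear", "Fig", "apple" };
        Descending descending = new Descending();

        // 1. Static method: the class precedes the delimiter.
        System.out.println(Arrays.toString(sorted(fruit, MethodRefKinds::compareByLength)));
        // 2. Instance method of a particular object: the object precedes the delimiter.
        System.out.println(Arrays.toString(sorted(fruit, descending::compare)));
        // 3. Instance method of an arbitrary object: the first parameter is the receiver.
        System.out.println(Arrays.toString(sorted(fruit, String::compareToIgnoreCase)));
    }
}
```

The three calls print [Fig, pear, apple], [pear, apple, Fig], and [apple, Fig, pear] respectively.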
Constructors can be referenced in much the same way as static methods by using the name new:
SocketImplFactory factory = MySocketImpl::new;
If a class has multiple constructors, the target type's method signature is used to select the best match in the same way that a constructor invocation is resolved.
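As a sketch of constructor selection, the example below points three different target types at the overloaded constructors of StringBuilder. It assumes constructor references as finalized in Java SE 8 and the java.util.function interfaces from that release; the class and constant names are hypothetical.

```java
import java.util.function.Function;
import java.util.function.Supplier;

public class ConstructorRefDemo {
    // Target signature () -> StringBuilder selects the no-argument constructor.
    public static final Supplier<StringBuilder> EMPTY = StringBuilder::new;

    // Target signature (String) -> StringBuilder selects StringBuilder(String).
    public static final Function<String, StringBuilder> FROM_TEXT = StringBuilder::new;

    // Target signature (Integer) -> StringBuilder selects StringBuilder(int capacity),
    // reached by unboxing the argument.
    public static final Function<Integer, StringBuilder> WITH_CAPACITY = StringBuilder::new;

    public static void main(String[] args) {
        System.out.println(EMPTY.get().length());               // 0
        System.out.println(FROM_TEXT.apply("abc"));             // abc
        System.out.println(WITH_CAPACITY.apply(64).capacity()); // 64
    }
}
```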
In order to create a new instance of an inner class, an additional enclosing instance parameter is required. For a constructor reference, this extra parameter may either be implicitly provided by an enclosing this at the site of the reference, or it may be the functional interface method's first parameter (in the same way that the first parameter for a method reference may act as an instance method's receiver).
class Document {
    class Cursor { ... }

    // The enclosing instance, 'this', is implicit:
    Factory<Cursor> cursorFactory = Cursor::new;

    // The enclosing instance is the Mapper's parameter:
    static Mapper<Document, Cursor> DOC_TO_CURSOR = Cursor::new;
}
No syntax supports explicitly providing an enclosing instance parameter at the site of the constructor reference.
If the class to instantiate is generic, type arguments can be provided after the class name. If the constructor itself is generic, these type arguments can follow the :: token.
Lambda expressions and method references add a lot of expressiveness to the Java language, but the key to really achieving our goal of making code-as-data patterns "convenient and idiomatic" is to complement these new features with libraries tailored to take advantage of them.
Adding new functionality to existing libraries is somewhat difficult in Java SE 7. In particular, interfaces are essentially set in stone once they are published. The purpose of default methods (sometimes referred to as virtual extension methods or defender methods) is to enable interfaces to be evolved in a compatible manner after their initial publication.
To illustrate, the standard collections API obviously ought to provide new lambda-friendly operations. For example, the removeAll method could be generalized to remove any of a collection's elements for which an arbitrary property held, where the property was expressed as an instance of a functional interface Predicate. But where would this new method be defined? We can't add an abstract method to the Collection interface, because many existing implementations wouldn't know about the change. We could make it a static method in the Collections utility class, but that would relegate these new operations to a sort of second-class status.
Instead, default methods provide a more object-oriented way to add concrete behavior to an interface. These are a new kind of method: an interface method can either be abstract, as usual, or declare a default implementation.
interface Iterator<E> {
    boolean hasNext();
    E next();
    void remove();

    void skip(int i) default {
        for (; i > 0 && hasNext(); i--) next();
    }
}
Given the above definition of Iterator, all classes that implement Iterator would inherit a skip method. From a client's perspective, skip is just another virtual method provided by the interface. Invoking skip on an instance of a subtype of Iterator that does not provide a body for skip has the effect of invoking the default implementation: calling hasNext and next up to a certain number of times. If a class wants to override skip with a better implementation (by advancing a private cursor directly, for example, or incorporating an atomicity guarantee) it is free to do so.
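The following sketch shows both outcomes: one class inherits the default skip, another overrides it. It assumes the syntax as it eventually shipped in Java SE 8, where default is a modifier placed before the return type rather than the trailing clause used in this document. SkippableIterator, ListCursor, and FastCursor are hypothetical names; java.util.Iterator itself declares no skip method.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class DefaultSkipDemo {
    // Hypothetical interface adding skip to the standard Iterator.
    public interface SkippableIterator<E> extends Iterator<E> {
        // Default implementation written only in terms of the abstract methods.
        default void skip(int n) {
            for (; n > 0 && hasNext(); n--) {
                next();
            }
        }
    }

    // Implements hasNext and next only, so it inherits the default skip.
    public static class ListCursor<E> implements SkippableIterator<E> {
        protected final List<E> items;
        protected int position = 0;

        public ListCursor(List<E> items) { this.items = items; }

        @Override public boolean hasNext() { return position < items.size(); }

        @Override public E next() {
            if (!hasNext()) throw new NoSuchElementException();
            return items.get(position++);
        }
    }

    // Overrides the default by advancing the cursor directly.
    public static class FastCursor<E> extends ListCursor<E> {
        public FastCursor(List<E> items) { super(items); }

        @Override public void skip(int n) {
            position = Math.min(items.size(), position + Math.max(n, 0));
        }
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList("a", "b", "c", "d");

        ListCursor<String> plain = new ListCursor<>(letters);
        plain.skip(2);
        System.out.println(plain.next());  // c

        FastCursor<String> fast = new FastCursor<>(letters);
        fast.skip(3);
        System.out.println(fast.next());   // d
    }
}
```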
When one interface extends another, it can add, change, or remove the default implementations of the superinterface's methods. To remove a default, the clause default none; is used. (The keyword none here is context-dependent; in every other context, none is still interpreted as an identifier, not a keyword.)
Default methods are inherited just like other methods; in most cases, the behavior is just as one would expect. However, in a few special circumstances, some explanation is called for.
First, when an interface redeclares a method of one of its supertypes—that is, it repeats the method's signature without mentioning a default—the default, if any, is inherited from the overridden declaration. Redeclaration is a common documentation practice, and we would not want the mere mention of a method that is already implicitly a member to have surprising side-effects.
Second, when a class's or interface's supertypes provide multiple methods with the same signature, the inheritance rules attempt to resolve the conflict. Two basic principles drive these rules:
Class method declarations are preferred to interface defaults. This is true whether the class method is concrete or abstract. (Hence the default keyword: default methods are a fallback if the class hierarchy doesn't say anything.)
Methods that are already overridden by other candidates are ignored. This circumstance can arise when supertypes share a common ancestor. Say the Collection and List interfaces provided different defaults for removeAll; in the following implements clause, the List declaration would have priority over the Collection declaration inherited by Queue:
class LinkedList<E> implements List<E>, Queue<E>
In the event that two independently-defined defaults conflict, or a default method conflicts with a default none method, the programmer must explicitly override the supertype methods. Often, this amounts to picking the preferred default. An enhanced syntax for super supports the invocation of a particular superinterface's default implementation:
interface Robot extends Artist, Gun {
    void draw() default { Artist.super.draw(); }
}
The name preceding super must refer to a direct superinterface that defines or inherits a default for the invoked method. This form of method invocation is not restricted to simple disambiguation: it can be used just like any other invocation, in both class and interface bodies.
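A runnable sketch of this conflict resolution follows, again assuming the default-method syntax as finalized in Java SE 8. To make the outcome observable, draw returns a String here rather than being void as in the text; Droid and Android are hypothetical classes.

```java
public class ConflictDemo {
    public interface Artist {
        default String draw() { return "draws a picture"; }
    }

    public interface Gun {
        default String draw() { return "draws a weapon"; }
    }

    // Inherits two unrelated defaults for draw(), so it must override and choose.
    public interface Robot extends Artist, Gun {
        @Override
        default String draw() { return Artist.super.draw(); }
    }

    // Gets the disambiguated default through Robot.
    public static class Droid implements Robot { }

    // A class body can use the same form, here combining both inherited defaults.
    public static class Android implements Artist, Gun {
        @Override
        public String draw() {
            return Artist.super.draw() + " and " + Gun.super.draw();
        }
    }

    public static void main(String[] args) {
        System.out.println(new Droid().draw());    // draws a picture
        System.out.println(new Android().draw());  // draws a picture and draws a weapon
    }
}
```

Removing either override makes the declaration fail to compile, which is how the compiler forces the programmer to resolve the conflict explicitly.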
The language features for Project Lambda were designed to work together. To illustrate, we'll consider the task of sorting a list of people by last name.
Today we write:
Collections.sort(people, new Comparator<Person>() {
    public int compare(Person x, Person y) {
        return x.getLastName().compareTo(y.getLastName());
    }
});
This is a very verbose way to write "sort people by last name"!
With lambda expressions, we can make this expression more concise:
Collections.sort(people,
    (Person x, Person y) -> x.getLastName().compareTo(y.getLastName()));
However, while more concise, it is not any more abstract; it still burdens the programmer with the need to do the actual comparison (which is even worse when the sort key is a primitive). Small changes to the libraries can help here, such as introducing a comparing method, which takes a function for mapping each value to a sort key and returns an appropriate comparator:
public <T, U extends Comparable<? super U>>
    Comparator<T> comparing(Mapper<T, ? extends U> mapper) { ... }
interface Mapper<T,U> { public U map(T t); }
Collections.sort(people, Collections.comparing((Person p) -> p.getLastName()));
And this can be shortened by allowing the compiler to infer the type of the lambda parameter, and importing the comparing method via a static import:
Collections.sort(people, comparing(p -> p.getLastName()));
The lambda in the above expression is simply a forwarder for the existing method getLastName. We can use method references to reuse the existing method in place of the lambda expression:
Collections.sort(people, comparing(Person::getLastName));
Finally, the use of an ancillary method like Collections.sort is undesirable for many reasons: it is more verbose; it can't be specialized for each data structure that implements List; and it undermines the value of the List interface since users can't easily discover the static sort method when inspecting the documentation for List.
Default methods provide a more object-oriented solution for this problem:
people.sort(comparing(Person::getLastName));
This also reads much more like the problem statement in the first place: sort the people list by last name.
If we add a default method reverseOrder() to Comparator, which produces a Comparator that uses the same sort key but in reverse order, we can just as easily express a descending sort:
people.sort(comparing(Person::getLastName).reverseOrder());
Note that default methods in a functional interface don't count against its limit of one abstract method, so Comparator is still a functional interface despite having the default reverseOrder() method.
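For a version of this progression that runs today, the sketch below uses the library as it shipped in Java SE 8. That is an assumption relative to this document: there the key-extracting factory is the static method Comparator.comparing, the reversing default method is named reversed(), and List.sort is a default method on List. The Person class here is a hypothetical stand-in with only a last name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortDemo {
    public static class Person {
        private final String lastName;

        public Person(String lastName) { this.lastName = lastName; }

        public String getLastName() { return lastName; }

        @Override public String toString() { return lastName; }
    }

    // Returns a sorted copy, leaving the caller's list untouched.
    public static List<Person> byLastName(List<Person> people) {
        List<Person> copy = new ArrayList<>(people);
        copy.sort(Comparator.comparing(Person::getLastName));
        return copy;
    }

    // Same sort key, reversed through a default method on Comparator.
    public static List<Person> byLastNameDescending(List<Person> people) {
        List<Person> copy = new ArrayList<>(people);
        copy.sort(Comparator.comparing(Person::getLastName).reversed());
        return copy;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
            new Person("Lovelace"), new Person("Turing"), new Person("Hopper"));
        System.out.println(byLastName(people));            // [Hopper, Lovelace, Turing]
        System.out.println(byLastNameDescending(people));  // [Turing, Lovelace, Hopper]
    }
}
```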