Project Lambda: Straw-Man Proposal
Mark Reinhold
2009/12/10 13:11 -0800 [3]

This is a straw-man proposal to add first-class functions, function types, and lambda expressions (informally, “closures”) to Java.

This sketch is incomplete and, most likely, inconsistent and unimplementable in its present form. That’s acceptable: It’s intended to be a starting point for discussions on the Project Lambda mailing list which will, hopefully, lead to the formulation of a detailed proposal and a prototype implementation. This document is written in a tutorial style so as to be easily approachable by non-experts, and also to stress its informal nature.

This proposal builds upon the past work of Gilad Bracha, Neal Gafter, James Gosling, and Peter von der Ahé on BGGA; of Bob Lee, Doug Lea, and Josh Bloch on CICE; and of Stephen Colebourne and Stefan Shulz on FCM. There is little here that is new beyond the synthesis of selected elements of those, and related, proposals.

Detailed commentary, which may be skipped on first reading, is enclosed in gray boxes such as this one.

1. Lambda expressions

To start, we need a way to write anonymous functions. Here’s a trivial function, which takes no arguments and always returns 42:


The ‘#’ character introduces the lambda expression, i.e., a function literal. The first pair of parentheses is empty, meaning that the function takes no arguments. The second pair encloses an expression, which is evaluated when the function is invoked.

The concrete syntax shown here is strictly provisional. There will no doubt be a long and bitter debate as to the actual syntax.

This syntax is similar—though not identical—to that of the FCM proposal. As we’ll see later there are many other differences, both syntactic and semantic.

This function takes an integer value and returns its double:

#(int x)(x + x)

Here we declare the type and name of the single parameter in the initial pair of parentheses.

Functions of more than one parameter are written in the obvious manner. This function takes two integer values and returns the result of multiplying them:

#(int x, int y)(x * y)

Sometimes we need to write functions which do something more complex than evaluate a single expression, so the body of a lambda expression can alternatively be a block:

#(int x, int y){ int z = expensiveComputation(x, y);
                 if (z < 0) return x;
                 if (z > 0) return y;
                 return 0; }

If a lambda expression’s body does not return a value then the function is void.

These are local returns: They return control from the function rather than from the scope in which it is defined. There is no non-local transfer of control in this proposal, in contrast to BGGA. Within a function body the keywords break and continue have no meaning unless they occur within a do, for, switch, or while statement in that body.

2. Function types

Every expression in Java must have a type, so we introduce function types, with a syntax similar to that of function literals. This allows us to, among other things, bind functions to variables:

#int() fortyTwo = #()(42);
#int(int) doubler = #(int x)(x + x);
#int(int,int) multiplier = #(int x, int y)(x * y);

Functions bound to variables can be invoked:

assert fortyTwo() == 42;
assert doubler(fortyTwo()) == 84;
assert multiplier(3, fortyTwo()) == 126;

Method and variable names have distinct namespaces in Java, so as it stands the expression fortyTwo() refers to a method named fortyTwo rather than a variable of that name whose value is a function object. If we cannot devise a reasonable way to allow a variable name to occur in method-invocation position then some additional syntax will be necessary. One possibility would be to enclose the variable name in parentheses to make it clear that it’s a function expression, i.e., (fortyTwo)().

Functions can also be passed as arguments to methods. This method applies the given function to each element of the given array, returning a new array:

public int[] map(int[] a, #int(int) fn) {
  int[] b = new int[a.length];
  for (int i = 0; i < a.length; i++)
    b[i] = fn(a[i]);
  return b;

This draft takes no position as to whether function types should be treated as sugar for existing kinds of types or as a brand-new kind of type.

3. Function conversion

Many existing Java libraries define interfaces which declare just one method or abstract classes which declare just one abstract method (so-called “SAM” types). A function of appropriate type is converted to an anonymous instance of such an interface or abstract class as needed so that, for example,

Thread th = new Thread(new Runnable() {
                         public void run() {

can be more compactly written as

Thread th = new Thread(#(){ doSomeStuff(); doMoreStuff(); } )

For another example, the comparator defined here:

List<String> data = ...;
                 new Comparator<String>() {
                   public int compare(String a, String b) {
                     return a.length() - b.length();

can be rewritten to the equivalent

                 #(String a, String b)(a.length() - b.length()));

4. Variable capture

Consider this function-returning method:

public #int(int) adder(int x){ return #(int y)(x + y); }

Invoking adder(42) returns a function that will add 42 to its argument:

#int(int) a42 = adder(42);
assert a42(0) == 42;

This works because when a lambda expression is evaluated at runtime the definitions of any free effectively-final or shared variables in its body (in this case x) are copied from the enclosing lexical scope into the resulting function object, or closure. This way the body can be evaluated later, potentially long after that lexical scope (in this case that of the body of the adder method) has ceased to exist.

One may argue endlessly as to whether the evaluation of a lambda expression must capture every free variable and every implicit unnamed element of its enclosing lexical scope in order for it to be called a closure, or whether the degenerate case of capturing no variables is still a closure. I have no intention of doing so here.

An effectively-final variable is one that is either declared final or is definitely assigned-to exactly once somewhere in the enclosing scope, prior to the evaluation of the lambda expression and in a manner that is verifiable by the compiler in the usual way. It is a compile-time error to modify the value of such a variable within the body of a lambda expression.

In some cases it is useful to be able to share local state between the body of a lambda expression and its enclosing scope. A shared variable is a non-final variable declared with the new restricted keyword shared. A shared variable can be modified within the body of a lambda expression, although it must be definitely assigned-to before the lambda expression is evaluated.

For example,

shared int count = 0;
                 #(String a, String b){
                   return a.length() - b.length() });

will print the number of comparison operations used to sort the given list.

We use a restricted keyword here so as not to break existing source code that uses shared as an identifier.

It may be worth allowing shared variables also to be declared volatile.

5. Instance capture

Within a lambda expression occurring within an instance method the keyword this refers to the object upon which the method is invoked. Other members declared within the enclosing class are available in the usual way.

For example, given the class

class CountingSorter {
  int count = 0;
  void sort(List<String> data) {
                     #(String a, String b){
                       return a.length() - b.length(); });

then the code

CountingSorter cs = new CountingSorter();

will print the number of comparison operations used to sort all three lists.

It is a compile-time error to reference this or any non-static member within a lambda expression occurring in a static context.

The remaining sections of this proposal describe features which are not critical to the primary use case of supporting the convenient expression of computations upon parallel collections in terms of bulk-data operations. These features are intended to improve the fit between the existing language and libraries and the features described above. As such they deserve investigation as time allows.

6. Exception transparency

(To be written)

Most likely as in BGGA.

7. Method references

(To be written)

Yes, these would be handy. Most likely along the lines of BGGA rather than FCM.

8. Extension methods

The primary motivation for this proposal is to make it easy to write programs using the parallel array API in terms of the fundamental bulk-data operations of filter, map, and reduce. It would be awkward, to say the least, if we were not also able to use lambda expressions to express computations upon ordinary sequential collections in a similar fashion.

To compute the sum of the integers in a given Set<Integer>, e.g., we’d like to be able to write something like:

Set<Integer> s = ...;
int sum = s.reduce(#(int x, int b)(x + b), 0);

To achieve this we introduce extension methods, which allow the author of an existing interface to, in effect, add methods to that interface while preserving compatibility. To add filter, map, and reduce methods to the Set interface, e.g., we first define corresponding static methods in the Collections class:

class Collections {
  static <T> Set<T> filter(Set<T> s, #boolean(T) pred) { ... }
  static <S,T> Set<S> map(Set<T> s, #S(T) func) { ... }
  static <T> T reduce(Set<T> s, #T(T,T) op, T base) { ... }

We then use an enhanced form of the existing static import feature to make these methods available to users of the Set interface:

interface Set<T> extends Collection<T> {
  Set<T> filter(#boolean(T)) import static Collections.filter;
  <S> map(#S(T)) import static;
  T reduce(#T(T,T), T) import static Collections.reduce;

An invocation of an extension method is translated by the compiler into an ordinary invocation of the corresponding static method, passing the instance upon which it is invoked as the first argument. For example,

int sum = s.reduce(#(int x, int b)(x + b), 0);


int sum = Collections.reduce(s, #(int x, int b)(x + b), 0);

and the more complex

int ans = s.filter(#(int x)(x != 0))
           .map(#(int x)(x + 3))
           .reduce(#(int x, int b)(x + b));

is translated into

int ans = Collections.reduce(
                Collections.filter(s, #(int x)(x != 0)),
                #(int x)(x + 3)),
              #(int x, int b)(x + b), 0);

It is a compile-time error for conflicting extension-method declarations to occur within a class or interface, whether directly or via inheritance. An extension method in an interface is hidden in a class implementing that interface if the class defines a static or instance method with the same signature.

There aren’t really any better alternatives here. We can’t add bulk-data operations to the existing collection interfaces without breaking compatibility. We could introduce an additional layer of collection interfaces (e.g., Set2, List2, and so forth) which include these operations; that would preserve compatibility but be very difficult to work with. The best we could do without changing the language any further is just use static methods in the Collections class directly. That’s not so bad for simple cases but it quickly becomes unwieldy for more complex cases such as the final example shown above.

This form of declaration-site extension method was suggested by Peter Ahé. I agree with both Rémi Forax and Peter that this approach is a better fit for Java than use-site extension methods, as suggested by Neal Gafter.

Extension methods would not be needed if we instead added Scala-like traits to Java, as Howard Lovatt and others have suggested. That would, however, be a much more significant change to both the language and the VM.

Creative Commons License
Project Lambda: Straw-Man Proposal by Mark Reinhold is licensed under a Creative Commons Attribution-Share Alike 3.0 U.S. License.