Enhanced-For Statement Should Allow Streams

Stuart W. Marks, 28 Feb 2019

Abstract

Occasionally it’s useful to iterate a Stream using a conventional loop. However, the Stream interface doesn’t implement Iterable, and therefore streams cannot be used with the enhanced-for statement. This is a proposal to remedy that situation by introducing a new interface IterableOnce that is a subtype of Iterable, and then retrofitting the Stream interface to implement it. Other JDK classes will also be retrofitted to implement IterableOnce.

This work is covered by bug JDK-8148917.

Background

A fairly early decision in the streams design was that a stream can be consumed only once. Attempting to consume a stream a second or subsequent time results in an IllegalStateException. Calling the iterator() method on a Stream counts as consuming the stream.
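The consume-once rule can be observed directly. The following is a minimal sketch; the class and method names are made up for illustration.

```java
import java.util.Iterator;
import java.util.stream.Stream;

class StreamConsumedOnce {
    // Returns true if a second call to iterator() on the same stream
    // is rejected with IllegalStateException.
    static boolean secondIteratorCallThrows() {
        Stream<String> stream = Stream.of("a", "b", "c");
        Iterator<String> first = stream.iterator(); // consumes the stream
        first.next();
        try {
            stream.iterator(); // second attempt to consume the same stream
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondIteratorCallThrows());
    }
}
```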

The specification of the Iterable interface is not very explicit, but it is a widely held assumption that one can call its iterator() method multiple times, thereby obtaining multiple Iterator instances that can be operated upon independently. The collections classes, all of which implement Iterable, have this property.

Violating this assumption can easily lead to incorrect results. An Iterable that can only be iterated once might decide to return the same Iterator instance from subsequent calls. If it were used in another for-loop, that loop would terminate silently without processing any elements, because the Iterator would already have been exhausted. This is likely to give unexpected results without warning.
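To make the failure mode concrete, here is a small sketch of such an Iterable. The OneShot class is hypothetical and exists only to illustrate the problem; it is not a JDK class.

```java
import java.util.Iterator;
import java.util.List;

class SharedIteratorDemo {
    // Hypothetical one-shot Iterable: every call to iterator()
    // hands back the same Iterator instance.
    static final class OneShot<T> implements Iterable<T> {
        private final Iterator<T> shared;
        OneShot(Iterator<T> shared) { this.shared = shared; }
        @Override public Iterator<T> iterator() { return shared; }
    }

    // Runs two enhanced-for loops over the same Iterable and returns
    // the number of elements each loop processed.
    static int[] countsFromTwoLoops() {
        Iterable<String> once = new OneShot<>(List.of("a", "b", "c").iterator());
        int first = 0;
        int second = 0;
        for (String s : once) {
            first++;
        }
        for (String s : once) {
            second++; // never reached: the shared Iterator is already exhausted
        }
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        int[] counts = countsFromTwoLoops();
        System.out.println(counts[0] + " then " + counts[1]);
    }
}
```

The second loop completes normally, with no exception and no warning, which is what makes this kind of bug hard to notice.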

A notable example in the JDK is the NIO DirectoryStream interface, which was added in Java 7. DirectoryStream does implement Iterable, but it allows only one call to the iterator() method. As with streams, the second and subsequent calls result in an IllegalStateException rather than returning an exhausted Iterator. It is recognized that this is unusual behavior for an Iterable, and as such, its specification includes this warning:

While DirectoryStream extends Iterable, it is not a general-purpose Iterable as it supports only a single Iterator; invoking the iterator method to obtain a second or subsequent iterator throws IllegalStateException.
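This behavior can be checked with a short program. The sketch below assumes the default file system provider and uses a throwaway temporary directory; the class and method names are illustrative only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

class DirectoryStreamOnce {
    // Returns true if the second iterator() call on a DirectoryStream
    // throws IllegalStateException.
    static boolean secondIteratorCallThrows() {
        try {
            Path dir = Files.createTempDirectory("ds-demo");
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                entries.iterator(); // first call succeeds
                try {
                    entries.iterator(); // second call on the same stream
                    return false;
                } catch (IllegalStateException expected) {
                    return true;
                }
            } finally {
                Files.delete(dir); // clean up the empty temporary directory
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    public static void main(String[] args) {
        System.out.println(secondIteratorCallThrows());
    }
}
```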

As things stood prior to Java 8, most Iterable instances were reusable, and very few were one-shot. In Java 8, we decided against having the Stream interface implement Iterable, as we expected streams to be quite popular, and we didn’t want to create a situation where there were potentially many Iterable instances that were one-shot.

Despite the fact that Stream is structurally compatible with Iterable – it has an iterator() method that returns an Iterator – it is not semantically compatible. Indeed, the java.util.stream package specification draws an analogy between Stream and Iterator, not Iterable.

It is, however, a continual source of friction that it’s not possible to iterate a Stream using an enhanced-for loop. There are of course workarounds:

Stream<T> stream = ... ;
for (T t : (Iterable<T>)stream::iterator) {
    ...
}

The cast is necessary because the right-hand side of the enhanced-for statement doesn’t provide a target type that allows the method reference’s type to be inferred. This workaround is fairly ugly and is entirely unobvious. Creating a local variable to hold the Iterable suffers from similar problems.
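For comparison, the local-variable form might look like the following sketch. The declared type of the variable supplies the target type that the cast supplied before. The class and method names are illustrative.

```java
import java.util.stream.Stream;

class LocalVariableWorkaround {
    // Concatenates the stream's elements using an enhanced-for loop.
    static String joinWithLoop(Stream<String> stream) {
        // The variable's declared type gives the method reference a target type.
        Iterable<String> iterable = stream::iterator;
        StringBuilder sb = new StringBuilder();
        for (String s : iterable) {
            sb.append(s);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinWithLoop(Stream.of("a", "b", "c")));
    }
}
```

Note that the resulting Iterable is itself one-shot: iterating it a second time calls stream.iterator() again, which throws IllegalStateException.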

Another alternative is to use a basic for statement:

for (Iterator<T> it = stream.iterator() ; it.hasNext() ; ) {
    T t = it.next();
    ...
}

This works, but it’s an uncomfortable step backwards, as the enhanced-for statement has largely supplanted the basic for statement except for cases where explicit use of the Iterator is required, for example, to call the remove() method.

Proposal

  1. Add an interface IterableOnce as a subtype of Iterable, with a single iterator() method that returns Iterator. Add a requirement that this method returns an Iterator only the first time it’s called, with the second and subsequent calls resulting in IllegalStateException.

  2. Retrofit IterableOnce onto Stream, DirectoryStream, and Scanner. No further changes are necessary to Stream and DirectoryStream. A new iterator() method is added to Scanner, which already implements Iterator. This method simply returns this and sets a flag to throw IllegalStateException after the first call.
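The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the proposed JDK source: IterableOnce is the interface proposed here and does not exist in any JDK release, and Tokens is a hypothetical stand-in for Scanner that is itself an Iterator and returns this from iterator() exactly once.

```java
import java.util.Iterator;
import java.util.List;

class IterableOnceDemo {
    // Iterates a Tokens instance with enhanced-for, then tries to
    // obtain a second Iterator from it.
    static String run() {
        Tokens tokens = new Tokens("one two three");
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            sb.append(t).append(';');
        }
        try {
            tokens.iterator();
            sb.append("no exception");
        } catch (IllegalStateException expected) {
            sb.append("ISE");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

// Sketch of the proposed interface (hypothetical; not in the JDK).
interface IterableOnce<T> extends Iterable<T> {
    // Returns an Iterator the first time it is called; the second and
    // subsequent calls throw IllegalStateException.
    @Override
    Iterator<T> iterator();
}

// Hypothetical stand-in for Scanner, which already implements Iterator<String>.
final class Tokens implements Iterator<String>, IterableOnce<String> {
    private final Iterator<String> source;
    private boolean iteratorObtained; // flag set by the first iterator() call

    Tokens(String text) {
        this.source = List.of(text.split("\\s+")).iterator();
    }

    @Override public boolean hasNext() { return source.hasNext(); }
    @Override public String next() { return source.next(); }

    @Override public Iterator<String> iterator() {
        if (iteratorObtained) {
            throw new IllegalStateException("iterator already obtained");
        }
        iteratorObtained = true;
        return this;
    }
}
```

Because IterableOnce is a subtype of Iterable, the enhanced-for statement accepts it without any language change.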

Analysis

The proposed changes allow streams to be used seamlessly in enhanced-for loops:

for (T t : stream) {
    ...
}

Adding IterableOnce as a subtype of Iterable is uncomfortable, as it can violate substitutability. In other words, it’s possible for errors to occur if an IterableOnce is passed to code that expects an Iterable and that also attempts to iterate it more than once. For example, using AssertJ, the following statement iterates the argument of assertThat(Iterable) twice and thus will throw IllegalStateException:

assertThat(Stream.of("a", "b", "c"))
    .containsOnlyElementsOf(List.of("a", "b", "c"));

Even though such errors can occur, it’s still useful to define IterableOnce as a separate interface. It provides a place to document the one-shot behavior, and implementations can be denoted as implementing this interface, instead of this merely being noted in the specification text. Having a separate interface also might allow static analysis tools to detect some cases where an instance of IterableOnce is mistakenly used multiple times.

Moving one-shot semantics into a new interface leaves room to strengthen the specification of Iterable regarding reuse. Its specification can be changed to recommend reusability more strongly, and to recommend returning new Iterator instances each time instead of returning the same one. (This can’t be required, though, since it would potentially invalidate existing implementations.) Changes to the specification of Iterable are covered separately under JDK-8186220.

Example

Consider an example that lists the files in a directory, collects the text files into a list, and computes their total size. One way to do this using streams is as follows:

void textFileSize1(Path dir) throws IOException {
    try (Stream<Path> paths = Files.list(dir)) {
        var textFiles = new ArrayList<Path>();
        long totalSize = paths
             .filter(Files::isRegularFile)
             .filter(path -> path.toString().endsWith(".txt"))
             .peek(textFiles::add)
             .mapToLong(path -> {
                 try {
                     return Files.size(path);
                 } catch (IOException ioe) {
                     throw new UncheckedIOException(ioe);
                 }
             })
             .sum();
        System.out.println(textFiles);
        System.out.println("total size = " + totalSize);
    } catch (UncheckedIOException uioe) {
        throw uioe.getCause();
    }
}

This is fairly cumbersome, for a couple of reasons. First, two results are being accumulated: the path list and the total size. Accumulating the list isn’t terribly bad, but it does require use of the much-reviled peek() operation in order to do the list accumulation. We do this because the downstream pipeline step maps the path to the file size, a long, which we then reduce using sum().

Second, finding the file’s size requires use of an inner try-catch statement to handle and convert the IOException into an UncheckedIOException. The outer try-with-resources statement is necessary to ensure that the stream of paths is closed, but it’s also required to have a catch clause to convert the UncheckedIOException back into an IOException.

A rewritten version, using a Stream in the enhanced-for statement, is as follows:

void textFileSize2(Path dir) throws IOException {
    try (Stream<Path> paths = Files.list(dir)) {
        var textFiles = new ArrayList<Path>();
        long totalSize = 0L;
        Stream<Path> filtered =
            paths.filter(Files::isRegularFile)
                 .filter(path -> path.toString().endsWith(".txt"));
        for (Path path : filtered) {
            totalSize += Files.size(path);
            textFiles.add(path);
        }
        System.out.println(textFiles);
        System.out.println("total size = " + totalSize);
    }
}

This is essentially the same logic, but it’s much simplified by the removal of most of the exception handling. Also note that the accumulation and exception-throwing operations are separated from the stream pipeline and are placed into a trailing for-loop. Using an ordinary loop for these operations makes simple accumulation into multiple local variables quite convenient. A stream-based alternative would be to use Collectors.teeing() (new in Java 12), or a collector with a custom state object, but this level of complexity is justified only for much more demanding problems.
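For reference, the Collectors.teeing() approach might look roughly like the sketch below. To keep it self-contained and free of I/O, it operates on a hypothetical in-memory FileInfo type instead of Path, and Summary is a hypothetical holder for the two results; neither is part of the example above.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TeeingSketch {
    // Hypothetical in-memory stand-in for a file (name and size only).
    static final class FileInfo {
        final String name;
        final long size;
        FileInfo(String name, long size) { this.name = name; this.size = size; }
    }

    // Hypothetical holder for the two accumulated results.
    static final class Summary {
        final List<String> names;
        final long totalSize;
        Summary(List<String> names, long totalSize) {
            this.names = names;
            this.totalSize = totalSize;
        }
    }

    // Collects the names of the text files and their total size in one pass.
    static Summary summarize(Stream<FileInfo> files) {
        return files
            .filter(f -> f.name.endsWith(".txt"))
            .collect(Collectors.teeing(
                Collectors.mapping((FileInfo f) -> f.name, Collectors.toList()),
                Collectors.summingLong((FileInfo f) -> f.size),
                Summary::new)); // merges the two downstream results
    }

    public static void main(String[] args) {
        Summary s = summarize(Stream.of(
            new FileInfo("a.txt", 10), new FileInfo("b.png", 5), new FileInfo("c.txt", 32)));
        System.out.println(s.names + " total size = " + s.totalSize);
    }
}
```

Even in this simplified form, the need for a result holder and two downstream collectors shows the extra machinery involved, and the real example would still need to deal with the IOException from Files.size() inside a lambda.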

(Thanks to Erik Duveblad for the inspiration for this example.)

Alternatives

One alternative considered was to introduce IterableOnce as a supertype of Iterable instead of as a subtype. This makes some sense from an object design standpoint, as any Iterable can be iterated once and thus would appear to be an IterableOnce. This would prevent errors such as the AssertJ example above.

However, doing so would require weakening the contract of IterableOnce. To allow Iterable to be a subtype, the semantics of IterableOnce would need to change to allow iteration once and possibly more, instead of the currently proposed at-most-once semantics. Having IterableOnce be a subtype allows the specification to make a much stronger assertion.

Moreover, inserting IterableOnce as a supertype requires changing the language to allow acceptance of IterableOnce on the right-hand side of the enhanced-for loop. The language and the libraries are at “arm’s length” from each other, and the places where the language depends on the libraries are narrow and well-controlled. Changing this relationship should only be done where there is an overwhelming advantage in doing so compared to the alternatives. That’s not the case here; the proposal can still be quite useful and successful as it stands, without changing the language.

Another alternative is to add a utility method that adapts an Iterator to an Iterable, e.g.,

<T> Iterable<T> asIterable(Iterator<? extends T> iterator)

This would allow one to write

for (T t : asIterable(stream.iterator())) {
    ...
}

While this is cleaner than the cast we have to live with today, it’s not as nice as using a stream directly. Worse, though, it introduces an API that can be used to convert any Iterator into a poorly behaved (i.e., one-shot) Iterable.
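One possible implementation of such an adapter is sketched below, assuming the signature shown above; the enclosing class name is made up. A small wrapper Iterator is used so that an Iterator<? extends T> can be exposed as an Iterator<T> without an unchecked cast.

```java
import java.util.Iterator;
import java.util.stream.Stream;

class AsIterableSketch {
    // Adapts an Iterator to an Iterable. Every call to iterator() on the
    // result wraps the SAME underlying Iterator, so the result is one-shot.
    static <T> Iterable<T> asIterable(Iterator<? extends T> iterator) {
        return () -> new Iterator<T>() {
            @Override public boolean hasNext() { return iterator.hasNext(); }
            @Override public T next() { return iterator.next(); }
        };
    }

    // Iterates the adapted Iterable twice and returns the element counts.
    static int[] countsFromTwoLoops() {
        Iterable<CharSequence> adapted = asIterable(Stream.of("a", "b", "c").iterator());
        int first = 0;
        int second = 0;
        for (CharSequence s : adapted) {
            first++;
        }
        for (CharSequence s : adapted) {
            second++; // the underlying Iterator is already exhausted
        }
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        int[] counts = countsFromTwoLoops();
        System.out.println(counts[0] + " then " + counts[1]);
    }
}
```

Nothing in the returned Iterable’s type tells a caller that it can be iterated only once, which is the concern raised above.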

An alternative adapter is to add a default method to Stream:

default Iterable<T> asIterable() { return this::iterator; }

for (T t : stream.asIterable()) {
    ...
}

This is slightly better at the call site, and it’s restricted to streams. But it’s still not as nice as IterableOnce, and it still facilitates creation of poorly behaved Iterable instances.

A variation that was considered and rejected was to retrofit BaseStream instead of Stream. This introduces a method clash in IntStream between these methods:

forEach(Consumer<? super T>)  // inherited from Iterable via BaseStream
forEach(IntConsumer)          // from IntStream

as well as analogous clashes in LongStream and DoubleStream.

Another variation is to retrofit IntStream to implement IterableOnce<Integer>, and similarly for LongStream and DoubleStream. This could be done, but it would result in implicit boxing of all primitive values if the stream were used in an enhanced-for loop. This is quite a severe penalty. It’s preferable to have the caller make boxing explicit through use of the boxed() method. An alternative is for the caller to create an array with toArray() and use the array form of the enhanced-for statement. This avoids boxing, but it loses laziness. At least the caller has a choice of tradeoffs here.
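The two choices might look like the following sketch. Since IterableOnce is not available in current JDK releases, the boxed variant uses the method-reference workaround shown earlier; the class and method names are illustrative.

```java
import java.util.stream.IntStream;

class PrimitiveStreamLoops {
    // Explicit boxing: elements are produced lazily, but each one is
    // boxed to an Integer before the loop body sees it.
    static int sumBoxed(IntStream ints) {
        Iterable<Integer> boxed = ints.boxed()::iterator;
        int sum = 0;
        for (Integer i : boxed) {
            sum += i;
        }
        return sum;
    }

    // Array form: no boxing, but the whole stream is materialized
    // into an int[] before the loop starts.
    static int sumArray(IntStream ints) {
        int sum = 0;
        for (int i : ints.toArray()) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(IntStream.rangeClosed(1, 4)));
        System.out.println(sumArray(IntStream.rangeClosed(1, 4)));
    }
}
```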