Stuart W. Marks, 28 Feb 2019
Occasionally it’s useful to iterate a Stream using a conventional loop. However, the Stream interface doesn’t implement Iterable, and therefore streams cannot be used with the enhanced-for statement. This is a proposal to remedy that situation by introducing a new interface IterableOnce that is a subtype of Iterable, and then retrofitting the Stream interface to implement it. Other JDK classes will also be retrofitted to implement IterableOnce.
This work is covered by bug JDK-8148917.
A fairly early decision in the streams design is that a stream can be consumed only once. Attempting to consume a stream a second or subsequent time results in an IllegalStateException. Calling the iterator() method on a Stream counts as consuming the stream.
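A small sketch of that rule, assuming only the standard java.util.stream API; the class and method names are illustrative. The second call to iterator() is rejected because the first call already counted as the stream’s terminal operation.

```java
import java.util.stream.Stream;

class StreamConsumedOnce {
    // Returns true if a second call to iterator() on the same stream
    // fails with IllegalStateException.
    static boolean secondIteratorCallThrows() {
        Stream<String> stream = Stream.of("a", "b", "c");
        stream.iterator();          // first call: counts as consuming the stream
        try {
            stream.iterator();      // second call: the stream is already consumed
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondIteratorCallThrows()); // prints "true"
    }
}
```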
The specification of the Iterable interface is not very explicit, but it is a widely held assumption that one can call its iterator() method multiple times, thereby obtaining multiple Iterator instances that can be operated upon independently. The collections classes, all of which implement Iterable, have this property.
Violating this assumption can easily lead to incorrect results. An Iterable that can be iterated only once might return the same Iterator instance from subsequent calls. If it were used in a second for-loop, that loop would terminate silently without processing any elements, because the Iterator would already have been exhausted. This is likely to give unexpected results without warning.
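The difference can be sketched as follows. The helper countTwice and the lambda-based one-shot Iterable are hypothetical, written only to show the failure mode: a List hands out a fresh Iterator for each loop, while an Iterable that keeps returning one shared Iterator makes the second loop see nothing, with no exception.

```java
import java.util.Iterator;
import java.util.List;

class SilentOneShot {
    // Iterates the same Iterable twice and returns how many elements
    // each pass saw, as {firstPass, secondPass}.
    static int[] countTwice(Iterable<String> iterable) {
        int first = 0;
        int second = 0;
        for (String s : iterable) first++;
        for (String s : iterable) second++;
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        List<String> list = List.of("a", "b", "c");

        // Reusable: each call to list.iterator() returns a new Iterator.
        int[] reusable = countTwice(list);

        // Hypothetical one-shot Iterable: always returns the same Iterator.
        Iterator<String> shared = list.iterator();
        int[] broken = countTwice(() -> shared);

        System.out.println(reusable[0] + " " + reusable[1]); // prints "3 3"
        System.out.println(broken[0] + " " + broken[1]);     // prints "3 0"
    }
}
```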
A notable example in the JDK is the NIO DirectoryStream class, added in Java 7. DirectoryStream does implement Iterable, and it allows only one call to the iterator() method. However, rather than failing silently, and as with streams, the second and subsequent calls result in an IllegalStateException. It is recognized that this is unusual behavior for an Iterable, and accordingly its class specification includes this warning:
While DirectoryStream extends Iterable, it is not a general-purpose Iterable as it supports only a single Iterator; invoking the iterator method to obtain a second or subsequent iterator throws IllegalStateException.
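A minimal sketch of that behavior, assuming the default file system and an existing directory (the demo uses an empty temporary directory); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

class DirectoryStreamOnce {
    // Opens a directory stream on dir and reports whether a second call
    // to iterator() throws IllegalStateException, as the class spec says.
    static boolean secondIteratorCallThrows(Path dir) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            entries.iterator();          // the single permitted Iterator
            try {
                entries.iterator();      // a second Iterator is refused
                return false;
            } catch (IllegalStateException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ds-demo");
        System.out.println(secondIteratorCallThrows(dir)); // prints "true"
        Files.delete(dir);
    }
}
```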
As things stood prior to Java 8, most Iterable instances were reusable, and very few were one-shot. In Java 8, we decided against having the Stream interface implement Iterable, as we expected streams to be quite popular, and we didn’t want to create a situation where there were potentially many one-shot Iterable instances.
Despite the fact that Stream is structurally compatible with Iterable (it has an iterator() method that returns an Iterator), it is not semantically compatible. Indeed, the java.util.stream package specification draws an analogy between Stream and Iterator, not Iterable.
It is, however, a continual source of friction that it’s not possible to iterate a Stream using an enhanced-for loop. There are of course workarounds:
Stream<T> stream = ... ;
for (T t : (Iterable<T>)stream::iterator) {
    ...
}
The cast is necessary because the right-hand side of the enhanced-for statement doesn’t provide a target type from which the method reference’s type can be inferred. This workaround is fairly ugly and entirely unobvious. Creating a local variable to hold the Iterable suffers from similar problems.
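To make the two workarounds concrete, here is a compilable sketch with hypothetical helper names. Note that the local-variable form still produces a one-shot Iterable: calling iterator() on it a second time would throw IllegalStateException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class CastWorkaround {
    // Cast form: the cast gives the method reference a target type.
    static <T> List<T> viaCast(Stream<T> stream) {
        List<T> out = new ArrayList<>();
        for (T t : (Iterable<T>) stream::iterator) {
            out.add(t);
        }
        return out;
    }

    // Local-variable form: the declaration supplies the target type instead.
    static <T> List<T> viaLocal(Stream<T> stream) {
        Iterable<T> iterable = stream::iterator; // still one-shot
        List<T> out = new ArrayList<>();
        for (T t : iterable) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(viaCast(Stream.of("x", "y")));  // prints "[x, y]"
        System.out.println(viaLocal(Stream.of(1, 2, 3)));  // prints "[1, 2, 3]"
    }
}
```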
Another alternative is to use a basic for statement:
for (Iterator<T> it = stream.iterator() ; it.hasNext() ; ) {
    T t = it.next();
    ...
}
This works, but it’s an uncomfortable step backwards, as the enhanced-for statement has largely supplanted the basic for statement except for cases where explicit use of the Iterator is required, for example, to call the remove() method.
Add an interface IterableOnce as a subtype of Iterable, with a single iterator() method that returns an Iterator. Require that this method return an Iterator only the first time it’s called, with the second and subsequent calls resulting in IllegalStateException.
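A sketch of what the proposed interface might look like, together with a hypothetical wrapper implementation. IterableOnce is the interface proposed here and is not part of any released JDK, so this declaration illustrates the contract rather than documenting an existing API; OnceWrapper and IterableOnceDemo are invented for illustration.

```java
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch of the proposed interface (not in any released JDK):
 * an Iterable whose iterator() method may be called at most once.
 */
interface IterableOnce<T> extends Iterable<T> {
    /**
     * Returns an Iterator the first time it is called.
     *
     * @throws IllegalStateException on the second and subsequent calls
     */
    @Override
    Iterator<T> iterator();
}

/** Hypothetical implementation that hands out a wrapped Iterator once. */
final class OnceWrapper<T> implements IterableOnce<T> {
    private final Iterator<T> source;
    private boolean used;

    OnceWrapper(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public Iterator<T> iterator() {
        if (used) {
            throw new IllegalStateException("iterator() already called");
        }
        used = true;
        return source;
    }
}

class IterableOnceDemo {
    public static void main(String[] args) {
        IterableOnce<String> once = new OnceWrapper<>(List.of("a", "b").iterator());
        for (String s : once) {
            System.out.println(s);               // prints "a", then "b"
        }
        try {
            for (String s : once) {
                System.out.println(s);
            }
        } catch (IllegalStateException e) {
            System.out.println("second loop rejected");
        }
    }
}
```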
Retrofit IterableOnce onto Stream, DirectoryStream, and Scanner. No further changes are necessary to Stream and DirectoryStream. A new iterator() method is added to Scanner, which already implements Iterator<String>. This method simply returns this and sets a flag so that every call after the first throws IllegalStateException.
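The Scanner change follows a simple pattern: an object that is already an Iterator returns itself from iterator() once and refuses afterwards. The class below is a hypothetical stand-in, since Scanner itself can’t be modified outside the JDK. It implements plain Iterable<String> where the proposal would use IterableOnce<String>, because IterableOnce is not in released JDKs.

```java
import java.util.Iterator;
import java.util.Scanner;

/**
 * Hypothetical stand-in for the proposed Scanner change: already an
 * Iterator<String>, it gains an iterator() method that returns this
 * once and throws IllegalStateException on later calls.
 */
final class TokenSource implements Iterator<String>, Iterable<String> {
    private final Scanner scanner;
    private boolean iteratorReturned;

    TokenSource(String input) {
        this.scanner = new Scanner(input);
    }

    @Override
    public boolean hasNext() {
        return scanner.hasNext();
    }

    @Override
    public String next() {
        return scanner.next(); // throws NoSuchElementException when exhausted
    }

    @Override
    public Iterator<String> iterator() {
        if (iteratorReturned) {
            throw new IllegalStateException("iterator() already called");
        }
        iteratorReturned = true; // flag checked on subsequent calls
        return this;
    }

    public static void main(String[] args) {
        for (String token : new TokenSource("alpha beta gamma")) {
            System.out.println(token); // prints alpha, beta, gamma on separate lines
        }
    }
}
```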
The proposed changes allow streams to be used seamlessly in enhanced-for loops:
for (T t : stream) {
    ...
}
Adding IterableOnce as a subtype of Iterable is uncomfortable, as it can violate substitutability. In other words, errors can occur if an IterableOnce is passed to code that expects an Iterable and that attempts to iterate it more than once. For example, using AssertJ, the following statement iterates the argument of assertThat(Iterable) twice and thus will throw IllegalStateException:
assertThat(Stream.of("a", "b", "c"))
    .containsOnlyElementsOf(List.of("a", "b", "c"));
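The same hazard can be reproduced without AssertJ. In this sketch, matchesInOrder is a hypothetical helper written against Iterable that makes two passes over its argument. Since Stream does not implement Iterable today, the stream is adapted with the method-reference workaround, which behaves like a one-shot Iterable.

```java
import java.util.List;
import java.util.stream.Stream;

class SubstitutabilityHazard {
    // Hypothetical generic code that reasonably assumes Iterable is reusable:
    // one pass to count, a second pass to compare element by element.
    static <T> boolean matchesInOrder(Iterable<T> actual, List<T> expected) {
        int size = 0;
        for (T t : actual) {      // first pass
            size++;
        }
        if (size != expected.size()) {
            return false;
        }
        int i = 0;
        for (T t : actual) {      // second pass: calls iterator() again
            if (!t.equals(expected.get(i++))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> abc = List.of("a", "b", "c");
        System.out.println(matchesInOrder(abc, abc)); // prints "true"

        Stream<String> stream = Stream.of("a", "b", "c");
        try {
            matchesInOrder((Iterable<String>) stream::iterator, abc);
        } catch (IllegalStateException e) {
            System.out.println("second pass failed: " + e.getMessage());
        }
    }
}
```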
Even though such errors can still occur, it is useful to define IterableOnce as a separate interface. It provides a place to document the one-shot behavior, and implementations can declare that they implement this interface instead of merely noting the behavior in their specification text. Having a separate interface might also allow static analysis tools to detect some cases where an instance of IterableOnce is mistakenly used multiple times.
Placing the one-shot semantics in a new interface leaves room to strengthen the specification of Iterable regarding reuse. Its specification can be changed to recommend reusability more strongly, and to recommend returning a new Iterator instance from each call instead of returning the same one. (This can’t be required, though, since doing so would potentially invalidate existing implementations.) Changes to the specification of Iterable are covered separately under JDK-8186220.
Consider an example that lists the files in a directory, builds a list of the text files, and computes their total size. One way to do this using streams is as follows:
void textFileSize1(Path dir) throws IOException {
    try (Stream<Path> paths = Files.list(dir)) {
        var textFiles = new ArrayList<Path>();
        long totalSize = paths
            .filter(Files::isRegularFile)
            .filter(path -> path.toString().endsWith(".txt"))
            .peek(textFiles::add)
            .mapToLong(path -> {
                try {
                    return Files.size(path);
                } catch (IOException ioe) {
                    throw new UncheckedIOException(ioe);
                }
            })
            .sum();
        System.out.println(textFiles);
        System.out.println("total size = " + totalSize);
    } catch (UncheckedIOException uioe) {
        throw uioe.getCause();
    }
}
This is fairly cumbersome, for a couple of reasons. First, two results are being accumulated: the path list and the total size. Accumulating the list isn’t terribly bad, but it requires the much-reviled peek() operation. We do this because the downstream pipeline step maps each path to its file size, a long, which we then reduce using sum().
Second, finding a file’s size requires an inner try-catch statement to catch the IOException and convert it into an UncheckedIOException. The outer try-with-resources statement is necessary to ensure that the stream of paths is closed, but it also needs a catch clause to convert the UncheckedIOException back into an IOException.
A rewritten version, using a Stream in the enhanced-for statement, is as follows:
void textFileSize2(Path dir) throws IOException {
    try (Stream<Path> paths = Files.list(dir)) {
        var textFiles = new ArrayList<Path>();
        long totalSize = 0L;
        Stream<Path> filtered =
            paths.filter(Files::isRegularFile)
                 .filter(path -> path.toString().endsWith(".txt"));
        for (Path path : filtered) {
            totalSize += Files.size(path);
            textFiles.add(path);
        }
        System.out.println(textFiles);
        System.out.println("total size = " + totalSize);
    }
}
This is essentially the same logic, but it’s much simplified by the removal of most of the exception handling. Also note that the accumulation and exception-throwing operations are separated from the stream pipeline and placed into a trailing for-loop. Using an ordinary loop for these operations makes accumulation into multiple local variables quite convenient. A stream-based alternative would be to use Collectors.teeing() (new in Java 12) or a collector with a custom state object, but that level of complexity is justified only for much more demanding problems.
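For comparison, a sketch of the Collectors.teeing() alternative, assuming Java 16 or later (teeing() needs Java 12; the record needs Java 16). The Result record and the sizeOf helper are hypothetical. The exception translation from the first version is still required, because Files.size is called inside a lambda.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TextFileSizeTeeing {
    // Hypothetical holder for the two accumulated results.
    record Result(List<Path> textFiles, long totalSize) {}

    // Wraps Files.size so it can be used inside a stream lambda.
    private static long sizeOf(Path path) {
        try {
            return Files.size(path);
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    static Result textFileSize(Path dir) throws IOException {
        try (Stream<Path> paths = Files.list(dir)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(path -> path.toString().endsWith(".txt"))
                .collect(Collectors.teeing(
                    Collectors.toList(),                                // collects the paths
                    Collectors.summingLong(TextFileSizeTeeing::sizeOf), // sums the sizes
                    Result::new));                                      // merges both results
        } catch (UncheckedIOException uioe) {
            throw uioe.getCause();
        }
    }

    public static void main(String[] args) throws IOException {
        Result r = textFileSize(Path.of(args.length > 0 ? args[0] : "."));
        System.out.println(r.textFiles());
        System.out.println("total size = " + r.totalSize());
    }
}
```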
(Thanks to Erik Duveblad for the inspiration for this example.)
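Because IterableOnce is not available, textFileSize2 does not compile against current JDKs as written. The following sketch keeps the same structure on today’s JDK by assuming the cast workaround in place of the proposed retrofit, and returns its results instead of printing them; the class name and signature are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

class TextFileSizeToday {
    // Adds each regular .txt file in dir to textFiles and returns their total
    // size. Files.size is called in an ordinary loop, so its IOException
    // propagates directly without any wrapping.
    static long textFileSize(Path dir, List<Path> textFiles) throws IOException {
        try (Stream<Path> paths = Files.list(dir)) {
            long totalSize = 0L;
            Stream<Path> filtered =
                paths.filter(Files::isRegularFile)
                     .filter(path -> path.toString().endsWith(".txt"));
            for (Path path : (Iterable<Path>) filtered::iterator) {
                totalSize += Files.size(path);
                textFiles.add(path);
            }
            return totalSize;
        }
    }

    public static void main(String[] args) throws IOException {
        List<Path> textFiles = new ArrayList<>();
        long total = textFileSize(Path.of(args.length > 0 ? args[0] : "."), textFiles);
        System.out.println(textFiles);
        System.out.println("total size = " + total);
    }
}
```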
One alternative considered was to introduce IterableOnce as a supertype of Iterable instead of as a subtype. This makes some sense from an object design standpoint, as any Iterable can be iterated once and thus would appear to be an IterableOnce. This would prevent errors such as the AssertJ example above.
However, doing so would require weakening the contract of IterableOnce. To allow Iterable to be a subtype, the semantics of IterableOnce would need to change to allow iteration once and possibly more, instead of the currently proposed at-most-once semantics. Having IterableOnce be a subtype allows the specification to make a much stronger assertion.
Moreover, inserting IterableOnce as a supertype would require changing the language to accept an IterableOnce on the right-hand side of the enhanced-for statement. The language and the libraries are at “arm’s length” from each other, and the places where the language depends on the libraries are narrow and well-controlled. Changing this relationship should be done only where there is an overwhelming advantage in doing so compared to the alternatives. That’s not the case here; the proposal can be quite useful and successful as it stands, without changing the language.
Another alternative is to add a utility method that adapts an Iterator to an Iterable, e.g.,
<T> Iterable<T> asIterable(Iterator<? extends T> iterator)
This would allow one to write
for (T t : asIterable(stream.iterator())) {
    ...
}
While this is cleaner than the cast we have to live with today, it’s not as nice as using a stream directly. Worse, it introduces an API that can be used to convert any Iterator into a poorly behaved (i.e., one-shot) Iterable.
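A sketch of such an adapter, to show why it is hazardous. The name asIterable comes from the signature above, but this implementation is an assumption. The anonymous wrapper is needed because an Iterator<? extends T> is not an Iterator<T>; every call to iterator() wraps the same underlying Iterator, so the result is one-shot.

```java
import java.util.Iterator;
import java.util.stream.Stream;

class IteratorAdapter {
    // Hypothetical adapter. Each call to iterator() on the result wraps the
    // SAME underlying Iterator, so once it is exhausted, later loops
    // silently see no elements.
    static <T> Iterable<T> asIterable(Iterator<? extends T> iterator) {
        return () -> new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return iterator.hasNext();
            }

            @Override
            public T next() {
                return iterator.next();
            }
        };
    }

    public static void main(String[] args) {
        Iterable<String> letters = asIterable(Stream.of("a", "b").iterator());
        for (String s : letters) {
            System.out.println(s); // prints "a", then "b"
        }
        for (String s : letters) {
            System.out.println(s); // never reached: the iterator is exhausted
        }
    }
}
```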
An alternative adapter is to add a default method to Stream:
default Iterable<T> asIterable() { return this::iterator; }
for (T t : stream.asIterable()) {
    ...
}
This is slightly better at the call site, and it’s restricted to streams. But it’s still not as nice as IterableOnce, and it facilitates creation of poorly behaved Iterable instances.
A variation that was considered and rejected was to retrofit BaseStream instead of Stream. This introduces a method clash in IntStream between these methods:
forEach(Consumer<? super T>) // inherited from Iterable via BaseStream
forEach(IntConsumer) // from IntStream
as well as analogous clashes in LongStream and DoubleStream.
Another variation is to retrofit IntStream to implement IterableOnce<Integer>, and similarly LongStream and DoubleStream to implement IterableOnce<Long> and IterableOnce<Double>. This could be done, but it would result in implicit boxing of every primitive value whenever such a stream was used in an enhanced-for loop. This is quite a severe penalty. It’s preferable to have the caller make boxing explicit through use of the boxed() method. An alternative is for the caller to create an array with toArray() and use the array form of the enhanced-for statement. This avoids boxing, but it loses laziness. At least the caller has a choice of tradeoffs here.
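Both caller-side options can be sketched as follows, with hypothetical helper names. The boxed() form relies on today’s cast workaround, because Stream does not implement Iterable; the toArray() form uses the array form of the enhanced-for statement.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

class PrimitiveLoops {
    // boxed(): stays lazy, but every int is boxed to an Integer.
    static long sumBoxed(int n) {
        Stream<Integer> boxed = IntStream.range(0, n).boxed();
        long sum = 0;
        for (Integer i : (Iterable<Integer>) boxed::iterator) {
            sum += i; // unboxed again here
        }
        return sum;
    }

    // toArray(): no boxing, but the whole range is materialized up front.
    static long sumArray(int n) {
        long sum = 0;
        for (int i : IntStream.range(0, n).toArray()) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(5)); // prints "10"
        System.out.println(sumArray(5)); // prints "10"
    }
}
```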