Project Jigsaw: The Big Picture — DRAFT 1

Mark Reinhold
2011/12/20

This document is an overview of the current state of Project Jigsaw, an exploratory effort to design and implement a module system for the Java SE Platform and to apply that system to the Platform itself and to the JDK.

This document is not yet complete. Additional sections covering compilation, packaging, libraries, repositories, the module-system API, and the modularization of the JDK are in preparation.

Every feature mentioned here has been implemented in the main Jigsaw repository unless otherwise noted.

Comments to: jigsaw dash dev at openjdk dot java dot net

Design principles

The Jigsaw module system is designed to be both approachable and scalable: Approachable by all developers, yet sufficiently scalable to support the modularization of large legacy software systems in general and the JDK in particular. It aims to implement a set of general requirements; its detailed design has been further guided by the following principles:

Definitions

Modules

A module is a collection of Java types (i.e., classes and interfaces) with a name, an optional version number, and a formal description of its relationships to other modules. In addition to Java types a module can include resource files, configuration files, native libraries, and native commands. A module can be cryptographically signed so that its authenticity can be validated.

The most important type of inter-module relationship is that of dependence, in which one module declares that it depends upon some other module by specifying that module’s name and possibly also a constraint upon the range of allowable versions.

Resolution

A module dependence is not necessarily precise: Multiple modules with the same name but different version numbers might be available to satisfy it. Before a module can be used each of its dependences must be resolved to a specific module. Given an initial set of modules, resolution is the process of locating additional modules, as required, and constructing a superset of that set in which every dependence is optimally satisfied.

Phases

There are three principal phases in the lifetime of a module: compile time, install time, and run time.

The phase in which resolution is performed determines how dependences are satisfied. At compile time the oldest available version of a module satisfying a dependence is preferred, while in later phases the newest version is preferred.

TODO: This compile-time preference for older versions is not yet implemented.
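The phase-dependent version preference can be sketched in Java as follows. This is a minimal illustration, not the Jigsaw resolver: the class PhasePreference, its Phase enum, and the use of plain integers as version numbers are assumptions made for brevity, and the candidates are assumed to have already satisfied any version constraint.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical model: versions are plain integers so that ordering is unambiguous.
final class PhasePreference {
    enum Phase { COMPILE_TIME, INSTALL_TIME, RUN_TIME }

    // Picks the preferred version among candidates that already satisfy a dependence:
    // the oldest at compile time, the newest in every later phase.
    static int prefer(Phase phase, List<Integer> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidate versions");
        }
        return phase == Phase.COMPILE_TIME
                ? Collections.min(candidates)
                : Collections.max(candidates);
    }
}
```

Given candidate versions 3, 1, and 2, prefer returns 1 for COMPILE_TIME and 3 for INSTALL_TIME or RUN_TIME.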

When compiling a module, the Java compiler writes class files into a module-structured classes directory. In this module-path layout there is one top-level directory for each module; the content of each module directory is structured as a normal classes directory, i.e., a tree of decomposed Java package names. In order to support interactive development, the Java launcher can run a modular application directly from a module-path directory. When doing so it performs resolution before invoking the application’s entry point, although the resulting configuration is not stored for future use.
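The module-path layout described above maps a module name and a class name to a class-file location. The following sketch shows that mapping; the class ModulePathLayout and its method are hypothetical names, and the placement of module-info.class is deliberately not modeled since the text does not specify it.

```java
import java.nio.file.Path;

// Hypothetical helper illustrating the layout <root>/<module>/<package dirs>/<Class>.class
final class ModulePathLayout {
    // root:      the module-structured classes directory
    // module:    the module name, used as the top-level directory name
    // className: a fully qualified class name such as "foo.Main"
    static Path classFile(Path root, String module, String className) {
        String relative = className.replace('.', '/') + ".class";
        return root.resolve(module).resolve(relative);
    }
}
```

For a root directory named classes, the class foo.Main of module foo would be located at classes/foo/foo/Main.class.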

The module system does not support general dynamic run-time resolution; i.e., it is not possible to add or remove dependences or modules after an application has started running. Sophisticated container-type programs such as application servers, IDEs, and test harnesses can achieve the effect of run-time resolution in a limited way by using the module-system API to install modules into a temporary module library and then run them from that library.

TODO: Finish implementing run-time module-path support.

TODO: Design and implement container support.

Module declarations

The Java programming language is extended to include module declarations for the purpose of defining modules, their content, and their relationships to other modules. A compilation unit containing a module declaration is, by convention, stored in a file named module-info.java and compiled into a file named module-info.class.

The simplest possible module declaration merely expresses the existence of a module with a specific name:

module foo { }

If a module has a version number then that is placed after the module name, preceded by an @ character:

module foo @ 1.0 { }

A version number starts with a digit and thereafter consists of Java identifier-part characters, periods ('.'), and dashes ('-').
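The version-number syntax rule can be checked with a few lines of Java. This is a sketch under two assumptions: that "digit" means an ASCII digit, and that "Java identifier-part characters" are those accepted by Character.isJavaIdentifierPart. The class name VersionSyntax is hypothetical.

```java
// Hypothetical validator for the version-number syntax described above.
final class VersionSyntax {
    static boolean isValid(String version) {
        if (version.isEmpty()) return false;
        char first = version.charAt(0);
        if (first < '0' || first > '9') return false;   // assumption: ASCII digits only
        for (int i = 1; i < version.length(); i++) {
            char c = version.charAt(i);
            // Remaining characters: identifier parts, periods, and dashes.
            if (!Character.isJavaIdentifierPart(c) && c != '.' && c != '-') return false;
        }
        return true;
    }
}
```

Under these assumptions 1.0, 5.1a, and 1.0-rc_2 are accepted, while v1 (no leading digit) and "1.0 beta" (contains a space) are rejected.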

Module names are qualified Java identifiers, just like Java package names:

module foo.bar { }

Module declarations cannot be annotated.

Source and class files for ordinary Java types do not specify the modules of which they are members.

Exports

An exports clause in a module declaration makes the public types in the package it names available to other modules:

module foo {
    exports foo;    // Export all public types in the foo package
}

Here the foo module exports all of the public types in its foo package, though not in any subpackages of foo. No other public types declared in the foo module are exported. There is no requirement that a module declare and export a package of the same name, though that is conventional for simple modules. It is not possible to export non-public types.
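The rule that an exports clause covers a package but not its subpackages amounts to an exact match on the package name. The sketch below models only that rule; ExportCheck is a hypothetical name, and the type passed in is assumed to be public.

```java
import java.util.Set;

// Hypothetical model of the package-level export rule.
final class ExportCheck {
    // Returns the package part of a fully qualified type name ("" for the unnamed package).
    static String packageOf(String typeName) {
        int dot = typeName.lastIndexOf('.');
        return dot < 0 ? "" : typeName.substring(0, dot);
    }

    // A public type is exported only if its own package is named in an exports clause;
    // exporting "foo" does not export "foo.util".
    static boolean isExported(Set<String> exportedPackages, String publicTypeName) {
        return exportedPackages.contains(packageOf(publicTypeName));
    }
}
```

With the single clause exports foo, the type foo.Widget is exported while foo.util.Helper is not (both type names are illustrative).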

Multiple exports clauses are, of course, allowed:

module foo {
    exports foo;
    exports foo.spi;
    exports foo.util;
}

A module’s exports declarations govern the accessibility of the public types declared in the named packages. That accessibility is thus enforced at both compile time, by the Java compiler, and at run time, by the virtual machine.

TODO: Finish initial implementation.

ISSUE: Use package names that differ from the module names in these examples, to improve readability?

Dependences

The requires clause expresses the dependence of one module upon another:

module foo {
    exports foo;
}

module bar {
    requires foo;
}

Here the bar module depends upon the foo module, so at both compile time and run time the exported types declared in foo are both visible to and accessible by types declared in bar. If no foo module is available then bar cannot be compiled, nor can it be installed or invoked.

At run time foo and bar will have distinct module class loaders, and bar’s loader will use foo’s loader to load the types exported by foo.

An exports clause in a module’s declaration only affects the availability of types declared in that module; it cannot be used to re-export types imported from other modules.

ISSUE: Do we need disjunctive dependences? Negative dependences?

Re-exports

When bar simply requires foo then the exported types in foo are available to bar but not to other modules that depend upon bar and not upon foo. Imported types can be re-exported via the public modifier of the requires clause:

module foo {
    exports foo;
}

module bar {
    requires public foo;    // Re-exports foo's exported types
}

module baz {
    requires bar;           // Can use foo's exported types
}

The public modifier makes the types imported into bar from foo available to any other module that depends directly upon bar.

Types can be re-exported through a chain of modules:

module foo {
    exports foo;
}

module bar {
    requires public foo;
}

module baz {
    requires public bar;
}

module buz {
    requires baz;         // Can also use foo's exported types
}

In this case any other module that depends upon either bar or baz will be able to use public types exported by foo without depending upon foo itself.
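Re-export chains can be understood as a graph walk: a module can use the exported types of every module it requires directly, plus everything reachable from those modules through requires public edges. The following sketch implements that walk; the ReExports and Requires classes are hypothetical, and versions and permits are ignored.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReExports {
    // One requires clause: the required module's name and whether it is "requires public".
    static final class Requires {
        final String target;
        final boolean isPublic;
        Requires(String target, boolean isPublic) {
            this.target = target;
            this.isPublic = isPublic;
        }
    }

    // Names of the modules whose exported types are usable from the given module.
    static Set<String> usableFrom(String module, Map<String, List<Requires>> graph) {
        Set<String> usable = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        // Every direct dependence counts, public or not.
        for (Requires r : graph.getOrDefault(module, Collections.<Requires>emptyList())) {
            work.push(r.target);
        }
        while (!work.isEmpty()) {
            String m = work.pop();
            if (!usable.add(m)) continue;          // already visited
            // Beyond the first hop, only "requires public" edges are followed.
            for (Requires r : graph.getOrDefault(m, Collections.<Requires>emptyList())) {
                if (r.isPublic) work.push(r.target);
            }
        }
        return usable;
    }
}
```

For the four declarations above, usableFrom("buz", graph) yields baz, bar, and foo. If bar instead declared a plain requires foo, then baz would be able to use only bar.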

Version constraints

A requires clause can include a version constraint:

module bar {
    requires foo @ 1.0;
}

This dependence of bar upon foo can be satisfied only by a foo module whose version is exactly 1.0. More-flexible constraints are useful in practice, so a constraint can be specified in terms of an exclusive or inclusive lower or upper bound:

module bar {
    requires foo @ >= 1.0;
    requires baz @ < 5.1a;
}

These dependences can be satisfied by any foo module with version 1.0 or later and any baz module with a version earlier than 5.1a.

No specific semantics are imposed upon version numbers. Version numbers are compared using an algorithm similar to that of the Debian packaging system.
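The following sketch shows how a version constraint might be evaluated. It is not the Jigsaw comparison algorithm, which the text only describes as Debian-like: here a version is split into alternating runs of digits and non-digits, digit runs are compared numerically, other runs are compared as strings, and a version that is a proper prefix of another sorts first. The class VersionConstraint and the "=" operator spelling for an exact match are assumptions made for illustration.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Simplified, assumed comparison; a stand-in for the real algorithm.
final class VersionConstraint {
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Splits "5.1a" into runs of digits and non-digits: ["5", ".", "1", "a"].
    static List<String> tokens(String v) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= v.length(); i++) {
            if (i == v.length() || isDigit(v.charAt(i)) != isDigit(v.charAt(start))) {
                out.add(v.substring(start, i));
                start = i;
            }
        }
        return out;
    }

    static int compare(String a, String b) {
        List<String> x = tokens(a), y = tokens(b);
        for (int i = 0; i < Math.min(x.size(), y.size()); i++) {
            String s = x.get(i), t = y.get(i);
            int c = (isDigit(s.charAt(0)) && isDigit(t.charAt(0)))
                    ? new BigInteger(s).compareTo(new BigInteger(t))   // numeric runs
                    : s.compareTo(t);                                  // everything else
            if (c != 0) return c;
        }
        return Integer.compare(x.size(), y.size());   // shorter version sorts first
    }

    // op is one of "=", "<", "<=", ">", ">=".
    static boolean satisfies(String version, String op, String bound) {
        int c = compare(version, bound);
        switch (op) {
            case "=":  return c == 0;
            case "<":  return c < 0;
            case "<=": return c <= 0;
            case ">":  return c > 0;
            case ">=": return c >= 0;
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }
}
```

Under this simplified ordering, satisfies("1.0", ">=", "1.0") and satisfies("5.1", "<", "5.1a") are true, while satisfies("5.1a", "<", "5.1a") is false because the bound is exclusive.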

TODO: Support both lower and upper bounds in version constraints.

TODO: Re-examine the version-comparison algorithm.

Permits

In large software systems it is often useful to restrict the set of modules that can depend upon some other module. The permits clause expresses such a constraint:

module foo {
    exports foo;
    permits bar;
    permits baz;
}

Here the module foo can be required only by modules named bar or baz. A dependence from a module of some other name upon foo will not be resolvable at compile time, install time, or run time. If no permits clauses are present then there are no such constraints.

The bar and baz modules can re-export foo’s exported types via requires public clauses. Care must be taken, therefore, when writing permits clauses.
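The permits rule itself reduces to a simple membership test, sketched below. PermitsCheck is a hypothetical name, and the check is by module name alone, exactly as the text describes; it therefore shares the weakness noted in the TODO that follows.

```java
import java.util.Set;

// Hypothetical model of the permits rule.
final class PermitsCheck {
    // A dependence is resolvable if the required module has no permits clauses,
    // or if the requiring module's name appears in one of them.
    static boolean mayRequire(String requiringModule, Set<String> permits) {
        return permits.isEmpty() || permits.contains(requiringModule);
    }
}
```

With permits bar and permits baz on foo, a dependence from bar upon foo resolves while one from a module named qux does not.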

TODO: Controlling permits by module name alone is not sufficient, since an adversary can install a module of any given name. At the same time, for debugging the JDK itself it’s desirable to be able to install an experimental version of a JDK module into a local module library which delegates to the built-in module library of a pre-installed JDK, so simply limiting permitted modules to just those in the same module library won’t work in general. We need to explore more alternatives here.

ISSUE: Should it be possible to restrict a permitted module from re-exporting a permitting module’s exported types?

Aliases

To support the refactoring of large modular systems, and also to allow the separation of module names corresponding to well-defined standards (e.g., java.base) from the names of modules implementing those standards (e.g., jdk.base), the provides clause declares an alternate name for a module:

module foo {
    provides bar;
}

Given this declaration, any dependence upon bar can be satisfied by foo. More than one provides clause can be present.

TODO: Implement aliases.

ISSUE: Should aliases have version numbers? The syntax currently allows them. They appear to be necessary to support refactoring by aggregation. In popular native packaging systems, however, the natural mapping of a module alias is to a virtual package, and virtual packages don’t have version numbers.

Entry points

If a module declares a class with a traditional public static void main entry point then it can be made into an application module via the class clause:

module foo {
    class foo.Main;     // Contains the main method
}

The java launcher can then be used to invoke the module:

$ java -m foo

in which case the foo.Main.main method is found and invoked in the usual fashion. Any remaining command-line arguments are passed to the main method as usual.
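Finding and invoking a main method "in the usual fashion" can be illustrated with standard reflection. This sketch is not the launcher's implementation; EntryPointLauncher and its nested Demo class are hypothetical, and the class loader argument stands in for whatever loader the module system would supply.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class EntryPointLauncher {
    // Loads the named class, locates public static void main(String[]), and invokes it.
    static void launch(ClassLoader loader, String mainClass, String... args) {
        try {
            Class<?> c = Class.forName(mainClass, true, loader);
            Method main = c.getMethod("main", String[].class);
            if (!Modifier.isStatic(main.getModifiers())) {
                throw new IllegalStateException("main is not static in " + mainClass);
            }
            // The cast passes the array as the single String[] parameter.
            main.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot invoke " + mainClass, e);
        }
    }

    // Stand-in for a module's entry-point class such as foo.Main.
    public static final class Demo {
        static String[] seen;
        public static void main(String[] args) { seen = args; }
    }
}
```

Calling launch with the Demo class name and the arguments a and b leaves those two strings in Demo.seen, mirroring how remaining command-line arguments reach the main method.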

A module declaration can contain at most one class clause.

ISSUE: Should there be a way to suggest, if not specify, an external name for the entry point for use by external agents such as command shells?

ISSUE: Should entry points be expressed instead as services?

Optional dependences

A dependence from one module to another can be declared optional:

module bar {
    requires optional foo;
}

If no foo module is available then bar can still be installed and invoked. Code in bar that uses types from foo must be written defensively so that it operates properly when foo is not available.
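One common way to write such defensive code in Java is to probe for a type before using it. The sketch below assumes that a missing optional module shows up as a ClassNotFoundException or a LinkageError, which the text does not specify; OptionalDependence is a hypothetical name.

```java
// Hypothetical probe that code in bar could use before touching types from foo.
final class OptionalDependence {
    // Returns true if the named type can be located, without initializing it.
    static boolean isAvailable(String typeName) {
        try {
            Class.forName(typeName, false, OptionalDependence.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;   // the optional module is assumed to be absent
        }
    }
}
```

Code in bar could call isAvailable once with the name of a type from foo, cache the result, and take a fallback path whenever it is false.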

A foo module must still be available when compiling bar since code in bar can depend upon types declared in foo.

Local dependences

A dependence from one module to another can be declared local:

module bar {
    requires local foo;
}

To resolve this dependence, foo must explicitly permit bar.

A local dependence allows two modules to define types in the same Java package:

module foo {
    permits bar;
    exports p;
}

module bar {
    requires local foo;
    exports p;
}

Such multi-module packages, also called split packages, are sometimes required when modularizing large legacy systems.

With a local dependence, types declared in the same package in each module can make use of public, protected, and even package-private types and members declared in the same package in the other module. The public types exported by each module are implicitly re-exported by the other. At run time this is all achieved by using the same module class loader for both modules.

More than two modules can be related by local dependences:

module foo {
    permits bar;
    exports p;
}

module bar {
    requires local foo;
    permits baz;
    exports p;
}

module baz {
    requires local bar;
    exports p;
}

In this case all three modules would, at run time, be loaded by the same module class loader.
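Deciding which modules share a class loader is a connected-components problem over the local-dependence edges. The sketch below uses a small union-find structure; LocalGroups is a hypothetical name, and only local dependences are considered.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class LocalGroups {
    // modules:    all module names under consideration
    // localEdges: pairs {from, to}, one per "requires local" clause
    // Returns one set per group of modules that would share a module class loader.
    static Collection<Set<String>> loaderGroups(List<String> modules, List<String[]> localEdges) {
        Map<String, String> parent = new HashMap<>();
        for (String m : modules) parent.put(m, m);
        for (String[] e : localEdges) {
            // Merge the two groups by pointing one root at the other.
            parent.put(find(parent, e[0]), find(parent, e[1]));
        }
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String m : modules) {
            groups.computeIfAbsent(find(parent, m), k -> new TreeSet<>()).add(m);
        }
        return groups.values();
    }

    // Follows parent links until reaching a root (a module that is its own parent).
    private static String find(Map<String, String> parent, String m) {
        parent.putIfAbsent(m, m);
        while (!parent.get(m).equals(m)) m = parent.get(m);
        return m;
    }
}
```

For the three declarations above plus an unrelated module qux, the result is two groups: one containing foo, bar, and baz, and one containing only qux.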

ISSUE: Should requires local public be illegal?

ISSUE: Should each module in a set of modules related by local dependence be required explicitly to permit all the other modules? That is not the case today, but it is arguably safer.

Views

The bindings of a module are the types defined within it together with those imported from other modules via requires clauses. The view of a module is a subset of its bindings, namely the set of types that it exports, via exports and requires public clauses, and the set of modules to which those types are available, as constrained by any permits clauses.

module bar {
    requires foo;
    exports bar;
}

This bar module binds types defined locally, e.g., on the module path under the bar module directory, as well as all public types exported from the module foo. It defines a single view which exports all public types in the bar package to any other module.

In large software systems it is often useful to define multiple views of the same module. One view can, e.g., be declared for general use by any other module, while another provides access to internal interfaces intended only for use by a select set of closely-related modules.

A series of exports, requires public, and permits clauses at the top syntactic level of a module declaration defines the module’s default view. Further views of a module’s bindings can be defined using the view construct, which specifies a view name together with a brace-enclosed list of exports and permits declarations:

module bar {
    requires foo;
    exports bar;
    view bar.internal {
        permits baz;
        exports bar.private;
    }
}

The bar module now defines two views. The default view, available by referencing the module name bar, is the same as before—it’s as if the declaration also said view bar { exports bar; }. The new view, named bar.internal, is available only to the baz module. It exports all public types in the bar.private package. It also exports all public types in the bar package because the non-default views of a module inherit the exports clauses of that module’s default view.

A non-default view never has requires clauses.

A non-default view cannot declare its version; it inherits the version, if any, of its containing module.

A non-default view does not inherit the permits clauses, if any, of its containing module.
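The inheritance rules for views (exports are inherited from the default view, permits are not) can be captured in a small model. The ViewModel and View classes are hypothetical, and requires public re-exports, aliases, and services are left out.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

final class ViewModel {
    static final class View {
        final Set<String> exports;   // packages named in this view's own exports clauses
        final Set<String> permits;   // empty means the view is available to any module
        View(Set<String> exports, Set<String> permits) {
            this.exports = exports;
            this.permits = permits;
        }
    }

    // Packages whose public types the requiring module sees through the given view.
    static Set<String> packagesVisibleTo(String requirer, View defaultView, View view) {
        // Only the view's own permits apply; the default view's permits are not inherited.
        if (!view.permits.isEmpty() && !view.permits.contains(requirer)) {
            return Collections.emptySet();
        }
        Set<String> visible = new TreeSet<>(defaultView.exports);   // inherited exports
        visible.addAll(view.exports);
        return visible;
    }
}
```

For the bar module above, baz sees both bar and bar.private through the bar.internal view, any other module sees nothing through that view, and every module sees bar through the default view.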

In addition to declaring exports and permits, a non-default view can also declare aliases and services.

A non-default view can, finally, also declare an entry point different from that of its containing module’s default view, so a single module can define multiple related entry points. For example, the declaration

module commands {
    view cat {
        class org.foo.commands.Cat;
    }
    view find {
        class org.foo.commands.Find;
    }
    view ls {
        class org.foo.commands.List;
    }
}

defines three entry points: cat, find, and ls.

HISTORICAL NOTE: Module views are not a new idea. The concept proposed here is very similar to that of structures in the module systems of Scheme 48 and Standard ML.

TODO: Finish initial implementation.

ISSUE: Should a non-default view instead not inherit the types exported by the default view of its containing module declaration? If so, should there be a way to declare explicitly that a view inherits the exported types of the default view, or perhaps some other view?

ISSUE: How do views map to native packaging systems such as RPM or Debian? Treating a module view as a virtual package would probably work but might not scale well. Another possibility is to structure the names of non-default views so that they always include the names of their containing modules, but that turns views into second-class entities.

The base module

The module system assumes the existence of a foundational module named java.base, which is the one module that must be present in every Java SE implementation. It is the module upon which all others depend, either implicitly or explicitly, somewhat akin to the implicit reference to the java.lang package by every compilation unit.

If a module does not declare an explicit dependence upon a java.base module, is not itself named java.base, and does not define an alias or view named java.base, then at compile time a synthesized dependence upon java.base is inserted into the compiled module declaration. The version constraint in this dependence is of the form >= N, where N is the version number given to the -target option of the Java compiler, if any, or else the version number of the Java SE Platform Specification implemented by the system of which the compiler is a part.
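The rule for synthesizing the java.base dependence can be written out as two small functions. This is a sketch of the rule as stated above, not compiler code; BaseDependence is a hypothetical name, and a null targetOption models the absence of a -target option.

```java
import java.util.Set;

final class BaseDependence {
    // True if the compiler would insert a synthesized dependence upon java.base.
    static boolean needsSynthesizedBase(String moduleName,
                                        Set<String> requiredNames,
                                        Set<String> aliasAndViewNames) {
        return !moduleName.equals("java.base")
                && !requiredNames.contains("java.base")
                && !aliasAndViewNames.contains("java.base");
    }

    // Builds the synthesized clause; N is the -target version if given,
    // otherwise the platform version implemented by the compiler's system.
    static String synthesizedClause(String targetOption, String platformVersion) {
        String n = (targetOption != null) ? targetOption : platformVersion;
        return "requires java.base @ >= " + n + ";";
    }
}
```

With no -target option and an illustrative platform version of 8, the synthesized clause reads requires java.base @ >= 8; and with -target 7 it reads requires java.base @ >= 7;.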

Services

A module can declare that it provides a service:

module foo {
    provides service mammals.Wombat with foo.WombatImpl;
}

Here the foo module declares that it implements the mammals.Wombat service using the class foo.WombatImpl.

To make use of a service, a module must first declare a dependence upon it:

module bar {
    requires service mammals.Wombat;
}

Code in the bar module can use an enhanced version of the ServiceLoader API to access instances of the Wombat service. The order in which instances are returned is not specified.
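Client code might look roughly like the following. The sketch uses the existing java.util.ServiceLoader API as a stand-in for the enhanced version mentioned above, whose details are not given here; WombatClient and the nested Wombat interface (standing in for mammals.Wombat) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class WombatClient {
    // Stand-in for the mammals.Wombat service interface.
    public interface Wombat {
        String name();
    }

    // Collects every available provider; the order of the result must not be relied upon.
    static List<Wombat> providers() {
        List<Wombat> found = new ArrayList<>();
        for (Wombat w : ServiceLoader.load(Wombat.class)) {
            found.add(w);
        }
        return found;
    }
}
```

When no provider is registered the list is simply empty, which is the situation that code with an optional service dependence has to handle.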

A module can declare a service dependence to be optional, in which case it is possible to use the module even when no provider of the service is available. As with optional module dependences, code in such modules must be written defensively so that it operates properly when no providers are present.

Services are not themselves versioned. A service is defined by a specific interface or abstract class, hence it is implicitly versioned by the version of the module that declares that type.

If a module defining a service also exports some types then those types are available only to modules that have regular module dependences upon it, either directly or indirectly. Classes that implement services are not exported implicitly, nor do they need to be exported explicitly. A class that implements a service can therefore remain both invisible and inaccessible to the clients of that service.

TODO: Finish working out the design and implementation.

ISSUE: Should permits clauses affect service lookup?

Still to come