jextract, distilled

November 2019: (v. 0.4)

Maurizio Cimadamore

The idea behind the jextract tool is to mechanically generate a set of interfaces from a C header file. Together, these interfaces can be thought of as Java bindings for the native library whose headers have been processed by jextract. These interfaces are annotated with metadata, which then allows a runtime component, called the binder, to synthesize implementations on the fly.

In this document we will discuss some of the motivations which led Project Panama to consider investing in a tool such as jextract, who this tool is for, as well as some of the limitations associated with it; we will then show how to overcome these limitations, first by illustrating how lower-level bindings might better cater to the varied needs of the Java ecosystem, and then by showing how a minimal extraction API could allow clients to inject the desired degree of customization right into the extraction process.

Why jextract?

The main avenue to access native libraries today is through Java Native Interface (JNI). JNI allows developers to access native libraries, by declaring a native method in their Java source code:

class HelloJNI {
    static native int getpid();
}

When javac compiles code containing native methods, it will automatically generate (assuming the -h option is provided) native stubs for such methods - in the case of the method above we will see something like:

//HelloJNI.h
JNIEXPORT jint JNICALL Java_HelloJNI_getpid
  (JNIEnv *, jclass);

The developer would then have to implement such a header, as follows:

#include "HelloJNI.h"
#include <unistd.h>

JNIEXPORT jint JNICALL Java_HelloJNI_getpid
  (JNIEnv *env, jclass cls) {
   return getpid();
}

Then, the header and its implementation have to be compiled into a shared library, and such library must be loaded (e.g. using System::loadLibrary) to make sure that the call to the native method HelloJNI::getpid is correctly linked.

All these steps are tedious enough when working on a single function, as in this example. But it's easy to see how this approach simply can't scale to real world libraries containing hundreds of different functions; for each of those, the developer would have to define a Java native method, and its corresponding native implementation - which, as in the example above, will simply delegate to the library function that the developer wanted to access in the first place.

To make matters even worse, JNI only allows modelling native functions whose input parameters and return types are primitive types (such as int). If a native function accepts a pointer or, even worse, a struct, some workarounds will have to be put in place for the access to take place. A commonly used workaround is to use long to encode pointers and (direct) ByteBuffer to encode the contents of a struct to be passed by value. The latter case is particularly worrisome: a client that wants to pass a struct to a native function using JNI would have to (i) marshal the struct data into a direct, off-heap buffer, (ii) pass the buffer on to the native method, then, on the native side, (iii) unmarshal the buffer data so that the desired struct is re-created and finally passed to the native function.

This is a lot of work. And a lot of mechanical work too. There's nothing really clever about these JNI wrappers, and a typical JNI-based library port will have lots of them - which almost invariably results in a maintenance nightmare (e.g. the steps above would have to be repeated every time the underlying native library needs to be updated).
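To give a feel for the marshalling workaround described above, here is a minimal sketch of step (i) on the Java side. It assumes a two-int struct such as `struct Point { int x; int y; }`, with 4-byte ints and no padding between fields; the class name, method names and offsets are hypothetical and would have to be kept in sync with the native definition by hand.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: hand-written marshalling for `struct Point { int x; int y; }`.
// Assumption: 4-byte ints and no padding between the two fields.
class PointMarshaller {
    static final int X_OFFSET = 0;
    static final int Y_OFFSET = 4;
    static final int SIZE = 8;

    // Step (i): copy the field values into a direct (off-heap) buffer,
    // using the platform byte order so that native code can read it as a struct.
    static ByteBuffer marshal(int x, int y) {
        ByteBuffer buf = ByteBuffer.allocateDirect(SIZE).order(ByteOrder.nativeOrder());
        buf.putInt(X_OFFSET, x);
        buf.putInt(Y_OFFSET, y);
        return buf;
    }

    // The reverse direction, e.g. for a struct filled in by a native function.
    // The buffer is expected to use the platform byte order, as set by marshal.
    static int[] unmarshal(ByteBuffer buf) {
        return new int[] { buf.getInt(X_OFFSET), buf.getInt(Y_OFFSET) };
    }

    public static void main(String[] args) {
        ByteBuffer p = marshal(1, 2);
        int[] fields = unmarshal(p);
        System.out.println(fields[0] + ", " + fields[1]);
    }
}
```

Steps (ii) and (iii) would still need a hand-written native stub, which obtains the buffer address (e.g. through the JNI function GetDirectBufferAddress) and reinterprets it as a struct before calling the library function.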

The goal of jextract (and, in fairness, that of other tools, such as JavaCPP and JNAerator, to name a few) is precisely to automate away the tedious task of having to write JNI bindings by hand. One thing that is worth pointing out here is that, to be able to mechanically generate bindings, all such tools (including jextract) will have to make choices about how to represent non-primitive data types; this will be a recurring theme throughout the remainder of this document.

Who is the audience?

It is important, when reasoning about automated tools such as jextract, to stop and think about what audience such a tool might target - as this has a direct influence on how the tool might decide to translate a native library into a set of Java bindings. In the following sections we will characterize the different kinds of audiences who might want to use jextract.

Library consumers

Library consumers are, perhaps, the first audience that comes to mind; here, we find users that are interested in using one or more native libraries from their Java application. There can be cases where the library they want to use is already available (e.g. somebody else has made the effort of creating the Java bindings for it, either through JNI or some other framework). In other, more unfortunate cases, the library they want to access has never been ported to Java, which means these users will need a way to get started, either using jextract or some other framework/tool; these users are probably not going to be very picky about what comes out of the binding generator - they care about just being able to use the library they want from Java (with as little hassle as possible).

Library producers

Library producers, on the other hand, are the very developers who create the Java bindings used by other developers (e.g. library consumers) to access native libraries from Java. Again, we have two sub-categories here. In one camp, we have relatively basic Java bindings, which are often, at least in part, mechanically generated using a variety of tools. Examples of these bindings are JCuda and LWJGL. Other library developers choose a very different approach, hiding the gory details of the underlying native library and opting instead for a higher-level binding which is better integrated with the Java platform as a whole. Examples of these bindings are TensorFlow and onnxruntime.

The two broad categories sketched above also have wildly different sets of requirements. Producers of C-oriented bindings might want to offload generation of such bindings to a tool as much as possible, given that they are after a low-level, C-looking API. This API will likely not be very advanced in terms of the Java features it relies upon. That said, developers of C-oriented bindings might still want to tweak the output of binding generators, to make sure it conforms to some of the domain classes that the API might introduce (more on this later). Conversely, producers of Java-friendly APIs attempt to hide the complexity of the underlying bindings as much as possible (this typically involves some creative process), and return an API that is much smaller in footprint and makes much better use of the features available in the Java language. As such, it is fair to assume that what these developers are after is a simple set of bindings they can use to reach the underlying native library - but that the bindings themselves are part of the library implementation and will never be exposed as is in the public API they are developing. Furthermore, such library producers will probably care deeply about the directness and efficiency of the generated bindings --- even if that translates into lower-level bindings.

As an example of the creative process which might be involved when creating a high-level API, let's consider the libclang API; this API exposes several functions to manipulate cursors and types, respectively. Looking at these functions it is hard not to view them through our dear Object-Oriented-programming lenses: libclang's cursors and types are classes with certain instance methods, which then leads - on the Java side - to a pleasing and natural programming model. But, in such cases, it can be very hard for a tool to make such an inference.

Finally, library producers, especially producers of low-level, C-oriented libraries, might need ways to control which set of abstractions is mechanically generated by a tool such as jextract. In fact, some of the abstractions the tool relies upon might overlap with some of the abstractions made available by the library. Perhaps the most striking example is the pointer abstraction: nearly every available C-oriented library has its own definition of what a pointer is (examples of this are JCuda and LWJGL). This is also true when we look at existing frameworks providing native interop (examples of this are JNR, JavaCPP and Graal native image). This poses challenges, as developers of these libraries and frameworks will have to work around the abstractions generated by tools such as jextract, by having to marshal/unmarshal domain types into mechanically generated types and back.

JDK developers

There is also a third, perhaps less obvious, kind of audience: developers hacking on the JDK itself. This use case deserves a special mention because of the many constraints that JDK developers have to face. For instance, early phases of JDK initialization cannot take full advantage of all the features available in the platform (e.g. annotation processing). As such, it is likely that JDK developers will prefer low-level and efficient native bindings, which have as few dependencies on the rest of the platform as possible.

... And jextract for All

By this time, it should be evident that there are two facets to the problem of extracting native bindings; one relatively mechanical --- going e.g. through a list of header files and enumerating all the foreign functions and data structures used by a given library; one is an inherently creative process, which, given the sea of foreign functions and data structures constituting a native library, attempts to distill some useful abstractions which lead to a pleasing and natural programming model.

While a tool is a good fit for expressing the mechanical part of the extraction process, it is hard to imagine a tool providing enough knobs (e.g. command-line flags) to control every aspect of the creative process described above. In fact, a much better choice when designing a tool such as jextract is to get out of the creative loop entirely, and to target efficient, low-level bindings --- this is possible, since we now have primitives to describe both memory access (through VarHandles) and foreign function access (through MethodHandles).

Handle-ized bindings: a first (rough) stab

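VarHandles and MethodHandles represent a credible, efficient and low-level target for the bindings generated by jextract. More specifically, we can define a simple (albeit naive, as we shall see later) strategy which generates a MethodHandle constant for each native function, a VarHandle constant for each struct field, and a layout constant for each struct.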

While we are not seriously proposing to expose VarHandles and MethodHandles in the generated bindings as is, it is still useful, for illustrative purposes, to get a sense of how such a basic translation target might work. To do that, let's consider this simple C header:

struct Point {
   int x;
   int y;
};

double distance(struct Point p1, struct Point p2);

Given these declarations, jextract could generate something like this:

class Point_h {
    public static final MethodHandle distance = SystemABI.getInstance().downcallHandle(
        lookup("distance"),
        MethodType.fromMethodDescriptorString("(Ljdk/incubator/foreign/MemorySegment;Ljdk/incubator/foreign/MemorySegment;)D", LOADER),
        FunctionDescriptor.of(MemoryLayouts.SysV.C_DOUBLE, false,
            MemoryLayout.ofStruct(
                MemoryLayouts.SysV.C_INT.withName("x"),
                MemoryLayouts.SysV.C_INT.withName("y")
            ).withName("Point"),
            MemoryLayout.ofStruct(
                MemoryLayouts.SysV.C_INT.withName("x"),
                MemoryLayouts.SysV.C_INT.withName("y")
            ).withName("Point")
        )
    );
    public static final MemoryLayout Point$LAYOUT = MemoryLayout.ofStruct(
        MemoryLayouts.SysV.C_INT.withName("x"),
        MemoryLayouts.SysV.C_INT.withName("y")
    ).withName("Point");
    public static final VarHandle Point$x = Point$LAYOUT.varHandle(int.class, PathElement.groupElement("x"));
    public static final VarHandle Point$y = Point$LAYOUT.varHandle(int.class, PathElement.groupElement("y"));
}

As can be seen, the bindings above consist entirely of constants: a MethodHandle for the function distance, two VarHandles for the Point struct fields (x and y, respectively), plus a Layout constant for the Point struct. Unfortunately, because the bindings are expressed as Java source code, some repetition is unavoidable (see for example how the Point struct layout is repeated for each argument of the distance function). That said, it's not hard to imagine a translation strategy that goes directly to bytecode and uses ldc and CONSTANT_Dynamic pool entries, to minimize the initialization cost associated with these constants.

How would a client interact with such bindings? Let's try to answer that question with an example:

import static Point_h.*;

...

try (MemorySegment p1 = MemorySegment.ofNative(Point$LAYOUT) ;
     MemorySegment p2 = MemorySegment.ofNative(Point$LAYOUT)) {
    Point$x.set(p1.baseAddress(), 1);
    Point$y.set(p1.baseAddress(), 2);
    Point$x.set(p2.baseAddress(), 3);
    Point$y.set(p2.baseAddress(), 4);
    // invokeExact needs the exact return type at the call site, hence the cast
    double dist = (double) distance.invokeExact(p1, p2);
}

Indeed, this is not bad --- although, as anticipated, there are a few usability issues which could start to bite when trying this approach on a larger scale:

A better approach: wrapping handles

VarHandle and MethodHandle seem to represent a natural low-level target for generating native bindings, but using them directly can lead to usability issues --- most of which arise from the fact that clients are interacting with polymorphic signature methods directly. Luckily, there are a few simple moves we could make in order to greatly improve the usability story, without negatively impacting the performance model:

To get a sense of how the client code would be improved by the above changes, let's compare the code given in the previous section with the snippet below:

import static Point_h.*;

...

try (MemorySegment p1 = MemorySegment.ofNative(Point$LAYOUT) ;
     MemorySegment p2 = MemorySegment.ofNative(Point$LAYOUT)) {
    Point$x$set(p1, 1);
    Point$y$set(p1, 2);
    Point$x$set(p2, 3);
    Point$y$set(p2, 4);
    distance(p1, p2);
}

The results are very pleasing, and the code is not too different from the one that we'd write by using the vanilla jextract. Gone are the usability issues described above, and we have a very tight, basic mapping which allows Java developers to call into a native library without using JNI. While the bindings are still relatively low-level (there's no first-class support for structs and/or pointers), they are hardly less usable than their JNI counterparts - with the obvious advantages that (i) all the code we need here is 100% Java, and that these bindings provide (ii) a predictable performance model (since they are very thin wrappers around highly optimized VarHandles and MethodHandles).
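To make the wrapping idea tangible without the incubating API at hand, the following sketch applies the same pattern using only standard JDK classes. It is not what jextract generates: a direct ByteBuffer stands in for MemorySegment, a VarHandle obtained from MethodHandles.byteBufferViewVarHandle stands in for the layout-derived field handles, and a plain Java method (distanceImpl) stands in for the native distance function. The class name PointWrappers and the field offsets are assumptions made for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: static wrappers which hide the polymorphic-signature call sites.
// Assumption: Point is two 4-byte ints at byte offsets 0 and 4.
class PointWrappers {
    // Reads/writes ints at arbitrary byte offsets of a ByteBuffer.
    private static final VarHandle INT_VIEW =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());
    private static final int X_OFFSET = 0;
    private static final int Y_OFFSET = 4;
    private static final int SIZE = 8;

    // Stand-in for the downcall handle of the native function.
    private static final MethodHandle DISTANCE;
    static {
        try {
            DISTANCE = MethodHandles.lookup().findStatic(PointWrappers.class, "distanceImpl",
                    MethodType.methodType(double.class, ByteBuffer.class, ByteBuffer.class));
        } catch (ReflectiveOperationException ex) {
            throw new ExceptionInInitializerError(ex);
        }
    }

    static ByteBuffer allocate() {
        return ByteBuffer.allocateDirect(SIZE).order(ByteOrder.nativeOrder());
    }

    // The casts and argument types live here, once, instead of at every client call site.
    static int Point$x$get(ByteBuffer p) { return (int) INT_VIEW.get(p, X_OFFSET); }
    static int Point$y$get(ByteBuffer p) { return (int) INT_VIEW.get(p, Y_OFFSET); }
    static void Point$x$set(ByteBuffer p, int x) { INT_VIEW.set(p, X_OFFSET, x); }
    static void Point$y$set(ByteBuffer p, int y) { INT_VIEW.set(p, Y_OFFSET, y); }

    // Clients get a regular method with a regular signature and no checked Throwable.
    static double distance(ByteBuffer p1, ByteBuffer p2) {
        try {
            return (double) DISTANCE.invokeExact(p1, p2);
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    // Plain Java stand-in for the native implementation.
    private static double distanceImpl(ByteBuffer p1, ByteBuffer p2) {
        return Math.hypot(Point$x$get(p2) - Point$x$get(p1), Point$y$get(p2) - Point$y$get(p1));
    }

    public static void main(String[] args) {
        ByteBuffer p1 = allocate();
        ByteBuffer p2 = allocate();
        Point$x$set(p1, 1);
        Point$y$set(p1, 2);
        Point$x$set(p2, 3);
        Point$y$set(p2, 4);
        System.out.println(distance(p1, p2));
    }
}
```

Because the wrappers are small static methods around constant handles, a JIT compiler has a good chance of inlining them, which is the intuition behind the claim that the performance model is not negatively affected.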

Wrapper upper!

Imagine a library producer wanting to expose the Point struct discussed above as a first-class entity in some API; such a library producer would perhaps want to model Point using a Java class, so that clients of such a class could ignore details related to e.g. the management of memory segments --- and perhaps make the class AutoCloseable, so that it could be used inside a try-with-resources. It is indeed very easy to code such an abstraction on top of the low-level bindings shown in the previous section, as demonstrated in the following example:

class Point implements AutoCloseable {

    private final MemorySegment _segment;

    Point() {
        _segment = MemorySegment.ofNative(Point$LAYOUT);
    }

    int get$x() {
        return Point$x$get(_segment);
    }

    int get$y() {
        return Point$y$get(_segment);
    }

    void set$x(int x) {
        Point$x$set(_segment, x);
    }

    void set$y(int y) {
        Point$y$set(_segment, y);
    }

    public void close() {
        _segment.close();
    }

    static double distance(Point p1, Point p2) {
        // qualified call: an unqualified distance(...) would resolve to this very method
        return Point_h.distance(p1._segment, p2._segment);
    }
}

Now that we have wrapped all the bindings for Point into a neat, Java-friendly abstraction, the code snippet shown in the previous sections can be greatly simplified, as follows:

try (Point p1 = new Point() ;
     Point p2 = new Point()) {
    p1.set$x(1);
    p1.set$y(2);
    p2.set$x(3);
    p2.set$y(4);
    Point.distance(p1, p2);
}

By looking at this example, it is easy to see how library producers could come up with various ways to expose the generated bindings in a way that makes sense to Java developers.

Is this all we need?

Let's assume that jextract started to generate the low-level static wrappers described in the previous section; would that be a satisfactory restacking of the Panama story? To answer this question, we need to go back to jextract's audience. As we pointed out previously, we have very different classes of developers wanting to use a tool like jextract in very different ways.

First, consumers of libraries for which no Java bindings exist will probably be happy with such a solution, even if low-level. They still get 100% Java bindings and no need to round-trip through JNI, which is a big usability boost.

Secondly, JDK developers will be very happy with this solution, as VarHandle and MethodHandle are low-level targets which do not depend on higher-level abstractions (e.g. annotation processing) and can therefore easily be used --- even during the JDK bootstrapping phase.

When it comes to library producers, the situation is a bit more convoluted; producers of high-level, Java-friendly bindings will probably be very happy with an approach such as the one described - not only will they be able to get the set of knobs required to access a native library without the need to write any native code, but the fact that the bindings generated by jextract are relatively minimal also guarantees that such developers will be free to decide how to expose the generated bindings at the higher level, with no added cost (e.g. they won't have to translate jextract's own pointer abstraction into a higher-level one) --- as demonstrated in the previous section.

On the other hand, producers of low-level, C-oriented libraries might be less happy with such a minimal set of bindings - the fact that structs and pointers are not modelled as first-class entities in the generated bindings might be perceived as a disadvantage.

More generally, any fixed choice in the way jextract generates bindings will favor some use cases while discouraging others. It is important to realize that there is no silver bullet here - a solution that works well for a certain type of audience is not guaranteed to work well for all kinds of audiences. This creates an obvious tension; while targeting lower-level bindings makes it easier to repackage those bindings into higher-level abstractions, targeting lower-level bindings will also likely translate into a stream of requests to add more features (e.g. first class struct support), perhaps via custom jextract flags, which will ultimately result in poor maintainability of the tool and poor (and fragmented) user experience.

Jextract as an API

As mentioned previously, different audiences might have different requirements when it comes to native bindings. Maintaining a tool which is configurable enough to provide escape hatches for most of the requests developers might have quickly turns into an endless whack-a-mole exercise. At the same time, command-line flags are sometimes not the best way to describe the kind of transformations which a developer might want to apply to get to the desired native binding shapes; that is, some of these transformations are better described programmatically, in terms of code.

This approach has been pioneered by JavaCPP presets, which allow the extraction of a Java API from a set of headers to be specified programmatically (see e.g. this for an example).

This approach has clear advantages over trying to do everything in a tool: first and foremost, it is far easier to add extra API points than it is to add extra command-line options; but more importantly, once the moving parts of the extraction process are captured in an API, it can be easier for clients to define non-trivial extractions of native libraries which might embed domain-specific knowledge.

Parsing headers

In the case of jextract, we think that there is, in particular, one area where the definition of an extraction API might pay dividends: parsing native headers. Parsing headers is not a trivial task; while some support and tooling exists to do this (e.g. Swig, libclang), very often what's available is not very usable and needs a lot of tinkering in order to provide a satisfactory user experience. For instance, the libclang API often makes some arbitrary (and often counter-intuitive) decisions when it comes to exposing the cursors and types back to clients. Often, extraction tools such as jextract end up wrapping the libclang API into a higher-level API, as a way to improve the stability of the code base. Developing this wrapper is a painful exercise, which sometimes requires changes in the guts of the libclang implementation itself (e.g. to expose features that are not available through the API, such as macro evaluation, or C++ template processing), and it is an exercise that nearly every other language providing native bindings via libclang has had to go through (see Kotlin and Rust). It would indeed be a shame to keep this functionality locked away --- a first-class header parsing API would in fact allow other, richer tools (besides jextract) to be easily built atop it.

A sketch of such a parsing API is available here; it is indeed a very simple API, which is made up of two main abstractions: foreign declarations and foreign types, modeled by the Declaration and Type interfaces, respectively. To parse one or more headers, a JextractTask has to be created (the static factory takes as input the set of header files to be parsed). After a task has been created, headers can be parsed using the parse method, which returns a toplevel Declaration instance. Since both Declaration and Type support the visitor pattern, it is then easy for a client to define custom transforms on the parsed headers (e.g. generate a given snippet of Java source code for each visited foreign function).

For instance, the following code can be used to parse the OpenGL headers on Ubuntu 18.04:

JextractTask task = JextractTask.newTask(true, "/usr/include/GL/glut.h");
Declaration.Scoped toplevel = task.parse("-I/usr/include");

Once we have a declaration, it is extremely easy to define processing steps, such as filtering declarations; the following code shows a sketch of a declaration filter which only keeps the declarations whose path contains one of the given strings:

class DeclarationFilter {

    public static Declaration.Scoped filter(Declaration.Scoped decl, String... validNames) {
        Declaration[] newMembers = decl.members().stream()
                .filter(d -> filterDecl(d, validNames))
                .toArray(Declaration[]::new);
        return Declaration.toplevel(decl.pos(), newMembers);
    }

    static boolean filterDecl(Declaration d, String... validNames) {
        if (d.pos() == Position.NO_POSITION) {
            return false;
        } else {
            for (String s : validNames) {
                String pathName = d.pos().path().toString();
                if (pathName.contains(s)) {
                    return true;
                }
            }
            return false;
        }
    }
}

Mapping foreign types into Java types is easily achieved by writing a type visitor, as shown below:

public class TypeTranslator implements Type.Visitor<Class<?>, Void> {
    @Override
    public Class<?> visitPrimitive(Type.Primitive t, Void aVoid) {
        if (t.layout().isEmpty()) {
            return void.class;
        } else {
            return layoutToClass(isFloatingPoint(t), t.layout().orElseThrow(UnsupportedOperationException::new));
        }
    }

    private boolean isFloatingPoint(Type.Primitive t) {
        switch (t.kind()) {
            case Float:
            case Float128:
            case HalfFloat:
            case Double:
            case LongDouble:
                return true;
            default:
                return false;
        }
    }

    private Class<?> layoutToClass(boolean fp, MemoryLayout layout) {
        switch ((int)layout.bitSize()) {
            case 8: return byte.class;
            case 16: return short.class;
            case 32: return !fp ? int.class : float.class;
            case 64:
            case 128: return !fp ? long.class : double.class;
            default:
                throw new UnsupportedOperationException();
        }
    }

    //other visitor methods
    ...
}

These two examples show how straightforward it is to programmatically define a mechanical translation which generates Java bindings from a set of native headers.
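As a further illustration of the visitor-based style, here is a self-contained sketch of a generator which emits one Java method signature per visited foreign function. Since the parsing API itself is only a sketch, the Decl, Decl.Function, Decl.Scoped and SignatureGenerator types below are hypothetical, heavily simplified stand-ins (they are not the Declaration and Type interfaces mentioned above), and foreign types are assumed to have already been mapped to Java classes, e.g. by a translator like the one shown earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a parsed declaration tree.
interface Decl {
    <R> R accept(Visitor<R> visitor);

    interface Visitor<R> {
        R visitFunction(Function f);
        R visitScoped(Scoped s);
    }

    // A foreign function whose types were already translated to Java classes.
    final class Function implements Decl {
        final String name;
        final Class<?> returnType;
        final List<Class<?>> paramTypes;

        Function(String name, Class<?> returnType, Class<?>... paramTypes) {
            this.name = name;
            this.returnType = returnType;
            this.paramTypes = Arrays.asList(paramTypes);
        }

        @Override
        public <R> R accept(Visitor<R> visitor) {
            return visitor.visitFunction(this);
        }
    }

    // A named group of declarations, e.g. everything found in one header.
    final class Scoped implements Decl {
        final String name;
        final List<Decl> members;

        Scoped(String name, Decl... members) {
            this.name = name;
            this.members = Arrays.asList(members);
        }

        @Override
        public <R> R accept(Visitor<R> visitor) {
            return visitor.visitScoped(this);
        }
    }
}

// Emits a Java interface with one abstract method per visited function.
class SignatureGenerator implements Decl.Visitor<String> {
    @Override
    public String visitFunction(Decl.Function f) {
        List<String> params = new ArrayList<>();
        for (int i = 0; i < f.paramTypes.size(); i++) {
            params.add(f.paramTypes.get(i).getSimpleName() + " x" + i);
        }
        return "    " + f.returnType.getSimpleName() + " " + f.name
                + "(" + String.join(", ", params) + ");\n";
    }

    @Override
    public String visitScoped(Decl.Scoped s) {
        StringBuilder sb = new StringBuilder("interface " + s.name + " {\n");
        for (Decl member : s.members) {
            sb.append(member.accept(this));
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        Decl header = new Decl.Scoped("example_h",
                new Decl.Function("getpid", int.class),
                new Decl.Function("abs", int.class, int.class));
        System.out.print(header.accept(new SignatureGenerator()));
    }
}
```

Swapping SignatureGenerator for a different visitor is all it would take to target another binding shape, which is the kind of flexibility a command-line flag struggles to offer.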

IDE support

Having a native header parsing facility in the JDK, such as the one shown in the previous section, could not only make it easier for library producers to customize what jextract does --- in fact jextract at this point becomes just a thin client built around this parsing API; such an API might also lower the effort required e.g. by IDE vendors in order to write an interactive Java extraction plugin. A similar move was played --- rather successfully --- during the development of the jshell tool, which similarly consists of a REPL API, as well as a tool defined using such an API.

Backwards-compatible bindings

As a bonus point, the API-based approach unlocks some use cases that were previously impossible to handle, such as providing an extraction strategy that (for compatibility reasons) is completely based on JNI bindings --- this is possible since now we have separated the extraction process from the JDK it runs on - that is, while the default extraction strategy might rely on VarHandle and MethodHandle, which are made available by the memory access and SystemABI APIs, an extractor is not required to generate bindings that have those dependencies.

C++ support

Adding C++ support to the extraction story is a daunting task: C++ is not covered by the same portability guarantees as C (that is, C++ is not constrained by the system ABI) --- as a result, interfacing with a C++ library is a task that only a C++ compiler can solve in full.

For this very reason, the sanest way to support C++ is to lower C++ into a set of plain, ABI-compatible, C bindings, which can then be used to perform various C++ functionalities (e.g. create a class, call an overloaded function, etc.). This approach has been described here, but most of the tools supporting C++ adopt similar strategies.

If C++ support is added to jextract, how deeply should C++ be integrated in the parsing API we expose? Here we think that the C++ to C lowering should happen at the very beginning of the extraction step, perhaps using a pluggable mechanism (the JDK will, of course, provide a standard lowering plugin). Then, the resulting C API points generated by this pre-processing step can be decorated to add extra information (e.g. a function might be marked in a special way to denote the fact that it comes from a C++ operator overload) - via extensible attributes which can be attached to the abstractions returned by the parser API.

It is then up to the extraction process to decide how to surface these knobs back to the user; low-level extractors might, again, decide to make these knobs available via plain static methods; other higher-level extractors might want to glue the pieces back together and have e.g. a closer mapping between C++ classes and Java classes.

Summing up

As this document attempted to show, it has become increasingly clear that the current annotation-based bindings generated by jextract will not work well in all cases (most notably, the JDK itself), and that the extraction process is not a one-size-fits-all problem --- different audiences might have very different requirements on the kind of bindings generated by jextract.

Hence, while it still makes sense to provide a simple, out-of-the-box extraction tool as part of the JDK (pretty much as javah has always been part of the JDK along with JNI), it seems reasonable for such a tool to target efficient, low-level bindings which are simple static wrappers around the VarHandles and MethodHandles used to access foreign data and functions, respectively. Such bindings can be used as-is, or easily bundled together into higher-level abstractions by library producers striving for a tighter cohesion with the Java platform and programming model. To put it another way, jextract should strive to address the mechanical side of the extraction process, while leaving the creative side of the process to humans.

Finally, since low-level bindings will work in some cases, but not in others, it makes sense for Panama to provide a minimal parsing API which will help developers (especially producers of C-oriented bindings) to guide the extraction process and to generate the bindings they need, should what jextract generates not be suitable; at the same time, such an API might enable other players (most notably IDE vendors) to provide useful additional tooling to developers, which will further improve the user experience when interacting with native libraries from Java code --- which is really what Project Panama is about.