Memory access: the missing link

April 2019: (v. 0.3)

Maurizio Cimadamore

The Panama Pointer API is a powerful, high-level API which allows client applications to access native memory using an API similar to that implied by C pointer types, as well as providing high-level marshalling/unmarshalling operations to convert raw bytes into high-level carrier types (for more details, refer to this document) - all while providing spatial and temporal safety (for more details about the kind of temporal safety provided by Panama pointers, please refer to this document).

In this document we will attempt to tease apart the various aspects of the Pointer API, to see if we can isolate a stable, low-level abstraction - namely bound memory addresses, upon which we can derive higher-level functionalities, such as rich foreign data/function support provided by the Panama binder.

Overview of Panama pointers

In Panama, pointers are modeled using the following interface (for the sake of simplicity we will just show the relevant portions of the full interface):

interface Pointer<X> {
   long addr() throws IllegalAccessException;
   LayoutType<X> type();
   <Z> Pointer<Z> cast(LayoutType<Z> newType);
   Scope scope();
   Pointer<X> offset(long elements);
   X get();
   void set(X x);
}

That is, a Panama pointer is a typeful view over a region of memory. In terms of safety, a Scope object is used to provide temporal as well as thread-confinement safety and, since a pointer implementation is also typically associated with some bounds, spatial safety is also guaranteed.

While this API is undoubtedly successful at (safely) capturing use cases originating from native interop (e.g. modeling C pointers), there are some issues, which will be discussed in more detail in the following sections.

Sticky types

The type information associated with Panama pointers is sticky. That is, a pointer is created with a given LayoutType which defines once and for all the semantics associated with the pointer dereference operations (Pointer::get, Pointer::set) as well as the element-wise behavior of pointer-arithmetic operations (Pointer::offset). This is typically handy when modeling C interop - after all C pointers are typeful too - but doesn't scale well when considering other use cases.

For instance, when working with pointers modeling heterogeneous buffers, it is very frequent to see code like this:

Pointer<?> sourcePtr = ....
long targetOffset = ...
Pointer<Integer> pInt = sourcePtr.cast(NativeTypes.UINT8)
                                 .offset(targetOffset)
                                 .cast(NativeTypes.INT32);

It is easy to see how, in the above snippet, the typeful nature of Panama pointers makes memory access rather cumbersome; note that, without any help from Valhalla value types, the code above would end up allocating three pointers just to move the original pointer to the right location and associate it with the correct carrier type. It is therefore questionable whether Panama pointers are the right building block for frameworks which use off-heap access as a way to provide efficient caching (see Ignite, MapDB, memcached), as some features of the Pointer API seem to work against such use cases!
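
For contrast, the sketch below reads values of different carrier types out of one heterogeneous buffer using the JDK's existing ByteBuffer view VarHandles (MethodHandles::byteBufferViewVarHandle) rather than the Panama Pointer API. The class name, the buffer contents and the offsets are illustrative assumptions; the only point is that an untyped, offset-based access primitive needs no intermediate typed views.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration: not part of the Panama API.
final class HeterogeneousBufferSketch {
    // One constant VarHandle per carrier; coordinates are (ByteBuffer, int byteOffset).
    static final VarHandle INT_AT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
    static final VarHandle LONG_AT =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    // Assumed layout: 4 header bytes, an int at byte offset 4, a long at byte offset 8.
    static ByteBuffer sampleBuffer() {
        ByteBuffer buf = ByteBuffer.allocateDirect(16).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(4, 42);
        buf.putLong(8, 1L << 40);
        return buf;
    }

    // No cast/offset/cast chain: the carrier is chosen by the VarHandle, not by the buffer.
    static int readInt(ByteBuffer buf, int byteOffset) {
        return (int) INT_AT.get(buf, byteOffset);
    }

    static long readLong(ByteBuffer buf, int byteOffset) {
        return (long) LONG_AT.get(buf, byteOffset);
    }

    public static void main(String[] args) {
        ByteBuffer buf = sampleBuffer();
        System.out.println(readInt(buf, 4));
        System.out.println(readLong(buf, 8));
    }
}
```

Here the access path allocates nothing per read, which is the property that caching frameworks tend to care about.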

Pointer, binder and system ABI

Panama pointers are a relatively high-level concept; a pointer's LayoutType defines a getter/setter pair of method handles which is used to convert raw bytes into an instance of the carrier type associated with a given pointer. This is very useful when modeling pointers to composite data: if structs are modeled as annotated interfaces, it is possible to define a pair of getter/setter method handles which turn raw struct bits into an instance of the (annotated) struct interface, and vice-versa.

At the same time, however, this introduces a tight and unfortunate coupling between the Pointer API and the binder itself. This has concrete consequences in the design of the Panama SystemABI interface (see this discussion), as the invoker classes used by the Panama foreign function support must now rely on binder-related annotated carrier types.

Pointers as addresses

The Pointer interface shown above models a concept that in C is referred to as object pointer, that is a pointer to some data, as opposed to a function type pointer, used to model pointers to executable code. The Panama API models the latter using a separate abstraction, called Callback:

interface Callback<F> {
    Pointer<?> entryPoint();
    F asFunction();
}

That is, a Panama Callback is a pointer to some executable code coupled with a functional interface carrier - which can be used to invoke the native code snippet associated with the callback object from Java. Note how to model the callback's entry point we had to use a Pointer instance - that is, in Panama, function type pointers are modeled in terms of object pointers!

This is a pragmatic compromise, but also a telling sign of the dual nature of the Panama Pointer API: on the one hand pointers are used as high-level carrier types, which can be targeted by native interop tools such as jextract; on the other hand, pointers are also used as a low-level primitive (e.g. in Callback and Library.Symbol) for modeling memory addresses.

Encapsulating memory access

In the previous sections we have seen how Panama pointers provide a way to model safe memory access. But Panama pointers also do much more, as they capture (via their LayoutType object) both the layout associated with the region of memory they are pointing to and the Java carrier type to be used when dereferencing the pointer. All this machinery makes them an ideal candidate to model native C pointers but, at the same time, it seems to undermine the generality of Panama pointers, and their applicability to use cases that have little to do with native interop.

It seems like what we are after is a mechanism to encapsulate (safe) memory access, and then build native, C-style pointers on top of said mechanism. Let's start by defining a way to model bound memory addresses:

interface MemoryAddress {
   MemoryAddress offset(long l);
   MemoryAddress narrow(long newSize);
   MemoryScope scope();
   void copyTo(MemoryAddress dest, long bytes);
}

A MemoryAddress is, at its core, an address with some associated bound information. Note that we do not expose how the address is implemented, which makes MemoryAddress a suitable carrier for both on-heap and off-heap addresses. There are primitives to move the address by a given offset (see MemoryAddress::offset) as well as ways to resize the memory region associated with the address (see MemoryAddress::narrow), and to perform bulk-copy of memory contents from one address to another. As before, temporal and thread-confinement safety is provided by means of a MemoryScope abstraction, shown below:
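
To illustrate what "an address with some associated bound information" could look like, here is a minimal sketch built on a ByteBuffer. The class BoundedAddressSketch and its byte accessors are hypothetical stand-ins, not the proposed API: the scope is omitted, bulk copy is left out, and int sizes are used because ByteBuffer is int-indexed, whereas the interface above uses long.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for MemoryAddress; scope checks are deliberately omitted.
final class BoundedAddressSketch {
    private final ByteBuffer storage; // backing memory, on-heap or off-heap
    private final int base;           // first accessible byte within storage
    private final int size;           // number of bytes accessible from base

    private BoundedAddressSketch(ByteBuffer storage, int base, int size) {
        this.storage = storage;
        this.base = base;
        this.size = size;
    }

    static BoundedAddressSketch of(ByteBuffer storage) {
        return new BoundedAddressSketch(storage, 0, storage.capacity());
    }

    int size() {
        return size;
    }

    // Moves the address forward; the accessible region shrinks by the same amount.
    BoundedAddressSketch offset(int bytes) {
        if (bytes < 0 || bytes > size) {
            throw new IndexOutOfBoundsException("offset out of bounds: " + bytes);
        }
        return new BoundedAddressSketch(storage, base + bytes, size - bytes);
    }

    // Keeps the address but restricts the region; a region can never grow.
    BoundedAddressSketch narrow(int newSize) {
        if (newSize < 0 || newSize > size) {
            throw new IndexOutOfBoundsException("cannot narrow to: " + newSize);
        }
        return new BoundedAddressSketch(storage, base, newSize);
    }

    // Spatial safety: every access is checked against the bounds of this address.
    byte getByte(int index) {
        checkIndex(index);
        return storage.get(base + index);
    }

    void setByte(int index, byte value) {
        checkIndex(index);
        storage.put(base + index, value);
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index out of bounds: " + index);
        }
    }
}
```

Because offset and narrow only ever shrink the region, an address derived from another address can never reach memory that the original could not.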

interface MemoryScope extends AutoCloseable {
    long characteristics();
    MemoryScope parent();
    //allocation
    MemoryAddress allocate(Layout layout);
    //lifecycle management
    MemoryScope fork(long characteristics);
    void close();
    void merge();
}

This is, in other words, a stripped down version of a Panama scope: it supports a single allocation method, which takes a Layout object and returns a fresh MemoryAddress pointing at the newly allocated storage. As with regular Panama scopes, we have primitives to fork a scope off an existing one, as well as terminal operations for closing the scope and merging the scope into the parent scope (for more details refer to this document). In order to capture some of the properties of the memory region covered by the scope, we have included a characteristics mask, which can be used, e.g. to fork a new sub-scope with given properties. Below we list some interesting properties that a MemoryScope might wish to capture:

In general, some of the properties might be sticky and not allow a forked scope to override them, when doing so would compromise memory access safety (this is true e.g. for aligned access).
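
The temporal and thread-confinement checks that a scope provides can be sketched as follows. ScopeSketch is a hypothetical stand-in for MemoryScope: it models only liveness, ownership and forking, ignores the characteristics mask and merging, and hands out byte arrays in place of MemoryAddress instances.

```java
// Hypothetical stand-in for MemoryScope: only liveness and confinement are modeled.
final class ScopeSketch implements AutoCloseable {
    private final ScopeSketch parent;                 // null for a root scope
    private final Thread owner = Thread.currentThread();
    private boolean alive = true;

    private ScopeSketch(ScopeSketch parent) {
        this.parent = parent;
    }

    static ScopeSketch root() {
        return new ScopeSketch(null);
    }

    // A forked scope can live no longer than the scope it was forked from.
    ScopeSketch fork() {
        checkAccess();
        return new ScopeSketch(this);
    }

    // Every allocation or dereference would perform this check first.
    void checkAccess() {
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("scope accessed from the wrong thread");
        }
        for (ScopeSketch s = this; s != null; s = s.parent) {
            if (!s.alive) {
                throw new IllegalStateException("scope is closed");
            }
        }
    }

    // Stand-in for allocate(Layout): returns storage tied to this scope's lifetime.
    byte[] allocate(int bytes) {
        checkAccess();
        return new byte[bytes];
    }

    @Override
    public void close() {
        checkAccess();
        alive = false;
    }
}
```

In this sketch, closing a parent implicitly invalidates all of its forked scopes, since the liveness check walks the parent chain.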

Dereference and layout paths

The attentive reader might have noticed that the above MemoryAddress interface provides no dereference API points. Now, in principle we could add methods like:

int getAsInt(Layout layout);
void setAsInt(Layout layout, int value);
//repeat for all primitive types

Here, we would like instead to explore the possibility of relating MemoryAddress with VarHandles. That is, have a VarHandle which works on any MemoryAddress instance (given a suitable layout and carrier), not just on ByteBuffers.

But how would we obtain such a VarHandle? In general, memory access is well-specified with respect to a layout path; that is, given a layout L describing the contents of a memory region, and a selector S which singles out one or more layout elements in the enclosing layout L, the pair (L, S) models a set of so-called layout paths, each encapsulating all the necessary coordinates to perform memory access given a MemoryAddress (and therefore sufficient to generate the required dereference VarHandle object).

Layout paths can be obtained using the following API:

interface LayoutPath {
    Layout layout();
    LayoutPath enclosing();
    long offset();
    VarHandle asVarHandle(Class<?> carrier); 
   
    static Stream<LayoutPath> lookup(Layout layout,
                                     Predicate<? super Layout> p);
}

In other words, layout paths can be obtained via the LayoutPath::lookup method; given a layout path, the method LayoutPath::asVarHandle can then be used to produce a VarHandle with coordinate types CT1, CT2 ... CTn where:

Valid carrier types for memory access VarHandle are: byte, char, short, int, long, float, double and MemoryAddress itself. Since only basic carriers are supported, it follows that only value layouts (e.g. non-composite) should be supported (at least initially). For a more detailed description of the allowed carrier types and the semantics associated with memory access, please refer to the appendix.

The set of coordinates CT2, CT3 ... CTn is crucial in order to provide access to elements that are nested inside sequence layouts. Let's consider the following layout:

[4: [5: [10: x32 i32(elem)]]]

Here we have a 3-dimensional matrix of integer elements; as such, to get to any of the elements annotated with the elem name, we need to provide three indices - the position into the first, second and third sequence layout, respectively. In other words, the number of additional access coordinates exposed by a dereference VarHandle depends on the number of sequence layouts a given layout path traverses.
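
The offset arithmetic that such a layout path would encapsulate for this layout can be written out by hand. The sketch below assumes that x32 denotes 32 bits of padding and i32 a 32-bit integer, so that each innermost element occupies 8 bytes with elem at byte 4; the class and method names are hypothetical.

```java
// Hypothetical helper: hand-computes the byte offset of "elem" in
// [4: [5: [10: x32 i32(elem)]]], assuming x32 = 32 bits of padding.
final class NestedSequenceOffsetSketch {
    static final long[] COUNTS = {4, 5, 10};  // element counts, outermost first
    static final long ELEMENT_BYTES = 8;      // x32 (4 bytes) + i32 (4 bytes)
    static final long ELEM_OFFSET_BYTES = 4;  // elem sits right after the padding

    // One index per sequence layout traversed by the path.
    static long offsetOf(long... indices) {
        if (indices.length != COUNTS.length) {
            throw new IllegalArgumentException("expected " + COUNTS.length + " indices");
        }
        long offset = 0;
        long stride = ELEMENT_BYTES;
        // Walk from the innermost sequence outwards, growing the stride as we go.
        for (int d = COUNTS.length - 1; d >= 0; d--) {
            if (indices[d] < 0 || indices[d] >= COUNTS[d]) {
                throw new IndexOutOfBoundsException("index " + d + " out of range");
            }
            offset += indices[d] * stride;
            stride *= COUNTS[d];
        }
        return offset + ELEM_OFFSET_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(offsetOf(1, 2, 3)); // 1*400 + 2*80 + 3*8 + 4
    }
}
```

The three indices play the role of the extra access coordinates: a path through three sequence layouts needs exactly three of them.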

Finally, note how these memory access VarHandles generalize the ones obtained via MethodHandles::byteBufferViewVarHandle; since the new VarHandle factory is expressed in terms of the Layout API, it can express many desired access modes at once - e.g. alignment, endianness, sign-extension - whereas the existing, ByteBuffer-based API features a hard-wired endianness parameter, which would fail to address more complex cases. Note also that, by exposing dereference via the VarHandle API, we automatically acquire all the access capabilities this API provides (e.g. atomic access).
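
For reference, this is what the existing JDK facility looks like: endianness is fixed when the VarHandle is created, and atomic access modes come for free. The sketch uses only standard java.lang.invoke and java.nio calls; the class name and the values written are illustrative.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteBufferVarHandleSketch {
    // Endianness is a hard-wired parameter of the factory.
    static final VarHandle BE_INT =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);

    static ByteBuffer demo() {
        // Direct buffer, byte offset 0: aligned, so atomic access modes are permitted.
        ByteBuffer buf = ByteBuffer.allocateDirect(8);
        BE_INT.set(buf, 0, 1);
        boolean swapped = BE_INT.compareAndSet(buf, 0, 1, 10);
        if (!swapped) {
            throw new IllegalStateException("compareAndSet unexpectedly failed");
        }
        int previous = (int) BE_INT.getAndAdd(buf, 0, 5);
        if (previous != 10) {
            throw new IllegalStateException("unexpected previous value: " + previous);
        }
        return buf; // now holds 15 in big-endian byte order at offset 0
    }

    public static void main(String[] args) {
        ByteBuffer buf = demo();
        System.out.println(buf.get(3)); // least significant byte comes last
    }
}
```

Alignment and sign-extension have no counterpart in this factory, which is the gap a layout-driven VarHandle factory is meant to close.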

Here's an example of how a client could convert a native array with non-conventional layout into a Java array using the API we have shown so far:

Sequence seq = Sequence.of(20,
                Group.struct(
                        Value.ofSignedInt(32).withName("elem"),
                        Padding.of(8)
                )); // [ 20 [ i32(elem) x8 ] ]

VarHandle elemhandle = LayoutPath.lookup(seq, l -> l.name().equals("elem"))
         .findFirst().get().asVarHandle(int.class);

try (MemoryScope s = MemoryScope.globalScope()) {
   MemoryAddress ptr = s.allocate(seq);
   int[] arr = LongStream.range(0, seq.elementsSize())
            .mapToInt(idx -> (int)elemhandle.get(ptr, idx))
            .toArray();
}

Note how everything is pleasingly static: both the Layout object and the dereference VarHandle are effectively constant, which should translate into better inlining guarantees. Since we are accessing a layout element inside a sequence layout, an extra index needs to be provided in order for the memory access to be well-defined. Also, relying on layout paths allows clients to avoid painful manual offset computation, which makes for code that, while being low level, is still relatively easy to read.

Nail, hammer, address

Now that we have encapsulated all memory access using the new MemoryAddress abstraction, let's see how such an abstraction would enable us to build other abstractions such as the ones required by the Panama foreign support.

System ABI

Since MemoryAddress is capable of describing both addresses and sequence of bytes (stored at that address), we now have a low-level candidate carrier on top of which foreign interface support can be built. Let's revisit the (relevant portions of the) existing SystemABI low-level interface, to see how MemoryAddress might fit in:

public interface SystemABI {
    MethodHandle downcallHandle(MemoryAddress entry,
                                Function function, MethodType type);
    MemoryAddress upcallStub(MethodHandle target,
                             Function function, MethodType type);
}

Where Function is the usual aggregate for argument/return layouts. This is pleasing, as we can express all relevant native types (albeit at a much lower level), as shown in the following table:

C type vs. Java carrier mapping in SystemABI
C type             Layout class                   Java carrier
primitives         Value (e.g. i32)               primitive
object pointer     Address (e.g. u64:i32)         MemoryAddress
function pointer   Address (e.g. u64:(i32)v)      MemoryAddress
array              Sequence (e.g. [ 5 i32 ])      MemoryAddress
struct             Group (e.g. [ i32 x8 f32 ])    MemoryAddress

Since we can now support all C types with a handful of carrier types (Java primitives, plus MemoryAddress), the logic for generating upcalls/downcalls can be made much, much simpler - and in a way that is completely orthogonal to the binder; that is, in this formulation, native invokers no longer depend on binder-related annotations or constructs.

Another improvement is that, now that we have a proper address abstraction, we can use it to model downcall/upcall entry points, instead of resorting to more indirect abstractions such as Library.Symbol.

Note also how this closes the loop: in the previous sections we have shown how memory access can be modeled using VarHandle; here we show how SystemABI makes use of MethodHandle to model foreign function calls.

Below is an example of how to use the low-level system ABI in order to call a system library function (strlen):

static SystemABI abi = SystemABI.getInstance();

static Layout intLayout = Value.ofSignedInt(32);
static Layout byteLayout = Value.ofUnsignedInt(8);
static Layout bytePtrLayout = Address.ofLayout(64, byteLayout);

static VarHandle byteArrHandle = arrayHandle(byteLayout, byte.class);

static MemoryAddress strlen = LibraryLookup.ofDefault().lookup("strlen");
static MethodHandle strlenHandle = abi.downcallHandle(strlen,
         Function.of(intLayout, false, bytePtrLayout),
         MethodType.methodType(int.class, MemoryAddress.class));

void strlen(String value) throws Throwable {
   try (MemoryScope scope = MemoryScope.globalScope().fork()) {
      MemoryAddress str = toNativeString(scope, value);
      System.err.println(String.format("length of \"%s\" = %d",
            value, (int)strlenHandle.invokeExact(str)));
   }
}

static MemoryAddress toNativeString(MemoryScope scope, String value) {
   Layout strLayout = Sequence.of(value.length() + 1, byteLayout);
   MemoryAddress addr = scope.allocate(strLayout);
   for (int i = 0 ; i < value.length() ; i++) {
      byteArrHandle.set(addr, (long)i, (byte)value.charAt(i));
   }
   byteArrHandle.set(addr, (long)value.length(), (byte)0);
   return addr;
}

As can be seen, almost everything is a constant: the VarHandle used to index the string byte array and the MethodHandle used to perform the strlen invocation; this should allow for good inlining guarantees. At the same time, note how, in principle, one could easily obtain a higher-level MethodHandle by combining the basic strlen handle with the handle for toNativeString, using the MethodHandles::filterArguments combinator.
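
The combinator step can be tried out with plain Java stand-ins. In the sketch below, countBytes plays the role of the native strlen handle and toNulTerminated plays the role of toNativeString; both are hypothetical helpers operating on byte arrays rather than on MemoryAddress, and only MethodHandles::filterArguments is the real mechanism being shown.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FilterArgumentsSketch {
    // Stand-in for the low-level strlen handle: counts bytes up to the first NUL.
    static int countBytes(byte[] nulTerminated) {
        int n = 0;
        while (nulTerminated[n] != 0) {
            n++;
        }
        return n;
    }

    // Stand-in for toNativeString: ASCII bytes followed by a trailing NUL.
    static byte[] toNulTerminated(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return Arrays.copyOf(ascii, ascii.length + 1);
    }

    // Higher-level handle of type (String)int, built once and kept constant.
    static final MethodHandle STRLEN_OF_STRING;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle count = lookup.findStatic(FilterArgumentsSketch.class,
                    "countBytes", MethodType.methodType(int.class, byte[].class));
            MethodHandle convert = lookup.findStatic(FilterArgumentsSketch.class,
                    "toNulTerminated", MethodType.methodType(byte[].class, String.class));
            // Argument 0 of count is pre-processed by convert.
            STRLEN_OF_STRING = MethodHandles.filterArguments(count, 0, convert);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int strlen(String s) {
        try {
            return (int) STRLEN_OF_STRING.invokeExact(s);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

With the real API, the filter would also need a MemoryScope from somewhere, so the adaptation is likely to be slightly more involved than this stand-in suggests.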

Modelling jextract types

The main higher-level, non-primitive, carrier types used by the jextract tool are described in the table below:

high-level jextract carriers
C type             Java carrier
structs/unions     Struct<S>
object pointer     Pointer<X>
function pointer   Callback<F>
array              Array<X>

Since MemoryAddress allows us to cleanly model addresses, it is easy to see how both Pointer and Callback can now be built on top of a MemoryAddress. By doing so we address two problems of the existing API:

As for Pointer, we can keep the existing LayoutType machinery in place, which will still work here; note how, when modeling pointers to primitive types, the getter/setter method handles will simply piggyback on the underlying VarHandles - in fact, such VarHandles can be turned into LayoutType-friendly getter/setter pairs using the MethodHandles::varHandleExactInvoker factory.
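
As a rough illustration of that last step, the sketch below turns a VarHandle into a getter/setter pair of method handles with MethodHandles::varHandleExactInvoker. A ByteBuffer view VarHandle is used as a stand-in for a memory access VarHandle, and the class name is hypothetical; LayoutType itself is not modeled.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class VarHandleInvokerSketch {
    // Stand-in for a memory access VarHandle; coordinates are (ByteBuffer, int).
    static final VarHandle INT_VH =
            MethodHandles.byteBufferViewVarHandle(int[].class, ByteOrder.nativeOrder());

    // The invoker takes the VarHandle as leading argument; binding it yields a getter.
    static final MethodHandle GETTER = MethodHandles.varHandleExactInvoker(
            VarHandle.AccessMode.GET,
            MethodType.methodType(int.class, ByteBuffer.class, int.class))
            .bindTo(INT_VH);

    static final MethodHandle SETTER = MethodHandles.varHandleExactInvoker(
            VarHandle.AccessMode.SET,
            MethodType.methodType(void.class, ByteBuffer.class, int.class, int.class))
            .bindTo(INT_VH);

    // Writes a value through the setter and reads it back through the getter.
    static int roundTrip(int value) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        try {
            SETTER.invokeExact(buf, 0, value);
            return (int) GETTER.invokeExact(buf, 0);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```

Once in method handle form, the pair can be adapted further with the usual combinators, which is what a LayoutType would need for non-primitive carriers.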

This way, marshalling/unmarshalling between low-level types (primitives and MemoryAddress) and higher-level types (Pointer, Array, Callback and Struct) is treated as a high-level adaptation facility which sits on LayoutType, and, more importantly, on the outskirts of System ABI support. Since LayoutType uses method handles to model transformations between raw bytes in memory and Java carrier types, it is easy to see how such method handles can be combined with the low-level method handles provided by System ABI in order to expose signatures which match those generated by jextract.

This looks like a winning move - lower levels (most importantly the invoker API and the JVM) only need to worry about a very restricted set of carrier types; higher levels can introduce additional language-related carriers, which can be supported using the method handle combinator API.

Higher-level scopes? No thanks!

The MemoryScope abstraction we have shown earlier is, obviously, very low-level, as it only allows allocation of a memory region of a given size. In order to support C interop, we need to provide higher-level allocation functionalities, so that clients can allocate e.g. a C struct given a class literal denoting an annotated binder interface.

While we could, in principle, have a higher-level scope abstraction, which is built on top of the low-level one, in this document we would like to explore the option of not exposing such an abstraction and, instead, introducing static allocation factories on the higher-level constructs, which take a MemoryScope parameter. Here are some examples:

interface Pointer<X> {
   static <Z> Pointer<Z> allocate(MemoryScope scope, LayoutType<Z> type);
}
interface Array<X> {
   static <Z> Array<Z> allocate(MemoryScope scope,
                                LayoutType<Z> type, long size);
}
interface Struct<X extends Struct<X>> {
   static <Z extends Struct<Z>> Z allocate(MemoryScope scope,
                                           Class<Z> type);
}

This indeed looks desirable for a number of reasons: it exposes fewer abstractions to the user (there's just one scope abstraction, namely MemoryScope). It also exposes the allocation factories where the user would expect them. In terms of conciseness, we do not lose much either - what used to be:

scope.allocatePointer(NativeTypes.INT32)

Would now become, with a bit of reshuffling:

Pointer.allocate(scope, NativeTypes.INT32)

This model looks simpler overall and more composable - C abstractions can be allocated on top of any MemoryScope regardless of whether on- or off-heap. At the same time, we do not seem to lose much in terms of expressiveness and/or conciseness.

Conclusions & loose ends

We have shown how, by encapsulating memory access behind the MemoryAddress abstraction, we can have a much cleaner separation between low-level memory access (typically achieved via VarHandle) and higher-level memory access (typically achieved through combinators on LayoutType). This is desirable, for a number of reasons: first, high-performance clients which desire to perform low-level, off-heap memory access can do so, without paying the penalty for abstractions mainly designed for language interop (such as LayoutType) which they don't need. Secondly, this separation allows for a much simpler formulation of the SystemABI support, which now only depends on a handful of binder-free carriers (Java primitives and MemoryAddress), thus significantly reducing the surface area of foreign function support in the JVM.

Region vs. address duality

The attentive reader might have noted how MemoryAddress is in fact playing a dual role; on the one hand, it embodies a memory address, but it also represents the region of memory pointed to by the address. This region vs. address duality might be confusing and might be addressed by splitting the API for representing regions from the API for representing addresses. We could for example define a MemoryRegion interface as follows:

interface MemoryRegion {
   MemoryScope scope();
   MemoryAddress base();
   long size();
}

This would allow addresses to be represented as offset and region pairs. The MemoryScope API would need to be tweaked too in order to return a region, rather than an address (from there a client can easily obtain the address using the MemoryRegion::base method). One thing to note here is that if two addresses share the same region, then they can easily be compared by looking at the offset part. This allows partial orders on addresses to be implemented, which should simplify bound checks.
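
A minimal sketch of the offset and region pairing, and of the partial order it induces, might look as follows. All names here are hypothetical; the region carries only a size, and the scope is left out.

```java
import java.util.OptionalInt;

// Hypothetical sketch of the region/address split; scopes are omitted.
final class RegionAddressSketch {

    static final class Region {
        final long size;

        Region(long size) {
            this.size = size;
        }

        Address base() {
            return new Address(this, 0);
        }
    }

    static final class Address {
        final Region region;
        final long offset;

        Address(Region region, long offset) {
            // The bound check reduces to a comparison against the region size.
            if (offset < 0 || offset > region.size) {
                throw new IndexOutOfBoundsException("offset out of region: " + offset);
            }
            this.region = region;
            this.offset = offset;
        }

        Address offset(long bytes) {
            return new Address(region, offset + bytes);
        }

        // Partial order: only addresses within the same region are comparable.
        OptionalInt compare(Address other) {
            if (region != other.region) {
                return OptionalInt.empty();
            }
            return OptionalInt.of(Long.compare(offset, other.offset));
        }
    }
}
```

Returning an empty result for addresses in different regions makes the "partial" part of the order explicit, instead of comparing raw addresses that have no meaningful relation.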

Safety

As can be seen from some of the examples shown in this document, some parts of the APIs outlined here are inherently unsafe. Unsafety arises every time a new memory address is forged without using a scope - e.g. because we are calling a native function returning a pointer, or because we are reading a memory address whose content is a pointer. It is not hard to think of malicious ways to use these capabilities in order to forge arbitrary pointers; because of that, some provisions should be made so that the Panama API (or the unsafe parts of the API) would not be accessible by default. This could be done e.g. by making the Panama API part of a module that is not resolved by default, so that using this module would always require a command line option.

Variadic calls

Variadic calls pose a problem when it comes to implementing the low-level ABI support. The main issue is that we cannot, in the low-level view, infer layouts from arguments in the same way we do in the current Panama prototype (where we have binder carriers for each possible shape of argument layout). At the same time though, it is clear that, even in the currently implemented prototype, modeling variadic calls using Java varargs only works for downcalls, and doesn't scale at all for upcalls, where the layout and size of additional arguments is not known. To support these cases it is likely that we will introduce a low-level, ABI-related va_list-like construct, which can be used to perform variadic downcalls and upcalls (and also to interact with native calls accepting a native va_list).

Performance

Preliminary experiments have shown that VarHandle-based foreign memory access can be made fast even in the absence of any VM support: it is some 10-15% slower than using Unsafe directly, with the obvious benefit of providing more safety (bound checks, liveness checks). We are currently investigating ways to better optimize foreign memory access in the VM, so that users won't have to pay for moving away from Unsafe.

Appendix: memory access details

The tables in this section illustrate how memory access is performed by the VarHandle obtained via the LayoutPath::asVarHandle factory. For each layout, we show the allowed Java carrier types, as well as the semantics associated with the operation (typically expressed in terms of an Unsafe call). The tables rely on a helper function, swap, which can be informally described using the following pseudo-code:

T swap(Endianness targetEndianness, T value) {
   return (targetEndianness != platformEndianness) ?
         boxedType(T).reverseBytes(value) : value;
}
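
For concreteness, the swap helper can be written in plain Java for the int and long carriers, using java.nio.ByteOrder in place of the Endianness type used in the pseudo-code. The class name and the readFloat helper are illustrative assumptions; readFloat mirrors the way a float is obtained from raw int bits in the tables of this appendix.

```java
import java.nio.ByteOrder;

// Runnable counterpart of the swap pseudo-code; ByteOrder stands in for Endianness.
final class SwapSketch {
    static int swap(ByteOrder target, int value) {
        return target != ByteOrder.nativeOrder() ? Integer.reverseBytes(value) : value;
    }

    static long swap(ByteOrder target, long value) {
        return target != ByteOrder.nativeOrder() ? Long.reverseBytes(value) : value;
    }

    // The byte order that is not the platform's own.
    static ByteOrder nonNativeOrder() {
        return ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN
                ? ByteOrder.LITTLE_ENDIAN
                : ByteOrder.BIG_ENDIAN;
    }

    // A float is read as raw int bits, swapped if needed, then reinterpreted.
    static float readFloat(ByteOrder layoutOrder, int rawBits) {
        return Float.intBitsToFloat(swap(layoutOrder, rawBits));
    }
}
```

Swapping is an involution, so the same helper serves both the read and the write direction.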

We also assume that the following unsafe routines are available to convert from raw addresses to MemoryAddress and back (these routines will throw if the underlying scope is not serializable):

MemoryAddress fromRawAddress(MemoryScope scope, long addr);
long toRawAddress(MemoryAddress addr);

Finally, for simplicity, we will assume that each MemoryAddress instance has fictional base and offset fields which can be used in conjunction with unsafe operations such as Unsafe::getInt.

In general, memory access adheres to the following principles:

For the sake of readability, we have organized the tables along two axes, namely the type of memory access (read vs. write) and the layout endianness (big-endian vs. little-endian). Note also that in the semantics column of the tables below we have omitted the MemoryScope liveness checks, as well as other checks which might be required to preserve safety, and we decided instead to focus on how values are obtained/stored from/to memory.

One final note on alignment; in the following, we assume that aligned Unsafe primitives will be used for memory access; in reality, depending on the information contained in the layout path which originated the VarHandle it is possible for unaligned primitives to be used as well. In the general case, when performing memory access, the VarHandle has to check the expected alignment constraints (as derived from the layout path) against the actual alignment of the MemoryAddress instance through which memory access occurs (and, in case of mismatches, an error should be reported accordingly).
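
The alignment check described above boils down to a mask test on the effective address. The sketch below assumes power-of-two alignments expressed in bytes and a raw long address; both the class and the choice of exception types are illustrative.

```java
// Hypothetical alignment check, assuming power-of-two alignments in bytes.
final class AlignmentCheckSketch {
    static boolean isAligned(long address, long alignmentBytes) {
        if (alignmentBytes <= 0 || (alignmentBytes & (alignmentBytes - 1)) != 0) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        // For a power of two, the low bits of an aligned address are all zero.
        return (address & (alignmentBytes - 1)) == 0;
    }

    // What a VarHandle would do before touching memory.
    static void checkAligned(long address, long alignmentBytes) {
        if (!isAligned(address, alignmentBytes)) {
            throw new IllegalStateException(
                    "address " + address + " is not " + alignmentBytes + "-byte aligned");
        }
    }
}
```

When the layout path already guarantees alignment statically, this check could in principle be hoisted or dropped; that is an optimization the sketch does not attempt.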

memory read, big-endian
Layout  Carrier        Semantics
I8      byte           swap(Endianness.BE, Unsafe.getByte(addr.base, addr.offset))
I16     short          swap(Endianness.BE, Unsafe.getShort(addr.base, addr.offset))
        char           swap(Endianness.BE, Unsafe.getChar(addr.base, addr.offset))
I32     int            swap(Endianness.BE, Unsafe.getInt(addr.base, addr.offset))
I64     long           swap(Endianness.BE, Unsafe.getLong(addr.base, addr.offset))
F32     float          Float.intBitsToFloat(swap(Endianness.BE, Unsafe.getInt(addr.base, addr.offset)))
F64     double         Double.longBitsToDouble(swap(Endianness.BE, Unsafe.getLong(addr.base, addr.offset)))
U64:v   MemoryAddress  Unsafe.fromRawAddress(addr.scope(), swap(Endianness.BE, Unsafe.getLong(addr.base, addr.offset)))

memory read, little-endian
Layout  Carrier        Semantics
i8      byte           swap(Endianness.LE, Unsafe.getByte(addr.base, addr.offset))
i16     short          swap(Endianness.LE, Unsafe.getShort(addr.base, addr.offset))
        char           swap(Endianness.LE, Unsafe.getChar(addr.base, addr.offset))
i32     int            swap(Endianness.LE, Unsafe.getInt(addr.base, addr.offset))
i64     long           swap(Endianness.LE, Unsafe.getLong(addr.base, addr.offset))
f32     float          Float.intBitsToFloat(swap(Endianness.LE, Unsafe.getInt(addr.base, addr.offset)))
f64     double         Double.longBitsToDouble(swap(Endianness.LE, Unsafe.getLong(addr.base, addr.offset)))
u64:v   MemoryAddress  Unsafe.fromRawAddress(addr.scope(), swap(Endianness.LE, Unsafe.getLong(addr.base, addr.offset)))

memory write, big-endian
Layout  Carrier        Semantics
I8      byte           Unsafe.putByte(addr.base, addr.offset, swap(Endianness.BE, value))
I16     short          Unsafe.putShort(addr.base, addr.offset, swap(Endianness.BE, value))
        char           Unsafe.putChar(addr.base, addr.offset, swap(Endianness.BE, value))
I32     int            Unsafe.putInt(addr.base, addr.offset, swap(Endianness.BE, value))
I64     long           Unsafe.putLong(addr.base, addr.offset, swap(Endianness.BE, value))
F32     float          Unsafe.putInt(addr.base, addr.offset, swap(Endianness.BE, Float.floatToRawIntBits(value)))
F64     double         Unsafe.putLong(addr.base, addr.offset, swap(Endianness.BE, Double.doubleToRawLongBits(value)))
U64:v   MemoryAddress  Unsafe.putLong(addr.base, addr.offset, swap(Endianness.BE, Unsafe.toRawAddress(value)))

memory write, little-endian
Layout  Carrier        Semantics
i8      byte           Unsafe.putByte(addr.base, addr.offset, swap(Endianness.LE, value))
i16     short          Unsafe.putShort(addr.base, addr.offset, swap(Endianness.LE, value))
        char           Unsafe.putChar(addr.base, addr.offset, swap(Endianness.LE, value))
i32     int            Unsafe.putInt(addr.base, addr.offset, swap(Endianness.LE, value))
i64     long           Unsafe.putLong(addr.base, addr.offset, swap(Endianness.LE, value))
f32     float          Unsafe.putInt(addr.base, addr.offset, swap(Endianness.LE, Float.floatToRawIntBits(value)))
f64     double         Unsafe.putLong(addr.base, addr.offset, swap(Endianness.LE, Double.doubleToRawLongBits(value)))
u64:v   MemoryAddress  Unsafe.putLong(addr.base, addr.offset, swap(Endianness.LE, Unsafe.toRawAddress(value)))