## Lifetimes in the Foreign Function & Memory API

January 2023

The Foreign Function & Memory API (FFM for short) is centered around the idea of explicit lifetime management. That is, memory segments allocated using the FFM API are assigned a lifetime (known as SegmentScope), which determines when the segments can be accessed (e.g. whether their backing region of memory is still available), and by whom (e.g. which threads can access the memory segment). In this document we show why existing approaches to explicit memory management such as malloc/free are not sufficient for the FFM API, and how reasoning about lifetimes helps programs using the FFM API avoid pesky temporal bugs (such as use-after-free).

### Why not just malloc?

When designing an API to manage explicit allocation and deallocation of off-heap regions of memory it is very tempting to start from C's malloc/free memory management primitives. This is indeed what the first iteration of the FFM API attempted to do, as shown below:

    try (MemorySegment segment = MemorySegment.allocateNative(100)) { // allocate off-heap memory here (malloc)
        // use segment
    } // memory deallocated here (free)

This approach is certainly easy to understand - there's a MemorySegment factory that can be used to allocate new off-heap memory. MemorySegment implements AutoCloseable so that it can be used in a try-with-resources construct. When the memory segment is closed, its backing region of off-heap memory will be deallocated.

Unfortunately malloc/free happen to be the wrong primitives to manage off-heap memory resources, for reasons that will be explored further in the following sections.

#### Allocation granularity

With malloc/free each allocated region of memory gets its own lifetime which has to be managed independently. Real-world native code often coordinates the allocation of logically related regions of memory, so that they can safely refer to each other, without the fear of use-after-free errors. A notable example of this approach is an arena - a custom allocator that allocates several segments that are backed by the same underlying memory region (although other implementations are possible) and where all the allocated segments can be freed at once. We will come back to this point later in this section.

Consider the case where an array of Java strings has to be converted into a single native segment (e.g. a char** in C). The resulting segment contains several pointers (one for each Java string to be converted), each referring to a null-terminated C string. We could, in principle, allocate a new native segment for each new string stored inside the array, as follows:

    MemorySegment toStringArray(String[] strings) {
        MemorySegment array = MemorySegment.allocateNative(ADDRESS.byteSize() * strings.length);
        for (int i = 0 ; i < strings.length ; i++) {
            // one independent segment per string, with room for the terminating NUL
            MemorySegment cString = MemorySegment.allocateNative(strings[i].length() + 1);
            cString.setUtf8String(0, strings[i]);
            array.setAtIndex(ADDRESS, i, cString);
        }
        return array;
    }

While this would work, the above code ends up creating N + 1 segments (one for each string to convert plus the enclosing pointer array), each of which features its own independent lifetime. This means that it would be possible, for instance, to close the segment corresponding to the array before any of the underlying string segments are closed:

    MemorySegment array = toStringArray(new String[] { "Hello", "World", "!" });
    array.close(); // memory leak: the string segments are never closed

The lifetimes associated with the various segments created in the above example are captured in the diagram below:

As can be seen, each segment (represented as a black circle) gets its own independent lifetime (represented as a grey box). Since these lifetimes are independent, it is possible for applications to close these lifetimes in any order, even in ways that might lead to use-after-free bugs.

What might seem like a contrived example is in reality quite the norm in languages that (unlike Java) do not provide automatic memory deallocation. Manipulating many common data structures (e.g. linked lists) can lead to problems that are not too dissimilar from the ones shown in the previous example. Summing up: while it's easy to allocate memory with malloc, it is not as easy to know when memory can be freed (or at least to know when that can be done safely).
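The hazard can be reproduced without touching native memory. The sketch below is a toy model in plain Java, not the FFM API: `ToyOwnedBlock` and `IndependentLifetimes` are hypothetical names, and a per-block liveness flag stands in for an independently freed allocation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a malloc'ed block: every block has its own lifetime.
final class ToyOwnedBlock implements AutoCloseable {
    private final String payload;
    private boolean alive = true;

    ToyOwnedBlock(String payload) {
        this.payload = payload;
    }

    boolean isAlive() {
        return alive;
    }

    String read() {
        if (!alive) throw new IllegalStateException("use after free");
        return payload;
    }

    @Override
    public void close() {
        alive = false; // frees this block only
    }
}

final class IndependentLifetimes {
    // One block per string: N independent lifetimes the caller must track.
    static List<ToyOwnedBlock> allocateAll(String... strings) {
        List<ToyOwnedBlock> blocks = new ArrayList<>();
        for (String s : strings) {
            blocks.add(new ToyOwnedBlock(s));
        }
        return blocks;
    }

    // Counts the blocks that are still allocated.
    static long liveCount(List<ToyOwnedBlock> blocks) {
        return blocks.stream().filter(ToyOwnedBlock::isAlive).count();
    }
}
```

Closing a single block leaves the others allocated (a leak if the list is then dropped), while anyone still holding the closed block fails on `read()`. Nothing in the model ties the lifetimes together, which is exactly the problem.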

An alternative approach to malloc/free, known as region-based allocation, has been known for a very long time, and has even inspired some safe C dialects. With region-based memory management, memory is allocated in regions (sometimes also known as areas or arenas). When a region is released, all the allocations obtained from that region will atomically become invalid. It is easy to see how region-based memory management solves the problem of managing a complex web of pointers: in our example above, we can imagine our array segment, as well as the string segments contained in it, to belong to the same region (and thus to feature the same lifetime). Closing the region of the array segment would automatically release the memory associated with the string segments associated with the array. In other words, with region-based memory allocation, our lifetime diagram would look like this:

That is, there is a single lifetime for all the segments created by the toStringArray method. When that lifetime ends, all the segments are deallocated, atomically.
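As a rough illustration of the region idea, here is a toy model in plain Java. It is not the FFM API: `ToyRegion`, `ToyBlock` and `ToyRegionDemo` are hypothetical names, and heap byte arrays stand in for off-heap memory. The point is only that blocks have no lifetime of their own and defer to the region they came from.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Toy region: a single liveness flag shared by everything allocated from it.
final class ToyRegion implements AutoCloseable {
    private boolean alive = true;

    boolean isAlive() {
        return alive;
    }

    ToyBlock allocate(int size) {
        if (!alive) throw new IllegalStateException("region already closed");
        return new ToyBlock(this, size);
    }

    @Override
    public void close() {
        alive = false; // invalidates every block at once
    }
}

// A block has no lifetime of its own; it defers to its region.
final class ToyBlock {
    private final ToyRegion region;
    private final byte[] data;

    ToyBlock(ToyRegion region, int size) {
        this.region = region;
        this.data = new byte[size];
    }

    byte get(int index) {
        checkAlive();
        return data[index];
    }

    void set(int index, byte value) {
        checkAlive();
        data[index] = value;
    }

    private void checkAlive() {
        if (!region.isAlive()) throw new IllegalStateException("region closed");
    }
}

final class ToyRegionDemo {
    // Same shape as toStringArray: one block per string, one shared lifetime.
    static List<ToyBlock> toBlocks(ToyRegion region, String... strings) {
        List<ToyBlock> blocks = new ArrayList<>();
        for (String s : strings) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            ToyBlock block = region.allocate(bytes.length + 1); // room for a trailing 0
            for (int i = 0; i < bytes.length; i++) {
                block.set(i, bytes[i]);
            }
            blocks.add(block);
        }
        return blocks;
    }
}
```

In this model the caller never closes individual blocks. A single `region.close()` makes every block returned by `toBlocks` fail on access, so there is no ordering to get wrong.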

#### Allocation safety

Another problem with malloc/free is that anybody can free a pointer, as long as they have access to it. It is often down to API documentation to specify who should be responsible for freeing a given pointer - with the underlying assumption that violating such guidelines would result in undefined behavior. Consider the following method:

    double distance(MemorySegment point) {
        try (point) {
            int point_x = point.get(JAVA_INT, 0);
            int point_y = point.get(JAVA_INT, 4);
            return Math.hypot(point_x, point_y);
        } // memory deallocated here
    }

The method implementation is rather straightforward: it accepts a memory segment, which presumably has been allocated by some other client, and it computes the distance of the point coordinates from the origin. Crucially, the code is surrounded by a try-with-resources block, which means that the memory segment will be closed after a distance value has been computed. Now, this might be exactly how the point segment was intended to be used in the first place but (as with malloc/free in C) there is no way for us to know that. In fact, it could be that the distance method is accidentally releasing the point segment, a mistake that will show up at a later point, in the form of a crash, or silent memory corruption.

In other words, with a design inspired by malloc/free, clients get no protection when sharing memory segments. Every segment can be closed at any time and by anyone. While there are things the FFM API can do to prevent this, such as letting clients opt out of closeability (e.g. MemorySegment::asNonCloseable), arguably we should strive for a safer design.

#### External allocation

So far we have only discussed regions of memory that are allocated in Java, using one of the allocation primitives provided by the FFM API. But, especially when interacting with native code, this is not always the case. The standard C library defines several pairs of functions that can be used to allocate and release resources. Some examples:

• malloc/free
• mmap/munmap
• dlopen/dlclose
• fopen/fclose
• opendir/closedir

Of these, the dlopen/dlclose example is particularly interesting. dlopen is a function that can be used to open a shared library with a given name. If the call completes normally, the function returns an opaque library handle, which can be used to look up library symbols using dlsym. When a client no longer needs a library, it can unload it: this can be done by calling dlclose with the associated library handle.

A typical usage of dlopen is shown below:

    #include <dlfcn.h>

    void *libc = dlopen("libc.so.6", RTLD_NOW); // library loaded here
    void *qsort  = dlsym(libc, "qsort");
    void *strlen = dlsym(libc, "strlen");
    void *printf = dlsym(libc, "printf");
    ...
    dlclose(libc); // library unloaded here

The attentive reader might have noticed that usage of dlopen/dlsym/dlclose follows a pattern similar to the one in the array example given above. That is, a library is loaded with dlopen. All the symbols obtained from the library (using dlsym) are only valid as long as they are accessed before the library is closed (using dlclose). In other words, the library handle and all the symbols obtained from it share the same lifetime. The intended lifetimes of the pointers used in the above code can thus be captured in the following diagram:

In this diagram, we only have a single lifetime, which is used to manage the libc library handle, as well as all the library symbols obtained from it.

It is then perhaps obvious that, as in the array case, an API design excessively biased towards malloc/free would be insufficient to completely address this use case. One could imagine creating a stateful library lookup object, such as this (for the sake of clarity, downcall method handles have been replaced with pseudo-code):

    class LibraryLookup {
        final MemorySegment handle;
        final List<MemorySegment> symbols = new ArrayList<>();

        LibraryLookup(String name) {
            this.handle = <dlopen>
        }

        MemorySegment dlsym() {
            MemorySegment sym = <dlsym>(handle);
            symbols.add(sym); // every symbol has to be tracked by hand
            return sym;
        }

        void close() {
            symbols.forEach(MemorySegment::close);
            <dlclose>(handle);
            handle.close();
        }
    }

Something like this might work, but would also be inefficient: instead of using a single lifetime to manage all the native segments created by the lookup object, we have to use (and track) multiple independent lifetimes. This means that the looked up segments need to be stored in a list, so that we can close them when the library lookup object is closed. More subtly, the semantics of the close method lacks atomicity guarantees, and it might be possible for other threads to observe partially closed states.

If this might seem like a contrived example, note that pairs of constructor/destructor functions are a very common pattern in C/C++ APIs, even beyond standard libraries. For instance, the TensorFlow C API has several symmetric function pairs, such as TF_AllocateTensor/TF_DeleteTensor. While not all such API points might be securable, some (like dlopen) might, and the FFM API should have a story for this.

The underlying problem with malloc/free is that they are pointer-centric primitives. While this approach works fine in simple cases, it fails to scale to complex webs of inter-related regions of memory, as each region has to be managed in isolation, with many risks of memory leaks (if only some regions are deallocated) or use-after-free bugs (if some regions are deallocated too early).

Ideally, a right-sized primitive would instead be lifetime-centric. Such a primitive would let us reason about the lifetime of a native allocation. The same lifetime could then be shared across multiple regions of memory that are logically related.

For this reason, the FFM API has an abstraction called SegmentScope, which is used to model the lifetime of one or more memory segments. There are several kinds of scopes:

• the global scope: segments associated with this scope are never deallocated;
• the automatic scope: segments associated with this scope are deallocated, automatically, by the garbage collector;
• the arena scope: an arena scope is created with an Arena and all the segments associated with this scope are invalidated when the arena is closed (using the Arena::close method).

When a memory segment is created, clients need to specify its scope, using an explicit SegmentScope parameter. For instance, the code below allocates a native segment that should be deallocated automatically by the garbage collector:

    void processData() {
        MemorySegment data = MemorySegment.allocateNative(100, SegmentScope.auto());
        ... // use the 'data' segment
    } // The GC will take care of deallocation

If a client requires timely deallocation, they can create a new Arena, and perform allocation using the arena scope, as follows:

    try (Arena arena = Arena.openConfined()) {
        var segment = arena.allocate(100);
    } // memory deallocated here

Clients can obtain the scope of a memory segment, using the MemorySegment::scope accessor. Not only is this handy to query the lifetime of a memory segment, but it can also be used to allocate new memory segments that feature the same lifetime as that of another memory segment, as we shall see in a later example.

#### Grouping segments

With the primitives provided by the FFM API we can now revisit our code for creating an array segment containing several string pointers, as follows:

    MemorySegment toStringArray(String[] strings, SegmentScope scope) {
        MemorySegment array = MemorySegment.allocateNative(ADDRESS.byteSize() * strings.length, scope);
        for (int i = 0 ; i < strings.length ; i++) {
            // every string segment shares the caller-provided scope
            MemorySegment cString = MemorySegment.allocateNative(strings[i].length() + 1, scope);
            cString.setUtf8String(0, strings[i]);
            array.setAtIndex(ADDRESS, i, cString);
        }
        return array;
    }

This code is remarkably similar to the one shown previously. The only difference is that the method now accepts a SegmentScope parameter, which is used to indicate the lifetime that should be associated with the segments allocated within the method. Since all the segments are now allocated in the same lifetime, they will be either all alive or all invalid, thus reducing opportunities for memory leaks and use-after-free bugs.

#### Securing segments

If we go back to the distance function seen above, it is easy to see how segments are now safe by default. A memory segment only exposes its scope, and a scope cannot be used directly to perform deallocation. This means that the client that performs allocation using an Arena owns the allocated segments. It is up to that client to decide when to close the arena. Methods that simply consume memory segments (such as distance) cannot release memory, unless the owner of said memory also shares the corresponding Arena object, as demonstrated below:

    double distance(MemorySegment point) {
        int point_x = point.get(JAVA_INT, 0);
        int point_y = point.get(JAVA_INT, 4);
        double result = Math.hypot(point_x, point_y);
        point.scope().close(); // <--- ERROR no close() method on SegmentScope!
        return result;
    }
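The ownership split can also be sketched outside the FFM API. The following toy model in plain Java uses hypothetical names throughout (`ToyScope`, `ToyArena`, `ToyPoint`, `ToyGeometry`). It assumes nothing about how the real API is implemented and only illustrates that a consumer holding a read-only scope view has no way to end the lifetime.

```java
// Read-only view of a lifetime: it can be queried, but offers no close().
interface ToyScope {
    boolean isAlive();
}

// Only the arena can end the lifetime, so whoever holds it is the owner.
final class ToyArena implements AutoCloseable {
    private boolean alive = true;
    private final ToyScope scope = () -> alive;

    ToyScope scope() {
        return scope;
    }

    ToyPoint allocatePoint(int x, int y) {
        if (!alive) throw new IllegalStateException("arena already closed");
        return new ToyPoint(scope, x, y);
    }

    @Override
    public void close() {
        alive = false;
    }
}

// Stand-in for a segment: it exposes its scope, never its arena.
final class ToyPoint {
    private final ToyScope scope;
    private final int x;
    private final int y;

    ToyPoint(ToyScope scope, int x, int y) {
        this.scope = scope;
        this.x = x;
        this.y = y;
    }

    ToyScope scope() {
        return scope;
    }

    int x() {
        checkAlive();
        return x;
    }

    int y() {
        checkAlive();
        return y;
    }

    private void checkAlive() {
        if (!scope.isAlive()) throw new IllegalStateException("point is no longer valid");
    }
}

final class ToyGeometry {
    // A pure consumer: it can read the point but has no way to free it.
    static double distance(ToyPoint point) {
        return Math.hypot(point.x(), point.y());
    }
}
```

Here `distance` can be handed to any caller without risk of it releasing the point. To grant that right, the owner would have to pass the `ToyArena` itself, which makes the transfer of ownership visible in the method signature.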

#### Unsafe segments

Reasoning about lifetimes is so crucial to prevent use-after-free bugs that it would be desirable to be able to retroactively attach lifetimes to allocations that occurred outside Java code, i.e. in a native library. For instance, the dlopen and dlclose functions naturally denote a lifetime, but one that we cannot capture using a pointer-centric API such as malloc/free.

With SegmentScope, we can now create unsafe segments from raw pointers, by giving them a size, a scope and an (optional) cleanup action. This is a crucial capability which allows us to define a safe library lookup abstraction on top of dlopen/dlclose:

    class LibraryLookup {
        final MemorySegment handle;

        LibraryLookup(String name, SegmentScope scope) {
            MemorySegment rawHandle = <dlopen>
            this.handle = MemorySegment.ofAddress(rawHandle.address(), 0, scope,
                                                  () -> <dlclose>(rawHandle));
        }

        MemorySegment dlsym() {
            MemorySegment rawSym = <dlsym>(handle);
            return MemorySegment.ofAddress(rawSym.address(), 0, handle.scope());
        }
    }

In this revised code, the LibraryLookup constructor now accepts a scope, which is used to model the lifetime of the loaded library. That scope is used to create an unsafe segment that wraps the library handle address. This means that the handle will only be accessible while the scope is alive. Moreover, all the segments returned by dlsym are also attached to the same scope: this means that when the handle scope is closed, all the symbols derived from it will also be made invalid, atomically. There's also no need to provide a close method: the library is unloaded automatically when the provided scope becomes invalid (the cleanup action associated with the library handle will take care of calling dlclose on the handle).
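The role of the cleanup action can be sketched with a toy model in plain Java. This is an illustration under stated assumptions, not the FFM implementation: `ToyLifetime`, `ToySymbol` and `ToyLibrary` are hypothetical names, and a log of strings stands in for the real dlopen/dlclose calls.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Toy lifetime that runs registered cleanup actions exactly once, on close.
final class ToyLifetime implements AutoCloseable {
    private boolean alive = true;
    private final Deque<Runnable> cleanups = new ArrayDeque<>();

    boolean isAlive() {
        return alive;
    }

    void addCleanup(Runnable action) {
        if (!alive) throw new IllegalStateException("lifetime already ended");
        cleanups.push(action);
    }

    @Override
    public void close() {
        if (!alive) return;        // closing twice is a no-op
        alive = false;             // everything attached becomes invalid first
        while (!cleanups.isEmpty()) {
            cleanups.pop().run();  // most recently registered action runs first
        }
    }
}

// Stand-in for a symbol address: valid only while the library's lifetime is.
final class ToySymbol {
    private final String name;
    private final ToyLifetime lifetime;

    ToySymbol(String name, ToyLifetime lifetime) {
        this.name = name;
        this.lifetime = lifetime;
    }

    String name() {
        if (!lifetime.isAlive()) throw new IllegalStateException("library unloaded");
        return name;
    }
}

// Stand-in for the lookup object; the log records the simulated calls.
final class ToyLibrary {
    private final ToyLifetime lifetime;

    ToyLibrary(String name, ToyLifetime lifetime, List<String> log) {
        if (!lifetime.isAlive()) throw new IllegalStateException("lifetime already ended");
        this.lifetime = lifetime;
        log.add("open " + name);                             // simulated dlopen
        lifetime.addCleanup(() -> log.add("close " + name)); // simulated dlclose
    }

    ToySymbol lookup(String symbol) {
        if (!lifetime.isAlive()) throw new IllegalStateException("library unloaded");
        return new ToySymbol(symbol, lifetime);              // shares the library's lifetime
    }
}
```

Note that `ToyLibrary` keeps no list of symbols and has no `close` method of its own. Ending the shared lifetime both invalidates every symbol and triggers the simulated unload.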

It is easy to see how this is a vastly superior solution to the one proposed in a previous section. Not only is this version far more efficient (there is no need to track symbols in a separate list), but it is also safer, since the handle and all its symbols are invalidated atomically. Note also how MemorySegment::ofAddress unifies all the allocation primitives in the FFM API: MemorySegment::allocateNative is a simple wrapper around MemorySegment::ofAddress, which calls malloc and wraps the resulting segment in the provided scope. Other primitives such as FileChannel::map or Linker::upcallStub can be defined in a similar fashion.

#### Custom allocators

A crucial use case for the FFM API is the ability to define custom allocation policies. When using custom allocators, it is very common to create a pool of native segments, which are then recycled across clients. When defining such allocators it is again critical to reason about the lifetime of an allocator, since that lifetime determines when memory can be safely reused across clients.

In this section we will show how to build a memory pool which is backed by a single native segment. Slices of the segment are recycled across multiple clients. This can be achieved by having the clients interact with a custom arena:

    class SlicingPool {
        final MemorySegment pool = MemorySegment.allocateNative(1024, SegmentScope.auto());
        boolean isAcquired = false;

        public Arena acquire() {
            if (isAcquired) {
                throw new IllegalStateException("An allocator is already in use");
            }
            isAcquired = true;
            return new SlicingPoolAllocator();
        }

        // inner (non-static) class, so that it can see 'pool' and 'isAcquired'
        class SlicingPoolAllocator implements Arena {
            final Arena arena = Arena.openConfined();
            final SegmentAllocator slicing = SegmentAllocator.slicingAllocator(pool);

            public MemorySegment allocate(long byteSize, long byteAlignment) {
                MemorySegment segment = slicing.allocate(byteSize, byteAlignment);
                return MemorySegment.ofAddress(segment.address(), byteSize, arena.scope());
            }

            // the remaining Arena methods delegate to the wrapped arena
            public SegmentScope scope() {
                return arena.scope();
            }

            public boolean isCloseableBy(Thread thread) {
                return arena.isCloseableBy(thread);
            }

            public void close() {
                isAcquired = false;
                arena.close();
            }
        }
    }

Clients might use a slicing pool as follows:

    SlicingPool pool = new SlicingPool();
    ...
    try (Arena slicingArena1 = pool.acquire()) {
        MemorySegment segment1 = slicingArena1.allocate(100);
    } // 'segment1' becomes invalid here
    ...
    try (Arena slicingArena2 = pool.acquire()) {
        MemorySegment segment2 = slicingArena2.allocate(100);
    } // 'segment2' becomes invalid here

This is relatively straightforward. A client creates a slicing pool. Then, it obtains a new arena (slicingArena1) from the pool (this is done by calling the pool's acquire method). Crucially, since the scope of the pool segment is the automatic scope, the pool segment will be kept alive as long as the slicing arena is alive. The lifetime relationship between the slicing arena and the parent pool can be represented as follows:

After performing some allocations, the first slicing arena is closed, and all the segments created by it will become invalid. Crucially, since the arena lifetime is no longer valid, the pool memory can now safely be reused by another client (see slicingArena2). This new state is captured in the following diagram:

Note that SlicingPool also prevents clients from obtaining multiple slicing arenas at the same time. As such, the following code would fail as expected:

    SlicingPool pool = new SlicingPool();
    ...
    try (Arena slicingArena1 = pool.acquire()) {
        MemorySegment segment1 = slicingArena1.allocate(100);
        try (Arena slicingArena2 = pool.acquire()) { // error!
            ...
        }
    }

Since each slicing arena creates a new slicing allocator from the same segment, having two slicing arenas open at the same time would lead to memory aliasing bugs. These issues are taken care of by the custom arena implementation shown above.
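The recycling discipline can be exercised without the preview API through a toy model in plain Java. The names `ToyPool`, `ToyPoolArena` and `ToySlice` are hypothetical, a heap byte array stands in for the pooled native segment, and a bump pointer stands in for the slicing allocator. This is a sketch of the idea rather than of the FFM classes themselves.

```java
// Toy pool: one backing array handed out to a single client at a time.
final class ToyPool {
    private final byte[] backing = new byte[1024];
    private boolean acquired = false;

    ToyPoolArena acquire() {
        if (acquired) throw new IllegalStateException("An allocator is already in use");
        acquired = true;
        return new ToyPoolArena();
    }

    // Lifetime of one client's slices; closing it hands the pool back.
    final class ToyPoolArena implements AutoCloseable {
        private boolean alive = true;
        private int next = 0; // bump pointer into the backing array

        ToySlice allocate(int size) {
            if (!alive) throw new IllegalStateException("arena already closed");
            if (size < 0 || size > backing.length - next) {
                throw new IllegalArgumentException("pool cannot satisfy a request of " + size + " bytes");
            }
            ToySlice slice = new ToySlice(this, next, size);
            next += size;
            return slice;
        }

        @Override
        public void close() {
            if (!alive) return; // a stale second close must not release the pool
            alive = false;      // every slice from this arena becomes invalid
            acquired = false;   // the backing array can now be recycled
        }
    }

    // A window into the backing array, valid only while its arena is open.
    final class ToySlice {
        private final ToyPoolArena arena;
        private final int offset;
        private final int size;

        ToySlice(ToyPoolArena arena, int offset, int size) {
            this.arena = arena;
            this.offset = offset;
            this.size = size;
        }

        int offset() {
            return offset;
        }

        byte get(int index) {
            check(index);
            return backing[offset + index];
        }

        void set(int index, byte value) {
            check(index);
            backing[offset + index] = value;
        }

        private void check(int index) {
            if (!arena.alive) throw new IllegalStateException("slice is no longer valid");
            if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index " + index);
        }
    }
}
```

In this sketch a second arena starts again at offset 0, so it reuses exactly the bytes the first client wrote. That reuse is safe only because every slice of the first arena stopped being accessible when that arena was closed. The toy does not zero recycled memory, which a real pool may or may not want to do.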

### Conclusions

In this document we have shown how the C malloc/free functions do not provide a solid foundation to build a truly general FFM API. These primitives are too fine-grained, as they allow for each new memory region to be managed independently, which can often lead to temporal bugs such as memory leaks and/or use-after-free. Instead, a more robust approach is to capture lifetimes as a first-class abstraction in the FFM API, namely SegmentScope. We can then model inter-related regions of memory as native memory segments that share the same scope. Crucially, by exposing scopes in the API, we allow for retroactively attaching lifetimes to native allocations occurring outside Java code. Thus, not only does the FFM API provide a safe view over foreign memory that is allocated and deallocated from Java code, but it also allows developers interacting with native libraries to create safe views of memory regions created by said libraries. Finally, the ability to associate lifetimes with allocators comes in handy when defining custom allocators which need to recycle memory across multiple clients in a safe and efficient fashion.