Sketch of a multi-field feature for hyper-long types

value class Shape128 {
  @MultiField(2) long data;
  //long data#1 ⇐ allocated contiguously
}
value class Shape256 {
  @MultiField(4) long data;
  //long data#{1,2,3} ⇐ allocated contiguously
}
value class Shape512 {
  @MultiField(8) long data;
  //long data#{1,2,3,4,5,6,7} ⇐ allocated contiguously
}

The annotation is internal and privileged, living in jdk.internal.vm.annotation like @ForceInline, @Stable, etc.

Only one type required per shape. For SVE, one type required for each possible size, to be dynamically selected based on SVE process configuration.

Given the above restrictions, the JVM can always position the storage for a multi-field at the very end of its containing object. This is not logically necessary (nor are the restrictions), but it seems helpful to align the layout of an object that has a multi-field with the layout of an array. When the optimizer does alias analysis on array objects, all accesses at offsets at or beyond the first array element are classified as a single indexed set, while smaller offsets point to the array’s header fields (length, class, mark word). Likewise, an object with a multi-field would be accessed either at short offsets in the header (or perhaps in ad hoc instance fields), or else at offsets at or beyond the first multi-field element.
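
The offset classification described above can be modeled in a few lines. This is only a sketch under stated assumptions: the class MultiFieldLayout, its method names, and the example base offset of 16 bytes are hypothetical and are not part of any JVM interface.

```java
import java.util.Objects;

// Hypothetical layout model: header bytes first, multi-field lanes last.
final class MultiFieldLayout {
    private final long base;      // byte offset of lane 0 (assumed, e.g. 16)
    private final int laneCount;  // replication count, as in @MultiField(n)

    MultiFieldLayout(long base, int laneCount) {
        this.base = base;
        this.laneCount = laneCount;
    }

    // Addressing mode for long lanes: base + i * 8.
    long laneOffset(int i) {
        Objects.checkIndex(i, laneCount);
        return base + (long) i * Long.BYTES;
    }

    // First offset past the multi-field; with the multi-field placed last,
    // this is also the end of the object's payload.
    long endOffset() {
        return base + (long) laneCount * Long.BYTES;
    }

    // Alias classification: offsets at or beyond lane 0 form one indexed set;
    // smaller offsets address header (or ad hoc instance) fields.
    boolean inIndexedSet(long offset) {
        return offset >= base;
    }
}
```

With a base of 16 and 8 lanes, lane 3 sits at offset 40 and the payload ends at offset 80, which mirrors how an array's element region is treated.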

@RegisterAllocation(128) value class Shape128 {
  @MultiField(2) long data; …
}
@RegisterAllocation(256) value class Shape256 {
  @MultiField(4) long data; …
}
@RegisterAllocation(512) value class Shape512 {
  @MultiField(8) long data; …
}

A var handle operates on a memory-resident instance, so the optimizer has to clean up the resulting copy chains.

value class Shape512 {
  @MultiField(8) long data;
  private static final VarHandle VH_data;
  static {
    Field F_data = Shape512.class.getDeclaredField("data");  // getField would only find public fields
    long OFF_data = U.objectFieldOffset(F_data);
    VH_data = new VarHandleMultiLongs(Shape512.class, OFF_data, 8);
    // addressing mode:  VH_data(this, i) ⇒ getLong(this, OFF_data + i*8)
  }
  public long data(int i) {
    // spills ‘this’ to heap or stack in order to use Unsafe.getLong
    return (long) VH_data.get(this, i);
  }
  public Shape512 dataUpdate(int i, long x) {
    Shape512 pbuf = U.makePrivateBuffer(this);
    VH_data.set(pbuf, i, x);  // ⇒ putLong(pbuf, OFF_data + i*8, x)
    return U.finishPrivateBuffer(pbuf);
  }
}

Side path: Maybe we need a new var handle access mode, update (or with), for changing one field of a value object, returning not void but rather the updated value.

public Shape512 dataUpdate(int i, long x) {
  return (Shape512) VH_data.‘update’(this, i, x);
}
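
The intended "update" (or "with") semantics can be imitated in ordinary Java today. The following is a minimal emulation, assuming a private long[] as a stand-in for the multi-field, since @MultiField and the private-buffer operations are not available outside the JVM internals; the class name Shape512Emulated is hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-in for a 512-bit shape: eight long lanes held in a
// private array, because @MultiField is not available in ordinary Java.
final class Shape512Emulated {
    private static final int LANES = 8;
    private final long[] data;

    private Shape512Emulated(long[] data) {
        this.data = data;
    }

    static Shape512Emulated zero() {
        return new Shape512Emulated(new long[LANES]);
    }

    long data(int i) {
        return data[i];
    }

    // "update"/"with" semantics: copy the lanes, patch one lane in the copy,
    // and return the copy; the receiver is left unchanged.
    Shape512Emulated dataUpdate(int i, long x) {
        long[] copy = Arrays.copyOf(data, LANES);
        copy[i] = x;
        return new Shape512Emulated(copy);
    }
}
```

The array copy here plays the role of the private buffer: it is mutable only until it is wrapped and returned, after which no caller can observe a change to an existing value.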

Other lane types can be stored on top of the basic long type, or can be given their own “native” multi-field types.

value class Shape512 {
  @MultiField(8) long data;
  private static final VarHandle VH_data;
  private static final VarHandle VH_dataAsFloats, VH_dataAsBytes, …;
  static {
    Field F_data = Shape512.class.getDeclaredField("data");  // getField would only find public fields
    long OFF_data = U.objectFieldOffset(F_data);
    VH_data = new VarHandleMultiLongs(Shape512.class, OFF_data, 8);
    // addressing mode:  VH_data(this, i) ⇒ getLong(this, OFF_data + i*8)
    VH_dataAsFloats = new VarHandleMultiFloats(Shape512.class, OFF_data, 16);
    VH_dataAsBytes = new VarHandleMultiBytes(Shape512.class, OFF_data, 64);
    …
  }
  …
  public float dataAsFloat(int i) {
    return (float) VH_dataAsFloats.get(this, i);
  }
}

// or else, less preferably:
value class Shape512Longs { @MultiField(8) long data; }
value class Shape512Floats { @MultiField(16) float data; }
value class Shape512Bytes { @MultiField(64) byte data; }

The more complex choice (a separate native multi-field type per lane type) would, in principle, allow the optimizer to track individual field values more readily. This might be useful, but it seems less so for vectors than for other value types. This is because, unlike regular value types, vectors are usually processed as bitwise memory images, either in memory or in the vector register file; they are not usually found in scalar registers.

Direct transcoding between vector and scalar registers uses insert/extract lane instructions, which are not much faster than L1 memory loads and stores (3 cycles for single lane access vs. 4 and 5 cycles load-use and store-reload latencies quoted for Haswell). This suggests that data movement from vector registers directly to memory (whether stack or heap) is going to be at least as favorable as direct data movement between the vector unit and the scalar registers.
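
The idea of layering several lane types over one bitwise memory image can be tried out with the existing MethodHandles.byteArrayViewVarHandle API. This is a sketch only: a 64-byte byte[] stands in for the 512-bit multi-field, the class LaneViews and its method names are hypothetical, and little-endian byte order is an assumption made so that the lane overlap is predictable.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Two typed views over the same 64-byte image (a stand-in for 512 bits).
final class LaneViews {
    private static final VarHandle AS_LONGS =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);
    private static final VarHandle AS_FLOATS =
        MethodHandles.byteArrayViewVarHandle(float[].class, ByteOrder.LITTLE_ENDIAN);

    // These handles take a byte offset, so lane i of a type sits at i * size.
    static long longLane(byte[] image, int i) {
        return (long) AS_LONGS.get(image, i * Long.BYTES);
    }

    static float floatLane(byte[] image, int i) {
        return (float) AS_FLOATS.get(image, i * Float.BYTES);
    }

    static void setFloatLane(byte[] image, int i, float x) {
        AS_FLOATS.set(image, i * Float.BYTES, x);
    }
}
```

Writing 1.0f into float lane 1 changes the upper half of long lane 0, which shows that both views address the same storage rather than separate typed fields.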

Synthetic multi-vector types are straightforward to define as well, merely by increasing the replication count by a multiple:

@RegisterAllocation(128) value class Shape128x2 {
  @MultiField(2*2) long data;
}
@RegisterAllocation(256) value class Shape256x3 {
  @MultiField(4*3) long data;
}
@RegisterAllocation(512) value class Shape512x4 {
  @MultiField(8*4) long data;
}

// Nested representation would be more complex to implement
value class Shape256x5 {
  @MultiField(5) Shape256 nestedVector;
  // requires carefully contiguous layout of the nestedVectors
}
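
With the flat replication scheme, addressing a lane of a particular sub-vector is simple index arithmetic. The helper below is a hypothetical illustration (the name flatLane and its parameters do not come from the proposal); it maps a (sub-vector, lane) pair to a flat multi-field index.

```java
import java.util.Objects;

final class MultiVectorIndex {
    // Flat lane index of lane `lane` within sub-vector `vector`, where each
    // sub-vector has lanesPerVector lanes and there are vectorCount of them.
    static int flatLane(int vector, int lane, int lanesPerVector, int vectorCount) {
        Objects.checkIndex(vector, vectorCount);
        Objects.checkIndex(lane, lanesPerVector);
        return vector * lanesPerVector + lane;
    }
}
```

For a Shape256x3 (4 long lanes per sub-vector, 3 sub-vectors), lane 1 of sub-vector 2 is flat lane 9 of the 12-lane multi-field.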

None of the above addresses alignment; this is a difficult problem because the GC does not provide alignment services beyond a set maximum, such as 64 or 128 bits. Aligning 512-bit vectors would require special allocation paths (and reallocation/copying paths) in the GC. This seems worth thinking about, but not pushing on any time soon. Alignment constraints could be readily encoded as annotations:

value class Shape512 {
  @AlignField(512) @MultiField(8) long data;
  //long data#{1,2,3,4,5,6,7} ⇐ allocated contiguously and aligned
}
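
To give a feel for what an allocator would have to do, here is a small sketch of the offset arithmetic behind such an alignment request. The class FieldAlignment and its methods are hypothetical; the sketch assumes power-of-two alignments expressed in bytes (so 512 bits is 64 bytes) and says nothing about how a real GC would relocate objects.

```java
final class FieldAlignment {
    // Round offset up to the next multiple of alignBytes (a power of two).
    static long alignUp(long offset, long alignBytes) {
        if (alignBytes <= 0 || Long.bitCount(alignBytes) != 1) {
            throw new IllegalArgumentException("alignment must be a power of two");
        }
        return (offset + alignBytes - 1) & -alignBytes;
    }

    // Worst-case padding when addresses are only guaranteed to be multiples
    // of guaranteedBytes but the field needs requiredBytes alignment.
    static long worstCasePadding(long guaranteedBytes, long requiredBytes) {
        return Math.max(0, requiredBytes - guaranteedBytes);
    }
}
```

Under these assumptions, a 64-bit (8-byte) guarantee can cost up to 56 bytes of padding per 512-bit vector, and the padding changes whenever the object moves, which is why the copying paths would need to cooperate.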