Package jdk.incubator.vector


Incubating Feature. Will be removed in a future release.

Classes to express vector computations that, given suitable hardware and runtime ability, are accelerated using vector hardware instructions.

Vector computations consist of a sequence of operations on vectors. A vector is a fixed sequence of scalar values; a scalar value is a single unit of value such as an int, a long, a float and so on. Operations on vectors typically perform the equivalent scalar operation on all scalar values of the participating vectors, usually generating a vector result. When run on a supporting platform, these operations can be executed in parallel by the hardware. This style of parallelism is called Single Instruction Multiple Data (SIMD) parallelism.

The abstract class Vector represents an ordered immutable sequence of values of the same element type e, which is one of the following primitive types: byte, short, int, long, float, or double. The type variable E corresponds to the boxed element type, specifically the class that wraps a value of e in an object (such as the Integer class, which wraps a value of int).

Vector declares a set of vector operations (methods) that are common to all element types (such as addition). Subclasses of Vector corresponding to a specific element type declare further operations that are specific to that element type (such as access to element values in lanes, logical operations on values of integral element types, or transcendental operations on values of floating point element types).

There are six abstract subclasses of Vector corresponding to the supported set of element types: ByteVector, ShortVector, IntVector, LongVector, FloatVector, and DoubleVector. In addition to element type, vectors are parameterized by their shape, which is their length in bits. The supported shapes are represented by the enum VectorShape. The combination of element type and shape determines a vector species, represented by VectorSpecies. The various typed vector classes expose static constants corresponding to the supported species, and static methods on these types generally take a species as a parameter. For example, FloatVector.fromArray() creates and returns a float vector of the specified species, with elements loaded from the specified float array.

The species instance for a specific combination of element type and shape can be obtained by reading the appropriate static field, as follows:

VectorSpecies<Float> s = FloatVector.SPECIES_256;

Code that is agnostic to species can request the "preferred" species for a given element type, where the optimal size is selected for the current platform:

VectorSpecies<Float> s = FloatVector.SPECIES_PREFERRED;
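As a rough illustration of how element type and shape combine, the sketch below computes how many elements a vector of a given species would hold. It is plain Java and does not call this package; the class and method names are hypothetical, and it assumes that a shape denotes a size in bits (as the 256 and 512 in the constant names suggest).

```java
/** Plain-Java sketch; the names here are hypothetical and not part of jdk.incubator.vector. */
final class LaneCountSketch {

    /**
     * Number of elements that fit into a vector of the given bit size,
     * assuming the element size divides the vector size evenly.
     */
    static int laneCount(int shapeBits, int elementBits) {
        if (elementBits <= 0 || shapeBits % elementBits != 0) {
            throw new IllegalArgumentException("element size must evenly divide the vector size");
        }
        return shapeBits / elementBits;
    }
}
```

Under that assumption, laneCount(256, Float.SIZE) yields 8 and laneCount(512, Float.SIZE) yields 16.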

Here is an example of multiplying the elements of two float arrays a and b using vector computation and storing the result in array c.


 static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_512;

 void vectorMultiply(float[] a, float[] b, float[] c) {
   int i = 0;
   // It is assumed array arguments are of the same size.
   // The bit mask rounds a.length down to a multiple of the species length,
   // which assumes the species length is a power of two.
   for (; i < (a.length & ~(SPECIES.length() - 1));
            i += SPECIES.length()) {
     FloatVector va = FloatVector.fromArray(SPECIES, a, i);
     FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
     FloatVector vc = va.mul(vb);
     vc.intoArray(c, i);
   }

   // Scalar loop for the remaining tail elements
   for (; i < a.length; i++) {
     c[i] = a[i] * b[i];
   }
 }
 
The scalar computation after the vector computation is required to process the tail of elements, the length of which is smaller than the species length. The example above uses vectors hardcoded to a concrete shape (512-bit). Instead, we could use the preferred species as shown below, so that the code dynamically adapts to the optimal shape for the platform on which it runs.

 static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
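The loop bound in the example may be easier to follow in isolation. The sketch below is plain Java with hypothetical names, and it assumes the species length is a power of two (which is what makes the bit mask valid). It shows how the array length splits into a portion handled by full vectors and a scalar tail.

```java
/** Plain-Java sketch of the loop-bound arithmetic; the names are hypothetical. */
final class LoopBoundSketch {

    /** Largest multiple of lanes that does not exceed length. */
    static int vectorLoopBound(int length, int lanes) {
        // The mask trick only works when lanes is a positive power of two.
        if (lanes <= 0 || (lanes & (lanes - 1)) != 0) {
            throw new IllegalArgumentException("lanes must be a positive power of two");
        }
        return length & ~(lanes - 1);
    }

    /** Number of trailing elements left for the scalar loop. */
    static int tailLength(int length, int lanes) {
        return length - vectorLoopBound(length, lanes);
    }
}
```

For an array of 100 floats and a species of 16 lanes, the vector loop would cover indices 0 to 95 and the scalar loop the remaining 4 elements. An array shorter than one vector is processed entirely by the scalar loop.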
 

Vector operations

We use the term lanes when defining operations on vectors. The number of lanes in a vector is the number of scalar elements it holds. For example, a vector of element type float and shape VectorShape.S_256_BIT has eight lanes (256 bits divided by 32 bits per float). Vector operations can be grouped into various categories, and their behavior is generally specified as follows:
  • A lane-wise unary operation operates on one input vector to produce a result vector. For each lane of the input vector the lane element is operated on using the specified scalar unary operation and the element result is placed into the vector result at the same lane. The following pseudocode expresses the behavior of this operation category, where e is the element type and EVector corresponds to the primitive Vector type:
    
     EVector a = ...;
     e[] ar = new e[a.length()];
     for (int i = 0; i < a.length(); i++) {
         ar[i] = scalar_unary_op(a.get(i));
     }
     EVector r = EVector.fromArray(a.species(), ar, 0);
     
    Unless otherwise specified the input and result vectors will have the same element type and shape.
  • A lane-wise binary operation operates on two input vectors to produce a result vector. For each lane of the two input vectors, a and b say, the corresponding lane elements from a and b are operated on using the specified scalar binary operation and the element result is placed into the vector result at the same lane. The following pseudocode expresses the behavior of this operation category:
    
     EVector a = ...;
     EVector b = ...;
     e[] ar = new e[a.length()];
     for (int i = 0; i < a.length(); i++) {
         ar[i] = scalar_binary_op(a.get(i), b.get(i));
     }
     EVector r = EVector.fromArray(a.species(), ar, 0);
     
    Unless otherwise specified the two input and result vectors will have the same element type and shape.
  • Generalizing from unary and binary operations, a lane-wise n-ary operation operates on n input vectors to produce a result vector. For each lane, the n corresponding lane elements, one from each input vector, are operated on using the specified n-ary scalar operation and the element result is placed into the vector result at the same lane. Unless otherwise specified the n input and result vectors will have the same element type and shape.
  • A vector reduction operation applies an accumulation function to all the lane elements of an input vector to produce a scalar result. If the reduction operation is associative then the result may be accumulated by operating on the lane elements in any order using a specified associative scalar binary operation and identity value. Otherwise, the reduction operation specifies the behavior of the accumulation function. The following pseudocode expresses the behavior of this operation category if it is associative:
    
     EVector a = ...;
     e r = <identity value>;
     for (int i = 0; i < a.length(); i++) {
         r = assoc_scalar_binary_op(r, a.get(i));
     }
     
    Unless otherwise specified the scalar result type and element type will be the same.
  • A lane-wise binary test operation operates on two input vectors to produce a result mask. For each lane of the two input vectors, a and b say, the corresponding lane elements from a and b are operated on using the specified scalar binary test operation and the boolean result is placed into the mask at the same lane. The following pseudocode expresses the behavior of this operation category:
    
     EVector a = ...;
     EVector b = ...;
     boolean[] ar = new boolean[a.length()];
     for (int i = 0; i < a.length(); i++) {
         ar[i] = scalar_binary_test_op(a.get(i), b.get(i));
     }
     VectorMask<E> r = VectorMask.fromArray(a.species(), ar, 0);
     
    Unless otherwise specified the two input vectors and result mask will have the same element type and shape.
  • The prior categories of operation can be said to operate within the vector lanes, where lane access is uniformly applied to all vectors: the scalar operation is applied to elements taken from the input vectors at the same lane and, if appropriate, its result is placed into the result vector at the same lane. A further category of operation is a cross-lane vector operation, where lane access is defined by the arguments to the operation. Cross-lane operations generally rearrange lane elements, for example by permutation (commonly controlled by a VectorShuffle) or by blending (commonly controlled by a VectorMask). Such an operation explicitly specifies how it rearranges lane elements.
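
The cross-lane category has no pseudocode above, so the following plain-Java sketch models two typical cases on int arrays. The class and method names are hypothetical, and the arrays stand in for a VectorShuffle (as source lane indexes) and a VectorMask (as booleans). This is an assumed reading of permutation and blending, not the specification of any particular method.

```java
/** Plain-Java model of two cross-lane operations; not the actual Vector API. */
final class CrossLaneSketch {

    /** Permutation: result lane i takes its element from lane sourceLanes[i] of a. */
    static int[] rearrange(int[] a, int[] sourceLanes) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[sourceLanes[i]];
        }
        return r;
    }

    /** Blending: result lane i takes b[i] where the mask is set, otherwise a[i]. */
    static int[] blend(int[] a, int[] b, boolean[] mask) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = mask[i] ? b[i] : a[i];
        }
        return r;
    }
}
```

In the permutation case the lane that supplies each result element is chosen by an argument rather than by position, which is what distinguishes it from the lane-wise categories.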

If a vector operation does not belong to one of the above categories then the operation explicitly specifies how it processes the lane elements of input vectors, and where appropriate expresses the behavior using pseudocode.

Many vector operations provide an additional mask-accepting variant. The mask controls which lanes are selected for application of the scalar operation. Masks are a key component for the support of control flow in vector computations.

For certain operation categories the mask-accepting variants can be specified in generic terms. If a lane of the mask is set then the scalar operation is applied to the corresponding lane elements. Otherwise, if a lane of the mask is not set, a default scalar operation is applied and its result is placed into the vector result at the same lane. The default operation is specified as follows:

  • For a lane-wise n-ary operation the default operation is a function that returns its first argument, specifically the lane element of the first input vector.
  • For an associative vector reduction operation the default operation is a function that returns the identity value.
  • For a lane-wise binary test operation the default operation is a function that returns false.

Otherwise, the mask-accepting variant of the operation explicitly specifies how it processes the lane elements of input vectors, and where appropriate expresses the behavior using pseudocode.
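
The three default operations can be sketched in plain Java as follows. The names are hypothetical, addition (with identity 0) and less-than are chosen only as example scalar operations, and a boolean array stands in for the mask.

```java
/** Plain-Java model of the generic mask-accepting variants; not the actual Vector API. */
final class MaskedOpsSketch {

    /** Lane-wise binary op: unset lanes keep the element of the first input. */
    static int[] maskedAdd(int[] a, int[] b, boolean[] m) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = m[i] ? a[i] + b[i] : a[i];
        }
        return r;
    }

    /** Associative reduction: unset lanes contribute the identity value (0 for addition). */
    static int maskedAddReduce(int[] a, boolean[] m) {
        int r = 0;
        for (int i = 0; i < a.length; i++) {
            r = r + (m[i] ? a[i] : 0);
        }
        return r;
    }

    /** Lane-wise binary test: unset lanes yield false. */
    static boolean[] maskedLessThan(int[] a, int[] b, boolean[] m) {
        boolean[] r = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = m[i] && a[i] < b[i];
        }
        return r;
    }
}
```

With a = {1, 2, 3, 4}, b = {10, 20, 30, 40}, and a mask that selects lanes 0 and 2, maskedAdd produces {11, 2, 33, 4} and maskedAddReduce(a, mask) produces 4.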

For convenience, many vector operations of arity greater than one provide an additional scalar-accepting variant (such as adding a constant scalar value to all lanes of a vector). This variant accepts compatible scalar values instead of vectors for the second and subsequent input vectors, if any. Unless otherwise specified the scalar variant behaves as if each scalar value is transformed to a vector using the appropriate vector broadcast operation, and then the vector-accepting variant of the operation is applied using the transformed values.
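
The "as if" rule for scalar-accepting variants can be modeled in plain Java. In this sketch the names are hypothetical, an int array stands in for a vector, and addition is used only as an example operation.

```java
/** Plain-Java model of a scalar-accepting variant; not the actual Vector API. */
final class ScalarVariantSketch {

    /** Broadcast: a vector whose lanes all hold the same scalar value. */
    static int[] broadcast(int lanes, int s) {
        int[] r = new int[lanes];
        for (int i = 0; i < lanes; i++) {
            r[i] = s;
        }
        return r;
    }

    /** Vector-accepting variant: lane-wise addition of two inputs. */
    static int[] add(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] + b[i];
        }
        return r;
    }

    /** Scalar-accepting variant: behaves as if s were broadcast first. */
    static int[] add(int[] a, int s) {
        return add(a, broadcast(a.length, s));
    }
}
```

A real implementation would not be expected to materialize the broadcast vector this way; the model only states the observable result.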

Performance notes

This package depends on the runtime's ability to dynamically compile vector operations into optimal vector hardware instructions. There is a default scalar implementation for each operation which is used if the operation cannot be compiled to vector instructions.

Users need to pay attention to the following points to obtain optimal vector machine code:

  • The shape of vectors used should be supported by the underlying platform. For example, code written using IntVector of shape S_512_BIT will not be compiled to vector instructions on a platform which supports only 256-bit vectors. Instead, the default scalar implementation will be used. For this reason, it is recommended to use the preferred species as shown above to write generically sized vector computations.
  • Classes defined in this package should be treated as value-based classes. Use of identity-sensitive operations (including reference equality (==), identity hash code, or synchronization) will limit generation of optimal vector instructions.