Package jdk.incubator.vector
Incubating Feature. Will be removed in a future release.
Classes to express vector computations that, given suitable hardware and runtime ability, are accelerated using vector hardware instructions.
Vector computations consist of a sequence of operations on vectors. A vector is a fixed sequence of scalar values; a scalar value is a single unit of value such as an int, a long, a float and so on. Operations on vectors typically perform the equivalent scalar operation on all scalar values of the participating vectors, usually generating a vector result. When run on a supporting platform, these operations can be executed in parallel by the hardware. This style of parallelism is called Single Instruction Multiple Data (SIMD) parallelism.
The abstract class Vector represents an ordered immutable sequence of
values of the same element type e that is one of the following primitive types:
byte, short, int, long, float, or double. The type variable E corresponds to the
boxed element type, specifically the class that wraps a value of e in an object
(such as Integer class that wraps a value of int).
Vector declares a set of vector operations (methods) that are common to
all element types (such as addition). Subclasses of Vector corresponding to
a specific element type declare further operations that are specific to that element type
(such as access to element values in lanes, logical operations on values of integral
element types, or transcendental operations on values of floating point element
types). There are six abstract subclasses of Vector corresponding to the supported
set of element types: ByteVector, ShortVector, IntVector, LongVector, FloatVector,
and DoubleVector.
In addition to element type, vectors are parameterized by their shape,
which is their length in bits. The supported shapes are
represented by the enum VectorShape.
The combination of element type and shape determines a vector species,
represented by VectorSpecies. The various typed
vector classes expose static constants corresponding to the supported species,
and static methods on these types generally take a species as a parameter.
For example, FloatVector.fromArray() creates and returns a float vector of the
specified species, with elements loaded from the specified float array.
The species instance for a specific combination of element type and shape can be obtained by reading the appropriate static field, as follows:
    VectorSpecies<Float> s = FloatVector.SPECIES_256;
Code that is agnostic to species can request the "preferred" species for a given element type, where the optimal size is selected for the current platform:
    VectorSpecies<Float> s = FloatVector.SPECIES_PREFERRED;
Here is an example of multiplying elements of two float arrays a and b
using vector computation and storing the result in array c.
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_512;

    void vectorMultiply(float[] a, float[] b, float[] c) {
        int i = 0;
        // It is assumed array arguments are of the same size
        for (; i < (a.length & ~(SPECIES.length() - 1));
               i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            FloatVector vc = va.mul(vb);
            vc.intoArray(c, i);
        }
        // Scalar loop for the remaining tail elements
        for (; i < a.length; i++) {
            c[i] = a[i] * b[i];
        }
    }
The scalar computation after the vector computation is required to process the tail of
elements, the length of which is smaller than the species length.
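The split between the vector loop and the scalar tail can be checked with plain integer arithmetic. The following is a minimal sketch, not part of this package: the class LoopBoundSketch and its method names are hypothetical, and it assumes the lane count is a power of two, which is what lets the bit mask round the array length down to a multiple of the lane count.

```java
// Hypothetical helper class: a scalar model of the loop bound used in the
// example above. It does not use the Vector API.
final class LoopBoundSketch {
    // Largest multiple of `lanes` that is <= length.
    // Assumption: `lanes` is a power of two, as the bit mask requires.
    static int vectorLoopBound(int length, int lanes) {
        return length & ~(lanes - 1);
    }

    // Number of trailing elements left for the scalar loop.
    static int tailLength(int length, int lanes) {
        return length - vectorLoopBound(length, lanes);
    }

    public static void main(String[] args) {
        int lanes = 16; // a 512-bit shape holding 32-bit float lanes
        System.out.println(vectorLoopBound(100, lanes)); // prints 96
        System.out.println(tailLength(100, lanes));      // prints 4
    }
}
```

With 100 elements and 16 lanes, the vector loop covers indices 0 to 95 in six iterations and the scalar loop handles the last four elements.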
The example above uses vectors hardcoded to a concrete shape (512-bit). Instead, we could use the preferred
species as shown below, so that the code dynamically adapts to the optimal shape for the platform on which it runs.
static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
Vector operations
We use the term lanes when defining operations on vectors. The number of lanes in a vector is the number of scalar elements it holds. For example, a vector of type Float and shape VectorShape.S_256_BIT has eight lanes.
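The lane count follows from dividing the shape's bit size by the element's bit size. The sketch below only illustrates that arithmetic; LaneCountSketch and laneCount are hypothetical names and the code does not query the Vector API.

```java
// Hypothetical helper class: computes how many lanes fit in a shape.
final class LaneCountSketch {
    // shapeBits is the vector size in bits, elementBits the size of one lane.
    static int laneCount(int shapeBits, int elementBits) {
        return shapeBits / elementBits;
    }

    public static void main(String[] args) {
        System.out.println(laneCount(256, Float.SIZE));  // prints 8
        System.out.println(laneCount(512, Byte.SIZE));   // prints 64
        System.out.println(laneCount(128, Double.SIZE)); // prints 2
    }
}
```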
Vector operations can be grouped into various categories, and their behavior
is generally specified as follows:
- A lane-wise unary operation operates on one input vector and produces a
  result vector.
  For each lane of the input vector the
  lane element is operated on using the specified scalar unary operation and
  the element result is placed into the vector result at the same lane.
  The following pseudocode expresses the behavior of this operation category,
  where e is the element type and EVector corresponds to the primitive Vector type:

      EVector a = ...;
      e[] ar = new e[a.length()];
      for (int i = 0; i < a.length(); i++) {
          ar[i] = scalar_unary_op(a.get(i));
      }
      EVector r = EVector.fromArray(a.species(), ar, 0);

  Unless otherwise specified the input and result vectors will have the same element type and shape.
- A lane-wise binary operation operates on two input
  vectors to produce a result vector.
  For each lane of the two input vectors,
  a and b say, the corresponding lane elements from a and b are operated on
  using the specified scalar binary operation and the element result is placed
  into the vector result at the same lane.
  The following pseudocode expresses the behavior of this operation category:

      EVector a = ...;
      EVector b = ...;
      e[] ar = new e[a.length()];
      for (int i = 0; i < a.length(); i++) {
          ar[i] = scalar_binary_op(a.get(i), b.get(i));
      }
      EVector r = EVector.fromArray(a.species(), ar, 0);

  Unless otherwise specified the two input and result vectors will have the same element type and shape.
- Generalizing from unary and binary operations, a lane-wise n-ary operation operates on n input vectors to produce a result vector. For each lane, the n lane elements, one from each input vector, are operated on using the specified n-ary scalar operation and the element result is placed into the vector result at the same lane. Unless otherwise specified the n input and result vectors will have the same element type and shape.
- A vector reduction operation operates on all the lane
  elements of an input vector, and applies an accumulation function to all the
  lane elements to produce a scalar result.
  If the reduction operation is associative then the result may be accumulated
  by operating on the lane elements in any order using a specified associative
  scalar binary operation and identity value. Otherwise, the reduction
  operation specifies the behavior of the accumulation function.
  The following pseudocode expresses the behavior of this operation category
  if it is associative:

      EVector a = ...;
      e r = <identity value>;
      for (int i = 0; i < a.length(); i++) {
          r = assoc_scalar_binary_op(r, a.get(i));
      }

  Unless otherwise specified the scalar result type and element type will be the same.
- A lane-wise binary test operation operates on two input vectors to produce a
  result mask. For each lane of the two input vectors, a and b say,
  the corresponding lane elements from a and b are operated on using the
  specified scalar binary test operation and the boolean result is placed
  into the mask at the same lane.
  The following pseudocode expresses the behavior of this operation category:

      EVector a = ...;
      EVector b = ...;
      boolean[] ar = new boolean[a.length()];
      for (int i = 0; i < a.length(); i++) {
          ar[i] = scalar_binary_test_op(a.get(i), b.get(i));
      }
      VectorMask<E> r = VectorMask.fromArray(a.species(), ar, 0);

  Unless otherwise specified the two input vectors and result mask will have the same element type and shape.
- The prior categories of operation can be said to operate within the vector
  lanes, where lane access is uniformly applied to all vectors, specifically
  the scalar operation is applied to elements taken from input vectors at the
  same lane, and if appropriate applied to the result vector at the same lane.
  A further category of operation is a cross-lane vector operation where lane
  access is defined by the arguments to the operation. Cross-lane operations
  generally rearrange lane elements, for example by permutation (commonly
  controlled by a VectorShuffle) or by blending (commonly controlled by a
  VectorMask). Such an operation explicitly specifies how it rearranges lane elements.
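The operation categories above can be made concrete for a single element type. The following is a minimal scalar sketch, assuming int lanes, in which an int[] stands in for a vector and a boolean[] stands in for a mask. LaneOpsSketch, IntBinaryTest and the method names are hypothetical and the code does not call the Vector API. The rearrange method additionally assumes the convention that result lane i reads the source lane named by index[i].

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;
import java.util.function.IntUnaryOperator;

// Hypothetical scalar model of the operation categories (not the Vector API).
final class LaneOpsSketch {
    // Hypothetical functional interface for a scalar binary test.
    interface IntBinaryTest {
        boolean test(int x, int y);
    }

    // Lane-wise unary: r[i] = op(a[i]).
    static int[] lanewiseUnary(int[] a, IntUnaryOperator op) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = op.applyAsInt(a[i]);
        }
        return r;
    }

    // Lane-wise binary: r[i] = op(a[i], b[i]); a and b have the same length.
    static int[] lanewiseBinary(int[] a, int[] b, IntBinaryOperator op) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = op.applyAsInt(a[i], b[i]);
        }
        return r;
    }

    // Associative reduction: fold all lanes into one scalar, starting at identity.
    static int reduce(int[] a, int identity, IntBinaryOperator op) {
        int r = identity;
        for (int i = 0; i < a.length; i++) {
            r = op.applyAsInt(r, a[i]);
        }
        return r;
    }

    // Lane-wise binary test: mask[i] = op(a[i], b[i]).
    static boolean[] lanewiseTest(int[] a, int[] b, IntBinaryTest op) {
        boolean[] r = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = op.test(a[i], b[i]);
        }
        return r;
    }

    // Cross-lane permutation. Assumption: result lane i reads source lane index[i].
    static int[] rearrange(int[] a, int[] index) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[index[i]];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {4, 3, 2, 1};
        System.out.println(Arrays.toString(lanewiseUnary(a, x -> -x)));            // [-1, -2, -3, -4]
        System.out.println(Arrays.toString(lanewiseBinary(a, b, Integer::sum)));   // [5, 5, 5, 5]
        System.out.println(reduce(a, 0, Integer::sum));                            // 10
        System.out.println(Arrays.toString(lanewiseTest(a, b, (x, y) -> x < y)));  // [true, true, false, false]
        System.out.println(Arrays.toString(rearrange(a, new int[] {3, 2, 1, 0}))); // [4, 3, 2, 1]
    }
}
```

The first four methods touch only lane i of their inputs to produce lane i of the result, while rearrange reads from a lane chosen by its argument, which is what distinguishes a cross-lane operation.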
If a vector operation does not belong to one of the above categories then the operation explicitly specifies how it processes the lane elements of input vectors, and where appropriate expresses the behavior using pseudocode.
Many vector operations provide an additional mask-accepting variant.
The mask controls which lanes are selected for application of the scalar
operation. Masks are a key component for the support of control flow in
vector computations.
For certain operation categories the mask-accepting variants can be specified in generic terms. If a lane of the mask is set then the scalar operation is applied to the corresponding lane elements. Otherwise, if a lane of the mask is not set, a default scalar operation is applied and its result is placed into the vector result at the same lane. The default operation is specified as follows:
- For a lane-wise n-ary operation the default operation is a function that returns its first argument, specifically the lane element of the first input vector.
- For an associative vector reduction operation the default operation is a function that returns the identity value.
- For a lane-wise binary test operation the default operation is a function that returns false.
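The three default operations listed above can be sketched as a scalar model. As before, this is an illustration under assumptions rather than the package's implementation: int[] stands in for a vector, boolean[] for a mask, and MaskedOpsSketch, IntBinaryTest and the method names are hypothetical.

```java
import java.util.Arrays;
import java.util.function.IntBinaryOperator;

// Hypothetical scalar model of mask-accepting variants (not the Vector API).
final class MaskedOpsSketch {
    // Hypothetical functional interface for a scalar binary test.
    interface IntBinaryTest {
        boolean test(int x, int y);
    }

    // Masked lane-wise binary: unset lanes return the first argument, a[i].
    static int[] lanewiseBinary(int[] a, int[] b, boolean[] m, IntBinaryOperator op) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = m[i] ? op.applyAsInt(a[i], b[i]) : a[i];
        }
        return r;
    }

    // Masked associative reduction: unset lanes contribute the identity value.
    static int reduce(int[] a, boolean[] m, int identity, IntBinaryOperator op) {
        int r = identity;
        for (int i = 0; i < a.length; i++) {
            r = op.applyAsInt(r, m[i] ? a[i] : identity);
        }
        return r;
    }

    // Masked lane-wise binary test: unset lanes yield false.
    static boolean[] lanewiseTest(int[] a, int[] b, boolean[] m, IntBinaryTest op) {
        boolean[] r = new boolean[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = m[i] ? op.test(a[i], b[i]) : false;
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        boolean[] m = {true, false, true, false};
        System.out.println(Arrays.toString(lanewiseBinary(a, b, m, Integer::sum)));  // [11, 2, 33, 4]
        System.out.println(reduce(a, m, 0, Integer::sum));                           // 4
        System.out.println(Arrays.toString(lanewiseTest(a, b, m, (x, y) -> x < y))); // [true, false, true, false]
    }
}
```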
For convenience, many vector operations of arity greater than one provide
an additional scalar-accepting variant (such as adding a constant scalar
value to all lanes of a vector). This variant accepts compatible
scalar values instead of vectors for the second and subsequent input vectors,
if any.
Unless otherwise specified the scalar variant behaves as if each scalar value
is transformed to a vector using the appropriate vector broadcast operation, and
then the vector-accepting operation is applied using the transformed values.
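The "as if" rule for scalar-accepting variants can be sketched with a scalar model. The class BroadcastSketch and its methods are hypothetical, an int[] stands in for a vector, and nothing here calls the Vector API: the scalar overload of add simply broadcasts its argument to every lane and delegates to the vector-accepting overload.

```java
import java.util.Arrays;

// Hypothetical scalar model of a scalar-accepting variant (not the Vector API).
final class BroadcastSketch {
    // Broadcast: a "vector" whose lanes all hold the same value.
    static int[] broadcast(int lanes, int value) {
        int[] r = new int[lanes];
        Arrays.fill(r, value);
        return r;
    }

    // Vector-accepting lane-wise addition.
    static int[] add(int[] a, int[] b) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] + b[i];
        }
        return r;
    }

    // Scalar-accepting variant: behaves as broadcast followed by the vector form.
    static int[] add(int[] a, int scalar) {
        return add(a, broadcast(a.length, scalar));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(add(new int[] {1, 2, 3}, 5))); // [6, 7, 8]
    }
}
```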
Performance notes
This package depends on the runtime's ability to dynamically compile vector operations into optimal vector hardware instructions. There is a default scalar implementation for each operation which is used if the operation cannot be compiled to vector instructions.
There are certain things users need to pay attention to for generating optimal vector machine code:
- The shape of vectors used should be supported by the underlying platform. For example, code written using IntVector of shape S_512_BIT will not be compiled to vector instructions on a platform which supports only 256-bit vectors. Instead, the default scalar implementation will be used. For this reason, it is recommended to use the preferred species as shown above to write generically sized vector computations.
- Classes defined in this package should be treated as value-based classes. Use of identity-sensitive operations (including reference equality (==), identity hash code, or synchronization) will limit generation of optimal vector instructions.
Interface Summary
- VectorSpecies<E>: Interface supporting vectors of the same element type, E, and shape.
Class Summary
- ByteVector: A specialized Vector representing an ordered immutable sequence of byte values.
- DoubleVector: A specialized Vector representing an ordered immutable sequence of double values.
- FloatVector: A specialized Vector representing an ordered immutable sequence of float values.
- IntVector: A specialized Vector representing an ordered immutable sequence of int values.
- LongVector: A specialized Vector representing an ordered immutable sequence of long values.
- ShortVector: A specialized Vector representing an ordered immutable sequence of short values.
- Vector<E>: A Vector is designed for use in computations that can be transformed by a runtime compiler, on supported hardware, to Single Instruction Multiple Data (SIMD) computations leveraging vector hardware registers and vector hardware instructions.
- VectorMask<E>: A VectorMask represents an ordered immutable sequence of boolean values.
- VectorShuffle<E>: A VectorShuffle represents an ordered immutable sequence of int values.
Enum Summary
- VectorShape