/*
 * Copyright (c) 2017, 2019, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.  Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have
 * questions.
 */
package jdk.incubator.vector;

import jdk.internal.misc.Unsafe;
import jdk.internal.vm.annotation.ForceInline;
import jdk.internal.vm.annotation.Stable;

import java.lang.reflect.Array;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.function.IntUnaryOperator;
import java.util.function.UnaryOperator;

/**
 * A sequence of a fixed number of <em>lanes</em>,
 * all of some fixed
 * {@linkplain Vector#elementType() element type}
 * such as {@code byte}, {@code long}, or {@code float}.
 * Each lane contains an independent value of the element type.
 *
 * <p> Operations on vectors are typically <em>lane-wise</em>,
 * distributing some scalar operator (such as
 * {@linkplain Vector#add(Vector) addition})
 * across the lanes of the participating vectors,
 * usually generating a vector result whose lanes contain the various
 * scalar results.  When run on a supporting platform, lane-wise
 * operations can be executed in parallel by the hardware.  This style
 * of parallelism is called <em>Single Instruction Multiple Data</em>
 * (SIMD) parallelism.
 *
 * <p> In the SIMD style of programming, most of the operations within
 * a vector lane are unconditional, but the effect of conditional
 * execution may be achieved using
 * <em>masked operations</em>
 * such as {@link Vector#blend(Vector,VectorMask) blend()},
 * under the control of an associated {@link VectorMask}.
 * Data motion other than strictly lane-wise flow is achieved using
 * <em>cross-lane</em> operations, often under the control of an
 * associated {@link VectorShuffle}.
 * Lane data and/or whole vectors can be reformatted using various
 * kinds of lane-wise
 * {@linkplain Vector#convert(VectorOperators.Conversion,int) conversions},
 * and byte-wise reformatting
 * {@linkplain Vector#reinterpretShape(VectorSpecies,int) reinterpretations},
 * often under the control of a reflective {@link VectorSpecies}
 * object which selects an alternative vector format different
 * from that of the input vector.
 * <p> Public subtypes of {@code Vector}
* correspond to specific
* element types. These declare further operations that are specific
* to that element type, including unboxed access to lane values,
* bitwise operations on values of integral element types, or
* transcendental operations on values of floating point element
* types.
*
* This package contains a public subtype of {@link Vector}
* corresponding to each supported element type:
* {@link ByteVector}, {@link ShortVector},
* {@link IntVector}, {@link LongVector},
* {@link FloatVector}, and {@link DoubleVector}.
*
*
*
* The {@linkplain #elementType element type} of a vector,
* sometimes called {@code ETYPE}, is one of the primitive types
* {@code byte}, {@code short}, {@code int}, {@code long}, {@code
* float}, or {@code double}.
*
 * The type {@code E} in {@code Vector<E>} is the boxed version of
 * {@code ETYPE}.
 *
 * <p> The {@linkplain #length() length} of a vector is the number of
 * lanes it contains.
*
* This number is also called {@code VLENGTH} when the context makes
* clear which vector it belongs to. Each vector has its own fixed
* {@code VLENGTH} but different instances of vectors may have
* different lengths. {@code VLENGTH} is an important number, because
* it estimates the SIMD performance gain of a single vector operation
* as compared to scalar execution of the {@code VLENGTH} scalar
 * operators which underlie the vector operation.
*
* Some Java platforms give special support to only one shape,
* while others support several. A typical platform is not likely
* to support all the shapes described by this API. For this reason,
* most vector operations work on a single input shape and
 * produce the same shape on output.  Operations which change
 * shape are clearly documented as shape-changing,
* while the majority of operations are shape-invariant,
* to avoid disadvantaging platforms which support only one shape.
* There are queries to discover, for the current Java platform,
* the {@linkplain VectorShape#preferredShape() preferred shape}
* for general SIMD computation, or the
* {@linkplain VectorShape#largestShapeFor(Class) largest
* available shape} for any given lane type. To be portable,
* code using this API should start by querying a supported
* shape, and then process all data with shape-invariant
* operations, within the selected shape.
*
* Each unique combination of element type and vector shape
* determines a unique
* {@linkplain #species() vector species}.
* A vector species is represented by a fixed instance of
* {@link VectorSpecies VectorSpecies<E>}
* shared in common by all vectors of the same shape and
* {@code ETYPE}.
*
* Unless otherwise documented, lane-wise vector operations
* require that all vector inputs have exactly the same {@code VSHAPE}
* and {@code VLENGTH}, which is to say that they must have exactly
* the same species. This allows corresponding lanes to be paired
* unambiguously. The {@link #check(VectorSpecies) check()} method
* provides an easy way to perform this check explicitly.
*
* Vector shape, {@code VLENGTH}, and {@code ETYPE} are all
* mutually constrained, so that {@code VLENGTH} times the
* {@linkplain #elementSize() bit-size of each lane}
* must always match the bit-size of the vector's shape.
*
* Thus, {@linkplain #reinterpretShape(VectorSpecies,int) reinterpreting} a
* vector via a cast may double its length if and only if it either
* halves the lane size, or else changes the shape. Likewise,
* reinterpreting a vector may double the lane size if and only if it
* either halves the length, or else changes the shape of the vector.
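 * The size invariant described above can be checked with plain
 * arithmetic.  The following sketch (class name and the 256-bit shape
 * are illustrative only, not part of this API) shows how halving the
 * lane size doubles the length within a fixed shape:

```java
// Illustration of the invariant: VLENGTH * ESIZE == shape bit-size.
public class ShapeInvariantSketch {
    public static void main(String[] args) {
        int shapeBits = 256;          // e.g. a 256-bit vector shape
        int intLaneBits = 32;         // ESIZE for int lanes
        int longLaneBits = 64;        // ESIZE for long lanes
        int intLanes = shapeBits / intLaneBits;    // VLENGTH for int lanes
        int longLanes = shapeBits / longLaneBits;  // VLENGTH for long lanes
        // Halving the lane size doubles the length, as the text states.
        System.out.println(intLanes);   // 8
        System.out.println(longLanes);  // 4
    }
}
```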
*
* As an example of static constants defined by the typed vector classes,
* constant {@link FloatVector#SPECIES_256 FloatVector.SPECIES_256}
* is the unique species whose lanes are {@code float}s and whose
* vector size is 256 bits. Again, the constant
* {@link FloatVector#SPECIES_PREFERRED} is the species which
* best supports processing of {@code float} vector lanes on
* the currently running Java platform.
*
* As another example, a broadcast scalar value of
* {@code (double)0.5} can be obtained by calling
* {@link DoubleVector#broadcast(VectorSpecies,double)
* DoubleVector.broadcast(dsp, 0.5)}, but the argument {@code dsp} is
* required to select the species (and hence the shape and length) of
* the resulting vector.
*
* Most operations on vectors are lane-wise, which means the operation
* is composed of an underlying scalar operator, which is repeated for
* each distinct lane of the input vector. If there are additional
* vector arguments of the same type, their lanes are aligned with the
* lanes of the first input vector. (They must all have a common
* {@code VLENGTH}.) The output resulting from a lane-wise operation
* will have a {@code VLENGTH} which is equal to the {@code VLENGTH}
* of the input(s) to the operation. Thus, lane-wise operations are
* length-invariant, in their basic definitions.
*
* The principle of length-invariance is combined with another
* basic principle, that lane-wise operations are always
* shape-invariant, meaning that the inputs and the output of
* a lane-wise operation will have a common {@code VSHAPE}. When the
* principles conflict, because a logical result (with an invariant
* {@code VLENGTH}), does not fit into the invariant {@code VSHAPE},
* the resulting expansions and contractions are handled explicitly
* with
* special conventions.
*
* Vector operations can be grouped into various categories and
* their behavior can be generally specified in terms of underlying
* scalar operators. In the examples below, {@code ETYPE} is the
* element type of the operation (such as {@code int.class}) and
* {@code EVector} is the corresponding concrete vector type (such as
* {@code IntVector.class}).
*
* Unlike other lane-wise operations, conversions can change lane
* type, from the input (domain) type to the output (range) type. The
* lane size may change along with the type. In order to manage the
 * size changes, lane-wise conversion methods can produce partial
* results, under the control of a {@code part} parameter, which
* is explained elsewhere.
*
* The following pseudocode illustrates the behavior of this
* operation category in the specific example of a conversion from
* {@code int} to {@code double}, retaining lower lanes to maintain shape-invariance:
*
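 * A scalar sketch of that behavior follows.  The class and method
 * names are illustrative only, not part of this API; the sketch models
 * the lane-wise {@code (double)} cast with arrays instead of vectors.
 * Because {@code double} lanes are twice the size of {@code int}
 * lanes, only half of the input lanes fit in a shape-invariant
 * output, and the part number selects which half is retained:

```java
// Scalar sketch (not the API itself) of a shape-invariant lane-wise
// conversion from int to double.  The expansion ratio is M = 2, so
// the output VLENGTH is half the input VLENGTH, and the part number
// selects one of the two blocks of the logical result.
public class ConvertSketch {
    static double[] convertIntToDouble(int[] a, int part) {
        int outLength = a.length / 2;      // output VLENGTH shrinks by M = 2
        double[] out = new double[outLength];
        int origin = part * outLength;     // block origin R = part * L
        for (int i = 0; i < outLength; i++) {
            out[i] = (double) a[origin + i];  // underlying scalar operator
        }
        return out;
    }
    public static void main(String[] args) {
        int[] a = {10, 20, 30, 40};
        // part 0 retains the lower lanes, part 1 the upper lanes
        System.out.println(java.util.Arrays.toString(convertIntToDouble(a, 0))); // [10.0, 20.0]
        System.out.println(java.util.Arrays.toString(convertIntToDouble(a, 1))); // [30.0, 40.0]
    }
}
```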
*
* If a vector operation does not belong to one of the above categories then
* the method documentation explicitly specifies how it processes the lanes of
* input vectors, and where appropriate illustrates the behavior using
* pseudocode.
*
*
* Most lane-wise binary and comparison operations offer convenience
* overloadings which accept a scalar as the second input, in place of a
* vector. In this case the scalar value is promoted to a vector by
* {@linkplain Vector#broadcast(long) broadcasting it}
* into the same lane structure as the first input.
*
* For example, to multiply all lanes of a {@code double} vector by
* a scalar value {@code 1.1}, the expression {@code v.mul(1.1)} is
* easier to work with than an equivalent expression with an explicit
* broadcast operation, such as {@code v.mul(v.broadcast(1.1))}
* or {@code v.mul(DoubleVector.broadcast(v.species(), 1.1))}.
*
* Unless otherwise specified the scalar variant always behaves as if
* each scalar value is first transformed to a vector of the same
* species as the first vector input, using the appropriate
* {@code broadcast} operation.
*
* Many vector operations accept an optional
* {@link VectorMask mask} argument, selecting which lanes participate
* in the underlying scalar operator. If present, the mask argument
* appears at the end of the method argument list.
*
* Each lane of the mask argument is a boolean which is either in
* the set or unset state. For lanes where the mask
* argument is unset, the underlying scalar operator is suppressed.
* In this way, masks allow vector operations to emulate scalar
* control flow operations, without losing SIMD parallelism, except
* where the mask lane is unset.
*
* An operation suppressed by a mask will never cause an exception
* or side effect of any sort, even if the underlying scalar operator
* can potentially do so. For example, an unset lane that seems to
* access an out of bounds array element or divide an integral value
* by zero will simply be ignored. Values in suppressed lanes never
* participate or appear in the result of the overall operation.
*
* Result lanes corresponding to a suppressed operation will be
* filled with a default value which depends on the specific
* operation, as follows:
*
* (Note: Memory effects such as race conditions never occur for
* suppressed lanes. That is, implementations will not secretly
* re-write the existing value for unset lanes. In the Java Memory
* Model, reassigning a memory variable to its current value is not a
* no-op; it may quietly undo a racing store from another
* thread.)
*
* As an example, a masked binary operation on two input vectors
* {@code a} and {@code b} suppresses the binary operation for lanes
* where the mask is unset, and retains the original lane value from
* {@code a}. The following pseudocode illustrates this behavior:
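 * The sketch below models this with plain arrays (the class and
 * method names are illustrative, not part of this API); where the
 * mask is unset, the scalar operator is suppressed and the lane of
 * the first input is retained:

```java
// Scalar sketch of a masked lane-wise binary operation (addition):
// lanes with an unset mask keep the value from the first input a.
public class MaskedOpSketch {
    static int[] addMasked(int[] a, int[] b, boolean[] mask) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = mask[i] ? (a[i] + b[i]) : a[i];  // suppressed lanes keep a[i]
        }
        return r;
    }
    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 10, 10, 10};
        boolean[] m = {true, false, true, false};
        System.out.println(java.util.Arrays.toString(addMasked(a, b, m))); // [11, 2, 13, 4]
    }
}
```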
* Temporal terminology works well for vectors because they
* (usually) represent small fixed-sized segments in a long sequence
* of workload elements, where the workload is conceptually traversed
* in time order from beginning to end. (This is a mental model: it
* does not exclude multicore divide-and-conquer techniques.) Thus,
* when a scalar loop is transformed into a vector loop, adjacent
* scalar items (one earlier, one later) in the workload end up as
* adjacent lanes in a single vector (again, one earlier, one later).
* At a vector boundary, the last lane item in the earlier vector is
* adjacent to (and just before) the first lane item in the
* immediately following vector.
*
* Vectors are also sometimes thought of in spatial terms, where
* the first lane is placed at an edge of some virtual paper, and
* subsequent lanes are presented in order next to it. When using
* spatial terms, all directions are equally plausible: Some vector
* notations present lanes from left to right, and others from right
* to left; still others present from top to bottom or vice versa.
* Using the language of time (before, after, first, last) instead of
* space (left, right, high, low) is often more likely to avoid
* misunderstandings.
*
 * A second reason to prefer temporal to spatial language about
* vector lanes is the fact that the terms "left", "right", "high" and
* "low" are widely used to describe the relations between bits in
* scalar values. The leftmost or highest bit in a given type is
* likely to be a sign bit, while the rightmost or lowest bit is
* likely to be the arithmetically least significant, and so on.
* Applying these terms to vector lanes risks confusion, however,
* because it is relatively rare to find algorithms where, given two
* adjacent vector lanes, one lane is somehow more arithmetically
* significant than its neighbor, and even in those cases, there is no
 * general way to know which neighbor is the more significant.
*
* Putting the terms together, we view the information structure
* of a vector as a temporal sequence of lanes ("first", "next",
* "earlier", "later", "last", etc.) of bit-strings which are
* internally ordered spatially (either "low" to "high" or "right" to
* "left"). The primitive values in the lanes are decoded from these
* bit-strings, in the usual way. Most vector operations, like most
* Java scalar operators, treat primitive values as atomic values, but
* some operations reveal the internal bit-string structure.
*
* When a vector is loaded from or stored into memory, the order
* of vector lanes is always consistent with the inherent
* ordering of the memory container. This is true whether or not
* individual lane elements are subject to "byte swapping" due to
* details of byte order. Thus, while the scalar lane elements of
 * a vector might be "byte swapped", the lanes themselves are never
* reordered, except by an explicit method call that performs
* cross-lane reordering.
*
* When vector lane values are stored to Java variables of the
* same type, byte swapping is performed if and only if the
 * implementation of the vector hardware requires such swapping.
 * Any such swapping is automatic and invisible to the Java program.
*
* As a useful fiction, this API presents a consistent illusion
* that vector lane bytes are composed into larger lane scalars in
* little endian order. This means that storing a vector
* into a Java byte array will reveal the successive bytes of the
* vector lane values in little-endian order on all platforms,
* regardless of native memory order, and also regardless of byte
* order (if any) within vector unit registers.
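 * The same byte ordering can be modeled with a little-endian
 * {@link ByteBuffer} (the class name below is illustrative, not part
 * of this API): the least significant byte of each lane value comes
 * first, on every platform:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of the little-endian fiction: the bytes of a lane value,
// written to a byte array, appear least-significant first, as if
// through a little-endian ByteBuffer, regardless of native order.
public class LaneBytesSketch {
    public static void main(String[] args) {
        int lane = 0x11223344;
        byte[] bytes = new byte[4];
        ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).putInt(lane);
        // Least significant byte 0x44 comes first.
        System.out.printf("%02x %02x %02x %02x%n",
                          bytes[0], bytes[1], bytes[2], bytes[3]);
        // prints "44 33 22 11"
    }
}
```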
*
* This hypothetical little-endian ordering also appears when a
* {@linkplain #reinterpretShape(VectorSpecies,int) reinterpretation cast} is
* applied in such a way that lane boundaries are discarded and
* redrawn differently, while maintaining vector bits unchanged. In
* such an operation, two adjacent lanes will contribute bytes to a
* single new lane (or vice versa), and the sequential order of the
* two lanes will determine the arithmetic order of the bytes in the
* single lane. In this case, the little-endian convention provides
* portable results, so that on all platforms earlier lanes tend to
* contribute lower (rightward) bits, and later lanes tend to
* contribute higher (leftward) bits. The {@linkplain #reinterpretAsBytes()
* reinterpretation casts} between {@link ByteVector}s and the
* other non-byte vectors use this convention to clarify their
* portable semantics.
*
* The little-endian fiction for relating lane order to per-lane
* byte order is slightly preferable to an equivalent big-endian
* fiction, because some related formulas are much simpler,
* specifically those which renumber bytes after lane structure
* changes. The earliest byte is invariantly earliest across all lane
 * structure changes, but only if little-endian conventions are used.
* The root cause of this is that bytes in scalars are numbered from
* the least significant (rightmost) to the most significant
* (leftmost), and almost never vice-versa. If we habitually numbered
* sign bits as zero (as on some computers) then this API would reach
* for big-endian fictions to create unified addressing of vector
* bytes.
*
* Byte order for lane storage is chosen such that the stored
* vector values can be read or written as single primitive values,
* within the array or buffer that holds the vector, producing the
* same values as the lane-wise values within the vector.
* This fact is independent of the convenient fiction that lane values
* inside of vectors are stored in little-endian order.
*
* For example,
* {@link FloatVector#fromArray(VectorSpecies, float[], int)
* FloatVector.fromArray(fsp,fa,i)}
* creates and returns a float vector of some particular species {@code fsp},
* with elements loaded from some float array {@code fa}.
* The first lane is loaded from {@code fa[i]} and the last lane
 * is loaded from {@code fa[i+VL-1]}, where {@code VL}
* is the length of the vector as derived from the species {@code fsp}.
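 * In scalar terms (the class and method names below are illustrative,
 * not part of this API), the lane-loading rule is simply that lane
 * {@code j} of the result comes from {@code fa[i + j]}:

```java
// Scalar sketch of the fromArray lane-loading rule described above:
// lane j of the result is taken from fa[i + j], for j in [0..VL-1].
public class FromArraySketch {
    static float[] fromArray(float[] fa, int i, int vl) {
        float[] lanes = new float[vl];
        for (int j = 0; j < vl; j++) {
            lanes[j] = fa[i + j];  // first lane from fa[i], last from fa[i+VL-1]
        }
        return lanes;
    }
    public static void main(String[] args) {
        float[] fa = {0f, 1f, 2f, 3f, 4f, 5f};
        System.out.println(java.util.Arrays.toString(fromArray(fa, 1, 4))); // [1.0, 2.0, 3.0, 4.0]
    }
}
```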
 * Then, {@link FloatVector#add(Vector) fv.add(fv2)}
 * will produce another float vector of the same species, whose lanes
 * are the sums of corresponding lanes of the two inputs.
 *
 * <p> As a basic principle, lane-wise operations are
* length-invariant. Length-invariance simply means that
* if {@code VLENGTH} lanes go into an operation, the same number
* of lanes come out, with nothing discarded and no extra padding.
*
* As a second principle, sometimes in tension with the first,
* lane-wise operations are also shape-invariant, unless
* clearly marked otherwise.
*
* Shape-invariance means that {@code VSHAPE} is constant for typical
* computations. Keeping the same shape throughout a computation
* helps ensure that scarce vector resources are efficiently used.
* (On some hardware platforms shape changes could cause unwanted
* effects like extra data movement instructions, round trips through
* memory, or pipeline bubbles.)
*
* Tension between these principles arises when an operation
* produces a logical result that is too large for the
* required output {@code VSHAPE}. In other cases, when a logical
* result is smaller than the capacity of the output {@code VSHAPE},
* the positioning of the logical result is open to question, since
* the physical output vector must contain a mix of logical result and
* padding.
*
* In the first case, of a too-large logical result being crammed
* into a too-small output {@code VSHAPE}, we say that data has
* expanded. In other words, an expansion operation
* has caused the output shape to overflow. Symmetrically, in the
* second case of a small logical result fitting into a roomy output
* {@code VSHAPE}, the data has contracted, and the
* contraction operation has required the output shape to pad
* itself with extra zero lanes.
*
* In both cases we can speak of a parameter {@code M} which
* measures the expansion ratio or contraction ratio
* between the logical result size (in bits) and the bit-size of the
* actual output shape. When vector shapes are changed, and lane
* sizes are not, {@code M} is just the integral ratio of the output
* shape to the logical result. (With the possible exception of
* the {@linkplain VectorShape#S_Max_BIT maximum shape}, all vector
* sizes are powers of two, and so the ratio {@code M} is always
* an integer. In the hypothetical case of a non-integral ratio,
* the value {@code M} would be rounded up to the next integer,
* and then the same general considerations would apply.)
*
* If the logical result is larger than the physical output shape,
* such a shape change must inevitably drop result lanes (all but
* {@code 1/M} of the logical result). If the logical size is smaller
* than the output, the shape change must introduce zero-filled lanes
* of padding (all but {@code 1/M} of the physical output). The first
* case, with dropped lanes, is an expansion, while the second, with
* padding lanes added, is a contraction.
*
* Similarly, consider a lane-wise conversion operation which
* leaves the shape invariant but changes the lane size by a ratio of
* {@code M}. If the logical result is larger than the output (or
* input), this conversion must reduce the {@code VLENGTH} lanes of the
* output by {@code M}, dropping all but {@code 1/M} of the logical
* result lanes. As before, the dropping of lanes is the hallmark of
* an expansion. A lane-wise operation which contracts lane size by a
* ratio of {@code M} must increase the {@code VLENGTH} by the same
* factor {@code M}, filling the extra lanes with a zero padding
* value; because padding must be added this is a contraction.
*
* It is also possible (though somewhat confusing) to change both
* lane size and container size in one operation which performs both
* lane conversion and reshaping. If this is done, the same
* rules apply, but the logical result size is the product of the
 * input size and any expansion or contraction ratio arising from
 * the lane size change.
*
* For completeness, we can also speak of in-place
* operations for the frequent case when resizing does not occur.
* With an in-place operation, the data is simply copied from logical
* output to its physical container with no truncation or padding.
* The ratio parameter {@code M} in this case is unity.
*
* Note that the classification of contraction vs. expansion
* depends on the relative sizes of the logical result and the
* physical output container. The size of the input container may be
* larger or smaller than either of the other two values, without
* changing the classification. For example, a conversion from a
* 128-bit shape to a 256-bit shape will be a contraction in many
* cases, but it would be an expansion if it were combined with a
* conversion from {@code byte} to {@code long}, since in that case
* the logical result would be 1024 bits in size. This example also
* illustrates that a logical result does not need to correspond to
* any particular platform-supported vector shape.
*
* Although lane-wise masked operations can be viewed as producing
* partial operations, they are not classified (in this API) as
* expansions or contractions. A masked load from an array surely
* produces a partial vector, but there is no meaningful "logical
* output vector" that this partial result was contracted from.
*
* Some care is required with these terms, because it is the
* data, not the container size, that is expanding
* or contracting, relative to the size of its output container.
 * Thus, resizing a 128-bit input into a 512-bit vector has the effect
 * of a contraction.  Though the 128-bit payload has not
 * changed in size, we can say it "looks smaller" in its new 512-bit
* home, and this will capture the practical details of the situation.
*
* If a vector method might expand its data, it accepts an extra
* {@code int} parameter called {@code part}, or the "part number".
* The part number must be in the range {@code [0..M-1]}, where
* {@code M} is the expansion ratio. The part number selects one
* of {@code M} contiguous disjoint equally-sized blocks of lanes
* from the logical result and fills the physical output vector
* with this block of lanes.
*
* Specifically, the lanes selected from the logical result of an
* expansion are numbered in the range {@code [R..R+L-1]}, where
* {@code L} is the {@code VLENGTH} of the physical output vector, and
* the origin of the block, {@code R}, is {@code part*L}.
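 * The block selection can be sketched in scalar terms (the class and
 * method names are illustrative, not part of this API):

```java
// Scalar sketch of part-number block selection during an expansion:
// the physical output receives lanes [R .. R+L-1] of the logical
// result, where L is the output VLENGTH and R = part * L.
public class PartSelectSketch {
    static int[] selectPart(int[] logicalResult, int outputLength, int part) {
        int m = logicalResult.length / outputLength;   // expansion ratio M
        if (part < 0 || part >= m) {                   // part must be in [0..M-1]
            throw new ArrayIndexOutOfBoundsException("part=" + part);
        }
        int origin = part * outputLength;              // block origin R = part * L
        int[] out = new int[outputLength];
        for (int i = 0; i < outputLength; i++) {
            out[i] = logicalResult[origin + i];
        }
        return out;
    }
    public static void main(String[] args) {
        int[] logical = {0, 1, 2, 3, 4, 5, 6, 7};  // logical result, 8 lanes
        // With an output VLENGTH of 4, M = 2, so part is 0 or 1.
        System.out.println(java.util.Arrays.toString(selectPart(logical, 4, 0))); // [0, 1, 2, 3]
        System.out.println(java.util.Arrays.toString(selectPart(logical, 4, 1))); // [4, 5, 6, 7]
    }
}
```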
*
* A similar convention applies to any vector method that might
* contract its data. Such a method also accepts an extra part number
* parameter (again called {@code part}) which steers the contracted
 * data lanes into one of {@code M} contiguous disjoint equally-sized
* blocks of lanes in the physical output vector. The remaining lanes
* are filled with zero, or as specified by the method.
*
* Specifically, the data is steered into the lanes numbered in the
 * range {@code [R..R+L-1]}, where {@code L} is the {@code VLENGTH} of
* the logical result vector, and the origin of the block, {@code R},
* is again a multiple of {@code L} selected by the part number,
* specifically {@code |part|*L}.
*
* In the case of a contraction, the part number must be in the
* non-positive range {@code [-M+1..0]}. This convention is adopted
* because some methods can perform both expansions and contractions,
* in a data-dependent manner, and the extra sign on the part number
 * serves as an error check. If a vector method takes a part number and
* is invoked to perform an in-place operation (neither contracting
* nor expanding), the {@code part} parameter must be exactly zero.
* Part numbers outside the allowed ranges will elicit an indexing
* exception. Note that in all cases a zero part number is valid, and
* corresponds to an operation which preserves as many lanes as
* possible from the beginning of the logical result, and places them
* into the beginning of the physical output container. This is
* often a desirable default, so a part number of zero is safe
* in all cases and useful in most cases.
*
 * The various resizing operations of this API contract or expand
 * their data according to the conventions described above.
 * <p>
* Some vector operations are not lane-wise, but rather move data
* across lane boundaries. Such operations are typically rare in SIMD
* code, though they are sometimes necessary for specific algorithms
* that manipulate data formats at a low level, and/or require SIMD
* data to move in complex local patterns. (Local movement in a small
* window of a large array of data is relatively unusual, although
* some highly patterned algorithms call for it.) In this API such
* methods are always clearly recognizable, so that simpler lane-wise
* reasoning can be confidently applied to the rest of the code.
*
* In some cases, vector lane boundaries are discarded and
* "redrawn from scratch", so that data in a given input lane might
* appear (in several parts) distributed through several output lanes,
* or (conversely) data from several input lanes might be consolidated
* into a single output lane. The fundamental method which can redraw
 * lane boundaries is
* {@link #reinterpretShape(VectorSpecies,int) reinterpretShape()}.
* Built on top of this method, certain convenience methods such
* as {@link #reinterpretAsBytes() reinterpretAsBytes()} or
* {@link #reinterpretAsInts() reinterpretAsInts()} will
* (potentially) redraw lane boundaries, while retaining the
* same overall vector shape.
*
* Operations which produce or consume a scalar result can be
* viewed as very simple cross-lane operations. Methods in the
* {@link #reduceLanesToLong(VectorOperators.Associative)
* reduceLanes()} family fold together all lanes (or mask-selected
 * lanes) of a vector and return a single result. As an inverse, the
* {@link #broadcast(long) broadcast} family of methods can be thought
* of as crossing lanes in the other direction, from a scalar to all
* lanes of the output vector. Single-lane access methods such as
* {@code lane(I)} or {@code withLane(I,E)} might also be regarded as
* very simple cross-lane operations.
*
* Likewise, a method which moves a non-byte vector to or from a
* byte array could be viewed as a cross-lane operation, because the
* vector lanes must be distributed into separate bytes, or (in the
* other direction) consolidated from array bytes.
*
* @implNote
*
* This API will also work correctly even on Java platforms which
* do not include specialized hardware support for SIMD computations.
* The Vector API is not likely to provide any special performance
* benefit on such platforms.
*
* Once created, a vector is never mutated, not even if only
* {@linkplain IntVector#withLane(int,int) a single lane is changed}.
* A new vector is always created to hold a new configuration
* of lane values. The unavailability of mutative methods is a
* necessary consequence of suppressing the object identity of
* all vectors, as value-based classes.
*
 * With {@code Vector}, as with all value-based classes,
* identity-sensitive operations such as {@code ==} may yield
* unpredictable results, or reduced performance. Oddly enough,
* {@link Vector#equals(Object) v.equals(w)} is likely to be faster
* than {@code v==w}, since {@code equals} is not an identity
* sensitive method. It is also reasonable to use, on vectors, the
* {@code toString} and {@code hashCode} methods of {@code Object}.
*
* Also, these objects can be stored in locals and parameters and as
* {@code static final} constants, but storing them in other Java
* fields or in array elements, while semantically valid, may incur
* performance penalties.
*
*
 * This is a lane-wise unary operation which applies
 * the selected operation to each lane.
*
* @apiNote
* Subtypes improve on this method by sharpening
* the method return type.
 *
 * @param op the operation used to process lane values
* @return the result of applying the operation lane-wise
 *         to the input vector
* @throws UnsupportedOperationException if this vector does
* not support the requested operation
* @see #lanewise(VectorOperators.Unary,VectorMask)
* @see #lanewise(VectorOperators.Binary,Vector)
* @see #lanewise(VectorOperators.Ternary,Vector,Vector)
*/
public abstract Vector<E> lanewise(VectorOperators.Unary op);

/**
 * Combines the lane values of this vector
 * with those of a second input vector,
 * applying the selected binary operation
 * (such as {@code ADD}, {@code MUL}, or a shift)
 * to each pair of corresponding lanes.
* Shift counts are reduced (as unsigned values) modulo
* {@code ESIZE}, so the shift is always in the range
* {@code [0..ESIZE-1]}.
* It is as if the shift value were subjected to a
* bitwise logical {@code AND} operator ({@code &})
* with the mask value {@code ESIZE-1}.
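 * In scalar terms (the class and method names below are illustrative,
 * not part of this API), the reduction of shift counts for 32-bit
 * {@code int} lanes looks like this, and matches Java's own scalar
 * shift semantics:

```java
// Scalar sketch of shift-count reduction: the count is reduced
// modulo ESIZE, as if masked with ESIZE-1.  For int lanes ESIZE
// is 32, so a count of 35 behaves like a count of 3.
public class ShiftMaskSketch {
    static int shiftLeft(int e, int count) {
        final int ESIZE = 32;                // lane size in bits for int
        return e << (count & (ESIZE - 1));   // count reduced to [0..ESIZE-1]
    }
    public static void main(String[] args) {
        System.out.println(shiftLeft(1, 3));    // 8
        System.out.println(shiftLeft(1, 35));   // 8, since 35 & 31 == 3
    }
}
```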
*
* @apiNote
* Subtypes improve on this method by sharpening
* the method return type.
 *
 * @param op the operation used to combine lane values
* @param v the input vector
* @return the result of applying the operation lane-wise
* to the two input vectors
* @throws UnsupportedOperationException if this vector does
* not support the requested operation
* @see #lanewise(VectorOperators.Binary,Vector,VectorMask)
* @see #lanewise(VectorOperators.Unary)
 * @see #lanewise(VectorOperators.Ternary,Vector,Vector)
*/
public abstract Vector<E> lanewise(VectorOperators.Binary op, Vector<E> v);

/**
 * Combines the lane values of this vector
 * with the value of a broadcast scalar.
 *
 * @apiNote
* Subtypes improve on this method by sharpening
* the method return type and
* the type of the scalar parameter {@code e}.
 *
 * @param op the operation used to combine lane values
* @param e the input scalar
* @return the result of applying the operation lane-wise
* to the input vector and the scalar
* @throws UnsupportedOperationException if this vector does
* not support the requested operation
* @throws IllegalArgumentException
* if the given {@code long} value cannot
* be represented by the right operand type
* of the vector operation
* @see #broadcast(long)
* @see #lanewise(VectorOperators.Binary,long,VectorMask)
*/
public abstract Vector<E> lanewise(VectorOperators.Binary op, long e);

/**
 * Combines the lane values of this vector
 * with the value of a broadcast scalar,
 * with selection of lane elements controlled by a mask.
 *
 * @apiNote
* Subtypes improve on this method by sharpening
* the method return type and
* the type of the scalar parameter {@code e}.
 *
 * @param op the operation used to combine lane values
* @param e the input scalar
* @param m the mask controlling lane selection
* @return the result of applying the operation lane-wise
* to the input vector and the scalar
* @throws UnsupportedOperationException if this vector does
* not support the requested operation
* @throws IllegalArgumentException
* if the given {@code long} value cannot
* be represented by the right operand type
* of the vector operation
* @see #broadcast(long)
* @see #lanewise(VectorOperators.Binary,Vector,VectorMask)
*/
public abstract Vector<E> lanewise(VectorOperators.Binary op, long e, VectorMask<E> m);

/**
* Combines the corresponding lane values of this vector
* with the lanes of a second and a third input vector,
* applying the selected ternary operator.
* The only supported ternary operators at present are
* {@code FMA} and {@code BITWISE_BLEND}.
*
* @apiNote
* Subtypes improve on this method by sharpening
* the method return type.
*
* @param v1 the second input vector
* @param v2 the third input vector
* @return the result of applying the operation lane-wise
* to the three input vectors
* @throws UnsupportedOperationException if this vector does
* not support the requested operation
* @see #lanewise(VectorOperators.Unary)
* @see #lanewise(VectorOperators.Binary,Vector)
* @see #lanewise(VectorOperators.Ternary,Vector,Vector,VectorMask)
*/
public abstract Vector<E> lanewise(VectorOperators.Ternary op, Vector<E> v1, Vector<E> v2);

/**
* Adds this vector to a second input vector, lane-wise.
*
* As a full-service named operation, this method
* comes in masked and unmasked overloadings, and
* (in subclasses) also comes in scalar-broadcast
* overloadings (both masked and unmasked).
*
* @param v a second input vector
* @return the result of adding this vector to the second input vector
* @see #add(Vector,VectorMask)
* @see IntVector#add(int)
* @see VectorOperators#ADD
* @see #lanewise(VectorOperators.Binary,Vector)
* @see IntVector#lanewise(VectorOperators.Binary,int)
*/
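The lane-wise distribution of a scalar operator described above can be sketched as a scalar loop in plain Java. This is an illustrative model only; real code would use `IntVector.add` from `jdk.incubator.vector`, and the names here are invented for the example.

```java
// Scalar model of the lane-wise add(Vector) operation: output lane N is
// the sum of lane N of the two inputs, for all VLENGTH lanes.
public class LanewiseAddModel {
    static int[] add(int[] a, int[] b) {
        int[] r = new int[a.length];      // VLENGTH output lanes
        for (int n = 0; n < a.length; n++) {
            r[n] = a[n] + b[n];           // scalar + distributed across lanes
        }
        return r;
    }

    public static void main(String[] args) {
        int[] r = add(new int[]{1, 2, 3, 4}, new int[]{10, 20, 30, 40});
        System.out.println(java.util.Arrays.toString(r));
    }
}
```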
public abstract Vector<E> add(Vector<E> v);

/**
* Adds this vector to a second input vector,
* selecting lanes under the control of a mask.
*
* As a full-service named operation, this method
* comes in masked and unmasked overloadings, and
* (in subclasses) also comes in scalar-broadcast
* overloadings (both masked and unmasked).
*
* @param v the second input vector
* @param m the mask controlling lane selection
* @return the result of adding this vector to the given vector
* @see #add(Vector)
* @see IntVector#add(int,VectorMask)
* @see VectorOperators#ADD
* @see #lanewise(VectorOperators.Binary,Vector,VectorMask)
* @see IntVector#lanewise(VectorOperators.Binary,int,VectorMask)
*/
public abstract Vector<E> add(Vector<E> v, VectorMask<E> m);

/**
* Subtracts a second input vector from this vector, lane-wise.
*
* As a full-service named operation, this method
* comes in masked and unmasked overloadings, and
* (in subclasses) also comes in scalar-broadcast
* overloadings (both masked and unmasked).
*
* @param v a second input vector
* @return the result of subtracting the second input vector from this vector
* @see #sub(Vector,VectorMask)
* @see IntVector#sub(int)
* @see VectorOperators#SUB
* @see #lanewise(VectorOperators.Binary,Vector)
* @see IntVector#lanewise(VectorOperators.Binary,int)
*/
public abstract Vector<E> sub(Vector<E> v);

/**
* Subtracts a second input vector from this vector,
* selecting lanes under the control of a mask.
*
* As a full-service named operation, this method
* comes in masked and unmasked overloadings, and
* (in subclasses) also comes in scalar-broadcast
* overloadings (both masked and unmasked).
*
* @param v the second input vector
* @param m the mask controlling lane selection
* @return the result of subtracting the second input vector from this vector
* @see #sub(Vector)
* @see IntVector#sub(int,VectorMask)
* @see VectorOperators#SUB
* @see #lanewise(VectorOperators.Binary,Vector,VectorMask)
* @see IntVector#lanewise(VectorOperators.Binary,int,VectorMask)
*/
public abstract Vector<E> sub(Vector<E> v, VectorMask<E> m);

/**
* Multiplies this vector by a second input vector, lane-wise.
*
* As a full-service named operation, this method
* comes in masked and unmasked overloadings, and
* (in subclasses) also comes in scalar-broadcast
* overloadings (both masked and unmasked).
*
* @param v a second input vector
* @return the result of multiplying this vector by the second input vector
* @see #mul(Vector,VectorMask)
* @see IntVector#mul(int)
* @see VectorOperators#MUL
* @see #lanewise(VectorOperators.Binary,Vector)
* @see IntVector#lanewise(VectorOperators.Binary,int)
*/
public abstract Vector<E> mul(Vector<E> v);

/**
* Multiplies this vector by a second input vector,
* selecting lanes under the control of a mask.
*
* As a full-service named operation, this method
* comes in masked and unmasked overloadings, and
* (in subclasses) also comes in scalar-broadcast
* overloadings (both masked and unmasked).
*
* @param v the second input vector
* @param m the mask controlling lane selection
* @return the result of multiplying this vector by the given vector
* @see #mul(Vector)
* @see IntVector#mul(int,VectorMask)
* @see VectorOperators#MUL
* @see #lanewise(VectorOperators.Binary,Vector,VectorMask)
* @see IntVector#lanewise(VectorOperators.Binary,int,VectorMask)
*/
public abstract Vector<E> mul(Vector<E> v, VectorMask<E> m);

/**
* Divides this vector by a second input vector, lane-wise.
*
* If the underlying scalar operator does not support
* division by zero, but is presented with a zero divisor,
* an {@code ArithmeticException} will be thrown.
*
*
* As a full-service named operation, this method
* comes in masked and unmasked overloadings, and
* (in subclasses) also comes in scalar-broadcast
* overloadings (both masked and unmasked).
*
* @param v a second input vector
* @return the result of dividing this vector by the second input vector
* @throws ArithmeticException if any lane
* in {@code v} is zero
* and {@code ETYPE} is not {@code float} or {@code double}.
* @see #div(Vector,VectorMask)
* @see DoubleVector#div(double)
* @see VectorOperators#DIV
* @see #lanewise(VectorOperators.Binary,Vector)
* @see IntVector#lanewise(VectorOperators.Binary,int)
*/
public abstract Vector<E> div(Vector<E> v);

/**
* Divides this vector by a second input vector,
* selecting lanes under the control of a mask.
*
* If the underlying scalar operator does not support
* division by zero, but is presented with a zero divisor,
* an {@code ArithmeticException} will be thrown.
*
*
* As a full-service named operation, this method
* comes in masked and unmasked overloadings, and
* (in subclasses) also comes in scalar-broadcast
* overloadings (both masked and unmasked).
*
* @param v a second input vector
* @param m the mask controlling lane selection
* @return the result of dividing this vector by the second input vector
* @throws ArithmeticException if any lane selected by {@code m}
* in {@code v} is zero
* and {@code ETYPE} is not {@code float} or {@code double}.
* @see #div(Vector)
* @see DoubleVector#div(double,VectorMask)
* @see VectorOperators#DIV
* @see #lanewise(VectorOperators.Binary,Vector,VectorMask)
* @see DoubleVector#lanewise(VectorOperators.Binary,double,VectorMask)
*/
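The masked-division rule above (only lanes selected by the mask can throw on a zero divisor) can be modeled with a scalar loop. This is an illustrative sketch with invented names, not the incubator implementation: unselected lanes keep this vector's value and never perform the division.

```java
// Scalar model of masked div: a zero divisor in an unselected lane does
// not throw, because that lane is never divided.
public class MaskedDivModel {
    static int[] div(int[] a, int[] b, boolean[] m) {
        int[] r = new int[a.length];
        for (int n = 0; n < a.length; n++) {
            // only lanes selected by the mask perform the division
            r[n] = m[n] ? a[n] / b[n] : a[n];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] r = div(new int[]{8, 9}, new int[]{2, 0}, new boolean[]{true, false});
        System.out.println(java.util.Arrays.toString(r)); // zero divisor is unselected
    }
}
```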
public abstract Vector<E> div(Vector<E> v, VectorMask<E> m);

/**
* Computes the larger of this vector and a second input vector, lane-wise.
*
* This is not a full-service named operation like
* {@link #add(Vector) add()}. A masked version of
* this operation is not directly available,
* but it may be obtained via the masked version of
* {@code lanewise}. Subclasses define an additional
* scalar-broadcast overloading of this method.
*
* @param v a second input vector
* @return the lanewise maximum of this vector and the second input vector
* @see IntVector#max(int)
* @see VectorOperators#MAX
* @see #lanewise(VectorOperators.Binary,Vector)
* @see #lanewise(VectorOperators.Binary,Vector,VectorMask)
*/
public abstract Vector<E> max(Vector<E> v);

/**
* If no elements are selected, an operation-specific identity
* value is returned.
*
* The result is the same as
* {@code this.compare(op, this.broadcast(e))}.
* That is, the scalar may be regarded as broadcast to
* a vector of the same species, and then compared
* against the original vector, using the selected
* comparison operation.
*
* @apiNote
* The {@code long} value {@code e} must be accurately
* representable by the {@code ETYPE} of this vector's species,
* so that {@code e==(long)(ETYPE)e}. This rule is enforced
* by the implicit call to {@code broadcast()}.
*
* Subtypes improve on this method by sharpening
* the type of the scalar parameter {@code e}.
*
* @param e the input scalar
* @return the mask result of testing lane-wise if this vector
* compares to the input, according to the selected
* comparison operator
* @throws IllegalArgumentException
* if the given {@code long} value cannot
* be represented by the vector's {@code ETYPE}
* @see #broadcast(long)
* @see #compare(VectorOperators.Comparison,Vector)
*/
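The broadcast-then-compare semantics described above can be sketched as a scalar loop. This is an illustrative model with invented names; `LT` is used as a sample comparison operator, and the result is the boolean mask the API would return as a `VectorMask`.

```java
// Scalar model of compare(op, e): the scalar is notionally broadcast to
// every lane and then compared lane-wise, producing a mask.
public class CompareScalarModel {
    static boolean[] compareLt(int[] a, long e) {
        boolean[] m = new boolean[a.length];
        for (int n = 0; n < a.length; n++) {
            m[n] = a[n] < e;   // as if compared against broadcast(e)
        }
        return m;
    }

    public static void main(String[] args) {
        boolean[] m = compareLt(new int[]{1, 5, 3}, 4);
        System.out.println(java.util.Arrays.toString(m));
    }
}
```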
public abstract VectorMask<E> compare(VectorOperators.Comparison op, long e);

/**
* Tests this vector by comparing it with an input scalar,
* according to the given comparison operation,
* in lanes selected by a mask.
*
* @apiNote
* Subtypes improve on this method by sharpening
* the type of the scalar parameter {@code e}.
*
* @param e the input scalar
* @param m the mask controlling lane selection
* @return the mask result of testing lane-wise if this vector
* compares to the input, according to the selected
* comparison operator,
* and only in the lanes selected by the mask
* @throws IllegalArgumentException
* if the given {@code long} value cannot
* be represented by the vector's {@code ETYPE}
* @see #broadcast(long)
* @see #compare(VectorOperators.Comparison,Vector)
*/
public abstract VectorMask<E> compare(VectorOperators.Comparison op, long e, VectorMask<E> m);

/**
* Replaces selected lanes of this vector with
* a scalar value under the control of a mask.
*
* @apiNote
* Subtypes improve on this method by sharpening
* the type of the scalar parameter {@code e}.
*
* @param e the input scalar, containing the replacement lane value
* @param m the mask controlling lane selection of the scalar
* @return the result of blending the lane elements of this vector with
* the scalar value
*/
public abstract Vector<E> blend(long e, VectorMask<E> m);

/**
* The scale must not be so large, and the element size must
* not be so small, that there would be an overflow when
* computing any of the values {@code N*scale} or {@code VLENGTH*scale},
* when the result is represented using the vector
* lane type {@code ETYPE}.
*
*
* The following pseudocode illustrates this behavior:
*/
public abstract Vector<E> addIndex(int scale);

/**
* This is a cross-lane operation that shifts lane elements
* to the front, from the current vector and the second vector.
* Both vectors can be viewed as a combined "background" of length
* {@code 2*VLENGTH}, from which a slice is extracted.
*
* The lane numbered {@code N} in the output vector is copied
* from lane {@code origin+N} of the input vector, if that
* lane exists, else from lane {@code origin+N-VLENGTH} of
* the second vector (which is guaranteed to exist).
*
* The {@code origin} value must be in the inclusive range
* {@code 0..VLENGTH}. As limiting cases, {@code v.slice(0,w)}
* and {@code v.slice(VLENGTH,w)} return {@code v} and {@code w},
* respectively.
*
* @apiNote
*
* This method may be regarded as the inverse of
* {@link #unslice(int,Vector,int) unslice()},
* in that the sliced value could be unsliced back into its
* original position in the two input vectors, without
* disturbing unrelated elements, as in the following
* pseudocode:
* This method also supports a variety of cross-lane shifts and
* rotates as follows:
*/
public abstract Vector<E> slice(int origin, Vector<E> v1);

/**
* This is a cross-lane operation that shifts lane elements
* to the front, from the current vector and the second vector.
* Both vectors can be viewed as a combined "background" of length
* {@code 2*VLENGTH}, from which a slice is extracted.
*
* The returned result is equal to the expression
* {@code broadcast(0).blend(slice(origin,v1),m)}.
*
* @apiNote
* This method may be regarded as the inverse of
* {@link #unslice(int,Vector,int,VectorMask) unslice()},
* in that the sliced value could be unsliced back into its
* original position in the two input vectors, without
* disturbing unrelated elements, as in the following
* pseudocode:
*/
public abstract Vector<E> slice(int origin, Vector<E> v1, VectorMask<E> m);

/**
* This is a cross-lane operation that permutes the lane
* elements of the current vector toward the back and inserts them
* into a logical pair of background vectors. Only one of the
* pair will be returned, however. The background is formed by
* duplicating the second input vector. (However, the output will
* never contain two duplicates from the same input lane.)
*
* The lane numbered {@code N} in the input vector is copied into
* lane {@code origin+N} of the first background vector, if that
* lane exists, else into lane {@code origin+N-VLENGTH} of the
* second background vector (which is guaranteed to exist).
*
* The first or second background vector, updated with the
* inserted slice, is returned. A {@code part} number of zero
* or one selects the first or second updated background vector.
*
* The {@code origin} value must be in the inclusive range
* {@code 0..VLENGTH}. As limiting cases, {@code v.unslice(0,w,0)}
* and {@code v.unslice(VLENGTH,w,1)} both return {@code v}, while
* {@code v.unslice(0,w,1)} and {@code v.unslice(VLENGTH,w,0)}
* both return {@code w}.
*
* @apiNote
* This method supports a variety of cross-lane insertion
* operations as follows:
*/
public abstract Vector<E> unslice(int origin, Vector<E> w, int part);

/**
* This is a cross-lane operation that permutes the lane
* elements of the current vector forward and inserts its lanes
* (when selected by the mask) into a logical pair of background
* vectors. As with the
* {@linkplain #unslice(int,Vector,int) unmasked version} of this method,
* only one of the pair will be returned, as selected by the
* {@code part} number.
*
* For each lane {@code N} selected by the mask, the lane value
* is copied into
* lane {@code origin+N} of the first background vector, if that
* lane exists, else into lane {@code origin+N-VLENGTH} of the
* second background vector (which is guaranteed to exist).
* Background lanes retain their original values if the
* corresponding input lanes {@code N} are unset in the mask.
*
* The first or second background vector, updated with set lanes
* of the inserted slice, is returned. A {@code part} number of
* zero or one selects the first or second updated background
* vector.
*
* @param origin the first output lane to receive the slice
* @param w the background vector that (as two copies) will receive
* the inserted slice, if they are set in {@code m}
* @param part the part number of the result (either zero or one)
* @param m the mask controlling lane selection from the current vector
* @return either the first or second part of a pair of
* background vectors {@code w}, updated by inserting
* selected lanes of this vector at the indicated origin
* @throws ArrayIndexOutOfBoundsException if {@code origin}
* is negative or greater than {@code VLENGTH},
* or if {@code part} is not zero or one
* @see #unslice(int,Vector,int)
* @see #slice(int,Vector)
*/
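The slice semantics documented above (extracting `VLENGTH` lanes starting at `origin` from the `2*VLENGTH` "background" formed by this vector followed by the second vector) can be modeled in scalar Java. This is an illustrative sketch with invented names, not the incubator implementation.

```java
// Scalar model of slice(origin, v): output lane N comes from background
// lane origin+N, where the background is a followed by b.
public class SliceModel {
    static int[] slice(int origin, int[] a, int[] b) {
        int vlen = a.length;
        int[] r = new int[vlen];
        for (int n = 0; n < vlen; n++) {
            int i = origin + n;
            r[n] = (i < vlen) ? a[i] : b[i - vlen]; // second vector lane is guaranteed to exist
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3}, b = {4, 5, 6, 7};
        System.out.println(java.util.Arrays.toString(slice(1, a, b)));
    }
}
```

As the limiting cases in the text require, `slice(0, a, b)` returns `a` and `slice(VLENGTH, a, b)` returns `b` in this model.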
public abstract Vector<E> unslice(int origin, Vector<E> w, int part, VectorMask<E> m);

/**
* This method returns the value of this expression:
* {@code EVector.broadcast(this.species(), (ETYPE)e)}, where
* {@code EVector} is the vector class specific to this
* vector's element type {@code ETYPE}.
*
*
* The {@code long} value {@code e} must be accurately
* representable by the {@code ETYPE} of this vector's species,
* so that {@code e==(long)(ETYPE)e}.
*
* If this rule is violated the problem is not detected
* statically, but an {@code IllegalArgumentException} is thrown
* at run-time. Thus, this method somewhat weakens the static
* type checking of immediate constants and other scalars, but it
* makes up for this by improving the expressiveness of the
* generic API. Note that an {@code e} value in the range
* {@code [-128..127]} is always acceptable, since every
* {@code ETYPE} will accept every {@code byte} value.
*
* @apiNote
* Subtypes improve on this method by sharpening
* the method return type and
* the type of the scalar parameter {@code e}.
*
* @param e the value to broadcast
* @return a vector where all lane elements are set to
* the primitive value {@code e}
* @throws IllegalArgumentException
* if the given {@code long} value cannot
* be represented by the vector's {@code ETYPE}
* @see VectorSpecies#broadcast(long)
* @see IntVector#broadcast(int)
* @see FloatVector#broadcast(float)
*/
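The representability rule above, `e==(long)(ETYPE)e`, can be checked with a simple round-trip cast. This is an illustrative helper (the names are invented, and the real API performs the check inside `broadcast()` and throws `IllegalArgumentException`), shown here for `byte` lanes.

```java
// Model of the broadcast(long) representability rule: the value must
// survive a round trip through the lane type unchanged.
public class BroadcastCheck {
    static boolean representableAsByte(long e) {
        return e == (long)(byte) e;   // e == (long)(ETYPE)e for ETYPE = byte
    }

    public static void main(String[] args) {
        System.out.println(representableAsByte(127));  // fits in byte
        System.out.println(representableAsByte(128));  // the API would throw IAE
    }
}
```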
public abstract Vector<E> broadcast(long e);

/**
* This method returns the value of this expression:
* {@code species().maskAll(bit)}.
*
* @param bit the given mask bit to be replicated
* @return a mask where each lane is set or unset according to
* the given bit
* @see VectorSpecies#maskAll(boolean)
*/
public abstract VectorMask<E> maskAll(boolean bit);

/**
* This method behaves as if it returns the result of creating a shuffle
* given an array of the vector elements, as follows:
*/
public abstract VectorShuffle<E> toShuffle();

/**
* Depending on the selected species, this operation may
* either expand or contract
* its logical result, in which case a non-zero {@code part}
* number can further control the selection and steering of the
* logical result into the physical output vector.
*
*
* The underlying bits of this vector are copied to the resulting
* vector without modification, but those bits, before copying,
* may be truncated if this vector's bit-size is greater than the
* desired vector's bit-size, or filled with zero bits if this
* vector's bit-size is less than the desired vector's bit-size.
*
* If the old and new species have different shape, this is a
* shape-changing operation, and may have special
* implementation costs.
*
* The method behaves as if this vector is stored into a byte
* buffer or array using little-endian byte ordering and then the
* desired vector is loaded from the same byte buffer or array
* using the same ordering.
*
* The following pseudocode illustrates the behavior:
*/
public abstract <F> Vector<F> reinterpretShape(VectorSpecies<F> species, int part);

/**
* Each specific conversion is described by a conversion
* constant in the class {@link VectorOperators}. Each conversion
* operator has a specified {@linkplain
* VectorOperators.Conversion#domainType() domain type} and
* {@linkplain VectorOperators.Conversion#rangeType() range type}.
* The domain type must exactly match the lane type of the input
* vector, while the range type determines the lane type of the
* output vectors.
*
* A conversion operator may be classified as (respectively)
* in-place, expanding, or contracting, depending on whether the
* bit-size of its domain type is (respectively) equal, less than,
* or greater than the bit-size of its range type.
*
* Independently, conversion operations can also be classified
* as reinterpreting or value-transforming, depending on whether
* the conversion copies representation bits unchanged, or changes
* the representation bits in order to retain (part or all of)
* the logical value of the input value.
*
* If a reinterpreting conversion contracts, it will truncate the
* upper bits of the input. If it expands, it will pad upper bits
* of the output with zero bits, when there are no corresponding
* input bits.
*
* As another variation of behavior, an in-place conversion
* can incorporate an expanding or contracting conversion, while
* retaining the same lane size between input and output.
*
* In the case of a contraction, the lane value is first converted
* to the smaller value, and then zero-padded (as if by a subsequent
* reinterpretation) before storing into the output lane.
*
* In the case of an expansion, the lane value is first truncated
* to the smaller value (as if by an initial reinterpretation),
* and then converted before storing into the output lane.
*
* An expanding conversion such as {@code S2I} ({@code short}
* value to {@code int}) takes a scalar value and represents it
* in a larger format (always with some information redundancy).
*
* A contracting conversion such as {@code D2F} ({@code double}
* value to {@code float}) takes a scalar value and represents it
* in a smaller format (always with some information loss).
*
* Some in-place conversions may also include information loss,
* such as {@code L2D} ({@code long} value to {@code double})
* or {@code F2I} ({@code float} value to {@code int}).
*
* Reinterpreting in-place conversions are not lossy, unless the
* bitwise value is somehow not legal in the output type.
* Converting the bit-pattern of a {@code NaN} may discard bits
* from the {@code NaN}'s significand.
*
* This classification is important, because, unless otherwise
* documented, conversion operations never change vector
* shape, regardless of how they may change lane sizes.
*
* Therefore an expanding conversion cannot store all of its
* results in its output vector, because the output vector has fewer
* lanes of larger size, in order to have the same overall bit-size as
* its input.
*
* Likewise, a contracting conversion must store its relatively small
* results into a subset of the lanes of the output vector, defaulting
* the unused lanes to zero.
*
* As an example, a conversion from {@code byte} to {@code long}
* ({@code M=8}) will discard 87.5% of the input values in order to
* convert the remaining 12.5% into the roomy {@code long} lanes of
* the output vector. The inverse conversion will convert back all of
* the large results, but will waste 87.5% of the lanes in the output
* vector.
*
* In-place conversions ({@code M=1}) deliver all of
* their results in one output vector, without wasting lanes.
*
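The expansion/contraction factor `M` discussed above is just the ratio of the lane bit-sizes for a shape-invariant conversion, and the physical output holds `VLENGTH/M` of the logical results per part. The helper below is an illustrative sketch of that arithmetic, not an API method.

```java
// Arithmetic behind the factor M: for byte (8 bits) to long (64 bits),
// M = 8, so one output vector holds only 1/8 of the logical results.
public class PartMath {
    static int expansionFactor(int domainBits, int rangeBits) {
        return Math.max(domainBits, rangeBits) / Math.min(domainBits, rangeBits);
    }

    public static void main(String[] args) {
        System.out.println(expansionFactor(8, 64));  // byte -> long: M = 8
        System.out.println(expansionFactor(32, 32)); // in-place: M = 1
    }
}
```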
* To manage the details of these
* expansions and contractions,
* a non-zero {@code part} parameter selects partial results from
* expansions, or steers the results of contractions into
* corresponding locations, as follows:
*
* The {@code VLENGTH/M} output lanes represent a partial
* slice of the whole logical result of the conversion, filling
* the entire physical output vector.
*
* A group of such output vectors, with logical result parts
* steered to disjoint blocks, can be reassembled using the
* {@linkplain VectorOperators#OR bitwise or} or (for floating
* point) the {@link VectorOperators#FIRST_NONZERO FIRST_NONZERO}
* operator.
*
* This method is a restricted version of the more general
* but less frequently used shape-changing method
* {@link #convertShape(VectorOperators.Conversion,VectorSpecies,int)
* convertShape()}.
* The result of this method is the same as the expression
* {@code this.convertShape(conv, rsp, this.broadcast(part))},
* where the output species is
* {@code rsp=this.species().withLanes(FTYPE.class)}.
*
* @param conv the desired scalar conversion to apply lane-wise
* @param part the part number
* of the result, or zero if neither expanding nor contracting
* @param <F> the output element type
* @return the converted vector
*/
public abstract <F> Vector<F> convert(VectorOperators.Conversion<E,F> conv, int part);

/**
* If the old and new species have the same shape, the behavior
* is exactly the same as the simpler, shape-invariant method
* {@link #convert(VectorOperators.Conversion,int) convert()}.
* In such cases, the simpler method {@code convert()} should be
* used, to make code easier to reason about.
* Otherwise, this is a shape-changing operation, and may
* have special implementation costs.
*
* As a combined effect of shape changes and lane size changes,
* the input and output species may have different lane counts, causing
* expansion or contraction.
* In this case a non-zero {@code part} parameter selects
* partial results from an expanded logical result, or steers
* the results of a contracted logical result into a physical
* output vector of the required output species.
*
* The following pseudocode illustrates the behavior of this
* method for in-place, expanding, and contracting conversions.
* (This pseudocode also applies to the shape-invariant method,
* but with shape restrictions on the output species.)
* Note that only one of the three code paths is relevant to any
* particular combination of conversion operator and shapes.
*
* If the old and new species have different shape, this is a
* shape-changing operation, and may have special
* implementation costs.
*
* @param rsp the desired output species
* @param part the part number
* of the result, or zero if neither expanding nor contracting
* @param <F> the output element type
* @return the converted vector
*/
public abstract <F> Vector<F> convertShape(VectorOperators.Conversion<E,F> conv,
                                           VectorSpecies<F> rsp, int part);

/**
* Bytes are extracted from primitive lane elements according
* to {@linkplain ByteOrder#LITTLE_ENDIAN little endian} ordering.
* The lanes are stored according to their
* memory ordering.
*
* This method behaves as if it calls
* {@link #intoByteBuffer(ByteBuffer,int,ByteOrder,VectorMask)
* intoByteBuffer()} as follows:
*
* Bytes are extracted from primitive lane elements according
* to {@linkplain ByteOrder#LITTLE_ENDIAN little endian} ordering.
* The lanes are stored according to their
* memory ordering.
*
* This method behaves as if it calls
* {@link #intoByteBuffer(ByteBuffer,int,ByteOrder,VectorMask)
* intoByteBuffer()} as follows:
*
* Bytes are extracted from primitive lane elements according
* to the specified byte ordering.
* The lanes are stored according to their
* memory ordering.
*
* This method behaves as if it calls
* {@link #intoByteBuffer(ByteBuffer,int,ByteOrder,VectorMask)
* intoByteBuffer()} as follows:
*
* Bytes are extracted from primitive lane elements according
* to the specified byte ordering.
* The lanes are stored according to their
* memory ordering.
*
* This method behaves as if it calls
* {@link #intoByteBuffer(ByteBuffer,int,ByteOrder,VectorMask)
* intoByteBuffer()} as follows:
*
* Bytes are extracted from primitive lane elements according
* to the specified byte ordering.
* The lanes are stored according to their
* memory ordering.
*
* The following pseudocode illustrates the behavior, where
* {@code EBuffer} is the primitive buffer type, {@code ETYPE} is the
* primitive element type, and {@code EVector} is the primitive
* vector type for this vector:
* The comparison of lane values is produced as if by a call to
* {@link Arrays#equals(int[],int[]) Arrays.equals()},
* as appropriate to the arrays returned by
* {@link #toArray toArray()} on both vectors.
*
* @return whether this vector is identical to some other object
*/
@Override
public abstract boolean equals(Object obj);
/**
* Returns a hash code value for the vector,
* based on the lane values and the vector species.
*
* @return a hash code value for this vector
*/
@Override
public abstract int hashCode();
/**
* Returns all the lane values of this vector, boxed in a list.
* The list elements are boxed and presented in lane order.
* The list is immutable, as if returned from
* {@link List#of(Object[]) List.<E>of}.
*
* @apiNote
* Because this operation jumps out of the domain of vectors into
* the domain of Java collections, it is likely to have large
* overheads, as compared with other vector operations.
* Often {@link #toArray Vector.toArray} is preferable,
* since it produces a packed array of unboxed lane values.
*
* @return a list containing the lane values of this vector
*/
// FIXME: Does this pull its weight? Probably not.
// Perhaps it's fine to rely on the {@code toArray()} methods.
public abstract List<E> toList();

/**
* Shapes and species
*
*
* The information capacity of a vector is determined by its
* {@linkplain #shape() vector shape}, also called its
* {@code VSHAPE}. Each possible {@code VSHAPE} is represented by
* a member of the {@link VectorShape} enumeration, and represents
* an implementation format shared in common by all vectors
* of that shape. Thus, the {@linkplain #bitSize() size in bits}
* of a vector is determined by appealing to its vector shape.
*
* Vector subtypes
*
* Vector declares a set of vector operations (methods) that are common to all
* element types (such as addition). Sub-classes of Vector with a concrete
* element type declare further operations that are specific to that
* element type (such as access to element values in lanes, logical operations
* on values of integral elements types, or transcendental operations on values
* of floating point element types).
* There are six abstract sub-classes of Vector corresponding to the supported set
* of element types, {@link ByteVector}, {@link ShortVector},
* {@link IntVector}, {@link LongVector}, {@link FloatVector}, and
* {@link DoubleVector}. Along with type-specific operations these classes
* support creation of vector values (instances of Vector).
* They expose static constants corresponding to the supported species,
* and static methods on these types generally take a species as a parameter.
* For example,
* {@link FloatVector#fromArray(VectorSpecies, float[], int) FloatVector.fromArray}
* creates and returns a float vector of the specified species, with elements
* loaded from the specified float array.
* It is recommended that Species instances be held in {@code static final}
* fields for optimal creation and usage of Vector values by the runtime compiler.
*
* Lane-wise operations
*
* We use the term lanes when defining operations on
* vectors. The number of lanes in a vector is the number of scalar
* elements it holds. For example, a vector of type {@code float} and
* shape {@code S_256_BIT} has eight lanes, since {@code 32*8=256}.
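The lane-count arithmetic in the paragraph above is simply the shape's bit-size divided by the element's bit-size. The helper below is an illustrative sketch of that calculation, not a Vector API method.

```java
// VLENGTH = shape bit-size / element bit-size: a 256-bit float vector
// has 256/32 = 8 lanes.
public class LaneCount {
    static int vlength(int shapeBits, int elementBits) {
        return shapeBits / elementBits;
    }

    public static void main(String[] args) {
        System.out.println(vlength(256, Float.SIZE));   // 8 lanes of float
        System.out.println(vlength(512, Double.SIZE));  // 8 lanes of double
    }
}
```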
*
*
*
*
* {@code
* ETYPE scalar_unary_op(ETYPE s);
* EVector a = ...;
* VectorSpecies
*
* {@code
* ETYPE scalar_binary_op(ETYPE s, ETYPE t);
* EVector a = ...;
* VectorSpecies
* {@code
* ETYPE scalar_nary_op(ETYPE... args);
* EVector[] v = ...;
* int N = v.length;
* VectorSpecies
* {@code
* IntVector a = ...;
* int VLENGTH = a.length();
* VectorShape VSHAPE = a.shape();
* double[] arlogical = new double[VLENGTH];
* for (int i = 0; i < limit; i++) {
* int e = a.lane(i);
* arlogical[i] = (double) e;
* }
* VectorSpecies
* {@code
* ETYPE assoc_scalar_binary_op(ETYPE s, ETYPE t);
* EVector a = ...;
* ETYPE r =
* {@code
* EVector a = ...;
* Shuffle
* {@code
* ETYPE[] ar = new ETYPE[a.length()];
* for (int i = 0; i < ar.length; i++) {
* boolean isSet = m.laneIsSet(i);
* ar[i] = isSet ? b.lane(i) : a.lane(i);
* }
* EVector r = EVector.fromArray(species, ar, 0);
* }
* {@code
* boolean scalar_binary_test_op(ETYPE s, ETYPE t);
* EVector a = ...;
* VectorSpecies
* Masked operations
*
*
*
*
*
* {@code
* ETYPE scalar_binary_op(ETYPE s, ETYPE t);
* EVector a = ...;
* VectorSpecies
*
* Lane order and byte order
*
* The number of lane values stored in a given vector is referred to
* as its {@linkplain #length() vector length} or {@code VLENGTH}.
*
* It is useful to consider vector lanes as ordered
* sequentially from first to last, with the first lane
* numbered {@code 0}, the next lane numbered {@code 1}, and so on to
* the last lane numbered {@code VLENGTH-1}. This is a temporal
* order, where lower-numbered lanes are considered earlier than
* higher-numbered (later) lanes. This API uses these terms
* in preference to spatial terms such as "left", "right", "high",
* and "low".
*
* Memory operations
*
* As was already mentioned, vectors can be loaded from memory and
* stored back. An optional mask can control which individual memory
* locations are read from or written to. The shape of a vector
* determines how much memory it will occupy. In the absence of
* masking, the lanes are stored as a dense sequence of back-to-back
* values in memory, the same as a dense (gap-free) series of single
* scalar values in an array of the scalar type.
*
* Memory order corresponds exactly to lane order. The first vector
* lane value occupies the first position in memory, and so on, up to
* the length of the vector. Although memory order is not directly
* defined by Java as a separate concept, the memory order of stored
* vector lanes always corresponds to increasing index values in a
* Java array or in a {@link java.nio.ByteBuffer}.
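The lane-to-memory correspondence described above can be modeled as copying lane values, in lane order, into consecutive array slots starting at a given offset. This is an illustrative sketch with invented names, not the API's store operation.

```java
// Scalar model of an unmasked vector store: lane N lands at index
// offset+N, as a dense, gap-free run of scalar values.
public class MemoryOrderModel {
    static void intoArray(int[] lanes, int[] memory, int offset) {
        for (int n = 0; n < lanes.length; n++) {
            memory[offset + n] = lanes[n];   // memory order == lane order
        }
    }

    public static void main(String[] args) {
        int[] mem = new int[6];
        intoArray(new int[]{7, 8, 9, 10}, mem, 1);
        System.out.println(java.util.Arrays.toString(mem));
    }
}
```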
*
* Expansions, contractions, and partial results
*
* Since vectors are fixed in size, occasions often arise where the
* logical result of an operation is not the same as the physical size
* of the proposed output vector. To encourage user code that is as
* portable and predictable as possible, this API has a systematic
* approach to the design of such resizing vector operations.
*
*
*
*
*
* The method {@link VectorSpecies#partLimit(VectorSpecies,boolean)
* partLimit()} on {@link VectorSpecies} can be used, before any
* expanding or contracting operation is performed, to query the
* limiting value on a part parameter for a proposed expansion
* or contraction. The value returned from {@code partLimit()} is
* positive for expansions, negative for contractions, and zero for
* in-place operations. Its absolute value is the parameter {@code
* M}, and so it serves as an exclusive limit on valid part number
* arguments for the relevant methods. Thus, for expansions, the
* {@code partLimit()} value {@code M} is the exclusive upper limit
* for part numbers, while for contractions the {@code partLimit()}
* value {@code -M} is the exclusive lower limit.
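The sign convention of `partLimit()` described above can be sketched for the same-shape case, where the factor `M` follows from the lane bit-sizes. This is an illustrative model under that assumption, not the real `VectorSpecies.partLimit()` implementation.

```java
// Model of the partLimit() sign convention: positive M for expansions,
// negative M for contractions, zero for in-place operations; |M| is the
// exclusive bound on valid part numbers.
public class PartLimitModel {
    static int partLimit(int inputLaneBits, int outputLaneBits) {
        if (inputLaneBits == outputLaneBits) return 0;   // in-place
        if (inputLaneBits < outputLaneBits)              // expansion
            return outputLaneBits / inputLaneBits;       // +M
        return -(inputLaneBits / outputLaneBits);        // contraction: -M
    }

    public static void main(String[] args) {
        System.out.println(partLimit(8, 64));   // byte -> long expansion
        System.out.println(partLimit(64, 8));   // long -> byte contraction
        System.out.println(partLimit(32, 32));  // in-place
    }
}
```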
*
* Moving data across lane boundaries
* The cross-lane methods which do not redraw lanes or change species
* are more regularly structured and easier to reason about.
* These operations are:
*
*
*
* Hardware platform dependencies
*
* The purpose of the Vector API is to accelerate computations in the
* style of Single Instruction Multiple Data (SIMD), using available hardware
* resources such as vector hardware registers and vector hardware
* instructions. The API is designed to make effective use of
* multiple SIMD hardware platforms.
*
* No boxing of primitives
*
* Although a vector type like {@code Vector<Integer>} might appear to
* work with boxed {@code Integer} values, no boxing of lane values
* actually occurs; lane values are stored and operated upon as
* primitives.
*
* Value-based classes and identity operations
*
*
* {@code Vector}, along with all of its subtypes and many of its
* helper types like {@code VectorMask} and {@code VectorShuffle}, is a
* value-based
* class.
*
*
*
*
* @apiNote
* If the {@code ETYPE} is {@code float} or {@code double},
* this operation can lose precision and/or range, as a
* normal part of casting the result down to {@code long}.
*
* Usually
* {@linkplain IntVector#reduceLanes(VectorOperators.Associative,VectorMask)
* strongly typed access}
* is preferable, if you are working with a vector
* subtype that has a known element type.
*
* @implNote
     * The value of a floating-point reduction may be a function
     * both of the input values and of the order of scalar
     * operations which combine those values, specifically in the
* case of {@code ADD} and {@code MUL} operations, where
* details of rounding depend on operand order.
* See {@link FloatVector#reduceLanes(VectorOperators.Associative)
* FloatVector.reduceLanes()} for a discussion.
*
* @param op the operation used to combine lane values
* @param m the mask controlling lane selection
* @return the reduced result accumulated from the selected lane values
* @throws UnsupportedOperationException if this vector does
* not support the requested operation
* @see #reduceLanesToLong(VectorOperators.Associative)
* @see IntVector#reduceLanes(VectorOperators.Associative,VectorMask)
* @see FloatVector#reduceLanes(VectorOperators.Associative,VectorMask)
*/
public abstract long reduceLanesToLong(VectorOperators.Associative op,
                                           VectorMask<E> m);
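    // The masked reduction declared above can be modeled with plain
    // arrays.  This is a sketch of the ADD case only, with long lanes
    // standing in for the vector and boolean[] for VectorMask; the
    // class and method names are illustrative, not part of the API.
```java
public class ReduceSketch {
    // Masked associative reduction (ADD case): unset lanes are
    // skipped, which matches contributing the operation's identity
    // value (0 for ADD) to the accumulation.
    static long reduceLanesToLongAdd(long[] lanes, boolean[] m) {
        long acc = 0;  // identity of ADD
        for (int n = 0; n < lanes.length; n++) {
            if (m[n]) {
                acc += lanes[n];
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] lanes = {1, 2, 3, 4};
        boolean[] m = {true, false, true, true};
        System.out.println(reduceLanesToLongAdd(lanes, m));  // 8
    }
}
```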

    /**
     * Replaces selected lanes of this vector with corresponding
     * lanes from a second input vector under the control of a mask.
     * For each lane where the mask is set, the replacement lane
     * value is taken from the second input vector; otherwise the
     * lane value is taken from this vector.
     *
     * The following pseudocode illustrates this behavior:
     * <pre>{@code
     * ETYPE[] a = this.toArray();
     * ETYPE[] b = v.toArray();
     * for (int i = 0; i < a.length; i++) {
     *     if (m.laneIsSet(i))  a[i] = b[i];
     * }
     * return EVector.fromArray(this.species(), a, 0);
     * }</pre>
*
* @param v the second input vector, containing replacement lane values
* @param m the mask controlling lane selection from the second input vector
* @return the result of blending the lane elements of this vector with
* those of the second input vector
*/
    public abstract Vector<E> blend(Vector<E> v, VectorMask<E> m);

    /**
     * Adds each lane of this vector to its corresponding lane index
     * {@code N}, scaled by a given constant.
     *
     * The following pseudocode illustrates this behavior:
     * <pre>{@code
     * ETYPE[] a = this.toArray();
     * for (int N = 0; N < a.length; N++) {
     *     a[N] += (ETYPE) (N * scale);
     * }
     * return EVector.fromArray(this.species(), a, 0);
     * }</pre>
*
* @param scale the number to multiply by each lane index
* {@code N}, typically {@code 1}
* @return the result of incrementing each lane element by its
* corresponding lane index {@code N}, scaled by {@code scale}
* @throws IllegalArgumentException
* if the values in the interval
* {@code [0..VLENGTH*scale]}
* are not representable by the {@code ETYPE}
*/
    public abstract Vector<E> addIndex(int scale);

    /**
     * Slices a segment of adjacent lanes, starting at a given
     * {@code origin} lane in this vector, and continuing (as needed)
     * into an immediately following vector.  The slice and unslice
     * operations are partial inverses of each other, as the
     * following pseudocode illustrates:
     * <pre>{@code
* EVector slice = v1.slice(origin, v2);
* EVector w1 = slice.unslice(origin, v1, 0);
* EVector w2 = slice.unslice(origin, v2, 1);
* assert v1.equals(w1);
* assert v2.equals(w2);
     * }</pre>
     *
* @param origin the first input lane to transfer into the slice
* @param v1 a second vector logically concatenated with the first,
* before the slice is taken (if omitted it defaults to zero)
* @return a contiguous slice of {@code VLENGTH} lanes, taken from
* this vector starting at the indicated origin, and
* continuing (as needed) into the second vector
* @throws ArrayIndexOutOfBoundsException if {@code origin}
* is negative or greater than {@code VLENGTH}
* @see #slice(int,Vector,VectorMask)
* @see #slice(int)
* @see #unslice(int,Vector,int)
*/
    public abstract Vector<E> slice(int origin, Vector<E> v1);

    /**
     * Slices a segment of adjacent lanes under the control of a
     * mask, starting at a given {@code origin} lane in this vector,
     * and continuing (as needed) into an immediately following
     * vector.  Lanes of the result are set to zero where the mask
     * is unset.
     *
     * The following pseudocode illustrates the relation between the
     * masked slice and unslice operations:
     * <pre>{@code
* EVector slice = v1.slice(origin, v2, m);
* EVector w1 = slice.unslice(origin, v1, 0, m);
* EVector w2 = slice.unslice(origin, v2, 1, m);
* assert v1.equals(w1);
* assert v2.equals(w2);
     * }</pre>
*
* @param origin the first input lane to transfer into the slice
* @param v1 a second vector logically concatenated with the first,
* before the slice is taken (if omitted it defaults to zero)
* @param m the mask controlling lane selection into the resulting vector
* @return a contiguous slice of {@code VLENGTH} lanes, taken from
* this vector starting at the indicated origin, and
* continuing (as needed) into the second vector
* @throws ArrayIndexOutOfBoundsException if {@code origin}
* is negative or greater than {@code VLENGTH}
* @see #slice(int,Vector)
* @see #unslice(int,Vector,int,VectorMask)
*/
// FIXME: does this pull its weight? It's symmetrical with masked unslice.
    public abstract Vector<E> slice(int origin, Vector<E> v1, VectorMask<E> m);

    /**
     * Reverses a {@linkplain #slice(int,Vector) slice()}, inserting
     * the lanes of this vector as a slice within a logical pair of
     * copies of the given background vector {@code w}, and returning
     * the selected part of the result.
     *
* @param origin the first output lane to receive the slice
* @param w the background vector that (as two copies) will receive
* the inserted slice
* @param part the part number of the result (either zero or one)
* @return either the first or second part of a pair of
* background vectors {@code w}, updated by inserting
* this vector at the indicated origin
* @throws ArrayIndexOutOfBoundsException if {@code origin}
* is negative or greater than {@code VLENGTH},
* or if {@code part} is not zero or one
* @see #slice(int,Vector)
* @see #unslice(int,Vector,int,VectorMask)
*/
    public abstract Vector<E> unslice(int origin, Vector<E> w, int part);

    /**
     * Rearranges the lane elements of this vector, selecting lanes
     * under the control of a specific shuffle and a mask.
     * For each lane {@code N} where the mask is set, the result
     * lane is sourced from this vector using the shuffle's lane
     * index; where the mask is unset, the result lane is zero.
*
* @param s the shuffle controlling lane index selection
* @param m the mask controlling application of the shuffle
* @return the rearrangement of the lane elements of this vector
* @throws IndexOutOfBoundsException if there are any exceptional
* source indexes in the shuffle where the mask is set
* @see #rearrange(VectorShuffle)
* @see #rearrange(VectorShuffle,Vector)
* @see VectorShuffle#laneIsValid()
*/
    public abstract Vector<E> rearrange(VectorShuffle<E> s, VectorMask<E> m);

    /**
     * Rearranges the lane elements of two vectors, selecting lanes
     * under the control of a specific shuffle, using both normal
     * and exceptional indexes in the shuffle to steer data.
     * Normal shuffle indexes select lanes from this vector, while
     * exceptional indexes select the corresponding lanes from the
     * second input vector {@code v}.
*
* @param s the shuffle controlling lane selection from both input vectors
* @param v the second input vector
* @return the rearrangement of lane elements of this vector and
* a second input vector
* @see #rearrange(VectorShuffle)
* @see #rearrange(VectorShuffle,VectorMask)
* @see VectorShuffle#laneIsValid()
* @see #slice(int,Vector)
*/
    public abstract Vector<E> rearrange(VectorShuffle<E> s, Vector<E> v);

    /**
     * Converts this vector into a shuffle, converting the lane
     * values to {@code int} and using them as source indexes.
     * This method behaves as if it returns the result of this
     * pseudocode:
     * <pre>{@code
* long[] a = this.toLongArray();
* int[] sa = new int[a.length];
* for (int i = 0; i < a.length; i++) {
* sa[i] = (int) a[i];
* }
* return VectorShuffle.fromValues(this.species(), sa);
     * }</pre>
*
* @return a shuffle representation of this vector
* @see VectorShuffle#fromValues(VectorSpecies,int...)
*/
    public abstract VectorShuffle<E> toShuffle();

    /**
     * Transforms this vector to a vector of the given species,
     * reinterpreting the bytes of this vector without modification.
     * The operation behaves as if it returns the result of this
     * pseudocode:
     * <pre>{@code
* int domSize = this.byteSize();
* int ranSize = species.vectorByteSize();
* int M = (domSize > ranSize ? domSize / ranSize : ranSize / domSize);
* assert Math.abs(part) < M;
* assert (part == 0) || (part > 0) == (domSize > ranSize);
* byte[] ra = new byte[Math.max(domSize, ranSize)];
* if (domSize > ranSize) { // expansion
* this.intoByteArray(ra, 0);
* int origin = part * ranSize;
* return species.fromByteArray(ra, origin);
* } else { // contraction or size-invariant
* int origin = (-part) * domSize;
* this.intoByteArray(ra, origin);
* return species.fromByteArray(ra, 0);
* }
     * }</pre>
*
* @apiNote Although this method is defined as if the vectors in
     * question were loaded or stored into memory, memory semantics
     * has little or nothing to do with the actual implementation.
* The appeal to little-endian ordering is simply a shorthand
* for what could otherwise be a large number of detailed rules
* concerning the mapping between lane-structured vectors and
* byte-structured vectors.
*
* @param species the desired vector species
* @param part the part number
* of the result, or zero if neither expanding nor contracting
     * @param <F> the boxed element type of the species
     * @return a vector of the given species, containing (part of)
     *         the reinterpreted bytes of this vector
     */
    public abstract <F> Vector<F> reinterpretShape(VectorSpecies<F> species, int part);

    /**
     * Converts this vector to a vector of the given output species,
     * applying a scalar conversion lane-wise, and expanding or
     * contracting (as needed) to match the output species:
     * <pre>{@code
     * FTYPE scalar_conversion_op(ETYPE s);
     * EVector a = ...;
     * VectorSpecies<F> rsp = ...;
     * FVector r = (FVector) a.convertShape(conv, rsp, part);
     * }</pre>
*
* @param conv the desired scalar conversion to apply lane-wise
* @param rsp the desired output species
* @param part the part number
* of the result, or zero if neither expanding nor contracting
     * @param <F> the boxed element type of the output species
     * @return a vector converted by shape and element type from this vector
     */
    public abstract <F> Vector<F> convertShape(VectorOperators.Conversion<E,F> conv,
                                               VectorSpecies<F> rsp, int part);

    /**
     * Stores this vector into a byte array starting at an offset.
     * Bytes are stored in little-endian order.
     * This method behaves as if it calls
     * {@link #intoByteBuffer(ByteBuffer,int,ByteOrder,VectorMask)
     * intoByteBuffer()} as follows:
     * <pre>{@code
* var bb = ByteBuffer.wrap(a);
* var bo = ByteOrder.LITTLE_ENDIAN;
* var m = maskAll(true);
     * intoByteBuffer(bb, offset, bo, m);
     * }</pre>
*
* @param a the byte array
* @param offset the offset into the array
* @throws IndexOutOfBoundsException
* if {@code offset+N*ESIZE < 0}
* or {@code offset+(N+1)*ESIZE > a.length}
* for any lane {@code N} in the vector
*/
public abstract void intoByteArray(byte[] a, int offset);
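    // The little-endian storage rule above can be demonstrated with
    // plain Java.  Here an int[] stands in for an IntVector's lane
    // values; the helper name is an illustrative assumption, not part
    // of the API.
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class IntoByteArrayDemo {
    // Sketch of the rule intoByteArray(byte[], int) is specified
    // against: lane N lands at offset + N*ESIZE, lowest-order byte
    // first (little-endian).
    static void storeLanes(int[] lanes, byte[] a, int offset) {
        ByteBuffer bb = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        for (int n = 0; n < lanes.length; n++) {
            bb.putInt(offset + n * Integer.BYTES, lanes[n]);
        }
    }

    public static void main(String[] args) {
        int[] lanes = {0x04030201, 0x08070605};
        byte[] a = new byte[8];
        storeLanes(lanes, a, 0);
        // Little-endian: the lowest-order byte of each lane comes first.
        System.out.println(a[0] + " " + a[3] + " " + a[7]);  // 1 4 8
    }
}
```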
/**
* Stores this vector into a byte array starting at an offset
* using a mask.
     * Bytes are stored in little-endian order.
     * This method behaves as if it calls
     * {@link #intoByteBuffer(ByteBuffer,int,ByteOrder,VectorMask)
     * intoByteBuffer()} as follows:
     * <pre>{@code
* var bb = ByteBuffer.wrap(a);
* var bo = ByteOrder.LITTLE_ENDIAN;
     * intoByteBuffer(bb, offset, bo, m);
     * }</pre>
*
* @param a the byte array
* @param offset the offset into the array
* @param m the mask controlling lane selection
* @throws IndexOutOfBoundsException
* if {@code offset+N*ESIZE < 0}
* or {@code offset+(N+1)*ESIZE > a.length}
* for any lane {@code N} in the vector
* where the mask is set
*/
public abstract void intoByteArray(byte[] a, int offset,
                                       VectorMask<E> m);

    /**
     * Stores this vector into a byte array starting at an offset
     * using explicit byte order and a mask.
     * This method behaves as if it calls
     * {@link #intoByteBuffer(ByteBuffer,int,ByteOrder,VectorMask)
     * intoByteBuffer()} as follows:
     * <pre>{@code
* var bb = ByteBuffer.wrap(a);
     * intoByteBuffer(bb, offset, bo, m);
     * }</pre>
*
* @param a the byte array
* @param offset the offset into the array
* @param bo the intended byte order
* @param m the mask controlling lane selection
* @throws IndexOutOfBoundsException
* if {@code offset+N*ESIZE < 0}
* or {@code offset+(N+1)*ESIZE > a.length}
* for any lane {@code N} in the vector
* where the mask is set
*/
public abstract void intoByteArray(byte[] a, int offset,
ByteOrder bo,
                                       VectorMask<E> m);

    /**
     * Stores this vector into a byte buffer starting at an offset
     * using explicit byte order.
     * This method behaves as if it calls
     * {@link #intoByteBuffer(ByteBuffer,int,ByteOrder,VectorMask)
     * intoByteBuffer()} as follows:
     * <pre>{@code
* var m = maskAll(true);
     * intoByteBuffer(bb, offset, bo, m);
     * }</pre>
*
* @param bb the byte buffer
     * @param offset the offset into the byte buffer
     * @param bo the intended byte order
* @throws IndexOutOfBoundsException
* if {@code offset+N*ESIZE < 0}
* or {@code offset+(N+1)*ESIZE > bb.limit()}
* for any lane {@code N} in the vector
*/
public abstract void intoByteBuffer(ByteBuffer bb, int offset, ByteOrder bo);
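    // The masked buffer-store semantics described by the pseudocode
    // below can be mirrored in plain Java.  Here int lanes stand in
    // for ETYPE and boolean[] for VectorMask; the class and helper
    // names are illustrative assumptions, not part of the API.
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class MaskedStoreSketch {
    // Only set lanes are written; positions under unset lanes are
    // left untouched, matching the masked-store contract.
    static void maskedStore(int[] lanes, boolean[] m,
                            ByteBuffer bb, int offset, ByteOrder bo) {
        IntBuffer ib = bb.duplicate().position(offset).order(bo).asIntBuffer();
        for (int n = 0; n < lanes.length; n++) {
            if (m[n]) {
                ib.put(n, lanes[n]);
            }
        }
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(16);
        maskedStore(new int[]{1, 2, 3, 4}, new boolean[]{true, false, true, false},
                    bb, 0, ByteOrder.LITTLE_ENDIAN);
        bb.order(ByteOrder.LITTLE_ENDIAN);
        // Lane 1 was masked off, so its slot keeps the buffer's zero fill.
        System.out.println(bb.getInt(0) + " " + bb.getInt(4) + " " + bb.getInt(8));  // 1 0 3
    }
}
```
    // Note: ByteBuffer.duplicate() resets the byte order to
    // BIG_ENDIAN, which is why the sketch re-applies the requested
    // order before creating the int view.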
/**
* Stores this vector into a byte buffer starting at an offset
* using explicit byte order and a mask.
     * The following pseudocode illustrates this behavior, where
     * {@code EBuffer} is the primitive buffer type, {@code ETYPE}
     * is the primitive element type, and {@code EVector} is the
     * primitive vector type for this vector:
     * <pre>{@code
* EBuffer eb = bb.duplicate()
* .position(offset)
* .order(bo).asEBuffer();
* ETYPE[] a = this.toArray();
* for (int n = 0; n < a.length; n++) {
* if (m.laneIsSet(n)) {
     * eb.put(n, a[n]);
* }
* }
     * }</pre>
     *
* @implNote
* This operation is likely to be more efficient if
* the specified byte order is the same as
* {@linkplain ByteOrder#nativeOrder()
* the platform native order},
* since this method will not need to reorder
* the bytes of lane values.
* In the special case where {@code ETYPE} is
* {@code byte}, the byte order argument is
* ignored.
*
* @param bb the byte buffer
     * @param offset the offset into the byte buffer
* @param bo the intended byte order
* @param m the mask controlling lane selection
* @throws IndexOutOfBoundsException
* if {@code offset+N*ESIZE < 0}
* or {@code offset+(N+1)*ESIZE > bb.limit()}
* for any lane {@code N} in the vector
* where the mask is set
*/
public abstract void intoByteBuffer(ByteBuffer bb, int offset,
                                        ByteOrder bo, VectorMask<E> m);