1 /* 2 * Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved. 3 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 4 * 5 * This code is free software; you can redistribute it and/or modify it 6 * under the terms of the GNU General Public License version 2 only, as 7 * published by the Free Software Foundation. Oracle designates this 8 * particular file as subject to the "Classpath" exception as provided 9 * by Oracle in the LICENSE file that accompanied this code. 10 * 11 * This code is distributed in the hope that it will be useful, but WITHOUT 12 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or 13 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License 14 * version 2 for more details (a copy is included in the LICENSE file that 15 * accompanied this code). 16 * 17 * You should have received a copy of the GNU General Public License version 18 * 2 along with this work; if not, write to the Free Software Foundation, 19 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. 20 * 21 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA 22 * or visit www.oracle.com if you need additional information or have any 23 * questions. 24 */ 25 26 /** 27 * {@Incubating} 28 * <p> 29 * Classes to express vector computations that, given suitable hardware 30 * and runtime ability, are accelerated using vector hardware instructions. 31 * <p> 32 * Vector computations consist of a sequence of operations on vectors. 33 * A vector is a fixed sequence of scalar values; a scalar value is 34 * a single unit of value such as an int, a long, a float and so on. 35 * Operations on vectors typically perform the equivalent scalar operation on all 36 * scalar values of the participating vectors, usually generating a vector result. 37 * When run on a supporting platform, these operations can be 38 * executed in parallel by the hardware. 39 * This style of parallelism is called <em>Single Instruction Multiple Data</em> (SIMD) 40 * parallelism. 41 * 42 * <p>The abstract class {@link jdk.incubator.vector.Vector} represents an ordered immutable sequence of 43 * values of the same element type 'e' that is one of the following primitive types - 44 * byte, short, int, long, float, or double. The type variable E corresponds to the 45 * boxed element type, specifically the class that wraps a value of e in an object 46 * (such as Integer class that wraps a value of int). 47 * 48 * <p>Vector declares a set of vector operations (methods) that are common to 49 * all element types (such as addition). Subclasses of Vector corresponding to 50 * a specific element type declare further operations that are specific to that element type 51 * (such as access to element values in lanes, logical operations on values of integral 52 * elements types, or transcendental operations on values of floating point element 53 * types). There are six abstract subclasses of {@link jdk.incubator.vector.Vector} corresponding to the supported set of 54 * element types: {@link jdk.incubator.vector.ByteVector}, {@link jdk.incubator.vector.ShortVector}, 55 * {@link jdk.incubator.vector.IntVector}, {@link jdk.incubator.vector.LongVector}, 56 * {@link jdk.incubator.vector.FloatVector}, and {@link jdk.incubator.vector.DoubleVector}. 57 * 58 * In addition to element type, vectors are parameterized by their <em>shape</em>, 59 * which is their length. The supported shapes are 60 * represented by the enum {@link jdk.incubator.vector.VectorShape}. 61 * The combination of element type and shape determines a <em>vector species</em>, 62 * represented by {@link jdk.incubator.vector.VectorSpecies}. The various typed 63 * vector classes expose static constants corresponding to the supported species, 64 * and static methods on these types generally take a species as a parameter. 65 * For example, 66 * {@link jdk.incubator.vector.FloatVector#fromArray(VectorSpecies, float[], int) FloatVector.fromArray()} 67 * creates and returns a float vector of the specified species, with elements 68 * loaded from the specified float array. 69 * 70 * <p> 71 * The species instance for a specific combination of element type and shape 72 * can be obtained by reading the appropriate static field, as follows: 73 * <p> 74 * {@code VectorSpecies<Float> s = FloatVector.SPECIES_256}; 75 * <p> 76 * 77 * Code that is agnostic to species can request the "preferred" species for a 78 * given element type, where the optimal size is selected for the current platform: 79 * <p> 80 * {@code VectorSpecies<Float> s = FloatVector.SPECIES_PREFERRED}; 81 * <p> 82 * 83 * <p> 84 * Here is an example of multiplying elements of two float arrays {@code a and b} using vector computation 85 * and storing result in array {@code c}. 86 * <pre>{@code 87 * static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_512; 88 * 89 * void vectorMultiply(float[] a, float[] b, float[] c) { 90 * int i = 0; 91 * // It is assumed array arguments are of the same size 92 * for (; i < (a.length & ~(SPECIES.length() - 1)); 93 * i += SPECIES.length()) { 94 * FloatVector va = FloatVector.fromArray(SPECIES, a, i); 95 * FloatVector vb = FloatVector.fromArray(SPECIES, b, i); 96 * FloatVector vc = va.mul(vb) 97 * vc.intoArray(c, i); 98 * } 99 * 100 * for (; i < a.length; i++) { 101 * c[i] = a[i] * b[i]; 102 * } 103 * } 104 * }</pre> 105 * 106 * The scalar computation after the vector computation is required to process the tail of 107 * elements, the length of which is smaller than the species length. {@code VectorSpecies} also defines a 108 * {@link jdk.incubator.vector.VectorSpecies#loopBound(int) loopBound()} helper method which can be used in place of 109 * {@code (a.length & ~(SPECIES.length() - 1))} in the above code to determine the terminating condition. 110 * 111 * The example above uses vectors hardcoded to a concrete shape (512-bit). Instead, we could use preferred 112 * species as shown below, to make the code dynamically adapt to optimal shape for the platform on which it runs. 113 * 114 * <pre>{@code 115 * static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED; 116 * }</pre> 117 * 118 * <h2>Vector operations</h2> 119 * We use the term <em>lanes</em> when defining operations on vectors. The number of lanes 120 * in a vector is the number of scalar elements it holds. For example, a vector of 121 * type {@code Float} and shape {@code VectorShape.S_256_BIT} has eight lanes. 122 * Vector operations can be grouped into various categories and their behavior 123 * generally specified as follows: 124 * <ul> 125 * <li> 126 * A lane-wise unary operation operates on one input vector and produce a 127 * result vector. 128 * For each lane of the input vector the 129 * lane element is operated on using the specified scalar unary operation and 130 * the element result is placed into the vector result at the same lane. 131 * The following pseudocode expresses the behavior of this operation category, 132 * where {@code e} is the element type and {@code EVector} corresponds to the 133 * primitive Vector type: 134 * 135 * <pre>{@code 136 * EVector a = ...; 137 * e[] ar = new e[a.length()]; 138 * for (int i = 0; i < a.length(); i++) { 139 * ar[i] = scalar_unary_op(a.lane(i)); 140 * } 141 * EVector r = EVector.fromArray(a.species(), ar, 0); 142 * }</pre> 143 * 144 * Unless otherwise specified the input and result vectors will have the same 145 * element type and shape. 146 * 147 * <li> 148 * A lane-wise binary operation operates on two input 149 * vectors to produce a result vector. 150 * For each lane of the two input vectors, 151 * a and b say, the corresponding lane elements from a and b are operated on 152 * using the specified scalar binary operation and the element result is placed 153 * into the vector result at the same lane. 154 * The following pseudocode expresses the behavior of this operation category: 155 * 156 * <pre>{@code 157 * EVector a = ...; 158 * EVector b = ...; 159 * e[] ar = new e[a.length()]; 160 * for (int i = 0; i < a.length(); i++) { 161 * ar[i] = scalar_binary_op(a.lane(i), b.lane(i)); 162 * } 163 * EVector r = EVector.fromArray(a.species(), ar, 0); 164 * }</pre> 165 * 166 * Unless otherwise specified the two input and result vectors will have the 167 * same element type and shape. 168 * 169 * <li> 170 * Generalizing from unary and binary operations, a lane-wise n-ary 171 * operation operates on n input vectors to produce a 172 * result vector. 173 * N lane elements from each input vector are operated on 174 * using the specified n-ary scalar operation and the element result is placed 175 * into the vector result at the same lane. 176 * 177 * Unless otherwise specified the n input and result vectors will have the same 178 * element type and shape. 179 * 180 * <li> 181 * A vector reduction operation operates on all the lane 182 * elements of an input vector, and applies an accumulation function to all the 183 * lane elements to produce a scalar result. 184 * If the reduction operation is associative then the result may be accumulated 185 * by operating on the lane elements in any order using a specified associative 186 * scalar binary operation and identity value. Otherwise, the reduction 187 * operation specifies the behavior of the accumulation function. 188 * The following pseudocode expresses the behavior of this operation category 189 * if it is associative: 190 * <pre>{@code 191 * EVector a = ...; 192 * e r = <identity value>; 193 * for (int i = 0; i < a.length(); i++) { 194 * r = assoc_scalar_binary_op(r, a.lane(i)); 195 * } 196 * }</pre> 197 * 198 * Unless otherwise specified the scalar result type and element type will be 199 * the same. 200 * 201 * <li> 202 * A lane-wise binary test operation operates on two input vectors to produce a 203 * result mask. For each lane of the two input vectors, a and b say, the 204 * the corresponding lane elements from a and b are operated on using the 205 * specified scalar binary test operation and the boolean result is placed 206 * into the mask at the same lane. 207 * The following pseudocode expresses the behavior of this operation category: 208 * <pre>{@code 209 * EVector a = ...; 210 * EVector b = ...; 211 * boolean[] ar = new boolean[a.length()]; 212 * for (int i = 0; i < a.length(); i++) { 213 * ar[i] = scalar_binary_test_op(a.lane(i), b.lane(i)); 214 * } 215 * VectorMask<E> r = VectorMask.fromArray(a.species(), ar, 0); 216 * }</pre> 217 * 218 * Unless otherwise specified the two input vectors and result mask will have 219 * the same element type and shape. 220 * 221 * <li> 222 * The prior categories of operation can be said to operate within the vector 223 * lanes, where lane access is uniformly applied to all vectors, specifically 224 * the scalar operation is applied to elements taken from input vectors at the 225 * same lane, and if appropriate applied to the result vector at the same lane. 226 * A further category of operation is a cross-lane vector operation where lane 227 * access is defined by the arguments to the operation. Cross-lane operations 228 * generally rearrange lane elements, for example by permutation (commonly 229 * controlled by a {@link jdk.incubator.vector.VectorShuffle}) or by blending (commonly controlled by a 230 * {@link jdk.incubator.vector.VectorMask}). Such an operation explicitly specifies how it rearranges lane 231 * elements. 232 * </ul> 233 * 234 * <p> 235 * If a vector operation does not belong to one of the above categories then 236 * the operation explicitly specifies how it processes the lane elements of 237 * input vectors, and where appropriate expresses the behavior using 238 * pseudocode. 239 * 240 * <p> 241 * Many vector operations provide an additional {@link jdk.incubator.vector.VectorMask mask}-accepting 242 * variant. 243 * The mask controls which lanes are selected for application of the scalar 244 * operation. Masks are a key component for the support of control flow in 245 * vector computations. 246 * <p> 247 * For certain operation categories the mask accepting variants can be specified 248 * in generic terms. If a lane of the mask is set then the scalar operation is 249 * applied to corresponding lane elements, otherwise if a lane of a mask is not 250 * set then a default scalar operation is applied and its result is placed into 251 * the vector result at the same lane. The default operation is specified as follows: 252 * <ul> 253 * <li> 254 * For a lane-wise n-ary operation the default operation is a function that returns 255 * it's first argument, specifically the lane element of the first input vector. 256 * <li> 257 * For an associative vector reduction operation the default operation is a 258 * function that returns the identity value. 259 * <li> 260 * For lane-wise binary test operation the default operation is a function that 261 * returns false. 262 * </ul> 263 * Otherwise, the mask accepting variant of the operation explicitly specifies 264 * how it processes the lane elements of input vectors, and where appropriate 265 * expresses the behavior using pseudocode. 266 * 267 * <p> 268 * For convenience, many vector operations of arity greater than one provide 269 * an additional scalar-accepting variant (such as adding a constant scalar 270 * value to all lanes of a vector). This variant accepts compatible 271 * scalar values instead of vectors for the second and subsequent input vectors, 272 * if any. 273 * Unless otherwise specified the scalar variant behaves as if each scalar value 274 * is transformed to a vector using the appropriate vector {@code broadcast} operation, and 275 * then the vector accepting vector operation is applied using the transformed 276 * values. 277 * 278 * <h2> Performance notes </h2> 279 * This package depends on the runtime's ability to dynamically compile vector operations 280 * into optimal vector hardware instructions. There is a default scalar implementation 281 * for each operation which is used if the operation cannot be compiled to vector instructions. 282 * 283 * <p>There are certain things users need to pay attention to for generating optimal vector machine code: 284 * 285 * <ul> 286 * <li>The shape of vectors used should be supported by the underlying platform. For example, 287 * code written using {@code IntVector} of Shape S_512_BIT will not be compiled to vector 288 * instructions on a platform which supports only 256 bit vectors. Instead, the default 289 * scalar implementation will be used. 290 * For this reason, it is recommended to use the preferred species as shown above to write 291 * generically sized vector computations. 292 * <li>Classes defined in this package should be treated as 293 * <a href="{@docRoot}/java.base/java/lang/doc-files/ValueBased.html">value-based</a> classes. 294 * Use of identity-sensitive operations (including reference equality 295 * ({@code ==}), identity hash code, or synchronization) will limit generation of 296 * optimal vector instructions. 297 * </ul> 298 */ 299 package jdk.incubator.vector;