/*
 * Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation. Oracle designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Oracle in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

/**
 * {@Incubating}
 * <p>
 * Classes to express vector computations that, given suitable hardware
 * and runtime ability, are accelerated using vector hardware instructions.
 * <p>
 * Vector computations consist of a sequence of operations on vectors.
 * A vector is a fixed sequence of scalar values; a scalar value is
 * a single unit of value such as an int, a long, a float and so on.
 * Operations on vectors typically perform the equivalent scalar operation on all
 * scalar values of the participating vectors, usually generating a vector result.
 * When run on a supporting platform, these operations can be
 * executed in parallel by the hardware.
 * This style of parallelism is called <em>Single Instruction Multiple Data</em> (SIMD)
 * parallelism.
 *
 * <p>The abstract class {@link jdk.incubator.vector.Vector} represents an ordered immutable sequence of
 * values of the same element type {@code e}, which is one of the following primitive types:
 * byte, short, int, long, float, or double. The type variable {@code E} corresponds to the
 * boxed element type, specifically the class that wraps a value of {@code e} in an object
 * (such as the {@code Integer} class, which wraps a value of {@code int}).
 *
 * <p>Vector declares a set of vector operations (methods) that are common to
 * all element types (such as addition). Subclasses of Vector corresponding to
 * a specific element type declare further operations that are specific to that element type
 * (such as access to element values in lanes, logical operations on values of integral
 * element types, or transcendental operations on values of floating point element
 * types). There are six abstract subclasses of {@link jdk.incubator.vector.Vector} corresponding to the supported set of
 * element types: {@link jdk.incubator.vector.ByteVector}, {@link jdk.incubator.vector.ShortVector},
 * {@link jdk.incubator.vector.IntVector}, {@link jdk.incubator.vector.LongVector},
 * {@link jdk.incubator.vector.FloatVector}, and {@link jdk.incubator.vector.DoubleVector}.
 *
 * <p>In addition to element type, vectors are parameterized by their <em>shape</em>,
 * which is their size in bits. The supported shapes are
 * represented by the enum {@link jdk.incubator.vector.VectorShape}.
 * The combination of element type and shape determines a <em>vector species</em>,
 * represented by {@link jdk.incubator.vector.VectorSpecies}. The various typed
 * vector classes expose static constants corresponding to the supported species,
 * and static methods on these types generally take a species as a parameter.
 * For example,
 * {@link jdk.incubator.vector.FloatVector#fromArray(VectorSpecies, float[], int) FloatVector.fromArray()}
 * creates and returns a float vector of the specified species, with elements
 * loaded from the specified float array.
 *
 * <p>
 * The species instance for a specific combination of element type and shape
 * can be obtained by reading the appropriate static field, as follows:
 * <p>
 * {@code VectorSpecies<Float> s = FloatVector.SPECIES_256;}
 *
 * <p>
 * Code that is agnostic to species can request the "preferred" species for a
 * given element type, where the optimal size is selected for the current platform:
 * <p>
 * {@code VectorSpecies<Float> s = FloatVector.SPECIES_PREFERRED;}
 *
 * <p>
 * Here is an example of multiplying the elements of two float arrays {@code a} and {@code b}
 * using vector computation and storing the result in array {@code c}.
 * <pre>{@code
 * static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_512;
 *
 * void vectorMultiply(float[] a, float[] b, float[] c) {
 *     int i = 0;
 *     // It is assumed array arguments are of the same size.
 *     // The loop bound rounds a.length down to a multiple of the species
 *     // length; the bit mask assumes that length is a power of two.
 *     for (; i < (a.length & ~(SPECIES.length() - 1));
 *            i += SPECIES.length()) {
 *         FloatVector va = FloatVector.fromArray(SPECIES, a, i);
 *         FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
 *         FloatVector vc = va.mul(vb);
 *         vc.intoArray(c, i);
 *     }
 *
 *     // Scalar loop for the remaining tail elements
 *     for (; i < a.length; i++) {
 *         c[i] = a[i] * b[i];
 *     }
 * }
 * }</pre>
 *
 * <p>
 * The scalar computation after the vector computation is required to process the tail of
 * elements, the length of which is smaller than the species length.
 *
 * <p>
 * The example above uses vectors hardcoded to a concrete shape (512-bit). Instead, we could use the preferred
 * species as shown below, so that the code dynamically adapts to the optimal shape for the platform on which it runs.
 *
 * <pre>{@code
 * static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
 * }</pre>
 *
 * <h2>Vector operations</h2>
 * We use the term <em>lanes</em> when defining operations on vectors. The number of lanes
 * in a vector is the number of scalar elements it holds. For example, a vector of
 * element type {@code float} and shape {@code VectorShape.S_256_BIT} has eight lanes.
 * Vector operations can be grouped into various categories and their behavior
 * is generally specified as follows:
 * <ul>
 * <li>
 * A lane-wise unary operation operates on one input vector to produce a
 * result vector.
 * For each lane of the input vector the
 * lane element is operated on using the specified scalar unary operation and
 * the element result is placed into the vector result at the same lane.
 * The following pseudocode expresses the behavior of this operation category,
 * where {@code e} is the element type and {@code EVector} corresponds to the
 * primitive Vector type:
 *
 * <pre>{@code
 * EVector a = ...;
 * e[] ar = new e[a.length()];
 * for (int i = 0; i < a.length(); i++) {
 *     ar[i] = scalar_unary_op(a.get(i));
 * }
 * EVector r = EVector.fromArray(a.species(), ar, 0);
 * }</pre>
 *
 * Unless otherwise specified the input and result vectors will have the same
 * element type and shape.
 *
 * <li>
 * A lane-wise binary operation operates on two input
 * vectors to produce a result vector.
 * For each lane of the two input vectors,
 * a and b say, the corresponding lane elements from a and b are operated on
 * using the specified scalar binary operation and the element result is placed
 * into the vector result at the same lane.
 * The following pseudocode expresses the behavior of this operation category:
 *
 * <pre>{@code
 * EVector a = ...;
 * EVector b = ...;
 * e[] ar = new e[a.length()];
 * for (int i = 0; i < a.length(); i++) {
 *     ar[i] = scalar_binary_op(a.get(i), b.get(i));
 * }
 * EVector r = EVector.fromArray(a.species(), ar, 0);
 * }</pre>
 *
 * Unless otherwise specified the two input and result vectors will have the
 * same element type and shape.
 *
 * <li>
 * Generalizing from unary and binary operations, a lane-wise n-ary
 * operation operates on n input vectors to produce a
 * result vector.
 * For each lane, the n lane elements, one from each input vector, are operated on
 * using the specified n-ary scalar operation and the element result is placed
 * into the vector result at the same lane.
 *
 * Unless otherwise specified the n input and result vectors will have the same
 * element type and shape.
 *
 * <li>
 * A vector reduction operation operates on all the lane
 * elements of an input vector, and applies an accumulation function to all the
 * lane elements to produce a scalar result.
 * If the reduction operation is associative then the result may be accumulated
 * by operating on the lane elements in any order using a specified associative
 * scalar binary operation and identity value. Otherwise, the reduction
 * operation specifies the behavior of the accumulation function.
 * The following pseudocode expresses the behavior of this operation category
 * if it is associative:
 * <pre>{@code
 * EVector a = ...;
 * e r = <identity value>;
 * for (int i = 0; i < a.length(); i++) {
 *     r = assoc_scalar_binary_op(r, a.get(i));
 * }
 * }</pre>
 *
 * Unless otherwise specified the scalar result type and element type will be
 * the same.
 *
 * <li>
 * A lane-wise binary test operation operates on two input vectors to produce a
 * result mask.
 * For each lane of the two input vectors, a and b say,
 * the corresponding lane elements from a and b are operated on using the
 * specified scalar binary test operation and the boolean result is placed
 * into the mask at the same lane.
 * The following pseudocode expresses the behavior of this operation category:
 * <pre>{@code
 * EVector a = ...;
 * EVector b = ...;
 * boolean[] ar = new boolean[a.length()];
 * for (int i = 0; i < a.length(); i++) {
 *     ar[i] = scalar_binary_test_op(a.get(i), b.get(i));
 * }
 * VectorMask<E> r = VectorMask.fromArray(a.species(), ar, 0);
 * }</pre>
 *
 * Unless otherwise specified the two input vectors and result mask will have
 * the same element type and shape.
 *
 * <li>
 * The prior categories of operation can be said to operate within the vector
 * lanes, where lane access is uniformly applied to all vectors. Specifically,
 * the scalar operation is applied to elements taken from input vectors at the
 * same lane, and if appropriate applied to the result vector at the same lane.
 * A further category of operation is a cross-lane vector operation where lane
 * access is defined by the arguments to the operation. Cross-lane operations
 * generally rearrange lane elements, for example by permutation (commonly
 * controlled by a {@link jdk.incubator.vector.VectorShuffle}) or by blending (commonly controlled by a
 * {@link jdk.incubator.vector.VectorMask}). Such an operation explicitly specifies how it rearranges lane
 * elements.
 * </ul>
 *
 * <p>
 * If a vector operation does not belong to one of the above categories then
 * the operation explicitly specifies how it processes the lane elements of
 * input vectors, and where appropriate expresses the behavior using
 * pseudocode.
 *
 * <p>
 * Many vector operations provide an additional {@link jdk.incubator.vector.VectorMask mask}-accepting
 * variant.
 * The mask controls which lanes are selected for application of the scalar
 * operation. Masks are a key component for the support of control flow in
 * vector computations.
 * <p>
 * For certain operation categories the mask-accepting variants can be specified
 * in generic terms. If a lane of the mask is set then the scalar operation is
 * applied to the corresponding lane elements; otherwise, if a lane of the mask is not
 * set, a default scalar operation is applied and its result is placed into
 * the vector result at the same lane. The default operation is specified as follows:
 * <ul>
 * <li>
 * For a lane-wise n-ary operation the default operation is a function that returns
 * its first argument, specifically the lane element of the first input vector.
 * <li>
 * For an associative vector reduction operation the default operation is a
 * function that returns the identity value.
 * <li>
 * For a lane-wise binary test operation the default operation is a function that
 * returns false.
 * </ul>
 * Otherwise, the mask-accepting variant of the operation explicitly specifies
 * how it processes the lane elements of input vectors, and where appropriate
 * expresses the behavior using pseudocode.
 *
 * <p>
 * For convenience, many vector operations of arity greater than one provide
 * an additional scalar-accepting variant (such as adding a constant scalar
 * value to all lanes of a vector). This variant accepts compatible
 * scalar values instead of vectors for the second and subsequent input vectors,
 * if any.
 * Unless otherwise specified the scalar variant behaves as if each scalar value
 * is transformed to a vector using the appropriate vector {@code broadcast} operation, and
 * then the vector-accepting variant of the operation is applied using the transformed
 * values.
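 *
 * <p>
 * The generic rules above for mask-accepting and scalar-accepting variants can be
 * sketched with plain arrays standing in for vectors and masks. This is an
 * illustrative sketch only and not part of this package: the class
 * {@code MaskedLaneSemantics} and its method names are hypothetical, {@code int}
 * is chosen arbitrarily as the element type, a {@code boolean[]} stands in for a
 * {@link jdk.incubator.vector.VectorMask}, and a less-than comparison is used as
 * the example test operation.
 * <pre>{@code
 * import java.util.Arrays;
 * import java.util.function.IntBinaryOperator;
 *
 * // Hypothetical helper class; plain-array model of the masked semantics.
 * final class MaskedLaneSemantics {
 *     // Masked lane-wise binary operation: an unset lane keeps the
 *     // lane element of the first input.
 *     static int[] maskedBinary(int[] a, int[] b, boolean[] m, IntBinaryOperator op) {
 *         int[] r = new int[a.length];
 *         for (int i = 0; i < a.length; i++) {
 *             r[i] = m[i] ? op.applyAsInt(a[i], b[i]) : a[i];
 *         }
 *         return r;
 *     }
 *
 *     // Masked associative reduction: an unset lane contributes the identity value.
 *     static int maskedReduce(int[] a, boolean[] m, int identity, IntBinaryOperator op) {
 *         int r = identity;
 *         for (int i = 0; i < a.length; i++) {
 *             r = op.applyAsInt(r, m[i] ? a[i] : identity);
 *         }
 *         return r;
 *     }
 *
 *     // Masked lane-wise binary test: an unset lane yields false.
 *     static boolean[] maskedLessThan(int[] a, int[] b, boolean[] m) {
 *         boolean[] r = new boolean[a.length];
 *         for (int i = 0; i < a.length; i++) {
 *             r[i] = m[i] && a[i] < b[i];
 *         }
 *         return r;
 *     }
 *
 *     // Scalar-accepting variant: broadcast the scalar to every lane,
 *     // then apply the array-accepting variant.
 *     static int[] maskedBinary(int[] a, int s, boolean[] m, IntBinaryOperator op) {
 *         int[] b = new int[a.length];
 *         Arrays.fill(b, s);
 *         return maskedBinary(a, b, m, op);
 *     }
 * }
 * }</pre>
 * For example, with {@code a = {1, 2, 3, 4}}, {@code b = {10, 20, 30, 40}} and a mask
 * that sets only lanes 0 and 2, the masked addition in this sketch yields
 * {@code {11, 2, 33, 4}}, and the masked sum reduction with identity 0 yields 4.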
 *
 * <h2>Performance notes</h2>
 * This package depends on the runtime's ability to dynamically compile vector operations
 * into optimal vector hardware instructions. There is a default scalar implementation
 * for each operation which is used if the operation cannot be compiled to vector instructions.
 *
 * <p>There are certain things users need to pay attention to for generating optimal vector machine code:
 *
 * <ul>
 * <li>The shape of vectors used should be supported by the underlying platform. For example,
 * code written using {@code IntVector} of shape {@code S_512_BIT} will not be compiled to vector
 * instructions on a platform which supports only 256-bit vectors. Instead, the default
 * scalar implementation will be used.
 * For this reason, it is recommended to use the preferred species as shown above to write
 * generically sized vector computations.
 * <li>Classes defined in this package should be treated as
 * <a href="{@docRoot}/java.base/java/lang/doc-files/ValueBased.html">value-based</a> classes.
 * Use of identity-sensitive operations (including reference equality
 * ({@code ==}), identity hash code, or synchronization) will limit generation of
 * optimal vector instructions.
 * </ul>
 */
package jdk.incubator.vector;