src/share/vm/opto/superword.cpp

rev 8530 : SIMD: RFR(S): 8085932: Fixing bugs in detecting memory alignments in SuperWord - before making Trace class.
rev 8531 : SIMD: RFR(S): 8085932: Fixing bugs in detecting memory alignments in SuperWord - before making Trace class.
Copied print msg from pristine. No Tracer yet.
rev 8532 : SIMD: RFR(S): 8085932: Fixing bugs in detecting memory alignments in SuperWord. Added Tracer. Tabulation not fixed.
rev 8533 : SIMD: RFR(S): 8085932: Fixing bugs in detecting memory alignments in SuperWord - before making Trace class.
Printing (debug+trace) functions still here.
rev 8534 : SIMD: RFR(S): 8085932: Fixing bugs in detecting memory alignments in SuperWord.
Extra printing removed. Bug fixing for invariant and scale still here.
rev 8535 : SIMD: RFR(S): 8085932: Fixing bugs in detecting memory alignments in SuperWord.
Fixing printf.
rev 8536 : SIMD: RFR(S): 8085932: Fixing bugs in detecting memory alignments in SuperWord.
Fixing printf, take 2.
rev 8537 : SIMD: RFR(S): 8085932: Fixing bugs in detecting memory alignments in SuperWord.
Removed fix for "already has an invariant", "already found a scale"
rev 8541 : SIMD: RFR(S): 8085932: Fixing bugs in detecting memory alignments in SuperWord.
Starting to add vector for conditional_move. Preparation only, need to add vector Bool, vector CmpD, vector CmoveD.
rev 8542 : SIMD: RFR(S): 8085932: added merge_packs_to_cmovd, CMoveDVNode is not built yet.
rev 8543 : SIMD: RFR(S): 8085932: Passed "Unimplemented". Failed as "Unprofitable".
rev 8544 : SIMD: RFR(S): 8085932: passed profitability. Need the correct .ad file.
Has FIXME! in places where stepping by 1 needs to be corrected to stepping by 3. May revisit this code and find a better solution.
rev 8545 : SIMD: RFR(S): 8085932: Ideal Graph builds OK and the code is generated. .ad file is not correct yet.
rev 8700 : Merge
rev 8706 : SIMD: RFR(S): 8085932: fixing "friend class"
Added initialization to ctor, trailing spaces removed.
rev 8707 : Merge
rev 8708 : SIMD: CMove update - from c:\Java\openjdk-clone-060315\hotspot\
rev 8709 : Merge
rev 8710 : SIMD: small cleanup
rev 8711 : SIMD: CMoveVD - produces some code, but actually garbage in .ad file.
No reshaping in Matcher.
rev 8712 : SIMD: CMoveVD - .ad catches up CC code! The generating code still incorrect.
rev 8713 : SIMD: CMoveVD - .ad is "almost" good. need to make CC in codegen taken rightly from IdealGraph.
Also added (Windows only) an option to stop in debugger after compiled file has been printed.
See compile.cpp: WINDOWS_ONLY(if(method()->has_option("BreakAfterCompilation")) DebugBreak();)
rev 8714 : SIMD: CMoveVD - .ad is good (need to be tested).
SuperWord creates CMoveVDNode(cc, src1, src2, vt) and cc is a clone of the original Bool in CmpD.
rev 8717 : SIMD: CMoveVD - .ad is good (need to be tested).
Added class CMoveVD_map. Removing Flag_is_CMove along the way.
rev 8718 : SIMD: CMoveVD - clean-up - created normal constructor in CMoveVD_map.
rev 8719 : SIMD: CMoveVD - cleanup.
rev 8720 : SIMD: added is_Bool_candidate, is_CmpD_candidate
rev 8721 : SIMD: cleanup
rev 8722 : SIMD: created class CMoveKit
rev 8723 : SIMD: cleanup
rev 8725 : SIMD: small changes ...
rev 8726 : SIMD: use insert instead of push, since the index is known.
rev 8727 : SIMD: cleanup
rev 8728 : SIMD: cleanup
rev 8729 : SIMD: removed constant "3", need cleanup.
rev 8730 : SIMD: cleanup
rev 8731 : SIMD: cleanup
rev 8732 : SIMD: almost clean superword.cpp. but revisit !FIXME!
rev 8733 : SIMD: cleanup. src/cpu/x86/vm/x86.ad needs more.
Some !FIXME! are remaining, mostly for second thought
rev 8734 : SIMD: OK for ">=", does NOT work for ">" (Computational ERROR: val(253369.89505687315) and val_gold(4573627.789285206) not equal.)
callValue = (callValue >= 0.0) ? callValue : 0;

val      = forCallValue(Sval, Xval, MuByT, VBySqrtT);
val_gold = forCallValue_gold(Sval, Xval, MuByT, VBySqrtT);
if (val != val_gold) {
rev 8877 : tests from repo commit
rev 8879 : SIMD: cleanup. Nearly good to push to Oracle.
rev 8880 : SIMD: cleanup -> good to send
rev 8885 : SIMD: starting support for safe loop modification in SW - added PhaseIdealLoop::create_reserve_version_of_loop.
Ugly, but works OK (no support yet for loop reverse)
rev 8886 : SIMD: added BoolNode bol_ne - now BoolNode bol_eq may be subsumed by bol_ne and therefore loop will be switched to the reserved copy.
rev 8887 : SIMD: class LoopReserveKit created. Both directions are tested; in this checkin,
switch_to_reserved() is called in SuperWord::output just for testing the switch back to the original loop.
rev 8889 : SIMD: added option DoReserveCopyInSuperWord
rev 8927 : SIMD: added
        assert(bol->is_Bool(), "should be BoolNode - too late to bail out!");
        if (do_reserve_copy() && lk._has_reserved && !bol->is_Bool()) {
          goto output_error;
        }
rev 8930 : SIMD: cleanup - trailing spaces, tabs
rev 8931 : SIMD: version of LoopReserveKit that does not require goto:
its dtor switches to the modified copy of the loop (switch_to_reserved)
if use_new() has ever been called; otherwise the dtor does nothing, and
by default the graph remains in its cloned copy.
NOTE: superword.cpp:2106 includes a goto output_error, which unconditionally
(for testing purposes) switches control to the reserved copy.
NOTE: the code is dirty and needs cleaning, but it works.
NOTE: it still modifies BoolNode; the intent is to replace that with a ConINode.
rev 8932 : SIMD: cleanup
rev 8933 : SIMD: added LoopReserveKit::_active, cleanup
rev 8934 : SIMD: create_reserve is right in the LoopReserveKit::ctor.
DoReserveCopyInSuperWord should be used as only condition
for working with lk.functions in SuperWord::output
rev 8937 : SIMD: added option DoReserveCopyInSuperWordTest for testing switching to reversed copy.
Much better functionality description of LoopReserveKit in loopnode.hpp
Cleanup in loopUnswitch.cpp
rev 8938 : SIMD: some functions are renamed, some cleanup
rev 9037 : SIMD: added SuperWord code for testing CountedLoopReserveKit
rev 9039 : Merge
rev 9040 : SIMD: fixing some lines escaped in merge,
using do_reserve_copy() instead of direct use of DoReserveCopyInSuperWord.
rev 9045 : SIMD: fixed if (!def->is_Bool() || def->in(0) != NULL || def->outcnt() != 1) return NULL;,
      Removed if(method()->has_option("BreakAfterCompilation")) os::breakpoint();
rev 9093 : Merge
rev 9098 : SIMD: rename _CountedLoopReserveKit_test to _CountedLoopReserveKit_debug
rev 9099 : SIMD: fixing kvn comments (4) to Re: RFR(M): 8136725
rev 9101 : Merge
rev 9102 : SIMD: fixing merge
rev 9106 : SIMD: removing assert in SuperWord::output and vector_opd. Need cleanup.
rev 9107 : SIMD: adding more processing around asserts in SuperWord::output
rev 9108 : SIMD: added if (do_reserve_copy()) to all suspects in SuperWord::output
rev 9109 : SIMD: small debug/release bug fixing
rev 9110 : SIMD: formatting
rev 9138 : Merge
rev 9139 : Merge
rev 9147 : Merge
rev 9148 : SIMD: fixing NOT_PRODUCT printout and formatting "if-return"
rev 9150 : SIMD: fixing trace/debug printout
rev 9158 : SIMD: restore from 9150. Release test results: passed 520; failed 22; error 6.
fastdebug produces 'load vector' and 17 vs 28 performance gain on -XX+UseCMov
rev 9159 : SIMD: same output for release and fastdebug as 9158
rev 9160 : SIMD: again same output for release and fastdebug as 9158
rev 9162 : SIMD: better debug printing and formatting

@@ -35,10 +35,11 @@
 #include "opto/mulnode.hpp"
 #include "opto/opcodes.hpp"
 #include "opto/opaquenode.hpp"
 #include "opto/superword.hpp"
 #include "opto/vectornode.hpp"
+#include "opto/movenode.hpp"
 
 //
 //                  S U P E R W O R D   T R A N S F O R M
 //=============================================================================
 

@@ -53,10 +54,11 @@
   _data_entry(arena(), 8,  0, NULL),      // nodes with all inputs from outside
   _mem_slice_head(arena(), 8,  0, NULL),  // memory slice heads
   _mem_slice_tail(arena(), 8,  0, NULL),  // memory slice tails
   _node_info(arena(), 8,  0, SWNodeInfo::initial), // info needed per node
   _clone_map(phase->C->clone_map()),      // map of nodes created in cloning
+  _cmovev_kit(_arena, this),                    // map to facilitate CMoveVD creation
   _align_to_ref(NULL),                    // memory reference to align vectors to
   _disjoint_ptrs(arena(), 8,  0, OrderedPair::initial), // runtime disambiguated pointer pairs
   _dg(_arena),                            // dependence graph
   _visited(arena()),                      // visited node set
   _post_visited(arena()),                 // post visited node set

@@ -70,10 +72,11 @@
   _race_possible(false),                  // cases where SDMU is true
   _early_return(true),                    // analysis evaluations routine
   _num_work_vecs(0),                      // amount of vector work we have
   _num_reductions(0),                     // amount of reduction work we have
   _do_vector_loop(phase->C->do_vector_loop()),  // whether to do vectorization/simd style
+  _do_reserve_copy(DoReserveCopyInSuperWord),  // keep a reserve copy of the loop so output() can bail out
   _ii_first(-1),                          // first loop generation index - only if do_vector_loop()
   _ii_last(-1),                           // last loop generation index - only if do_vector_loop()
   _ii_order(arena(), 8, 0, 0)
 {
 #ifndef PRODUCT

@@ -96,11 +99,22 @@
   if (!cl->is_valid_counted_loop()) return; // skip malformed counted loop
 
   if (!cl->is_main_loop() ) return; // skip normal, pre, and post loops
   // Check for no control flow in body (other than exit)
   Node *cl_exit = cl->loopexit();
-  if (cl_exit->in(0) != lpt->_head) return;
+  if (cl_exit->in(0) != lpt->_head) {
+#ifndef PRODUCT
+    if (TraceSuperWord) {
+      tty->print_cr("SuperWord::transform_loop: loop too complicated, cl_exit->in(0) != lpt->_head");
+      tty->print("cl_exit %d", cl_exit->_idx); cl_exit->dump();
+      tty->print("cl_exit->in(0) %d", cl_exit->in(0)->_idx); cl_exit->in(0)->dump();
+      tty->print("lpt->_head %d", lpt->_head->_idx); lpt->_head->dump();
+      lpt->dump_head();
+    }
+#endif
+    return;
+  }
 
   // Make sure the are no extra control users of the loop backedge
   if (cl->back_control()->outcnt() != 1) {
     return;
   }

@@ -385,10 +399,14 @@
 
   combine_packs();
 
   construct_my_pack_map();
 
+  if (_do_vector_loop) {
+    merge_packs_to_cmovd();
+  }
+
   filter_packs();
 
   schedule();
 
   output();

@@ -1065,10 +1083,21 @@
   }
 }
 
 //------------------------------data_size---------------------------
 int SuperWord::data_size(Node* s) {
+  // If s is a Bool or CmpD that only feeds a CMoveD (a CMoveVD candidate), return the size of that CMoveD
+  Node* use = NULL;
+  if (_do_vector_loop) {
+    use = _cmovev_kit.is_Bool_candidate(s);
+    if (use != NULL) {
+      return data_size(use);
+    }
+    use = _cmovev_kit.is_CmpD_candidate(s);
+    if (use != NULL) {
+      return data_size(use);
+    }
+  }
   int bsize = type2aelembytes(velt_basic_type(s));
   assert(bsize != 0, "valid size");
   return bsize;
 }
 

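data_size() now asks CMoveKit whether s is a Bool or CmpD whose only use is a CMoveD; if so, it reports the CMoveD's size instead of its own, presumably so that the condition nodes are sized like the double-precision CMoveD they feed. The toy model below is a minimal sketch of that lookup under stated assumptions: ToyNode, Kind, bool_candidate and cmpd_candidate are hypothetical stand-ins for Node, CMoveKit::is_Bool_candidate and CMoveKit::is_CmpD_candidate, and the integer generation field is an assumed simplification of SuperWord::same_generation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR node; not HotSpot's Node class.
enum class Kind { CmpD, Bool, CMoveD, Other };

struct ToyNode {
  Kind kind;
  int elem_bytes;              // element size of the value this node produces
  int generation;              // assumed stand-in for SuperWord::same_generation
  bool has_control;            // stand-in for in(0) != NULL
  std::vector<ToyNode*> uses;  // def-use edges
};

// Shape of CMoveKit::is_Bool_candidate: a control-free Bool with exactly
// one use, which must be a CMoveD of the same generation.
ToyNode* bool_candidate(const ToyNode* def) {
  if (def->kind != Kind::Bool || def->has_control || def->uses.size() != 1) return nullptr;
  ToyNode* use = def->uses[0];
  if (use->kind != Kind::CMoveD || use->generation != def->generation) return nullptr;
  return use;
}

// Shape of CMoveKit::is_CmpD_candidate: a control-free CmpD whose single
// use is itself a Bool candidate; returns the CMoveD at the end of the chain.
ToyNode* cmpd_candidate(const ToyNode* def) {
  if (def->kind != Kind::CmpD || def->has_control || def->uses.size() != 1) return nullptr;
  ToyNode* use = def->uses[0];
  if (use->generation != def->generation) return nullptr;
  return bool_candidate(use);
}

// Shape of the new SuperWord::data_size: candidates report the CMoveD's size.
int data_size(const ToyNode* s) {
  if (ToyNode* use = bool_candidate(s)) return data_size(use);
  if (ToyNode* use = cmpd_candidate(s)) return data_size(use);
  return s->elem_bytes;
}
```

A Bool that feeds a CMoveD from a different generation, or that has no uses, falls through and reports its own size.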
@@ -1111,10 +1140,11 @@
   assert(alignment(s1) + data_size(s1) == alignment(s2), "just checking");
 
   if (s1->is_Load()) return false;
 
   int align = alignment(s1);
+  NOT_PRODUCT(if(is_trace_alignment()) tty->print_cr("SuperWord::follow_use_defs: s1 %d, align %d", s1->_idx, align);)
   bool changed = false;
   int start = s1->is_Store() ? MemNode::ValueIn   : 1;
   int end   = s1->is_Store() ? MemNode::ValueIn+1 : s1->req();
   for (int j = start; j < end; j++) {
     Node* t1 = s1->in(j);

@@ -1125,10 +1155,11 @@
       if (est_savings(t1, t2) >= 0) {
         Node_List* pair = new Node_List();
         pair->push(t1);
         pair->push(t2);
         _packset.append(pair);
+        NOT_PRODUCT(if(is_trace_alignment()) tty->print_cr("SuperWord::follow_use_defs: set_alignment(%d, %d, %d)", t1->_idx, t2->_idx, align);)
         set_alignment(t1, t2, align);
         changed = true;
       }
     }
   }

@@ -1146,10 +1177,11 @@
   assert(alignment(s1) + data_size(s1) == alignment(s2), "just checking");
 
   if (s1->is_Store()) return false;
 
   int align = alignment(s1);
+  NOT_PRODUCT(if(is_trace_alignment()) tty->print_cr("SuperWord::follow_def_uses: s1 %d, align %d", s1->_idx, align);)
   int savings = -1;
   int num_s1_uses = 0;
   Node* u1 = NULL;
   Node* u2 = NULL;
   for (DUIterator_Fast imax, i = s1->fast_outs(imax); i < imax; i++) {

@@ -1177,10 +1209,11 @@
   if (savings >= 0) {
     Node_List* pair = new Node_List();
     pair->push(u1);
     pair->push(u2);
     _packset.append(pair);
+    NOT_PRODUCT(if(is_trace_alignment()) tty->print_cr("SuperWord::follow_def_uses: set_alignment(%d, %d, %d)", u1->_idx, u2->_idx, align);)
     set_alignment(u1, u2, align);
     changed = true;
   }
   return changed;
 }

@@ -1452,10 +1485,200 @@
     tty->cr();
   }
 #endif
 }
 
+//------------------------------merge_packs_to_cmovd---------------------------
+// Merge each CMoveD pack with the Bool and CmpD packs that feed it.
+// We want to catch this pattern and subsume CmpD and Bool into CMoveD:
+//
+//                   SubD             ConD
+//                  /  |               /
+//                 /   |           /   /
+//                /    |       /      /
+//               /     |   /         /
+//              /      /            /
+//             /    /  |           /
+//            v /      |          /
+//         CmpD        |         /
+//          |          |        /
+//          v          |       /
+//         Bool        |      /
+//           \         |     /
+//             \       |    /
+//               \     |   /
+//                 \   |  /
+//                   \ v /
+//                   CMoveD
+//
+
+void SuperWord::merge_packs_to_cmovd() {
+  for (int i = _packset.length() - 1; i >= 0; i--) {
+    _cmovev_kit.make_cmovevd_pack(_packset.at(i));
+  }
+#ifndef PRODUCT
+  if (TraceSuperWord) {
+    tty->print_cr("\nSuperWord::merge_packs_to_cmovd(): After merge");
+    print_packset();
+    tty->cr();
+  }
+#endif
+}
+
+Node* CMoveKit::is_Bool_candidate(Node* def) const {
+  Node* use = NULL;
+  if (!def->is_Bool() || def->in(0) != NULL || def->outcnt() != 1) {
+    return NULL;
+  }
+  for (DUIterator_Fast jmax, j = def->fast_outs(jmax); j < jmax; j++) {
+    use = def->fast_out(j);
+    if (!_sw->same_generation(def, use) || !use->is_CMove()) {
+      return NULL;
+    }
+  }
+  return use;
+}
+
+Node* CMoveKit::is_CmpD_candidate(Node* def) const {
+  Node* use = NULL;
+  if (!def->is_Cmp() || def->in(0) != NULL || def->outcnt() != 1) {
+    return NULL;
+  }
+  for (DUIterator_Fast jmax, j = def->fast_outs(jmax); j < jmax; j++) {
+    use = def->fast_out(j);
+    if (!_sw->same_generation(def, use) || (use = is_Bool_candidate(use)) == NULL || !_sw->same_generation(def, use)) {
+      return NULL;
+    }
+  }
+  return use;
+}
+
+Node_List* CMoveKit::make_cmovevd_pack(Node_List* cmovd_pk) {
+  Node *cmovd = cmovd_pk->at(0);
+  if (!cmovd->is_CMove()) {
+    return NULL;
+  }
+  if (pack(cmovd) != NULL) { // already in the cmov pack
+    return NULL;
+  }
+  if (cmovd->in(0) != NULL) {
+    NOT_PRODUCT(if(_sw->is_trace_cmov()) {tty->print("CMoveKit::make_cmovevd_pack: CMoveD %d has control flow, escaping...", cmovd->_idx); cmovd->dump();})
+    return NULL;
+  }
+
+  Node* bol = cmovd->as_CMove()->in(CMoveNode::Condition);
+  if (!bol->is_Bool()
+      || bol->outcnt() != 1
+      || !_sw->same_generation(bol, cmovd)
+      || bol->in(0) != NULL  // BoolNode has control flow!!
+      || _sw->my_pack(bol) == NULL) {
+      NOT_PRODUCT(if(_sw->is_trace_cmov()) {tty->print("CMoveKit::make_cmovevd_pack: Bool %d does not fit CMoveD %d for building vector, escaping...", bol->_idx, cmovd->_idx); bol->dump();})
+      return NULL;
+  }
+  Node_List* bool_pk = _sw->my_pack(bol);
+  if (bool_pk->size() != cmovd_pk->size() ) {
+    return NULL;
+  }
+
+  Node* cmpd = bol->in(1);
+  if (!cmpd->is_Cmp()
+      || cmpd->outcnt() != 1
+      || !_sw->same_generation(cmpd, cmovd)
+      || cmpd->in(0) != NULL  // CmpDNode has control flow!!
+      || _sw->my_pack(cmpd) == NULL) {
+      NOT_PRODUCT(if(_sw->is_trace_cmov()) {tty->print("CMoveKit::make_cmovevd_pack: CmpD %d does not fit CMoveD %d for building vector, escaping...", cmpd->_idx, cmovd->_idx); cmpd->dump();})
+      return NULL;
+  }
+  Node_List* cmpd_pk = _sw->my_pack(cmpd);
+  if (cmpd_pk->size() != cmovd_pk->size() ) {
+    return NULL;
+  }
+
+  if (!test_cmpd_pack(cmpd_pk, cmovd_pk)) {
+    NOT_PRODUCT(if(_sw->is_trace_cmov()) {tty->print("CMoveKit::make_cmovevd_pack: cmpd pack for CmpD %d failed vectorization test", cmpd->_idx); cmpd->dump();})
+    return NULL;
+  }
+
+  Node_List* new_cmpd_pk = new Node_List();
+  uint sz = cmovd_pk->size() - 1;
+  for (uint i = 0; i <= sz; ++i) {
+    Node* cmov = cmovd_pk->at(i);
+    Node* bol  = bool_pk->at(i);
+    Node* cmp  = cmpd_pk->at(i);
+
+    new_cmpd_pk->insert(i, cmov);
+
+    map(cmov, new_cmpd_pk);
+    map(bol, new_cmpd_pk);
+    map(cmp, new_cmpd_pk);
+
+    _sw->set_my_pack(cmov, new_cmpd_pk); // and keep old packs for cmp and bool
+  }
+  _sw->_packset.remove(cmovd_pk);
+  _sw->_packset.remove(bool_pk);
+  _sw->_packset.remove(cmpd_pk);
+  _sw->_packset.append(new_cmpd_pk);
+  NOT_PRODUCT(if(_sw->is_trace_cmov()) {tty->print_cr("CMoveKit::make_cmovevd_pack: added syntactic CMoveD pack"); _sw->print_pack(new_cmpd_pk);})
+  return new_cmpd_pk;
+}
+
+bool CMoveKit::test_cmpd_pack(Node_List* cmpd_pk, Node_List* cmovd_pk) {
+  Node* cmpd0 = cmpd_pk->at(0);
+  assert(cmpd0->is_Cmp(), "CMoveKit::test_cmpd_pack: should be CmpDNode");
+  assert(cmovd_pk->at(0)->is_CMove(), "CMoveKit::test_cmpd_pack: should be CMoveD");
+  assert(cmpd_pk->size() == cmovd_pk->size(), "CMoveKit::test_cmpd_pack: should be same size");
+  Node* in1 = cmpd0->in(1);
+  Node* in2 = cmpd0->in(2);
+  Node_List* in1_pk = _sw->my_pack(in1);
+  Node_List* in2_pk = _sw->my_pack(in2);
+
+  if ((in1_pk != NULL && in1_pk->size() != cmpd_pk->size())
+      || (in2_pk != NULL && in2_pk->size() != cmpd_pk->size())) {
+    return false;
+  }
+
+  // test if "all" in1 are in the same pack or the same node
+  if (in1_pk == NULL) {
+    for (uint j = 1; j < cmpd_pk->size(); j++) {
+      if (cmpd_pk->at(j)->in(1) != in1) {
+        return false;
+      }
+    }//for: in1_pk is not pack but all CmpD nodes in the pack have the same in(1)
+  }
+  // test if "all" in2 are in the same pack or the same node
+  if (in2_pk == NULL) {
+    for (uint j = 1; j < cmpd_pk->size(); j++) {
+      if (cmpd_pk->at(j)->in(2) != in2) {
+        return false;
+      }
+    }//for: in2_pk is not pack but all CmpD nodes in the pack have the same in(2)
+  }
+  //now check if cmpd_pk may be subsumed in vector built for cmovd_pk
+  int cmovd_ind1, cmovd_ind2;
+  if (cmpd_pk->at(0)->in(1) == cmovd_pk->at(0)->as_CMove()->in(CMoveNode::IfFalse)
+   && cmpd_pk->at(0)->in(2) == cmovd_pk->at(0)->as_CMove()->in(CMoveNode::IfTrue)) {
+      cmovd_ind1 = CMoveNode::IfFalse;
+      cmovd_ind2 = CMoveNode::IfTrue;
+  } else if (cmpd_pk->at(0)->in(2) == cmovd_pk->at(0)->as_CMove()->in(CMoveNode::IfFalse)
+          && cmpd_pk->at(0)->in(1) == cmovd_pk->at(0)->as_CMove()->in(CMoveNode::IfTrue)) {
+      cmovd_ind2 = CMoveNode::IfFalse;
+      cmovd_ind1 = CMoveNode::IfTrue;
+  }
+  else {
+    return false;
+  }
+
+  for (uint j = 1; j < cmpd_pk->size(); j++) {
+    if (cmpd_pk->at(j)->in(1) != cmovd_pk->at(j)->as_CMove()->in(cmovd_ind1)
+        || cmpd_pk->at(j)->in(2) != cmovd_pk->at(j)->as_CMove()->in(cmovd_ind2)) {
+        return false;
+    }//if
+  }
+  NOT_PRODUCT(if(_sw->is_trace_cmov()) { tty->print("CMoveKit::test_cmpd_pack: cmpd pack for 1st CmpD %d is OK for vectorization: ", cmpd0->_idx); cmpd0->dump(); })
+  return true;
+}
+
 //------------------------------implemented---------------------------
 // Can code be generated for pack p?
 bool SuperWord::implemented(Node_List* p) {
   bool retValue = false;
   Node* p0 = p->at(0);

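test_cmpd_pack() accepts a CmpD pack only if, in every lane, the compare reads exactly the two values that the CMoveD selects between, either in (IfFalse, IfTrue) order or swapped, with lane 0 fixing the order for all lanes. The sketch below restates just that operand-order rule over plain integers. Lane and cmp_feeds_cmov_operands are hypothetical names, the integer ids stand in for Node pointers, and the separate checks on in1_pk/in2_pk sizes are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-lane view of one CmpD and the CMoveD it controls.
// Integer ids stand in for HotSpot Node pointers.
struct Lane {
  int cmp_in1, cmp_in2;       // CmpD in(1), in(2)
  int cmov_false, cmov_true;  // CMoveD in(IfFalse), in(IfTrue)
};

// Restates the operand-order rule of CMoveKit::test_cmpd_pack: lane 0
// decides whether the compare reads (IfFalse, IfTrue) or (IfTrue, IfFalse),
// and every other lane must use the same order.
bool cmp_feeds_cmov_operands(const std::vector<Lane>& lanes) {
  if (lanes.empty()) return false;
  const Lane& l0 = lanes[0];
  bool straight;  // true: in1 == IfFalse and in2 == IfTrue
  if (l0.cmp_in1 == l0.cmov_false && l0.cmp_in2 == l0.cmov_true) {
    straight = true;
  } else if (l0.cmp_in2 == l0.cmov_false && l0.cmp_in1 == l0.cmov_true) {
    straight = false;
  } else {
    return false;  // the compare does not read the selected values
  }
  for (std::size_t j = 1; j < lanes.size(); j++) {
    const Lane& l = lanes[j];
    const int expect1 = straight ? l.cmov_false : l.cmov_true;
    const int expect2 = straight ? l.cmov_true : l.cmov_false;
    if (l.cmp_in1 != expect1 || l.cmp_in2 != expect2) return false;
  }
  return true;
}
```

Mixing orders across lanes is rejected because a single vector compare feeding a single CMoveVD can only use one operand order.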
@@ -1471,26 +1694,36 @@
         retValue = ReductionNode::implemented(opc, size, arith_type->basic_type());
       }
     } else {
       retValue = VectorNode::implemented(opc, size, velt_basic_type(p0));
     }
+    if (!retValue) {
+      if (is_cmov_pack(p)) {
+        NOT_PRODUCT(if(is_trace_cmov()) {tty->print_cr("SuperWord::implemented: found cmov pack"); print_pack(p);})
+        return true;
+      }
+    }
   }
   return retValue;
 }
 
+bool SuperWord::is_cmov_pack(Node_List* p) {
+  return _cmovev_kit.pack(p->at(0)) != NULL;
+}
 //------------------------------same_inputs--------------------------
 // For pack p, are all idx operands the same?
-static bool same_inputs(Node_List* p, int idx) {
+bool SuperWord::same_inputs(Node_List* p, int idx) {
   Node* p0 = p->at(0);
   uint vlen = p->size();
   Node* p0_def = p0->in(idx);
   for (uint i = 1; i < vlen; i++) {
     Node* pi = p->at(i);
     Node* pi_def = pi->in(idx);
-    if (p0_def != pi_def)
+    if (p0_def != pi_def) {
       return false;
   }
+  }
   return true;
 }
 
 //------------------------------profitable---------------------------
 // For pack p, are all operands and all uses (with in the block) vector?

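same_inputs() becomes a SuperWord member (it was a file-static helper) and gains braces, but its logic is unchanged: it reports whether every node in pack p reads the same definition at input idx. vector_opd() relies on it to decide whether one operand can be reused or broadcast to all lanes instead of packing per-lane inputs. The toy version below is a sketch that uses integer ids in place of Node* inputs; ToyInputs and this free function are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical: each pack member is represented by its list of input ids,
// where element idx plays the role of Node::in(idx).
using ToyInputs = std::vector<int>;

// Same logic as SuperWord::same_inputs: compare every member's input at
// position idx against the first member's.
bool same_inputs(const std::vector<ToyInputs>& pack, std::size_t idx) {
  const int p0_def = pack.at(0).at(idx);
  for (std::size_t i = 1; i < pack.size(); i++) {
    if (pack[i].at(idx) != p0_def) {
      return false;
    }
  }
  return true;
}
```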
@@ -1503,13 +1736,14 @@
   // size or alignment.
   // Also, for now, return false if not scalar promotion case when inputs are
   // the same. Later, implement PackNode and allow differing, non-vector inputs
   // (maybe just the ones from outside the block.)
   for (uint i = start; i < end; i++) {
-    if (!is_vector_use(p0, i))
+    if (!is_vector_use(p0, i)) {
       return false;
   }
+  }
   // Check if reductions are connected
   if (p0->is_reduction()) {
     Node* second_in = p0->in(2);
     Node_List* second_pk = my_pack(second_in);
     if ((second_pk == NULL) || (_num_work_vecs == _num_reductions)) {

@@ -1535,10 +1769,13 @@
     // For now, return false if not all uses are vector.
     // Later, implement ExtractNode and allow non-vector uses (maybe
     // just the ones outside the block.)
     for (uint i = 0; i < p->size(); i++) {
       Node* def = p->at(i);
+      if (is_cmov_pack_internal_node(p, def)) {
+        continue;
+      }
       for (DUIterator_Fast jmax, j = def->fast_outs(jmax); j < jmax; j++) {
         Node* use = def->fast_out(j);
         for (uint k = 0; k < use->req(); k++) {
           Node* n = use->in(k);
           if (def == n) {

@@ -1761,18 +1998,34 @@
       _igvn.replace_input_of(ld, MemNode::Memory, mem_input);
     }
   }
 }
 
+#ifndef PRODUCT
+void SuperWord::print_loop(bool whole) {
+  Node_Stack stack(_arena, _phase->C->unique() >> 2);
+  Node_List rpo_list;
+  VectorSet visited(_arena);
+  visited.set(lpt()->_head->_idx);
+  _phase->rpo(lpt()->_head, stack, visited, rpo_list);
+  _phase->dump(lpt(), rpo_list.size(), rpo_list );
+  if(whole) {
+    tty->print_cr("\n Whole loop tree");
+    _phase->dump();
+    tty->print_cr(" End of whole loop tree\n");
+  }
+}
+#endif
+
 //------------------------------output---------------------------
 // Convert packs into vector node operations
 void SuperWord::output() {
   if (_packset.length() == 0) return;
 
 #ifndef PRODUCT
   if (TraceLoopOpts) {
-    tty->print("SuperWord    ");
+    tty->print("SuperWord::output    ");
     lpt()->dump_head();
   }
 #endif
 
   // MUST ENSURE main loop's initial value is properly aligned:

@@ -1787,19 +2040,32 @@
 
   Compile* C = _phase->C;
   CountedLoopNode *cl = lpt()->_head->as_CountedLoop();
   uint max_vlen_in_bytes = 0;
   uint max_vlen = 0;
+
+  NOT_PRODUCT(if(is_trace_loop_reverse()) {tty->print_cr("SuperWord::output: print loop before create_reserve_version_of_loop"); print_loop(true);})
+
+  CountedLoopReserveKit make_reversable(_phase, _lpt, do_reserve_copy());
+
+  NOT_PRODUCT(if(is_trace_loop_reverse()) {tty->print_cr("SuperWord::output: print loop after create_reserve_version_of_loop"); print_loop(true);})
+
+  if (do_reserve_copy() && !make_reversable.has_reserved()) {
+    NOT_PRODUCT(if(is_trace_loop_reverse() || TraceLoopOpts) {tty->print_cr("SuperWord::output: loop was not reserved correctly, exiting SuperWord");})
+    return;
+  }
+
   for (int i = 0; i < _block.length(); i++) {
     Node* n = _block.at(i);
     Node_List* p = my_pack(n);
     if (p && n == executed_last(p)) {
       uint vlen = p->size();
       uint vlen_in_bytes = 0;
       Node* vn = NULL;
       Node* low_adr = p->at(0);
       Node* first   = executed_first(p);
+      NOT_PRODUCT(if(is_trace_cmov()) {tty->print_cr("SuperWord::output: %d executed first, %d executed last in pack", first->_idx, n->_idx); print_pack(p);})
       int   opc = n->Opcode();
       if (n->is_Load()) {
         Node* ctl = n->in(MemNode::Control);
         Node* mem = first->in(MemNode::Memory);
         SWPointer p1(n->as_Mem(), this, NULL, false);

@@ -1821,27 +2087,49 @@
         vn = LoadVectorNode::make(opc, ctl, mem, adr, atyp, vlen, velt_basic_type(n), control_dependency(p));
         vlen_in_bytes = vn->as_LoadVector()->memory_size();
       } else if (n->is_Store()) {
         // Promote value to be stored to vector
         Node* val = vector_opd(p, MemNode::ValueIn);
+        if (val == NULL) {
+          if (do_reserve_copy()) {
+            NOT_PRODUCT(if(is_trace_loop_reverse() || TraceLoopOpts) {tty->print_cr("SuperWord::output: val should not be NULL, exiting SuperWord");})
+            return; //and reverse to backup IG
+          }
+          ShouldNotReachHere();
+        }
+
         Node* ctl = n->in(MemNode::Control);
         Node* mem = first->in(MemNode::Memory);
         Node* adr = low_adr->in(MemNode::Address);
         const TypePtr* atyp = n->adr_type();
         vn = StoreVectorNode::make(opc, ctl, mem, adr, atyp, val, vlen);
         vlen_in_bytes = vn->as_StoreVector()->memory_size();
-      } else if (n->req() == 3) {
+      } else if (n->req() == 3 && !is_cmov_pack(p)) {
         // Promote operands to vector
         Node* in1 = NULL;
         bool node_isa_reduction = n->is_reduction();
         if (node_isa_reduction) {
           // the input to the first reduction operation is retained
           in1 = low_adr->in(1);
         } else {
           in1 = vector_opd(p, 1);
+          if (in1 == NULL) {
+            if (do_reserve_copy()) {
+              NOT_PRODUCT(if(is_trace_loop_reverse() || TraceLoopOpts) {tty->print_cr("SuperWord::output: in1 should not be NULL, exiting SuperWord");})
+              return; //and reverse to backup IG
+            }
+            ShouldNotReachHere();
+          }
         }
         Node* in2 = vector_opd(p, 2);
+        if (in2 == NULL) {
+          if (do_reserve_copy()) {
+            NOT_PRODUCT(if(is_trace_loop_reverse() || TraceLoopOpts) {tty->print_cr("SuperWord::output: in2 should not be NULL, exiting SuperWord");})
+            return; //and reverse to backup IG
+          }
+          ShouldNotReachHere();
+        }
         if (VectorNode::is_invariant_vector(in1) && (node_isa_reduction == false) && (n->is_Add() || n->is_Mul())) {
           // Move invariant vector input into second position to avoid register spilling.
           Node* tmp = in1;
           in1 = in2;
           in2 = tmp;

@@ -1861,14 +2149,75 @@
       } else if (opc == Op_SqrtD || opc == Op_AbsF || opc == Op_AbsD || opc == Op_NegF || opc == Op_NegD) {
         // Promote operand to vector (Sqrt/Abs/Neg are 2 address instructions)
         Node* in = vector_opd(p, 1);
         vn = VectorNode::make(opc, in, NULL, vlen, velt_basic_type(n));
         vlen_in_bytes = vn->as_Vector()->length_in_bytes();
-      } else {
+      } else if (is_cmov_pack(p)) {
+        if (!n->is_CMove()) {
+          continue;
+        }
+        // place here CMoveVDNode
+        NOT_PRODUCT(if(is_trace_cmov()) {tty->print_cr("SuperWord::output: print before CMove vectorization"); print_loop(false);})
+        Node* bol = n->in(CMoveNode::Condition);
+        if (!bol->is_Bool() && bol->Opcode() == Op_ExtractI && bol->req() > 1 ) {
+          NOT_PRODUCT(if(is_trace_cmov()) {tty->print_cr("SuperWord::output: %d is not Bool node, trying its in(1) node %d", bol->_idx, bol->in(1)->_idx); bol->dump(); bol->in(1)->dump();})
+          bol = bol->in(1); //may be ExtractNode
+        }
+
+        assert(bol->is_Bool(), "should be BoolNode - too late to bail out!");
+        if (!bol->is_Bool()) {
+          if (do_reserve_copy()) {
+            NOT_PRODUCT(if(is_trace_loop_reverse() || TraceLoopOpts) {tty->print_cr("SuperWord::output: expected %d bool node, exiting SuperWord", bol->_idx); bol->dump();})
+            return; //and reverse to backup IG
+          }
+          ShouldNotReachHere();
+        }
+
+        int cond = (int)bol->as_Bool()->_test._test;
+        Node* in_cc  = _igvn.intcon(cond);
+        NOT_PRODUCT(if(is_trace_cmov()) {tty->print("SuperWord::output: created intcon in_cc node %d", in_cc->_idx); in_cc->dump();})
+        Node* cc = bol->clone();
+        cc->set_req(1, in_cc);
+        NOT_PRODUCT(if(is_trace_cmov()) {tty->print("SuperWord::output: created bool cc node %d", cc->_idx); cc->dump();})
+
+        Node* src1 = vector_opd(p, 2); //2=CMoveNode::IfFalse
+        if (src1 == NULL) {
+          if (do_reserve_copy()) {
+            NOT_PRODUCT(if(is_trace_loop_reverse() || TraceLoopOpts) {tty->print_cr("SuperWord::output: src1 should not be NULL, exiting SuperWord");})
+            return; //and reverse to backup IG
+          }
         ShouldNotReachHere();
       }
+        Node* src2 = vector_opd(p, 3); //3=CMoveNode::IfTrue
+        if (src2 == NULL) {
+          if (do_reserve_copy()) {
+            NOT_PRODUCT(if(is_trace_loop_reverse() || TraceLoopOpts) {tty->print_cr("SuperWord::output: src2 should not be NULL, exiting SuperWord");})
+            return; //and reverse to backup IG
+          }
+          ShouldNotReachHere();
+        }
+        BasicType bt = velt_basic_type(n);
+        const TypeVect* vt = TypeVect::make(bt, vlen);
+        vn = new CMoveVDNode(cc, src1, src2, vt);
+        NOT_PRODUCT(if(is_trace_cmov()) {tty->print("SuperWord::output: created new CMove node %d: ", vn->_idx); vn->dump();})
+      } else {
+        if (do_reserve_copy()) {
+          NOT_PRODUCT(if(is_trace_loop_reverse() || TraceLoopOpts) {tty->print_cr("SuperWord::output: ShouldNotReachHere, exiting SuperWord");})
+          return; //and reverse to backup IG
+        }
+        ShouldNotReachHere();
+      }
+
       assert(vn != NULL, "sanity");
+      if (vn == NULL) {
+        if (do_reserve_copy()){
+          NOT_PRODUCT(if(is_trace_loop_reverse() || TraceLoopOpts) {tty->print_cr("SuperWord::output: got NULL node, cannot proceed, exiting SuperWord");})
+          return; //and reverse to backup IG
+        }
+        ShouldNotReachHere();
+      }
+
       _igvn.register_new_node_with_optimizer(vn);
       _phase->set_ctrl(vn, _phase->get_ctrl(p->at(0)));
       for (uint j = 0; j < p->size(); j++) {
         Node* pm = p->at(j);
         _igvn.replace_node(pm, vn);

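For a merged pack, output() builds CMoveVDNode(cc, src1, src2, vt), where cc is a clone of the Bool whose input 1 is replaced by an intcon encoding the BoolTest condition, src1 comes from input 2 (IfFalse) and src2 from input 3 (IfTrue). The scalar reference below sketches the per-lane result such a vector conditional move is meant to produce. Test, eval and cmove_lanes are hypothetical names, not HotSpot APIs, and the compare operands a and b are passed separately even though test_cmpd_pack requires them to be the selected values.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the BoolTest condition that SuperWord encodes
// into the intcon input of the cloned Bool.
enum class Test { eq, ne, lt, le, gt, ge };

bool eval(Test t, double a, double b) {
  switch (t) {
    case Test::eq: return a == b;
    case Test::ne: return a != b;
    case Test::lt: return a < b;
    case Test::le: return a <= b;
    case Test::gt: return a > b;
    case Test::ge: return a >= b;
  }
  return false;
}

// Scalar reference for a lane-wise conditional move over N doubles:
// lane i gets if_true[i] when (a[i] <t> b[i]) holds, else if_false[i].
template <std::size_t N>
std::array<double, N> cmove_lanes(Test t,
                                  const std::array<double, N>& a,
                                  const std::array<double, N>& b,
                                  const std::array<double, N>& if_false,
                                  const std::array<double, N>& if_true) {
  std::array<double, N> out{};
  for (std::size_t i = 0; i < N; i++) {
    out[i] = eval(t, a[i], b[i]) ? if_true[i] : if_false[i];
  }
  return out;
}
```

For the pattern quoted in rev 8734, `callValue = (callValue >= 0.0) ? callValue : 0`, the corresponding call is `cmove_lanes<4>(Test::ge, x, zero, zero, x)`: the compare reads (x, 0.0), and each lane selects 0.0 when the test fails and x when it holds.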
@@ -1884,12 +2233,14 @@
         tty->print("new Vector node: ");
         vn->dump();
       }
 #endif
     }
-  }
+  }//for (int i = 0; i < _block.length(); i++)
+
   C->set_max_vector_size(max_vlen_in_bytes);
+
   if (SuperWordLoopUnrollAnalysis) {
     if (cl->has_passed_slp()) {
       uint slp_max_unroll_factor = cl->slp_max_unroll();
       if (slp_max_unroll_factor == max_vlen) {
         NOT_PRODUCT(if (TraceSuperWordLoopUnrollAnalysis) tty->print_cr("vector loop(unroll=%d, len=%d)\n", max_vlen, max_vlen_in_bytes*BitsPerByte));

@@ -1898,10 +2249,16 @@
         C->set_major_progress();
         cl->mark_do_unroll_only();
       }
     }
   }
+
+  if (do_reserve_copy()) {
+    make_reversable.use_new();
+  }
+  NOT_PRODUCT(if(is_trace_loop_reverse()) {tty->print_cr("\n Final loop after SuperWord"); print_loop(true);})
+  return;
 }
 
 //------------------------------vector_opd---------------------------
 // Create a vector operand for the nodes in pack p for operand: in(opd_idx)
 Node* SuperWord::vector_opd(Node_List* p, int opd_idx) {

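output() now wraps its work in a CountedLoopReserveKit. Each early `return; //and reverse to backup IG` is meant to leave the graph on the reserved copy of the loop, and only the final make_reversable.use_new() keeps the vectorized loop. The RAII sketch below models that contract with a single boolean. ToyLoop, ReserveKitSketch and vectorize are hypothetical, it assumes reserving always succeeds when enabled, and it does not reproduce how HotSpot actually rewires control flow between the two loop copies.

```cpp
#include <cassert>

// Hypothetical stand-in for the loop's control flow: records which copy runs.
struct ToyLoop {
  bool runs_reserved_copy = false;
};

// Sketch of the contract, not HotSpot's CountedLoopReserveKit: the
// destructor falls back to the reserved copy unless use_new() was called.
class ReserveKitSketch {
 public:
  ReserveKitSketch(ToyLoop& loop, bool active)
      : _loop(loop), _active(active), _has_reserved(active) {}
  ~ReserveKitSketch() {
    if (_active && _has_reserved && !_use_new) {
      _loop.runs_reserved_copy = true;  // bail-out path: restore the backup
    }
  }
  bool has_reserved() const { return _has_reserved; }
  void use_new() { _use_new = true; }  // commit: keep the transformed loop

 private:
  ToyLoop& _loop;
  bool _active;
  bool _has_reserved;  // assumed: reserving always succeeds when active
  bool _use_new = false;
};

// Mirrors the shape of SuperWord::output(): an early return reverts,
// reaching the end commits.
void vectorize(ToyLoop& loop, bool reserve, bool fail_midway) {
  ReserveKitSketch kit(loop, reserve);
  if (reserve && !kit.has_reserved()) return;
  if (fail_midway) return;  // e.g. vector_opd() returned NULL
  if (reserve) kit.use_new();
}
```

Putting the fallback in the destructor means every one of the many early returns in output() is covered without repeating cleanup code, which is what replaced the earlier `goto output_error` approach mentioned in rev 8931.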
@@ -1910,10 +2267,14 @@
   Node* opd = p0->in(opd_idx);
 
   if (same_inputs(p, opd_idx)) {
     if (opd->is_Vector() || opd->is_LoadVector()) {
       assert(((opd_idx != 2) || !VectorNode::is_shift(p0)), "shift's count can't be vector");
+      if (opd_idx == 2 && VectorNode::is_shift(p0)) {
+        NOT_PRODUCT(if(is_trace_loop_reverse() || TraceLoopOpts) {tty->print_cr("shift's count can't be vector");})
+        return NULL;
+      }
       return opd; // input is matching vector
     }
     if ((opd_idx == 2) && VectorNode::is_shift(p0)) {
       Compile* C = _phase->C;
       Node* cnt = opd;

@@ -1932,20 +2293,28 @@
           cnt = new AndINode(opd, cnt);
           _igvn.register_new_node_with_optimizer(cnt);
           _phase->set_ctrl(cnt, _phase->get_ctrl(opd));
         }
         assert(opd->bottom_type()->isa_int(), "int type only");
+        if (!opd->bottom_type()->isa_int()) {
+          NOT_PRODUCT(if(is_trace_loop_reverse() || TraceLoopOpts) {tty->print_cr("Should be int type only");})
+          return NULL;
+        }
         // Move non constant shift count into vector register.
         cnt = VectorNode::shift_count(p0, cnt, vlen, velt_basic_type(p0));
       }
       if (cnt != opd) {
         _igvn.register_new_node_with_optimizer(cnt);
         _phase->set_ctrl(cnt, _phase->get_ctrl(opd));
       }
       return cnt;
     }
     assert(!opd->is_StoreVector(), "such vector is not expected here");
+    if (opd->is_StoreVector()) {
+      NOT_PRODUCT(if(is_trace_loop_reverse() || TraceLoopOpts) {tty->print_cr("StoreVector is not expected here");})
+      return NULL;
+    }
     // Convert scalar input to vector with the same number of elements as
     // p0's vector. Use p0's type because size of operand's container in
     // vector should match p0's size regardless operand's size.
     const Type* p0_t = velt_type(p0);
     VectorNode* vn = VectorNode::scalar2vector(opd, vlen, p0_t);

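In vector_opd(), a non-constant shift count is combined with an AndINode before VectorNode::shift_count moves it into a vector register. A plausible reading is that this preserves Java's shift semantics (JLS 15.19): only the low 5 bits of the count are used for int shifts and the low 6 bits for long shifts, whereas vector shift instructions do not necessarily mask the count that way. The scalar helpers below have hypothetical names and only illustrate the rule the mask has to reproduce.

```cpp
#include <cassert>
#include <cstdint>

// Java int shift: only the low 5 bits of the count matter (count & 31).
int32_t java_shl_int(int32_t x, int32_t count) {
  // Shift as unsigned to avoid undefined behavior for negative x.
  return static_cast<int32_t>(static_cast<uint32_t>(x) << (count & 31));
}

// Java long shift: only the low 6 bits of the count matter (count & 63).
int64_t java_shl_long(int64_t x, int32_t count) {
  return static_cast<int64_t>(static_cast<uint64_t>(x) << (count & 63));
}
```

Without the mask, a count of 33 applied to 32-bit lanes would not behave like Java's `1 << 33`, which equals 2.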
@@ -1968,10 +2337,14 @@
 
   for (uint i = 1; i < vlen; i++) {
     Node* pi = p->at(i);
     Node* in = pi->in(opd_idx);
     assert(my_pack(in) == NULL, "Should already have been unpacked");
+    if (my_pack(in) != NULL) {
+      NOT_PRODUCT(if(is_trace_loop_reverse() || TraceLoopOpts) {tty->print_cr("Should already have been unpacked");})
+      return NULL;
+    }
     assert(opd_bt == in->bottom_type()->basic_type(), "all same type");
     pk->add_opd(in);
   }
   _igvn.register_new_node_with_optimizer(pk);
   _phase->set_ctrl(pk, _phase->get_ctrl(opd));

@@ -1999,11 +2372,12 @@
     for (DUIterator_Fast jmax, j = def->fast_outs(jmax); j < jmax; j++) {
       Node* use = def->fast_out(j);
       for (uint k = 0; k < use->req(); k++) {
         Node* n = use->in(k);
         if (def == n) {
-          if (!is_vector_use(use, k)) {
+          Node_List* u_pk = my_pack(use);
+          if ((u_pk == NULL || !is_cmov_pack(u_pk) || use->is_CMove()) && !is_vector_use(use, k)) {
             _n_idx_list.push(use, k);
           }
         }
       }
     }