--- old/src/share/vm/utilities/workgroup.hpp	2015-04-01 12:39:51.519913547 +0200
+++ new/src/share/vm/utilities/workgroup.hpp	2015-04-01 12:39:51.407913552 +0200
@@ -302,6 +302,57 @@
 // serialized to give each worker a unique "part".  Workers that
 // are not needed for this tasks (i.e., "_active_workers" have
 // been started before it, continue to wait for work.
+//
+// Note on use of FlexibleWorkGang's for GC.
+// There are three places where task completion is determined.
+// In
+//    1) ParallelTaskTerminator::offer_termination() where _n_threads
+//    must be set to the correct value so that count of workers that
+//    have offered termination will exactly match the number
+//    working on the task.  Tasks such as those derived from GCTask
+//    use ParallelTaskTerminator's.  Tasks that want load balancing
+//    by work stealing use this method to gauge completion.
+//    2) SubTasksDone has a variable _n_threads that is used in
+//    all_tasks_completed() to determine completion.  all_tasks_complete()
+//    counts the number of tasks that have been done and then reset
+//    the SubTasksDone so that it can be used again.  When the number of
+//    tasks is set to the number of GC workers, then _n_threads must
+//    be set to the number of active GC workers. G1RootProcessor and
+//    GenCollectedHeap have SubTasksDone.
+//    3) SequentialSubTasksDone has an _n_threads that is used in
+//    a way similar to SubTasksDone and has the same dependency on the
+//    number of active GC workers.  CompactibleFreeListSpace and Space
+//    have SequentialSubTasksDone's.
+//
+// Examples of using SubTasksDone and SequentialSubTasksDone:
+//  G1RootProcessor and GenCollectedHeap::process_roots() use
+//  SubTasksDone* _process_strong_tasks to claim tasks for workers
+//
+//  GenCollectedHeap::gen_process_roots() calls
+//      rem_set()->younger_refs_iterate()
+//  to scan the card table and which eventually calls down into
+//  CardTableModRefBS::par_non_clean_card_iterate_work().  This method
+//  uses SequentialSubTasksDone* _pst to claim tasks.
+//  Both SubTasksDone and SequentialSubTasksDone call their method
+//  all_tasks_completed() to count the number of GC workers that have
+//  finished their work.  That logic is "when all the workers are
+//  finished the tasks are finished".
+//
+//  The pattern that appears  in the code is to set _n_threads
+//  to a value > 1 before a task that you would like executed in parallel
+//  and then to set it to 0 after that task has completed.  A value of
+//  0 is a "special" value in set_n_threads() which translates to
+//  setting _n_threads to 1.
+//
+//  Some code uses _n_termination to decide if work should be done in
+//  parallel.  The notorious possibly_parallel_oops_do() in threads.cpp
+//  is an example of such code.  Look for variable "is_par" for other
+//  examples.
+//
+//  The active_workers is not reset to 0 after a parallel phase.  It's
+//  value may be used in later phases and in one instance at least
+//  (the parallel remark) it has to be used (the parallel remark depends
+//  on the partitioning done in the previous parallel scavenge).
 
 class FlexibleWorkGang: public WorkGang {
   // The currently active workers in this gang.