 * that may be stolen by other workers. Preference rules give
 * first priority to processing tasks from their own queues (LIFO
 * or FIFO, depending on mode), then to randomized FIFO steals of
 * tasks in other queues.
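The preference order above (own queue first, LIFO or FIFO depending on mode, then randomized FIFO steals from other queues) can be sketched as a plain Java scan loop. This is only an illustration of the policy, not the real worker runloop: the class name `StealPreference`, the use of `ConcurrentLinkedDeque` for queues, and the linear probe from a random start index are all assumptions made for the sketch.

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical sketch of a worker's task-selection preference:
 * 1) its own queue (LIFO or FIFO per mode), then
 * 2) randomized FIFO steals from the other queues.
 * Uses ConcurrentLinkedDeque purely for illustration.
 */
class StealPreference {
    final ConcurrentLinkedDeque<Runnable>[] queues; // one queue per worker
    final int self;          // index of this worker's own queue
    final boolean fifoMode;  // process own tasks FIFO instead of LIFO

    StealPreference(ConcurrentLinkedDeque<Runnable>[] queues,
                    int self, boolean fifoMode) {
        this.queues = queues;
        this.self = self;
        this.fifoMode = fifoMode;
    }

    Runnable nextTask() {
        // First preference: a task from our own queue.
        ConcurrentLinkedDeque<Runnable> own = queues[self];
        Runnable t = fifoMode ? own.pollFirst() : own.pollLast();
        if (t != null) return t;
        // Otherwise: scan the other queues from a random start,
        // always stealing from the FIFO (oldest) end.
        int n = queues.length;
        int start = ThreadLocalRandom.current().nextInt(n);
        for (int i = 0; i < n; i++) {
            int victim = (start + i) % n;
            if (victim == self) continue;
            t = queues[victim].pollFirst(); // steal oldest task
            if (t != null) return t;
        }
        return null; // nothing runnable anywhere right now
    }
}
```

Stealing from the FIFO end of a victim's queue is what keeps thieves and owner mostly on opposite ends of each deque, minimizing contention.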
 *
 * WorkQueues
 * ==========
 *
 * Most operations occur within work-stealing queues (in nested
 * class WorkQueue). These are special forms of Deques that
 * support only three of the four possible end-operations -- push,
 * pop, and poll (aka steal), under the further constraints that
 * push and pop are called only from the owning thread (or, as
 * extended here, under a lock), while poll may be called from
 * other threads. (If you are unfamiliar with them, you probably
 * want to read Herlihy and Shavit's book "The Art of
 * Multiprocessor Programming", chapter 16, which describes these
 * in more detail, before proceeding.) The main work-stealing
 * queue design is roughly similar to those in the papers "Dynamic
 * Circular Work-Stealing Deque" by Chase and Lev, SPAA 2005
 * (http://research.sun.com/scalable/pubs/index.html) and
 * "Idempotent work stealing" by Michael, Saraswat, and Vechev,
 * PPoPP 2009 (http://portal.acm.org/citation.cfm?id=1504186).
 * See also "Correct and Efficient Work-Stealing for Weak Memory
 * Models" by Le, Pop, Cohen, and Nardelli, PPoPP 2013
 * (http://www.di.ens.fr/~zappa/readings/ppopp13.pdf) for an
 * analysis of memory ordering (atomic, volatile, etc.) issues.
 * The main differences ultimately stem from GC requirements that
 * we null out taken slots as soon as we can, to maintain as small
 * a footprint as possible even in programs generating huge
 * numbers of tasks. To accomplish this, we shift the CAS
 * arbitrating pop vs poll (steal) from being on the indices
 * ("base" and "top") to the slots themselves. So, both a
 * successful pop and poll mainly entail a CAS of a slot from
 * non-null to null. Because we rely on CASes of references, we do
 * not need tag bits on base or top. They are simple ints, as used
 * in any circular array-based queue (see for example ArrayDeque).
 * Updates to the indices must still be ordered in a way that
 * guarantees that top == base means the queue is empty, but
 * otherwise may err on the side of possibly making the queue
 * appear nonempty when a push, pop, or poll has not fully
 * committed. Note that this means
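The slot-CAS idea described above can be illustrated with a deliberately minimal single-file sketch. This is not the real WorkQueue: the class name `SlotCasDeque` is hypothetical, the memory-ordering subtleties the cited papers analyze are reduced here to plain `volatile` fields, and no growth or overflow handling is included. What it does show is the core design point: both a successful pop (owner, LIFO end) and poll (thief, FIFO end) claim a task by CASing its slot from non-null to null, immediately nulling the slot for GC, so `base` and `top` can remain untagged ints.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/**
 * Minimal sketch (assumed names, not the real WorkQueue) of the
 * slot-CAS arbitration between pop and poll. Capacity must be a
 * power of two so indices can wrap with a mask, as in ArrayDeque.
 */
class SlotCasDeque<T> {
    final AtomicReferenceArray<T> slots;
    final int mask;
    volatile int base;   // next slot to poll (steal) from; FIFO end
    volatile int top;    // next slot to push into; LIFO end

    SlotCasDeque(int capacityPow2) {
        slots = new AtomicReferenceArray<>(capacityPow2);
        mask = capacityPow2 - 1;
    }

    void push(T task) {                  // owning thread only
        slots.set(top & mask, task);
        top = top + 1;                   // publish after the slot store
    }

    T pop() {                            // owning thread only, LIFO end
        int t = top - 1;
        if (t - base < 0) return null;   // top == base means empty
        T task = slots.get(t & mask);
        // Winning this CAS both claims the task and nulls the slot
        // so the reference is immediately unreachable for GC.
        if (task != null && slots.compareAndSet(t & mask, task, null)) {
            top = t;                     // commit index only after winning
            return task;
        }
        return null;                     // lost the race to a thief
    }

    T poll() {                           // any thread, FIFO end (steal)
        int b = base;
        if (top - b <= 0) return null;   // empty
        T task = slots.get(b & mask);
        if (task != null && slots.compareAndSet(b & mask, task, null)) {
            base = b + 1;                // advance only after claiming slot
            return task;
        }
        return null;                     // raced with owner or another thief
    }
}
```

Because the arbitration is a CAS on the slot reference itself, indices may transiently disagree with slot contents, which is exactly the "may err on the side of appearing nonempty" behavior the text describes; callers simply retry or move on when a claim fails.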