NPAC Technical Report SCCS-759
Concatenated Parallelism: A Technique for Efficient Parallel Divide and Conquer
Srinivas Aluru, Sanjay Goil, Sanjay Ranka
Submitted March 26 1996
Abstract
A number of problems have efficient algorithms that are based on the divide
and conquer paradigm. Such problems can be solved in parallel by mapping
the corresponding divide and conquer tree to the parallel computer under
consideration. Two basic strategies are used in such parallelizations:
{\em Task Parallelism}, in which different subproblems are assigned to
different groups of processors and {\em Data Parallelism}, in which the
tasks are solved one after the other using all the processors. Task
parallelism involves significant data movement and data parallelism causes
problems due to load imbalance.
In this paper we propose a new strategy, which we call {\em Concatenated
Parallelism}, for efficient parallel solution of problems resulting in
divide and conquer trees. Our strategy is useful when the communication
time due to data movement in distributing the subproblems using task
parallelism is significant when compared to the time required to divide the
subproblems. This happens to be the case for a number of important
applications including quicksort, quickhull and the construction of
quadtrees, octrees and multidimensional binary search trees. We consider
both balanced and unbalanced deterministic divide and conquer trees as well
as randomized divide and conquer trees. We provide both theoretical and
experimental analysis of Concatenated Parallelism for coarse-grained distributed
memory parallel machines along with comparisons with task and Data Parallelism.
We have implemented our algorithms on the CM-5 and report on the experimental results.
One useful application of our technique is a scalable parallel quicksort
algorithm.