NPAC Technical Report SCCS-759

Concatenated Parallelism: A Technique for Efficient Parallel Divide and Conquer

Srinivas Aluru, Sanjay Goil, Sanjay Ranka

Submitted March 26 1996

Abstract

A number of problems have efficient algorithms that are based on the divide and conquer paradigm. Such problems can be solved in parallel by mapping the corresponding divide and conquer tree to the parallel computer under consideration. Two basic strategies are used in such parallelizations: {\em Task Parallelism}, in which different subproblems are assigned to different groups of processors and {\em Data Parallelism}, in which the tasks are solved one after the other using all the processors. Task parallelism involves significant data movement and data parallelism causes problems due to load imbalance. In this paper we propose a new strategy, which we call {\em Concatenated Parallelism}, for efficient parallel solution of problems resulting in divide and conquer trees. Our strategy is useful when the communication time due to data movement in distributing the subproblems using task parallelism is significant when compared to the time required to divide the subproblems. This happens to be the case for a number of important applications including quicksort, quickhull and the construction of quadtrees, octrees and multidimensional binary search trees. We consider both balanced and unbalanced deterministic divide and conquer trees as well as randomized divide and conquer trees. We provide both theoretical and experimental analysis of Concatenated Parallelism for coarse-grained distributed memory parallel machines along with comparisons with task and Data Parallelism. We have implemented our algorithms on the CM-5 and report on the experimental results. One useful application of our technique is a scalable parallel quicksort algorithm.