1 |
Pivot selection requires a 1-D reduction
-
Distribute rows ⇒ parallel, with communication
-
Distribute columns ⇒ sequential, but no communication
|
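As a rough illustration of the first point, pivot selection can be written as a reduction over one column: each processor reduces the rows it owns, then the local winners are combined. This is a minimal single-process sketch, not HPF code. The names `select_pivot` and `nprocs`, and the cyclic row ownership `i % nprocs`, are assumptions made for the example.

```python
def select_pivot(column, k, nprocs):
    """Return the row index (>= k) of the largest-magnitude entry in `column`.

    Rows are assumed to be dealt out cyclically: row i lives on processor
    i % nprocs. The loop over p simulates what each processor would do locally.
    """
    n = len(column)
    local_winners = []
    for p in range(nprocs):
        # Rows of the active region (k .. n-1) owned by processor p.
        rows = [i for i in range(k, n) if i % nprocs == p]
        if rows:
            # Local part of the reduction: no communication needed.
            local_winners.append(max(rows, key=lambda i: abs(column[i])))
    # Global part of the reduction: on a real machine this step is the
    # communication that a row distribution has to pay for.
    return max(local_winners, key=lambda i: abs(column[i]))


column = [1.0, -7.0, 3.0, 5.0, -2.0]
pivot_row = select_pivot(column, k=1, nprocs=2)
```

If the columns were distributed instead, the whole column would sit on one processor, so the search would be a plain sequential loop with no combining step.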
2 |
Element updates require the old value and elements from the pivot row and column
-
Distribute rows ⇒ parallel, but broadcast the pivot row
-
Distribute columns ⇒ parallel, but broadcast the pivot column
|
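The data dependence in the second point can be seen in a plain sequential elimination stage. The sketch below is an illustration only: `eliminate_stage` is a hypothetical helper, pivoting is assumed to have been done already, and the comments mark which copy would have to be broadcast under each distribution.

```python
def eliminate_stage(a, k):
    """Apply stage k of Gaussian elimination to the list-of-lists matrix `a` in place."""
    n = len(a)
    # With rows distributed, every processor needs this row: broadcast it.
    pivot_row = a[k][:]
    # With columns distributed, every processor needs this column: broadcast it.
    pivot_col = [a[i][k] for i in range(n)]
    for i in range(k + 1, n):
        multiplier = pivot_col[i] / pivot_row[k]
        for j in range(k, len(a[i])):
            # New value = old value - (pivot-column element / pivot) * pivot-row element.
            a[i][j] = a[i][j] - multiplier * pivot_row[j]
    return a


a = [[2.0, 1.0],
     [4.0, 5.0]]
eliminate_stage(a, 0)
```

Each update of `a[i][j]` touches only its old value, one entry of the pivot row, and one entry of the pivot column, which is why the updates themselves are parallel under either distribution.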
3 |
Each stage works on a smaller contiguous region of the array
-
BLOCK ⇒ processors drop out of the computation
-
CYCLIC ⇒ work stays (fairly) evenly distributed until the end
|
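The load-balance effect in the third point can be checked by counting how many rows of the active region each processor owns at a given stage. This is a small sketch under assumed conventions: the active region at stage `k` is taken to be rows `k` through `n-1`, BLOCK uses blocks of size ceil(n / nprocs), and all function names are made up for the example.

```python
def block_owner(i, n, nprocs):
    """Owner of row i under a BLOCK distribution (block size = ceil(n / nprocs))."""
    block_size = -(-n // nprocs)  # ceiling division
    return i // block_size


def cyclic_owner(i, n, nprocs):
    """Owner of row i under a CYCLIC distribution."""
    return i % nprocs


def active_rows_per_proc(n, nprocs, k, owner):
    """Count the rows k .. n-1 held by each processor."""
    counts = [0] * nprocs
    for i in range(k, n):
        counts[owner(i, n, nprocs)] += 1
    return counts


# Halfway through an 8-row problem on 4 processors.
block_counts = active_rows_per_proc(8, 4, 4, block_owner)
cyclic_counts = active_rows_per_proc(8, 4, 4, cyclic_owner)
```

Here `block_counts` comes out as `[0, 0, 2, 2]` (two processors have dropped out) while `cyclic_counts` is `[1, 1, 1, 1]`.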
4 |
The bottom line
-
(*,CYCLIC) if a broadcast costs more than pivoting one column
-
(CYCLIC,*) if a broadcast costs less than pivoting one column, with synchronous communication
-
(CYCLIC,CYCLIC) if a broadcast costs less than pivoting one column, with overlapped communication
|
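The three rules of the bottom line can be restated as a small decision function. This is only a paraphrase of the list above. The function name, the cost arguments, and the `overlapped_comm` flag are hypothetical, and the slide does not say what to do when the two costs are equal, so the tie handling here is an arbitrary assumption.

```python
def choose_distribution(broadcast_cost, column_pivot_cost, overlapped_comm):
    """Pick a distribution from the relative cost of a broadcast and of
    pivoting one column sequentially (both in the same, arbitrary units)."""
    if broadcast_cost > column_pivot_cost:
        # Broadcast is the expensive part: keep each column on one processor.
        return "(*,CYCLIC)"
    # Broadcast is cheaper (ties land here by assumption).
    if overlapped_comm:
        return "(CYCLIC,CYCLIC)"
    return "(CYCLIC,*)"


choice = choose_distribution(broadcast_cost=5.0, column_pivot_cost=2.0,
                             overlapped_comm=False)
```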