Full HTML for Basic Foilset: Parallel Computation Illustrated with Adaptive Integration

Given by Nancy J. McCracken at CPS615 Basic Simulation Track for Computational Science on Fall Semester 95. Foils prepared 22 February 98.
Summary of Material


This talk assumes the material found in the CPS615 lecture notes, Introduction to Numerical Integration.
Going on from the static data decomposition for integration on page 27 of those notes, we examine further parallel algorithms for computing adaptive Romberg (recursive Simpson) integration.
The data decomposition and remapping techniques discussed are generally applicable to parallel adaptive algorithms for other applications.

Table of Contents for full HTML of Parallel Computation Illustrated with Adaptive Integration


1 Computing Techniques for Adaptive Integration
2 Computing Techniques for Adaptive Integration
3 Sequential Romberg Integration
4 Program for Romberg Integration; Parallel Romberg Integration
5 Parallel Romberg Integration, Continued
6 Adaptive Parallel Romberg - CommAll Version (at each iteration, data is decomposed again by all processors)
7 Adaptive Parallel Romberg - program initialization; compute integral iteration and communicate values to all processors; recompute decomposition; Neighbors Version (at each iteration, processors average load with neighbors)
8 More discussion of Neighbors version


HTML version of Basic Foils prepared 22 February 98

Foil 1 Computing Techniques for Adaptive Integration

From Parallel Computation Illustrated with Adaptive Integration CPS615 Basic Simulation Track for Computational Science -- Fall Semester 95.
CPS615
November 1, 1995
Nancy McCracken
Geoffrey Fox
NPAC
Syracuse University
111 College Place
Syracuse NY 13244-4100


Foil 2 Computing Techniques for Adaptive Integration

This talk assumes the material found in the CPS615 lecture notes, Introduction to Numerical Integration.
Going on from the static data decomposition for integration on page 27 of those notes, we examine further parallel algorithms for computing adaptive Romberg (recursive Simpson) integration.
The data decomposition and remapping techniques discussed are generally applicable to parallel adaptive algorithms for other applications.


Foil 3 Sequential Romberg Integration

Recall that when you repeatedly apply Simpson's rule, halving the subinterval width each time, you can reuse the old values of the function. Furthermore, there is a convenient pattern to the Simpson coefficients.
In our program, we start with an initial set of points x_i, calculate the f values there, and save them in an array oldf. Then for each iteration of Simpson's rule, we collect the newly computed midpoint values in an array newf. The result weights the oldf values by 2 and the newf values by 4 (with the endpoints as exceptions, weighted 1).
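A minimal sketch of this scheme in Python (the oldf/newf names follow the foils; the convergence test on successive estimates is an assumption, since the foils do not show the stopping criterion):

```python
def simpson_refine(f, a, b, tol=1e-8, max_iter=20):
    """Repeated Simpson's rule, halving subintervals each iteration and
    reusing previously computed function values. A sketch of the
    recursive-Simpson scheme described in the foil."""
    n = 2                                     # start with one Simpson panel
    h = (b - a) / n
    oldf = [f(a + i * h) for i in range(n + 1)]        # f at current grid
    prev = h / 3 * (oldf[0] + 4 * oldf[1] + oldf[2])   # first estimate
    for _ in range(max_iter):
        n *= 2
        h = (b - a) / n
        # Only the new midpoints need fresh function evaluations.
        newf = [f(a + (2 * i + 1) * h) for i in range(n // 2)]
        # Endpoints get coefficient 1, old interior points 2, new midpoints 4.
        total = oldf[0] + oldf[-1] + 2 * sum(oldf[1:-1]) + 4 * sum(newf)
        result = h / 3 * total
        if abs(result - prev) < tol:          # assumed stopping criterion
            return result
        prev = result
        # Old points and new midpoints interleave on the finer grid.
        merged = []
        for old, new in zip(oldf, newf):
            merged += [old, new]
        merged.append(oldf[-1])
        oldf = merged
    return prev
```

Note that each iteration doubles the number of subintervals but evaluates f only at the n/2 new midpoints, which is exactly the reuse the foil describes.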


Foil 4 Program for Romberg Integration
Parallel Romberg Integration

Maintains the same static data decomposition as Simpson's rule (see page 27 of the Numerical Integration notes).
The program is sketched using MPI; it could also be written in HPF.
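The static decomposition can be illustrated on one machine with a hypothetical helper (static_partial_sums is not from the foils): each "processor" sums a contiguous block of the Simpson terms, and summing the partial sums plays the role of the MPI Allreduce that, in the real program, gives every processor the full integral.

```python
def static_partial_sums(terms, nproc):
    """Block-decompose a list of weighted Simpson terms among nproc
    processors and return each processor's partial sum. In the MPI
    program each processor would compute only its own block and an
    Allreduce(sum) would combine them; here we simulate all blocks."""
    base, extra = divmod(len(terms), nproc)
    sums, pos = [], 0
    for p in range(nproc):
        # The first 'extra' processors take one extra term each.
        count = base + (1 if p < extra else 0)
        sums.append(sum(terms[pos:pos + count]))
        pos += count
    return sums
```

Because the decomposition is static, the block boundaries never move between iterations; the adaptive versions on the following foils exist precisely because this becomes wasteful when only some intervals still need work.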


Foil 5 Parallel Romberg Integration, Continued

All processors calculate the new overall sum and decide whether to continue for another iteration (a master/slave paradigm is not necessary).
Variation: processors decide independently whether to continue. This saves the Allreduce communication.


Foil 6 Adaptive Parallel Romberg - CommAll Version
At each iteration, data is decomposed again by all processors.

Instead of assigning subintervals or decomposing iterations among the processors, we treat each element of the array oldf as an interval and decompose intervals among the processors.
Adaptive: only those subintervals which do not meet the error criterion are iterated. Since this leads to load imbalance, the processors divide up the intervals again at each iteration.
Introduce a boolean array to keep track of which intervals need further computing. After each iteration, if K intervals need further computing, each processor assigns itself K/Nproc of them as a contiguous set.
(Figures, omitted here: the initial decomposition, and the decomposition after an iteration.)
Note that each processor "owns" a contiguous set of indices and works on a subset.
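The remapping step can be sketched as a small helper (the name commall_decompose is illustrative, not from the foils). Since the CommAll version replicates all data, every processor can run this identically and arrive at the same assignment without further communication:

```python
def commall_decompose(compute, nproc):
    """Given the boolean 'compute' array (True where an interval still
    fails the error test), return one block of interval indices per
    processor: each processor claims roughly K/nproc of the K active
    intervals, taken as a contiguous run of the active list."""
    active = [i for i, c in enumerate(compute) if c]   # the K live intervals
    base, extra = divmod(len(active), nproc)
    blocks, pos = [], 0
    for p in range(nproc):
        # Spread the remainder so block sizes differ by at most one.
        count = base + (1 if p < extra else 0)
        blocks.append(active[pos:pos + count])
        pos += count
    return blocks
```

Each processor's block is contiguous within the list of still-active intervals, matching the foil's picture of owning a contiguous index set and working on a subset of it.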


Foil 7 Adaptive Parallel Romberg - program initialization
Adaptive Parallel Romberg - compute integral iteration and communicate values to all processors
Adaptive Parallel Romberg - recompute decomposition
Adaptive Parallel Romberg - Neighbors Version
At each iteration, processors average load with neighbors

Instead of sending all data to all processors, each processor sends the number of intervals that it needs to compute on the next iteration to the processors on its left and right (the end processors have only one neighbor).
First, use an Allreduce to calculate the load average, the ideal number of intervals per processor. When a processor receives the interval count from its right neighbor, it compares that count with its own. If the two counts differ by more than one:
  • If its own count is greater but below the load average, it does not adjust its load.
  • Otherwise, it averages the two counts and adjusts endindex by half the difference, counting only those intervals for which compute is still true. It sends the new index to the right neighbor to use as its startindex.
  • It repeats this process with its left neighbor.
Iterate the averaging of intervals until a load balance is achieved: at the end of each iteration, an Allreduce can test whether any processor is still load balancing.
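The iteration above can be simulated on one machine (neighbor_balance is a hypothetical name; the pairwise left-to-right sweep is one possible serialization of the message exchanges, and the final no-change test stands in for the Allreduce):

```python
def neighbor_balance(loads, max_sweeps=None):
    """Simulate the neighbor-averaging scheme: loads[p] is the number
    of active intervals on processor p. Adjacent pairs average their
    counts when they differ by more than one, except that a processor
    whose larger count is still below the average keeps it, so excess
    load flows 'downhill' toward under-loaded processors."""
    nproc = len(loads)
    avg = sum(loads) / nproc          # the Allreduce-computed load average
    sweeps = max_sweeps if max_sweeps is not None else nproc
    for _ in range(sweeps):
        changed = False
        for p in range(nproc - 1):    # pair p with its right neighbor
            a, b = loads[p], loads[p + 1]
            if abs(a - b) <= 1:
                continue
            # The foil's rule, applied symmetrically to either side:
            # a below-average processor keeps its larger share.
            if (a > b and a < avg) or (b > a and b < avg):
                continue
            total = a + b             # average the two counts
            loads[p], loads[p + 1] = total // 2, total - total // 2
            changed = True
        if not changed:               # stand-in for the convergence Allreduce
            break
    return loads
```

The total load is conserved at every step, so only the placement of intervals changes, not the amount of work.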


Foil 8 More discussion of Neighbors version

We believe this converges: processors with below-average load stay fixed, and extra load flows "downhill" until it reaches them. This probably converges more slowly than necessary; there are many possible algorithms, including arbitrarily stopping after some number of iterations.
The load balancing may take up to Nproc iterations of sends and receives, each carrying only a small number of values. If there is no significant load imbalance, it may take far fewer, so this method can require less message passing than the CommAll algorithm.
Variation: in this application and in general, the units of computation do not have to be contiguous. Each processor can keep a list of the indices it is working on and send load to neighbors with wraparound, or communicate load to other choices of processors.

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu


Page produced by wwwfoil on Sun Feb 22 1998