PDE Algorithm Description

Processor Configuration

We will start with one configuration of processors. A small number of other configurations will be added after we gain experience with the level of detail that can be shown with 4 processors. For example, we expect the level of detail in what is shown with the data structures to be reduced for larger numbers of processors.

The initial configuration has the number of processors, Nproc, equal to 4, arranged in a 2 x 2 grid.

Data Structures

Message Files

There will be a file with the standard name messagefiles.txt, which will contain one line for each type of message. Each line will have the label of the message type (to be put into the choice), followed by a colon, followed by the name of the file containing those messages. Example:

Explanations:pdeexplain.txt
MPI Code Fragments:pdempi.txt
Performance Analysis:pdeperform.txt

Within each message file, there will be ASCII text separated by number tags (or some other file protocol to be designed).
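
As an illustration of the index format described above, here is a minimal sketch of reading the label and file name from each line. The function name parse_message_index is hypothetical, and the sketch assumes that labels never contain a colon; the tag protocol inside the individual message files is not covered, since it is still to be designed.

```python
def parse_message_index(lines):
    """Map each message-type label to the name of its message file.

    Each non-blank line is expected to look like "Label:filename".
    """
    index = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split on the first colon: label before it, file name after it.
        label, sep, filename = line.partition(":")
        if not sep:
            raise ValueError(f"missing colon in line: {line!r}")
        index[label.strip()] = filename.strip()
    return index


# The three example lines from the text above.
example_lines = [
    "Explanations:pdeexplain.txt",
    "MPI Code Fragments:pdempi.txt",
    "Performance Analysis:pdeperform.txt",
]
index = parse_message_index(example_lines)
```

The dictionary keeps the labels in file order, so list(index) could be used directly to fill the choice.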

Sequence of Events

Phase 1: Initialization

The background of this phase will show the four processors. Each will have an empty square to represent the n+2 by n+2 working array, and an empty single value for the tolerance. In this phase, processor 0 will be drawn larger than the others and will show the entire n by n initial array. The boundary values will be highlighted.

Label: Initialization
Display: Msg1.
Comm step: Proc 0 divides the initial array into 4 pieces and sends one piece to each of the four processors. Time: tlat + n^2 * tcomm (message size n^2). In each processor, the message array is shown arriving in the middle of the local working array. The boundary values will remain highlighted. Note that the guard points will remain empty.

Possible resulting picture:

(Put picture here.)
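
The initialization step above can be sketched as follows. This is only an illustration under stated assumptions: the global array is taken to be 2n by 2n so that each piece is n by n (matching the message size n^2), processors are numbered row by row over the 2 x 2 grid, and empty guard points are represented by None. The names scatter_blocks and scatter_time are hypothetical.

```python
def scatter_blocks(initial, n):
    """Split a 2n x 2n array into four n x n pieces, one per processor.

    Returns four (n+2) x (n+2) local arrays in row-major processor order.
    Each piece lands in the middle of its local array; the guard points
    around it stay empty (None).
    """
    local_arrays = []
    for prow in range(2):
        for pcol in range(2):
            local = [[None] * (n + 2) for _ in range(n + 2)]
            for i in range(n):
                for j in range(n):
                    local[i + 1][j + 1] = initial[prow * n + i][pcol * n + j]
            local_arrays.append(local)
    return local_arrays


def scatter_time(n, tlat, tcomm):
    """Time for one message of size n^2: tlat + n^2 * tcomm."""
    return tlat + n * n * tcomm


# Small demonstration with n = 2 (a 4 x 4 global array numbered 0..15).
n = 2
initial = [[r * 2 * n + c for c in range(2 * n)] for r in range(2 * n)]
local_arrays = scatter_blocks(initial, n)
```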

Phase 2: Computation

The background of this phase should stop showing the initial array in Proc 0, but continue to show the working array and tolerance in each processor.

Loop 1: Iterate Niter times, where Niter is 100 to start. Let I be the loop variable.
Label: Iteration I
Display: Msg2.
Step:


Label: Exchange guard values
Display: Append Msg3 to previous.
Step: For each vector of n elements that lies on the boundary of a local W array and also in the interior of the problem, send those elements to the neighboring processor. The neighbor should receive the elements and put them in its guard ring. This should be done in 4 communication steps. Each communication is a message of size n, so the time for each is tlat + n * tcomm. Here is a sample picture:

(Put picture here.)
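
The guard exchange above can be sketched for the 2 x 2 case as follows. The row-major processor numbering and the use of None for empty guard points are assumptions made for illustration, and exchange_guards and exchange_time are hypothetical names. The total of 4 * (tlat + n * tcomm) assumes the four communication steps run one after another.

```python
def exchange_guards(work, n):
    """Copy interior boundary vectors into the neighbors' guard rings.

    work holds four (n+2) x (n+2) arrays in row-major order over a 2 x 2
    processor grid. Guard points on the outer edge of the problem have no
    neighbor and are left untouched.
    """
    def rank(prow, pcol):
        return prow * 2 + pcol

    for prow in range(2):
        for pcol in range(2):
            src = work[rank(prow, pcol)]
            if prow + 1 < 2:   # neighbor below gets our bottom interior row
                dst = work[rank(prow + 1, pcol)]
                for j in range(1, n + 1):
                    dst[0][j] = src[n][j]
            if prow - 1 >= 0:  # neighbor above gets our top interior row
                dst = work[rank(prow - 1, pcol)]
                for j in range(1, n + 1):
                    dst[n + 1][j] = src[1][j]
            if pcol + 1 < 2:   # right neighbor gets our last interior column
                dst = work[rank(prow, pcol + 1)]
                for i in range(1, n + 1):
                    dst[i][0] = src[i][n]
            if pcol - 1 >= 0:  # left neighbor gets our first interior column
                dst = work[rank(prow, pcol - 1)]
                for i in range(1, n + 1):
                    dst[i][n + 1] = src[i][1]


def exchange_time(n, tlat, tcomm, steps=4):
    """Time for the exchange: steps messages of size n, done in sequence."""
    return steps * (tlat + n * tcomm)


# Demonstration: fill the interior of processor r with the value r.
n = 2
work = []
for r in range(4):
    a = [[None] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            a[i][j] = float(r)
    work.append(a)
exchange_guards(work, n)
```

Only interior values are read and only guard points are written, so the order in which the four directions are handled does not affect the result.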

Loop 2: Iterate n * n times, once for each data element in the *interior* of the working array.
Label: Update values
Display: Append Msg4 to previous.
Step: For each data element in the interior of the local work array, highlight a 5-point stencil (the element itself plus its neighbors to the left, right, up, and down). Note that some guard points get highlighted as neighbors. The highlight should last for 4 * tcalc time, representing computation using those data elements.
End Loop 2. Here is a picture:

(Put picture here.)
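
One sweep of Loop 2 can be sketched as follows. The description above does not state the update formula, so this sketch assumes a Jacobi-style average of the four neighbors. It further assumes that an element with an empty (None) guard neighbor is a fixed boundary value and is left unchanged. The names update_interior and update_time are hypothetical.

```python
def update_interior(w, n):
    """Return a new (n+2) x (n+2) array after one 5-point stencil sweep.

    Assumption: each interior element becomes the average of its four
    neighbors. Elements next to an empty guard point are kept as they are.
    """
    new = [row[:] for row in w]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            neighbors = [w[i - 1][j], w[i + 1][j], w[i][j - 1], w[i][j + 1]]
            if any(v is None for v in neighbors):
                continue  # treated as a boundary value: no update
            new[i][j] = 0.25 * sum(neighbors)
    return new


def update_time(n, tcalc):
    """Time for one sweep: n * n elements at 4 * tcalc each."""
    return n * n * 4 * tcalc


# Demonstration with n = 2: guard ring of 1.0 around an interior of 0.0.
n = 2
w = [[1.0] * (n + 2) for _ in range(n + 2)]
for i in range(1, n + 1):
    for j in range(1, n + 1):
        w[i][j] = 0.0
w_new = update_interior(w, n)
```

Reading from w and writing to a separate copy keeps every update based on the values from the start of the sweep.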


If iteration I is a multiple of K (K = 5 to start), then
Loop 3: Iterate n * n times, once for each data element in the *interior* of the working array.
Label: Check for error
Display: Append Msg5 to previous.
Step: For each data element in the interior of the local work array, highlight that element and the tolerance value. The highlight should last for 1 * tcalc time, representing computation using those data elements.
End Loop 3.
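
The periodic check can be sketched as follows. The description above does not say what quantity is compared with the tolerance, so this sketch assumes it is the largest absolute change of any interior element between two successive iterations. The names should_check, max_change, converged, and check_time are hypothetical.

```python
def should_check(iteration, k=5):
    """True when the iteration number is a multiple of K."""
    return iteration % k == 0


def max_change(old, new, n):
    """Largest absolute change over the n x n interior (assumed measure)."""
    worst = 0.0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            diff = abs(new[i][j] - old[i][j])
            if diff > worst:
                worst = diff
    return worst


def converged(old, new, n, tolerance):
    """Compare the assumed error measure with the tolerance value."""
    return max_change(old, new, n) <= tolerance


def check_time(n, tcalc):
    """Time for one check: n * n elements at 1 * tcalc each."""
    return n * n * tcalc


# Demonstration with n = 2: every interior element moves from 0.0 to 0.5.
n = 2
old = [[0.0] * (n + 2) for _ in range(n + 2)]
new = [[0.0] * (n + 2) for _ in range(n + 2)]
for i in range(1, n + 1):
    for j in range(1, n + 1):
        new[i][j] = 0.5
change = max_change(old, new, n)
```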

Label: End of iteration
Display: Msg6
Highlight the n by n interior portions of the work arrays in each processor to show the solution.