We will start with one configuration of processors. A small number of other configurations will be added after we gain experience with the level of detail that can be shown with 4 processors. For example, we expect to reduce the level of detail shown in the data structures for larger numbers of processors.
The initial configuration has the number of processors, Nproc, equal to 4, arranged in a 2 x 2 grid.
Data Structures
Message Files
There will be a file with a standard name of messagefiles.txt which will contain one line for each type of message. Each line will have the label of the message type (to be put into the choice), followed by a colon, followed by the name of the file containing those messages. Example:
Explanations:pdeexplain.txt
MPI Code Fragments:pdempi.txt
Performance Analysis:pdeperform.txt

Within each message file, there will be ASCII text separated by number tags (or some other file protocol to be designed).
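As a rough illustration of the intended format, a sketch of code that reads messagefiles.txt and splits each line into a label and a file name might look like the following; the parsing details are an assumption, and the numbered-tag protocol inside each message file is still to be designed.

    /* Sketch: read messagefiles.txt, one "label:filename" pair per line. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *fp = fopen("messagefiles.txt", "r");
        char line[256];

        if (fp == NULL)
            return 1;
        while (fgets(line, sizeof line, fp) != NULL) {
            line[strcspn(line, "\n")] = '\0';      /* strip trailing newline */
            char *colon = strchr(line, ':');
            if (colon == NULL)
                continue;                          /* skip malformed lines */
            *colon = '\0';
            printf("label = \"%s\", file = \"%s\"\n", line, colon + 1);
        }
        fclose(fp);
        return 0;
    }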
Sequence of Events
Phase 1: Initialization
The background of this phase will show the four processors. Each will have an empty square to represent its n+2 by n+2 working array, and an empty single value for the tolerance. In this phase, processor 0 will be drawn larger than the others and will show the entire 2n by 2n initial array. The boundary values will be highlighted.
Label: Initialization
Display: Msg1.
Comm step: processor 0 divides the initial array into 4 pieces and sends each
one to one of the four processors. Time per message: tlat + n^2 * tcomm (message size n^2 elements).
In the individual processors, each received array is shown arriving in the
middle of the local working array. The boundary values will remain
highlighted. Note that the guard points will remain empty.
Possible resulting picture:
(Put picture here.)
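A possible MPI code fragment for this communication step is sketched below. It assumes a 2 x 2 decomposition, a 2n x 2n initial array on processor 0, and (n+2) x (n+2) local working arrays W with an empty guard ring; the names N, W, init and the tag value are illustrative, not part of the design.

    /* Sketch of the Phase 1 scatter: processor 0 cuts the initial array
     * into four n x n pieces and sends one to each processor, which
     * places it in the middle of its local working array. */
    #include <mpi.h>

    #define N 8                       /* local interior dimension n */

    int main(int argc, char **argv)
    {
        int rank;
        double W[N + 2][N + 2] = {{0.0}};   /* working array, guard ring empty */
        double buf[N * N];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            double init[2 * N][2 * N];      /* initial array, on proc 0 only */
            for (int i = 0; i < 2 * N; i++)
                for (int j = 0; j < 2 * N; j++)
                    init[i][j] = 0.0;       /* ... initial and boundary values ... */

            for (int p = 0; p < 4; p++) {
                int r0 = (p / 2) * N, c0 = (p % 2) * N;  /* top-left of piece p */
                for (int i = 0; i < N; i++)
                    for (int j = 0; j < N; j++)
                        buf[i * N + j] = init[r0 + i][c0 + j];
                if (p == 0) {               /* processor 0 keeps its own piece */
                    for (int i = 0; i < N; i++)
                        for (int j = 0; j < N; j++)
                            W[i + 1][j + 1] = buf[i * N + j];
                } else {                    /* message of size n^2 */
                    MPI_Send(buf, N * N, MPI_DOUBLE, p, 0, MPI_COMM_WORLD);
                }
            }
        } else {
            /* Receive the n x n piece into the middle of the working array. */
            MPI_Recv(buf, N * N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    W[i + 1][j + 1] = buf[i * N + j];
        }

        MPI_Finalize();
        return 0;
    }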
Phase 2: Computation
The background of this phase should stop showing the initial array in Proc 0, but continue to show the working array and tolerance in each processor.
Loop 1: Iterate Niter times, where Niter is 100 to start. Let I be the
loop variable.
Label: Iteration I
Display: Msg2.
Step:
Label: Exchange guard values
Display: Append Msg3 to previous.
Step: For each vector of n elements that lies on the boundary of a local W array but in the interior of the problem, send those elements to the corresponding neighbor. The neighbor should receive the elements and put them in its guard ring. This should be done in 4 communication steps:
- each processor sends its bottom row to its bottom neighbor, and receives a row from its top neighbor to put in its guard ring.
- each processor sends its top row to its top neighbor, and receives a row from its bottom neighbor to put in its guard ring.
- each processor exchanges its leftmost and rightmost interior columns with its left and right neighbors in the same way.
- top/bottom and left/right do *not* wrap around, i.e. it is never necessary to send elements that lie on the boundary of the initial array.
Each communication is a message of size n, so its time is tlat + n * tcomm. Here is a sample picture:
(Put picture here.)
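A possible MPI code fragment for the guard exchange is sketched below. It assumes the (n+2) x (n+2) local arrays W described above, a rank layout of 0 1 over 2 3 in the 2 x 2 grid, and MPI_PROC_NULL neighbors where the grid does not wrap; the helper name exchange_guards and the tag values are illustrative, not part of the design.

    /* Sketch of the guard-value exchange: four MPI_Sendrecv calls, each
     * moving a message of n elements (time tlat + n * tcomm). */
    #include <mpi.h>

    #define N 8                       /* local interior dimension n */

    void exchange_guards(double W[N + 2][N + 2], int rank, MPI_Comm comm)
    {
        int row = rank / 2, col = rank % 2;       /* position in the 2 x 2 grid */
        int up    = (row > 0) ? rank - 2 : MPI_PROC_NULL;
        int down  = (row < 1) ? rank + 2 : MPI_PROC_NULL;
        int left  = (col > 0) ? rank - 1 : MPI_PROC_NULL;
        int right = (col < 1) ? rank + 1 : MPI_PROC_NULL;

        /* Column type: N doubles, one per row, N+2 doubles apart. */
        MPI_Datatype column;
        MPI_Type_vector(N, 1, N + 2, MPI_DOUBLE, &column);
        MPI_Type_commit(&column);

        /* Send bottom row down, receive into the top guard row. */
        MPI_Sendrecv(&W[N][1],     N, MPI_DOUBLE, down, 0,
                     &W[0][1],     N, MPI_DOUBLE, up,   0, comm, MPI_STATUS_IGNORE);
        /* Send top row up, receive into the bottom guard row. */
        MPI_Sendrecv(&W[1][1],     N, MPI_DOUBLE, up,   1,
                     &W[N + 1][1], N, MPI_DOUBLE, down, 1, comm, MPI_STATUS_IGNORE);
        /* Send leftmost column left, receive into the right guard column. */
        MPI_Sendrecv(&W[1][1],     1, column, left,  2,
                     &W[1][N + 1], 1, column, right, 2, comm, MPI_STATUS_IGNORE);
        /* Send rightmost column right, receive into the left guard column. */
        MPI_Sendrecv(&W[1][N],     1, column, right, 3,
                     &W[1][0],     1, column, left,  3, comm, MPI_STATUS_IGNORE);

        MPI_Type_free(&column);
    }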
Loop 2: Iterate n * n times, once for each data element in the *interior* of the working array.
Label: Update values
Display: Append Msg4 to previous.
Step: For each data element in the interior of the local work array, highlight a 5-point stencil (the element itself plus its neighbors to the left, right, above, and below). Note that some guard points get highlighted as neighbors. The highlight should last for 4 * tcalc time, representing computation using those data elements.
End Loop 2. Here is a picture:
(Put picture here.)
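A possible code fragment for this update is sketched below. The design does not fix the exact formula, so the usual 4-neighbor Jacobi average (three additions and one multiplication, matching the 4 * tcalc charge) is an assumption; the names N, W, update_values are illustrative.

    /* Sketch of the Loop 2 update: sweep the interior, combining each
     * element's four neighbors into a scratch array so every update uses
     * the previous iteration's values.  Elements on the edge of the
     * interior read their neighbors from the guard ring. */
    #define N 8                       /* local interior dimension n */

    void update_values(double W[N + 2][N + 2])
    {
        double Wnew[N + 2][N + 2];

        for (int i = 1; i <= N; i++)
            for (int j = 1; j <= N; j++)
                Wnew[i][j] = 0.25 * (W[i - 1][j] + W[i + 1][j] +
                                     W[i][j - 1] + W[i][j + 1]);

        for (int i = 1; i <= N; i++)
            for (int j = 1; j <= N; j++)
                W[i][j] = Wnew[i][j];
    }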
Loop 3: Iterate n * n times, once for each data element in the *interior* of the working array.
Label: Check for error
Display: Append Msg5 to previous.
Step: For each data element in the interior of the local work array, highlight that element and the tolerance value. The highlight should last for 1 * tcalc time, representing computation using those data elements.
End Loop 3.
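A possible code fragment for this check, together with the overall shape of Loop 1, is sketched below. The design says only that each interior element is compared with the tolerance value, so comparing the change since the previous iteration against the tolerance, and exiting Loop 1 early on convergence, are assumptions; exchange_guards and update_values are the helpers from the earlier sketches, and the other names are illustrative.

    /* Sketch of the Loop 3 check and the surrounding Loop 1. */
    #include <math.h>
    #include <mpi.h>
    #include <string.h>

    #define N     8                   /* local interior dimension n */
    #define NITER 100                 /* Niter, 100 to start */

    void exchange_guards(double W[N + 2][N + 2], int rank, MPI_Comm comm);
    void update_values(double W[N + 2][N + 2]);

    /* Returns 1 if every interior element changed by no more than
     * tolerance since the previous iteration, 0 otherwise. */
    int check_for_error(double W[N + 2][N + 2],
                        double Wold[N + 2][N + 2], double tolerance)
    {
        for (int i = 1; i <= N; i++)
            for (int j = 1; j <= N; j++)
                if (fabs(W[i][j] - Wold[i][j]) > tolerance)
                    return 0;
        return 1;
    }

    /* Loop 1: iterate Niter times over exchange, update, and check. */
    void iterate(double W[N + 2][N + 2], double tolerance,
                 int rank, MPI_Comm comm)
    {
        double Wold[N + 2][N + 2];

        for (int I = 0; I < NITER; I++) {
            memcpy(Wold, W, sizeof Wold);
            exchange_guards(W, rank, comm);   /* Exchange guard values */
            update_values(W);                 /* Update values */
            if (check_for_error(W, Wold, tolerance))
                break;                        /* Check for error: converged */
        }
    }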