CPS-615 -- Course Assignments -- Fall 1996
Back to class homepage
- Except for the first assignment, all assignments should be written up
in the style of a report; i.e., if you write a program for the
solution, it is never enough to just turn in the code: you must
always explain what the program does and how it solves the
problem.
- You may either hand in your homework on paper in class, or you may
submit it as a collection of web documents. For handing in
homework on the web, we assume that you will not want everyone
to read your solutions before the homework deadline. Therefore, we
have provided a directory in your web directory called
homework. It can be read by the group cps615ad, but not by
anyone else. DO NOT CHANGE THE PROTECTIONS OF THIS DIRECTORY!
- To submit your homework, copy everything to the following standard directory.
For example, to submit Homework 2 from your local directory hw2, you execute:
cp -r hw2 /usr/local/archives/public/projects/cps615fall96/students/yourusername/homework/hw2
- Please copy each homework assignment to a consistently named directory,
for example hw2, hw3, and so on, and name the top-level page
of each assignment index.html. After the homework deadline,
the TA will copy your homework to a public place for grading
and viewing.
Read my short overview of HPCC.
Browse other HPCC resources and write a short (roughly 2 pages equivalent) HTML commentary
describing areas you found that look important but were not
covered well in my paper.
Here is a starting set of Web Resources.
Your report should be something that we could hyperlink as an add-on
resource to expand the material in my short, somewhat dated
review.
As your report is in HTML, it can and should include references as
clickable Web links. Take a look at the following general references:
Go to solutions page
This assignment consists of two parts.
- Consider the analysis in class of the performance (speedup S) of
Laplace's Equation in two dimensions, found in lecture
foilset: Laplace Example --
Programming Models and Performance. You need to do the following:
- (A) Repeat this analysis for the one-dimensional version of
the problem.
- (B) Repeat this analysis in three dimensions.
- (C) Use the results obtained in steps (A) and (B), together with the class notes,
to generalize the speedup formula relating S to the
grain size n, the number of processors P, and the
hardware parameters t_comm and t_calc, so that it
becomes a function of the dimension d = 1, 2, or 3.
- Hadrian's Palace is L meters by L meters in
size. Its floor is to be covered by tiles which are 0.99
by 0.99 meters in size, separated by 0.02 meters of grout
between neighboring tiles.
Take the same rather informal approach given in the lectures to the
analysis of the construction of Hadrian's wall, and modify it
for this case. Note that Hadrian is keen on dancing, so his
Palaces have no internal walls and can be considered an
unobstructed, uniform two-dimensional expanse of tiles. In this
part you need to do the following:
- Initially assume every worker (tile layer) has the same
capability and works at the same rate.
What happens to your analysis if the workers fall into five groups
which work at rates 0.8, 0.9, 1.0, 1.1, and 1.2 times
the average?
A good reference is Parallel
Computing Works!.
Go to solutions page
Note: NO Lecture September 19 (Thursday)
Consider the Sun cluster at NPAC, which has 8 UltraSparc2-based machines, each
capable of 100 megaflops (millions of floating-point operations per second). We wish to solve two
problems on this machine:
- Laplace Equation given in class
but modified to a N by N by N grid in three dimensions.
- A model problem for three-dimensional Computational Fluid
Dynamics, which can be
thought of as Laplace's equation with the following modifications:
- One stores 5 values (3 components of velocity, density, and energy)
per grid point, instead of one.
- Each computation involves an additional multiplication by a
5 x 5 matrix.
Operationally, these changes can be summarized as follows:
- Communication load increases by a factor of 5.
- Computation increases by a factor of 25.
Derive the expected speedup (for up to 8 nodes) for the following choices of network connecting the
UltraSparcs:
- The first choice is a SINGLE SHARED Ethernet with a peak bandwidth
of 10 megabits/second.
- The second choice is switched fast Ethernet (each Sun system can transmit simultaneously
at the given bandwidth) with a peak performance of 100 megabits per
second per link.
- The third choice is switched OC-12 ATM (each Sun system can transmit simultaneously
at the given bandwidth) with a peak performance of 620 megabits per
second.
Assume that any of these networks can realize 50% of its rated peak
performance. Derive the problem size (value of N), as a
function of the number of processors, that will lead to an
efficiency of 80%. How much memory per node would we need
to run the computation at 80% efficiency?
References:
Go to solutions page
This assignment is about the discussion of HPCC software, so you will need
to review materials, particularly on Fortran90,
HPF and
MPI, and submit a review
report. Here is a list of references that you might like to use for the review:
- For this assignment you will need to familiarize yourself with
the VPL Web
interface to HPF, Fortran90 and MPI.
- Write a Fortran90 routine implementing the Jacobi iteration for Laplace's equation in two
dimensions, and run it with a 64 by 64 grid and unit boundary conditions on the square.
Check that the program produces the correct answer. (What should it be?)
Go to solutions page
In the previous assignment, you were asked to familiarize yourself
with VPL and
demonstrate it by running an F90 program. For this
assignment you need to do the following:
- Write a short (about a page or so) critique of
VPL features,
describing any additional feature(s) that you
think would be useful to add. We are interested in all
comments (positive or negative) and would like them to be
as specific as possible.
- Please put your critique (as text) on the Web, in your VPL area
with your other homework.
- Store your homework in directories called 'homework1',
'homework2', and so on. If the problem is divided into
sub-problems, try to put your answers in clearly marked
separate files in the appropriate directory.
- Comments (if any) and grades will appear there shortly, in files
named 'comments-and-grade'.
Please direct your comments/suggestions about VPL either to the grader, Geoffrey, or Kivanc
(dincer@npac.syr.edu).
Go to solutions page
This assignment consists of two sub-problems:
- Write an HPF program to solve the Gaussian elimination problem
using appropriate DISTRIBUTE compiler
directives. Time your program on a large problem using the
DEC compiler on the Alpha cluster. Compare the timings of the
forward-elimination part when using the BLOCK distribution
versus the CYCLIC distribution.
- [A] This program needs to be compiled and run using VPL. The
input (the coefficients of the equations, in the form of a
matrix) can either be generated at random in the program
itself or read in as a file.
- [B] You should time your program when it is compiled using the
BLOCK and CYCLIC distribution directives.
- [C] Please show your timings and, as usual, explain exactly what
you did (program, compilation, runs, and so on) and
what conclusions, if any, you draw from your experiment
regarding the use of the BLOCK and CYCLIC
distribution directives.
- Consider the following problem. Within a metal slab of dimensions
0 <= x <= 2 and 0 <= y <= 1, the steady-state
heat distribution is given by the following theoretical
solution:
U(x,y) = -4.7 + 0.2x - 0.3y + 0.09(x**2 - y**2) + ln(sqrt((x + 0.2)**2 + (y + 0.2)**2))
We wish to determine the experimental steady-state heat distribution
along a series of mesh points by solving the Laplace heat
equation using the Jacobi iteration algorithm (Fox et al.,
1988)(Gerald and Wheatley, 1989).
- [A] The Dirichlet boundary conditions for the metal slab may be
calculated using the theoretical solution U (x,y) along
the following intervals:
- Top 0 <= x <= 2 and y=1
- Bottom 0 <= x <= 2 and y=0
- Right x=2 and 0 <= y <= 1
- Left x=0 and 0 <= y <= 1
- [B] Conduct the following experiments:
- [B.1] Solve the Laplace heat equation using a mesh size of 0.10 and
an initial internal value of 0.
- [B.2] Solve the Laplace heat equation using a mesh size of 0.05 and
an initial internal value of 0.
- [B.3] Compare the number of iterations necessary to solve the Laplace
heat equation using the two mesh sizes in the above
B.1 and B.2 experiments.
- [B.4] Solve the Laplace heat equation using a mesh size of 0.10 and
an initial internal value composed of the average of
the Dirichlet boundary values.
- [B.5] Compare the number of iterations necessary to solve the
Laplace heat equation using the two initial internal
values specified in experiments B.1 and B.4 above.
Notes:
- A tolerance of 0.000003 should be used in the Jacobi iteration
algorithm. Each program should report: the number of iterations
necessary to solve the Laplace heat equation, and the cumulative
error between the experimental and theoretical solutions at
the mesh points.
- Programs should be written in Fortran 90 or High Performance
Fortran (HPF). Double precision variables should be used
throughout the calculations.
- It is essential that you use the VPL interface for compiling and
running the program. If the problem is divided into
sub-problems, try to put your answers in clearly marked
separate files in the appropriate directory.
- If you encounter any difficulties with VPL, please contact the TA.
References:
- Fox, G.C., Johnson, M.A., Lyzenga, G.A., Otto, S.W., Salmon,
J.K. and Walker, D.W. (1988). Solving Problems on Concurrent
Processors, Volume I, General Techniques and Regular
Problems, Prentice-Hall, Englewood Cliffs, New Jersey,
pp. 117-133.
- Gerald, C.F. and Wheatley, P.O. (1989). Applied Numerical
Analysis, Fourth Edition, Addison-Wesley, pp. 471-489.
Go to solutions page
N-Body and Random Number Generation.
The N-Body Solver section of the
NPAC benchmark suite contains a collection of N-body solvers, and
our interest is the HPF source,
an N-body code written by Tom Haupt that runs
on the alphafarm. This code has a speedup problem: as we
increase the number of processors it runs on, we get the
opposite of what we desire -- a slowdown rather than a
speedup. This deficiency is mainly due to the use of the
random number generator. As you can see in the code, the RNG
is written in F90, since HPF does not provide parallel random
number generators.
Your task is to parallelize the F90 source for the random number
generator so as to speed up the above N-body program.
You need not follow the F90 source strictly in doing the
parallelization; feel free to come up with your own ideas for putting together an
efficient parallel generator. Show your speedup timings.
Hints:
- As mentioned above, HPF does not currently provide a parallel
random number generator, so a parallel implementation must
ensure that the numbers generated by different processors
are not correlated.
- A nice additional property is for the results to be independent of
the number of processors used.
- Take a look at Tom Haupt's HPF tutorial page.
In particular, pay a visit to the page on Extrinsic Procedures. Since
in the extrinsic mode the parallel RNG can reuse sequential
F90 intrinsics, the generators work independently; in this
approach the problem therefore reduces to finding the
"right" seeds on each node.
References:
- Paul Coddington's Random Number Generators for Parallel Computers.
- Parallel
Computing Works! has a section on the N-body problem, but it covers the more
advanced fast multipole algorithm. The older book Solving Problems
on Concurrent Processors describes the basic N-body methods
used in the sample code.
- The Parallel N-Body Simulations page contains
algorithm analysis and pointers to very interesting papers and
simulations.
- Some of John Salmon's publications on the subject are available here.
Go to solutions page
MPI for Laplace Solver.
This assignment consists of two related parts:
- The first is to solve the Laplace heat equation over a rectangle by the
red/black SOR (successive overrelaxation) method. The
program can be written in either C with MPI or Fortran with
MPI, to run on the alphafarm via VPL. Use a grid size of N >=
100, and use the heat equation to derive the boundary values
of the grid.
- The second is to investigate the optimum value(s) of the overrelaxation parameter
(omega) by making a minimum of 10 runs for values of omega
between 1 and 2.
Comments and Suggestions:
- Basically, the method is to implement the Gauss-Seidel (GS)
algorithm in parallel. One possibility is to use what is
commonly referred to as the odd-even (or even-odd)
GS method. See the discussion in the CPS615 Module on Iterative PDE Solvers.
- One can start with a 2-d grid of M points and partition it into two sets of points. Updating
then goes as follows:
- Update the even set to give, for example, values X.
- Using the X values, update the odd set, giving Y.
So, essentially, what constitutes the GS algorithm is this multi-step approach of
alternating half-updates.
- As for parallelism, the SOR method parallelizes in much the
same way as GS, except for the overrelaxation parameter
(omega) used in the SOR update (see class notes):
z^{k+1} (SOR) = omega z^{k+1} (GS) + (1 - omega)z^k
References:
- Introduction to Parallel Computing: Design and Analysis of
Algorithms, V. Kumar et al., Benjamin/Cummings,
1994. (Mainly Section 11.2, pages 426-433, which will be
handed out in class.)
- Solving Problems on Concurrent Processors, vol. I, G. Fox,
M. Johnson, G. Lyzenga, S. Otto, J. Salmon, and
D. Walker. Prentice-Hall 1988. (Chapter 7 is very useful).
- In your VPL accounts, you will find materials-for-hwk9
directory containing various MPI
examples. There is a
program laplace.f90 which gives an elementary solution of
Laplace's equation in 2 dimensions using the point-Jacobi
iteration. Maybe you could start from there.
- Some MPI resources can be accessed through the software tools and
algorithms section of the class homepage.
- A useful MPI tutorial (1995), with examples in C and Fortran, is at
this page.
Go to solutions page
Here are some of the mini-projects done by students:
Back to class homepage