Project Report


Solve Linear Equations in Data Parallel Computing

Xinying Li
email : xli@npac.syr.edu
December,1996


1. Description of the Issue

The recent availability of advanced-architecture computers has had a very significant impact on all spheres of scientific computation including algorithm research and software development in numerical linear algebra. The usage of vectorization and parallelism makes many linear issues have certain typical and efficient solution.
Solving the system of linear equations is the basic issue of linear algebra. It has directive methods and iterative methods. In this project I employ the the former in the angle of data parallel computing.
For the common linear equations they are usually described into the following form, supposed that we have n linear equations relating n variables.
        a(11)x(11) + ... + a(1n)x(n) = b(1)
            :               
            :
        a(n1)x(n1) + ... + a(nn)x(n) = b(n)
or
        A X = b
where A is a real n*n matrix and X and b are real vectors of length n . According to the fundamental theory of linear algebra, if the rank of matrix A is not less than n ,that is
        det(A) != 0   
it has unique solution:
        X = A-1 b.   
To get the solution in directive methods we usually adopt Gauss Elimination and back substitution,which is regarded as a fairly efficient and typical method .

2. Developing Data Parallelism

In the procedure of elimination we need substract the line containing the pivot from every lower lines continuously, so we can have the substraction parallelize in column. Considering the load balance of computing, the coefficient matrix is distributed in CYCLIC along the column in this case.
In backsubstitution, computation of each step depends on the former result of x(i-1) , so we can employ the paralism by using the former result in every eqution at the same time, that is , when the x(i) is got, all equtions substract the term containing x(i). So the paralism is got by redistribute the coefficient matrix with CYCLIC in lines.

3. Running and Analysis

Because I need to generate profile to time the program, I compile and run the program directly on alpha farm: kestrel1 ... kestrel8 in NPAC. 4 nodes of the Alphafarm are for Parallel execution, and 1 node of it for sequetial execution.Using the DEC OSF/1 Fortran 90 compiler.
The execution time should be collected in comparable measures. As we know, there are quite a few different time measures often used in computer system, such as cpu time, elapsed time, system time, etc. The situation is even worse when talking about parallel execution. Often some profiling times are used. Here we use profiling option offered by compiler to get parallel timing and sequential timing.
Running the program for linear systems of various sizes, from 128*128 to 2048*2048 , we can see the speedup(table 1). And due to dynamic nature of parallel computer system. I conducted at least 5 runs for each program and each input data size. The numbers given in table 1 are the minimun (not average) of different runs.
Abserving the efficiency of different data distribution, I employ column-cyclic to eliminate, and redistribute the data with row-cyclic to backsubstitute. But it is a pity that DEC compiler doesn't support the dynamic data distribution, namely, the redistribute directive is ignored by the compiler, so the result is not pretty satisfying.
Table1: #proc 128*128 256*256 512*512 1024*1024 2048*2048 ------------------------------------------------------------------------------------- 1 0.19 3.54 40.05 230.66 2636.24 ------------------------------------------------------------------------------------- 4 1.80 3.85 14.75 64.04 706.00 -------------------------------------------------------------------------------------

Bibliography

[1] Jack J.Dongarra,et al. Solving Linear Systems on Vector and Shared Memory Computers. SIAM,Philadelphia. 1991
[2] Michael J.Quinn. Parallel Computing Theory and Proctice.McGRAM-HILL,INC. 1994
[3] G.Hadley. Linear Alogebra.Addison-Wesley Publishing Company,INC. 1961


* HPF program for this project is available in CPS615 homework directory:
/usr/local/archives/public/projects/cps615fall96/students/xli/homework/project