In [2] different kinds of Choleski algorithm organization are discussed. Among them, column interleaved outer product form and row interleaved column Choleski form are considered more promising ones on a parallel computer system with distributed memory. Here we only write one of them, the outer product form of Choleski algorithm, which is based on the following mathematic equation:
the updating of the submatrix is accomplished by subtracting the
outer product of
.
A simple description of the algorithm in a sequential code may be,
On a parallel computer with a distributed memory, the array may have a column-interleaved storage, then each processor can update its own column after getting the first updated column of the matrix or submatrix.
We write the above algorithm in the following HPJava program. In the code, a cyclic range is used to allocate the original array on multiprocessors, the result L can overwrite part of its storage. During the computation, collective communication remap is used for broadcasting updated columns.
on(Adlib.global) { Range x = new CyclicRange(size, Adlib.global.dim(0)); float a[[,#]] = new float [[size, x]]; // initialize the array here; float b[[]] = new float [[size]]; // used as a buffer Location j; at(j=x|0) a[0,j]=sqrt(a[0,j]); for(int k=0; k<size-1; k++) { for(int s=k+1; s<size; s++) at(j=x|k) a[s,j]/=a[j,j]; Adlib.remap(b[[k+1:]],a[[k,x|k+1:]]); over (j=x|k+1:) for (int i=x; i<size; i++) a[i,j]-=b[i]*b[j]; at(j=x|k+1) a[k+1,j]=sqrt(a[k+1,j]); } }