Next: Performance and Numerical Up: Implementation Previous: RCS Computation

Out-of-Core Algorithm

Solution of very large problems requires out-of-core matrix filling and solution. The out-of-core matrix solution would be done using a vendor-specific out-of-core solver, such as ProSolver DES [80] on the Intel Paragon or the out-of-core CMSSL solver on the CM-5. Here, we only consider the development of out-of-core filling on the CM-5 system. The reason for developing an out-of-core filling algorithm is that the matrix is too large to be stored in the main memory of the system. We have to fill one portion of the matrix at a time and write it to a file, and then fill another portion of the matrix and write it out, and so on. Compare this with the in-core matrix fill algorithm, where the matrix is filled once then written out once. The main idea of designing an out-of-core algorithm is to modify the in-core filling algorithm structure and fill a portion of the matrix instead of the whole matrix.

Similarly, the issues involved with parallel out-of-core algorithm design are the problems of decomposition, load-balancing and communication. As with the in-core fill algorithm, the no communication approach is adopted here. The important question for out-of-core filling is how to decompose the problem into a set of small problems which can be fitted in in-core memory.

Before answering the question, we assume that the matrix, with N rows and columns, contains bytes; the system has bytes of available memory attached on each node; the number of processors is ; and the total number of out-of-core filling is , where has to satify the inequality

When the column block decomposition is used, each in-core fill is to fill a slab of matrix. The size of the out-of-core slab is restricted by

and

where is the number of columns for each node to fill at the out-of-core fill. At the last out-of-core fill (or ), the number of unfilled columns is . The job to fill the columns is distributed into nodes as evenly as possible. Let and . A node fills the columns if its index is less , otherwise it fills columns. The worst load balancing is when some nodes have one column more to fill than the other nodes.

Now, we can discuss our out-of-core fill algorithm since the data decomposition is decided. Based on the application and the particular system architecture, one supplies and which is the maximum number of columns that can be held in each node's memory. Let each node go through a loop of number of out-of-core procedures (from 1 to ). Each node calculates for the out-of-core fill and sets the global upper and lower bound. For example, the global lower bound is one and upper bound is for ; and the global lower bound is and upper bound is and so on. Each node fills a portion of the matrix in the same way as the in-core fill algorithm. However, each node does not pay any attention to the columns which fall outside the fill bound. After every node has completed the desired filling, a global synchronous write command is issued to write the portion of the matrix into a file. Then, each node examined goes back to the loop.

A pseudo code of the out-of-core matrix fill is shown in Figure 4.12.

We implemented this algorithm and tested it on the CM5 for the case. The results are summarized in Table . It is seen that there is relatively little difference between out-of-core and in-core fill times.



Next: Performance and Numerical Up: Implementation Previous: RCS Computation


xshen@
Sat Dec 3 17:51:03 EST 1994