Project Status of the NPAC/SRC CEM Project
TASKS FINISHED
-
All code development on the Intel machines, the CM5, and the IBM SP-1 has
been finished. An efficient matrix-filling algorithm has been developed that
requires no communication at all, so the filling part of the code depends
solely on the node performance of the parallel system.
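A minimal sketch of the owner-computes idea behind a communication-free fill,
assuming a ScaLAPACK-style 2D block-cyclic layout (the compute_element stub,
the loop structure, and the tiny main are illustrative only, not the actual
ParaMom routines):

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative stand-in for the MoM impedance-matrix entry Z(i,j). */
    static double compute_element(int i, int j) { return 1.0 / (1.0 + i + j); }

    /* Owning process coordinate of a global row/column index under a
       block-cyclic distribution with blocksize nb over nprocs processes
       (the same convention ScaLAPACK uses). */
    static int owner(int global, int nb, int nprocs) {
        return (global / nb) % nprocs;
    }

    /* Each process fills only the entries it owns.  No messages are needed,
       because every entry depends only on the (replicated) geometry data. */
    static void fill_local(double *local, int n, int nb,
                           int myrow, int mycol, int nprow, int npcol)
    {
        int k = 0;
        for (int j = 0; j < n; ++j) {
            if (owner(j, nb, npcol) != mycol) continue;
            for (int i = 0; i < n; ++i) {
                if (owner(i, nb, nprow) != myrow) continue;
                local[k++] = compute_element(i, j);   /* purely local work */
            }
        }
    }

    int main(void)   /* tiny single-process demonstration */
    {
        int n = 8, nb = 2;
        double *local = malloc((size_t)n * n * sizeof *local);
        if (!local) return 1;
        fill_local(local, n, nb, 0, 0, 1, 1);   /* a 1x1 grid owns everything */
        printf("Z(0,0) = %g\n", local[0]);
        free(local);
        return 0;
    }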
-
Porting the Intel version to a PVM version on the IBM SP-1 took just one day,
thanks to the portability of BLACS/ScaLAPACK and PVM. The code has also been
ported to NPAC's 8-node DEC Alpha Farm (Gigaswitch-connected) and to an
ATM testbed of two SUN IPXs.
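The port is cheap largely because all machine-specific communication is hidden
behind the BLACS; the application only has to set up a virtual process grid,
roughly as in the illustrative C sketch below (the production code is not
necessarily organized this way, and the prototypes normally come from the
BLACS headers):

    #include <stdio.h>

    /* C interface to the BLACS. */
    extern void Cblacs_pinfo(int *mypnum, int *nprocs);
    extern void Cblacs_get(int icontxt, int what, int *val);
    extern void Cblacs_gridinit(int *icontxt, char *order, int nprow, int npcol);
    extern void Cblacs_gridinfo(int icontxt, int *nprow, int *npcol,
                                int *myrow, int *mycol);
    extern void Cblacs_gridexit(int icontxt);
    extern void Cblacs_exit(int doneflag);

    int main(void)
    {
        int me, nprocs, ctxt;
        int nprow, npcol, myrow, mycol;

        Cblacs_pinfo(&me, &nprocs);     /* who am I, how many processes */
        Cblacs_get(-1, 0, &ctxt);       /* default system context */

        /* Pick a roughly square virtual processor grid. */
        for (nprow = 1; nprow * nprow <= nprocs; ++nprow) ;
        --nprow;
        while (nprocs % nprow) --nprow;
        npcol = nprocs / nprow;

        Cblacs_gridinit(&ctxt, "Row", nprow, npcol);
        Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

        printf("process %d of %d is at grid position (%d,%d)\n",
               me, nprocs, myrow, mycol);

        /* ... fill local blocks, call the ScaLAPACK LU driver ... */

        Cblacs_gridexit(ctxt);
        Cblacs_exit(0);
        return 0;
    }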
-
Production runs for test cases with matrix sizes up to 10,000 (including
real EMCC test cases) have been conducted on the following platforms:
- CCSF 512-node Intel Delta
- CCSF 56-node Intel Paragon
- NASA AMES NAS 208-node Intel Paragon
- CCSF 64-node Intel iPSC/860
- Minnesota AHPCRC 512-node CM5
- ANL 128-node IBM SP-1 (only 64-node used)
-
The performance results are very encouraging.
On all platforms the speedup of the filling part is nearly linear, and the
IBM SP-1 gives the best filling performance. (The filling part is not
dominated by floating-point calculation; it is roughly 50% floating-point
and 50% memory access.)
For the LU solver, the CM5 CMSSL solver performs best: for a matrix of
size 10,000 we obtained 16 GFlops on the 512-node CM5 with vector units,
versus 8 GFlops for the same case on the 512-node Intel Delta and about
4 GFlops on the 56-node Paragon. (Note that the LU timings on the Intel
platforms are not fully optimized: the code uses the same ScaLAPACK data
partition, defined by a blocksize and a virtual processor grid, for both
the filling and the solving phases, and in the production runs that
partition is tuned for filling only.)
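For scale, the quoted rates can be turned into rough factorization times from
the standard LU operation count. The small sketch below assumes a (2/3)n^3
count for a real matrix and a roughly 4x larger count for a complex-valued
impedance matrix; that scaling is an assumption for illustration, since the
report does not state how its GFlops were counted:

    #include <stdio.h>

    /* Rough wall-clock estimate for a dense LU factorization of order n,
       using the quoted sustained rates for each machine. */
    int main(void)
    {
        double n = 10000.0;
        double flops_real    = 2.0 / 3.0 * n * n * n;   /* ~6.7e11 flops */
        double flops_complex = 4.0 * flops_real;        /* ~2.7e12 flops (assumed) */

        double rates[] = { 16e9, 8e9, 4e9 };            /* quoted GFlops */
        const char *machines[] = { "512-node CM5", "512-node Delta",
                                   "56-node Paragon" };

        for (int k = 0; k < 3; ++k)
            printf("%-16s  real LU: %6.0f s   complex LU: %6.0f s\n",
                   machines[k], flops_real / rates[k], flops_complex / rates[k]);
        return 0;
    }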
-
The parallel code has been ported to the IBM SP-1 with the HPswitch running
the EUI-H protocol, after receiving a pre-release, unoptimized EUI-H version
of the BLACS from ORNL/UTK.
-
An interactive simulation/visualization system for this application has been
developed under AVS (Application Visualization System). It allows run-time
steering of model input parameters (currently the azimuth, i.e. the
transmitter's angle, plus other visualization parameters), initiation of
simulations on multiple parallel systems (currently the IBM SP-1 and CM5),
and graphical display of the 2D far-field pattern alongside the 3D target
geometry, with or without the surface gridding shown.
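The steering cycle behind such a set-up can be pictured roughly as below;
every function in this sketch (read_azimuth_widget, start_remote_run,
poll_far_field, the render_* calls) is a hypothetical placeholder, not part
of the AVS module API or of the actual NPAC code:

    /* Hypothetical helpers standing in for the AVS widget plumbing and
       the remote job control; names are illustrative only. */
    extern int  read_azimuth_widget(double *azimuth_deg);   /* 1 if changed */
    extern void start_remote_run(const char *host, double azimuth_deg);
    extern int  poll_far_field(const char *host, double *pattern, int npts);
    extern void render_far_field_2d(const double *pattern, int npts);
    extern void render_target_3d(int show_grid);

    /* Steering cycle: whenever the user moves the transmitter-angle dial,
       kick off a new run on a parallel machine and redraw the results. */
    void steering_loop(const char *host, int show_grid)
    {
        double azimuth, pattern[360];

        for (;;) {
            if (read_azimuth_widget(&azimuth)) {
                start_remote_run(host, azimuth);     /* e.g. the SP-1 or CM5 */
                while (!poll_far_field(host, pattern, 360))
                    ;                                /* wait for the solve */
                render_far_field_2d(pattern, 360);
                render_target_3d(show_grid);
            }
        }
    }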
-
The following geometry test cases have been run with the parallel ParaMom
package: NASA almond, Wedge Cylinder with Gap, Semi-sphere (symmetry),
Sphere, Cylinder, Plane, VFY, Cone Sphere with Gap, Circular Cavity,
Rectangular Cavity, Wedge Plate Cylinder, Plate Cylinder, Business Card,
Double Ogive, and Single Ogive.
TASKS PLANNED
-
On the Intel platforms, experiment with a parallel I/O approach that allows
different data partitions in the filling and LU phases, so that both the
filling and the ScaLAPACK LU algorithm can run with their optimal partitions,
using CFS/PFS as a buffer to move data between the filling and the LU solver.
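A minimal serial sketch of the re-blocking idea: the matrix is written out
once in a fixed global layout and read back by column panels of a different
blocksize, which is what a CFS/PFS buffer file would make possible between
the two phases (the file name, problem size, and blocksizes here are made up
for illustration, and the real code would use parallel reads and writes):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const int n = 512;                       /* illustrative problem size */
        double *a = malloc((size_t)n * n * sizeof *a);
        if (!a) return 1;

        for (int j = 0; j < n; ++j)              /* stand-in for the fill phase */
            for (int i = 0; i < n; ++i)
                a[(size_t)j * n + i] = 1.0 / (1.0 + i + j);

        /* The file fixes one global column-major layout that both the fill
           partition and the LU partition can address by (i,j) offsets. */
        FILE *f = fopen("scratch_matrix.bin", "w+b");
        if (!f) return 1;
        fwrite(a, sizeof *a, (size_t)n * n, f);

        /* LU phase: fetch one column panel (columns j0..j0+nb-1) using a
           blocksize chosen for the solver rather than for the fill. */
        int nb = 64, j0 = 128;
        double *panel = malloc((size_t)n * nb * sizeof *panel);
        if (!panel) return 1;
        fseek(f, (long)((size_t)j0 * n * sizeof *a), SEEK_SET);
        fread(panel, sizeof *panel, (size_t)n * nb, f);

        printf("panel(0,0) = %g (should equal A(0,%d) = %g)\n",
               panel[0], j0, a[(size_t)j0 * n]);

        fclose(f);
        free(a);
        free(panel);
        return 0;
    }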
-
On the Intel Paragon, insert the ProSolver-DES factor and solve routines
into the code to replace the ScaLAPACK LU solver.
-
On the CM5, start to look into an out-of-core filling algorithm.
-
Port the code to Cray T3D.
Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
This page is maintained by Gang Cheng, gcheng@npac.syr.edu.
Last change: 04/14/94