Focused Effort Title: Enforcing Scalability of Parallel CMS
Thematic Area(s): Scalable Computing Migration
PI Name: David Bernholdt
PI EMail Address: bernhold@npac.syr.edu
PI Telephone: 315 443 3857
PI Fax: 315 443 1973

Project Description:

Both CEWES and ARL have been supporting aspects of CMS parallelization over the last few years. In Year 3, we made major progress on this project: we ported the CMS code to the Origin2000 platforms at the CEWES and ARL MSRCs, and we also ran our Parallel CMS on the Origin2000 provided by the ASC MSRC for the HPCMO booth at Supercomputing'98 in Orlando, FL, in November 1998.

After the port, we ran tests to collect timing results and evaluate the efficiency of our parallel implementation. Measuring performance and scalability on the Origin2000 is a subtle task, since the system assigns the number of available processors dynamically and this number fluctuates at runtime. To cope with this problem, we built real-time performance monitoring tools with a Java applet front end and an associated middleware monitor node plugged into the parallel simulation back end.

The performance results obtained this way indicated that our parallel port offered almost linear speedup, i.e. nearly perfect scalability, up to and including 4 processors, but that performance deteriorates for 8 or more processors. In a sequence of cross-checks, we traced the cause to the rather complex object-oriented data structures (including dynamic linked lists of irregular objects) present in the inner loop of the C++ CMS code. This loop runs over all mines and vehicle PDUs, checking for possible hits and explosions. Our parallelization technique was based on the Origin2000 C++ parallel pragmas, i.e. compiler directives that decompose loop indices over processors.
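
As a rough illustration of the loop-level parallelization described above, the sketch below splits a mine/vehicle proximity loop over threads with a single compiler directive. It is not taken from the CMS code. It uses an OpenMP directive as a stand-in for the Origin2000-specific pragmas, stores the mines in a flat array rather than in the dynamic linked lists mentioned above, and all names (Mine, Vehicle, detonateMinesInRange) are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for the CMS mine and vehicle objects.
struct Mine {
    double x, y;           // position
    double triggerRadius;  // distance within which a vehicle sets the mine off
    bool detonated;
};

struct Vehicle {
    double x, y;           // position taken from the latest vehicle PDU
};

// Checks every live mine against every vehicle and marks the mines that
// are hit. Returns the number of mines detonated during this call.
int detonateMinesInRange(std::vector<Mine>& mines,
                         const std::vector<Vehicle>& vehicles) {
    int detonations = 0;
    const long n = static_cast<long>(mines.size());

    // Assumption: OpenMP stands in for the Origin2000 pragmas. The directive
    // decomposes the loop indices over threads; each iteration writes only
    // to mines[i], and the counter is combined with a reduction.
    #pragma omp parallel for reduction(+ : detonations)
    for (long i = 0; i < n; ++i) {
        Mine& m = mines[i];
        assert(m.triggerRadius >= 0.0);
        if (m.detonated) {
            continue;  // already exploded in an earlier step
        }
        for (const Vehicle& v : vehicles) {
            const double dx = v.x - m.x;
            const double dy = v.y - m.y;
            if (dx * dx + dy * dy <= m.triggerRadius * m.triggerRadius) {
                m.detonated = true;
                ++detonations;
                break;  // one hit is enough for this mine
            }
        }
    }
    return detonations;
}
```

When built with an OpenMP-enabled compiler, the outer loop is divided among threads; without OpenMP support the directive is ignored and the loop runs serially with the same result. The directive only distributes loop indices. It does not control where the mine objects reside in memory.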
However, because of the NUMA architecture of the Origin2000 and the large memory utilization of the CMS inner loop, we also need to decompose the associated objects over processors to enforce memory locality. Data decomposition pragmas are also available, but we found that synchronizing the loop decomposition and memory decomposition directives turned out to be non-optimal for the CMS inner loop. We modified the code to simplify the inner simulation loop, but in the meantime we learned from Ft. Belvoir that their latest version of CMS performs several additional computations in the inner loop (such as continuous thermal computations for all mines, associated with the new environmental simulation support) that need to be taken into account in our analysis and parallelization.

Hence, to turn our prototype Parallel CMS into a more robust and scalable parallel module, we need to go deeper into the CMS code and reorganize the inner loop more substantially. We propose to develop such a fully scalable, production-quality Parallel CMS module as a Year 4 Focused Effort. Apart from experimenting with and selecting the optimal configuration of compiler pragmas, we will also explore other parallelization techniques based on the MPI, CRAY SHMEM, OpenMP and SPEEDES communication models.

Benefits:

CMS is in active use by the Night Vision and Electronic Sensors Directorate, Ft. Belvoir, and is expected to see increasing use by other DoD users. Ft. Belvoir has a goal of a one-million-mine simulation, which clearly cannot be met without high-level exploitation of high-performance computing. This project is an important step towards that goal, and it will also provide insight into how other FMS codes, many of which use a highly object-oriented style like CMS, can be adapted for better performance on HPC resources.

Required Resources:

Deliverables:

o Parallel CMS module, scalable over the processor range on Origin2000 (preliminary Sep'99, final Mar'00)
o Analysis of performance and tradeoffs between various communication modes (preliminary Sep'99, final Mar'00)
o Installation of the scalable Parallel CMS module at CEWES and ARL (Mar'00)
o Evaluation of total CPU power for Metacomputing CMS on multi-MSRC platform (preliminary Sep'99, final Mar'00)
o Technical Report (Mar'00)