YEAR 4 ACCOMPLISHMENTS - FMS Support Team

The PET FMS support team is based at the Northeast Parallel Architectures Center (NPAC) at Syracuse University. Within NPAC, FMS activities are centered in the Interactive Web Technologies (IWT) group, led by Dr. Wojtek Furmanski. The group includes two research scientists and roughly a dozen graduate research assistants who contribute to FMS activities.

Year 4 Effort

Using the WebHLA framework described in our previous annual reports, together with lessons learned from earlier experiments with Parallel CMS, in Year 4 we addressed the full challenge of a Parallel and Distributed (hence Metacomputing) CMS system that extends the sequential CMS simulator from Ft. Belvoir to large-scale minefields on the order of a million or more active mines. The full effort, begun in previous years and brought in Year 4 to a successful, scalable, large-scale metacomputing prototype and demonstration, included: a) converting the CMS system from the DIS to the HLA framework; b) constructing a scalable Parallel CMS federate for the Origin2000; and c) linking it with the ModSAF vehicle simulator and other utility federates to form a Metacomputing CMS federation. Converting CMS to HLA using our WebHLA framework was accomplished in Year 3 and described in previous reports. Here we describe the two main thrusts of our Year 4 effort: Scalable Parallel CMS and the Metacomputing CMS Federation.

Scalable Parallel CMS

In our first attempts to port CMS to the Origin2000, conducted in previous years, we identified the performance-critical parts of the inner loop, related to the repetitive tracking operation over all mines with respect to the vehicle positions, and we tried to parallelize it using the Origin2000 compiler pragmas (i.e., loop partition and/or data decomposition directives). Unfortunately, this approach delivered only very limited scalability, up to about 4 processors. We concluded that pragma-based techniques, while efficient for regular Fortran programs, are not very practical for parallelizing complex, dynamic, object-oriented, event-driven FMS simulation codes - especially 'legacy' object-oriented codes such as CMS, which were developed by multiple programming teams over a long period and resulted in complex dynamic memory layouts of numerous objects that are now extremely difficult to decipher and distribute properly.

In the follow-on effort, conducted in Year 4, we explored an alternative approach based on a more direct, lower-level parallelization technique. Drawing on our analysis of the SPEEDES simulation kernel, which is known to deliver scalable object-oriented HPC FMS codes on the Origin2000 (such as the Parallel Navy Simulation System under development by Metron), we constructed similar parallel support for CMS. The base concept of this 'micro SPEEDES kernel' approach, borrowed from the SPEEDES engine design but prototyped by us independently of the SPEEDES code, is to use only fully portable UNIX constructs such as fork and shared memory (shmem) for inter-process and inter-processor communication. This guarantees that the code is manifestly portable across all UNIX platforms, and hence can be more easily developed, debugged, and tested in single-processor multi-threaded mode on sequential UNIX boxes. In our micro-kernel, the parent process allocates a shared memory segment using shmget(), forks n children, remaps them via execvp(), and passes the shared memory segment identifier to each child as a command line argument.
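The following sketch illustrates this bootstrap pattern in C. It is a minimal reconstruction from the description above, not the actual CMS kernel code; the node program name ("cms_node"), the node count, and the slice size are illustrative assumptions.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #define NODES      4        /* number of node programs to fork (illustrative) */
    #define SLICE_SIZE 4096     /* bytes of shared memory per node (illustrative) */

    int main(void)
    {
        /* Parent allocates one System V shared memory segment holding
           a dedicated slice for every child. */
        int shmid = shmget(IPC_PRIVATE, NODES * SLICE_SIZE, IPC_CREAT | 0600);
        if (shmid < 0) { perror("shmget"); return 1; }

        for (int rank = 0; rank < NODES; rank++) {
            if (fork() == 0) {
                /* Child: remap to the node program via execvp(), passing
                   the segment id and the node rank on the command line.
                   The node program then attaches with shmat(). */
                char id[16], rk[16];
                snprintf(id, sizeof id, "%d", shmid);
                snprintf(rk, sizeof rk, "%d", rank);
                char *args[] = { "cms_node", id, rk, NULL };
                execvp("cms_node", args);
                perror("execvp");       /* reached only if exec fails */
                _exit(1);
            }
        }
        /* The parent itself acts as node 0: it reads vehicle PDUs from
           the physical network and broadcasts them through the segment. */
        return 0;
    }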
Each child attaches to its dedicated slice of the shared memory using shmat(), establishing a multi-processor communication framework that is both fast (no MPI overhead) and fully portable (from the O2 workstation to the Origin2000). We also developed a simple set of semaphores to synchronize the node programs and to avoid race conditions in critical sections of the code. On a single-processor UNIX platform, our kernel, when invoked with n processes, in effect generates n concurrent threads communicating via UNIX shared memory. In an unscheduled Origin2000 run, the number of threads per processor and the number of processors used are left to the operating system. However, when executed under the control of a parallel scheduler such as MISER, each child process forked by our parent is assigned to a different processor, which allows us to regain control over process placement and to realize a natural, scalable implementation of Parallel CMS. On top of this micro-kernel infrastructure we placed object-oriented wrappers that hide the explicit shmem-based communication under higher-level abstractions, so that each node program behaves as a sequential CMS operating on a subset of the full minefield.

The CMS module cooperates with the ModSAF vehicle simulator running on another machine on the network: CMS continuously reads vehicle motion PDUs from the network, updates vehicle positions, and tracks all mines in the minefield in search of possible explosions. In our parallel version, the parent (node 0) reads from the physical network and broadcasts all PDUs via shared memory to the children. Each child reads its PDUs from a virtual network, a TCP/IP-style wrapper over the shmem communication channel. Minefield segments are assigned to individual node programs using a scattered (cyclic) decomposition, which guarantees reasonable dynamic load balancing regardless of the current number and configuration of vehicles moving through the minefield.

We found the CMS minefield parser, and the whole minefield I/O sector, difficult to decipher and modify to support the scattered decomposition. We bypassed this problem by constructing our own Java-based minefield parser using ANTLR, a powerful public domain Java parser technology offered by the MageLang Institute. Our parser reads the large sequential minefield file and chops it into n files, each representing a reduced node minefield generated via scattered decomposition. All these files are fetched concurrently by the node programs when Parallel CMS starts, and the subsequent simulation decomposes naturally into node CMS programs operating on scattered sectors of the minefield and communicating via the shmem micro-kernel channel described above.
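The chopping step itself is simple, and the sketch below re-expresses it in C for illustration (our actual parser is the ANTLR-generated Java program described above). The file names, one-record-per-line format, and node count are illustrative assumptions. Record i of the global minefield goes to node file (i mod n), so each node minefield samples the whole field uniformly - the property that yields the dynamic load balance noted above.

    #include <stdio.h>

    #define NODES 4     /* number of node minefield files to produce */

    int main(void)
    {
        FILE *node[NODES];
        char  name[64], record[512];

        for (int r = 0; r < NODES; r++) {
            snprintf(name, sizeof name, "minefield.node%d", r);
            node[r] = fopen(name, "w");
            if (!node[r]) { perror(name); return 1; }
        }

        /* One mine record per line of the already-parsed global file:
           record i goes to node (i mod n), i.e. cyclic assignment. */
        FILE *global = fopen("minefield.all", "r");
        if (!global) { perror("minefield.all"); return 1; }

        long i = 0;
        while (fgets(record, sizeof record, global))
            fputs(record, node[i++ % NODES]);

        fclose(global);
        for (int r = 0; r < NODES; r++) fclose(node[r]);
        return 0;
    }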
We performed timing runs of Parallel CMS using the Origin2000 systems at the Naval Research Laboratory (NRL) in Washington, DC and at the ERDC Major Shared Resource Center in Vicksburg, MS. We ran Parallel CMS on a large minefield of one million mines, simulated on 16, 32, and 64 nodes, analysing both the total simulation time, to determine speedup, and the simulation times on individual nodes, to determine load balance. We could not run the million-mine simulation on fewer than 16 nodes due to node memory limitations. For the runs on 16, 32, and 64 nodes we obtained almost perfect (linear) scaling across this range of processors and very satisfactory load balance in each run. These results indicate that we have successfully constructed a fully scalable Parallel CMS for the Origin2000 platform.

Metacomputing CMS Federation

The timing results described above were obtained during Parallel CMS runs conducted within a WebHLA-based HPDC environment that spanned three geographically distributed laboratories and utilized most of the WebHLA tools and federates discussed in our previous reports. The overall configuration of this initial Metacomputing CMS environment included ModSAF, JDIS (our DIS/HLA bridge), Parallel CMS, a logger/playback federate (PDUDB), and control/visualization federates (SimVis). The ModSAF, JDIS, and SimVis modules typically ran on a workstation cluster at NPAC at Syracuse University; the JWORB/OWRTI-based Federation Manager typically ran on the Origin2000 at ERDC in Vicksburg, MS; and the Parallel CMS federate typically ran on the Origin2000 at NRL in Washington, DC.

Large MISER runs at NRL must be scheduled in batch mode and are activated at unpredictable times, often in the middle of the night. This created logistical problems, since ModSAF is a GUI-based legacy application that must be started by a human pressing a button. To avoid the need for a human operator to continuously monitor the MISER batch queue and start ModSAF manually, we recorded a log of a typical simulation scenario with some 30 vehicles and played it back repetitively from the database using our PDUDB federate. The only program running continuously (at ERDC) was the JWORB/OWRTI-based Federation Manager. After Parallel CMS was started by MISER at NRL, it joined the distributed federation (managed at ERDC) and automatically activated the PDUDB playback server at NPAC, which began streaming vehicle PDUs to JDIS; JDIS in turn converted them to HLA interactions and sent them (via the RTI located at ERDC) to the Parallel CMS federate at NRL. Each such event, received by node 0 of Parallel CMS, was multicast via shared memory to all nodes of the simulation run and used there by the node CMS programs to update the internal states of the simulated vehicles. The inner loop of each node CMS program continuously tracked all mines scattered onto that node against all vehicles, in search of possible explosions.
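The node-0 multicast just described can be sketched as follows. This is a hedged reconstruction, assuming a per-node mailbox layout in the shared segment and one System V semaphore per child; the actual CMS kernel's data structures and semaphore set may differ.

    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    #define NODES   4
    #define PDU_MAX 1024

    /* One mailbox per node inside the shared segment (layout assumed). */
    typedef struct {
        int  len;
        char pdu[PDU_MAX];
    } Mailbox;

    static int semid;   /* one "PDU ready" semaphore per child node */

    void init_sems(void)
    {
        /* Values start at 0 on typical systems; a fully portable
           version would also SETVAL each semaphore explicitly. */
        semid = semget(IPC_PRIVATE, NODES, IPC_CREAT | 0600);
    }

    static void sem_post(int node)                 /* V operation */
    {
        struct sembuf op = { (unsigned short)node, +1, 0 };
        semop(semid, &op, 1);
    }

    static void sem_wait_on(int node)              /* P operation */
    {
        struct sembuf op = { (unsigned short)node, -1, 0 };
        semop(semid, &op, 1);
    }

    /* Node 0: copy a PDU read from the physical network into every
       child's mailbox slice, then wake each child. */
    void broadcast_pdu(Mailbox *box, const char *pdu, int len)
    {
        for (int r = 1; r < NODES; r++) {
            memcpy(box[r].pdu, pdu, (size_t)len);
            box[r].len = len;
            sem_post(r);
        }
    }

    /* Child node r: block until node 0 posts a PDU, then hand it to
       the node CMS program through its virtual-network read path. */
    int receive_pdu(Mailbox *box, int r, char *out)
    {
        sem_wait_on(r);
        memcpy(out, box[r].pdu, (size_t)box[r].len);
        return box[r].len;
    }

A production version would also need a second, acknowledgement semaphore per child so that node 0 cannot overwrite a mailbox before it has been read - exactly the kind of race condition in critical sections that our semaphore set guards against.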
Having constructed a fully scalable Parallel CMS federate and established a robust Metacomputing CMS experimentation environment, we are now proceeding with the next set of experiments toward wide-area distributed, large-scale FMS simulations, using CMS as the application focus and testbed. In the first such experiment, we intend to distribute large minefields of millions of mines over several Origin2000 machines in various DoD labs using domain decomposition, followed by the scattered decomposition of each minefield domain over the nodes of the local parallel system. So far, we have obtained first results for Distributed Parallel CMS, with the 30K-mine domains of a 60K-mine minefield distributed over the Origins at the ERDC and NRL facilities. Parallel CMS runs for minefields larger than 30K mines must be executed via local batch queues, so robust metacomputing operation will require global synchronization between such schedulers; this is the subject of one of our proposed Year 5 tasks.

Our other planned effort includes support for parallel object-oriented programming tools and visual authoring environments. We identified the need for such tools both in our own work on Parallel CMS and in discussions with other FMS teams such as Metron Corporation. Major requirements include a Web/commodity base, HLA compliance, and compliance with OOA&D standards. We propose to accomplish this by integrating our previous WebFlow and WebHLA efforts with new industry standards for object analysis and design, such as UML (Unified Modeling Language).