YEAR 4 ACCOMPLISHMENTS - FMS Support Team

The PET FMS support team is based at the Northeast Parallel Architectures Center (NPAC) at Syracuse University. Within NPAC, FMS activities are centered in the Interactive Web Technologies (IWT) group, led by Dr. Wojtek Furmanski. The group includes two research scientists and roughly a dozen graduate research assistants who contribute to FMS activities.

Year 4 Effort

Using the WebHLA framework described in our previous annual reports, together with lessons learned from earlier experiments with Parallel CMS, in Year 4 we addressed the full challenge of a Parallel and Distributed (hence Metacomputing) CMS system that extends the sequential CMS simulator from Ft. Belvoir to large-scale minefields on the order of a million or more active mines. The full effort, begun in previous years and brought in Year 4 to a successful, scalable, large-scale metacomputing prototype and demonstration, included: a) converting the CMS system from the DIS to the HLA framework; b) constructing a scalable Parallel CMS federate for the Origin2000; and c) linking it with the ModSAF vehicle simulator and other utility federates to form a Metacomputing CMS federation. Converting CMS to HLA using our WebHLA framework was accomplished in Year 3 and described in previous reports. Here we describe the two main thrusts of our Year 4 effort: Scalable Parallel CMS and the Metacomputing CMS Federation.

Scalable Parallel CMS

In our first attempts to port CMS to the Origin2000, conducted in previous years, we identified the performance-critical parts of the inner loop, related to the repetitive tracking operation over all mines with respect to the vehicle positions, and we tried to parallelize it using the Origin2000 compiler pragmas (i.e., loop partition and/or data decomposition directives). Unfortunately, this approach delivered only very limited scalability, up to about 4 processors. We concluded that pragma-based techniques, while efficient for regular Fortran programs, are not very practical for parallelizing complex, dynamic, object-oriented, event-driven FMS simulation codes - especially 'legacy' object-oriented codes such as CMS, which were developed by multiple programming teams over a long period and resulted in complex dynamic memory layouts of numerous objects that are now extremely difficult to decipher and distribute properly.

In the follow-on effort, conducted in Year 4, we explored an alternative approach based on a more direct, lower-level parallelization technique. Drawing on our analysis of the SPEEDES simulation kernel, which is known to deliver scalable object-oriented HPC FMS codes on the Origin2000 (such as the Parallel Navy Simulation System under development by Metron), we constructed similar parallel support for CMS. The base concept of this 'micro SPEEDES kernel' approach, borrowed from the SPEEDES engine design but prototyped by us independently of the SPEEDES code, is to use only fully portable UNIX constructs such as fork and shared memory (shmem) for inter-process and inter-processor communication. This guarantees that the code is manifestly portable across all UNIX platforms, and hence can be more easily developed, debugged, and tested in single-processor multi-threaded mode on sequential UNIX boxes. In our micro-kernel, the parent process allocates a shared memory segment using shmget(), forks n children, remaps them via execvp(), and passes the shared memory segment identifier to each child as a command line argument.
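The following sketch illustrates this bootstrap pattern in C. It is a minimal reconstruction from the description above, not the actual CMS kernel code; the node program name ("cms_node"), the node count, and the slice size are illustrative assumptions.

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    #define NODES      4        /* number of node programs to fork (illustrative) */
    #define SLICE_SIZE 4096     /* bytes of shared memory per node (illustrative) */

    int main(void)
    {
        /* Parent allocates one System V shared memory segment holding
           a dedicated slice for every child. */
        int shmid = shmget(IPC_PRIVATE, NODES * SLICE_SIZE, IPC_CREAT | 0600);
        if (shmid < 0) { perror("shmget"); return 1; }

        for (int rank = 0; rank < NODES; rank++) {
            if (fork() == 0) {
                /* Child: remap to the node program via execvp(), passing
                   the segment id and the node rank on the command line.
                   The node program then attaches with shmat(). */
                char id[16], rk[16];
                snprintf(id, sizeof id, "%d", shmid);
                snprintf(rk, sizeof rk, "%d", rank);
                char *args[] = { "cms_node", id, rk, NULL };
                execvp("cms_node", args);
                perror("execvp");       /* reached only if exec fails */
                _exit(1);
            }
        }
        /* The parent itself acts as node 0: it reads vehicle PDUs from
           the physical network and broadcasts them through the segment. */
        return 0;
    }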
Each child attaches to its dedicated slice of the shared memory using shmat(), establishing a multi-processor communication framework that is both fast (no MPI overhead) and fully portable (from the O2 workstation to the Origin2000). We also developed a simple set of semaphores to synchronize the node programs and to avoid race conditions in critical sections of the code. On a single-processor UNIX platform, our kernel, when invoked with n processes, in effect generates n concurrent threads communicating via UNIX shared memory. In an unscheduled Origin2000 run, the number of threads per processor and the number of processors used are left to the operating system. However, when executed under the control of a parallel scheduler such as MISER, each child process forked by our parent is assigned to a different processor, which allows us to regain control over process placement and to realize a natural, scalable implementation of Parallel CMS. On top of this micro-kernel infrastructure we placed object-oriented wrappers that hide the explicit shmem-based communication under higher-level abstractions, so that each node program behaves as a sequential CMS operating on a subset of the full minefield.

The CMS module cooperates with the ModSAF vehicle simulator running on another machine on the network: CMS continuously reads vehicle motion PDUs from the network, updates vehicle positions, and tracks all mines in the minefield in search of possible explosions. In our parallel version, the parent (node 0) reads from the physical network and broadcasts all PDUs via shared memory to the children. Each child reads its PDUs from a virtual network, a TCP/IP-style wrapper over the shmem communication channel. Minefield segments are assigned to individual node programs using a scattered (cyclic) decomposition, which guarantees reasonable dynamic load balancing regardless of the current number and configuration of vehicles moving through the minefield.

We found the CMS minefield parser, and the whole minefield I/O sector, difficult to decipher and modify to support the scattered decomposition. We bypassed this problem by constructing our own Java-based minefield parser using ANTLR, a powerful public domain Java parser technology offered by the MageLang Institute. Our parser reads the large sequential minefield file and chops it into n files, each representing a reduced node minefield generated via scattered decomposition. All these files are fetched concurrently by the node programs when Parallel CMS starts, and the subsequent simulation decomposes naturally into node CMS programs operating on scattered sectors of the minefield and communicating via the shmem micro-kernel channel described above.
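The chopping step itself is simple, and the sketch below re-expresses it in C for illustration (our actual parser is the ANTLR-generated Java program described above). The file names, one-record-per-line format, and node count are illustrative assumptions. Record i of the global minefield goes to node file (i mod n), so each node minefield samples the whole field uniformly - the property that yields the dynamic load balance noted above.

    #include <stdio.h>

    #define NODES 4     /* number of node minefield files to produce */

    int main(void)
    {
        FILE *node[NODES];
        char  name[64], record[512];

        for (int r = 0; r < NODES; r++) {
            snprintf(name, sizeof name, "minefield.node%d", r);
            node[r] = fopen(name, "w");
            if (!node[r]) { perror(name); return 1; }
        }

        /* One mine record per line of the already-parsed global file:
           record i goes to node (i mod n), i.e. cyclic assignment. */
        FILE *global = fopen("minefield.all", "r");
        if (!global) { perror("minefield.all"); return 1; }

        long i = 0;
        while (fgets(record, sizeof record, global))
            fputs(record, node[i++ % NODES]);

        fclose(global);
        for (int r = 0; r < NODES; r++) fclose(node[r]);
        return 0;
    }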
We performed timing runs of Parallel CMS using the Origin2000 systems at the Naval Research Laboratory (NRL) in Washington, DC and at the ERDC Major Shared Resource Center in Vicksburg, MS. We ran Parallel CMS on a large minefield of one million mines, simulated on 16, 32, and 64 nodes, analysing both the total simulation time, to determine speedup, and the simulation times on individual nodes, to determine load balance. We could not run the million-mine simulation on fewer than 16 nodes due to node memory limitations. For the runs on 16, 32, and 64 nodes we obtained almost perfect (linear) scaling across this range of processors and very satisfactory load balance in each run. These results indicate that we have successfully constructed a fully scalable Parallel CMS for the Origin2000 platform.

Metacomputing CMS Federation

The timing results described above were obtained during Parallel CMS runs conducted within a WebHLA-based HPDC environment that spanned three geographically distributed laboratories and utilized most of the WebHLA tools and federates discussed in our previous reports. The overall configuration of this initial Metacomputing CMS environment included ModSAF, JDIS (our DIS/HLA bridge), Parallel CMS, a logger/playback federate (PDUDB), and control/visualization federates (SimVis). The ModSAF, JDIS, and SimVis modules typically ran on a workstation cluster at NPAC at Syracuse University; the JWORB/OWRTI-based Federation Manager typically ran on the Origin2000 at ERDC in Vicksburg, MS; and the Parallel CMS federate typically ran on the Origin2000 at NRL in Washington, DC.

Large MISER runs at NRL must be scheduled in batch mode and are activated at unpredictable times, often in the middle of the night. This created logistical problems, since ModSAF is a GUI-based legacy application that must be started by a human pressing a button. To avoid the need for a human operator to continuously monitor the MISER batch queue and start ModSAF manually, we recorded a log of a typical simulation scenario with some 30 vehicles and played it back repetitively from the database using our PDUDB federate. The only program running continuously (at ERDC) was the JWORB/OWRTI-based Federation Manager. After Parallel CMS was started by MISER at NRL, it joined the distributed federation (managed at ERDC) and automatically activated the PDUDB playback server at NPAC, which began streaming vehicle PDUs to JDIS; JDIS in turn converted them to HLA interactions and sent them (via the RTI located at ERDC) to the Parallel CMS federate at NRL. Each such event, received by node 0 of Parallel CMS, was multicast via shared memory to all nodes of the simulation run and used there by the node CMS programs to update the internal states of the simulated vehicles. The inner loop of each node CMS program continuously tracked all mines scattered onto that node against all vehicles, in search of possible explosions.
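The node-0 multicast just described can be sketched as follows. This is a hedged reconstruction, assuming a per-node mailbox layout in the shared segment and one System V semaphore per child; the actual CMS kernel's data structures and semaphore set may differ.

    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    #define NODES   4
    #define PDU_MAX 1024

    /* One mailbox per node inside the shared segment (layout assumed). */
    typedef struct {
        int  len;
        char pdu[PDU_MAX];
    } Mailbox;

    static int semid;   /* one "PDU ready" semaphore per child node */

    void init_sems(void)
    {
        /* Values start at 0 on typical systems; a fully portable
           version would also SETVAL each semaphore explicitly. */
        semid = semget(IPC_PRIVATE, NODES, IPC_CREAT | 0600);
    }

    static void sem_post(int node)                 /* V operation */
    {
        struct sembuf op = { (unsigned short)node, +1, 0 };
        semop(semid, &op, 1);
    }

    static void sem_wait_on(int node)              /* P operation */
    {
        struct sembuf op = { (unsigned short)node, -1, 0 };
        semop(semid, &op, 1);
    }

    /* Node 0: copy a PDU read from the physical network into every
       child's mailbox slice, then wake each child. */
    void broadcast_pdu(Mailbox *box, const char *pdu, int len)
    {
        for (int r = 1; r < NODES; r++) {
            memcpy(box[r].pdu, pdu, (size_t)len);
            box[r].len = len;
            sem_post(r);
        }
    }

    /* Child node r: block until node 0 posts a PDU, then hand it to
       the node CMS program through its virtual-network read path. */
    int receive_pdu(Mailbox *box, int r, char *out)
    {
        sem_wait_on(r);
        memcpy(out, box[r].pdu, (size_t)box[r].len);
        return box[r].len;
    }

A production version would also need a second, acknowledgement semaphore per child so that node 0 cannot overwrite a mailbox before it has been read - exactly the kind of race condition in critical sections that our semaphore set guards against.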
Having constructed a fully scalable Parallel CMS federate and established a robust Metacomputing CMS experimentation environment, we are now proceeding with the next set of experiments toward wide-area distributed, large-scale FMS simulations, using CMS as the application focus and testbed. In the first such experiment, we intend to distribute large minefields of millions of mines over several Origin2000 machines in various DoD labs using domain decomposition, followed by the scattered decomposition of each minefield domain over the nodes of the local parallel system. So far, we have obtained first results for Distributed Parallel CMS, with the 30K-mine domains of a 60K-mine minefield distributed over the Origins at the ERDC and NRL facilities. Parallel CMS runs for minefields larger than 30K mines must be executed via local batch queues, so robust metacomputing operation will require global synchronization between such schedulers; this is the subject of one of our proposed Year 5 tasks.

Our other planned effort includes support for parallel object-oriented programming tools and visual authoring environments. We identified the need for such tools both in our own work on Parallel CMS and in discussions with other FMS teams such as Metron Corporation. Major requirements include a Web/commodity base, HLA compliance, and compliance with OOA&D standards. We propose to accomplish this by integrating our previous WebFlow and WebHLA efforts with new industry standards for object analysis and design, such as UML (Unified Modeling Language).