WebHLA as Integration Platform for FMS and other Metacomputing Application Domains
Geoffrey Fox, Ph. D., Wojtek Furmanski, Ph. D., Ganesh Krishnamurthy, Hasan Ozdemir, Zeynep Odcikin-Ozdemir, Tom Pulikal, Krishnan Rangarajan, Ankur Sood
Northeast Parallel Architectures Center, Syracuse University
111 College Place, Syracuse University, Syracuse NY 13244-4100
{gcf, furm, gkrishna, timucin, zeynep, tapulika, krrangar, asood} @ npac.syr.edu
ABSTRACT
High Level Architecture (HLA) standards for interoperability between various Modeling and Simulation paradigms are being enforced by the DoD in parallel with the rapid onset of new Object Web / Commodity standards for distributed objects and componentware, emerging at the crossroads of CORBA, COM, Java, and XML technologies. WebHLA explores the synergies between these two trends and integrates them by offering an Object Web based implementation of the HLA framework. Our goal is to deliver a uniform platform that facilitates both the conversion of legacy codes and the development of new codes in compliance with HLA, HPC and Object Web standards. In this paper: a) we outline the overall design of WebHLA; b) we summarize the system components prototyped so far; c) we illustrate our approach for one HPC FMS application, Parallel CMS (Comprehensive Mine Simulator), in the area of large scale minefield simulation and countermine engineering; and finally d) we describe initial steps within our WebHLA based approach towards more general purpose Metacomputing, initially driven by FMS applications but also applicable to other CTAs within the HPC Modernization Program.
1. INTRODUCTION
We present here early
results of our work on Web / Commodity based High Performance Modeling and
Simulation, conducted as part of the academic branch of the Forces Modeling and
Simulation (FMS) domain within the DoD HPC Modernization Program. Our
approach explores synergies between
ongoing and rapid technology evolution processes such as: a) transition of the
DoD M&S standards from DIS and ALSP to HLA; b) extension of Web
technologies from passive information dissemination to interactive distributed
object computing offered by CORBA, Java, COM and W3C WOM; and c) transition of
HPCC systems from custom (such as dedicated MPPs) to commodity base (such as NT
clusters).
One common aspect of all these trends is the enforcement of reusability and shareability of products or components based on new technology standards. DMSO HLA makes the first major step in this direction by offering the interoperability framework between a broad spectrum of simulation paradigms, including both real-time and logical time models (DMSO 1998).
Figure 1: Pragmatic Object Web concepts and components.
However, the HLA standard specification leaves several implementation decisions open, to be made by the application developers. This enables reusability and integrability of existing codes but often leaves developers of new simulations without enough guidance.
In WebHLA, we fill this
gap by using the emergent standards of Web based distributed computing – we
call it Pragmatic Object Web (Orfali 1998; Fox et al.
1999) - that integrate Java, CORBA, COM and W3C WOM models for distributed
componentware as illustrated in Fig. 1.
We believe that WebHLA, defined as the convergence point of the standardization processes outlined above, will offer a powerful modeling and simulation framework, capable of addressing the new challenges of DoD computing in the areas of Simulation Based Design, Testing, Evaluation and Acquisition.
Figure 2: Example of protocol integration support in JWORB.
In this document, we summarize the WebHLA architecture (Section 2), review the status of WebHLA components as of Feb ’99 (Section 3), and illustrate our approach with the example of one large scale HPC FMS application, Parallel CMS (Section 4).
2. WEBHLA OVERVIEW
The overall architecture of WebHLA follows the 3-tier architecture of our Pragmatic Object Web (Fox et al. 1999) (see Figure 1), with a mesh of JWORB (Java Web Object Request Broker) based middleware servers managing backend simulation modules and offering Web portal style interactive multi-user front-ends. JWORB is a multi-protocol server capable of managing objects that conform to various distributed object models, including CORBA, Java, COM and XML. HLA is also supported via Object Web RTI (OWRTI), i.e. a Java CORBA based implementation of DMSO RTI 1.3, packaged as a JWORB service. As illustrated in Fig. 1, objects in any of the popular commodity models can be naturally grouped within the WebHLA framework as HLA federates, and they can naturally communicate by exchanging (via the JWORB based RTI) XML-ized events or messages packaged as suitable FOM interactions.
HLA-compliant M&S systems can be integrated in WebHLA by porting legacy codes (typically written in C/C++) to suitable HPC platforms, wrapping such codes as WebHLA federates using the cross-language (Java/C++) RTICup API, and using them as plug-and-play components on the JWORB/OWRTI software bus. For previous generation simulations following the DIS (or ALSP) model, suitable bridges to the HLA/RTI communication domain are also available in WebHLA, packaged as utility federates. To facilitate experiments with CPU-intensive HPC simulation modules, suitable database tools are available, such as an event logger, an event database manager and an event playback federate, which allow us to save entire simulation segments and replay them later for analysis, demo or review purposes. Finally, we also constructed SimVis, a commodity (DirectX on NT) graphics based battlefield visualizer federate that offers a real-time interactive 3D front-end for typical DIS=>HLA entity level (e.g. ModSAF style) simulations.
In the following, we describe the WebHLA components listed above in more detail (Section 3), followed by an example of using the system to integrate a realistic large scale HPC FMS simulation (Section 4).
3. WEBHLA COMPONENTS
3.1 JWORB based Object Web RTI
DMSO’s longer range plan includes transferring HLA to industry as CORBA Facility for Modeling and Simulation.
Anticipating these developments, we have recently developed, in one of our HPCMP FMS PET projects at NPAC, an Object Web based RTI prototype (Fox et al. 1998c), which builds on top of our new JWORB (Java Web Object Request Broker) middleware integration technology. JWORB is a multi-protocol Java network server, currently integrating HTTP (Web) and IIOP (CORBA) and hence acting both as a Web server and a CORBA broker (see Fig. 2). Such a server architecture enforces software economy and allows us to efficiently prototype new interactive Web standards such as XML, DOM or RDF in terms of the elegant programming model of Java, while being able to wrap and integrate multi-language legacy software within the solid software engineering framework of CORBA.
Figure 3: Top view representation of the Object Web RTI.
We are now testing this concept and extending JWORB functionality by building a Java CORBA based RTI implementation, structured as a JWORB service and referred to as Object Web RTI (see Fig. 3). Our implementation includes two base user-level distributed objects, RTI Ambassador and Federate Ambassador, built on top of a set of system-level objects such as RTI Kernel, Federation Execution or Event Queues (including both time-stamp-order and receive-order models). RTI Ambassador is further decomposed into a set of management objects maintained by the Federation Execution object, including: Object Management, Declaration Management, Ownership Management, Time Management and Data Distribution Management.
RTI is given by some 150 communication and/or utility calls, packaged as 6 main management services: Federation Management, Object Management, Declaration Management, Ownership Management, Time Management, Data Distribution Management, and one general purpose utility service.
Our design is based on 9 CORBA interfaces, including 6 Managers, 2 Ambassadors and RTIKernel. Since each Manager is mapped to an independent CORBA object (see Fig. 4), we can easily provide minimal support for distributed management by simply placing individual managers on different hosts.
Figure 4: RTI services (rounded rectangles) implemented as CORBA objects in Object Web RTI.
To be able to link C++ clients with OWRTI, we developed a C++ library (see Fig. 5) which: a) provides the RTI C++ programming interface; and b) is packaged as a CORBA C++ service and, as such, can easily cross the language boundaries to access the Java CORBA objects that comprise our Java RTI.
Figure 5: Architecture of the C++ RTI API that allows C++ RTI clients to be linked with the Java RTI services of OWRTI.
Our C++ DMSO/CORBA glue library uses the public domain OmniORB2.5 as a C++ Object Request Broker to connect to the RTI Kernel object running in the Java based ORB. The RTI Ambassador glue/proxy object forwards all method calls to its CORBA peer, and the Federate Ambassador, defined as another CORBA object running on the client side, forwards all received callbacks to its C++ peer. This library runs on Windows NT, IRIX and SunOS systems.
3.2 Parallel ports of selected M&S modules
In parallel with
prototyping core WebHLA technologies described above, we are also analyzing
some selected advanced M&S modules such as the CMS (Comprehensive Mine
Simulator) system developed by Steve Bishop’s team at Ft. Belvoir, VA that
simulates mines, mine fields, minefield components, standalone detection
systems and countermine systems including ASTAMIDS, SMB and MMCM. The system
can be viewed as a virtual T&E tool to facilitate R&D in the area of
new countermine systems and detection technologies of relevance both for the
Army and the Navy. We recently
constructed a parallel port of the system to Origin2000, where it was packaged
and can be used as either a DIS node or as an HLA federate.
Origin systems support a variety of parallel programming tools. These include: a) Parallel Compilers that take sequential Fortran, C or C++ codes, optionally with some user-provided compiler directives (pragmas), and return the corresponding parallel codes, generated in either fully automatic or semi-automatic (compiler directives based) parallelization modes; b) Message Passing libraries such as MPI, PVM, and Cray SHMEM; c) Scientific & Math Libraries as made available in the Silicon Graphics Cray Scientific Library (SCSL) and other third party libraries; d) Operating system-based inter-process and inter-thread communication via standards-based sockets, pthreads, and shared memory.
Based on the analysis of the sequential CMS code, we found the semi-automatic, compiler directives based approach to be the most practical parallelization technique to start with in our case.
The most CPU-intensive inner loop of the CMS simulation runs over all mines in
the system and it is activated in response to each new entity state PDU to check
if there is a match between the current vehicle and mine coordinates that could
lead to a mine detonation. Using directives such as 'pragma parallel' and 'pragma pfor' we managed to partition the
mine-vehicle tracking workload over the available processors, and we achieved a
linear speedup up to four processors. For large multiprocessor configurations,
the efficiency of our pragmas based parallelization scheme deteriorates due to
the NUMA memory model. Indeed, on a distributed shared memory architecture such
as Origin, the latency for a CPU to access main memory increases with the
distance to the physical memory accessed and with the contention on the
internal network. In consequence, the cache behavior of the program has a
significant impact on the performance. To assure scalability across the whole
processor range, we need therefore to enforce data decomposition that matches
the already accomplished workload/loop decomposition. Our initial experiments
with enforcing such full decomposition by using a combination of pragma
directives did not succeed, most likely due to rather complex object oriented
and irregular code processed in the CMS inner loop. We are currently rebuilding
and simplifying the inner loop code so that the associated memory layout of objects
is more regular and hence predictable. For example, we are replacing linked lists over dynamic object pointers with regular arrays of statically allocated objects.
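To make the loop structure concrete, the following is a minimal Java sketch of a mine-vehicle proximity check whose iterations are split across processors. It is an illustration only, not the CMS code: the original uses SGI compiler pragmas on C++, whereas this sketch substitutes Java parallel streams, and the class name, method names and the simple circular trigger radius are all assumptions. The mine coordinates are held in flat arrays, in the spirit of the regular memory layout discussed above.

```java
import java.util.stream.IntStream;

// Hypothetical stand-in for the CMS inner loop; not the actual CMS data model.
final class MineProximityCheck {
    // Flat coordinate arrays give a regular, predictable memory layout.
    private final double[] mineX;
    private final double[] mineY;

    MineProximityCheck(double[] mineX, double[] mineY) {
        if (mineX.length != mineY.length) {
            throw new IllegalArgumentException("coordinate arrays differ in length");
        }
        this.mineX = mineX.clone();
        this.mineY = mineY.clone();
    }

    /** Returns the indices of all mines within triggerRadius of the vehicle, in ascending order. */
    int[] minesTriggeredBy(double vehicleX, double vehicleY, double triggerRadius) {
        double r2 = triggerRadius * triggerRadius;
        return IntStream.range(0, mineX.length)
                .parallel() // iterations are independent, so the range can be split across cores
                .filter(i -> {
                    double dx = mineX[i] - vehicleX;
                    double dy = mineY[i] - vehicleY;
                    return dx * dx + dy * dy <= r2;
                })
                .toArray();
    }
}
```

In this sketch the check would be invoked once per incoming entity state PDU, with the vehicle position taken from that PDU.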
Having ported the CMS code to Origin2000, we also converted it from DIS to HLA simply by mapping all PDUs to the corresponding HLA interactions and by using the C++ RTI API that offers connectivity between the C++ and Java RTI models. An HLA port abstraction layer was introduced that allows an easy switch between DIS and HLA communication modes with minimal modification of the original code. The CMS Federate Ambassador receives interactions from the RTI and passes them to the HLA port, which translates them into DIS PDUs to be further parsed internally by the legacy CMS code. In a similar way, the HLA port translates PDUs generated internally by the CMS into HLA interactions before passing them to the RTI Ambassador.
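The port abstraction described above can be sketched as follows. This is a minimal illustration in Java rather than the C++ used in CMS, and every name in it (SimPort, PduInteraction, HlaPort, DisPort) is hypothetical; RTI calls and network I/O are replaced by in-memory lists. The only external fact relied on is that the third byte of the DIS PDU header carries the PDU type.

```java
import java.util.ArrayList;
import java.util.List;

// All names are hypothetical; this is not the CMS or OWRTI API.
interface SimPort {
    void send(byte[] pdu);        // outgoing event produced by the legacy code
    List<byte[]> drainReceived(); // incoming events, always handed over as DIS PDUs
}

/** Stand-in for an HLA interaction that carries one PDU as its payload. */
final class PduInteraction {
    final int pduType;
    final byte[] payload;
    PduInteraction(int pduType, byte[] payload) {
        this.pduType = pduType;
        this.payload = payload;
    }
}

/** HLA mode: wraps PDUs into interactions on the way out, unwraps them on the way in. */
final class HlaPort implements SimPort {
    final List<PduInteraction> sentToRti = new ArrayList<>(); // stands in for RTI Ambassador calls
    private final List<byte[]> inbox = new ArrayList<>();

    @Override public void send(byte[] pdu) {
        // Byte 2 of the DIS PDU header holds the PDU type.
        sentToRti.add(new PduInteraction(pdu[2] & 0xFF, pdu.clone()));
    }

    /** Would be called from the Federate Ambassador callback when an interaction arrives. */
    void receiveInteraction(PduInteraction interaction) {
        inbox.add(interaction.payload.clone());
    }

    @Override public List<byte[]> drainReceived() {
        List<byte[]> out = new ArrayList<>(inbox);
        inbox.clear();
        return out;
    }
}

/** DIS mode: PDUs pass through unchanged (socket I/O omitted in this sketch). */
final class DisPort implements SimPort {
    final List<byte[]> sentToNetwork = new ArrayList<>();
    private final List<byte[]> inbox = new ArrayList<>();

    @Override public void send(byte[] pdu) { sentToNetwork.add(pdu.clone()); }

    void receiveDatagram(byte[] pdu) { inbox.add(pdu.clone()); }

    @Override public List<byte[]> drainReceived() {
        List<byte[]> out = new ArrayList<>(inbox);
        inbox.clear();
        return out;
    }
}
```

Because the legacy code only sees SimPort, switching between DIS and HLA modes amounts to constructing a different implementation.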
3.3 JDIS: DIS-HLA Bridge and Event I/O Manager
The DIS/HLA bridge for CMS described above was constructed inside the CMS federate code. In an alternative approach, explored for ModSAF modules, we use another DIS-HLA bridge that operates as an independent process and acts as a DIS node on the input channel and as an HLA federate on the output channel. We constructed such a bridge, called JDIS, in Java, starting from the free DIS Java parser offered by the dis-java-vrml working group of the Web3D Consortium and completing it to support all PDUs required by the linked ModSAF + CMS simulators.
JDIS provides linkage between DIS applications such as ModSAF and CMS running as HLA application. JDIS receives DIS PDUs produced by ModSAF, translates them into HLA interactions and lets RTI forward them to CMS.
JDIS can also write / read PDUs from a file and hence it can be used to log and playback sequences of simulation events. We also used this tool to generate point-like PDU probes as well as stress test PDU sequences when testing and measuring performance of Parallel CMS.
Figure 6: JDIS Front-End control and display panel.
The visual front-end of JDIS, illustrated in Fig. 6, supports runtime display of the PDU flow and offers several controls and utilities, including: a) switches between DIS, HLA and various I/O (file, database) modes; b) frequency calibration for a PDU stream generated from a file or database; c) PDU probe and sequence generators; d) simple analysis tools such as statistical filters or performance benchmarks that can be run on accumulated PDU sequences.
We have written implementations for converting most of the PDUs into XML format. These include the Entity State, Detonation, Collision, MineField, Transmitter, Receiver, Acknowledge, Designator, Fire, Repair Response, Repair Complete, Resupply Cancel, Resupply Received, Signal, Service Request, Start / Resume, Stop / Freeze, Mine, MineField Request, MineField Response, MineField State and MineField Nack PDUs.
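As an illustration of what such a conversion might look like, the sketch below serializes a heavily reduced Entity State record to XML. The element and attribute names are invented for this example and are not the JDIS schema; a real Entity State PDU carries many more fields (orientation, velocity, articulation parameters and so on).

```java
import java.util.Locale;

/** Hypothetical, heavily reduced Entity State record; not the JDIS data model. */
final class EntityStateRecord {
    final int site, application, entity; // DIS entity identifier triple
    final double x, y, z;                // entity location

    EntityStateRecord(int site, int application, int entity, double x, double y, double z) {
        this.site = site;
        this.application = application;
        this.entity = entity;
        this.x = x;
        this.y = y;
        this.z = z;
    }

    /** Renders the record with an invented XML vocabulary; Locale.ROOT keeps '.' as decimal separator. */
    String toXml() {
        return String.format(Locale.ROOT,
                "<pdu type=\"EntityState\">"
                + "<entityId site=\"%d\" application=\"%d\" entity=\"%d\"/>"
                + "<location x=\"%.1f\" y=\"%.1f\" z=\"%.1f\"/>"
                + "</pdu>",
                site, application, entity, x, y, z);
    }
}
```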
There is an ongoing standardization effort within the HLA community, called Real-Time Platform Reference FOM (RPR-FOM), which aims to define a FOM that offers a complete mapping of DIS. JDIS will fully comply with RPR-FOM once the standard specification is completed.
3.4 PDUDB: Event DB Logger and Playback Federate
Playing
the real scenario over and over again for testing and analysis is a time
consuming and tedious effort. A database of the equivalent PDU stream would be
a good solution for selectively playing back segments of a once recorded
scenario. For a prototype version of such a PDU database we used Microsoft’s
Access database and Java servlets for loading as well as retrieving the data
from the database using JDBC.
The PDU logger servlet receives its input via an HTTP POST message in the form of XML-encoded PDU sequences. This input stream is decoded, converted to SQL and stored in the database using JDBC. The DIS header fields common to all PDUs are stored in a separate table from the PDU bodies. This table holds all the attributes of the DIS header, such as the PDU type and timestamp. When PDUs are generated from the database, the timestamp value is used to calculate the frequency with which the PDUs are to be sent. Some PDUs have a variable number of attributes, such as articulation parameters, and these attributes are stored in separate tables so that the entire database is normalized.
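The use of stored timestamps to pace playback can be sketched as follows. This is an assumption-laden illustration, not the PDUDB servlet code: timestamps are taken to be already converted to milliseconds, and the class name, method name and the speedFactor parameter (a simple form of frequency calibration) are hypothetical.

```java
/** Hypothetical helper that derives send delays from recorded PDU timestamps. */
final class PlaybackSchedule {
    /**
     * Returns, for each PDU, how long to wait (in ms) after sending the previous one.
     * The first delay is always 0; speedFactor 2.0 replays twice as fast as recorded.
     */
    static long[] delaysMillis(long[] timestampsMillis, double speedFactor) {
        if (speedFactor <= 0) {
            throw new IllegalArgumentException("speedFactor must be positive");
        }
        long[] delays = new long[timestampsMillis.length];
        for (int i = 1; i < timestampsMillis.length; i++) {
            // Guard against out-of-order records by never waiting a negative time.
            long gap = Math.max(0, timestampsMillis[i] - timestampsMillis[i - 1]);
            delays[i] = Math.round(gap / speedFactor);
        }
        return delays;
    }
}
```

A playback loop would then sleep for delays[i] before emitting PDU i in either DIS or HLA mode.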
Playback is done by another servlet that sends the PDUs generated from the database as the result of a query. The servlet is activated by accessing it from a web browser. Currently the queries are made on timestamps, but other queries can also be made on the database to retrieve other information. The servlet can send the PDUs in either DIS mode or HLA mode.
PDU-DB
also has a dynamic HTML interface to the tables and data stored in the
database. The interface lets you select any of the PDU tables and browse
quickly and efficiently through the individual records or groups in the table.
3.5 SimVis: DirectX based Battlefield Visualizer
In our Pragmatic Object Web approach, we integrate CORBA, Java, COM and WOM based distributed object technologies. We view CORBA and Java as most adequate for the middleware and backend, and COM as the leading candidate for interactive front-ends due to Microsoft's dominance of the desktop market.
Of particular interest for the M&S community seems to be the COM package called DirectX which offers multimedia API for developing powerful graphics, sound and network play applications, based on a consistent interface to devices across different hardware platforms.
Figure 7: Sample screen from CMS simulation in SimVis.
Using DirectX/Direct3D technology, we constructed a real-time battlefield visualizer, SimVis (see Fig. 7), that can operate in both the DIS and HLA modes. SimVis is an NT application written in Visual C++ and it contains the following components: a) an internal HLA-DIS bridge, constructed in a similar way as for Parallel CMS discussed above; b) a fast Winsock library based PDU parser; c) a Direct3D based rendering engine. The PDU parser extracts the battlefield information from the event stream, including the state (e.g. velocity) of vehicles in the terrain, the position and state of mines and minefields, and explosions that occur e.g. when vehicles move over and activate mines. The parser also performs suitable type conversions from the network to NT data formats, constructs suitable memory data structures and passes them to the viewer.
SimVis visual interactive controls include base navigation support in terms of directional keys (left, right, up, down, Home, End) as well as some other modes and options, including: a) various rendering modes (wireframe, flat, Gouraud); b) mounting the camera/viewport on a selected vehicle or plane; c) several scene views such as Front, Top, Right, Left and Satellite Views.
4. APPLICATION EXAMPLE: PARALLEL CMS
We illustrate now how all WebHLA components described above cooperate in one specific HPC FMS application – Parallel CMS.
Figure 8: Parallel CMS demo at SC’98 in Orlando, FL.
CMS is an advanced DIS system under development by the Night Vision Lab at Ft. Belvoir, VA. CMS simulates a broad spectrum of mines and minefields that interact with vehicles, such as those provided by ModSAF, on the virtual battlefield. Modern warfare can require millions of mines to be present on the battlefield, such as in the Korean Demilitarized Zone or the Gulf War. The simulation of such battlefield arenas requires High Performance Computing support. Syracuse University is building Parallel and Metacomputing Support for CMS by porting the CMS module to Origin2000 and linking it with a collection of distributed simulators handling terrain, vehicles and visualization.
Our
early results for Parallel CMS were demonstrated at Supercomputing ’98. The
overall configuration of the demonstrated system, still based on the DIS
communication and the multicast/MBONE networking is illustrated in Fig. 8 and it includes: a) Parallel CMS running
on Origin2000; b) a set of ModSAF vehicles running on SGI workstation; and c)
real-time 3D visualization front-ends, including Mak Stealth and our SimVis
tool described before.
More recently, we completed the process of converting Parallel CMS from a DIS node to an HLA federate, and we also constructed the JDIS bridge that allowed us to effectively treat ModSAF nodes as HLA federates. Finally, we also completed the PDUDB federate that allows us to log simulation events (DIS PDUs or the equivalent HLA interactions) in an SQL database and replay the whole simulation or its selected/filtered segments on demand.
Figure 9: Architecture of WebHLA based Parallel CMS.
The overall configuration of the most recent, HLA-compliant version of the Parallel CMS system is presented in Fig. 9. The Parallel CMS federate runs on Origin2000 at ARL MSRC, while the other modules (ModSAF, JDIS, PDUDB, SimVis) run on NPAC SGI and NT workstations. Using JDIS, we can easily switch between DIS and HLA modes and between real-time and playback simulation modes. The latter is useful for analysis and demonstrations. In particular, we also constructed a mobile laptop demo in the playback mode with the Microsoft Access based PDUDB federate, Javasoft JDK based JDIS, and DirectX based SimVis.
5. WEBHLA BASED METACOMPUTING
Our initial experiments with Parallel CMS described above were promising but they also indicated some missing and urgently needed infrastructure components. We managed to get Parallel CMS running both at CEWES and ARL MSRCs but only in the human-operated demo mode. What we would like to accomplish is a sustained Metacomputing CMS capability i.e. a set of distributed services that are ready to be activated any time by any authorized user in a transparent fashion with respect to the sites, hardware platforms, data decomposition models, parallelization techniques etc. involved in a distributed metacomputing simulation.
Support for such a transparent service model is essential since the individual HPC resources in specific centers might be busy, temporarily unavailable or non-optimal for a particular task. For example, our performance analysis of Parallel CMS indicates that the NUMA architecture of Origin2000 might not be optimal for CMS and hence large scale minefields might be more efficiently implemented on distributed memory architectures. Also, a human-operated multi-MSRC metacomputing application is impractical due to the need for multiple secure logins for each application run. Hence, some suitable security considerations are required to enable automated metacomputing services without compromising the security of individual centers.
In this final section, we summarize the current status of our efforts towards building a WebHLA based Metacomputing infrastructure capable of addressing the management issues listed above.
5.1 Application Challenge: Million Mines CMS
So far, we have been running Parallel CMS on Origin2000 platforms at the ARL or CEWES MSRCs, with the other modules running on the NPAC cluster. The test scenarios run so far included a heavy breach operation through a minefield of some 30,000 mines located in Ft. Knox terrain. The Night Vision Lab at Ft. Belvoir, VA, the primary user and developer of CMS, is interested in simulations including large scale minefields of 1 million and more mines, such as those used during the Gulf War or deployed in the Korean Demilitarized Zone. We intend to use such a “Million Mines CMS Challenge” as an application driver for the WebHLA based Metacomputing framework. To address this challenge, we will need to enforce full scalability of the Parallel CMS federate across a single Origin2000 system, as well as to sustain a metacomputing operation of a set of such HPC platforms or equivalent workstation/PC clusters, processing individual components of a large scale distributed battlefield. The Origin2000 scalability issues were discussed in Section 3.2, whereas here we focus on the metacomputing coordination and integration.
We will use the WebHLA infrastructure developed so far and we will build an automated decomposer that distributes large minefields over a set of machines so that each platform runs a Parallel CMS federate over a suitable minefield component / subset and all such components cooperate within a Parallel CMS Metacomputing Federation. We intend to use initially ARL and CEWES HPC facilities such as Origin2000 and NPAC workstation and PC clusters. In the next step, we will also explore the inclusion of HPC facilities from selected Distributed Centers involved in FMS activities such as SPAWAR or NRL. An initial demo of Metacomputing CMS will be presented during SC99, where we will try to attack the Million Mines CMS Challenge for the first time.
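The decomposer is planned work and its design is not specified here. Purely to illustrate the idea of assigning minefield subsets to machines, the sketch below uses one simple possibility: equal-width strips along a single coordinate. The strip scheme, class name and method name are all assumptions, and a real decomposer would likely also need to balance mine counts and account for machine capacity.

```java
/** Hypothetical decomposer: equal-width strips along x; not the planned WebHLA tool. */
final class MinefieldDecomposer {
    /** Returns, for each mine, the index (0..machines-1) of the machine that would own it. */
    static int[] assignByStrip(double[] mineX, int machines) {
        if (machines < 1) {
            throw new IllegalArgumentException("need at least one machine");
        }
        int[] owner = new int[mineX.length];
        if (mineX.length == 0) {
            return owner;
        }
        double min = mineX[0];
        double max = mineX[0];
        for (double x : mineX) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        double width = max - min;
        for (int i = 0; i < mineX.length; i++) {
            // All mines share one strip when they have identical x coordinates.
            int strip = width == 0 ? 0 : (int) ((mineX[i] - min) / width * machines);
            // The mine at the maximum x would land one past the end, so clamp it.
            owner[i] = Math.min(machines - 1, strip);
        }
        return owner;
    }
}
```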
5.2 Metacluster Management
We intend to use the Million Mines CMS as a driving application to address, bootstrap and test a suite of WebHLA tools for metacomputing management. Our initial focus is on providing fault tolerance support in a distributed heterogeneous environment. Indeed, we would like to experiment with various CMS scenarios, including automated decomposition of minefield components and/or vehicle simulators over currently available distributed resources. Hence, robust fault tolerance support seems to be a critical service to enable and sustain such experimentation runs.
In our approach, we run JWORB servers on all nodes of a distributed environment and we build metacomputing management services by extending the standard JWORB services. This way, we have at our disposal all Web/Commodity services of CORBA, Java, COM and XML, which can be reused, customized or extended towards more specialized metacomputing platforms, scalable from NT or Linux clusters up to multi-MSRC ensembles of HPC systems. Due to the homogeneous and platform neutral JWORB architecture, we can naturally perform initial experiments on smaller clusters and explore scalability towards wide-area configurations in the next stage. So far, we have performed some initial experiments with fault tolerance support in JWORB, using a local area NT + Linux cluster at NPAC and two implementation modes: a) pure CORBA service; and b) Object Web RTI based service. Below, we describe both approaches in more detail.
5.3 Fault Tolerance Support in CORBA/JWORB
The objective of Fault Tolerance in Cluster Management is to avoid disruptions due to hardware or software failure, thereby improving availability of a distributed system. Two main aspects of Fault tolerance are: a) Detection of Failure; and b) Moving the critical system and application resources from the faulty node to another working node. In the following, we focus on the failure detection service.
One of the techniques used in clusters for failure detection is Failover. The essence of this concept is that computers watch one another - if one fails, the other takes over. Each working node sends out simple messages, known as heartbeat signals, around the system. The listener node keeps track of these signals and sets suitable timers. If the signals fail to arrive within the set time frame, the node whose heartbeat didn’t arrive is declared dead and an appropriate failover action is taken.
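A listener-side timeout check of this kind can be sketched in a few lines. The sketch below is illustrative only and is not the JWORB service: the class and method names are hypothetical, node identifiers are plain strings, and time is passed in explicitly so that the logic can be exercised without real clocks or network traffic.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical listener that tracks the last heartbeat seen from each node. */
final class HeartbeatMonitor {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a heartbeat signal from the given node at the given time. */
    void onHeartbeat(String nodeId, long nowMillis) {
        lastSeen.put(nodeId, nowMillis);
    }

    /** Returns the nodes whose last heartbeat is older than the timeout. */
    SortedSet<String> deadNodes(long nowMillis) {
        SortedSet<String> dead = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                dead.add(e.getKey());
            }
        }
        return dead;
    }
}
```

The failover action itself (moving resources to a working node) is outside the scope of this sketch.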
As part of our initial work on building core cluster management techniques in JWORB, we designed and implemented the following two heartbeat algorithms, based on: a) hierarchical/hybrid topology; and b) virtual ring topology.
In a hybrid approach, servers are grouped in a hierarchical order. In each such small group of 5-10 servers with full interconnect only one (central) server communicates with the parent. Any other server in a group can take over the central server role whenever the current one goes down. The algorithm scales well because each server needs to sustain connections only with its siblings in the current group and with its children in the subgroup.
In a Virtual Ring approach, servers are arranged in a logical ring, where each node keeps track of its right and left neighbors. Heartbeat messages form a token ring and the failure detection is immediate. This scheme provides a decentralized environment, obviating the need for a central server and thus eliminating a central point of failure. This algorithm does not scale easily but is very efficient in a small group of servers.
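The neighbor bookkeeping in such a ring can be illustrated as follows. This is a sketch under stated assumptions, not the JWORB implementation: nodes are identified by integers, the ring order is simply ascending ID order with wrap-around, and the names VirtualRing, rightNeighbor and leftNeighbor are hypothetical. Removing a failed node automatically makes its former neighbors adjacent.

```java
import java.util.TreeSet;

/** Hypothetical logical ring over integer node IDs, ordered by ID with wrap-around. */
final class VirtualRing {
    private final TreeSet<Integer> alive = new TreeSet<>();

    void join(int id) { alive.add(id); }

    /** Drops a failed node; its left and right neighbors become adjacent. */
    void markDead(int id) { alive.remove(id); }

    /** Next live node after id; assumes the ring is non-empty. A lone node is its own neighbor. */
    int rightNeighbor(int id) {
        Integer next = alive.higher(id);
        return next != null ? next : alive.first();
    }

    /** Previous live node before id; assumes the ring is non-empty. */
    int leftNeighbor(int id) {
        Integer prev = alive.lower(id);
        return prev != null ? prev : alive.last();
    }
}
```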
Both algorithms discussed above were recently implemented as CORBA services in JWORB and tested at NPAC on a heterogeneous commodity cluster of NT, Linux and UNIX nodes.
5.4 Fault Tolerance Support in Object Web RTI
Based on lessons learned in building core cluster management techniques as CORBA services described above, we developed in the next step similar services using Object Web RTI (see Section 3.1). In this approach, we view the whole cluster as a Federation to which the individual cluster nodes join as federates. Cluster management is implemented by designating individual nodes as federates and using RTI management services. Thus we use the RTI concepts of federations, federates, interactions and shared attributes to build a robust cluster management utility.
Each server sends out its attribute updates at regular intervals. These serve as heartbeat signals which denote that a particular server is alive. The central server subscribes to these attributes and thus receives a notification from the RTI for each update. Whenever a heartbeat signal is received, the central server sets the flag for that server to true. The central server also maintains a thread which checks the flags for each federate at regular intervals. If the flag for any federate remains false for more than 3 iterations, the central server tries to communicate with that node. If a response is received, the timer is reset and normal execution continues, whereas if no response is received, the node is declared dead and a suitable notification is sent as an RTI interaction to all machines in the group.
The central server also sends out regular interactions as its own heartbeat signals and each of the federates subscribes to these interactions. Each federate maintains a timer which checks for these heartbeat signals from the central server. If the central server is dead, the timer goes off for each of the federates. The server which has the lowest ID takes over the central server role and broadcasts this information to the whole group. Thus the cluster is robust to the failure of any node and it automatically takes suitable corrective actions.
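The flag-and-counter logic of the central server and the lowest-ID takeover rule can be sketched as below. This is an illustration only: RTI subscriptions, interactions and the direct probe are replaced by plain method calls, and all names (CentralServerWatch, onAttributeUpdate, probeResult, electCentralServer) are hypothetical rather than part of OWRTI.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical central-server bookkeeping; RTI callbacks are modeled as method calls. */
final class CentralServerWatch {
    static final int MAX_MISSED_CHECKS = 3;
    private final Map<Integer, Boolean> heardSinceLastCheck = new TreeMap<>();
    private final Map<Integer, Integer> missedChecks = new TreeMap<>();
    final List<Integer> declaredDead = new ArrayList<>();

    void register(int federateId) {
        heardSinceLastCheck.put(federateId, false);
        missedChecks.put(federateId, 0);
    }

    /** Stands in for the RTI notification of an attribute update (heartbeat). */
    void onAttributeUpdate(int federateId) {
        if (heardSinceLastCheck.containsKey(federateId)) {
            heardSinceLastCheck.put(federateId, true);
        }
    }

    /** One pass of the checking thread; returns federates silent for more than 3 passes. */
    List<Integer> check() {
        List<Integer> suspects = new ArrayList<>();
        for (Integer id : new ArrayList<>(heardSinceLastCheck.keySet())) {
            if (heardSinceLastCheck.get(id)) {
                missedChecks.put(id, 0);
            } else {
                int missed = missedChecks.get(id) + 1;
                missedChecks.put(id, missed);
                if (missed > MAX_MISSED_CHECKS) {
                    suspects.add(id);
                }
            }
            heardSinceLastCheck.put(id, false); // clear the flag for the next interval
        }
        return suspects;
    }

    /** Outcome of contacting a suspect directly: reset its counter or declare it dead. */
    void probeResult(int federateId, boolean responded) {
        if (responded) {
            missedChecks.put(federateId, 0);
        } else {
            heardSinceLastCheck.remove(federateId);
            missedChecks.remove(federateId);
            declaredDead.add(federateId); // a real system would now send an RTI interaction
        }
    }

    /** Takeover rule when the central server fails: the lowest surviving ID wins. */
    static int electCentralServer(Collection<Integer> aliveIds) {
        return Collections.min(aliveIds);
    }
}
```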
5.5 Towards WebHLA Component Management
Our initial WebHLA based metacomputing experiments described above were focused on system level services such as fault tolerance and failover support for cluster management. In the next step, we plan to extend this approach by including also the user and application degrees of freedom and by performing mapping on suitable HLA constructs. Indeed, in a distributed multi-user environment, each node can run modules (to be mapped on some user level federates) that are owned by various users and are associated with various distributed applications (to be mapped on some user level federations). Hence, in a similar way as each cluster node can represent itself as a system level federate, participating in the Cluster Federation (CF), all user level federates on a given node would join Node Federation (NF), and all federate modules forming a particular distributed application would join Application Federation (AF). A set of heartbeat monitors, running across these three federations would assure high availability of the whole cluster, failover support for all nodes and fault tolerance for all distributed applications.
The WebHLA approach sketched above will allow us to integrate in an elegant way three apparently different management techniques and computational dimensions of metacomputing: cluster management as in CF, component management as in NF and distributed application management as in AF, here all mapped onto suitable HLA management services.
6. SUMMARY
We presented here our approach towards building WebHLA as an interoperable simulation environment that implements new DoD standards on top of new Web / Commodity standards. In particular, we illustrated how HLA/RTI can cooperate with Java (used in JWORB, OWRTI, JDIS), CORBA (used in JWORB, OWRTI), XML (used as alternative representation for FED files and as a wire format for DIS PDUs) and Microsoft COM (used via DirectX in SimVis front-end). Parallel CMS described above is a rather sophisticated FMS application and yet we were able to construct it in a low budget academic R&D environment by making the optimal use and exploring fully the synergy between all involved Web/Commodity technologies listed above.
Hence, our initial results are encouraging and we therefore believe that WebHLA will evolve towards a powerful modeling and simulation framework, capable of addressing new challenges of DoD and commodity computing in many areas that require federation of multiple resources and collaborative Web based access, such as Simulation Based Design and Acquisition.
We also illustrated our initial experiments towards building general purpose Metacomputing infrastructure on top of WebHLA that would be applicable also for other CTAs within the DoD HPC Modernization Program. In the first set of demos in this area, planned for SC99, we intend to use Million Mines CMS Challenge as a large scale FMS application driver that will allow us to quantify the metacomputing requirements and to test the system on some realistic large scale application of relevance for the DoD users.
7. ACKNOWLEDGEMENTS
This work was partially funded by the DoD High Performance Computing Modernization Program’s Programming Environment and Training (PET) project, including ARL Major Shared Resource Center support, Contract Number: DAHC 94-96-C-0010 with Raytheon Systems Company, and CEWES Major Shared Resource Center support, Contract Number: DAHC 94-96-C-0002 with Nichols Research Corp.
8. REFERENCES
Defense Modeling and Simulation Office (DMSO) 1998, High Level Architecture, http://hla.dmso.mil.
D. Bhatia, V. Burzevski, M. Camuseva, G. Fox, W. Furmanski and G. Premchandran, 1997, WebFlow - a visual programming paradigm for Web/Java based coarse grain distributed computing , Concurrency: Practice and Experience, vol. 9, no. 6, June 97, 555-577.
Robert Orfali and Dan Harkey, 1998, Client/Server Programming with Java and CORBA , 2nd Edition, Wiley.
G. C. Fox, W. Furmanski, H. T. Ozdemir and S. Pallickara, 1999, Building Distributed Systems for the Pragmatic Object Web, Wiley, (in progress).
G. C. Fox, W. Furmanski, S. Nair, H. T. Ozdemir, Z. Odcikin Ozdemir and T. A. Pulikal, 1998, WebHLA - An Interactive Programming and Training Environment for High Performance Modeling and Simulation, In Proceedings of the SISO Simulation Interoperability Workshop, SIW Fall 98, paper SIW-98F-216, Orlando, FL, Sept 14-18, 1998.
G. C. Fox, W. Furmanski and H. T. Ozdemir, 1998, Java/CORBA based Real-Time Infrastructure to Integrate Event-Driven Simulations, Collaboration and Distributed Object / Componentware Computing, In Proceedings of PDPTA’98, Las Vegas, NV, July 98.
G. C. Fox, W. Furmanski, S. Nair, H. T. Ozdemir, Z. Odcikin Ozdemir and T. A. Pulikal, 1999, WebHLA - An Interactive Multiplayer Environment for High Performance Distributed Modeling and Simulation, In Proceedings of the International Conference on Web-based Modeling and Simulation, WebSim99, San Francisco, CA, January 17-20, 1999.