Focus Effort Title: HLA Integration for HPC Applications Applied to CMS
Thematic Area(s): Scalable Computing Migration
PI Name: David Bernholdt
PI EMail Address: bernholdt@npac.syr.edu
PI Telephone: 315 443 3857
PI Fax: 315 443 1973

Project Description:
This proposal responds to two related goals: first, the need for message-passing-based coarse-grain parallelism for CMS; and second, the integration of classic HPC applications into FMS simulations. The latter is essentially equivalent to producing HLA-compliant versions of these applications. Both goals can be addressed by a distributed (or meta) computing environment built on top of HLA. Because the associated Run-Time Infrastructure (RTI) supports time management, this approach builds in a powerful scheduling model. We describe below a prototype of these ideas built around the CMS application. The same basic infrastructure can be applied to applications built under our "web interface" tasks to integrate them into HLA-compliant simulations.
From the parallel architectures point of view, this project will develop a distributed memory version of CMS, which can then be easily ported to distributed memory MPPs such as the SP-2. From the metacomputing management point of view, this task develops and tests tools for runtime management of distributed resources, initially prototyped in a controlled commodity cluster environment and, in the next stage, expanded to a multi-MSRC metacomputing platform. Finally, from the parallel hardware availability point of view, this task will bring the NPAC commodity cluster, which includes a mix of Linux and NT PCs, into use as one of the HPC resources available for our Metacomputing CMS experiments.
By combining the metacomputing and parallel computing aspects of WebHLA, it will be possible to consider large-scale CMS simulations involving instances of the code running on multiple HPC systems at multiple sites, with each instance parallelized either in a distributed memory fashion via WebHLA or in a shared memory fashion via the separate parallel CMS project and past work.
The WebHLA system will operate in a cluster (distributed computing) environment as follows. Each node of the cluster will run JWORB with the OWRTI service, so each node can represent itself as a system-level federate participating in the Cluster Federation (CF). All user-level federates on a given node join the Node Federation (NF), managed by the local JWORB server on that node. Finally, all federate modules forming a particular distributed application join the Application Federation (AF). A set of heartbeat monitors running across these three federations ensures high availability of the whole cluster, failover support for all nodes, and fault tolerance for all distributed applications. This WebHLA-based Cluster Management service will then be ported to the CEWES and ARL MSRCs to support a multi-MSRC Metacomputing CMS application running continuously or on demand in a robust, fault-tolerant, highly available, self-sustained mode. The proposed WebHLA-based Meta-Cluster infrastructure is also expected to be useful for other HPC applications that need to federate distributed multi-MSRC resources.

Benefits:
The DoD has mandated the use of HLA for modeling and simulation applications within a short time frame (by 2004). The increasing use of high-fidelity physics-based simulations in conjunction with more traditional FMS applications means that a growing number of DoD HPC applications will have to become HLA compliant in the coming years.
This project will result in tools that make it easier for HPC applications and HPC systems to take part in HLA-based modeling and simulation exercises, in accord with the DoD mandate.

Required Resources:

Deliverables:
o WebHLA-based cluster management, including CF, NF and AF Object Models (Jun'99)
o Distributed CMS running on the NPAC commodity cluster (Sep'99)
o NPAC commodity cluster included in Metacomputing CMS experiments (Jan'00)
o Cluster management tools expanded to the metacomputing environment in support of multi-MSRC parallel CMS (Mar'00)
o Technical Report (Mar'00)

Notes:
Installing and maintaining a WebHLA installation as described, in order to support on-demand use, will require a modest amount of assistance from on-site staff of the CEWES MSRC. More specifically:
a) A CEWES system administrator or other appropriate individual to assist with the installation of WebHLA, ModSAF, and CMS (expected to require no more than a week)
b) Routine on-site support for JWORB acting as the WebHLA CEWES Ambassador (similar to maintaining a web server with a small, fixed number of static pages)
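
Illustrative sketch: the Cluster, Node and Application Federation scheme in the Project Description, together with its heartbeat monitors, might be modeled roughly as follows. This is a minimal Python sketch under stated assumptions, not the JWORB or OWRTI interface: the `Federation` class, the `build_federations` helper, the node and federate names, and the 10-second timeout are all hypothetical, and time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Federation:
    """Hypothetical stand-in for a federation; not the OWRTI API."""
    name: str
    timeout: float  # seconds of silence after which a member counts as failed
    last_seen: Dict[str, float] = field(default_factory=dict)

    def join(self, member: str, now: float) -> None:
        # Joining counts as the member's first heartbeat.
        self.last_seen[member] = now

    def heartbeat(self, member: str, now: float) -> None:
        if member not in self.last_seen:
            raise KeyError(f"{member} has not joined {self.name}")
        self.last_seen[member] = now

    def failed(self, now: float) -> List[str]:
        # Members whose last heartbeat is older than the timeout.
        return sorted(m for m, t in self.last_seen.items()
                      if now - t > self.timeout)


def build_federations(
    placement: Dict[str, str], app: str, timeout: float, now: float
) -> Tuple[Federation, Dict[str, Federation], Federation]:
    """placement maps each user-level federate to the node it runs on."""
    cf = Federation("CF", timeout)            # one system-level federate per node
    af = Federation(f"AF:{app}", timeout)     # all federates of one application
    nfs: Dict[str, Federation] = {}           # one Node Federation per node
    for federate, node in placement.items():
        if node not in nfs:
            cf.join(node, now)
            nfs[node] = Federation(f"NF:{node}", timeout)
        nfs[node].join(federate, now)
        af.join(federate, now)
    return cf, nfs, af


# Two hypothetical nodes, each running one federate of a distributed CMS job.
placement = {"cms-0": "node1", "cms-1": "node2"}
cf, nfs, af = build_federations(placement, "cms", timeout=10.0, now=0.0)

# At t=5 only node1 and its federate report in; node2 stays silent.
cf.heartbeat("node1", 5.0)
nfs["node1"].heartbeat("cms-0", 5.0)
af.heartbeat("cms-0", 5.0)

# At t=12 the monitors flag everything that has been silent for over 10 s.
failed_nodes = cf.failed(12.0)
failed_federates = af.failed(12.0)
print(failed_nodes, failed_federates)
```

In this sketch a silent node shows up in the CF monitor and its federates show up in the AF monitor, which is the information a failover mechanism would need in order to restart the affected part of the application elsewhere. How the actual monitors exchange heartbeats through the RTI is not specified here.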