NPAC Support for Sandia Commodity Clustering (C-Plant) Technologies – Pilot Project Final Report (Draft)
Northeast Parallel Architectures Center, Syracuse University, Syracuse NY
November 22, 1998
1. Executive Summary
2. Sandia C-Plant Control Software Requirements
3. JWORB based Approach Proposed by NPAC
3.1. Pragmatic Object Web
3.2. Java Web Object Request Broker (JWORB)
3.3. High Performance commodity computing (HPcc)
3.4. Relation to Other Metacomputing Systems
3.5. Recommended Architecture
3.6. High Level Architecture for Meta-Computing Federation
4. Tasks Accomplished So Far
4.1. Commodity Cluster Testbed Installation at NPAC
4.2. Core Cluster Management Functions in JWORB: Heartbeat, Failover
4.3. WebFlow as Visual Componentware Dataflow Facility for JWORB
4.4. XML Support in JWORB for Uniform Middleware Datastore
4.5. HLA Federation Model for Meta-Metacomputing
4.6. Building JWORB based Distributed + Remote Computing Application Demo
5. Unanswered Questions and Possible Next Steps
5.1. Thin versus Fat Nodes
5.2. Choice of the Fine Grain HPC Cluster Management Software
5.3. Security
5.4. Resource Database
5.5. Load Balancing
5.6. Metadata Framework
5.7. High Level Application Frameworks
5.8. Specific Application Requirements for Sandia, C-Plant and ASCI
6. Related 1998 Publications
We report here on the results of our pilot project with Sandia Labs, conducted in summer and fall 1998 and focused on providing initial support in the area of commodity cluster management software for the Sandia C-Plant architecture. We summarize the requirements identified during the pilot project phase (Chapter 2), describe our proposed architecture based on our JWORB (Java Web Object Request Broker) server, which integrates the commodity technologies of Java, CORBA, COM and XML (Chapter 3), summarize the tasks accomplished so far (Chapter 4), collect the list of unanswered questions that could provide natural next steps for a possible follow-on project (Chapter 5), and include a list of our 1998 papers that illustrate the positive response to our ideas and prototypes from various computational communities (Chapter 6). Attached to this report are our SC98 demo handout (described in Section 4.6) and our recent RCI report, which offers a comprehensive review of our recent work in various domains.
Based on NPAC's visit to Sandia in June and the follow-on email exchanges, we collected the necessary information and constructed an effective set of requirements to be imposed on the C-Plant control software design addressed by our pilot project. The requirement list includes the following items:
· The software should run on heterogeneous commodity clusters, including Linux, NT and Solaris nodes;
· The software should support or be compatible with the cluster management utilities under development by Sandia, such as bebopd, pct, pingd and yod;
· The overall architecture should be scalable and capable of delivering robust supercomputing services;
· The software should be consistent with the overall ASCI-wide DRM architecture based on 7 conceptual layers;
· The software should facilitate interoperability between C-Plant nodes and other entities managed by a variety of metacomputing models, including Globus, Legion and others;
· The software should be compatible with high performance commodity networking modules such as VIA;
· Applications include both classic HPCC simulations and emerging manufacturing requirements of linking to persistent object product databases and of supporting many engineers whose individual needs could be quite modest.
Based on the analysis of the requirements listed above, we proposed a model and methodology for C-Plant control and management software based on the JWORB (Java Web Object Request Broker) middleware architecture, developed recently by NPAC and now being adapted for several application domains. We describe our proposed architecture here in more detail. Note that JWORB is standards-compliant with respect to the four commodity object models described in Section 3.1, so that solutions built on JWORB can be re-implemented in terms of commercial servers as these become available.
We observed that today's and coming distributed and distance computing technologies are increasingly influenced and shaped by four major commodity software forces: Java by Sun Microsystems, CORBA by the Object Management Group, DCOM by Microsoft, and the new World-Wide Web Consortium technologies such as XML and HTTP-NG. We believe that the best results in robust distributed systems engineering can be obtained today by mix-and-match and domain-specific adaptation of these four partially complementary and partially competing technology domains. We term this emergent environment the Pragmatic Object Web (POW) and refer to the four major players as POW Stakeholders. We are formulating our approach in the forthcoming Wiley book Building Distributed Systems on the Pragmatic Object Web – The Best of Web, Java, CORBA and COM. The attached RCI report New Systems Technologies and Software Products for HPCC: Volume III – High Performance Commodity Computing on the Pragmatic Object Web contains a more HPC-centered overview of our concepts, prototypes and recommendations. Our other 1998 publications, summarized in the attached list, contain specific projections of our concepts onto individual computational domains and communities such as Modeling and Simulation, Test and Evaluation, High Performance Computing, Web based Computing and Metacomputing.
Central to our 3-tier approach is a powerful multipurpose middleware layer, given by a mesh of interacting JWORB (Java Web Object Request Broker) servers. JWORB is a multi-protocol network server implemented in Java. It currently supports and integrates the HTTP and IIOP protocols, i.e. it can act simultaneously as a Web server and as a CORBA client, broker or server. We are also building JWORB support for ORPC (the Microsoft DCOM protocol) and for XML streaming. Being based on a set of open and actively developed standards, JWORB is functionally equivalent to a collection of servers dedicated to the individual protocols listed above, but our integrated approach offers substantial economy in middleware software development, installation, management and support. On top of the JWORB "software bus", we build a set of core services and high level horizontal (generic) and vertical (domain specific) facilities, following the overall software architecture of CORBA. We differ from OMG by: a) developing prototype implementations, not only specifications, of all design components; b) mixing-and-matching design components from other competing technologies (such as COM or the Web); and c) moving faster and more directly towards distributed and distance computing services and facilities in support of advanced metacomputing applications.
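To illustrate the multi-protocol idea, the sketch below shows one way a single Java server can share a port between HTTP and IIOP traffic by peeking at the leading bytes of each connection (HTTP requests start with a method name such as GET, while IIOP messages start with the GIOP magic bytes). This is a minimal illustration only, not the actual JWORB code, and the handler names are placeholders.

    // Illustrative sketch only -- not the actual JWORB implementation.
    import java.io.*;
    import java.net.*;

    public class MultiProtocolServer {
        public static void main(String[] args) throws IOException {
            ServerSocket listener = new ServerSocket(8080);
            while (true) {
                Socket s = listener.accept();
                PushbackInputStream in = new PushbackInputStream(s.getInputStream(), 4);
                byte[] head = new byte[4];
                int n = in.read(head);
                if (n < 4) { s.close(); continue; }          // too short to classify
                in.unread(head, 0, n);                       // put the bytes back for the real handler
                String magic = new String(head, 0, 4, "ISO-8859-1");
                if (magic.equals("GIOP")) {
                    handleIiop(s, in);                       // hand off to the CORBA/IIOP machinery
                } else {
                    handleHttp(s, in);                       // hand off to the Web/HTTP machinery
                }
            }
        }
        // Placeholders for the protocol-specific handlers.
        static void handleIiop(Socket s, InputStream in) { /* ... */ }
        static void handleHttp(Socket s, InputStream in) { /* ... */ }
    }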
The JWORB based approach is an example and a specific instantiation of our more general HPcc (High Performance commodity computing) concept, which recommends building modern HPC on top of Web/commodity technologies. In this approach, High Performance is simply one of several "ilities" addressed by CORBA, to be provided by a set of suitable Quality of Service add-ons and associated Interceptors that assure access to high bandwidth communication channels and dedicated HPC hardware and software modules, but only when really required by the application. In such a conceptually 3-tier model, most of the control and management tasks are performed by the high functionality, modest performance commodity software in the middle tier (JWORB), and control is passed to the (expensive) high performance back-end tier only to handle dedicated, computationally or bandwidth intensive operations. Experience with realistic large scale applications shows that such genuine HPC components can often be represented by and encapsulated in a relatively small fraction of the whole application code, even if the associated HPC computation might consume a large fraction of the global CPU time used by the application.
Current generation metacomputing environments such as Globus or Legion follow an alternative, bottom-up development strategy based on custom rather than commodity software. Such systems are tuned for, and can more directly offer, high performance on small focused problems, but they can hardly facilitate the development of high functionality, realistic large scale metacomputing applications of relevance to the ASCI mission. We view Globus as a useful low level metacomputing toolkit which is, however, incomplete for building a comprehensive high functionality middleware infrastructure for broad based metacomputing. We believe that building high level metacomputing services such as resource management or security can be accomplished much more efficiently on top of commodity technologies and standards than in a heroic from-scratch development effort. In a similar way, Legion can be viewed as a custom distributed object system, and hence several of its components can now be replaced by a suitable mixture of commodity technologies from the POW arsenal, most notably CORBA and DCOM. We can take several design lessons from Globus and Legion when building our environment, but some specific implementations need to be reworked in our new overall software engineering framework based on commodity technologies and standards.
On the other hand, our approach naturally matches the 7 conceptual layers of the ASCI DRM model, with our JWORB based tier 2 corresponding to the core DRM Resource Management Service, the DRM User + Security + User Interface layers mapped onto our Tier 1 services and facilities, and the DRM Resources, Resource Interfaces and Resource Policies mapped onto our Tier 3 services and facilities. It will clearly take a lot of experimentation and prototyping to fully specify, implement, test and fine-tune the individual components of the ambitious DRM design. We feel that one should start this process from the middleware "software bus" layer and use the Java platform which, due to its portability, growing installation base and functionality, now offers an excellent programming environment for rapid prototyping of powerful middleware. We therefore propose to use the JWORB model as a prototype implementation framework for ASCI DRM, applicable to all ASCI platforms and initially focused on commodity clusters such as C-Plant.
In the simplest, fully homogeneous and perhaps somewhat idealistic mode, a JWORB server operates on each node of a distributed or distance computing environment, and hence on each node of a C-Plant cluster. In a reality-checked, fine-tuned and performance-optimized production mode, a more structured, hierarchical and heterogeneous approach will likely be more appropriate and/or practical. For example, the C-Plant project currently experiments with fat (service) and thin (compute) nodes and develops small dedicated control software modules for managing such scalable units. In such a host-node model, JWORB would fit more naturally on the service/host nodes. However, following our HPcc philosophy, we recommend for the initial bootstrap phase using JWORB as a core component present on all cluster nodes, perhaps scaled down later to more optimized, dedicated daemons on the compute nodes. Thus we suggest clarifying the required functionality and interconnect features of such node daemons after they are prototyped, tested and fine-tuned using the simpler and more robust (albeit potentially lower performance) JWORB based homogeneous model.
With a JWORB server on each node of a metacomputing cluster, the overall software architecture becomes clean and powerful. Node application modules are written in the developers' favored languages and are wrapped as CORBA, COM or Java components conforming to some high level programming model. One such distributed model, WebFlow, offering AVS-like computational-graph based dataflow with visual authoring tools, is bundled with JWORB as a default horizontal facility for high level visual metacomputing. A mesh of JWORB servers manages itself via a suite of core cluster management services such as heartbeat and failover, and it communicates with the tier 3 computational modules and with the tier 1 user interface modules using suitable component wrappers and one of the JWORB supported protocols. For example, HTTP can naturally be used for communication with browsers, IIOP for communication with front-end ORBlets or back-end CORBA servers, XML and Java for database linkage, ORPC for communication with front-end ActiveX controls or back-end DCOM servers, etc.
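To make the component wrapping idea concrete, the sketch below gives a hypothetical Java contract for a dataflow module in the spirit of WebFlow. The interface and method names are illustrative assumptions, not the actual WebFlow or JWORB API; a module implementing such a contract could then be exposed by the middleware as a CORBA object, JavaBean or COM component.

    // Hypothetical sketch of a dataflow module contract; names are illustrative.
    public interface DataflowModule {
        String[] inputPorts();                       // names of expected inputs
        String[] outputPorts();                      // names of produced outputs
        void setInput(String port, Object value);    // middleware delivers an input
        Object getOutput(String port);               // middleware collects an output
        void run() throws Exception;                 // execute once all inputs are bound
    }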
Detailed selection of specific protocols from the available JWORB suite is left to application developers. Our recommended approach to optimal usage of the POW assets is the following (a small conversion sketch follows the list):
· Use Java as the node implementation language, but minimize the use of Java RMI until Sun improves its implementation and clarifies its relationship with IIOP. The Java Grande Forum has proposed high performance enhancements to RMI; however, one should monitor new developments in Java distributed computing, such as Jini and UC Berkeley's Ninja project, which might have near term implications for commodity cluster computing.
· Use COM for high performance desktop computing (e.g. DirectX for fast real-time visualization) and SAN (System Area Network) computing (e.g. VIA for fast, optimized network communication).
· Use CORBA primarily for Intranet scale distributed computation. Individual labs or projects will typically use ORB products from different vendors, and wide-area inter-ORB interoperability, even if formally assured and enforced by CORBA 2, is yet to be verified and fully tested for large scale Internet applications.
· Use XML for wide-area communication and data transfers between geographically separated labs, different organizations and their security domains. Convert XML data streams to your favored distributed object framework, process on your local Intranet, and transform the results back to XML for shipment to other sites or domains.
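As an illustration of the last recommendation, the sketch below reads an incoming wide-area XML stream into a DOM tree that local middleware can then map onto its favored object framework. For brevity it uses the standard javax.xml.parsers API rather than the specific parser packages named in Section 4.4, and the class name is an assumption.

    // Minimal sketch: parse an incoming XML stream into a DOM tree for local processing.
    import java.io.InputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    public class XmlIngest {
        public static Document parse(InputStream wideAreaStream) throws Exception {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(wideAreaStream);
            Element root = doc.getDocumentElement();
            System.out.println("Received document of type: " + root.getTagName());
            return doc;   // hand the DOM tree to CORBA/COM/Java wrappers as needed
        }
    }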
Even if we believe JWORB is a good and complete model, there will always be other schools of thought that lead to other metacomputing systems we need to federate with. We feel that federation technologies should come from outside the core POW technology suite, as none of the current Stakeholders has significant experience in federating with competitive approaches. In our opinion, the most promising standard in this area is currently offered by the High Level Architecture (HLA) from the DoD Defense Modeling and Simulation Office (DMSO). DoD Modeling and Simulation (M&S) is perhaps the most advanced area of federally funded distributed computing, so far pursued quietly and in separation from the HPC community, but now being merged with it via the DoD HPC Modernization Program (HPCMP). NPAC leads the academic part of the FMS (Forces Modeling and Simulation) technology area within the HPCMP and we therefore have first-hand insight into this field.
As a large, complex and heavily funded domain (representing some 10% of the overall DoD software budget), DoD M&S developed a spectrum of systems and paradigms, represented by different time management schemes, object models, communication strategies etc. These various simulation systems were pursued independently until the mid-90s, but they are now being forced to interoperate and federate due to DoD downsizing and the associated budget cuts. The resulting HLA architecture for federating independent simulation subsystems is a high quality, future oriented, domain independent IEEE standard. It is currently being proposed by DMSO to OMG as a new advanced CORBA Federation Facility, and it is already mandated as the DoD-wide simulation interoperability architecture to be fully enforced by the year 2001.
Further developments in HLA are monitored and recommended by the DoD AMG (Architecture Management Group) and facilitated by SISO (Simulation Interoperability Standards Organization). Large manufacturers such as Boeing appear to be seriously involved in HLA, as this standard is likely to have a major impact on Virtual Prototyping, Concurrent Engineering, Collaborative Manufacturing and Simulation Based Acquisition. Overseas and non-DoD applications of HLA are starting to take off now that the standard has stabilized and been passed to IEEE in 1997/1998.
We are involved in HLA activities within the DoD HPCMP and we also view HLA as a promising broader framework for Meta-Metacomputing, i.e. for federating various metacomputing domains. In our JWORB framework, we recently developed a set of HLA related services, most notably the Object Web RTI, i.e. a full implementation of DMSO RTI 1.3, which represents the middleware communication layer or software bus of the HLA infrastructure. HLA federations are built, often from pre-existing application modules or subsystems, by specifying a Federation Object Model (FOM) which defines a set of distributed objects to be supported by the RTI and to be available for sharing by all federates forming a federation. Each federate can join an existing federation or start a new federation, and it interacts with other federates via the FOM elements. Federates communicate by publishing and subscribing to selected FOM objects or their attributes and/or by sending interaction objects (i.e. discrete events). Both interaction delivery and shared attribute updates are handled transparently by the RTI via performance and bandwidth optimized routing spaces.
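The hypothetical Java sketch below illustrates this federate life cycle. The RtiAmbassador interface and its method names are illustrative stand-ins chosen to mirror the HLA concepts (create/join federation, publish/subscribe FOM classes, update attributes, send interactions), not the actual DMSO RTI 1.3 or OW-RTI API.

    // Hypothetical sketch of the federate life cycle; the interface is illustrative only.
    public class VehicleFederate {
        interface RtiAmbassador {
            void createFederationExecution(String federation, String fomFile);
            void joinFederationExecution(String federateName, String federation);
            void publishObjectClass(String fomClass);
            void subscribeObjectClassAttributes(String fomClass, String[] attributes);
            void updateAttributeValues(String fomClass, String attribute, byte[] value);
            void sendInteraction(String interactionClass, byte[] parameters);
        }

        void run(RtiAmbassador rti) {
            rti.createFederationExecution("MinefieldDemo", "minefield.fed"); // no-op if it already exists
            rti.joinFederationExecution("Vehicle-1", "MinefieldDemo");
            rti.publishObjectClass("Vehicle");                         // we own Vehicle objects
            rti.subscribeObjectClassAttributes("Minefield",            // we observe minefield state
                                               new String[] { "MineLocations" });
            rti.updateAttributeValues("Vehicle", "Position", new byte[] { /* encoded x,y */ });
            rti.sendInteraction("MineDetonation", new byte[0]);        // discrete event to other federates
        }
    }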
In this pilot project, which effectively started in July after our visit to Sandia, we addressed a set of initial tasks towards building JWORB based commodity cluster management software with HLA based federation support for meta-metacomputing. We installed at NPAC a small testbed PC cluster to enable experiments with various software platforms, we developed the overall design for the JWORB services as outlined above, and we prototyped selected components including heartbeat/failover and XML support for resource database management. We discuss below the specific tasks accomplished so far, followed by the list of unanswered questions or pending issues that indicate possible next steps for a follow-on project.
We specified, ordered and installed the hardware and software required to construct a simple heterogeneous commodity cluster to act as a testbed for this project. Later on, we intend to reconstruct at NPAC a CPLANT configuration in as much detail as possible, but we decided to start with some natural commodity components to give the project a quick start. Our cluster includes:
· 4 NT workstations, running standard Microsoft desktop software, managed by two NT servers running Microsoft Cluster Server and other BackOffice servers (SQL Server, Exchange Server); and
· a 4-node Linux cluster, managed by Beowulf and running standard GNU and/or RedHat utilities including compilers, thread libraries etc.
Each node (in both the NT and Linux sub-clusters) also runs a JWORB server. The mesh of these servers is connected via the core cluster management services described below and also enables higher level frameworks such as WebFlow dataflow, the XML datastore and HLA federation, discussed in the following sections.
A mesh of JWORB servers can form a cluster provided that each server knows which other servers are alive. Hence, one of the issues involved in cluster management is high availability. The basic technique used in clusters to achieve high availability is failover. The concept is simple enough: one computer watches another, and if one dies the other takes over. An important issue is therefore to provide support for reliable and efficient detection of node failure, followed by a move of the critical system and application resources from the faulty node to another working node.
Fault tolerance support in CORBA is at the level of an RFP and a set of pending proposals. Java and XML do not offer any solution yet. COM offers a proprietary solution called Wolfpack or Microsoft Cluster Server. While monitoring the CORBA fault tolerance activities and installing Wolfpack on the NPAC cluster, we decided to develop simple failover support in JWORB as part of this project. To provide this functionality, the following two algorithms have been designed and implemented.
HeartBeat In this approach, a simple HeartBeat object is implemented as a CORBA object. This object is activated whenever the server starts and tries to discover the other servers that are running. Simplistic heartbeat functionality can be provided either by fully interconnecting the cluster or by dedicating one node to act as a central heartbeat monitor. However, the former solution is not scalable and the latter is not itself fault tolerant.
Therefore, our current design builds a hierarchical representation of the servers. Each sub-cluster has 5-10 servers, each group's representative forms another such group at the next higher level, and so on. In each group only one server takes the central server role, and it communicates with the other central servers in the group one level up in the hierarchy.
The main elements of this service are the HeartBeatDaemon and the HeartBeatListener. The HeartBeatDaemon offers an interface for providing information about the state of the cluster, based on registration with this service. It also starts up the HeartBeatListener object. The HeartBeatListener sends periodic messages to its peers and updates the state information by informing the HeartBeatDaemon object. If the central server dies, its role is taken over by the group member with the minimal ID value.
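The sketch below condenses the failover rule described above (heartbeat timeouts and the minimal-ID takeover) into a few lines of Java. The class and method names are hypothetical; the real JWORB service is a CORBA object wired to the hierarchical group structure.

    // Minimal sketch of heartbeat bookkeeping and central-server failover; names are hypothetical.
    import java.util.SortedMap;
    import java.util.TreeMap;

    class HeartBeatGroup {
        static final long TIMEOUT_MS = 5000;
        private final SortedMap<Integer, Long> lastSeen = new TreeMap<Integer, Long>();
        private int centralId;

        // Called whenever a heartbeat message arrives from a group member.
        synchronized void beat(int memberId) {
            lastSeen.put(memberId, System.currentTimeMillis());
        }

        // Run periodically on every member: if the central server has gone quiet,
        // the member with the minimal ID takes over its role.
        synchronized void checkCentral() {
            Long seen = lastSeen.get(centralId);
            if (seen == null || System.currentTimeMillis() - seen > TIMEOUT_MS) {
                lastSeen.remove(centralId);              // declare the central server dead
                if (!lastSeen.isEmpty()) {
                    centralId = lastSeen.firstKey();     // minimal ID becomes the new central
                }
            }
        }
    }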
Keep Alive Ring In this design a cluster or sub-cluster is created by forming a virtual ring of JWORB servers. This has been implemented by creating a set of CORBA objects. When a server is activated it can either join a cluster or create a new sub-cluster. Since there is no central server, a single point of failure is eliminated, and any number of servers can form a cluster.
When a server joins a cluster, the ring is reorganized to include the new server. Each entity in the ring is aware of its left and right neighbors; it sends periodic messages to its right neighbor and checks for messages from its left neighbor. Thus any failure or crash is promptly detected and the cluster is reorganized. If the right node dies, a warning message is reflected back to the left node and its left neighbors until all active nodes are notified and ready to adjust the ring topology.
The main elements of this service are the Daemon, the Cluster Monitor and the Ring Manager. The Daemon is invoked when the server is activated and provides the functionality of joining a cluster or creating a sub-cluster; it also starts the Ring Manager. The Cluster Monitor is a visual interface that provides information about the state of the cluster, including the number of servers, the current servers and the deactivated servers, as well as details about current servers such as machine name, neighbors etc. The Ring Manager creates threads for sending and receiving messages and adjusts the ring when new servers join. When an alive message is not received, it tries to contact the neighbors of the deactivated server to reorganize the cluster.
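Similarly, the sketch below captures the ring bookkeeping in hypothetical Java form; the actual implementation uses CORBA objects and message-passing threads inside JWORB. Servers are kept in ring order, each server pings its right neighbor, and a dead server is simply removed so that its left and right neighbors become adjacent.

    // Minimal sketch of the Keep Alive Ring bookkeeping; names are hypothetical.
    import java.util.ArrayList;
    import java.util.List;

    class KeepAliveRing {
        private final List<String> ring = new ArrayList<String>();   // servers in ring order

        synchronized void join(String server) { ring.add(server); }

        synchronized String rightNeighbor(String server) {
            int i = ring.indexOf(server);
            return ring.get((i + 1) % ring.size());                   // wrap around the ring
        }

        // Called when a server notices that its right neighbor no longer answers.
        synchronized void reportDead(String deadServer) {
            ring.remove(deadServer);                                  // neighbors re-link implicitly
        }
    }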
WebFlow is a Web based distributed dataflow application, originally constructed at NPAC as a 100% pure Java system, running on a mesh of Java Web Servers, using Java servlets for middleware management and offering Java applet based, AVS-like visual authoring tools for assembling distributed computational graphs and monitoring the ongoing computations.
Current WebFlow modules follow a custom Java interface model and the inter-module, inter-server communication is based on Java sockets. We are currently integrating JWORB with WebFlow so that WebFlow modules can be represented as CORBA objects or components such as JavaBeans. With the core WebFlow middleware packaged as a CORBA service, we will provide the simple cluster management discussed above as an additional CORBA service, layered on top of the base WebFlow communication and session management services.
In parallel with the general support for high availability summarized above, we are also prototyping specific cluster management tools for simple trial JWORB configurations and specific high level application frameworks. Our initial focus is on single hierarchy layer, multi-node dataflow computation such as in our WebFlow environment. The near term goal is to extend the current WebFlow so that:
· module placement is automated via some simple scheduling algorithm;
· the system supports fault tolerance via module mobility.
We are currently integrating the heartbeat/failover support for the JWORB cluster with the WebFlow management services and designing simple module scheduling algorithms based on round robin or minimal workload.
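A minimal sketch of these two placement policies is given below. The class and method names are hypothetical; in the real system the workload figures would come from the heartbeat/cluster management services.

    // Illustrative sketch of round-robin and minimal-workload module placement.
    import java.util.List;
    import java.util.Map;

    class ModulePlacer {
        private int next = 0;

        // Round robin: cycle through the node list.
        String placeRoundRobin(List<String> nodes) {
            String node = nodes.get(next % nodes.size());
            next++;
            return node;
        }

        // Minimal workload: pick the node with the smallest reported load.
        String placeLeastLoaded(Map<String, Double> nodeLoads) {
            String best = null;
            for (Map.Entry<String, Double> e : nodeLoads.entrySet()) {
                if (best == null || e.getValue() < nodeLoads.get(best)) {
                    best = e.getKey();
                }
            }
            return best;
        }
    }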
On another front of Web/commodity technology tracking, we decided that the time is right to get seriously involved with XML. Following the previously specified XML 1.0, which already triggered a major industry response in the first part of '98, W3C also published in August the v1.0 specs for DOM (Document Object Model) and XSL (Extensible Stylesheet Language), which enable middleware processing (DOM) and front-end rendering (XSL) of XML files. In this way, XML now becomes a prime-time Web delivery framework, useful for arbitrary structured information such as complex multimedia documents, relational or object databases etc.
In the context of cluster management for this project, we view XML as a natural candidate for a universal middleware framework to represent hardware and software computing resources. XML's powerful tag specification capability allows one to unify the metadata and IDL support for the configuration and object structure specification needed in metacomputing. As a first step in this direction, we constructed XML support for the HLA/RTI configuration files that are used by federations to specify the FOM (Federation Object Model) objects shared by all federates. DMSO HLA comes with custom configuration file formats for describing these FOM resources. Our XML support represents a natural first step towards extending HLA/RTI to other domains, such as the general purpose cluster management planned for the next phase of the Sandia project. Our work on the specific XML application, or little language, discussed here (called HLAML) is accompanied by active monitoring of the broad front of recent XML developments, including WIDL, XML RPC, WebBroker and Microsoft SOAP.
In another recent XML activity, targeting the clustering infrastructure directly, we constructed JWORB servlet support for a full XML processing pipeline, starting from SQL database or flat file XML input, then using XSL to translate it to HTML, and finally displaying it in a Web browser.
Extensible Stylesheet Language (XSL) is a framework for translating XML documents to other formats, specified via a set of translation rules. JWORB currently supports this functionality through a server side solution. The three software modules required to translate XML to HTML are: an XML parser, DOM (Document Object Model) support for the intermediate representation of the parse tree, and an XSL processor that translates the DOM tree to the required external format. We adopted the base DOM code from Docuverse's FreeDOM package and upgraded it to conform to the DOM 1.0 specification. We use Microstar's XML parser and the Koala XSL processor developed by INRIA. For all these packages, Java source code is freely available.
Under JWORB, we defined servlet-like resource handlers for viewing XML, selected by file extension and the prefix of the URI in the HTTP header. The handler reads the required XML file, parses it, and passes it through the XSL processor to obtain the customized HTML output.
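The sketch below shows the general shape of such a handler. For brevity it uses the standard javax.xml.transform API as a stand-in for the Microstar parser / FreeDOM / Koala XSL combination actually used in JWORB; the class and method names are assumptions.

    // Minimal sketch of the XML -> HTML rendering step of a resource handler.
    import java.io.File;
    import java.io.OutputStream;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerException;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;

    public class XmlViewHandler {
        // Reads an XML file, applies an XSL stylesheet, and writes HTML to the client.
        public static void render(File xmlFile, File xslFile, OutputStream httpResponse)
                throws TransformerException {
            Transformer t = TransformerFactory.newInstance()
                                              .newTransformer(new StreamSource(xslFile));
            t.transform(new StreamSource(xmlFile), new StreamResult(httpResponse));
        }
    }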
Having established navigability in the XML space using JWORB and current generation browsers, we are now proceeding to build an XML datastore for the cluster resource database, including detailed descriptions of the hardware and software components available in the system, to be used by management tools such as schedulers, load balancers, performance monitors etc.
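A hypothetical fragment of such a resource description might look as follows; the element and attribute names are purely illustrative and do not represent a committed schema.

    <!-- Hypothetical cluster resource description; names are illustrative only. -->
    <cluster name="npac-testbed">
      <node name="linux1" os="Linux" arch="x86" memoryMB="128">
        <software name="JWORB"/>
        <software name="gcc" version="2.8"/>
      </node>
      <node name="nt1" os="WindowsNT" arch="x86" memoryMB="128">
        <software name="JWORB"/>
        <software name="MSCS"/>
      </node>
    </cluster>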
In a nutshell, our metaclustering concepts can be summarized as follows:
· No cluster management system is perfect for everyone, but many useful specific solutions are already available;
· Scalable federation of such specialized 'federates' seems to be the appropriate integration framework for metaclustering;
· This integration of coarse grain programs is appropriately done at the commodity middle tier and does not need high performance; load balancing and fine grain management tasks will be implemented within particular federates and could well involve the high performance tier;
· HLA already offers a federation based interoperability framework where federates = simulations.
We propose to extend HLA to metacluster management, with CPLANT scalable units as leaf node commodity federates, integrated via HLA with other federates given by more traditional HPC systems or heterogeneous clusters managed by other cluster management environments.
As a first step in this ambitious plan, we have started to analyze existing cluster management systems. While waiting for more information from Sandia on the CPLANT specific management utilities, we are analyzing several well known systems including Beowulf, BSP, Codine, Condor, LSF, NOW and SHRIMP, as well as new Microsoft products such as Wolfpack, Viper and Falcon. The goal is to identify common aspects of these systems that could be quantified in the next step in terms of a suitable Clustering FOM. The information collected in this study is available in a set of Web pages on our Sandia project Web site at http://iwt.npac.syr.edu/projects/sandia.
A full HLA based metaclustering environment is beyond the scope of this pilot project. By the end of this year, we expect to have analyzed the existing clustering systems and constructed the overall design of such a system, to be implemented in the follow-on project. As discussed previously in our comments on the ASCI DRM RFI, we see a natural synergy between our HLA based metaclustering model and the ASCI vision and needs. Our Web/commodity and Defense/HLA standards based approach offers a flexible prototyping platform for ASCI DRM, capable of seamlessly integrating and coordinating Globus, Legion, Lilith etc. and of adding new distributed and distance computing services coming from the Web, Desktop, Enterprise and Defense domains.
Multi-language support Cluster management in JWORB is implemented as a Java CORBA service, but we are also getting ready to support HPC application modules of relevance for Sandia, which are typically written in more conventional languages such as Fortran, C and C++. We completed the core JWORB implementation, including support for full interoperability between the (JDK 1.2 compliant) Java ORB offered by JWORB and the C++ ORB domain. As a representative of the latter, we use omniORB v2.5, a public domain GNU ORB under development by the UK Olivetti & Oracle Research Laboratory.
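The sketch below illustrates this interoperability path from the Java side, using the standard CORBA Dynamic Invocation Interface so that no IDL-generated stubs are needed. The operation name and its parameter are assumptions for the sketch; in practice a wrapped Fortran/C/C++ module would expose whatever operations its IDL interface defines.

    // Illustrative sketch: a Java client invoking a C++ module wrapped as a CORBA
    // object (served, e.g., by omniORB) via the Dynamic Invocation Interface.
    import org.omg.CORBA.ORB;
    import org.omg.CORBA.Request;

    public class CppModuleClient {
        public static void main(String[] args) {
            ORB orb = ORB.init(args, null);
            // Stringified IOR published by the C++ server (via a file, naming service, etc.)
            org.omg.CORBA.Object module = orb.string_to_object(args[0]);
            Request req = module._request("step");   // build the call dynamically
            req.add_in_arg().insert_double(0.01);    // assumed time-step argument
            req.invoke();                            // travels to the C++ ORB over IIOP
            orb.shutdown(true);
        }
    }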
Capability demos Java/C++ interoperability was tested using the DMSO HLA application Jager (a simple multi-player video game distributed as part of the DMSO HLA/RTI release). The original Jager includes C++ clients communicating via the DMSO C++ RTI bus. We repackaged it by wrapping the Jager clients as CORBA C++ objects (using omniORB), talking via IIOP to JWORB and using our Java based Object Web RTI (OW-RTI) communication service.
We also tested the system on a simple distance computing version of the Jager demo. To facilitate demos on a popular commodity platform such as an NT laptop, we replaced the DMSO Jager front-end with a modified version of the Space Donuts game offered by Microsoft as part of the DirectX SDK. We demonstrated it recently at SPAWAR, San Diego during the JSIMS/PANDA meeting in August, playing Jager Donuts smoothly over a phone line between San Diego and Syracuse. The demo attracted the interest of several members of the planned JSIMS/PANDA project, including representatives from SPAWAR, Lockheed Martin and Boeing. The goal of PANDA is to build a visual programming environment for advanced DoD M&S systems such as those required by JSIMS or Simulation Based Acquisition.
Application demo options In parallel with testing the core JWORB and OW-RTI technologies on simple application demos, we are exploring possible application projects. One option discussed during our visit at Sandia is the Alegra Monte Carlo code; here we are waiting for more information from the Sandia side regarding access to the code. We also made contact this summer with Vernon F. Nicolette, a Sandia employee (vfnicol@cfd.sandia.gov) based in CNY who is interested in testing the VULCAN ASCI code on the JWORB cluster. Finally, we recently developed an SC'98 demo for the DoD HPC Modernization Program in the area of Metacomputing Modeling and Simulations which could provide a suitable computational topology for testing the JWORB cluster management software. The demo includes a set of vehicles running on various cluster nodes and propagating through a minefield simulated on a remote high performance system. We discuss this application in more detail below.
Remote Application operational at SC'98 We are using DoD's Comprehensive Minefield Simulation (CMS) program as the first trial application for our JWORB based cluster management system. We have just completed the development of an early CMS demo for SC'98. We could naturally align its components to make a version adequate to mock up the planned Sandia applications and in this way offer an effective testbed for the JWORB cluster. CMS includes a set of vehicles, running on independent workstations at NPAC using the ModSAF (Modular Semi-Automated Forces) simulation system and connected to the CMS (Comprehensive Mine Simulation) system by Ft. Belvoir, parallelized by NPAC and running on Origin2000 systems at the ARL MSRC in Aberdeen, MD and at the CEWES MSRC in Mississippi. Visualization front-ends include the high end, SGI based Mak Stealth virtual battlefield viewer from Mak Technologies and our initial commodity software (DirectX) based 3D visualizer, running on NT laptops.
Commodity version available for Sandia The SC98 demo described above uses the original ModSAF/CMS multicast connectivity over MBONE. For the Sandia project, we will wrap the demo components (vehicles and the minefield) as JWORB/RTI federates and use the JWORB cluster management tools to control the placement of vehicles over nodes. In the process of adapting this application as an initial distance computing demo for the Sandia project, we will also further develop, and adapt to Sandia application needs, our commodity graphics (DirectX) based front-end, which will replace or operate concurrently with high end visualization systems such as Mak Stealth.
Cluster management tested/fine-tuned for Sandia needs We have licenses and know-how, and hence full control at NPAC, over the government owned software components of the CMS demo such as ModSAF (some 1M lines of C code) and CMS (some 100K lines of C++). We can therefore adapt, instrument and fine-tune both codes to align them best with equivalent topologies of planned Sandia applications. The general analogy of 'vehicles' viewed as 'particles' and 'minefields' viewed as 'fields' or 'propagation media' seems to indicate that such adaptation and instrumentation is feasible, assuming we get suitable information about the Sandia application needs.
We conclude the report by listing a set of unanswered questions that naturally point towards next steps for a possible follow-on project.
As discussed in Section 3.5, we suggest starting with a virtually homogeneous cluster management model, with a JWORB server in each cluster node. However, we are also open to experiments with the two-level model pursued by Sandia, with service and compute nodes and dedicated optimized daemons running on the compute nodes. The choice between the two models involves the usual tradeoff between performance and functionality, and hence should be carefully investigated as one of the next-step R&D tasks. Note that even if JWORB runs on a compute node, one can still achieve high performance: JWORB can, for example, instantiate an MPI process which communicates outside JWORB in the high performance layer. This approach naturally supports a uniform treatment of fault tolerance, database access etc., as each node has fully functional middle tier and high performance tier capabilities.
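A minimal sketch of this pattern is given below, assuming an mpirun style launcher is available on the node; the class name and command line are illustrative only, and the real middle tier would keep the process handle for control and failover purposes.

    // Sketch: the middle-tier service on a node launches a high performance MPI
    // process, which then communicates with its peers outside the middleware.
    import java.io.IOException;

    public class MpiLauncher {
        public static Process launch(int ranks, String application) throws IOException {
            ProcessBuilder pb = new ProcessBuilder(
                "mpirun", "-np", Integer.toString(ranks), application);
            pb.inheritIO();              // forward stdout/stderr for monitoring
            return pb.start();           // keep the handle for control and failover
        }
    }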
Assuming we proceed either directly or at a later stage to a multi-component cluster management model, decisions need to be made regarding the software to be used, adopted or developed for the high performance, fine grain node components. Following the current Sandia model initiated with the bebopd, pct, pingd and yod modules is one option. Exploring the public domain offerings available with Beowulf, or early commodity solutions such as VIA or Wolfpack, are other options. Emerging HPCC systems such as AppLeS from UCSD are also relevant. In general, existing commercial and academic systems do not have our proposed multi-tier architecture and would need some amount of reworking, which we believe is needed anyway to be consistent with the commodity capabilities of systems such as Jini.
Security is of paramount importance for successful metacomputing in many domains, and especially for the ASCI Defense Program. We are monitoring the capabilities in this area offered by DoE systems such as Globus or Akenti and comparing them with the general Security framework of CORBA. It seems that the latter is powerful and flexible enough to accommodate even the most demanding domain specific security requirements. We are currently initiating the design of CORBA Interceptor support in JWORB, which will offer the core technology required by CORBA Security. With such Level 0 support in place, we will soon be in a position to address in more detail the Level 1 and Level 2 security requirements for ASCI and to design their possible layering over Kerberos, Akenti and other specific services.
We are starting the development of XML databases for the Cluster Management and Metacomputing Management Resources. Armed with a powerful XML translation framework such as XSL, now built into JWORB (see Section 4.4), we will be able to quickly translate our database into any format that might later emerge as a standard. Some decisions need to be made soon, however, on the type and level of detail of the information on individual hardware and software resources to be collected, placed and maintained in such databases. Here NPAC and Sandia are involved in the Datorr (Desktop Access to Remote Resources) community activity, which should develop the relevant standards to specify the structure of both jobs and the backend computational, network and information resources. This activity will also address associated services and in particular could clarify security issues for metacomputing.
Load balancing algorithms need to be designed, developed and integrated with our clustering facilities. We will develop rudimentary support based on round robin and average workload approaches, but more refined load balancing techniques require more information about the specific application domains to be targeted by the cluster environment. Some of the current cluster management systems discussed in Section 5.2 include this service. Note that we are discussing here the coarse grain aspect of load balancing: we view the assignment of cluster (CPLANT) nodes to a job as part of the commodity tier, whereas the placement and movement of fine grain data such as particles and grid points within the nodes assigned to a job is naturally part of the HPCC tier. Often the load balancing needs of an application can be handled without linking the load balancing services on the two tiers. Eventually one should link them so as to handle dynamic application requirements where the HPC layer could request more resources from the commodity tier. Some implementations may also need to support dynamic resource reassignment (perhaps due to detected faults or the need to support other high priority jobs) imposed by the middle tier on the HPCC subsystem.
A metadata framework needs to be established for coherent management of diverse cluster resources including hardware, software, people, applications, products, projects etc. For this purpose we are currently exploring the MOF (Meta Object Facility) model by OMG and the associated XMI (XML Metadata Interchange) model for XML metadata streaming. This general-purpose commodity approach needs to be coordinated with the more specialized application or domain oriented approaches taken by various projects and communities. As XML naturally integrates metadata and general object structure specification, we expect such a metadata framework to be developed as part of the Datorr process described above.
The current JWORB clustering system offers WebFlow, i.e. visual distributed dataflow, as one specific built-in high level application framework, suitable for coarse grain composition of distributed applications with relatively static connection topologies. Other, more dynamic frameworks might be required by specialized application domains. Such application domains need to be identified and prioritized, and then grouped into categories corresponding to the various high level frameworks to be identified, designed and supported in JWORB. In this way, one could define frameworks for areas such as Distributed Interactive Simulations, High Level Architecture, Metacomputing Federations, Agent based Computing, and Virtual Prototyping Environments for Concurrent Engineering and Simulation Based Acquisition.
We believe that our project will be more successful if we can obtain more information about the specific application requirements of relevance for the Sandia, C-Plant and ASCI mission. So far, we are proceeding along a rather general-purpose support path, but no system can satisfy everyone's needs, and hence some specificity will be helpful in aligning our prototypes with the realities of Sandia and ASCI requirements.
We include here the list of our published 1998 papers to illustrate the positive initial response from various computational communities to our HPcc, POW and JWORB concepts. Communities addressed by the publications listed below include: General Book Publishers [1], High Performance Computing [2, 4, 5, 6, 13], DoD Modeling and Simulation [7, 14], Web Simulation [3], Test and Evaluation [9, 10, 11, 12], Distributed Computing [15], Internet Computing [8], Virtual Reality Modeling [16].