Subject: Re: Some Background
From: Tilt Thompkins
Date: Thu, 19 Apr 2001 12:42:46 -0500
To: fox@csit.fsu.edu

Hi Geoffrey

I want to try to work on two tracks for a bit: one track being the proposal draft, which in many ways is independent of the real content, and a second track which explores the real content. This message is an attempt to start the content track by responding to your Some Background message.

First of all, there was so much "content" in this message that it has taken me quite a while to understand it, or at least to generate my own understanding of it. I suspect we will find out in short order whether my improved understanding matches what you intended to say.

My first problem was to figure out where in the discussion thread to enter and respond. Your paragraph

"One could conjecture that one can always look at a complex system in fine enough detail that it can be represented in this multi-phase loosely synchronous fashion. This would be a critical observation as it says we now have the key abstractions to tackle arbitrary complex systems simulation."

seems best for me. My view is that I don't care whether "all possible" complex systems can be approached this way, only that many very interesting ones can. My way of expressing this would be (I think we are each using the word planning to mean different things, so beware in reading this):

Many important public sector and private sector systems can be usefully modeled as demand-response processes that are resource constrained. Examples include business supply chains, military or business logistics chains, airline operations, product development systems and so on. Simulations of such systems are difficult because of the complexity of the demand-response processes and the large number of resource and constraint variables. Fundamental progress with respect to these difficulties has been made using improved computing resources, i.e., hardware, networks, and grids, and improved computational strategies. We expect continued simulation progress for those systems that can be described as interacting but loosely or asynchronously coupled objects, where massively parallel processing techniques can be usefully applied. Current programs such as ???, ???, and ??? are rapidly advancing the simulation state of the art, and we believe that many of the most important systems fall into this category.

Optimizations on real-world systems are made still more difficult because these demand-response processes usually operate in plan-evolve-plan-evolve … cycles, where planning outputs change as a function of process state and policy choices. Examples include airline yield management systems, military unit deployments, new product introductions and so on. We refer to optimizations that evaluate and improve planning and operating strategies, or choose optimum policy parameters, as policy-level optimization. Our objective is to implement a framework that supports policy-level optimizations in ways that are mathematically sound and support scaling, parallel operation and large-scale distributed operation.

I think we both agree that the key to effectively integrating the simulations and approaches is the DAOF (Decision Application Object Framework). I don't think the object model is critical (I like Java too), but I do think the communication space is critical. Rough translation: I almost like JavaSpaces. (I hated Linda, but that is a different story for another day.)
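To make "communication space" concrete, here is a toy sketch of the write/take-by-template pattern I have in mind. This is plain Java with invented names -- it is not the real Jini/JavaSpaces API, just the associative-matching idea:

    // Toy sketch of an associative "communication space" in plain Java.
    // All names are illustrative -- not the Jini/JavaSpaces API.
    import java.util.*;

    class Entry {
        final Map<String, String> fields = new HashMap<>();
        Entry set(String k, String v) { fields.put(k, v); return this; }
        // A template matches an entry if every field present in the template agrees.
        boolean matches(Entry template) {
            for (Map.Entry<String, String> f : template.fields.entrySet())
                if (!f.getValue().equals(fields.get(f.getKey()))) return false;
            return true;
        }
    }

    class Space {
        private final List<Entry> entries = new LinkedList<>();
        synchronized void write(Entry e) { entries.add(e); notifyAll(); }
        // Block until some entry matches the template, then remove and return it.
        synchronized Entry take(Entry template) throws InterruptedException {
            while (true) {
                for (Iterator<Entry> it = entries.iterator(); it.hasNext(); ) {
                    Entry e = it.next();
                    if (e.matches(template)) { it.remove(); return e; }
                }
                wait();
            }
        }
    }

The point is that senders and receivers never address each other directly; they only describe the entries they care about.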
I think if we changed the JavaSpaces associative matching and serialization (to be based on XML object descriptions) we would have most of the needed functionality. I also suspect we can get big gains in performance and reliability if we give up the notion of communicating with every computer and toaster on the net.

I really don't know much about HLA. Is there a "short" description available somewhere? I think we need to link to HLA.

Comments and brickbats requested and expected!!

Tilt

At 10:49 AM 4/3/2001 -0700, you wrote:

I enclose my understanding of approaches to "decision applications" and how the Los Alamos approach fits in. We can invent a short description! It would be an honor to be involved with your program.

-----------------------------------------------
Context: Towards a Software Architecture for Decision Applications
-----------------------------------------------

We need to support the rich requirements of these applications in a way that is mathematically sound and can lead to scaling parallelism. By scaling, we mean that as problems get larger, we can expect to effectively use parallel machines with a larger number of nodes. The term "effectively" will depend on details of the hardware and application, but roughly one aims at "scaling speedup", i.e. that if one increases the size of both problem and machine by a common factor F, then one achieves a performance increase (measured, say, in total teraflops/sec) of F. Particle dynamics and partial differential equation based applications typically exhibit scaling speedup.

Note that all applications can be implemented as a set of objects generating and responding to events. Events are just messages carrying a time stamp, and when the latter is omitted (as, say, in most MPI parallel codes), one can easily see that it is either implicit or irrelevant due to special features of the application. Objects can be described in terms of actors, fragments of Fortran code, DMSO's HLA, Java's Enterprise JavaBeans and many other frameworks. These correspond to different tradeoffs in ease of use, functionality and performance. This engineering decision on the appropriate framework is not trivial, but it is usually easier than deciding on a model and algorithm which leads to an object description that can be both conveniently and efficiently implemented.

One can divide problems into a few coarse categories described below. These can all be thought of as interacting objects; what counts is the size and sophistication of the objects and their time sequencing.

Loosely Synchronous
-------------------

These applications have a natural global synchronizing parameter -- typically the time of a simulation or an iteration count. The computation is divided into stages by this parameter, and the "simulation objects" typically do some computation at a fixed value of this parameter and then have a stage where all exchange information. The computations can be (spatially) very inhomogeneous, but the temporal structure of the computation is quite regular. Parallel implementations are very efficient, and MPI is the natural model, as all nodes are synchronized by the act of exchanging messages. Static or dynamic load balancing may be necessary, but this is well understood. There is a special case of this -- synchronous applications -- where the computations are homogeneous. These are the problems where machines like the CM2 can be used.
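Schematically (the class and method names below are invented, purely for illustration; a production code would use MPI), the temporal structure is a local compute phase followed by a global exchange, repeated each step:

    // Schematic of a loosely synchronous computation: every worker computes at
    // the current global step, then all synchronize/exchange before advancing.
    import java.util.concurrent.CyclicBarrier;

    class LooselySynchronousDemo {
        static final int WORKERS = 4, STEPS = 10;

        public static void main(String[] args) {
            // The barrier plays the role of the exchange phase: no worker
            // starts step t+1 until every worker has finished step t.
            CyclicBarrier exchange = new CyclicBarrier(WORKERS,
                    () -> System.out.println("all workers exchanged data"));
            for (int w = 0; w < WORKERS; w++) {
                final int id = w;
                new Thread(() -> {
                    for (int step = 0; step < STEPS; step++) {
                        computeLocalPiece(id, step);   // can be very inhomogeneous
                        try {
                            exchange.await();          // global synchronization point
                        } catch (Exception e) { return; }
                    }
                }).start();
            }
        }

        static void computeLocalPiece(int worker, int step) {
            // stand-in for the real physics on this worker's objects
        }
    }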
For particle dynamics, the "objects" are particles, and information is exchanged between objects linked by a "force" in the physics. For partial differential equations (PDE), the "objects" are grid points or collections thereof. For an iterative solution technique, the connection between objects is usually defined by the "stencil" of the PDE algorithm. Looked at in fine enough detail, "Nature" is always (maybe too strong) loosely synchronous, as the fundamental equations of physics have such a synchronizable structure. The majority of the compute-intensive kernels of ASCI codes are of this form, as are all the fundamental science simulations on NSF supercomputers. We have great confidence that these applications will continue to scale and be able to make use of the expected petaflop machines of 2010.

Note that in loosely synchronous problems, the "objects" and the messages expressing their interconnection are typically small. This implies that one needs the low latency and high bandwidth of "massively parallel machines". In judging needed bandwidth, one notes that on very general principles ALL applications (whether loosely synchronous or not) have a ratio of internal complexity to number of external links which decreases as the size of the object increases. This translates into a needed ratio of CPU power to communication power in any distributed or parallel computer used to simulate a complex system. One can quantify this in terms of a surface-to-volume ratio expressed in terms of the dimension of the complex system.

Pleasingly Parallel
-------------------

These are the computationally simplest class of applications, but many extremely important and sophisticated problems fall into this category. Here one can have a collection of arbitrarily complicated objects which evolve in "time" in an essentially independent fashion. Examples include

1) Objects are neutrons in neutron transport problems, where each is evolved independently through matter. Many but not all Monte Carlo problems have a similar structure.
2) Objects are credit card transactions across the world.
3) Objects are Web client-Web server interactions (the problem AOL must solve), whether file access or search.

These are a major driver for distributed computing technology and are the dominant commercial applications of high performance computing. They need communication bandwidth to support load balancing (distribution of client requests amongst the 1000's of EXODUS or AOL servers) and access to common data. Communication is not needed between the objects themselves. The "messages" are time stamped, but the times of different objects do not need to be synchronized. These applications will also see scaling speedup and will both exploit and indeed drive the commercial development of high performance systems.
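Schematically (again with invented names), such a problem is just a pool of independent per-object tasks whose results are gathered at the end:

    // Schematic of a pleasingly parallel computation: each object evolves
    // independently, so a simple task pool suffices.
    import java.util.concurrent.*;
    import java.util.*;

    class PleasinglyParallelDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(8);
            List<Future<Double>> results = new ArrayList<>();
            for (int i = 0; i < 100_000; i++) {
                final int objectId = i;
                // No communication between tasks: each neutron / transaction /
                // client request is evolved on its own.
                results.add(pool.submit(() -> evolveIndependently(objectId)));
            }
            double total = 0;
            for (Future<Double> r : results) total += r.get();  // gather at the end
            System.out.println("aggregate result = " + total);
            pool.shutdown();
        }

        static double evolveIndependently(int objectId) {
            return objectId * 1e-6;  // stand-in for the real per-object work
        }
    }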
Asynchronous
------------

These are problems formulated in terms of interlinked objects exchanging information encapsulated in events, i.e. time-stamped messages. Unlike loosely synchronous problems, there is no natural synchronizing parameter, and naively the problem is intrinsically sequential, as one must first execute the object with the lowest time stamp and then globally look for the next object that can be evolved. One typically gets this problem class any time one views nature in terms of "macroscopic objects", as is natural in fact in most decision applications. Essentially all war-game and forces modelling applications fall into this class. Very few academic or "fundamental" science or engineering problems are formulated this way, which is one reason they have received little attention in the academic community. However, essentially all DMSO (Defense Modeling and Simulation Office) work looks only at asynchronous problems. There is a large and unfortunate gulf between DMSO and the high performance computing community. DMSO developed a sophisticated set of standards for simulation objects (called HLA, for High Level Architecture) and their run time support (RTI). This does not address any of the essential difficulties in a profound way, although it improves over the previous DIS standards. HLA predated many recent developments in distributed objects, but I expect that any DAOF (Decision Application Object Framework) could find helpful lessons in HLA and RTI.

Asynchronous applications are addressed by so-called event-driven simulations, using one of two methods to address the fundamental difficulty of scheduling events. This issue is particularly hard on parallel machines since, as commented above, naively the objects cannot be evolved in parallel. For "small" asynchronous problems the implementation is straightforward on sequential machines; achieving "scaling parallelism" is the central difficulty. Some DMSO applications are implemented in "real time" on distributed systems. Then (computer) clock time and simulation time are equivalent: we have natural synchronization but typically very inefficient implementations.

1) Conservative Event-Driven Simulation: One uses the current simulation times of objects, and bounds on when new events could be generated, to identify those objects that can safely be simulated.
2) Optimistic Event-Driven Simulation: One takes regular snapshots of object state and evolves the objects in each node of a parallel machine whether or not one can guarantee correctness. If an event arrives at an object in its past, then the object is rolled back to the last snapshot and evolved forward in time.

I am not aware of any good reason to expect scaling parallelism except for applications that are essentially loosely synchronous. The software system SPEEDES, developed originally at JPL and commercialized by a small company, METRON, is the current best practice for large discrete event simulations. Unfortunately, SPEEDES includes much "home-grown" distributed object technology which is probably better done by the rapidly evolving commercial Object Web systems. SPEEDES is supported by DoD's JSIMS project in DMSO and by the HPCMO (High Performance Computing Modernization Office).
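For reference, the core sequential structure that both methods are trying to parallelize is just a priority queue of time-stamped events. This is a minimal sketch with invented names -- it is not SPEEDES or HLA code:

    // Core of a sequential event-driven simulation: always process the event
    // with the lowest time stamp next. Conservative and optimistic methods are
    // two ways of doing this safely across many processors.
    import java.util.PriorityQueue;

    class Event implements Comparable<Event> {
        final double time;            // simulation time stamp
        final Runnable action;        // what the receiving object does
        Event(double time, Runnable action) { this.time = time; this.action = action; }
        public int compareTo(Event other) { return Double.compare(time, other.time); }
    }

    class EventDrivenSimulation {
        private final PriorityQueue<Event> pending = new PriorityQueue<>();
        private double now = 0.0;

        void schedule(double time, Runnable action) { pending.add(new Event(time, action)); }

        void run(double endTime) {
            while (!pending.isEmpty() && pending.peek().time <= endTime) {
                Event next = pending.poll();   // globally lowest time stamp
                now = next.time;               // advance simulation time
                next.action.run();             // object handles the event, may schedule more
            }
        }

        public static void main(String[] args) {
            EventDrivenSimulation sim = new EventDrivenSimulation();
            sim.schedule(1.5, () -> System.out.println("unit A moves"));
            sim.schedule(0.7, () -> System.out.println("unit B fires"));
            sim.run(10.0);   // processes B's event first, then A's
        }
    }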
Meta-Problems
-------------

These can be considered a special case of asynchronous problems. One has multiple applications linked together in a single "meta-problem" -- analogous to a "meta-computer", which is an aggregation of individual computers. In a meta-computer, each component could be a massively parallel machine. Correspondingly, in a meta-problem, components can be large loosely synchronous codes. The essential characteristic of meta-problems is that scaling parallelism is achieved internally to each component and not from the structure of the meta-problem linkage. Thus meta-problems will scale well even if the linkage between components prevents the components from running in parallel with each other. Successes of so-called "computational grids" in the high performance computing field are largely either meta-problems or pleasingly parallel applications. Examples of meta-problems include

1) Data access, simulation and visualization stages in a typical computer simulation, in the case where these are implemented as separate subsystems on different hardware.
2) Multi-disciplinary applications with, say, linked fluid flow, structures, acoustic and electromagnetic signature applications. This type is very common in large ASCI simulations. Often these massively parallel loosely synchronous applications will be asynchronously linked together and to some sequential control module containing optimization logic. Even if one component of a meta-problem is a small sequential application, the full system will still have scaling parallelism.
3) In D division work, the IIT (Information Integration Technology) produces linked graphs of macro-size objects. This is a meta-problem.
4) Systems like AVS or Khoros support linked filters for image processing, which give graphical representations similar to IIT.
5) Coupled infrastructure simulations -- say gas and electricity, as done by D division -- are meta-problems.
6) In industry, areas such as "application integration" represent meta-problems where different parts of a business are the components.

Modern Internet (Object Web) technology has developed a lot of new capabilities for handling meta-problems. Typically one sets such systems up in a "multi-tier architecture" with interacting objects. Such interacting objects are of the type imagined in SOFIA, but the objects are always large, and so the tradeoffs are different from systems aimed at handling smaller objects. One formulates such systems with web front ends (portals) and with "services" such as security, collaboration, job submittal, visualization ...

Decision Applications
---------------------

These are certainly meta-problems if one links together multiple infrastructures (gas, electricity, roads) or just different types of objects. For instance, in a military simulation one could consider aircraft, vehicles and troops as separate components. As already mentioned, IIT naturally generates meta-problems with "environmental conditions", "seeker performance" etc. as separate components. Meta-problems are often built hierarchically; a meta-application representing the simulation would itself be one component in a C4I command system, which would have multiple simulations, databases, real-time sensors and planning tools linked together in a collaborative environment.

Usually the base simulations in a decision application are of the "asynchronous type" described above and are forced to use event-driven simulations. It is then hard to do large scale simulations. For example, there are essentially no DMSO-style forces modelling problems currently running on the parallel machines of the HPC modernization program. Although a simulation like TRANSIMS is typically formulated as interacting asynchronous objects, the Sequential Dynamical Systems (SDS) approach breaks the problem into phases, where each phase is of a type where massive parallelism is possible. Roughly, one starts with a planning phase, which is loosely synchronous, where the individual entities make plans for the next macro time step. Entities are individuals, vehicles or communication packets. This is followed by a micro-time simulation phase where the entities are evolved in time. In each phase, one is operating at a fixed global simulation time, and so the natural scaling parallelism of loosely synchronous or pleasingly parallel problems is achieved. This process is iterated to achieve consistent answers.
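A rough sketch of this phase structure (all names invented; the real SDS machinery is of course far more sophisticated):

    // Schematic of the plan/evolve iteration: a loosely synchronous planning
    // phase at fixed global time, then a micro-time evolution phase, iterated
    // until plans and outcomes are consistent.
    import java.util.List;

    interface Entity {                       // an individual, vehicle, or packet
        void plan(double globalTime);        // planning at fixed global simulation time
        void evolve(double from, double to); // micro-time evolution of this entity
        boolean consistent();                // did the outcome match the plan?
    }

    class SdsStyleLoop {
        static void macroStep(List<Entity> entities, double t, double dt) {
            boolean consistent;
            do {
                // Planning phase: every entity plans at the same global time,
                // so this phase parallelizes over entities.
                entities.parallelStream().forEach(e -> e.plan(t));

                // Evolution phase: entities are evolved through the macro time
                // step, again in parallel over entities.
                entities.parallelStream().forEach(e -> e.evolve(t, t + dt));

                // Iterate the two phases until plans and outcomes agree.
                consistent = entities.stream().allMatch(Entity::consistent);
            } while (!consistent);
        }
    }

Each phase operates over all entities at a fixed global simulation time, which is exactly where the loosely synchronous scaling comes from.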
SDS has similarities with other approaches -- in particular cellular automata and Petri nets -- but has some key features which do not appear to be present in these other cases. Cellular automata have massive parallelism, but the sophisticated planning phase of SDS is not normally present. Petri nets can represent complex linkages but don't naturally give scaling parallelism.

One could conjecture that one can always look at a complex system in fine enough detail that it can be represented in this multi-phase loosely synchronous fashion. This would be a critical observation, as it says we now have the key abstractions to tackle arbitrary complex systems simulation. We need

a) Distributed Object Web technology to support meta-problems, including C4I and linked complex systems. This technology also supports pleasingly parallel problems.
b) Classic asynchronous event-driven simulations for "small" problems where we wish to get answers quickly without the significant analysis effort needed by SDS.
c) Loosely synchronous simulations for either SDS or classic scientific simulations.

My statement "Roughly one starts with a planning phase which is loosely synchronous where the individual entities make plans for the next time interval. Entities are individuals, vehicles or communication packets. This is followed by a simulation phase where the entities are evolved in time." needs to be made more precise. I think it would be useful to take the current SDS applications -- especially the different types such as TRANSIMS, ArchSIMS and AdhopNet -- and address this issue. We must quantify the essential reason SDS can get a loosely synchronous scaling structure where the other approaches have failed.

DAOF (Decision Application Object Framework)
--------------------------------------------

There are many object paradigms -- Java, COM, CORBA, W3C (SOAP) -- and each has pluses and minuses. Although I happen to prefer Java, the decision on the implementation paradigm should be kept open. However, I suggest that re-use of components between different D division projects can be helped by the development of a common framework. This will enable re-use of both objects and services such as collaboration and security. HLA is probably not the right way to go, except as an exemplar and as a framework with which you may need to integrate. I recommend designing such a framework and specifying all the objects and their services in XML. This can be mapped into the particular interfaces used by whichever paradigm is chosen. I have had success with such XML specifications supporting either Java or CORBA objects.
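As a rough illustration of what "specify in XML, map to the chosen paradigm" means -- the schema and names below are invented, not an existing specification -- a small object description and the Java interface one might generate from it:

    // Illustration only: an object and its service are described once in XML, e.g.
    //
    //   <object name="SupplyDepot">
    //     <method name="reserve">
    //       <param name="item" type="string"/>
    //       <param name="quantity" type="int"/>
    //       <returns type="boolean"/>
    //     </method>
    //   </object>
    //
    // and a small generator maps that description onto whichever paradigm is
    // chosen -- here a plain Java interface, but equally a CORBA IDL interface.
    public interface SupplyDepot {
        boolean reserve(String item, int quantity);
    }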