Subject: Re: Some Background
From: Tilt Thompkins
Date: Mon, 09 Apr 2001 11:12:01 -0500
To: fox@csit.fsu.edu
CC: tiltt@ncsa.uiuc.edu

Hi,

I am back from some vacation and am looking at this and a project proposal again. My basic project plan idea would be to compare this approach with results from an event-simulation approach, applied to a problem selected in conjunction with the customers. I will use the PET Project as an example. Resources would be a student/person at FSU, 1/2 of one of my staff here, travel, equipment? How is your time accounted for, and would it help to budget any of your time? Any other suggestions/mods? Have you heard any feedback on the PET-2 proposal or presentation?

Tilt

At 10:49 AM 4/3/2001 -0700, you wrote:

I enclose my understanding of approaches to "decision applications" and how the Los Alamos approach fits in. We can invent a short description! It would be an honor to be involved with your program.

-----------------------------------------------
Context: Towards a Software Architecture for Decision Applications
----------------------------------------------------------

We need to support the rich requirements of these applications in a way that is mathematically sound and can lead to scaling parallelism. By scaling, we mean that as problems get larger, we can expect to use parallel machines with larger numbers of nodes effectively. The meaning of "effectively" will depend on details of the hardware and application, but roughly one aims at "scaling speedup", i.e. that if one increases the size of both problem and machine by a common factor F, then one achieves a performance increase (measured, say, in total teraflops) of F. Particle dynamics and partial differential equation based applications typically exhibit scaling speedup.

Note that all applications can be implemented as a set of objects generating and responding to events. Events are just messages carrying a time stamp, and when the latter is omitted (as in most MPI parallel codes), one can easily see that it is either implicit or irrelevant because of special features of the application. Objects can be described in terms of actors, fragments of Fortran code, DMSO's HLA, Java's Enterprise JavaBeans, and many other frameworks. These correspond to different tradeoffs in ease of use, functionality and performance. This engineering decision on the appropriate framework is not trivial, but it is usually easier than deciding on a model and algorithm that lead to an object description which can be both conveniently and efficiently implemented.

One can divide problems into a few coarse categories, described below. These can all be thought of as interacting objects; what counts is the size and sophistication of the objects and the time sequencing among them.

Loosely Synchronous:
--------------------
These applications have a natural global synchronizing parameter -- typically the time of a simulation or an iteration count. The computation is divided into stages by this parameter, and the "simulation objects" typically do some computation at a fixed value of this parameter and then have a stage where all exchange information. The computations can be (spatially) very inhomogeneous, but the temporal structure of the computation is quite regular. Parallel implementations are very efficient, and MPI is a natural model as all nodes are synchronized by the act of exchanging messages. Static or dynamic load balancing may be necessary, but this is well understood. A minimal sketch of this compute/exchange pattern is given below.
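To make the compute/exchange structure concrete, here is a minimal sketch in Python of a loosely synchronous 1-D relaxation written against mpi4py (a Python binding to MPI). The decomposition, the update rule and the variable names are illustrative assumptions on my part, not taken from any particular application.

# A minimal loosely synchronous sketch: each rank owns a block of a 1-D array,
# does a local averaging update at a fixed iteration count, and then exchanges
# boundary ("halo") values with its neighbours before the next iteration.
# Run with, e.g.: mpirun -n 4 python loosely_synchronous_sketch.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

N_LOCAL = 100                       # points owned by this rank (assumption)
u = [float(rank)] * N_LOCAL         # illustrative initial data

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(50):              # the global synchronizing parameter
    # Communication stage: exchange edge values with both neighbours.
    left_ghost = comm.sendrecv(u[0], dest=left, source=left)
    right_ghost = comm.sendrecv(u[-1], dest=right, source=right)
    if left_ghost is None:          # no neighbour: treat as a fixed boundary
        left_ghost = u[0]
    if right_ghost is None:
        right_ghost = u[-1]
    # Computation stage: purely local work at this value of "step".
    padded = [left_ghost] + u + [right_ghost]
    u = [0.5 * (padded[i - 1] + padded[i + 1]) for i in range(1, N_LOCAL + 1)]

print("rank", rank, "finished; u[0] =", u[0])

The point is only the shape of the loop: a purely local computation stage at a fixed iteration count, followed by a message-exchange stage that keeps all the nodes in step.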
There is a special case of loosely synchronous applications -- synchronous applications -- where the computations are homogeneous; these are the problems for which machines like the CM2 can be used. For particle dynamics the "objects" are particles, and information is exchanged between objects linked by a "force" in the physics. For partial differential equations (PDEs), the "objects" are grid points or collections thereof. For an iterative solution technique, the connection between objects is usually defined by the "stencil" of the PDE algorithm. Looked at in fine enough detail, "Nature" is always (maybe too strong) loosely synchronous, as the fundamental equations of physics have such a synchronizable structure. The majority of the compute-intensive kernels of ASCI codes are of this form, as are all the fundamental science simulations on NSF supercomputers. We have great confidence that these applications will continue to scale and will be able to make use of the expected petaflop machines of 2010.

Note that in loosely synchronous problems, the "objects" and the messages expressing their interconnection are typically small. This implies that one needs the low latency and high bandwidth of "massively parallel machines". In judging the needed bandwidth, one notes that on very general principles ALL applications (whether loosely synchronous or not) have a ratio of internal complexity to number of external links which decreases as the size of the object increases. This translates into a needed ratio of CPU power to communication power in any distributed or parallel computer used to simulate a complex system. One can quantify this as a surface-over-volume ratio expressed in terms of the dimension of the complex system: for a d-dimensional region of n points per node, computation grows like n while communication grows like n^((d-1)/d), so the communication-to-computation ratio falls as n^(-1/d).

Pleasingly Parallel
--------------------
These are the computationally simplest class of applications, but many extremely important and sophisticated problems fall into this category. Here one can have a collection of arbitrarily complicated objects which evolve in "time" in an essentially independent fashion. Examples include:

1) Objects are neutrons in neutron transport problems, where each is evolved independently through matter. Many but not all Monte Carlo problems have a similar structure.

2) Objects are credit card transactions across the world.

3) Objects are Web client-Web server interactions (the problem AOL must solve), whether file access or search.

These are a major driver for distributed computing technology and are the dominant commercial applications of high performance computing. They need communication bandwidth to support load balancing (distribution of client requests among the 1000's of EXODUS or AOL servers) and access to common data; communication is not needed between the objects themselves. The "messages" are time stamped, but the times of different objects do not need to be synchronized. These applications will also see scaling speedup and will both exploit and indeed drive the commercial development of high performance systems.

Asynchronous
-------------
These are problems formulated in terms of interlinked objects exchanging information encapsulated in events, i.e. time-stamped messages. Unlike loosely synchronous problems, there is no natural synchronizing parameter, and naively the problem is intrinsically sequential: one must execute first the object with the lowest time stamp and then globally look for the next object that can be evolved. One typically gets this problem class whenever one views nature in terms of "macroscopic objects", as is natural in most decision applications. Essentially all war-game and forces modelling applications fall into this class.
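To see why the naive formulation is sequential, consider the kernel of a toy event-driven simulation such as the sketch below; the "objects" and their behaviour (a random delay before forwarding an event to a neighbour) are invented purely to show the time-stamp ordering and are not a model of any real application.

# Toy sequential event-driven simulation: events are (time_stamp, dest, payload)
# tuples kept in a priority queue; the kernel always executes the event with
# the lowest time stamp, which is exactly what makes the naive approach serial.
import heapq
import random

NUM_OBJECTS = 4                      # illustrative
END_TIME = 20.0

events = []                          # the global event list
heapq.heappush(events, (0.0, 0, "start"))

while events:
    time_stamp, dest, payload = heapq.heappop(events)
    if time_stamp > END_TIME:
        break
    # "Evolve" the destination object: here it just schedules a new event
    # for a neighbouring object at a later simulation time.
    delay = random.uniform(0.5, 2.0)
    neighbour = (dest + 1) % NUM_OBJECTS
    heapq.heappush(events, (time_stamp + delay, neighbour, payload))
    print(f"t={time_stamp:6.2f}  object {dest} handled '{payload}'")

Everything funnels through the single pop of the globally earliest event, which is the bottleneck that the conservative and optimistic methods described below try to work around.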
Very few academic or "fundamental" science or engineering problems are formulated this way, which is one reason they have received little attention in the academic community. However, essentially all DMSO (Defense Modeling and Simulation Office) work looks only at asynchronous problems. There is a large and unfortunate gulf between DMSO and the high performance computing community. DMSO developed a sophisticated set of standards for simulation objects (called HLA, for High Level Architecture) and their run-time support (the RTI). This does not address in a profound way any of the essential difficulties, although it improves on the earlier DIS standards. HLA predated many recent developments in distributed objects, but I expect that any DAOF (Decision Application Object Framework) could find helpful lessons in HLA and the RTI.

Asynchronous applications are addressed by so-called event-driven simulations, which use one of two methods to address the fundamental difficulty of scheduling events. This issue is particularly hard on parallel machines since, as noted above, naively the objects cannot be evolved in parallel. For "small" asynchronous problems the implementation is straightforward on sequential machines; achieving "scaling parallelism" is the central difficulty. Some DMSO applications are implemented in "real time" on distributed systems. In that case (computer) clock time and simulation time are equivalent: we get natural synchronization but typically very inefficient implementations.

1) Conservative event-driven simulation: one uses the current simulation times of the objects and bounds on when new events could be generated to identify those objects that can safely be simulated.

2) Optimistic event-driven simulation: one takes regular snapshots of object state and evolves the objects in each node of a parallel machine whether or not one can guarantee correctness. If an event arrives at an object in its past, that object is rolled back to the last snapshot and evolved forward in time again.

I am not aware of any good reason to expect scaling parallelism except for applications that are essentially loosely synchronous. The software system SPEEDES, developed originally at JPL and commercialized by a small company, METRON, is the current best practice for large discrete event simulations. Unfortunately, SPEEDES includes much "home-grown" distributed object technology which is probably better done by the rapidly evolving commercial Object Web systems. SPEEDES is supported by DoD's JSIMS project in DMSO and by the HPCMO (High Performance Computing Modernization Office).

Meta-Problems
-------------
These can be considered a special case of asynchronous problems. One has multiple applications linked together in a single "meta-problem" -- analogous to a "meta-computer", which is an aggregation of individual computers. In a meta-computer, each component could be a massively parallel machine. Correspondingly, in a meta-problem, components can be large loosely synchronous codes. The essential characteristic of meta-problems is that scaling parallelism is achieved internally to each component and not from the structure of the meta-problem linkage. Thus meta-problems will scale well even if the linkage between components prevents the components from running in parallel with each other. Successes of so-called "computational grids" in the high performance computing field are largely either meta-problems or pleasingly parallel applications.
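As a schematic of why the linkage itself need not be parallel, the sketch below wires three coarse components into a pipeline with simple queues; the component names and the trivial work functions are placeholders of my own for what would in practice be large (often massively parallel) codes, and only the coarse-grained linkage is the point.

# Schematic meta-problem: three coarse components (data access -> simulation ->
# visualization) linked by queues. Each work function stands in for a large
# code that would obtain its scaling parallelism internally; the linkage
# between the components is coarse-grained and need not be parallel at all.
import queue
import threading

def component(name, work, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:                  # shutdown signal
            if outbox is not None:
                outbox.put(None)
            return
        result = work(item)               # placeholder for the real computation
        print(f"{name}: processed step {item}")
        if outbox is not None:
            outbox.put(result)

data_to_sim = queue.Queue()
sim_to_vis = queue.Queue()
threads = [
    threading.Thread(target=component,
                     args=("simulation", lambda x: x, data_to_sim, sim_to_vis)),
    threading.Thread(target=component,
                     args=("visualization", lambda x: x, sim_to_vis, None)),
]
for t in threads:
    t.start()

for step in range(3):                     # the "data access" component
    data_to_sim.put(step)
data_to_sim.put(None)                     # shut the pipeline down
for t in threads:
    t.join()

Any of these components could be replaced by an MPI job or a whole machine without changing the linkage.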
Examples of meta-problems include:

1) Data access, simulation and visualization stages in a typical computer simulation, in the case where these are implemented as separate subsystems on different hardware.

2) Multi-disciplinary applications with, say, linked fluid flow, structures, acoustics and electromagnetic signature applications. This type is very common in large ASCI simulations. Often these massively parallel, loosely synchronous applications will be asynchronously linked to each other and to some sequential control module containing the optimization logic. Even if one component of a meta-problem is a small sequential application, the full system will still have scaling parallelism.

3) In D division work, the IIT (Information Integration Technology) produces linked graphs of macro-size objects. This is a meta-problem.

4) Systems like AVS or Khoros support linked filters for image processing, which give graphical representations similar to IIT's.

5) Coupled infrastructure simulations -- say gas and electricity, as done by D division -- are meta-problems.

6) In industry, areas such as "application integration" represent meta-problems where different parts of a business are the components.

Modern Internet (Object Web) technology has developed many new capabilities for handling meta-problems. Typically one sets such systems up in a "multi-tier architecture" with interacting objects. Such interacting objects are of the type imagined in SOFIA, but the objects are always large, and so the tradeoffs are different from those in systems aimed at handling smaller objects. One formulates such systems with web front ends (portals) and with "services" such as security, collaboration, job submittal, visualization ...

Decision Applications
---------------------
These are certainly meta-problems if one links together multiple infrastructures (gas, electricity, roads) or just different types of objects. For instance, in a military simulation one could consider aircraft, vehicles and troops as separate components. As already mentioned, IIT naturally generates meta-problems with "environmental conditions", "seeker performance" etc. as separate components. Meta-problems are often built hierarchically: a meta-application representing the simulation would itself be one component in a C4I command system, which would have multiple simulations, databases, real-time sensors and planning tools linked together in a collaborative environment.

Usually the base simulations in a decision application are of the "asynchronous" type described above and are forced to use event-driven simulations. It is then hard to do large scale simulations. For example, there are essentially no DMSO-style forces modelling problems currently running on the parallel machines of the HPC Modernization Program. Although a simulation like TRANSIMS is typically formulated as interacting asynchronous objects, the Sequential Dynamical Systems (SDS) approach breaks the problem into phases where each phase is of a type where massive parallelism is possible. Roughly, one starts with a planning phase, which is loosely synchronous, where the individual entities make plans for the next macro time step. Entities are individuals, vehicles or communication packets. This is followed by a micro-time simulation phase where the entities are evolved in time. In each phase, one is operating at a fixed global simulation time, and so the natural scaled parallelism of loosely synchronous or pleasingly parallel problems is achieved. This process is iterated to achieve consistent answers.
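A purely schematic rendering of this phase structure is sketched below; the entities, the one-line planning and movement rules, and the absence of a real consistency test are all simplifications of my own and should not be read as the actual SDS formulation used in TRANSIMS.

# Schematic SDS-style iteration: a loosely synchronous planning phase in which
# every entity independently chooses a plan at a fixed global time, followed by
# a micro-time-step phase in which the plans are executed.
import random

NUM_ENTITIES = 8                         # e.g. travellers or packets (illustrative)
positions = [0.0] * NUM_ENTITIES

for macro_step in range(5):              # fixed global simulation time per pass
    # Planning phase: each entity plans independently -> trivially parallel.
    plans = [random.choice([0.5, 1.0, 1.5]) for _ in range(NUM_ENTITIES)]

    # Micro-simulation phase: evolve all entities over the macro time step
    # in small micro time steps, again at a common global clock.
    for micro_step in range(10):
        positions = [p + plan / 10.0 for p, plan in zip(positions, plans)]

    # Consistency check (placeholder): in a real problem one would compare the
    # outcome against the assumptions the plans were based on and iterate.
    spread = max(positions) - min(positions)
    print(f"macro step {macro_step}: spread of positions = {spread:.2f}")

The essential point is that within each phase every entity is handled at the same global simulation time, so both phases distribute across nodes in the loosely synchronous or pleasingly parallel styles described earlier.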
SDS has similarities with other approaches -- in particular cellular automata or Petri nets -- but has some key features which do not appear to be present in these other cases. Cellular automata have massive parallelism, but the sophisticated planning phase of SDS is not normally present. Petri nets can represent complex linkages but do not naturally give scaling parallelism. One could conjecture that one can always look at a complex system in fine enough detail that it can be represented in this multi-phase loosely synchronous fashion. This would be a critical observation, as it says we now have the key abstractions to tackle arbitrary complex systems simulation. We need:

a) Distributed Object Web technology to support meta-problems, including C4I and linked complex systems. This technology also supports pleasingly parallel problems.

b) Classic asynchronous event-driven simulations for "small" problems where we wish to get answers quickly without the significant analysis effort needed by SDS.

c) Loosely synchronous simulations for either SDS or classic scientific simulations.

My statement "Roughly one starts with a planning phase which is loosely synchronous where the individual entities make plans for the next time interval. Entities are individuals, vehicles or communication packets. This is followed by a simulation phase where the entities are evolved in time." needs to be made more precise. I think it would be useful to take the current SDS applications -- especially the different types such as TRANSIMS, ArchSIMS and AdhopNet -- and address this issue. We must quantify the essential reason SDS can get a loosely synchronous scaling structure where the other approaches have failed.

DAOF (Decision Application Object Framework)
--------------------------------------------
There are many object paradigms -- Java, COM, CORBA, W3C (SOAP) -- and each has pluses and minuses. Although I happen to prefer Java, the decision on implementation paradigm should be kept open. However, I suggest that re-use of components between different D division projects can be helped by the development of a common framework. This will enable re-use of both objects and services such as collaboration and security. HLA is probably not the right way to go, except as an exemplar and as a framework with which you may need to integrate. I recommend designing such a framework and specifying all the objects and their services in XML. This can then be mapped into the particular interfaces used by whichever paradigm is needed. I have had success with such XML specifications supporting either Java or CORBA objects.
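As a toy illustration of that last point, the sketch below takes a small, entirely hypothetical XML description of a framework object (the element and attribute names are my own invention, not a proposed DAOF schema) and mechanically generates a Python stub from it; the same specification could just as well be mapped onto Java or CORBA interfaces.

# Toy mapping from an XML object specification to an implementation stub.
# The schema (object/service/method elements) is purely hypothetical.
import xml.etree.ElementTree as ET

SPEC = """
<object name="InfrastructureSimulation">
  <service name="security"/>
  <service name="collaboration"/>
  <method name="advanceTime" returns="void"/>
  <method name="currentState" returns="StateSnapshot"/>
</object>
"""

root = ET.fromstring(SPEC)
class_name = root.get("name")
services = [s.get("name") for s in root.findall("service")]
lines = [f"class {class_name}:"]
lines.append(f"    # required services: {', '.join(services)}")
for m in root.findall("method"):
    lines.append(f"    def {m.get('name')}(self):  # returns {m.get('returns')}")
    lines.append("        raise NotImplementedError")
print("\n".join(lines))

In practice one would generate the paradigm-specific interface (Java, CORBA IDL, ...) rather than print a stub, but the XML specification remains the single source of truth.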