Given by Geoffrey C. Fox, Yuhong Wen, Wojtek Furmanski, Tom Haupt at NASA Workshop on Performance Engineered Information Systems on Sept 28-29 98. Foils prepared Sept 30 98
Summary of Material
PetaSIM Motivation and basic ideas |
PetaSIM Design and Examples |
Sample PetaSIM Experimental Results |
A hybrid object web model for IPG (Information Power Grid) |
Possible use of PetaSIM in IPG for two classes of applications |
Geoffrey Fox, Wojtek Furmanski, Tom Haupt and Yuhong Wen |
Northeast Parallel Architecture Center (NPAC) |
Syracuse University |
gcf,furm,haupt,wen@npac.syr.edu |
URL is http://kopernik.npac.syr.edu:4096/petasim/V1.0/PetaSIM.html |
"Back of the envelope estimates", where you get good intuition for why the performance is what it is, and precise simulations |
Communication overhead in classical data parallelism (edge over area) with NO memory hierarchy and NO latency, where dim is the geometric dimension and tcomm and tfloat are typical communication and computation times |
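The formula itself did not survive in these foils; the following is a reconstruction of the standard "edge over area" estimate consistent with the description above, with the constant depending on the stencil and decomposition:

```latex
% Classical data-parallel communication overhead, no memory hierarchy,
% no latency: n grid points per node, geometric dimension dim.
% Overhead = surface-to-volume ratio times the ratio of communication
% to computation times.
\[
  f_{\mathrm{comm}} \;=\;
  \frac{\mathrm{constant}}{n^{1/\mathrm{dim}}}
  \cdot \frac{t_{\mathrm{comm}}}{t_{\mathrm{float}}}
\]
```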
Imagine an engineer using an applet to change system parameters interactively and examine new network/machine designs quickly -- it is easier to change system parameters than to generate new applications -- so we need to accumulate a benchmark set
Specify Nodes |
Specify Links |
Specify Datasets |
Specify Execution |
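The four specification steps above might be sketched as follows. The real PetaSIM input format is not shown in these foils, so every field name below is illustrative only; the field names are borrowed from the nodeset and dataset attribute lists later in the talk.

```python
# Hypothetical sketch of the four PetaSIM specification steps;
# all structures and field names here are illustrative, not PetaSIM's
# actual input format.

nodes = [  # Specify Nodes: nodesets with a type and member count
    {"name": "cpu", "type": "CPU", "number": 16, "floatspeed": 1e8},
    {"name": "mem", "type": "memory", "number": 16, "grainsize": 1 << 20},
]
links = [  # Specify Links: linksets joining nodesets
    {"name": "bus", "connects": ("cpu", "mem"), "bandwidth": 1e8},
]
datasets = [  # Specify Datasets: the data to be distributed over nodes
    {"name": "grid", "type": "grid2dim", "bytesperunit": 8, "floatsperunit": 4},
]
execution = [  # Specify Execution: an ordered script of update/transmit steps
    ("update", "grid"),
    ("transmit", "grid", "bus"),
]
```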
We define an object structure for computer (including network) and data |
Architecture Description |
Data Description |
Application Description -- needs further refinement |
[Diagram: Nodeset, Linkset, Dataset, and Distribution descriptions, plus an Execution Script (generated automatically by the UMD emulators or hand coded), feed PetaSIM, which produces performance estimation for applications.]
View as a bunch of nodesets joined by a bunch of linksets. Each component is defined as an "object" which is valuable outside PetaSIM (in defining the object structure of computers) |
Only one member of most nodesets shown |
Just one member of each CPU-level nodeset shown in this more detailed architecture for SP2 |
Name: one per nodeset object |
type: choose from memory, cache, disk, CPU, pathway |
number: number of members of this nodeset in the architecture |
grainsize: size in bytes of each member of this nodeset (for memory, cache, disk) |
bandwidth: maximum bandwidth allowed in any one member of this nodeset |
floatspeed: CPU's float calculating speed |
calculate(): method used by CPU nodeset to perform computation |
cacherule: controls persistence of data in a memory or cache |
portcount: number of ports on each member of nodeset |
portname[]: ports connected to linkset |
portlink[]: name of linkset connecting to this port |
nodeset_member_list: list of nodeset members in this nodeset (for nodeset member identification) |
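The nodeset attributes listed above could be captured in code roughly as below. This is a sketch following the field names in the foils, not PetaSIM's actual implementation; the `calculate()` cost model shown (flops divided by floatspeed) is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Nodeset:
    """Sketch of a PetaSIM nodeset object; attribute names follow the
    foils, but the class itself is illustrative only."""
    name: str
    type: str                # memory, cache, disk, CPU, or pathway
    number: int              # members of this nodeset in the architecture
    grainsize: int = 0       # bytes per member (memory, cache, disk)
    bandwidth: float = 0.0   # max bandwidth allowed in any one member
    floatspeed: float = 0.0  # CPU's float calculating speed
    portcount: int = 0       # ports on each member of the nodeset
    portname: list = field(default_factory=list)  # ports connected to linksets
    portlink: list = field(default_factory=list)  # linkset name per port

    def calculate(self, flops: float) -> float:
        """CPU nodesets only: time to perform `flops` float operations
        (assumed cost model: work divided by floatspeed)."""
        assert self.type == "CPU" and self.floatspeed > 0
        return flops / self.floatspeed

cpu = Nodeset(name="cpu", type="CPU", number=16, floatspeed=1e8)
print(cpu.calculate(1e6))  # 1e6 flops at 1e8 flops/s -> 0.01 s
```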
Name: one per dataset object |
type: of dataset choose from grid1dim, grid2dim, grid3dim |
bytesperunit: number of bytes in each unit |
floatsperunit: update cost as a floating point arithmetic count |
operationsperunit: operations in each unit |
update(): method that updates a given dataset, which is contained in a CPU nodeset with a grainsize controlled by the last memory nodeset visited |
transmit(): method that calculates the cost of transmitting a dataset between memory levels, either as communication or as movement up and down the hierarchy |
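The update() and transmit() costs could plausibly be estimated from the attributes above as simple linear models. This is a sketch of such a cost model under assumed parameters, not PetaSIM's actual code:

```python
# Illustrative cost model for a dataset's update() and transmit()
# methods, built from the attributes listed above (assumptions, not
# PetaSIM's actual formulas).

def update_time(nunits, floatsperunit, floatspeed):
    """Computation cost: total floating-point work divided by CPU speed."""
    return nunits * floatsperunit / floatspeed

def transmit_time(nunits, bytesperunit, bandwidth, latency=0.0):
    """Communication cost: bytes moved over a link (or between memory
    levels) at the link's bandwidth, plus an optional startup latency."""
    return latency + nunits * bytesperunit / bandwidth

# e.g. 10^6 grid points updated at 10 flops each on a 10^8 flops/s CPU:
print(update_time(1e6, 10, 1e8))   # -> 0.1 s of computation
# 10^4 boundary points of 8 bytes each over a 10^7 bytes/s link:
print(transmit_time(1e4, 8, 1e7))  # -> 0.008 s of communication
```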
[Charts: sample PetaSIM experimental results comparing Measured Execution Time, Estimated Application Execution Time, and PetaSIM Running Time.]
W is Web Server |
PD Parallel Database |
DC Distributed Computer |
PC Parallel Computer |
O Object Broker |
N Network Server |
e.g. Netsolve, Ninf |
T Collaboratory Server |
Clients |
Middle Layer (Server Tier) |
Third Backend Tier |
3-(or more)-tier architecture - Web browser front-ends, legacy (e.g. databases, HPC modules) backends; fat middleware |
Use as appropriate the alternative / competing middleware models |
Each model has different tradeoffs (most elegant, powerful, fastest, simplest) |
POW integrates various models and services either by linking multiple brokers/servers or in terms of a single multi-protocol middleware server (JWORB) |
JWORB - Java Web Object Request Broker - multi-protocol middleware network server (HTTP + IIOP + DCE RPC + RMI transport) |
Current prototype integrates HTTP and IIOP, i.e. acts as Web Server and CORBA Broker |
Next step: add DCE RPC support to include Microsoft COM |
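The foils say the JWORB prototype serves HTTP and IIOP from one server. One plausible dispatch mechanism (an illustration, not JWORB's actual code) is to peek at the opening bytes of each connection: IIOP messages begin with the 4-byte GIOP magic number, while HTTP requests begin with an ASCII method name.

```python
def classify_protocol(first_bytes: bytes) -> str:
    """Guess the wire protocol from a connection's opening bytes.
    IIOP messages start with the GIOP magic number; HTTP requests
    start with a method name such as GET or POST."""
    if first_bytes[:4] == b"GIOP":
        return "IIOP"
    method = first_bytes.split(b" ", 1)[0]
    if method in (b"GET", b"POST", b"HEAD", b"PUT"):
        return "HTTP"
    return "unknown"

print(classify_protocol(b"GET /petasim/V1.0/ HTTP/1.0\r\n"))  # -> HTTP
print(classify_protocol(b"GIOP\x01\x00"))                     # -> IIOP
```

A multi-protocol server built this way can then hand the connection to the matching protocol handler without needing separate ports.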
JWORB - our trial implementation of Pragmatic Object Web |
First non-DMSO implementation of RTI -- the HLA (distributed event-driven simulation) runtime -- at 5% of the cost(!) |
[Chart: performance of JacORB, JWORB, ORBIX, and RMI on variable size integer arrays.]
Adopt multi-tier enterprise computing model |
Use Commodity Software Model as basic distributed computing infrastructure in middle tier |
Use Pragmatic Object Web(POW) as the abstraction of today's commodity distributed object software infrastructure |
Middle tier uses high functionality protocols such as Web HTTP, CORBA IIOP, Java RMI etc. with MPI (Nexus or Globus) as optimized "machine code" at backend |
So we have a natural hierarchical model with perhaps an SP2 seen as a single POW distributed object in middleware and 128 nodes at backend |
Nodes in a Cluster host JWORB server when seen at middle tier and this controls the MPI process view at backend |
Note the logical progression below, which is opposite to some other approaches |
Backend Parallel Computing Nodes running Classic HPCC -- MPI becomes Globus to use distributed resources |
Middle Control Tier -- JWORB runs on all Nodes |
[Diagram: four SPMD programs coupled by MPI at the backend; a JWORB server on each node forms the middle control tier, with RTI above.]
Use separation of control and data transfer to support RTI (IIOP) on the control layer and MPI on the fast transport layer simultaneously |
[Diagram: the client tier connects via IIOP to the middle tier and to the backend (Globus, MPI etc.); middle and backend tiers run on each node.]
Consider the large class of problems that can be thought of as a set of coarse grain entities, each of which could be internally data parallel, while the coarse grain structure is "functional or task" parallelism |
Use (Enterprise) JavaBeans to represent modules at (server) client level |
Use UML (and related technologies) to specify application and system structure |
WebFlow is graphical (Java Applet) composition palette (Beanbox for computational modules) |
Use "To be Agreed Seamless Computing Interface" to implement linkage of proxies to backend hardware |
We can support any given paradigm at either high functionality (web server) or high performance (backend) level |
HPCC Messaging could be a pure Java/RMI middle tier version of MPI or Nexus/Optimized Machine specific MPI at backend |
[Diagram: the full heterogeneous metaproblem split into levels -- modules exhibit task parallelism; their components (aggregates of grid points etc.) exhibit data parallelism in loosely synchronized computation; below sits the memory hierarchy including I/O; fine grain simulations are abstracted by one or more levels of coarse grain simulation.]
PetaSIM can be used at either coarse or fine grain |
Working at coarse grain, one can abstract the fine grain in a simple model or model it in detail |
Coarse grain entities can be time-synchronized simulations using MPI (HPF?) at either the middle or back end tier, or, as in DMSO simulations, a federate running a custom discrete event simulation |
Use the DMSO object model HLA to specify the object structure of jobs and systems at the middle tier level |
An HLA Federation could be the set of all jobs to be run on a particular site |
An HLA Federate could be a job consisting of multiple, possibly shared, objects |
Use the DMSO Runtime Infrastructure RTI to implement dynamic management |
We can exploit the hierarchical view of the metaproblem and use PetaSIM to model the collection of "middle-tier" modules |
Rather than support the general case, consider two examples |
There are (at least) two types of data streams in collaborative engineering |
IPG: Real Time Multimedia and Asynchronous Data |
Collaborative Sessions have new measures of performance |
Quality of received multimedia -- especially audio -- which is streamed in real time, with data thrown out if it arrives too late |
Results of simulations and resources (e.g. web pages accessed) used in a session can be transmitted asynchronously. Under network congestion, the results are typically correct but received late |
All of these features and implied goodness measures should be built into PetaSIM -- this appears straightforward |
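The two goodness measures just described can be sketched as follows (an illustration of the idea, with hypothetical numbers, not a PetaSIM feature): real-time multimedia counts late packets as lost, while asynchronous data remains correct when late and its penalty is the extra delay.

```python
# Sketch of two goodness measures for collaborative sessions
# (illustrative only; delays and deadline are hypothetical).

def audio_quality(arrival_delays_ms, deadline_ms):
    """Fraction of real-time packets arriving before the playout
    deadline; late packets are thrown out."""
    on_time = sum(1 for d in arrival_delays_ms if d <= deadline_ms)
    return on_time / len(arrival_delays_ms)

def async_lateness(arrival_delays_ms, deadline_ms):
    """Total extra delay of asynchronous results beyond the deadline;
    the results themselves remain correct, just late."""
    return sum(max(0, d - deadline_ms) for d in arrival_delays_ms)

delays = [40, 90, 250, 60, 500]    # ms, hypothetical measurements
print(audio_quality(delays, 150))  # -> 0.6 (2 of 5 packets dropped)
print(async_lateness(delays, 150)) # -> 450 ms of accumulated lateness
```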
This use of PetaSIM can be applied upfront to deploy an appropriate network for collaborative engineering |
Further, it can be used dynamically by the collaboration tool to make decisions on |
Collaborative software (such as TangoInteractive) can support an API allowing monitoring of delays, so PetaSIM can determine when material arrives at each engineer's desk and adjust the algorithm |
In the example below, we need to model the performance of 6 modules (one of which is Visualization) |
One or more Client-Server Applications are a special case of Dataflow |
Can be applied either to a geographically distributed network or to a cluster of PC's or workstations |
In PetaSIM, one needs to define the "middle-tier" objects and their performance parameters; the "simple estimates" style of PetaSIM fits the coarse graining of applications |
The general problem is multiple "dataflow graphs" and collaborative sessions to be modeled in the context of a given network model |
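A minimal sketch of estimating one such coarse grain dataflow graph against a network model (illustrative only, not PetaSIM itself): each module contributes compute time, each edge contributes transfer time over the network, and the critical path through the graph bounds the execution time.

```python
# Illustrative coarse-grain dataflow estimate: modules, edges carrying
# bytes, and a single network bandwidth (all numbers hypothetical).

def critical_path(compute, edges, bandwidth):
    """compute: {module: seconds}, listed in topological order;
    edges: {(src, dst): bytes}; bandwidth: bytes/s of the network.
    Returns the longest compute-plus-transfer path."""
    finish = {}
    for m in compute:
        ready = max((finish[s] + b / bandwidth
                     for (s, d), b in edges.items() if d == m),
                    default=0.0)
        finish[m] = ready + compute[m]
    return max(finish.values())

compute = {"solve": 2.0, "filter": 0.5, "visualize": 1.0}
edges = {("solve", "filter"): 1e7, ("filter", "visualize"): 1e6}
print(critical_path(compute, edges, 1e7))  # -> 4.6 s on the critical path
```

Even this crude model exposes the trade-off the foils emphasize: on a geographically distributed network (lower bandwidth) the transfer terms dominate, while on a cluster the compute terms do.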
Ninf (and NetSolve) are examples of POW Middle tier servers |
Coarse Grain Dataflow is an example of "general task graph" described in POEMS |
Coarse Grain Dataflow and Collaborative Session are examples of AppLeS Templates |
PetaSIM could be thought of as modeling tool for AppLeS (aimed at coarse grain quick runtime estimations) |
Web and Object Servers used synchronously, as in collaborative sessions, have very high re-use and very different trade-offs (proxies and mirror sites become more important) from classic asynchronous access |