Given by Yuhong Wen and Geoffrey C. Fox at the DARPA Workshop on Performance Engineered Systems, Annapolis, MD, August 19-21, 1998. Foils prepared August 22, 1998.
Summary of Material
PetaSIM Motivation and basic ideas
PetaSIM Design and Examples
Sample PetaSIM Experimental Results
Current Progress and next steps
URL is http://kopernik.npac.syr.edu:4096/petasim/V1.0/PetaSIM.html
Geoffrey Fox and Yuhong Wen
Northeast Parallel Architecture Center (NPAC)
Syracuse University
gcf,wen@npac.syr.edu
Performance estimation ranges from "back of the envelope" estimates, where you get good intuition for why the performance is what it is, to precise simulations.
Communication overhead in classical data parallelism (edge over area), with NO memory hierarchy and NO latency:
$$\text{overhead} \propto \frac{t_{comm}}{t_{float}} \cdot \frac{1}{(\text{grain size})^{1/dim}}$$
where dim is the geometric dimension and $t_{comm}$ and $t_{float}$ are typical communication and computation times.
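As a rough illustration of the edge-over-area estimate, the sketch below evaluates the overhead for a few grain sizes. The function name `comm_overhead`, the proportionality constant `c`, and the sample timings are assumptions made for illustration; none of them come from the talk.

```python
def comm_overhead(grain_size, dim, t_comm, t_float, c=1.0):
    """Edge-over-area estimate: c * (t_comm / t_float) / grain_size**(1/dim).

    c is an assumed, application-dependent constant (hypothetical).
    """
    if grain_size <= 0 or dim <= 0:
        raise ValueError("grain_size and dim must be positive")
    return c * (t_comm / t_float) / grain_size ** (1.0 / dim)


if __name__ == "__main__":
    # Hypothetical machine: moving one word costs 10x one floating-point operation.
    for n in (100, 10_000, 1_000_000):
        print(n, comm_overhead(n, dim=2, t_comm=10.0, t_float=1.0))
```

Under these assumptions the overhead falls as the grain size grows, and it falls more slowly in higher geometric dimension because the exponent 1/dim is smaller.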
[Figure: Metaproblem Hierarchy. Diagram labels: Full Heterogeneous MetaProblem; Aggregate; Loosely Synchronized Computation; Module; Components; Task Parallelism; Data Parallelism; Split into Levels; Memory Hierarchy including I/O; PetaSIM; Detailed Simulations; Compare to Measurements]
We define an object structure for the computer (including network) and data
Architecture Description
Data Description
Application Description -- needs further refinement
[Figure: PetaSIM system diagram. Diagram labels: PetaSIM Performance Estimation; Nodeset; Linkset; Dataset; Distribution; UMD Emulators Automatic Script Generation; Execution Script; Hand Coded Script; Applications; C++ Simulator; Multi-User Java Server; Standard Java Applet Client; Specify Nodes; Specify Links; Specify Datasets; Specify Execution]
View the architecture as a set of nodesets joined by a set of linksets. Each component is defined as an "object", which is valuable outside PetaSIM (in defining the object structure of computers).
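The nodeset and linkset view can be sketched as a small object model. This is a minimal illustration only: the class fields (`count`, `kind`, `latency_s`, `bandwidth_bps`), the `Architecture` container, and the latency-plus-bandwidth cost model are all assumptions, not PetaSIM's actual C++ or Java classes.

```python
from dataclasses import dataclass, field


@dataclass
class Nodeset:
    """A group of identical nodes (field names are hypothetical)."""
    name: str
    count: int   # number of members in the set
    kind: str    # assumed label, e.g. "cpu", "memory", "disk"


@dataclass
class Linkset:
    """A group of identical links joining two nodesets (fields are hypothetical)."""
    name: str
    source: str            # name of one endpoint nodeset
    target: str            # name of the other endpoint nodeset
    latency_s: float       # assumed per-message latency, in seconds
    bandwidth_bps: float   # assumed bandwidth, in bytes per second

    def transfer_time(self, nbytes):
        """Assumed cost model: latency plus size divided by bandwidth."""
        return self.latency_s + nbytes / self.bandwidth_bps


@dataclass
class Architecture:
    """Nodesets joined by linksets."""
    nodesets: dict = field(default_factory=dict)
    linksets: list = field(default_factory=list)

    def add_nodeset(self, ns):
        self.nodesets[ns.name] = ns

    def add_linkset(self, ls):
        # Reject links whose endpoints have not been declared.
        for end in (ls.source, ls.target):
            if end not in self.nodesets:
                raise KeyError(f"unknown nodeset: {end}")
        self.linksets.append(ls)

    def links_between(self, a, b):
        """Return the linksets joining nodesets a and b, in either direction."""
        return [l for l in self.linksets if {l.source, l.target} == {a, b}]
```

Keeping the description as plain objects means the machine design can be changed by editing a few fields, without touching whatever estimation code consumes it.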
More Detailed Architecture Description
[Figure: two result charts, each with the series PetaSIM Estimation Time, Predicted Execution Time, and Measured Execution Time]
PetaSIM was designed to allow "qualitative" (good enough) performance estimates, where in particular the design of the machine is easy to change.
The project will build up a suite of benchmark applications that can be used in future activities such as "Petaflop" architecture studies
Applications are to be derived "by hand" or by automatic generation from the Maryland Application Emulators
Special attention is given to support for hierarchical memory machines and data-intensive applications
Supports parallelism and representation at different grain sizes
Supports simulation of "pure data-parallel" applications and of compositions of linked, loosely synchronous data parallel modules
Easily modified architecture and application descriptions
Architecture Description (nodeset & linkset)
Application Description (dataset & execution script)
Supports the Loosely Synchronous Data Parallel Model & Custom Control
Link to Maryland Application Emulators
Jacobi hand-written example -- add SWEEP3D?
Pathfinder, Titan, VMScope real applications (generated by UMD's Emulator) -- data intensive
Look at UML for interface and coarse-grain specification
Fast and reasonably accurate performance estimation (PetaSIM runs on a single processor)
Java applet-based user interface
About 6000 lines of C++ (server) and 4000 lines of Java (client)
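The hand-written Jacobi example and the loosely synchronous data parallel model listed above can be illustrated with a back-of-the-envelope calculation. The sketch below is not PetaSIM's algorithm; it assumes a square grid split into square blocks, a 5-point stencil costing 4 floating-point operations per point, and no latency or memory hierarchy. The function name and all parameters are hypothetical.

```python
def jacobi_step_time(n, p_side, t_float, t_comm, flops_per_point=4):
    """Estimate the time of one Jacobi sweep on an n x n grid.

    The grid is split into p_side x p_side square blocks, one per processor.
    In a loosely synchronous step every processor computes on its block and
    then exchanges edges, so the step time is that of the busiest block.
    """
    if p_side <= 0 or n % p_side != 0:
        raise ValueError("p_side must be positive and divide n")
    block = n // p_side
    compute = flops_per_point * block * block * t_float
    # The busiest block has 4 neighbours when an interior block exists,
    # 2 neighbours on a 2 x 2 decomposition, and none on one processor.
    if p_side >= 3:
        neighbours = 4
    elif p_side == 2:
        neighbours = 2
    else:
        neighbours = 0
    communicate = neighbours * block * t_comm  # one edge of `block` values each
    return compute + communicate


if __name__ == "__main__":
    # Hypothetical timings: t_float = 1 unit, t_comm = 10 units per value.
    for p_side in (1, 2, 4, 8):
        print(p_side * p_side, jacobi_step_time(1024, p_side, 1.0, 10.0))
```

With four neighbours, the ratio of communication to computation in this sketch is (t_comm / t_float) / block, which matches an edge-over-area overhead for dim = 2 and a grain size of block squared.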