Full HTML for

Basic foilset Details of PetaSIM and its relationship to Performance Specification Languages

Given by Yuhong Wen and Geoffrey C. Fox at the DARPA Workshop on Performance Engineered Systems, Annapolis, Md., August 19-21, 1998. Foils prepared August 22, 1998.
Summary of Material


Remember PetaSIM Motivation and basic ideas
  • (Using aircraft analogy) aimed at conceptual design of computer architecture and applications (as opposed to later preliminary and detailed design)
  • Java applet as friendly user interface; C++ execution engine
PetaSIM Design and Examples
  • Works at the conceptual level, so it can estimate the performance of applications written in any language
  • Like RMI, uses Java itself as the IDL to specify the object structure of computers and problems
  • Nodeset & Linkset; Dataset & Distribution; Execution Script
  • Relation to general PSL (Performance Specification Language) concept
Sample PetaSIM Experimental Results
Current Progress and next steps

Table of Contents for full HTML of Details of PetaSIM and its relationship to Performance Specification Languages


1 Detailed Discussion of A Simple(-minded) Performance Estimator --- PetaSIM and how fits into a PSL Performance Specification Language Context Darpa Performance Engineered Systems Workshop Annapolis Maryland August 19-21 1998
2 Summary of Detailed Discussion of PetaSIM
3 Architecture of PetaSIM
4 PetaSIM Basic Components
5 Performance Specification Languages and PSL
6 Three PSL Tradeoffs
7 PSL Features in PetaSIM
8 PetaSIM Estimator & Emulator
9 IBM SP2 Architecture I used in tests
10 IBM SP2 Architecture II used in tests
11 Nodeset Object Structure
12 Nodeset Member Object
13 Linkset Object Structure
14 Linkset Member Object
15 Distribution Object Structure
16 Dataset Object Structure
17 Execution Script
18 PetaSIM Estimation Approach
19 Jacobi Example -- Nodeset I
20 Jacobi Example -- Nodeset II
21 Jacobi Example -- Linkset I
22 Jacobi Example -- Linkset II
23 Jacobi Example
24 Jacobi Example -- Data Parallel Version (SP2 Architecture I)
25 Jacobi Example -- Execution Script I
26 Jacobi Example -- Execution Script II
27 Interface between Emulator and PetaSIM Part of Pathfinder Application
28 Interface between Emulator and PetaSIM -- Pathfinder
29 Interface between Emulator and PetaSIM -- Pathfinder
30 Interface between Emulator and PetaSIM Pathfinder
31 Pathfinder Performance Estimation Results (SP2 Architecture II)
32 Pathfinder Scaling: Performance v. # Processors
33 Pathfinder Estimation Results II
34 PetaSIM Estimation Results (Titan Architecture I)
35 Titan Estimation Results (Architecture II)
36 Titan Estimation Results (Fixed)
37 VMScope Performance Estimation Results (Architecture II)
38 VMScope Estimation Results
39 PetaSIM Features
40 Comparison with other Simulator Approaches
41 Summary of PetaSIM
42 Possible Future Work




HTML version of Basic Foils prepared August 22 98

Foil 1 Detailed Discussion of A Simple(-minded) Performance Estimator --- PetaSIM and how fits into a PSL Performance Specification Language Context Darpa Performance Engineered Systems Workshop Annapolis Maryland August 19-21 1998

From Details of PetaSIM and its relationship to Performance Specification Languages Darpa Workshop on Performance Engineered Systems Annapolis Md. -- August 19-21 1998. *
Geoffrey Fox and Yuhong Wen
Northeast Parallel Architecture Center (NPAC)
Syracuse University
gcf,wen@npac.syr.edu
URL is: http://kopernik.npac.syr.edu:4096/petasim/V1.0/PetaSIM.html

Foil 2 Summary of Detailed Discussion of PetaSIM

Remember PetaSIM Motivation and basic ideas
  • (Using aircraft analogy) aimed at conceptual design of computer architecture and applications (as opposed to later preliminary and detailed design)
  • Java applet as friendly user interface; C++ execution engine
PetaSIM Design and Examples
  • Works at the conceptual level, so it can estimate the performance of applications written in any language
  • Like RMI, uses Java itself as the IDL to specify the object structure of computers and problems
  • Nodeset & Linkset; Dataset & Distribution; Execution Script
  • Relation to general PSL (Performance Specification Language) concept
Sample PetaSIM Experimental Results
Current Progress and next steps

Foil 3 Architecture of PetaSIM

[Figure: architecture diagram with components labeled "C++ Simulator", "Multi-User Java Server" and two "Standard Java Applet Client" boxes]

Foil 4 PetaSIM Basic Components

We define an object structure for resources (computer, I/O, network) and data
  • These object representations can also be used in dynamic tools in areas of scheduling (e.g. Condor, Legion or Globus) and seamless interfaces (e.g. UNICORE, WebSubmit, SWeb) as well as other performance projects (especially POEMS and Warwick)
  • Java Grande Forum will propose a "community standard compute resource" object structure as part of "Java Framework for Computation"
Architecture Description
  • nodeset & linkset
  • (describe the memory hierarchy of architecture)
Data Description
  • dataset & distribution (not stressed in current state of development)
Application Description
  • execution script (needs further refinement)

Foil 5 Performance Specification Languages and PSL

A PSL is the (intermediate) representation used to specify an application in a way that gives performance "tools" "sufficient information to operate"
A PSL needs to specify the application, the computational resources, and their interaction
  • Need to worry about parallelism and where information is stored in memory hierarchy
  • Need to address implicit data movement and parallelism

Foil 6 Three PSL Tradeoffs

1) Application programmer wanting to design an application on a fixed computer
  • essential to minimize the effort to generate the PSL from conventional (changing) code
2) Hardware designer wanting to develop a new machine that performs well against a fixed application set
  • essential to make it easy to change the hardware (systems software) specification
  • may need to change implicit data movements/parallelism in the application set
3) Systems integrator (such as the MSTAR team) wishing to explore changes in both application and hardware
  • hardware in the MSTAR case is a heterogeneous cluster (computational grid)
PetaSIM had 2) in mind when it was designed, but its use with application emulators is nearer 3)

Foil 7 PSL Features in PetaSIM

Original concept was for "user" to write applications in "execution script" which made explicit all parallelism and data movement.
  • Using data parallel operations operating on coarse grain data entities to generate elegant application specification
I/O Intensive Maryland applications view "PetaSIM execution script" as an "intermediate representation"
[Figure: diagram with boxes labeled "Application specified as set of (emulating) methods", "PetaSIM specification of computational resources", "Translator of method calls", "PetaSIM Execution Script" and "PetaSIM"]

Foil 8 PetaSIM Estimator & Emulator

[Figure: diagram with boxes labeled "PetaSIM Performance Estimation", "Nodeset", "Linkset", "Dataset", "Distribution", "Execution Script", "UMD Emulators Automatic Script Generation", "Hand Coded Script" and "Applications"]

Foil 9 IBM SP2 Architecture I used in tests

View as a bunch of nodesets joined by a bunch of linksets
Each component is defined as an "object"; these objects are valuable outside PetaSIM (in defining the object structure of computers)
Only one member of most nodesets is shown

Foil 10 IBM SP2 Architecture II used in tests

Just one member of each CPU-level nodeset is shown in this more detailed architecture for the SP2

Foil 11 Nodeset Object Structure

Name: one per nodeset object
type: choose from memory, cache, disk, CPU, pathway
number: number of members of this nodeset in the architecture
grainsize: size in bytes of each member of this nodeset (for memory, cache, disk)
bandwidth: maximum bandwidth allowed in any one member of this nodeset
floatspeed: CPU's float calculating speed
calculate(): method used by CPU nodeset to perform computation
cacherule: controls persistence of data in a memory or cache
portcount: number of ports on each member of nodeset
portname[]: ports connected to linkset
portlink[]: name of linkset connecting to this port
nodeset_member_list: list of nodeset members in this nodeset (for nodeset member identification)
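
As a rough illustration, the nodeset properties listed above could be held in a small record type. This is a Python sketch, not the Java or C++ classes that PetaSIM uses; the field types, the defaults and the `validate` helper are assumptions made for the example, and the sample values come from the cpu nodeset in the Jacobi example.

```python
from dataclasses import dataclass, field
from typing import List

# Nodeset types named on the foil (compared case-insensitively here).
NODESET_TYPES = {"memory", "cache", "disk", "cpu", "pathway"}


@dataclass
class Nodeset:
    name: str                 # one per nodeset object
    type: str                 # memory, cache, disk, CPU or pathway
    number: int               # members of this nodeset in the architecture
    grainsize: int = 0        # bytes per member (memory, cache, disk)
    bandwidth: float = 0.0    # maximum bandwidth in any one member
    floatspeed: float = 0.0   # CPU float calculating speed
    cacherule: str = ""       # persistence rule for a memory or cache
    portcount: int = 0        # ports on each member
    portname: List[str] = field(default_factory=list)
    portlink: List[str] = field(default_factory=list)
    nodeset_member_list: List[str] = field(default_factory=list)

    def validate(self) -> None:
        """Hypothetical consistency check, not part of PetaSIM."""
        if self.type.lower() not in NODESET_TYPES:
            raise ValueError(f"unknown nodeset type: {self.type}")
        if len(self.portlink) != self.portcount:
            raise ValueError("portlink must have one entry per port")
        if len(self.nodeset_member_list) != self.number:
            raise ValueError("member list must match number")


cpu = Nodeset(name="cpu", type="CPU", number=8, grainsize=32,
              floatspeed=1.56116e-7, portcount=1,
              portname=["mem"], portlink=["link1"],
              nodeset_member_list=[f"cpu{i}" for i in range(8)])
cpu.validate()
```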

Foil 12 Nodeset Member Object

Inherit from Nodeset Class
Static (User Specified) Information
Contain each nodeset member's linkage relation
  • portname[]: (name of the neighboring nodeset member)
  • linkname[]: (name of the linkset member)
Dynamic Information generated by Simulator
  • receivetime: dataset arrival time
  • sendouttime: dataset send out time

Foil 13 Linkset Object Structure

Name: one per linkset object
type: choose from updown, across
nodesetbegin: name of initial nodeset joined by this linkset
nodesetend: name of final nodeset joined by this linkset
topology: used for across networks to specify linkage between members of a single nodeset
duplex: choose from full or half
number: number of members of this linkset in the architecture
latency: time to send zero length message across any member of linkset
bandwidth: maximum bandwidth allowed in any link of this linkset
send(): method that calculates cost of sending a message across the linkset
distribution: name of geometric distribution controlling this linkset
linkset_member_list: list of linkset members in this linkset (for linkset member identification)
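
The foil does not give the formula behind `send()`. A common first-order model, assumed here purely for illustration, charges the latency plus the message size divided by the bandwidth. The Python sketch below uses that assumption; the units (seconds and bytes per second) are also assumed, and the sample values come from the link3 definition in the Jacobi example.

```python
from dataclasses import dataclass


@dataclass
class Linkset:
    name: str
    type: str          # "updown" or "across"
    nodesetbegin: str  # initial nodeset joined by this linkset
    nodesetend: str    # final nodeset joined by this linkset
    duplex: str        # "full" or "half"
    number: int        # members of this linkset
    latency: float     # assumed: seconds for a zero-length message
    bandwidth: float   # assumed: bytes per second per link

    def send(self, nbytes: int) -> float:
        """Assumed cost model: latency + size / bandwidth."""
        if nbytes < 0:
            raise ValueError("message size must be non-negative")
        return self.latency + nbytes / self.bandwidth


link3 = Linkset("link3", "updown", "ctl1", "disks", "full",
                number=8, latency=2.0e-4, bandwidth=8388608)
zero_cost = link3.send(0)            # just the latency
one_mib_cost = link3.send(1 << 20)   # latency + 0.125 s under this model
```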

Foil 14 Linkset Member Object

Inherit from Linkset Class
Static (user specified information)
Contain each linkset member's linkage relation
  • name: linkset member's unique name
  • nodebegin: name of the initial nodeset member
  • nodeend: name of the final nodeset member
Dynamic information generated by PetaSIM
  • transstart[]: virtual time at which the linkset member starts its use
  • transend[]: virtual time at which the linkset member stops its use

Foil 15 Distribution Object Structure

Relevant when using an implicit data parallel execution script based on aggregates
Should be consistent with "HPJava" (Java implementation of HPF concepts) specification of data parallelism
Current properties are just:
  • Name: one per distribution object
  • type: choose from block1dim, block2dim, block3dim
  • Obviously will add more choices here!
Most of our current work uses explicitly specified parallelism generated from application emulators and so does NOT use distribution object
Jacobi example in data parallel version uses this

Foil 16 Dataset Object Structure

Name: one per dataset object
type: choose from grid1dim, grid2dim, grid3dim; specifies the type of dataset
bytesperunit: number of bytes in each unit
floatsperunit: update cost as a floating point arithmetic count
operationsperunit: operations in each unit
update(): method that updates given dataset which is contained in a CPU nodeset and a grainsize controlled by last memory nodeset visited
transmit(): method that calculates cost of transmission of dataset between memory levels either communication or movement up and down hierarchy
  • Methods can use other parameters or be custom
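
One possible reading of these properties is sketched below in Python. The `size` field is taken from the dataset definition line used in the Jacobi example; the cost formulas in `update` and `transmit`, the even split over members and the units are all assumptions for illustration, since the foil notes that the methods can be custom.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    type: str               # grid1dim, grid2dim or grid3dim
    size: int               # number of units (assumed meaning of "size")
    bytesperunit: int
    floatsperunit: int
    operationsperunit: int

    def total_bytes(self) -> int:
        return self.size * self.bytesperunit

    def update(self, seconds_per_float: float, members: int = 1) -> float:
        """Assumed update cost with the data split evenly over members."""
        units = self.size / members
        return units * self.floatsperunit * seconds_per_float

    def transmit(self, latency: float, bandwidth: float,
                 members: int = 1) -> float:
        """Assumed transmit cost: latency + bytes / bandwidth per member."""
        return latency + (self.total_bytes() / members) / bandwidth


jacobi = Dataset("Jacobi", "grid2dim", 1_000_000, 4, 4, 10)
```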

Foil 17 Execution Script

Currently a few instruction types which stress (unlike most languages) movement of data through memory hierarchies
Can be applied as aggregates (data parallel) or to each member of a set
send DATAFAMILY from MEM-LEVEL-L to MEM-LEVEL-K
  • These reference object names for data and memory nodesets (aggregates) or nodeset members (explicit parallel applications)
  • Current implementation only supports "primitive"(one level) movement -- can naturally extend to multi-level movement
move DATAFAMILY from MEM-LEVEL-L to MEM-LEVEL-K
Use distribution DISTRIBUTION from MEM-LEVEL-L to MEM-LEVEL-K
compute DATAFAMILY-A, DATAFAMILY-B,... on MEM-LEVEL-L
synchronize (synchronizes all processors --- loosely synchronous barrier)
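
The instruction forms above are regular enough to be read with a few patterns. The following Python sketch is a hypothetical reader for them, not PetaSIM's own parser; the `Instruction` record and the case-insensitive matching are assumptions.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Instruction(NamedTuple):
    op: str                      # send, move, use, compute or synchronize
    data: Tuple[str, ...]        # dataset (or distribution) names
    src: Optional[str] = None    # source nodeset or nodeset member
    dst: Optional[str] = None    # destination nodeset or nodeset member


_TRANSFER = re.compile(
    r"^(send|move)\s+(\S+)\s+from\s+(\S+)\s+to\s+(\S+)$", re.I)
_USE = re.compile(
    r"^use\s+distribution\s+(\S+)\s+from\s+(\S+)\s+to\s+(\S+)$", re.I)
_COMPUTE = re.compile(r"^compute\s+(.+?)\s+on\s+(\S+)$", re.I)


def parse_line(line: str) -> Instruction:
    """Turn one script line into an Instruction record."""
    text = line.strip()
    m = _TRANSFER.match(text)
    if m:
        return Instruction(m.group(1).lower(), (m.group(2),),
                           m.group(3), m.group(4))
    m = _USE.match(text)
    if m:
        return Instruction("use", (m.group(1),), m.group(2), m.group(3))
    m = _COMPUTE.match(text)
    if m:
        names = tuple(n.strip() for n in m.group(1).split(",") if n.strip())
        return Instruction("compute", names, dst=m.group(2))
    if text.lower().startswith("synchronize"):
        return Instruction("synchronize", ())
    raise ValueError(f"unrecognized instruction: {line!r}")
```

For example, `parse_line("move Jacobi from disks to ctl1")` yields an instruction with `op` "move", `src` "disks" and `dst` "ctl1".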

Foil 18 PetaSIM Estimation Approach

Based on ASCII execution script defining the applications
  • Not directly work flow graphs -- rather "classic programming language"
Each nodeset member has usage control block to record when dataset arrives and when to send out to next nodeset member
  • Similarly, each linkset member has a usage control block to record when the linkset member is free or occupied
Supports both a data parallel mode and a mode of individual operations on each nodeset or linkset member (see the Jacobi example on the following foils)
  • Data parallel mode has a much faster PetaSIM execution time, as it is essentially independent of the number of processors
The data parallel model is time stepped, while the individual model is a simple event-driven model
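
The usage control block idea can be sketched for a single linkset member as follows. This is a minimal Python illustration under stated assumptions, not PetaSIM's implementation: the `LinkMember` record, the `move` helper and the latency + size/bandwidth cost are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinkMember:
    latency: float         # assumed seconds
    bandwidth: float       # assumed bytes per second
    free_at: float = 0.0   # usage control block: when the link is next free


def move(link: LinkMember, nbytes: int, data_ready: float) -> float:
    """Schedule one transfer and return its virtual arrival time.

    The transfer starts once both the data and the link are available.
    """
    start = max(data_ready, link.free_at)
    end = start + link.latency + nbytes / link.bandwidth
    link.free_at = end     # the link stays occupied until the transfer ends
    return end


# Two datasets contend for the same link member.
shared = LinkMember(latency=0.0, bandwidth=1000.0)
first_arrival = move(shared, 500, data_ready=0.0)     # 0.5
second_arrival = move(shared, 500, data_ready=0.25)   # waits for the link: 1.0
```

The second transfer is ready at 0.25 but cannot start until the link frees up at 0.5, which is the kind of contention the usage control blocks make visible.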

Foil 19 Jacobi Example -- Nodeset I

cpu CPU 8 32 1 1.56116e-7
mem link1
cpu0 1 mem0 link10
cpu1 1 mem1 link11
cpu2 1 mem2 link12
cpu3 1 mem3 link13
cpu4 1 mem4 link14
cpu5 1 mem5 link15
cpu6 1 mem6 link16
cpu7 1 mem7 link17
name Type number grainsize portlink (floatspeed)
nodeset_name linkset_name ( replicated #times = # links for nodeset in first line)
nodeset_member link_number nodeset_member linkset_member (pair replicated again)
disks Disk 8 2147483648 1
ctl1 link3
d0 1 ctl10 link30
d1 1 ctl11 link31
d2 1 ctl12 link32
d3 1 ctl13 link33
d4 1 ctl14 link34
d5 1 ctl15 link35
d6 1 ctl16 link36
d7 1 ctl17 link37
mem Memory 8 134217728 2
cpu link1 ctl2 link2
mem0 2 cpu0 link10 ctl20 link20
mem1 2 cpu1 link11 ctl21 link21
mem2 2 cpu2 link12 ctl22 link22
mem3 2 cpu3 link13 ctl23 link23
mem4 2 cpu4 link14 ctl24 link24
mem5 2 cpu5 link15 ctl25 link25
mem6 2 cpu6 link16 ctl26 link26
mem7 2 cpu7 link17 ctl27 link27
Continued on next Page

Foil 20 Jacobi Example -- Nodeset II

ctl2 Pathway 8 0 3
mem link2 ctl1 link4 network link5
ctl20 3 mem0 link20 ctl10 link40 network0 link50
ctl21 3 mem1 link21 ctl11 link41 network0 link51
ctl22 3 mem2 link22 ctl12 link42 network0 link52
ctl23 3 mem3 link23 ctl13 link43 network0 link53
ctl24 3 mem4 link24 ctl14 link44 network0 link54
ctl25 3 mem5 link25 ctl15 link45 network0 link55
ctl26 3 mem6 link26 ctl16 link46 network0 link56
ctl27 3 mem7 link27 ctl17 link47 network0 link57
network Switch 1 0 1
ctl2 link5
network0 8 ctl20 link50 ctl21 link51 ctl22 link52 ctl23 link53 ctl24 link54 ctl25 link55 ctl26 link56 ctl27 link57
ctl1 Pathway 8 0 2
disks link3 ctl2 link4
ctl10 2 d0 link30 ctl20 link40
ctl11 2 d1 link31 ctl21 link41
ctl12 2 d2 link32 ctl22 link42
ctl13 2 d3 link33 ctl23 link43
ctl14 2 d4 link34 ctl24 link44
ctl15 2 d5 link35 ctl25 link45
ctl16 2 d6 link36 ctl26 link46
ctl17 2 d7 link37 ctl27 link47
Continued from previous Page
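
The three-line layout described in the nodeset legend (header, nodeset/linkset pairs, then one line per member) can be read mechanically. The Python sketch below is a hypothetical reader, not PetaSIM code; it assumes whitespace-separated fields and takes each member's link count from the member line itself.

```python
from typing import Dict, List, Tuple


def parse_nodeset(lines: List[str]) -> Dict:
    """Parse one nodeset block laid out as in the Jacobi example."""
    head = lines[0].split()
    name, kind = head[0], head[1]
    number, grainsize, portcount = int(head[2]), int(head[3]), int(head[4])
    floatspeed = float(head[5]) if len(head) > 5 else None

    # Second line: (neighbor nodeset, linkset) pairs, one per port.
    ports = lines[1].split()
    portlinks = [(ports[i], ports[i + 1]) for i in range(0, 2 * portcount, 2)]

    # Member lines: member, link count, then (neighbor member, link member) pairs.
    members: Dict[str, List[Tuple[str, str]]] = {}
    for line in lines[2:2 + number]:
        tok = line.split()
        pairs = tok[2:2 + 2 * int(tok[1])]
        members[tok[0]] = [(pairs[i], pairs[i + 1])
                           for i in range(0, len(pairs), 2)]

    return {"name": name, "type": kind, "number": number,
            "grainsize": grainsize, "portlinks": portlinks,
            "floatspeed": floatspeed, "members": members}


block = """ctl1 Pathway 8 0 2
disks link3 ctl2 link4
ctl10 2 d0 link30 ctl20 link40
ctl11 2 d1 link31 ctl21 link41
ctl12 2 d2 link32 ctl22 link42
ctl13 2 d3 link33 ctl23 link43
ctl14 2 d4 link34 ctl24 link44
ctl15 2 d5 link35 ctl25 link45
ctl16 2 d6 link36 ctl26 link46
ctl17 2 d7 link37 ctl27 link47""".splitlines()
ctl1 = parse_nodeset(block)
```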

Foil 21 Jacobi Example -- Linkset I

link1 Updown Full 8 0.0 524288000
mem cpu
link10 mem0 cpu0
link11 mem1 cpu1
link12 mem2 cpu2
link13 mem3 cpu3
link14 mem4 cpu4
link15 mem5 cpu5
link16 mem6 cpu6
link17 mem7 cpu7
link2 Updown Full 8 0.0 83886080
mem ctl2
link20 mem0 ctl20
link21 mem1 ctl21
link22 mem2 ctl22
link23 mem3 ctl23
link24 mem4 ctl24
link25 mem5 ctl25
link26 mem6 ctl26
link27 mem7 ctl27
Excerpt from Linkset Definitions
1)name Type Duplex number latency bandwidth
2)nodesetbegin nodesetend
3)linkset_member nodeset_member_begin nodeset_member_end
link3 Updown Full 8 2.0e-4 8388608
ctl1 disks
link30 ctl10 d0
link31 ctl11 d1
link32 ctl12 d2
link33 ctl13 d3
link34 ctl14 d4
link35 ctl15 d5
link36 ctl16 d6
link37 ctl17 d7
Continued on next Page

Foil 22 Jacobi Example -- Linkset II

Excerpt from Linkset Definitions:
name Type Duplex number latency bandwidth
nodesetbegin nodesetend
linkset_member nodeset_member_begin nodeset_member_end
link4 Updown Full 8 0.0 83886080
ctl1 ctl2
link40 ctl10 ctl20
link41 ctl11 ctl21
link42 ctl12 ctl22
link43 ctl13 ctl23
link44 ctl14 ctl24
link45 ctl15 ctl25
link46 ctl16 ctl26
link47 ctl17 ctl27
link5 Updown Full 8 4.0e-5 83886080
ctl2 network
link50 ctl20 network0
link51 ctl21 network0
link52 ctl22 network0
link53 ctl23 network0
link54 ctl24 network0
link55 ctl25 network0
link56 ctl26 network0
link57 ctl27 network0
Continued from previous Page
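
A linkset block follows the same pattern as the legend describes: a header line, the two nodesets joined, then one line per linkset member. The Python sketch below is a hypothetical reader for that layout (not PetaSIM code), applied to the link5 block to list the members that attach to the single switch member.

```python
from typing import Dict, List, Tuple


def parse_linkset(lines: List[str]) -> Dict:
    """Parse one linkset block laid out as in the Jacobi example."""
    head = lines[0].split()
    begin, end = lines[1].split()
    number = int(head[3])
    members: Dict[str, Tuple[str, str]] = {}
    for line in lines[2:2 + number]:
        link, node_begin, node_end = line.split()
        members[link] = (node_begin, node_end)
    return {"name": head[0], "type": head[1], "duplex": head[2],
            "number": number, "latency": float(head[4]),
            "bandwidth": float(head[5]),
            "nodesetbegin": begin, "nodesetend": end, "members": members}


block = """link5 Updown Full 8 4.0e-5 83886080
ctl2 network
link50 ctl20 network0
link51 ctl21 network0
link52 ctl22 network0
link53 ctl23 network0
link54 ctl24 network0
link55 ctl25 network0
link56 ctl26 network0
link57 ctl27 network0""".splitlines()
link5 = parse_linkset(block)

# Every ctl2 member reaches the same switch member, network0.
fan_in = sorted(a for a, b in link5["members"].values() if b == "network0")
```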

Foil 23 Jacobi Example

Jacobi Grid2dim 1000000 4 4 10
Dataset Definition is very simple (by design):
name type size bytesperunit floatperunit operationperunit
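
To give a feel for the numbers, the definition line can be turned into per-member quantities. The Python sketch below assumes an even split over the 8 nodeset members of the Jacobi architecture and assumes that the cpu `floatspeed` of 1.56116e-7 means seconds per floating point operation; neither assumption is stated on the foils.

```python
# Parse the definition line using the field order given on the foil.
fields = ["name", "type", "size", "bytesperunit",
          "floatperunit", "operationperunit"]
raw = "Jacobi Grid2dim 1000000 4 4 10".split()
jacobi = {k: (v if k in ("name", "type") else int(v))
          for k, v in zip(fields, raw)}

MEMBERS = 8                      # nodeset members in the Jacobi architecture
SECONDS_PER_FLOAT = 1.56116e-7   # cpu floatspeed, assumed seconds per operation

total_bytes = jacobi["size"] * jacobi["bytesperunit"]
bytes_per_member = total_bytes // MEMBERS
floats_per_member = jacobi["size"] // MEMBERS * jacobi["floatperunit"]
compute_seconds = floats_per_member * SECONDS_PER_FLOAT

print(total_bytes, bytes_per_member, floats_per_member)
print(f"{compute_seconds:.6f}")
```

Under these assumptions each member holds 500,000 bytes and performs 500,000 floating point operations per update, or roughly 0.078 seconds of computation.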

Foil 24 Jacobi Example -- Data Parallel Version (SP2 Architecture I)

move Jacobi from disks to ctl1
move Jacobi from ctl1 to ctl2
move Jacobi from ctl2 to mem
move Jacobi from mem to ctl2
move Jacobi from ctl2 to network
move Jacobi from network to ctl2
move Jacobi from ctl2 to mem
move Jacobi from mem to cpu
compute Jacobi on cpu
move Jacobi from mem to ctl2
move Jacobi from ctl2 to ctl1
move Jacobi from ctl1 to disks
synchronize
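
Because the current implementation only supports one-level movement, each `move` in this script should join two nodesets that share a linkset. The Python sketch below checks that against the link1 to link5 definitions of the Jacobi example; the `unlinked_moves` helper is hypothetical and not part of PetaSIM.

```python
# Nodeset pairs joined by link1..link5 in the Jacobi linkset definitions.
LINKS = {
    frozenset(("mem", "cpu")),       # link1
    frozenset(("mem", "ctl2")),      # link2
    frozenset(("ctl1", "disks")),    # link3
    frozenset(("ctl1", "ctl2")),     # link4
    frozenset(("ctl2", "network")),  # link5
}

SCRIPT = """\
move Jacobi from disks to ctl1
move Jacobi from ctl1 to ctl2
move Jacobi from ctl2 to mem
move Jacobi from mem to ctl2
move Jacobi from ctl2 to network
move Jacobi from network to ctl2
move Jacobi from ctl2 to mem
move Jacobi from mem to cpu
compute Jacobi on cpu
move Jacobi from mem to ctl2
move Jacobi from ctl2 to ctl1
move Jacobi from ctl1 to disks
synchronize
"""


def unlinked_moves(script, links=LINKS):
    """Return the move lines whose endpoints share no linkset."""
    bad = []
    for line in script.splitlines():
        words = line.split()
        if len(words) == 6 and words[0].lower() == "move":
            src, dst = words[3], words[5]
            if frozenset((src, dst)) not in links:
                bad.append(line)
    return bad


problems = unlinked_moves(SCRIPT)
```

Every move in the data parallel script passes this check, while a line such as `move Jacobi from disks to cpu` would be reported.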

Foil 25 Jacobi Example -- Execution Script I

compute Jacobi on cpu4
move Jacobi from d1 to ctl11
move Jacobi from ctl11 to ctl21
move Jacobi from ctl21 to mem1
move Jacobi from mem1 to ctl21
move Jacobi from ctl21 to network0
move Jacobi from network0 to ctl20
move Jacobi from ctl20 to mem0
move Jacobi from mem0 to cpu0
compute Jacobi on cpu0
move Jacobi from mem1 to ctl21
move Jacobi from ctl21 to network0
move Jacobi from network0 to ctl22
move Jacobi from ctl22 to mem2
move Jacobi from mem2 to cpu2
compute Jacobi on cpu2
move Jacobi from mem1 to ctl21
move Jacobi from ctl21 to network0
move Jacobi from network0 to ctl25
move Jacobi from ctl25 to mem5
move Jacobi from mem5 to cpu5
compute Jacobi on cpu5
move Jacobi from d0 to ctl10
move Jacobi from ctl10 to ctl20
move Jacobi from ctl20 to mem0
move Jacobi from mem0 to ctl20
move Jacobi from ctl20 to network0
move Jacobi from network0 to ctl21
move Jacobi from ctl21 to mem1
move Jacobi from mem1 to cpu1
compute Jacobi on cpu1
move Jacobi from mem0 to ctl20
move Jacobi from ctl20 to network0
move Jacobi from network0 to ctl24
move Jacobi from ctl24 to mem4
move Jacobi from mem4 to cpu4
Note simpler data parallel version on previous page
Continued on next Page

Foil 26 Jacobi Example -- Execution Script II

move Jacobi from d5 to ctl15
move Jacobi from ctl15 to ctl25
move Jacobi from ctl25 to mem5
move Jacobi from mem5 to ctl25
move Jacobi from ctl25 to network0
move Jacobi from network0 to ctl21
move Jacobi from ctl21 to mem1
move Jacobi from mem1 to cpu1
compute Jacobi on cpu1
move Jacobi from mem6 to ctl26
move Jacobi from ctl26 to network0
move Jacobi from network0 to ctl27
move Jacobi from ctl27 to mem7
move Jacobi from mem7 to cpu7
compute Jacobi on cpu7
move Jacobi from d7 to ctl17
move Jacobi from ctl17 to ctl27
move Jacobi from ctl27 to mem7
move Jacobi from mem7 to ctl27
move Jacobi from ctl27 to network0
move Jacobi from network0 to ctl23
move Jacobi from ctl23 to mem3
move Jacobi from mem3 to cpu3
compute Jacobi on cpu3
move Jacobi from mem7 to ctl27
move Jacobi from ctl27 to network0
move Jacobi from network0 to ctl26
move Jacobi from ctl26 to mem6
move Jacobi from mem6 to cpu6
compute Jacobi on cpu6
synchronize
Continued from previous Page

Foil 27 Interface between Emulator and PetaSIM Part of Pathfinder Application

Excerpt from Nodeset Definitions
name Type number grainsize portlink (floatspeed)
nodeset_name linkset_name
nodeset_member link_number nodeset_member linkset_member
cpu CPU 8 32 1 1.530000e-06
bus1 link2
cpu0 1 bus10 link20
cpu1 1 bus11 link21
cpu2 1 bus12 link22
cpu3 1 bus13 link23
cpu4 1 bus14 link24
cpu5 1 bus15 link25
cpu6 1 bus16 link26
cpu7 1 bus17 link27
...................
mem Memory 8 134217728 1
bus1 link1
mem0 1 bus10 link10
mem1 1 bus11 link11
mem2 1 bus12 link12
mem3 1 bus13 link13
mem4 1 bus14 link14
mem5 1 bus15 link15
mem6 1 bus16 link16
mem7 1 bus17 link17

Foil 28 Interface between Emulator and PetaSIM -- Pathfinder

Excerpt from Linkset Definitions:
name Type Duplex number latency bandwidth
nodesetbegin nodesetend
linkset_member nodeset_member_begin nodeset_member_end
link1 Updown Full 8 0.000000e+00 134217728
bus1 mem
link10 bus10 mem0
link11 bus11 mem1
link12 bus12 mem2
link13 bus13 mem3
link14 bus14 mem4
link15 bus15 mem5
link16 bus16 mem6
link17 bus17 mem7
link2 Updown Full 8 0.000000e+00 134217728
bus1 cpu
link20 bus10 cpu0
link21 bus11 cpu1
link22 bus12 cpu2
link23 bus13 cpu3
link24 bus14 cpu4
link25 bus15 cpu5
link26 bus16 cpu6
link27 bus17 cpu7

Foil 29 Interface between Emulator and PetaSIM -- Pathfinder

Excerpt from Dataset Definitions:
name type size bytesperunit floatperunit operationperunit
data0 grid2dim 46920 4 1 1
data1 grid2dim 46920 4 1 1
data2 grid2dim 46920 4 1 1
data3 grid2dim 46920 4 1 1
data4 grid2dim 46920 4 1 1
data5 grid2dim 46920 4 1 1
data6 grid2dim 46920 4 1 1
data7 grid2dim 46920 4 1 1
data8 grid2dim 46920 4 1 1
data9 grid2dim 46920 4 1 1
data10 grid2dim 46920 4 1 1
data11 grid2dim 46920 4 1 1
data12 grid2dim 46920 4 1 1
data13 grid2dim 46920 4 1 1
data14 grid2dim 46920 4 1 1
........................

Foil 30 Interface between Emulator and PetaSIM Pathfinder

Excerpt from Execution Script:
move data0 from disks0 to bus30
move data0 from bus30 to bus20
move data0 from bus20 to bus10
move data0 from bus10 to mem0
move data0 from mem0 to bus10
move data0 from bus10 to bus20
move data0 from bus20 to nwa0
move data0 from nwa0 to network0
move data0 from network0 to nwa11
move data0 from nwa11 to bus211
move data0 from bus211 to bus111
move data0 from bus111 to mem11
move data0 from mem11 to bus111
move data0 from bus111 to cpu11
compute data0 on cpu11

Foil 31 Pathfinder Performance Estimation Results (SP2 Architecture II)

[Figure: chart labeled "Maryland Resource Parameters" and "Manufacturer Parameters", with series "Execution estimate (appl)" and "Execution estimate (max)"]

Foil 32 Pathfinder Scaling: Performance v. # Processors

[Figure: chart with series "Measured Execution Time", "PetaSIM Running Time" and "Estimated Application Execution Time"]

Foil 33 Pathfinder Estimation Results II

[Figure: chart with series "Measured Execution Time", "PetaSIM Running Time" and "Estimated Application Execution Time"]

Foil 34 PetaSIM Estimation Results (Titan Architecture I)

[Figure: chart with series "Execution estimate"]

Foil 35 Titan Estimation Results (Architecture II)

[Figure: chart labeled "Maryland Resource Parameters" and "Manufacturer Parameters", with series "Execution estimate (appl)" and "Execution estimate (max)"]

Foil 36 Titan Estimation Results (Fixed)

[Figure: chart with series "PetaSIM Running Time", "Estimated Application Execution Time" and "Measured Execution Time"]

Foil 37 VMScope Performance Estimation Results (Architecture II)

[Figure: chart labeled "Maryland Resource Parameters" and "Manufacturer Parameters", with series "Execution estimate (appl)" and "Execution estimate (max)"]

Foil 38 VMScope Estimation Results

[Figure: chart with series "Measured Execution Time", "PetaSIM Running Time" and "Estimated Application Execution Time"]

Foil 39 PetaSIM Features

Reasonably Accurate estimation
Friendly user interface
  • Easy to modify the architecture design
  • Easy to monitor the effect of the design change
  • Applet front end means it can be used anywhere
Reasonably Fast Estimation (run at suitable grain size)
Can get detailed performance estimation
  • Provides detailed usage of each individual nodeset and linkset member in the memory hierarchy

Foil 40 Comparison with other Simulator Approaches

Idiosyncratic Simulation Approach
  • PetaSIM does not run the real application
  • Uses an execution script (operation abstraction) to allow variable grain size and support of memory hierarchies
  • explicit data movement
PetaSIM exploits loosely synchronous collective computation
PetaSIM runs on single processor
PetaSIM can easily deal with different kinds of computer architecture
PetaSIM can get detailed information for any aspect of the architecture

Foil 41 Summary of PetaSIM

Easily modified architecture and application description
Architecture Description (nodeset & linkset)
  • will link to proposed Java Framework for Computation
  • Other projects here can provide input?
Application Description (dataset & execution script)
Supports Loosely Synchronous Data Parallel Model & Custom Control
Link to Maryland Application Emulators
Jacobi hand-written example -- add SWEEP3D?
  • Can estimate Fortran, C, C++ or Java applications
Pathfinder, Titan, VMScope real applications (Generated by UMD's Emulator) -- data intensive
Look at UML for Interface and coarse grain specification
Fast and reasonably accurate performance estimation (PetaSIM runs on single processor)
Java applet based user Interface
About 6000 lines of C++ (server) and 4000 lines of Java (client)

Foil 42 Possible Future Work

Richer set of applications using standard benchmarks and specific DoD (NSF DoE?) applications
Relate object model to those used in "seamless interfaces" / metacomputing i.e. to efforts to establish (distributed) object model for computation
Review very simple execution script -- should we add more complex (loosely synchronous) primitives or regard "application emulators" as this complex script
Binary format ("compiled PetaSIM") of architecture and application description (ASCII format will make the execution script very large)
  • Translation tool from ASCII format to binary format (to retain the friendly user interface)
Upgrade performance evaluation model
Run performance simulation in parallel (i.e. PetaSIM running on multi-processors)

© Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
