Name: one per nodeset object
|
type: choose from memory, cache, disk, CPU, pathway
|
number: number of members of this nodeset in the architecture
|
grainsize: size in bytes of each member of this nodeset (for memory, cache, disk)
|
bandwidth: maximum bandwidth allowed in any one member of this nodeset
|
floatspeed: floating-point calculation speed of the CPU
|
calculate(): method used by CPU nodeset to perform computation
|
cacherule: controls persistence of data in a memory or cache
|
portcount: number of ports on each member of nodeset
|
portname[]: ports connected to linkset
|
portlink[]: name of linkset connecting to this port
|
nodeset_member_list: list of nodeset members in this nodeset (for nodeset member identification)
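
The nodeset fields listed above could be held in a small record type. The following is a minimal sketch, not PetaSIM's actual C++ class: the Python layout, the default member naming, and the derived `portcount` property are assumptions made for illustration. The `calculate()` method is omitted because its cost model is not given here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Nodeset:
    """One nodeset object; field names follow the list above."""
    name: str
    type: str                            # memory, cache, disk, CPU, or pathway
    number: int                          # members in the architecture
    grainsize: int = 0                   # bytes per member (memory, cache, disk)
    bandwidth: Optional[float] = None    # maximum bandwidth of any one member
    floatspeed: Optional[float] = None   # CPU nodesets only
    cacherule: Optional[str] = None      # persistence rule for memory or cache
    portlink: List[str] = field(default_factory=list)  # linkset name per port
    nodeset_member_list: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: members default to "<name><index>", e.g. cpu0..cpu7.
        if not self.nodeset_member_list:
            self.nodeset_member_list = [
                f"{self.name}{i}" for i in range(self.number)
            ]

    @property
    def portcount(self) -> int:
        # Derived here from portlink; stored explicitly in the description file.
        return len(self.portlink)


# Values taken from the "cpu" nodeset line of the architecture description.
cpu = Nodeset("cpu", "CPU", number=8, grainsize=32,
              floatspeed=1.56116e-7, portlink=["link1"])
```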
|
Name: one per linkset object
|
type: choose from updown, across
|
nodesetbegin: name of initial nodeset joined by this linkset
|
nodesetend: name of final nodeset joined by this linkset
|
topology: used for across networks to specify linkage between members of a single nodeset
|
duplex: choose from full or half
|
number: number of members of this linkset in the architecture
|
latency: time to send zero length message across any member of linkset
|
bandwidth: maximum bandwidth allowed in any link of this linkset
|
send(): method that calculates cost of sending a message across the linkset
|
distribution: name of geometric distribution controlling this linkset
|
linkset_member_list: list of linkset members in this linkset (for linkset member identification)
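
Since latency is defined as the time for a zero-length message, one plausible form for `send()` is latency plus message size divided by bandwidth. The sketch below assumes that linear model and assumes bandwidth is given in bytes per second. Neither is confirmed by the source, and the numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Linkset:
    """One linkset object; field names follow the list above."""
    name: str
    type: str            # "updown" or "across"
    nodesetbegin: str
    nodesetend: str
    number: int
    latency: float       # seconds to send a zero-length message
    bandwidth: float     # assumed unit: bytes per second
    duplex: str = "full"

    def send(self, nbytes: int) -> float:
        """Assumed linear cost model: latency + nbytes / bandwidth."""
        if nbytes < 0:
            raise ValueError("message size must be non-negative")
        return self.latency + nbytes / self.bandwidth


# Hypothetical parameters, chosen only to exercise the formula.
link1 = Linkset("link1", "updown", "cpu", "mem", number=8,
                latency=1e-6, bandwidth=1e8)
cost_empty = link1.send(0)            # equals the latency
cost_large = link1.send(100_000_000)  # latency plus one second of transfer
```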
|
Name: one per dataset object
|
type: choose from grid1dim, grid2dim, grid3dim; specifies the type of dataset
|
bytesperunit: number of bytes in each unit
|
floatsperunit: update cost as a floating point arithmetic count
|
operationsperunit: operations in each unit
|
update(): method that updates a given dataset held in a CPU nodeset, with the grainsize controlled by the last memory nodeset visited
|
transmit(): method that calculates the cost of transmitting a dataset between memory levels, either as communication or as movement up and down the hierarchy
-
Methods can use other parameters or be custom
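
As an illustration of how `update()` might turn these fields into a cost, the sketch below multiplies the number of units by `floatsperunit` and by the CPU's `floatspeed`. It assumes that `floatspeed` is a time per floating-point operation and that a dataset carries a unit count (`units`, a hypothetical field name). The source states neither, so treat this as one possible reading.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """One dataset object; `units` is a hypothetical field for the unit count."""
    name: str
    type: str               # grid1dim, grid2dim, or grid3dim
    units: int
    bytesperunit: int
    floatsperunit: int
    operationsperunit: int

    def nbytes(self) -> int:
        # Total size of the dataset in bytes.
        return self.units * self.bytesperunit

    def update(self, floatspeed: float) -> float:
        # Assumption: floatspeed is seconds per floating-point operation.
        return self.units * self.floatsperunit * floatspeed


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
jacobi = Dataset("Jacobi", "grid2dim", units=1000,
                 bytesperunit=4, floatsperunit=1, operationsperunit=1)
update_cost = jacobi.update(floatspeed=1e-7)
```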
|
cpu CPU 8 32 1 1.56116e-7
|
mem link1
|
cpu0 1 mem0 link10
|
cpu1 1 mem1 link11
|
cpu2 1 mem2 link12
|
cpu3 1 mem3 link13
|
cpu4 1 mem4 link14
|
cpu5 1 mem5 link15
|
cpu6 1 mem6 link16
|
cpu7 1 mem7 link17
|
name type number grainsize portcount (floatspeed)
|
nodeset_name linkset_name (pair repeated once per link of the nodeset named in the first line)
|
nodeset_member link_number nodeset_member linkset_member (pair repeated link_number times)
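
The three-part layout described above (header line, nodeset-level links, then one line per member) can be read with a short parser. This is a sketch under the assumption that fields are whitespace-separated and that the fifth header field is the port count. The function name and the returned dictionary layout are illustrative, not PetaSIM's.

```python
def parse_nodeset_block(lines):
    """Parse one nodeset block of the architecture description."""
    rows = [ln.split() for ln in lines if ln.strip()]
    head, links, member_rows = rows[0], rows[1], rows[2:]

    number = int(head[2])
    if len(member_rows) != number:
        raise ValueError(f"expected {number} member lines, got {len(member_rows)}")
    if len(links) % 2 != 0:
        raise ValueError("nodeset/linkset names must come in pairs")

    members = {}
    for row in member_rows:
        count, pairs = int(row[1]), row[2:]
        if len(pairs) != 2 * count:
            raise ValueError(f"{row[0]}: expected {count} (member, link) pairs")
        # Each port is a (neighbouring member, linkset member) pair.
        members[row[0]] = [(pairs[i], pairs[i + 1]) for i in range(0, len(pairs), 2)]

    return {
        "name": head[0],
        "type": head[1],
        "number": number,
        "grainsize": int(head[3]),
        "portcount": int(head[4]),
        # floatspeed is present only on CPU nodesets.
        "floatspeed": float(head[5]) if len(head) > 5 else None,
        "links": [(links[i], links[i + 1]) for i in range(0, len(links), 2)],
        "members": members,
    }


CPU_BLOCK = """\
cpu CPU 8 32 1 1.56116e-7
mem link1
cpu0 1 mem0 link10
cpu1 1 mem1 link11
cpu2 1 mem2 link12
cpu3 1 mem3 link13
cpu4 1 mem4 link14
cpu5 1 mem5 link15
cpu6 1 mem6 link16
cpu7 1 mem7 link17
"""
cpu_block = parse_nodeset_block(CPU_BLOCK.splitlines())
```

The per-member port count is read from each member line rather than from the header, so a member with more ports than the header states is still parsed.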
|
disks Disk 8 2147483648 1
|
ctl1 link3
|
d0 1 ctl10 link30
|
d1 1 ctl11 link31
|
d2 1 ctl12 link32
|
d3 1 ctl13 link33
|
d4 1 ctl14 link34
|
d5 1 ctl15 link35
|
d6 1 ctl16 link36
|
d7 1 ctl17 link37
|
mem Memory 8 134217728 2
|
cpu link1 ctl2 link2
|
mem0 2 cpu0 link10 ctl20 link20
|
mem1 2 cpu1 link11 ctl21 link21
|
mem2 2 cpu2 link12 ctl22 link22
|
mem3 2 cpu3 link13 ctl23 link23
|
mem4 2 cpu4 link14 ctl24 link24
|
mem5 2 cpu5 link15 ctl25 link25
|
mem6 2 cpu6 link16 ctl26 link26
|
mem7 2 cpu7 link17 ctl27 link27
|
ctl2 Pathway 8 0 3
|
mem link2 ctl1 link4 network link5
|
ctl20 3 mem0 link20 ctl10 link40 network0 link50
|
ctl21 3 mem1 link21 ctl11 link41 network0 link51
|
ctl22 3 mem2 link22 ctl12 link42 network0 link52
|
ctl23 3 mem3 link23 ctl13 link43 network0 link53
|
ctl24 3 mem4 link24 ctl14 link44 network0 link54
|
ctl25 3 mem5 link25 ctl15 link45 network0 link55
|
ctl26 3 mem6 link26 ctl16 link46 network0 link56
|
ctl27 3 mem7 link27 ctl17 link47 network0 link57
|
network Switch 1 0 1
|
ctl2 link5
|
network0 8 ctl20 link50 ctl21 link51 ctl22 link52 ctl23 link53 ctl24 link54 ctl25 link55 ctl26 link56 ctl27 link57
|
ctl1 Pathway 8 0 2
|
disks link3 ctl2 link4
|
ctl10 2 d0 link30 ctl20 link40
|
ctl11 2 d1 link31 ctl21 link41
|
ctl12 2 d2 link32 ctl22 link42
|
ctl13 2 d3 link33 ctl23 link43
|
ctl14 2 d4 link34 ctl24 link44
|
ctl15 2 d5 link35 ctl25 link45
|
ctl16 2 d6 link36 ctl26 link46
|
ctl17 2 d7 link37 ctl27 link47
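
Because every link is written twice in this format, once from each endpoint, the description can be cross-checked for symmetry. The helper below is a hypothetical sketch: it takes a mapping from member name to `(neighbour, link)` pairs and reports every port whose neighbour does not list the same link back. Neighbours missing from the mapping are skipped.

```python
def find_asymmetric_ports(ports):
    """Return (member, neighbour, link) for ports not mirrored by the neighbour."""
    problems = []
    for member, connections in ports.items():
        for neighbour, link in connections:
            if neighbour not in ports:
                continue  # neighbour not loaded; nothing to compare against
            if (member, link) not in ports[neighbour]:
                problems.append((member, neighbour, link))
    return problems


# A small consistent subset of the architecture above.
consistent = {
    "cpu0": [("mem0", "link10")],
    "mem0": [("cpu0", "link10"), ("ctl20", "link20")],
    "ctl20": [("mem0", "link20"), ("ctl10", "link40"), ("network0", "link50")],
}

# Same subset with one link name deliberately altered to show the failure case.
broken = dict(consistent, cpu0=[("mem0", "link99")])

ok_report = find_asymmetric_ports(consistent)
bad_report = find_asymmetric_ports(broken)
```

A check of this kind is cheap to run each time the architecture description is edited by hand.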
|
compute Jacobi on cpu4
|
move Jacobi from d1 to ctl11
|
move Jacobi from ctl11 to ctl21
|
move Jacobi from ctl21 to mem1
|
move Jacobi from mem1 to ctl21
|
move Jacobi from ctl21 to network0
|
move Jacobi from network0 to ctl20
|
move Jacobi from ctl20 to mem0
|
move Jacobi from mem0 to cpu0
|
compute Jacobi on cpu0
|
move Jacobi from mem1 to ctl21
|
move Jacobi from ctl21 to network0
|
move Jacobi from network0 to ctl22
|
move Jacobi from ctl22 to mem2
|
move Jacobi from mem2 to cpu2
|
compute Jacobi on cpu2
|
move Jacobi from mem1 to ctl21
|
move Jacobi from ctl21 to network0
|
move Jacobi from network0 to ctl25
|
move Jacobi from ctl25 to mem5
|
move Jacobi from mem5 to cpu5
|
compute Jacobi on cpu5
|
move Jacobi from d0 to ctl10
|
move Jacobi from ctl10 to ctl20
|
move Jacobi from ctl20 to mem0
|
move Jacobi from mem0 to ctl20
|
move Jacobi from ctl20 to network0
|
move Jacobi from network0 to ctl21
|
move Jacobi from ctl21 to mem1
|
move Jacobi from mem1 to cpu1
|
compute Jacobi on cpu1
|
move Jacobi from mem0 to ctl20
|
move Jacobi from ctl20 to network0
|
move Jacobi from network0 to ctl24
|
move Jacobi from ctl24 to mem4
|
move Jacobi from mem4 to cpu4
|
Also a simpler data parallel version
|
move Jacobi from d5 to ctl15
|
move Jacobi from ctl15 to ctl25
|
move Jacobi from ctl25 to mem5
|
move Jacobi from mem5 to ctl25
|
move Jacobi from ctl25 to network0
|
move Jacobi from network0 to ctl21
|
move Jacobi from ctl21 to mem1
|
move Jacobi from mem1 to cpu1
|
compute Jacobi on cpu1
|
move Jacobi from mem6 to ctl26
|
move Jacobi from ctl26 to network0
|
move Jacobi from network0 to ctl27
|
move Jacobi from ctl27 to mem7
|
move Jacobi from mem7 to cpu7
|
compute Jacobi on cpu7
|
move Jacobi from d7 to ctl17
|
move Jacobi from ctl17 to ctl27
|
move Jacobi from ctl27 to mem7
|
move Jacobi from mem7 to ctl27
|
move Jacobi from ctl27 to network0
|
move Jacobi from network0 to ctl23
|
move Jacobi from ctl23 to mem3
|
move Jacobi from mem3 to cpu3
|
compute Jacobi on cpu3
|
move Jacobi from mem7 to ctl27
|
move Jacobi from ctl27 to network0
|
move Jacobi from network0 to ctl26
|
move Jacobi from ctl26 to mem6
|
move Jacobi from mem6 to cpu6
|
compute Jacobi on cpu6
|
synchronize
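
The script above uses three statement forms: `move <dataset> from <src> to <dst>`, `compute <dataset> on <cpu>`, and `synchronize`. A minimal reader for that grammar might look like the sketch below. The tuple representation is an assumption, and keywords are matched case-insensitively as a convenience.

```python
import re

MOVE = re.compile(r"move\s+(\S+)\s+from\s+(\S+)\s+to\s+(\S+)$", re.IGNORECASE)
COMPUTE = re.compile(r"compute\s+(\S+)\s+on\s+(\S+)$", re.IGNORECASE)


def parse_script(text):
    """Turn an execution script into a list of operation tuples."""
    ops = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        move = MOVE.match(line)
        compute = COMPUTE.match(line)
        if move:
            ops.append(("move",) + move.groups())
        elif compute:
            ops.append(("compute",) + compute.groups())
        elif line.lower() == "synchronize":
            ops.append(("synchronize",))
        else:
            raise ValueError(f"line {lineno}: unrecognized statement {line!r}")
    return ops


SCRIPT = """\
move Jacobi from d1 to ctl11
move Jacobi from ctl11 to ctl21
move Jacobi from ctl21 to mem1
compute Jacobi on cpu0
synchronize
"""
ops = parse_script(SCRIPT)
```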
|
Excerpt from Dataset Definitions:
|
name type size bytesperunit floatsperunit operationsperunit
|
data0 grid2dim 46920 4 1 1
|
data1 grid2dim 46920 4 1 1
|
data2 grid2dim 46920 4 1 1
|
data3 grid2dim 46920 4 1 1
|
data4 grid2dim 46920 4 1 1
|
data5 grid2dim 46920 4 1 1
|
data6 grid2dim 46920 4 1 1
|
data7 grid2dim 46920 4 1 1
|
data8 grid2dim 46920 4 1 1
|
data9 grid2dim 46920 4 1 1
|
data10 grid2dim 46920 4 1 1
|
data11 grid2dim 46920 4 1 1
|
data12 grid2dim 46920 4 1 1
|
data13 grid2dim 46920 4 1 1
|
data14 grid2dim 46920 4 1 1
|
........................
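
Each dataset row above has six whitespace-separated fields. The sketch below reads one row and derives the dataset's total size in bytes, assuming that `size` counts units so that bytes = size × bytesperunit. The function names are illustrative.

```python
def parse_dataset_row(line):
    """Parse one row of the dataset definitions into a dictionary."""
    name, kind, size, bytesperunit, floatsperunit, operationsperunit = line.split()
    return {
        "name": name,
        "type": kind,
        "size": int(size),
        "bytesperunit": int(bytesperunit),
        "floatsperunit": int(floatsperunit),
        "operationsperunit": int(operationsperunit),
    }


def total_bytes(row):
    # Assumption: "size" is a count of units, not a byte count.
    return row["size"] * row["bytesperunit"]


data0 = parse_dataset_row("data0 grid2dim 46920 4 1 1")
data0_bytes = total_bytes(data0)  # 46920 units * 4 bytes = 187680 bytes
```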
|
Excerpt from Execution Script:
|
move data0 from disks0 to bus30
|
move data0 from bus30 to bus20
|
move data0 from bus20 to bus10
|
move data0 from bus10 to mem0
|
move data0 from mem0 to bus10
|
move data0 from bus10 to bus20
|
move data0 from bus20 to nwa0
|
move data0 from nwa0 to network0
|
move data0 from network0 to nwa11
|
move data0 from nwa11 to bus211
|
move data0 from bus211 to bus111
|
move data0 from bus111 to mem11
|
move data0 from mem11 to bus111
|
move data0 from bus111 to cpu11
|
compute data0 on cpu11
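
Within a single transfer such as the one above, each move starts where the previous one ended. A hypothetical helper can rebuild the route from the `(source, destination)` pairs and reject a sequence that jumps. It applies to one transfer at a time, not to a whole script that interleaves several transfers.

```python
def route_of(moves):
    """Return the list of nodes visited by a chain of (src, dst) moves."""
    if not moves:
        return []
    route = [moves[0][0]]
    for src, dst in moves:
        if src != route[-1]:
            raise ValueError(f"move starts at {src!r} but data is at {route[-1]!r}")
        route.append(dst)
    return route


# First four moves of the data0 excerpt above.
first_leg = [
    ("disks0", "bus30"),
    ("bus30", "bus20"),
    ("bus20", "bus10"),
    ("bus10", "mem0"),
]
route = route_of(first_leg)
hops = len(route) - 1
```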
|
Architecture Description (nodeset & linkset)
|
Application Description (dataset & execution script)
|
Link to Application Emulators
|
Jacobi hand-written example
|
Pathfinder, Titan, and VMScope real applications (generated by UMD's emulator)
|
Easily modified architecture and application descriptions
|
Fast and relatively accurate performance estimation (PetaSIM running on a single processor)
|
Java applet-based user interface
|
About 5000 lines of C++ and 4000 lines of Java (client and server)
|