Full HTML for

Basic foilset Processing-In-Memory (PIM) Architectures for Very High Performance MPP Computing

Given by Peter Kogge (Notre Dame) at PAWS 96, Mandalay Beach, April 21-26, 1996. Foils prepared June 1996.


This was part of a set of PAWS 96 (Mandalay Beach) presentations.
Kogge and collaborators describe PIM as an emerging architecture in which logic and memory are combined on the same chip, which naturally increases memory bandwidth.
Conventional architectures tend to waste transistors, measured in terms of silicon used per unit operation.
Both existing designs and projections to the PetaFlop timescale (2007) are given.

Table of Contents for full HTML of Processing-In-Memory (PIM) Architectures for Very High Performance MPP Computing


1 Processing-In-Memory (PIM) Architectures for Very High Performance MPP Computing
2 Acknowledgements
3 Memory & CPU Bandwidth Gap
4 Key Points
5 This Talk: A Better Way!
6 Observations on This Talk
7 HPCC & TeraFlops
8 Petaflop Chain of Events
9 Results from Pasadena `94
10 PetaFlops Applications
11 Pasadena Architectures
12 Bodega Bay: Primary Memory
13 Bodega Bay: Secondary Memory
14 Bodega Bay: Aggregate I/O
15 Cost Considerations: Processors
16 Cost Considerations: Memory
17 The "Hidden Costs" of Modern Systems
18 The Overlooked Bandwidth
19 Modern "Alternative" RAMs
20 Processing In Memory (PIM): Reclaiming the Bandwidth
21 PIM: Optimizing the System
22 Market Demand for Dense Processing
23 Current PIM Chips
24 Key Problem: Memory Density
25 Vendors with Known DRAM PIM Capability
26 EXECUBE: The First High Density PIM
27 Execube Processing Node
28 Tiling of Execube Processing Nodes
29 Lessons Learned from EXECUBE
30 New "Strawman" PIM Processing Node Macro
31 "Strawman" Chip Floorplan
32 Strawman Chip Interfaces
33 Strawman PIM Chip with I/O Macros
34 Strawman Properties
35 Strawman PIM "Memory Card"
36 Choosing the Processing Macro
37 Performance Per Transistor
38 SIA-Based PIM Chip Projections
39 Silicon Area for a Teraflop
40 Parallelism
41 Petaflop PIM System Size
42 Power Projections (Logic)
43 Power Per Sq. Cm
44 3D Stacking
45 Potential PIM Cube
46 Potential PIM Cube
47 Further Work: Hardware
48 Further Work: Algorithm Development
49 Further Work: Software Development
50 Current ND PIM Work In Progress
51 Conclusion

Foil 1 Processing-In-Memory (PIM) Architectures for Very High Performance MPP Computing
Dr. Peter M. Kogge
McCourtney Prof. of Computer Science & Engr.
IEEE Fellow, IBM Fellow
Dr. Jay B. Brockman, Assistant Prof.
Dept. of Computer Science & Engr.
University of Notre Dame
219-631-6763
kogge@cse.nd.edu

Foil 2 Acknowledgements
Project Origins: IBM FSD EXECUBE chip
  • With foundry service from IBM Japan, Yasu
Current project funding includes:
  • NASA Grant NAG 5-2998 "PIM Architectures for Petaflops Computing"
  • NEC Research Institute: "High Speed Image Retrieval Techniques"
  • NSF Grant MIP95-03682, "Inherently Low Power Computers"

Foil 3 Memory & CPU Bandwidth Gap
As technology expands, so does the Gap
"Bridging the Gap:" significant cost implications
PIM changes the game's ground rules
[Figure: the widening memory-CPU bandwidth gap]

Foil 4 Key Points
Today's High Performance Systems
  • Expensive, Multiple Chips and Chip types
  • Based on conventional wisdom
    • Use fastest possible uP and densest DRAM
    • Coupled with fast but costly hierarchy
  • This is Not Optimal!
    • Separate chips waste inherent bandwidth
      • Forcing more chips
    • Poor use of silicon AND chip-chip contacts
    • Poor technology insertion
      • New uP ==> New system
    • Poor scaling

Foil 5 This Talk: A Better Way!
PIM: Combining memory & logic
Emerging single part type PIM designs
Future projections
Ongoing projects
Lessons learned

Foil 6 Observations on This Talk
New Design Space Exploration
At Best, First-Order Interactions Identified
Software Tools, in particular, need detail
Analysis only as good as projections

Foil 7 HPCC & TeraFlops
HPCC Program: Conceived in late 1980's
Key Goal: Tera(fl)op by turn of century
  • Spurred new architectures, esp. MPPs
  • 1 TF in sight! E.g., the recent Intel award
Major impediment: 1 TF = BIG MACHINE!
Final component: Embedded HPCC
  • NASA's REE Program recently initiated
We have declared Success!
Next Goal: Petaflops!

Foil 8 Petaflop Chain of Events
Feb. `94: Pasadena Workshop on Enabling Technologies for Petaflops Computing Systems
March `95: Petaflops Workshop at Frontiers'95
Aug. `95: Bodega Bay Workshop on Applications
PETA online: http://cesdis.gsfc.nasa.gov/petaflops/peta.html
Jan. `96: NSF Call for 100 TF "Point Designs"
April `96: Oxnard Petaflops Architecture Workshop (PAWS) on Architectures
June `96: Bodega Bay Petaflops Workshop on System Software
Oct. `96: Workshop at Frontiers `96

Foil 9 Results from Pasadena `94
Goal: Systems, Applications, & Software for PetaFlop (10^15 flops)
Applications: Significant number
Technology: Several, inc. CMOS
Architecture: At least 3
Software:"Like your crazy Uncle Fred who no one wants to talk about"

Foil 10 PetaFlops Applications
Huge Problem Size
  • Astrophysics, particle physics
  • Oil Reservoir, ground water modeling
  • Quantum chemical studies (e.g., of AIDS viruses)
  • Genome search problems
Real Time Computations
  • Global Weather
  • 3D Heart modeling "in the operating room"
  • Global image database queries
  • Video image fusion with virtual environments
Indirect Needs for Petaflops technology
  • Ubiquitous, keyboardless computers
  • Virtual reality, educational "what if"
  • High volume consumer applications (video games)

Foil 11 Pasadena Architectures

Foil 12 Bodega Bay: Primary Memory

Foil 13 Bodega Bay: Secondary Memory

Foil 14 Bodega Bay: Aggregate I/O

Foil 15 Cost Considerations: Processors
Today:
  • 100 MF uP => 1,000,000 CPUs for 100 TF
  • At 10 Support Chips/CPU => 10,000,000 Chips
Future:
  • 10X Clock
  • 4X Instruction Issues/Cycle
  • Still 250,000 Chips
NOT COUNTING MEMORY!
"Glows in the Dark"

Foil 16 Cost Considerations: Memory

Foil 17 The "Hidden Costs" of Modern Systems
Wasted silicon
Wasted bandwidth
Wasted contacts
Wasted power
Unnecessary complexity
[Figure: conventional memory hierarchy, CPU to cache to secondary cache to bus interface to memory subsystem, with bandwidth loss at each boundary]

Foil 18 The Overlooked Bandwidth
With modern DRAMs:
  • 1-16 GB/sec at the row buffers!
  • Even more for SRAMs!
[Figure: DRAM internals: multiple internal banks with row buffers 256-4096 bits wide; a multiplexor selects 1-9 bits per access, for maybe 30-50 MB/s off-chip]
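
The 1-16 GB/sec figure follows from the row width and cycle time. A sketch, assuming a ~60 ns row cycle (the cycle time is my assumption, not on the foil):

    # Internal (sense-amp) vs. external DRAM bandwidth, illustrative numbers.
    row_bits = 4096                        # widest row buffer quoted on the foil
    row_cycle = 60e-9                      # assumed row access time (not from the foil)
    internal = (row_bits / 8) / row_cycle  # bytes/sec available at the row buffer
    print(f"internal ~{internal / 1e9:.1f} GB/s vs. external ~30-50 MB/s")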

Foil 19 Modern "Alternative" RAMs
Nibble, Page, Fast Page Mode
Video RAMs
Pipelined Extended Data Out RAMs
Dual Bank Synchronous RAMs
Block Transfer RAMBUS
We keep adding logic to raise bandwidth,
BUT STILL LIMITED BY TAKING DATA OFF CHIP!

Foil 20 Processing In Memory (PIM): Reclaiming the Bandwidth
Place processing logic on the memory chip
Use logic to consume memory bandwidth directly
  • Utilize row buffers directly
  • Avoid need for off-chip resources
Utilize newly "liberated" chip contacts
  • To communicate directly with other PIMs
  • To interface directly with sensors and I/O
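
A toy cost model may make the point concrete. The Python sketch below counts chip-boundary crossings for summing one DRAM row, conventional vs. PIM-style; the row and bus widths are illustrative assumptions, not figures from the foils.

    # Toy cost model (illustrative): chip-boundary crossings for a
    # reduction over one DRAM row, conventional vs. PIM-style.
    ROW_BITS = 4096   # assumed row buffer width
    BUS_BITS = 64     # assumed off-chip bus width

    def crossings_conventional():
        # Every word of the row must cross the chip boundary to the CPU.
        return ROW_BITS // BUS_BITS

    def crossings_pim():
        # On-chip logic consumes the row; only the final result leaves the chip.
        return 1

    print(crossings_conventional(), "vs.", crossings_pim())  # 64 vs. 1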

Foil 21 PIM: Optimizing the System
Ample bandwidth changes design philosophy
  • Do: Optimize silicon to solve problems efficiently
    • Minimize total system silicon
    • Minimize total system power
    • Minimize design complexity
  • Don't: Force designers into
    • Single highest performance CPU solution
    • With complex bandwidth recovery logic
  • Natural outgrowth of PIM Philosophy
    • Simple, single part type
    • Scalable solutions
    • Inherently parallel
    • VERY "DENSE" PROCESSING

Foil 22 Market Demand for Dense Processing
"Big Science" Supercomputing
  • Grand challenges (teraflops)
  • "Grander" challenges (petaflops)
"Point design" Accelerators
Industrial Supercomputing
Commodity PCs & Workstations
Consumer Applications
[Figure: market segments arrayed by architectural design complexity vs. available development resources]
An "across-the-board" need for cheaper, denser, lower power processing

Foil 23 Current PIM Chips
Chip         First Silicon  Storage   Peak           MB/Perf.            Organization
EXECUBE      1993           0.5 MB    50 Mips        0.01 MB/Mip         16-bit SIMD/MIMD CMOS
AD SHARC     1994           0.5 MB    120 Mflops     0.005 MB/MF         Single CPU and memory
TI MVP       1994           0.05 MB   2000 Mops      0.000025 MB/Mop     1 CPU, 4 DSPs
MIT MAP      1996           0.128 MB  800 Mflops     0.00016 MB/MF       4 superscalar CPUs
Terasys PIM  1993           0.016 MB  625 M bit-ops  0.000026 MB/bit-op  1024 1-bit ALUs
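
The MB/Perf. column of the table above is simply storage divided by peak rate; a quick check (the foil rounds the AD SHARC entry up slightly):

    # Recompute the MB/Perf. column from the Storage and Peak columns.
    chips = {                          # name: (storage in MB, peak in M units/sec)
        "EXECUBE":     (0.5,    50),   # Mips
        "AD SHARC":    (0.5,   120),   # Mflops
        "TI MVP":      (0.05, 2000),   # Mops
        "MIT MAP":     (0.128, 800),   # Mflops
        "Terasys PIM": (0.016, 625),   # M bit-ops
    }
    for name, (mb, peak) in chips.items():
        print(f"{name:12s} {mb / peak:.6f} MB per M-unit")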

Foil 24 Key Problem: Memory Density
Conventional GP Computation Wisdom:
  • Linear: 1 Byte of memory per flop
May be somewhat reduced at high end
  • Large scale simulations: Storage = Flops^(3/4) (see the worked example after this list)
  • Recent petaflops studies: Some large computations at 0.01 B/Flop
Graphics & other embedded applications
  • Current << 1 B/op
  • BUT!: Future requires more
    • Increased resolution/color
    • 3D rendering
    • Full motion video compression/decompression
  • Conclusion: DRAM Essential!
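
Taking the Storage = Flops^(3/4) rule at face value, in bytes and flops/sec, and applying it at the petaflop point (the petaflop instance is my illustration, not a number from the foil):

    # The Flops^(3/4) scaling rule applied to a 10^15-flops machine.
    flops = 1e15
    bytes_needed = flops ** 0.75                 # ~1.8e11 B, i.e. ~180 GB
    print(f"{bytes_needed:.2e} B -> {bytes_needed / flops:.6f} B/Flop")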

Foil 25 Vendors with Known DRAM PIM Capability
IBM: 4 Mbit (1991), 16 Mbit (1994)
Toshiba: 8 Mbit (1994)
Mitsubishi: 10 Mbit (1994)
Hitachi: 4 Mbit (1995)
Samsung: 4 Mbit (1996)
NEC: 8 Mbit (1996)

Foil 26 EXECUBE: The First High Density PIM
4 Mbit DRAM + 100K Gate base: 5V, 2.7W
Single part type: NO GLUE!
SIMD & MIMD - In Any Combination
Huge increase in BW, pin, and silicon utilization
[Figure: EXECUBE node block diagram: two 32Kx9 DRAM macros feeding an ALU and register array, with instruction register, decode logic, DMA logic, SIMD broadcast bus, and SIMD/MIMD mode control]

Foil 27 Execube Processing Node
[Figure: EXECUBE processing node: a 16-bit CPU with DMA channel and link control]

Foil 28 Tiling of Execube Processing Nodes

Foil 29 Lessons Learned from EXECUBE
Absolute Need: DENSEST POSSIBLE Memory
Single Part Type => True Scalable Systems
Bandwidth "Next Door" => Simpler CPU
Integrated, Fast I/O => Simpler Apps
Mixed SIMD/MIMD => Simple Parallelization
Next Time:
  • Memory macro organization is Key
  • Add more CPU visibility into Memory Macro
  • "Generalize" SIMD Bus
  • Support for Virtual Shared Memory
  • I/O for Chip-Chip & System-System
  • Floating point

Foil 30 New "Strawman" PIM Processing Node Macro

Foil 31 "Strawman" Chip Floorplan

Foil 32 Strawman Chip Interfaces
Combined Memory Bus/Chip Control
  • "Look like memory" to a host
  • Memory mapped SIMD Instruction Broadcast
  • Memory mapped I/O (esp. System) Control
Chip to Chip Links to support scalable MPPs
External system links:
  • To other chip clusters
  • To high speed I/O

Foil 33 Strawman PIM Chip with I/O Macros

Foil 34 Strawman Properties
Huge bandwidths available to processing logic
  • "Free" 4 line cache at the sense amps
  • Minimal addressing delays => minimal latency
Tremendous internode bandwidths
  • "Built in" local shared memory
Huge bandwidths available at chip periphery
2D tiling prevents wires "over memory"
Opportunity for "mix and match"
  • Memory macros
  • Processing logic
  • External I/O protocols

Foil 35 Strawman PIM "Memory Card"

Foil 36 Choosing the Processing Macro
Today's "conventional wisdom:"
  • Complex memory hierarchy driving superscalar, superpipelined designs with branch prediction, fast TLBs, multiple function units, multi-ported register files, ...
Does that make sense in PIM environment?
  • Large bandwidth from direct row buffer access
  • Reduced latency (no chip crossings)
  • Naturally closely coupled parallelism
Answer: No! A better choice: design for:
  • Maximum performance "per transistor"
  • Minimum power per MIP

Foil 37 Performance Per Transistor
Performance data from uP vendors
Transistor count excludes on-chip caches
Performance normalized by clock rate
Conclusion: Simplest is best! (250K Transistor CPU)
[Plots: normalized SPECint and SPECfp ratings vs. millions of CPU transistors]
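
The figure's metric can be read as (SPEC rating / clock) / transistor count. A sketch with hypothetical inputs; none of these values come from the plots:

    # Performance per transistor, normalized by clock; hypothetical numbers.
    def perf_per_transistor(spec, clock_mhz, m_transistors):
        return (spec / clock_mhz) / m_transistors

    print(perf_per_transistor(3.0, 100, 0.25))  # simple ~250K-transistor core
    print(perf_per_transistor(8.0, 200, 5.0))   # complex superscalar core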

Foil 38 SIA-Based PIM Chip Projections
[Plots: SIA roadmap projections of MB per cm^2, MF per cm^2, and the resulting MB/MF ratios]

Foil 39 Silicon Area for a Teraflop

Foil 40 Parallelism
[Plots: processors needed for a teraflop, and processors per cm^2, across MB/MF ratios]

Foil 41 Petaflop PIM System Size

Foil 42 Power Projections (Logic)
By 2010: the same logic at
  • 1/50th to 1/100th the power
  • 100X the performance per Watt

Foil 43 Power Per Sq. Cm
[Plot: power per sq. cm over time at 1 MB/MF]
We MUST look at lower power designs!

Foil 44 3D Stacking
Goal: create 3D cubes of silicon
Process:
  • "Thin" die to 7 mils or smaller
  • "Glue" together
  • Plate wires on sides
Stacks of 70 or more have been demonstrated
Ideal for PIMs
  • Same chip type throughout
  • Most side wires: common or chip-chip
  • "Contact explosion" avoidable
Problems today: wiring on more than 2 sides, power
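
The stack dimensions implied by the figures above (7-mil die, stacks of 70), as a quick check; glue-layer thickness is ignored:

    # Height of a 70-die stack of 7-mil-thinned die.
    mil_to_mm = 0.0254             # 1 mil = 0.001 inch
    die_mm = 7 * mil_to_mm         # ~0.18 mm per thinned die
    print(f"70-die stack: {70 * die_mm:.1f} mm")   # ~12.4 mm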

Foil 45 Potential PIM Cube

Foil 46 Potential PIM Cube
Power, Power, Power!

Foil 47 Further Work: Hardware
Core "conventional" CPU selection:
  • More accurate area & performance estimates
  • Power projections
Optimal PIM ISA & Organization
  • Memory structures program visible
  • Embedded PIM MPP support
Optimized PIM memory macro
  • Scalable size
  • Processing "at the sense amps"
Selection of I/O Protocols
Integrated CAD support

Foil 48 Further Work: Algorithm Development
Find inherently low MB/MF algorithms (an example is sketched after this list)
Express as "In the Memory" operations
Utilize huge degrees of parallelism
Utilize rich mix of parallel styles
  • MIMD parallelism
  • SIMD setup & synchronization
  • Concurrent conventional scalar overhead control
Utilize very large inter node bandwidths
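
Dense matrix multiply is a standard example (my illustration, not the foil's) of an inherently low MB/MF computation: O(n^3) flops over O(n^2) data, so bytes-per-flop shrinks as the problem grows.

    # MB/MF for dense n x n matrix multiply:
    # three matrices of 8-byte doubles, ~2n^3 flops.
    for n in (100, 1000, 10000):
        mb = 3 * n * n * 8 / 1e6
        mf = 2 * n ** 3 / 1e6
        print(f"n={n:>5}: {mb / mf:.4f} MB/MF")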

Foil 49 Further Work: Software Development
Local node code for "In the Memory" operations
  • Expression, Partitioning, Allocation, Scheduling
Node run-time kernel
  • Message passing, Virtual Shared Memory, Synchronization
Host control
  • Memory allocation, code initiation, synchronization

Foil 50 Current ND PIM Work In Progress
Graduate Student Projects:
  • EXECUBE-Based PCMCIA Card
  • PIM Parallel Databases, Postscript
NASA: Future PIM design space for petaflops
NSF: Inherently low power ISAs
NEC: PIM-based Image database search
ARPA: PIM Foundries & DA Tools
NSF: Point Designs for 100 TFlops
NASA: Rad Hard PIM for Spacecraft

Foil 51 Conclusion
PIM: Potential Breakthrough
  • Natural Evolution of Technology
  • Huge complexity reductions
  • Highly scalable
BUT: "Breaking the Rules & Changing the Game"
  • Need to revisit ALL OUR ASSUMPTIONS
To make PIM mainstream:
  • Simple integrated Logic/Memory CAD tools
  • Strong "Proof of Concept" Demos
  • "Really Massive" Parallel Development Tools
  • Reoptimization of everything: space & power

Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
