Given by Peter Kogge (Notre Dame) at PAWS 96, Mandalay Beach, April 21-26 1996. Foils prepared June 1996
Summary of Material
This was part of a set of PAWS 96 (Mandalay Beach) presentations |
Kogge and collaborators describe PIM as an emerging architecture in which logic and memory are combined on the same chip, which naturally increases memory bandwidth |
Conventional architectures tend to waste transistors, measured in terms of silicon used per unit operation |
Both existing designs and projections to the PetaFlop timescale (2007) are given |
Dr. Peter M. Kogge |
McCourtney Prof. of Computer Science & Engr. |
IEEE Fellow, IBM Fellow |
Dr. Jay B. Brockman, Assistant Prof. |
Dept. of Computer Science & Engr. |
University of Notre Dame |
219-631-6763 |
kogge@cse.nd.edu |
Project Origins: IBM FSD EXECUBE chip |
Current project funding includes: |
As technology expands, so does the Gap |
"Bridging the Gap:" significant cost implications |
PIM changes the game's ground rules |
Today's High Performance Systems |
PIM: Combining memory & logic |
Emerging single part type PIM designs |
Future projections |
Ongoing projects |
Lessons learned |
New Design Space Exploration |
At Best, First-Order Interactions Identified |
Software Tools, in particular, need detail |
Analysis only as good as projections |
HPCC Program: Conceived in the late 1980s |
Key Goal: Tera(fl)op by the turn of the century |
Major impediment: 1 TF = BIG MACHINE! |
Final component: Embedded HPCC |
We have declared Success! |
Next Goal: Petaflops! |
Feb. '94: Pasadena Workshop on Enabling Technologies for Petaflops Computing Systems |
March '95: Petaflops Workshop at Frontiers '95 |
Aug. '95: Bodega Bay Workshop on Applications |
PETA online: http://cesdis.gsfc.nasa.gov/petaflops/peta.html |
Jan. '96: NSF Call for 100 TF "Point Designs" |
April '96: Oxnard Petaflops Architecture Workshop (PAWS) on Architectures |
June '96: Bodega Bay Petaflops Workshop on System Software |
Oct. '96: Workshop at Frontiers '96 |
Goal: Systems, Applications, & Software for PetaFlop (10^15 flops) machines |
Applications: Significant number |
Technology: Several, inc. CMOS |
Architecture: At least 3 |
Software:"Like your crazy Uncle Fred who no one wants to talk about" |
Huge Problem Size |
Real Time Computations |
Indirect Needs for Petaflops Technology |
Today: |
Future: |
NOT COUNTING MEMORY! |
"Glows in the Dark" |
Wasted silicon |
Wasted bandwidth |
Wasted contacts |
Wasted power |
Unnecessary complexity |
[Diagram: memory hierarchy from CPU through cache, secondary cache, and bus interface to the memory subsystem, with bandwidth loss at each level]
With modern DRAMs: |
[Diagram: multiple internal banks, each with a row buffer 256-4096 bits wide, feeding a multiplexor that selects only 1-9 bits per access: maybe 30-50 MB/s off-chip]
Nibble, Page, Fast Page Mode |
Video RAMs |
Pipelined Extended Data Out RAMs |
Dual Bank Synchronous RAMs |
Block Transfer RAMBUS |
We are adding logic to speed up bandwidth, |
BUT STILL LIMITED BY TAKING DATA OFF CHIP! |
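To put numbers on the mismatch, a back-of-the-envelope sketch in C; the 4096-bit row and 100 ns row-cycle time are assumed, illustrative figures drawn from the ranges above:

    /* Rough DRAM bandwidth comparison: full row width vs. the narrow
     * external interface.  The 4096-bit row and 100 ns row cycle are
     * assumed figures; 50 MB/s is the upper end of the range above. */
    #include <stdio.h>

    int main(void) {
        double row_bits      = 4096.0;   /* bits latched per row access */
        double cycle_ns      = 100.0;    /* assumed row cycle time      */
        double internal_MBps = (row_bits / 8.0) / (cycle_ns * 1e-9) / 1e6;
        double external_MBps = 50.0;     /* typical off-chip rate       */

        printf("internal row bandwidth: %.0f MB/s\n", internal_MBps);
        printf("external interface:     %.0f MB/s\n", external_MBps);
        printf("ratio thrown away:      %.0fx\n",
               internal_MBps / external_MBps);
        return 0;
    }

Under these assumptions, roughly 100x of the bandwidth the row buffer delivers never makes it off the chip.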
Place processing logic on the memory chip |
Use logic to consume memory bandwidth directly (see the sketch below) |
Utilize newly "liberated" chip contacts |
Ample bandwidth changes design philosophy |
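A minimal conceptual sketch of "consuming bandwidth directly": logic placed beside the row buffer processes an entire row per access (here a population count), so no raw bits need to leave the chip. The row size and the operation are illustrative assumptions, not a specific chip's design:

    /* Conceptual sketch: on-chip logic consumes the whole 4096-bit
     * row buffer per row cycle instead of multiplexing 1-9 bits
     * off-chip.  Here the in-memory operation is a population count. */
    #include <stdint.h>
    #include <stdio.h>

    #define ROW_WORDS 64                   /* 64 x 64 bits = 4096-bit row */

    static uint64_t row_buffer[ROW_WORDS]; /* one row access fills this  */

    /* count the set bits across the entire row */
    static unsigned popcount_row(void) {
        unsigned total = 0;
        for (int i = 0; i < ROW_WORDS; i++) {
            uint64_t w = row_buffer[i];
            while (w) { total += (unsigned)(w & 1); w >>= 1; }
        }
        return total;                      /* only this word leaves chip */
    }

    int main(void) {
        row_buffer[0] = 0xFFULL;           /* toy data: 8 bits set */
        printf("bits set in row: %u\n", popcount_row());
        return 0;
    }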
"Big Science" Supercomputing
|
"Point design" Accelerators |
Industrial Supercomputing |
Commodity PCs & Workstations |
Consumer Applications |
[Chart: market segments plotted as Architectural Design Complexity vs. Available Development Resources]
"Across-the-Board" |
need for cheaper, |
denser, lower power |
Storage |
Chip         First Silicon  Peak           Storage   MB/Perf.            Organization
EXECUBE      1993           50 Mips        0.5 MB    0.01 MB/Mip         16-bit SIMD/MIMD CMOS
AD SHARC     1994           120 Mflops     0.5 MB    0.005 MB/MF         Single CPU and memory
TI MVP       1994           2000 Mops      0.05 MB   0.000025 MB/Mop     1 CPU, 4 DSPs
MIT MAP      1996           800 Mflops     0.128 MB  0.00016 MB/MF       4 superscalar CPUs
Terasys PIM  1993           625 M bit-ops  0.016 MB  0.000026 MB/bit-op  1024 16-bit ALUs
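The MB/Perf. column above is simply storage divided by peak rate; a small C check reproduces it (AD SHARC's 0.00417 rounds to the 0.005 shown):

    /* Recompute the storage-to-performance column of the table above. */
    #include <stdio.h>

    int main(void) {
        struct { const char *chip; double mb; double peak; const char *unit; } t[] = {
            { "EXECUBE",     0.5,     50.0, "MB/Mip"    },
            { "AD SHARC",    0.5,    120.0, "MB/MF"     },
            { "TI MVP",      0.05,  2000.0, "MB/Mop"    },
            { "MIT MAP",     0.128,  800.0, "MB/MF"     },
            { "Terasys PIM", 0.016,  625.0, "MB/bit-op" },
        };
        for (int i = 0; i < 5; i++)
            printf("%-12s %.6f %s\n", t[i].chip, t[i].mb / t[i].peak, t[i].unit);
        return 0;
    }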
Conventional GP Computation Wisdom: |
May be somewhat reduced at high end |
Graphics & other embedded applications |
IBM: 4 Mbit (1991), 16 Mbit (1994) |
Toshiba: 8 Mbit (1994) |
Mitsubishi: 10 Mbit (1994) |
Hitachi: 4 Mbit (1995) |
Samsung: 4 Mbit (1996) |
NEC: 8 Mbit (1996) |
4 Mbit DRAM + 100K Gate base: 5V, 2.7W |
Single part type: NO GLUE! |
SIMD & MIMD - In Any Combination |
Huge increase in BW, pin, and silicon utilization |
[Diagram: EXECUBE node: a 16-bit CPU (ALU, instruction register, register array) plus two 32Kx9 DRAM macros, a SIMD/MIMD mode select, the SIMD broadcast bus with decode logic, DMA logic, and DMA channel and link control]
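A conceptual sketch, not the actual EXECUBE microarchitecture: each node could carry a mode bit that selects between instructions arriving on the SIMD broadcast bus and instructions fetched from its own on-chip DRAM, giving the "SIMD & MIMD in any combination" behavior described above:

    /* Conceptual sketch only -- the names and structure are assumed.
     * A node in SIMD mode executes the broadcast instruction; a node
     * in MIMD mode fetches from its local program counter.  Nodes can
     * flip modes independently. */
    #include <stdio.h>

    typedef enum { SIMD, MIMD } pim_mode_t;

    typedef struct {
        pim_mode_t mode;
        unsigned   pc;               /* local program counter (MIMD) */
    } node_t;

    static unsigned next_instruction(node_t *n, unsigned broadcast_insn) {
        if (n->mode == SIMD)
            return broadcast_insn;   /* all SIMD nodes see the same op */
        return n->pc++;              /* MIMD: fetch from local memory  */
    }

    int main(void) {
        node_t a = { SIMD, 0 }, b = { MIMD, 100 };
        unsigned bcast = 42;         /* op on the SIMD broadcast bus   */
        printf("node a executes %u\n", next_instruction(&a, bcast));
        printf("node b executes %u\n", next_instruction(&b, bcast));
        return 0;
    }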
Absolute Need: DENSEST POSSIBLE Memory |
Single Part Type => True Scalable Systems |
Bandwidth "Next Door" => Simpler CPU |
Integrated, Fast I/O => Simpler Apps |
Mixed SIMD/MIMD => Simple Parallelization |
Next Time: |
Combined Memory Bus/Chip Control |
Chip to Chip Links to support scalable MPPs |
External system links: |
Huge bandwidths available to processing logic |
Tremendous internode bandwidths |
Huge bandwidths available at chip periphery |
2D tiling prevents wires "over memory" |
Opportunity for "mix and match" |
Today's "conventional wisdom:" |
Does that make sense in a PIM environment? |
Answer: No! Better choice: design for: |
Performance data from uP vendors |
Transistor count excludes on-chip caches |
Performance normalized by clock rate |
Conclusion: Simplest is best! (250K Transistor CPU) |
[Charts: Normalized SPECINTs and SPECFLTs vs. Millions of Transistors (CPU); projections of MB per cm2, MF per cm2, MB/MF ratios, Processors per cm2, and Processors needed for a Teraflop]
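The arithmetic behind a "processors needed for a Teraflop" curve is simple division; in this C sketch the 100 MF per processor figure is an assumed placeholder, not a number from the talk:

    /* Illustrative teraflop sizing.  per_cpu_MF is an ASSUMED value. */
    #include <stdio.h>

    int main(void) {
        double target_MF  = 1.0e6;   /* 1 Teraflop = 10^6 MF           */
        double per_cpu_MF = 100.0;   /* ASSUMED per-processor peak     */
        double mb_per_mf  = 1.0;     /* a 1 MB/MF design point         */

        printf("processors needed: %.0f\n", target_MF / per_cpu_MF);
        printf("total memory at %.0f MB/MF: %.0f GB\n",
               mb_per_mf, target_MF * mb_per_mf / 1024.0);
        return 0;
    }

At 1 MB/MF a teraflop machine carries roughly a terabyte of memory, regardless of how the processors are sliced.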
By 2010: Same logic = |
* 1/50 to 1/100th the power |
* 100X the performance per Watt |
1 MB/MF |
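A one-line check of that claim, assuming the logic's performance is held constant while its power drops (the quoted 100X is the 1/100th-power end of the range):

    \frac{\mathrm{perf}/W_{2010}}{\mathrm{perf}/W_{\mathrm{today}}}
      = \frac{W_{\mathrm{today}}}{W_{2010}} = 50 \text{ to } 100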
We MUST Look at Lower Power Designs! |
Goal: create 3D cubes of silicon |
Process: |
Stacks of 70 or more have been demonstrated |
Ideal for PIMs |
Problems Today: >2 side wiring, Power |
Power, Power, Power! |
Core "conventional" CPU selection:
|
Optimal PIM ISA & Organization
|
Optimized PIM memory macro
|
Selection of I/O Protocols |
Integrated CAD support |
Find inherently low MB/MF algorithms |
Express as "In the Memory" operations (see the sketch after this list) |
Utilize huge degrees of parallelism |
Utilize rich mix of parallel styles |
Utilize very large inter node bandwidths |
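A minimal sketch of the "In the Memory" idea, with illustrative names rather than a real API: the host ships a small request to every node, each node scans its own local DRAM, and only the (small) result crosses the interconnect; the raw data never moves:

    /* Conceptual sketch: a node-local search expressed as an
     * in-memory operation.  LOCAL_WORDS and in_memory_count are
     * illustrative names, not an existing interface. */
    #include <stddef.h>
    #include <stdio.h>

    #define LOCAL_WORDS 4096               /* this node's slice of memory */
    static unsigned local_mem[LOCAL_WORDS];

    /* runs on each PIM node: count matches in local memory */
    static size_t in_memory_count(unsigned key) {
        size_t hits = 0;
        for (size_t i = 0; i < LOCAL_WORDS; i++)
            if (local_mem[i] == key)
                hits++;
        return hits;                       /* only this word leaves chip */
    }

    int main(void) {
        local_mem[7] = local_mem[99] = 42; /* toy data */
        printf("node-local hits for 42: %zu\n", in_memory_count(42));
        return 0;
    }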
Local node code for "In the Memory" operations |
Node run time kernel |
Host control |
Graduate Student Projects: |
NASA: Future PIM design space for petaflops |
NSF: Inherently low power ISAs |
NEC: PIM-based Image database search |
ARPA: PIM Foundries & DA Tools |
NSF: Point Designs for 100 TFlops |
NASA: Rad Hard PIM for Spacecraft |
PIM: Potential Breakthrough |
BUT: "Breaking the Rules & Changing the Game" |
To make PIM mainstream: |