This was part of a set of PAWS '96 (Mandalay Beach) presentations.
Kogge and collaborators describe PIM as an emerging architecture in which logic and memory are combined on the same chip, naturally increasing memory bandwidth.
Conventional architectures tend to waste transistors, measured in terms of silicon used per unit operation.
Both existing designs and projections to the PetaFlop timescale (2007) are given.
001 Processing-In-Memory (PIM) Architectures for Very High Performance MPP Computing
002 Acknowledgements
003 Memory & CPU Bandwidth Gap
004 Key Points
005 This Talk: A Better Way!
006 Observations on This Talk
007 HPCC & TeraFlops
008 Petaflop Chain of Events
009 Results from Pasadena '94
010 PetaFlops Applications
011 Pasadena Architectures
012 Bodega Bay: Primary Memory
013 Bodega Bay: Secondary Memory
014 Bodega Bay: Aggregate I/O
015 Cost Considerations: Processors
016 Cost Considerations: Memory
017 The "Hidden Costs" of Modern Systems
018 The Overlooked Bandwidth
019 Modern "Alternative" RAMs
020 Processing In Memory (PIM): Reclaiming the Bandwidth
021 PIM: Optimizing the System
022 Market Demand for Dense Processing
023 Current PIM Chips
024 Key Problem: Memory Density
025 Vendors with Known DRAM PIM Capability
026 EXECUBE: The First High Density PIM
027 EXECUBE Processing Node
028 Tiling of EXECUBE Processing Nodes
029 Lessons Learned from EXECUBE
030 New "Strawman" PIM Processing Node Macro
031 "Strawman" Chip Floorplan
032 Strawman Chip Interfaces
033 Strawman PIM Chip with I/O Macros
034 Strawman Properties
035 Strawman PIM "Memory Card"
036 Choosing the Processing Macro
037 Performance Per Transistor
038 SIA-Based PIM Chip Projections
039 Silicon Area for a Teraflop
040 Parallelism
041 Petaflop PIM System Size
042 Power Projections (Logic)
043 Power Per Sq. Cm
044 3D Stacking
045 Potential PIM Cube
046 Potential PIM Cube (continued)
047 Further Work: Hardware
048 Further Work: Algorithm Development
049 Further Work: Software Development
050 Current ND PIM Work In Progress
051 Conclusion