
Foilset: HPCC Current Status: Hardware MPP

Given by Geoffrey Fox on the Trip to China, July 12-28, 1996. Foils prepared July 6, 1996.
Abstract

We describe the basic technology driver -- the CMOS juggernaut -- and some new approaches that could be important 10-20 years from now
We describe, from an elementary point of view, the basics of parallel (MPP) architectures
We discuss the current situation for tightly coupled systems -- convergence to distributed shared memory
We discuss clusters of PCs/workstations -- MetaComputing

Table of Contents


1 Status of "Classic" HPCC -- June 1996: Summary of MPP Hardware

2 Abstract of HPCC Hardware Status 1996
3 Some HPCC Hardware Architectures and Their Status - I
4 The Technology Driving Forces for HPCC

5 Effect of Feature Size on Performance
6 Growing Logic Chip Density
7 Trends in Feature and Die Size as a Function of Time
8 Supercomputer Memory Sizes and Trends in RAM Density
9 Comparison of Trends in RAM Density and CPU Performance Increases
10 National Roadmap for Semiconductor Technology -- 1992
11 CMOS Technology and Parallel Processor Chip Projections
12 Parallel Computer Architecture Issues
13 Granularity of Parallel Components
14 Types of Parallel Memory Architectures -- Logical Structure

15 Types of Parallel Memory Architectures -- Physical Characteristics
16 Diagrams of Shared and Distributed Memories
17 Classes of Communication Networks include ...
18 Examples of Interconnection Topologies
19 Latency and Bandwidth of a Network
20 Transfer Time in Microseconds for both Shared Memory Operations and Explicit Message Passing
21 Latency/Bandwidth Space for 0-byte Message (Latency) and 1 MB Message (Bandwidth)
22 Some HPCC Hardware Architectures and Their Status - II
23 Shared versus Distributed Memory
24 Gordon Bell's SNAP Architecture - I
25 Gordon Bell's SNAP Architecture - II
26 Gordon Bell's SNAP Architecture - III
27 Mark Baker's Review of MetaComputing/Cluster Management Projects
28 Alternative Supercomputing Resources
29 Parallel/Distributed Computing - Communications Characteristics
30 Some Comments about Parallel and Distributed Computing
31 Communications Performance of Some Parallel and Distributed Systems
32 Distributed Systems: Some Problems





Foil 1 Status of "Classic" HPCC -- June 1996: Summary of MPP Hardware

http://www.npac.syr.edu/users/gcf/hpcc96hardware/index.html
Presented during Trip to China, July 12-28, 1996
Geoffrey Fox
NPAC
Syracuse University
111 College Place
Syracuse NY 13244-4100


Foil 2 Abstract of HPCC Hardware Status 1996

We describe the basic technology driver -- the CMOS juggernaut -- and some new approaches that could be important 10-20 years from now
We describe, from an elementary point of view, the basics of parallel (MPP) architectures
We discuss the current situation for tightly coupled systems -- convergence to distributed shared memory
We discuss clusters of PCs/workstations -- MetaComputing


Foil 3 Some HPCC Hardware Architectures and Their Status - I

The future will see:
  • 1: Mainly the relentless drive of the CMOS juggernaut with Moore's law!
  • 2: The growth of worldwide networked computers, from set-top boxes to MPPs
  • 3: Processor in Memory (PIM)
  • 4: Superconducting CPUs and interconnects (but no memory?)
  • 5: Optical interconnects
  • 6: Quantum computing (see Scientific American, Oct. 95, p. 140, by Seth Lloyd (MIT))
  • 7: DNA or molecular computing
We will discuss the first two here and the next three in "Petaflop futures"; I leave the last two to a future generation!


Foil 4 The Technology Driving Forces for HPCC

(Content shown as an image in the original foil.)


Foil 5 Effect of Feature Size on Performance

(Content shown as an image in the original foil.)


Foil 6 Growing Logic Chip Density

(Content shown as an image in the original foil.)


Foil 7 Trends in Feature and Die Size as a Function of Time

(Content shown as an image in the original foil.)


Foil 8 Supercomputer Memory Sizes and Trends in RAM Density

RAM density increases by about a factor of 50 in 8 years
Supercomputers in 1992 have memory sizes around 32 gigabytes (giga = 10^9)
Supercomputers in the year 2000 should have memory sizes around 1.5 terabytes (tera = 10^12)
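As a quick consistency check on these numbers (an illustrative back-of-the-envelope sketch, not part of the original foils): a factor of 50 in 8 years is about 63% growth per year, and it carries the 1992 figure of 32 gigabytes to roughly 1.6 terabytes by 2000, matching the projection above.

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        double factor_8yr = 50.0;                 /* density growth over 8 years */
        double annual = pow(factor_8yr, 1.0 / 8); /* ~1.63, i.e. ~63% per year */
        double mem_1992_gb = 32.0;                /* 1992 supercomputer memory */
        double mem_2000_gb = mem_1992_gb * pow(annual, 8.0);

        printf("annual growth factor: %.2f\n", annual);
        printf("projected 2000 memory: %.0f GB (~%.1f TB)\n",
               mem_2000_gb, mem_2000_gb / 1000.0);
        return 0;
    }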


Foil 9 Comparison of Trends in RAM Density and CPU Performance Increases

Computer performance is increasing faster than RAM density.


Foil 10 National Roadmap for Semiconductor Technology -- 1992

See Chapter 5 of Petaflops Report -- July 95


Foil 11 CMOS Technology and Parallel Processor Chip Projections

See Chapter 5 of Petaflops Report -- July 95


Foil 12 Parallel Computer Architecture Issues

Two critical issues are:
Memory structure
  • Distributed
  • Shared
  • Cached
  • and heterogeneous mixtures of these
Control and synchronization
  • SIMD -- lockstep synchronization
  • MIMD -- synchronization can take several forms:
    • Simplest: program-controlled message passing (see the sketch after this list)
    • "Flags" in memory -- a typical shared memory construct
    • Special hardware -- as in caches and their coherency (coordination between nodes)
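To make the simplest MIMD form concrete, here is a minimal message-passing sketch in C using MPI (MPI is an assumption here; the foil does not name a particular library): node 1 blocks in the receive until node 0's message arrives, and that blocking wait is the program-controlled synchronization.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, value = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* blocking send: hand the message to node 1 */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* blocking receive: node 1 waits here -- this wait is the
               program-controlled synchronization */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("node 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Run with two or more processes, e.g. mpirun -np 2 ./a.out.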


Foil 13 Granularity of Parallel Components

Coarse-grain: The task is broken into a handful of pieces, each executed by a powerful processor. Pieces and processors may be heterogeneous. Computation/communication ratio very high -- typical of networked MetaComputing.
Medium-grain: Tens to a few thousand pieces, typically executed by microprocessors. Processors typically run the same code (SPMD style). Computation/communication ratio often hundreds or more. Typical of MIMD parallel systems such as SP2, CM5, Paragon, T3D.
Fine-grain: Thousands to perhaps millions of small pieces, executed by very small, simple processors (several per chip) or through pipelines. Processors typically have instructions broadcast to them. Computation/communication ratio often near unity. Typical of SIMD, but seen in a few MIMD systems such as Dally's J-Machine or the commercial Myrinet (Seitz). (An illustrative grain-size calculation follows below.)
Note that a machine of one type can be used on algorithms of the same or finer granularity.
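As an illustrative calculation (not from the foils) of how grain size controls the computation/communication ratio: for a two-dimensional grid problem split into n x n subdomains, each node updates n*n points but exchanges only the 4n edge points, so the ratio grows like n/4 as the grain coarsens.

    #include <stdio.h>

    /* computation/communication ratio for an n x n subdomain of a 2-D grid:
       n*n interior points are updated, 4*n edge points are exchanged */
    double grain_ratio(int n) {
        return (double)(n * n) / (4.0 * n);   /* = n / 4 */
    }

    int main(void) {
        int sizes[] = {4, 32, 256};           /* fine, medium, coarse grain */
        for (int i = 0; i < 3; i++)
            printf("n = %3d  ratio = %.1f\n", sizes[i], grain_ratio(sizes[i]));
        return 0;
    }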


Foil 14 Types of Parallel Memory Architectures -- Logical Structure

Shared (Global): There is a global memory space, accessible by all processors. Processors may also have some local memory. Algorithms may use global data structures efficiently. However, "distributed memory" algorithms may still be important, as memory is NUMA (nonuniform access times).
Distributed (Local, Message-Passing): All memory is associated with processors. To retrieve information from another processor's memory, a message must be sent there. Algorithms should use distributed data structures (a sketch follows below).
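A minimal sketch of what a distributed data structure means in practice (the block distribution here is illustrative, not taken from the foils): each global array index maps to an owning processor and a local offset, and touching a nonlocal element means sending a message to its owner.

    #include <stdio.h>

    /* block distribution of a global array of size N over P processors:
       global index g lives on processor g / block at local offset g % block */
    void owner_of(int g, int N, int P, int *proc, int *local) {
        int block = (N + P - 1) / P;   /* ceiling(N / P) elements per node */
        *proc  = g / block;
        *local = g % block;
    }

    int main(void) {
        int proc, local;
        owner_of(70, 100, 4, &proc, &local);  /* N = 100 elements, P = 4 nodes */
        printf("global index 70 -> processor %d, local index %d\n", proc, local);
        return 0;
    }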


Foil 15 Types of Parallel Memory Architectures -- Physical Characteristics

Uniform: All processors take the same time to reach all memory locations.
Nonuniform (NUMA): Memory access is not uniform, so a given processor takes a different time to get data from each memory bank. This is natural for distributed memory machines but is also true in most modern shared memory machines.
  • DASH (Hennessy at Stanford) is the best-known example of such a virtual shared memory machine, which is logically shared but physically distributed.
  • ALEWIFE from MIT is a similar project
  • TERA (from Burton Smith) is a uniform-memory-access, logically shared memory machine
Most NUMA machines these days have two memory access times (modeled in the sketch below):
  • Local memory (divided into registers, caches, etc.) and
  • Nonlocal memory, with little or no difference in access time among different nonlocal memories
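A small model of these two access times (the numbers below are hypothetical, for illustration only): if a fraction f_local of references hit local memory and the rest go to nonlocal memory at a single higher cost, the average access time is a simple weighted sum.

    #include <stdio.h>

    /* average access time for a two-level NUMA machine: a fraction
       f_local of references are local, the rest cost t_remote */
    double numa_access(double f_local, double t_local, double t_remote) {
        return f_local * t_local + (1.0 - f_local) * t_remote;
    }

    int main(void) {
        double t_local = 0.1, t_remote = 1.0;   /* hypothetical microseconds */
        for (int i = 5; i <= 10; i++) {
            double f = i / 10.0;
            printf("f_local = %.1f  average = %.2f us\n",
                   f, numa_access(f, t_local, t_remote));
        }
        return 0;
    }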


Foil 16 Diagrams of Shared and Distributed Memories

(Content shown as an image in the original foil.)


Foil 17 Classes of Communication Networks include ...

Classes of networks include:
Bus: All processors (and memory) connected to a common bus or busses.
  • Memory access fairly uniform, but not very scalable due to contention
  • Bus machines can be NUMA if memory consists of directly accessed local memory as well as memory banks accessed over the bus; the bus-accessed memories can be local memories on other processors
Switching Network: Processors (and memory) connected to routing switches, as in the telephone system.
  • Switches might have queues and "combining logic", which improve functionality but increase latency.
  • Switch settings may be determined by message headers or preset by a controller.
  • Connections can be packet-switched (messages no longer than some fixed size) or circuit-switched (the connection remains as long as needed)
  • Usually NUMA, blocking, often scalable and upgradable


Foil 18 Examples of Interconnection Topologies

Two-dimensional grid, binary tree, complete interconnect, and 4-D hypercube.
Communication (operating system) software ensures that the system appears fully connected even if the physical connections are only partial; for the hypercube, the hop count between nodes has the simple form sketched below.
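For the hypercube in particular, the distance between two nodes has a clean closed form (a standard result, shown here as a small sketch): give the 2^d nodes d-bit addresses; the number of hops is the number of bit positions in which the two addresses differ.

    #include <stdio.h>

    /* hops between nodes a and b in a hypercube = number of differing
       address bits (the Hamming weight of a XOR b) */
    int hypercube_hops(unsigned a, unsigned b) {
        unsigned diff = a ^ b;
        int hops = 0;
        while (diff) {
            hops += diff & 1u;
            diff >>= 1;
        }
        return hops;
    }

    int main(void) {
        /* in a 4-D hypercube (16 nodes), node 0000 to node 1011 is 3 hops */
        printf("hops(0x0, 0xB) = %d\n", hypercube_hops(0x0, 0xB));
        return 0;
    }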


Foil 19 Latency and Bandwidth of a Network

Transmission time for a message of n bytes:
T(n) = T0 + T1 * n, where
T0 is the latency. It contains a term proportional to the number of hops, plus a term representing the interrupt-processing time at each end for the communication network and processor to synchronize:
T0 = Ts + Td * (number of hops)
T1 is the inverse bandwidth -- it can be made small if the pipe is of large size.
In practice Ts and T1 are most important and Td is unimportant, as the sketch below illustrates.
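The model can be written out directly (a sketch with placeholder constants, not measured values): with T0 = Ts + Td * hops, the message length at which bandwidth reaches half its peak is n_1/2 = T0 / T1, and for large messages the T1 * n term dominates.

    #include <stdio.h>

    /* transmission time for an n-byte message: T(n) = T0 + T1 * n,
       where T0 = Ts + Td * hops (latency) and T1 is inverse bandwidth */
    double transfer_time(double Ts, double Td, int hops, double T1, double n) {
        double T0 = Ts + Td * hops;
        return T0 + T1 * n;
    }

    int main(void) {
        /* placeholder values: 50 us startup, 0.5 us per hop,
           0.01 us per byte (i.e. 100 MB/s peak bandwidth) */
        double Ts = 50.0, Td = 0.5, T1 = 0.01;
        int hops = 4;
        double T0 = Ts + Td * hops;

        printf("1 MB message: %.0f us\n",
               transfer_time(Ts, Td, hops, T1, 1.0e6));
        printf("half-bandwidth length n_1/2 = T0/T1 = %.0f bytes\n", T0 / T1);
        return 0;
    }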


Foil 20 Transfer Time in Microseconds for both Shared Memory Operations and Explicit Message Passing

Dongarra and Dunigan: Message-Passing Performance of Various Computers, August 1995


Foil 21 Latency/Bandwidth Space for 0-byte Message (Latency) and 1 MB Message (Bandwidth)

Square blocks indicate shared memory copy performance
Dongarra and Dunigan: Message-Passing Performance of Various Computers, August 1995


Foil 22 Some HPCC Hardware Architectures and Their Status - II

Today we see the following "CMOS Juggernaut" architectures:
SIMD: No commercial or academic acceptance except for special-purpose military (signal processing) and commercial (database indexing) applications
Special Purpose: Such as the GRAPE N-body machine, which achieves a teraflop today and a petaflop in a few years -- requires small memory and small CPUs
MIMD Distributed Memory:
  • Merging with shared memory in tightly coupled systems
  • Growing importance with the World Wide Web and MetaComputing
Shared Memory:
  • The Tera Computer is an isolated, interesting attempt to build a UMA system
  • SGI (based on the Stanford work, and merging the Cray and SGI lines) and Convex will base their high end on distributed shared memory, which merges the distributed and shared memory approaches
  • Physically distributed but logically shared (NUMA) memory


Foil 23 Shared versus Distributed Memory

Expected architectures of the future will be:
  • Physically distributed, but with hardware support for shared memory, in tightly coupled MPPs such as the future IBM SP-X, Convex Exemplar, and SGI (combined with Cray)
  • Physically distributed without hardware support -- NOWs and COWs -- the World Wide Web as a Metacomputer
Essentially all problems run efficiently on distributed memory, BUT
software is easier to develop on a shared memory machine.
Some shared memory issues:
  • Cost-performance: additional hardware (functionality, network bandwidth) is needed to support shared memory
  • Scaling: can you build very big shared memory machines?
    • Yes, for NUMA distributed shared memory
  • Compiler challenges for distributed shared memory are difficult and a major focus of academic and commercial work
  • This is not practically important now, as a 32-node KSR-2 (from the past) or an SGI Power Challenge (cost ~< $2M) is already at the high end of the important commercial market


Foil 24 Gordon Bell's SNAP Architecture - I

Scalable Network (ATM) and Platforms (PCs running Windows 95)


Foil 25 Gordon Bell's SNAP Architecture - II

MetaComputing built from PCs and ATM as commodity parts (COTS)


Foil 26 Gordon Bell's SNAP Architecture - III

The Computing World from Smart Card to Enterprise Server


Foil 27 Mark Baker's Review of MetaComputing/Cluster Management Projects

(Content shown as an image in the original foil.)


Foil 28 Alternative Supercomputing Resources

Vast numbers of underutilised workstations are available for use.
Huge numbers of unused processor cycles and resources could be put to good use in a wide variety of application areas.
There is reluctance to buy supercomputers due to their cost and short life span.
Distributed compute resources fit better into today's funding model.


Foil 29 Parallel/Distributed Computing - Communications Characteristics

Parallel Computing:
  • Communication has high bandwidth and low latency.
  • Low flexibility in messages (point-to-point).
Distributed Computing:
  • Communication can be high or low bandwidth.
  • Latency is typically high -- but messages can be very flexible, involving fault tolerance, sophisticated routing, etc. (see the comparison sketched below).
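Plugging illustrative constants into the T(n) = T0 + T1 * n model of Foil 19 (the numbers are hypothetical, chosen only to show the contrast) makes the latency difference vivid: a 1 KB message costs tens of microseconds on an MPP but milliseconds on a LAN-based distributed system.

    #include <stdio.h>

    /* T(n) = T0 + T1 * n, with bandwidth given in MB/s so that the
       per-byte cost in microseconds is 1 / bandwidth */
    double t_msg(double T0_us, double mbytes_per_s, double n_bytes) {
        double T1 = 1.0 / mbytes_per_s;   /* us per byte */
        return T0_us + T1 * n_bytes;
    }

    int main(void) {
        double n = 1024.0;   /* 1 KB message */
        /* MPP: ~25 us latency, ~100 MB/s; LAN: ~1000 us latency, ~1 MB/s */
        printf("MPP: %.0f us\n", t_msg(25.0, 100.0, n));
        printf("LAN: %.0f us\n", t_msg(1000.0, 1.0, n));
        return 0;
    }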


Foil 30 Some Comments about Parallel and Distributed Computing

Why use distributed computing techniques?
  • The expense of buying, maintaining and using traditional MPP systems.
  • The rapid increase in commodity processor performance.
  • Commodity networking technology (ATM/FCS/SCI) of greater than 200 Mbps at present, with Gbps performance expected in the very near future.
  • The pervasive nature of workstations in academia and industry.
  • The price/performance of using existing hardware/software.


Foil 31 Communications Performance of Some Parallel and Distributed Systems

Comms1 -- from the ParkBench suite


Foil 32 Distributed Systems: Some Problems

High Initial and Maintenance Costs
  • Cost of distributed software
  • Systems support staff
  • Technical expertise
Applications Development
  • Relatively immature technology with few standards
  • Immature software development tools
  • Applications difficult and time-consuming to develop
  • Difficult to tune and optimise across all platforms
  • Load balancing

Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu
