Given by Geoffrey C. Fox at CPS615 Basic Simulation Track for Computational Science on Fall Semester 96. Foils prepared 27 August 1996
Outside Index
Summary of Material
Overview of Course Itself! -- and then introductory material on basic curricula |
Overview of National Program -- The Grand Challenges |
Overview of Technology Trends leading to petaflop performance in year 2007 (hopefully) |
Overview of Syracuse and National programs in computational science |
Parallel Computing in Society |
Why Parallel Computing works |
Simple Overview of Computer Architectures
|
General Discussion of Message Passing and Data Parallel Programming Paradigms and a comparison of languages |
Geoffrey Fox |
NPAC |
Room 3-131 CST |
111 College Place |
Syracuse NY 13244-4100 |
Instructor: Geoffrey Fox, gcf@npac.syr.edu, 315-443-2163, Room 3-131 CST |
Backup: Nancy McCracken, njm@npac.syr.edu, 315-443-4687, Room 3-234 CST |
NPAC Administrative support: Nora Downey-Easter, nora@npac.syr.edu, 315-443-1722, Room 3-206 CST |
CPS615 Powers that be above can be reached at cps615ad@npac.syr.edu |
CPS615 Students can be reached by mailing cps615@npac.syr.edu |
Homepage will be: |
http://www.npac.syr.edu/projects/cps615fall96 |
See my paper SCCS 736 as an overview of HPCC status |
Graded on the basis of approximately 8 homeworks, each due the Thursday of the week following the day it is given out (a Tuesday or Thursday) |
Plus one modest sized project at the end of class -- must involve "real" running parallel code! |
No finals or written exams |
All material will be placed on World Wide Web(WWW) |
Preference given to work returned on the Web |
Overview of National Scene -- Why is High Performance Computing Important
|
What is Computational Science -- The Program at Syracuse |
Basic Technology Situation -- Increasing density of transistors on a chip
|
Elementary Discussion of Parallel Computing including use in society
|
Computer Architecture -- Parallel and Sequential
|
Simple base example -- Laplace's Equation (a Jacobi iteration sketch follows at the end of this outline)
|
This is followed by two sections -- software technologies and applications which are interspersed with each other and "algorithm" modules |
Programming Models -- Message Passing and Data Parallel Computing -- MPI and HPF (Fortran 90)
|
Some real applications analysed in detail
|
This introduction is followed by a set of "vignettes" discussing problem classes which illustrate parallel programming and parallel algorithms |
Ordinary Differential Equations
|
Numerical Integration including adaptive methods |
Floating Point Arithmetic |
Monte Carlo Methods including Random Numbers |
Full Matrix Algebra as in
|
Partial Differential Equations implemented as sparse matrix problems (as in Computational Fluid Dynamics)
|
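As a point of reference for the base example named in this outline, here is a minimal sequential sketch of Jacobi iteration for Laplace's equation; the grid size, sweep count, and boundary values are illustrative choices, not taken from the course notes. |
```c
/* Minimal sequential sketch of the course's base example: Jacobi iteration
   for Laplace's equation on an (N+2) x (N+2) grid.
   Grid size, sweep count and boundary values are arbitrary choices. */
#include <stdio.h>

#define N      64      /* interior points per side (illustrative) */
#define SWEEPS 1000    /* fixed sweep count instead of a convergence test */

int main(void)
{
    static double u[N + 2][N + 2], unew[N + 2][N + 2];
    int i, j, s;

    for (j = 0; j <= N + 1; j++)   /* boundary: u = 1 on the top edge, 0 elsewhere */
        u[0][j] = 1.0;             /* (static arrays start at zero)               */

    for (s = 0; s < SWEEPS; s++) {
        for (i = 1; i <= N; i++)          /* new value = average of 4 neighbours */
            for (j = 1; j <= N; j++)
                unew[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] +
                                     u[i][j - 1] + u[i][j + 1]);
        for (i = 1; i <= N; i++)          /* copy interior back for the next sweep */
            for (j = 1; j <= N; j++)
                u[i][j] = unew[i][j];
    }
    printf("centre value after %d sweeps: %f\n", SWEEPS, u[N / 2][N / 2]);
    return 0;
}
```
In the data-parallel (HPF) and message-passing (MPI) versions discussed later in the course, the same sweep is simply split over subdomains, with only the boundary rows exchanged between nodes. |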
For "Convential" MPP/Distributed Shared Memory Architecture |
Now(1996) Peak is 0.1 to 0.2 Teraflops in Production Centers
|
In 1999, one will see production 1 Teraflop systems |
In 2003, one will see production 10 Teraflop Systems |
In 2007, one will see production 50-100 Teraflop Systems |
Memory is Roughly 0.25 to 1 Terabyte per 1 Teraflop |
If you are lucky/work hard: Realized performance is 30% of Peak |
RAM density increases by about a factor of 50 in 8 years |
Supercomputers in 1992 have memory sizes around 32 gigabytes (giga = 10^9) |
Supercomputers in year 2000 should have memory sizes around 1.5 terabytes (tera = 10^12) |
Computer Performance is increasing faster than RAM density |
Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994 |
L=Logic, D=DRAM, A=ASIC, mP = microprocessor |
See Chapter 5 of Petaflops Report -- July 95 |
Parallel Computing Works! |
Technology well understood for Science and Engineering
|
Supercomputing market small (few percent at best) and probably decreasing in size
|
Data Parallelism - universal form of scaling parallelism |
Functional Parallelism - Important but typically modest speedup. - Critical in multidisciplinary applications. |
On any machine architecture
|
Performance of both communication networks and computers will increase by a factor of 1000 during the 1990's
|
Competitive advantage to industries that can use either or both High Performance Computers and Communication Networks. (United States clearly ahead of Japan and Europe in these technologies.) |
No silver programming bullet -- I doubt that a new language will revolutionize parallel programming and make it much easier
|
Social forces are tending to hinder adoption of parallel computing, as most applications are in areas where large scale computing is already common
|
Switch from conventional to new types of technology is a phase transition |
Needs headroom (Carver Mead) which is large (factor of 10 ?) due to large new software investment |
Machines such as the nCUBE-1 and CM-2 were comparable in cost performance to conventional supercomputers
|
Cray T3D, Intel Paragon, CM-5, DECmpp (Maspar MP-2), IBM SP-2, nCUBE-3 have enough headroom to take over from traditional computers ? |
ATM networks have rapidly transitioned from research Gigabit networks to commercial deployment
|
Computer Hardware trends imply that all computers (PC's ---> Supercomputers) will be parallel by the year 2000
|
Software is challenge and could prevent/delay hardware trend that suggests parallelism will be a mainline computer architecture
|
High Energy Physics |
Semiconductor Industry, VLSI Design |
Graphics and Virtual Reality |
Weather and Ocean Modeling |
Visualization |
Oil Industry |
Automobile Industry |
Chemicals and Pharmaceuticals Industry |
Financial Applications |
Business Applications |
Airline Industry |
Originally $2.9 billion over 5 years starting in 1992 and
|
The Grand Challenges
|
Nearly all grand challenges have industrial payoff but technology transfer NOT funded by HPCCI |
High Performance Computing Act of 1991 |
Computational performance of one trillion operations per second on a wide range of important applications |
Development of associated system software, tools, and improved algorithms |
A national research network capable of one billion bits per second |
Sufficient production of PhDs in computational science and engineering |
1992: Grand Challenges |
1993: Grand Challenges |
1994: Toward a National Information Infrastructure |
1995: Technology for the National Information Infrastructure |
1996: Foundation for America's Information Future |
ATM, ISDN, wireless, and satellite technologies are advancing rapidly in the commercial arena, which is adopting research results rapidly |
Social forces (deregulation in the U.S.A.) are tending to accelerate adoption of digital communication technologies
|
Not clear how to make money on Web(Internet) but growing interest/acceptance by general public
|
Integration of Communities and Opportunities
|
Technology Opportunities in Integration of High Performance Computing and Communication Systems
|
New Business opportunities linking Enterprise Information Systems to Community networks to current cable/network TV journalism |
New educational needs at interface of computer science and communications/information applications |
Major implications for education -- the Virtual University |
Different machines |
New types of computers |
New libraries |
Rewritten Applications |
Totally new fields able to use computers ==> need for new educational initiatives: Computational Science |
Will be a nucleus for the phase transition |
and accelerate use of parallel computers in the real world |
Computational Science is an interdisciplinary field that integrates computer science and applied mathematics with a wide variety of application areas that use significant computation to solve their problems |
Includes the study of computational techniques
|
Includes the study of new algorithms, languages and models in computer science and applied mathematics required by the use of high performance computing and communications in any (?) important application
|
Includes computation of complex systems using physical analogies such as neural networks and genetic optimization. |
Formal Master's Program with reasonable curriculum and course material |
PhD called Computer and Information Science but can choose computational science research |
Certificates(Minors) in Computational Science at both the Masters and PhD Level |
Undergraduate Minors in Computational Science |
All Programs are open to both computer science and application (computer user) students |
Currently have both a "Science and Engineering Track" ("parallel computing") and an "Information oriented Track" ("the web") |
Conclusions of DOE Conference on Computational Science Education, Feb 1994 |
Industry and government laboratories want graduates with Computational Science and Engineering training - don't care what degree is called |
Universities - want graduates with Computational Science and Engineering training - want degrees to have traditional names |
Premature to have BS Computational Science and Engineering |
Master's Degree in Computational Science Course Requirements: |
Core Courses:
|
Application Area:
|
It is required to take one course in 3 out of the following 4 areas:
|
Minors in Computational Science |
Masters Level Certificate:
|
Doctoral level Certificate:
|
Doctoral level Certificate in Computational Neuroscience:
|
Example Course Module
|
CPS 713 Case Studies in Computational Science |
This course emphasizes a few applications and gives an in-depth treatment of the more advanced computing techniques, aiming for a level of sophistication representing the best techniques currently known by researchers in the field.
|
Instructor: Professor Geoffrey Fox, Computer Science and Physics |
Computer Science -- Nationally viewed as central activity
|
Computer Engineering -- Historically Mathematics and Electrical Engineering have spawned Computer Science programs -- if from electrical engineering, the field is sometimes called computer engineering |
Applied Mathematics is a very broad field in the U.K., where it is equivalent to Theoretical Physics; in the U.S.A. applied mathematics is roughly the mathematics associated with fluid flow
|
Computational Physics -- Practitioners will be judged by their contribution to physics and not directly by algorithm and software innovations.
|
The fundamental principles behind the use of concurrent computers are identical to those used in society - in fact they are partly why society exists. |
If a problem is too large for one person, one does not hire a SUPERman, but rather puts together a team of ordinary people... |
cf. Construction of Hadrian's Wall |
Domain Decomposition is Key to Parallelism |
Need "Large" Subdomains l >> l overlap |
AMDAHL"s LAW or |
Too many cooks spoil the broth |
Says that |
Speedup S is small if efficiency e small |
or for Hadrian's wall |
equivalently S is small if length l small |
But this is irrelevant as we do not need parallel processing unless problem big! |
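The statement on this foil can be written compactly; the notation below (N workers, efficiency ε, overhead fraction f_comm, subdomain length l and overlap length l_overlap for Hadrian's wall) is chosen here for illustration rather than taken verbatim from the foil. |
```latex
S = \varepsilon N, \qquad
\varepsilon = \frac{1}{1 + f_{\mathrm{comm}}}, \qquad
f_{\mathrm{comm}} \;\propto\; \frac{l_{\mathrm{overlap}}}{l}
```
So the speedup S falls well below N only when each worker's subdomain length l becomes comparable to the overlap length -- which is precisely the small-problem regime where parallel processing is not needed anyway. |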
"Pipelining" or decomposition by horizontal section is:
|
Hadrian's Wall is one dimensional |
Humans represent a flexible processor node that can be arranged in different ways for different problems |
The lesson for computing is: |
Original MIMD machines used a hypercube topology. The hypercube includes several topologies including all meshes. It is a flexible concurrent computer that can tackle a broad range of problems. Current machines use different interconnect structure from hypercube but preserve this capability. |
Comparing Computer and Hadrian's Wall Cases |
At the finest resolution, collection of neurons sending and receiving messages by axons and dendrites |
At a coarser resolution |
Society is a collection of brains sending and receiving messages by sight and sound |
Ant Hill is a collection of ants (smaller brains) sending and receiving messages by chemical signals |
Lesson: All Nature's Computers Use Message Passing |
With several different Architectures |
Problems are large -- use domain decomposition; overheads are edge effects |
Topology of processor matches that of domain - processor with rich flexible node/topology matches most domains |
Regular homogeneous problems are easiest, but irregular or inhomogeneous problems can also be handled |
Can use local and global parallelism |
Can handle concurrent calculation and I/O |
Nature always uses message passing as in parallel computers (at lowest level) |
Data Parallelism - universal form of scaling parallelism |
Functional Parallelism - Important but typically modest speedup. - Critical in multidisciplinary applications. |
On any machine architecture
|
Simple, but general and extensible to many more nodes is domain decomposition |
All successful concurrent machines with
|
Have obtained parallelism from "Data Parallelism" or "Domain Decomposition" |
Problem is an algorithm applied to data set
|
The three architectures considered here differ as follows:
|
2 Different types of Mappings in Physical Spaces |
Both are static
|
Different types of Mappings -- A very dynamic case without any underlying Physical Space |
c)Computer Chess with dynamic game tree decomposed onto 4 nodes |
And the corresponding poor workload balance |
And excellent workload balance |
The case of Programming a Hypercube |
Each node runs software that is similar to sequential code |
e.g., FORTRAN with geometry and boundary value sections changed |
Geometry irregular but each brick takes about the same amount of time to lay. |
Decomposition of wall for an irregular geometry involves equalizing number of bricks per mason, not length of wall per mason. |
Fundamental entities (bricks, gargoyles) are of different complexity |
Best decomposition dynamic |
Inhomogeneous problems run on concurrent computers but require dynamic assignment of work to nodes and strategies to optimize this |
(we use neural networks, simulated annealing, spectral bisection etc.) |
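A minimal sketch of the static version of this idea: assign each brick (task) to the currently least-loaded mason (node), so that total work rather than wall length is equalized. The worker count and per-task costs below are invented for illustration; the strategies named above (neural networks, simulated annealing, spectral bisection) are far more sophisticated and also handle the dynamic case. |
```c
/* Greedy sketch of load balancing irregular work: give each task to the
   least-loaded of P workers, equalizing total work rather than task count.
   Worker count and per-task costs are invented for this illustration. */
#include <stdio.h>

#define P      4
#define NTASKS 12

int main(void)
{
    double cost[NTASKS] = { 5, 1, 3, 8, 2, 2, 7, 1, 4, 6, 3, 5 };
    double load[P] = { 0.0 };
    int t, p;

    for (t = 0; t < NTASKS; t++) {
        int best = 0;
        for (p = 1; p < P; p++)           /* find the least-loaded worker */
            if (load[p] < load[best])
                best = p;
        load[best] += cost[t];            /* assign task t to that worker */
        printf("task %2d (cost %.0f) -> worker %d\n", t, cost[t], best);
    }
    for (p = 0; p < P; p++)
        printf("worker %d: total load %.0f\n", p, load[p]);
    return 0;
}
```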
Global Parallelism
|
Local Parallelism
|
Local and Global Parallelism |
Should both be Exploited |
Disk (input/output) Technology is better matched to several modest power processors than to a single sequential supercomputer |
Concurrent Computers natural in databases, transaction analysis |
Each node is CPU and 6 memory chips -- CPU Chip integrates communication channels with floating, integer and logical CPU functions |
We can choose technology and architecture separately in designing our high performance system |
Technology is like choosing ants, people, or tanks as the basic units in our society analogy
|
In HPCC arena, we can distinguish current technologies
|
Near term technology choices include
|
Further term technology choices include
|
It will cost $40 Billion for next industry investment in CMOS plants and this huge investment makes it hard for new technologies to "break in" |
Architecture is equivalent to organization or design in society analogy
|
We can distinguish formal and informal parallel computers |
Informal parallel computers are typically "metacomputers"
|
Metacomputers are a very important trend; they use similar software and algorithms to conventional "MPP's" but typically have less optimized parameters
|
Formal high performance computers are the classic (basic) object of study and are |
"closely coupled" specially designed collections of compute nodes which have (in principle) been carefully optimized and balanced in the areas of
|
In society, we see a rich set of technologies and architectures
|
With several different communication mechanisms with different trade-offs
|
Quantum-Mechanical Computers by Seth Lloyd, Scientific American, Oct 95 |
Chapter 6 of The Feynman Lectures on Computation edited by Tony Hey and Robin Allen, Addison-Wesley, 1996 |
Quantum Computing: Dream or Nightmare? Haroche and Raimond, Physics Today, August 96 page 51 |
Basically any physical system can "compute" as one "just" needs a system that gives answers that depend on inputs and all physical systems have this property |
Thus one can build "superconducting", "DNA", or "quantum" computers exploiting respectively superconducting, molecular, or quantum mechanical rules |
For a "new technology" computer to be useful, one needs to be able to
|
Conventional computers are built around bit ( taking values 0 or 1) manipulation |
One can build arbitrarily complex arithmetic if one has some way of implementing NOT and AND (a gate-level sketch follows below) |
Quantum Systems naturally represent bits
|
Interactions between quantum systems can cause "spin-flips" or state transitions and so implement arithmetic |
Incident photons can "read" state of system and so give I/O capabilities |
Quantum "bits" called qubits have another property as one has not only
|
Lloyd describes how such coherent states provide new types of computing capabilities
|
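As a purely classical illustration of the earlier point that NOT and AND suffice for arbitrary arithmetic, the sketch below builds OR, XOR and a one-bit half adder from those two gates alone; the function names are invented for this example. |
```c
/* Classical illustration that NOT and AND suffice to build arithmetic:
   derive OR, XOR and a one-bit half adder from those two gates alone. */
#include <stdio.h>

static int NOT(int a)        { return a ? 0 : 1; }
static int AND(int a, int b) { return a && b; }

/* De Morgan: a OR b = NOT(NOT a AND NOT b) */
static int OR(int a, int b)  { return NOT(AND(NOT(a), NOT(b))); }

/* a XOR b = (a OR b) AND NOT(a AND b) */
static int XOR(int a, int b) { return AND(OR(a, b), NOT(AND(a, b))); }

int main(void)
{
    int a, b;
    for (a = 0; a <= 1; a++)
        for (b = 0; b <= 1; b++)   /* half adder: sum = XOR, carry = AND */
            printf("%d + %d -> sum %d carry %d\n",
                   a, b, XOR(a, b), AND(a, b));
    return 0;
}
```
Chaining such half adders gives a full adder and hence multi-bit arithmetic, which is why any technology supplying these two operations -- plus a way of wiring them together and reading the result -- can in principle compute. |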
Superconductors produce wonderful "wires" which transmit picosecond (10^-12 seconds) pulses at near speed of light
|
Niobium used in constructing such superconducting circuits can be processed by similar fabrication techniques to CMOS |
Josephson Junctions allow picosecond performance switches |
BUT IBM (1969-1983) and Japan (MITI, 1981-90) terminated major efforts in this area |
New ideas have resurrected this concept using RSFQ -- Rapid Single Flux Quantum -- approach |
This naturally gives a bit which is 0 or 1 (or in fact n units!) |
This gives interesting circuits of similar structure to CMOS systems but with a clock speed of order 100-300GHz -- factor of 100 better than CMOS which will asymptote at around 1 GHz (= one nanosecond cycle time) |
At least two major problems: |
Semiconductor industry will invest some $40B in CMOS "plants" and infrastructure
|
Cannot build memory to match CPU speed and current designs have superconducting CPU's (with perhaps 256 Kbytes superconducting memory per processor) but conventional CMOS memory
|
Superconducting technology also has a bad "name" due to IBM termination! |
Sequential or von Neumann Architecture |
Vector (Super)computers |
Parallel Computers
|
Instructions and data are stored in the same memory, for which there is a single link (the von Neumann Bottleneck) to the CPU, which decodes and executes instructions |
The CPU can have multiple functional units |
The memory access can be enhanced by use of caches made from faster memory to allow greater bandwidth and lower latency |
Fig 1.14 of Aspects of Computational Science |
Editor Aad van der Steen |
published by NCF |
This design enhances performance by noting that many applications calculate "vector-like" operations
|
This allows one to address two performance problems
|
They are typified by the Cray 1, XMP, YMP, C-90, the CDC-205, the ETA-10, and Japanese supercomputers from NEC, Fujitsu, and Hitachi |
A pipeline for vector addition looks like:
|
Vector machines pipeline data through the CPU |
They are not so popular/relevant as in the past as
|
In fact the excellence of, say, the Cray C-90 is due to its very good memory architecture, which allows one to get enough operands to sustain the pipeline. |
Most workstation class machines have "good" CPU's but can never get enough data from memory to sustain good performance except for a few cache intensive applications |
Three Instructions are shown overlapped -- each starting one clock cycle after last |
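A rough timing model of the overlap illustrated above: an s-stage pipeline delivers n results in about s + n - 1 clock cycles instead of n × s. The stage count and vector lengths below are illustrative assumptions only. |
```c
/* Toy model of pipelining: an s-stage pipeline finishes n results in
   (s + n - 1) cycles, versus n*s cycles with no overlap of stages. */
#include <stdio.h>

int main(void)
{
    const int s = 4;                           /* assumed pipeline stages */
    const int lengths[] = { 1, 10, 100, 1000 };/* illustrative vector lengths */
    int k;

    for (k = 0; k < 4; k++) {
        int n = lengths[k];
        long unpipelined = (long)n * s;
        long pipelined   = s + n - 1;
        printf("n=%4d  unpipelined=%6ld  pipelined=%6ld  speedup=%.2f\n",
               n, unpipelined, pipelined,
               (double)unpipelined / (double)pipelined);
    }
    return 0;
}
```
The model also shows why short vectors gain little: the startup term s dominates until n is large, which is one reason the memory system must keep long streams of operands flowing. |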
Very High Speed Computing Systems, Proceedings of the IEEE 54(12), 1901-1909 (1966), and |
Some Computer Organizations and Their Effectiveness, IEEE Transactions on Computers C-21, 948-960 (1972) -- both papers by M.J. Flynn |
SISD -- Single Instruction stream, Single Data Stream -- i.e. von Neumann Architecture |
MISD -- Multiple Instruction stream, Single Data Stream -- Not interesting |
SIMD -- Single Instruction stream, Multiple Data Stream |
MIMD -- Multiple Instruction stream and Multiple Data Stream -- dominant parallel system with ~one to ~one match of instruction and data streams. |
Memory Structure of Parallel Machines
|
and Heterogeneous mixtures |
Shared (Global): There is a global memory space, accessible by all processors.
|
Distributed (Local, Message-Passing): All memory is associated with processors.
|
Memory can be accessed directly (analogous to a phone call) as in red lines below or indirectly by message passing (green line below) |
We show two processors in a MIMD machine for distributed (left) or shared(right) memory architectures |
Uniform: All processors take the same time to reach all memory locations. |
Nonuniform (NUMA): Memory access is not uniform, so that the time for a given processor to fetch data differs from one memory bank to another. This is natural for distributed memory machines but also true in most modern shared memory machines
|
Most NUMA machines these days have two memory access times
|
This simple two level memory access model gets more complicated in proposed 10 year out "petaflop" designs |
SIMD -lockstep synchronization
|
MIMD - Each Processor executes independent instruction streams |
MIMD Synchronization can take several forms
|
MIMD Distributed Memory
|
MIMD with logically shared memory but usually physically distributed. The latter is sometimes called distributed shared memory.
|
A special case of this is a network of workstations (NOW's) or personal computers (metacomputer) |
Issues include:
|
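A minimal C sketch of the message-passing model described above, using MPI (the library the course adopts): node 0 sends a single value to node 1, e.g. when run under mpirun with two processes. |
```c
/* Minimal MPI message-passing sketch: node 0 sends one value to node 1. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    double x = 0.0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* which node am I? */

    if (rank == 0) {
        x = 3.14;
        MPI_Send(&x, 1, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);   /* to node 1 */
    } else if (rank == 1) {
        MPI_Recv(&x, 1, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD, &status);
        printf("node 1 received %f from node 0\n", x);
    }
    MPI_Finalize();
    return 0;
}
```
On a shared-memory machine the same exchange could be a direct load or store; on distributed memory it must go through such explicit sends and receives, or a software layer that hides them. |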
SIMD -- Single Instruction Multiple Data -- can have logically distributed or shared memory
|
CM-2 - 64K processors with 1-bit arithmetic - hypercube network, a broadcast network that can also combine, and a "global or" network |
Maspar, DECmpp - 16 K processors with 4 bit (MP-1), 32 bit (MP-2) arithmetic, fast two-dimensional mesh and slower general switch for communication |
Also have heterogeneous compound architecture (metacomputer) gotten by arbitrary combination of MIMD or SIMD, Sequential or Parallel machines. |
Metacomputers can vary from full collections of several hundred PC's/Settop boxes on the (future) World Wide Web to a CRAY C-90 connected to a CRAY T3D |
This is a critical future architecture which is intrinsically distributed memory, as multi-vendor heterogeneity implies that one cannot have special hardware-enhanced shared memory
|
Cluster of workstations or PC's |
Heterogeneous MetaComputer System |
One example is an Associative memory - SIMD or MIMD or content addressable memories |
This is an example of a special purpose "signal" processing machine which can in fact be built from "conventional" SIMD or MIMD architectures |
This type of machine is not so popular as most applications are not dominated by computations for which good special purpose devices can be designed |
If only 10% of a problem is say "track-finding" or some special purpose processing, then who cares if you reduce that 10% by a factor of 100
|
N body problems (e.g. Newton's laws for one million stars in a globular cluster) can have successful special purpose devices |
See the GRAPE (GRAvity PipE) machine (Sugimoto et al., Nature 345, page 90, 1990)
|
Note GRAPE uses EXACTLY the same parallel algorithm that one finds in the books (e.g. Solving Problems on Concurrent Processors) for N-body problems on classic distributed memory MIMD machines |
GRAPE will execute the classic O(N^2) (parallel) N body algorithm BUT this is not the algorithm used in most such computations |
Rather there is the O(N) or O(N log N) so-called "fast multipole" algorithm, which uses a hierarchical approach
|
So special purpose devices cannot usually take advantage of new nifty algorithms! |
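For concreteness, a sequential sketch of the classic O(N^2) force sum referred to above -- the algorithm GRAPE implements in hardware. Particle count, masses, positions and the softening constant are arbitrary choices here, and the hierarchical fast-multipole alternative is not shown. |
```c
/* Direct O(N^2) gravitational force sum (G = 1) -- the classic N-body
   algorithm.  N, masses, positions and softening are illustrative only. */
#include <stdio.h>
#include <math.h>

#define N    500
#define EPS2 1.0e-6     /* softening to avoid division by zero */

int main(void)
{
    static double x[N][3], m[N], f[N][3];
    int i, j, k;

    for (i = 0; i < N; i++) {            /* arbitrary initial data */
        m[i] = 1.0 / N;
        for (k = 0; k < 3; k++)
            x[i][k] = sin(1000.0 * i + 7.0 * k);
    }
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            double d[3], r2 = EPS2, inv3;
            if (i == j) continue;
            for (k = 0; k < 3; k++) {    /* separation vector and distance^2 */
                d[k] = x[j][k] - x[i][k];
                r2 += d[k] * d[k];
            }
            inv3 = 1.0 / (r2 * sqrt(r2));
            for (k = 0; k < 3; k++)      /* accumulate force on particle i */
                f[i][k] += m[i] * m[j] * d[k] * inv3;
        }
    printf("force on particle 0: %g %g %g\n", f[0][0], f[0][1], f[0][2]);
    return 0;
}
```
Compile with the maths library (-lm); the doubly nested loop over all pairs is exactly what the O(N) and O(N log N) multipole methods replace. |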
Coarse-grain: Task is broken into a handful of pieces, each executed by powerful processors.
|
Medium-grain: Tens to few thousands of pieces, typically executed by microprocessors.
|
Fine-grain: Thousands to perhaps millions of small pieces, executed by very small, simple processors (several per chip) or through pipelines.
|
Note that a machine of given granularity can be used on algorithms of the same or finer granularity |
The last major architectural feature of a parallel machine is the network or design of hardware/software connecting processors and memories together. |
Bus: All processors (and memory) connected to a common bus or busses.
|
Switching Network: Processors (and memory) connected to routing switches like in telephone system.
|
Switch |
Bus |
Two dimensional grid, Binary tree, complete interconnect and 4D Hypercube. |
Communication (operating system) software ensures that the system appears fully connected even if the physical connections are partial |
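A small sketch of the hypercube connectivity mentioned earlier: in a d-dimensional hypercube each node's neighbours are reached by flipping one bit of its node number, giving degree d and diameter d for 2^d nodes; the dimension below is an arbitrary choice. |
```c
/* Neighbours of each node in a d-dimensional hypercube: flip one bit of
   the node number.  Each node has d neighbours; the diameter is d hops. */
#include <stdio.h>

int main(void)
{
    const int d = 3;                 /* illustrative dimension: 8 nodes */
    int node, bit;

    for (node = 0; node < (1 << d); node++) {
        printf("node %d:", node);
        for (bit = 0; bit < d; bit++)
            printf(" %d", node ^ (1 << bit));   /* flip bit -> neighbour */
        printf("\n");
    }
    return 0;
}
```
A 2^a by 2^b mesh can be embedded in such a cube by Gray-coding each coordinate, which is why the hypercube "includes all meshes" as stated earlier. |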
Useful terms include: |
Scalability: Can network be extended to very large systems? Related to wire length (synchronization and driving problems), degree (pinout) |
Fault Tolerance: How easily can system bypass faulty processor, memory, switch, or link? How much of system is lost by fault? |
Blocking: Some communication requests may not get through, due to conflicts caused by other requests. |
Nonblocking: All communication requests succeed. Sometimes just applies as long as no two requests are for same memory cell or processor. |
Latency (delay): Maximal time for nonblocked request to be transmitted. |
Bandwidth: Maximal total rate (MB/sec) of system communication, or subsystem-to-subsystem communication. Sometimes determined by cutsets, which cut all communication between subsystems. Often useful in providing lower bounds on time needed for task. |
Wormhole Routing -- Intermediate switch nodes do not wait for the full message but allow it to pass through in small packets |
From Aspects of Computational Science, Editor Aad van der Steen, published by NCF |
System                  Communication Speed     Computation Speed
                        (Mbytes/sec)            (Mflops/sec)
IBM SP2                      40                      267
Intel iPSC860                 2.8                     60
Intel Paragon               200                       75
Kendall Square KSR-1         17.1                     40
Meiko CS-2                  100                      200
Parsytec GC                  20                       25
TMC CM-5                     20                      128
Cray T3D                    150                      300
tcomm = 4 or 8 / (communication speed in Mbytes/sec) -- for 32-bit or 64-bit operands respectively |
tfloat = 1 / (computation speed in Mflops/sec) |
Thus tcomm / tfloat is just 4 × Computation Speed divided by Communication Speed for 32-bit operands |
tcomm / tfloat is 26.7, 85, 1.5, 9.35, 8, 5, 25.6, and 8 for the machines SP2, iPSC860, Paragon, KSR-1, Meiko CS-2, Parsytec GC, TMC CM-5, and Cray T3D respectively |
Latency makes the situation worse for small messages, and the ratios double for the 64-bit arithmetic natural in large problems! |
Transmission Time for a message of n bytes: |
T0 + T1 n where |
T0 is the latency; it contains a term proportional to the number of hops, plus a term representing the interrupt processing time at start-up and the time for the communication network and processor to synchronize |
T0 = TS + Td × (number of hops) |
T1 is the inverse bandwidth -- it can be made small if the pipe is of large size |
In practice TS and T1 are most important and Td is unimportant |
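The ratios quoted above follow directly from the table; the short program below just reproduces that arithmetic for 4-byte operands and also prints the doubled 8-byte (64-bit) values mentioned above. |
```c
/* Reproduce the foil's tcomm/tfloat arithmetic from the table above:
   tcomm  = bytes per operand / communication speed (Mbyte/s),
   tfloat = 1 / computation speed (Mflop/s),
   so tcomm/tfloat = bytes * computation speed / communication speed. */
#include <stdio.h>

int main(void)
{
    const char *name[] = { "IBM SP2", "Intel iPSC860", "Intel Paragon",
                           "KSR-1", "Meiko CS-2", "Parsytec GC",
                           "TMC CM-5", "Cray T3D" };
    double comm[]  = { 40, 2.8, 200, 17.1, 100, 20, 20, 150 };  /* Mbyte/s */
    double flops[] = { 267, 60, 75, 40, 200, 25, 128, 300 };    /* Mflop/s */
    int i;

    for (i = 0; i < 8; i++)
        printf("%-14s tcomm/tfloat = %6.2f (4-byte), %6.2f (8-byte)\n",
               name[i], 4.0 * flops[i] / comm[i], 8.0 * flops[i] / comm[i]);
    return 0;
}
```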
Dongarra and Dunigan: Message-Passing Performance of Various Computers, August 1995 |
Square blocks indicate shared memory copy performance |
Dongarra and Dunigan: Message-Passing Performance of Various Computers, August 1995 |
Executive Summary |
I. Introduction |
II. Program Accomplishments and Plan |
1. High Performance Communications
|
2. High Performance Computing Systems
|
3. Advanced Software Technologies
|
4. Technologies for the Information Infrastructure
|
5. High Performance Computing Research Facilities
|
6. Grand Challenge Applications
|
7. National Challenge Applications - Digital Libraries
|
8. Basic Research and Human Resources
|
III. HPCC Program Organization |
IV. HPCC Program Summary |
V. References |
VI. Glossary |
VII. Contacts |
NSF Supercomputing Centers |
NSF Science and Technology Centers |
NASA Testbeds |
DOE Laboratories |
NIH Systems |
NOAA Laboratories |
EPA Systems |
Applied Fluid Dynamics |
Meso- to Macro-Scale Environmental Modeling |
Ecosystem Simulations |
Biomedical Imaging and Biomechanics |
Molecular Biology |
Molecular design and Process Optimization |
Cognition |
Fundamental Computational sciences |
Grand-Challenge-Scale Applications |
Computational Aeroscience |
Coupled Field Problems and GAFD (Geophysical and Astrophysical Fluid Dynamics) Turbulence |
Combustion Modeling: Adaptive Grid Methods |
Oil Reservoir Modeling: Parallel Algorithms for Modeling Flow in Permeable Media |
Numerical Tokamak Project (NTP) |
An image from a video illustrating the flutter analysis of a FALCON jet under a sequence of transonic speed maneuvers. Areas of high stress are red; areas of low stress are blue. |
Particle trajectories and electrostatic potentials from a three- dimensional implicit tokamak plasma simulation employing adaptive mesh techniques. The boundary is aligned with the magnetic field that shears around the torus. The strip in the torus is aligned with the local magnetic field and is color mapped with the local electrostatic potential. The yellow trajectory is the gyrating orbit of a single ion. |
Massively Parallel Atmospheric Modeling Projects |
Parallel Ocean Modeling |
Mathematical Modeling of Air Pollution Dynamics |
A Distributed Computational System for Large Scale Environmental Modeling |
Cross-Media (Air and Water) Linkage |
Adaptive Coordination of Predictive Models with Experimental Data |
Global Climate Modeling |
Four-Dimensional Data Assimilation for Massive Earth System Data Analysis |
Ozone concentrations for the California South Coast Air Basin predicted by the Caltech research model show a large region in which the national ozone standard of 120 parts per billion (ppb) are exceeded. Measurement data corroborate these predictions. Scientific studies have shown that human exposure to ozone concentrations at or above the standard can impair lung functions in people with respiratory problems and can cause chest pain and shortness of breath even in the healthy population. This problem raises concern since more than 30 urban areas across the country still do not meet the national standard. |
The colored plane floating above the block represents the simulated atmospheric temperature change at the earth's surface, assuming a steady one percent per year increase in atmospheric carbon dioxide to the time of doubled carbon dioxide. The surfaces in the ocean show the depths of the 1.0 and 0.2 degree (Celsius) temperature changes. The Southern Hemisphere shows much less surface warming than the Northern Hemisphere. This is caused primarily by the cooling effects of deep vertical mixing in the oceans south of 45 degrees South latitude. Coupled ocean-atmosphere climate models such as this one from NOAA/GFDL help improve scientific understanding of potential climate change. |
A scientist uses NASA's virtual reality modeling resources to explore the Earth's atmosphere as part of the Earth and Space Science Grand Challenge. |
Environmental Chemistry |
Groundwater Transport and Remediation |
Earthquake Ground Motion Modeling in Large Basins: The Quake Project |
High Performance Computing for Land Cover Dynamics |
Massively Parallel Simulations of Large-Scale, High-Resolution Ecosystem Models |
Visible Human Project |
Reconstruction of Positron Emission Tomography (PET) Images |
Image Processing of Electron Micrographs |
Understanding Human Joint Mechanisms |
Protein and Nucleic Sequence Analysis |
Protein Folding Prediction |
Ribonucleic Acid (RNA) Structure Prediction |
Biological Applications of Quantum Chemistry |
Biomolecular Design |
Biomolecular Modeling and Structure Determination |
Computational Structural Biology |
Biological Methods for Enzyme Catalysis |
A portion of the Glucocorticoid Receptor bound to DNA; the receptor helps to regulate expression of the genetic code. |
Quantum Chromodynamics |
High Capacity Atomic-Level Simulations for the Design of Materials |
First Principles Simulation of Materials Properties |
Black Hole Binaries: Coalescence and Gravitational Radiation |
Scalable Hierarchical Particle Algorithms for Galaxy Formation and Accretion Astrophysics |
Radio Synthesis Imaging |
Large Scale Structure and Galaxy Formation |
The Alliance will produce an accurate, efficient description of the coalescence of black holes and the gravitational radiation emitted, by computationally solving Einstein's equations for gravitational fields, with direct application to the gravity-wave detection systems LIGO and VIRGO under construction in the USA and Europe. |
The Austin - Chapel Hill - Cornell - NCSA - Northwestern - Penn State - Pittsburgh - NPAC Alliance has Formal Goals |
To develop a problem solving environment for the Nonlinear Einstein's equations describing General Relativity, including a dynamical adaptive multilevel parallel infrastructure |
To provide controllable convergent algorithms to compute gravitational waveforms which arise from Black Hole encounters, which are relevant to astrophysical events, and which may be used to predict signals for detection by future ground- and space-based detectors.
|
To provide representative examples of computational waveforms. |
http://www.npac.syr.edu/projects/bbh/bbh.html |
Problem size: Analysis with Uniform Grid
|
Solution: Adaptive Mesh Refinement
|
Einstein's equations can be represented as a coupled system of hyperbolic and elliptic PDEs with non-trivial boundary conditions to be solved using adaptive multilevel methods |
We are building a PSE that will support:
|
To implement the system we use technologies developed by CRPC, in particular MPI and HPF, combined with emerging new Web technologies: JAVA and VRML 2.0. |
Simulation of gravitational clustering of dark matter. This detail shows one sixth of the volume computed in a cosmological simulation involving 16 million highly clustered particles that required load balancing on a massively parallel computing system. Many particles are required to resolve the formation of individual galaxy halos seen here as red/white spots. |
Simulation of Chorismate Mutase |
Simulation of Antibody-Antigen Association |
A Realistic Ocean Model |
Drag Control |
The Impact of Turbulence on Weather/Climate Prediction |
Shoemaker-Levy 9 Collision with Jupiter |
Vortex structure and Dynamics in Superconductors |
Molecular Dynamics Modeling |
Crash Simulation |
Advanced Simulation of Chemically Reacting Flows |
Simulation of circulation in the North Atlantic. Color shows temperature, red corresponding to high temperature. In most prior modeling, the Gulf Stream turns left past Cape Hatteras, clinging to the continental shoreline. In this simulation, however, the Gulf Stream veers off from Cape Hatteras on a northeast course into the open Atlantic, following essentially the correct course. |
Impact of the comet fragment. Image height corresponds to 1,000 kilometers. Color represents temperature, ranging from tens of thousands of degrees Kelvin (red), several times the temperature of the sun, to hundreds of degrees Kelvin (blue). |
Illustrative of the computing power at the Center for Computational Science is the 50 percent offset crash of two Ford Taurus cars moving at 35 mph shown here. The Taurus model is detailed; the results are useful in understanding crash dynamics and their consequences. These results were obtained using parallel DYNA-3D software developed at Oak Ridge. Run times of less than one hour on the most powerful machine are expected. |
Digital Libraries |
Public Access to Government Information |
Electronic Commerce |
Civil Infrastructure |
Education and Lifelong Learning |
Energy Management |
Environmental Monitoring |
Health Care |
Manufacturing Processes and Products |
Define information generally to include both CNN headline news and the insights on QCD gotten from lattice gauge theories |
Information Production e.g. Simulation
|
Information Analysis e.g. Extraction of location of oil from seismic data, Extraction of customer preferences from purchase data
|
Information Access and Dissemination - InfoVision e.g. Transaction Processing, Video-On-Demand
|
Information Integration .
|
1:Computational Fluid Dynamics |
2:Structural Dynamics |
3:Electromagnetic Simulation |
4:Scheduling |
5:Environmental Modelling (with PDE's) |
6:Environmental Phenomenology |
7:Basic Chemistry |
8:Molecular Dynamics |
9:Economic Modelling |
10:Network Simulations |
11:Particle Transport Problems |
12: Graphics |
13:Integrated Complex Systems Simulations |
14:Seismic and Environmental Data Analysis |
15:Image Processing |
16:Statistical Analysis |
17:Healthcare Fraud |
18:Market Segmentation |
Growing Area of Importance and reasonable near term MPP opportunity in decision support combined with parallel (relational) databases |
19:Transaction Processing |
20:Collaboration Support |
21:Text on Demand |
22:Video on Demand |
23:Imagery on Demand |
24:Simulation on Demand (education,financial modelling etc.) -- simulation is a "media"! |
MPP's as High Performance Multimedia (database) servers -- WebServers |
Excellent Medium term Opportunity for MPP enabled by National Information Infrastructure |
25:Military and Civilian Command and Control(Crisis Management) |
26:Decision Support for Society (Community Servers) |
27:Business Decision Support |
28:Public Administration and Political Decision(Judgement) Support |
29:Real-Time Control Systems |
30:Electronic Banking |
31:Electronic Shopping |
32:(Agile) Manufacturing including Multidisciplinary Design/Concurrent Engineering |
33:Education at K-12, University and Continuing levels |
Largest Application of any Computer and Dominant HPCC Opportunity |
In spite of the large and very successful national activity, simulation will not be a large "real world" sales opportunity for MPP's
|
However some areas of national endeavor will be customers for MPP's used for simulation
|
Some areas which may adopt HPCC for simulation in relatively near future
|
The role of HPCC in Manufacturing is quite clear and will be critical to
|
On the other hand for
|
Return on Investment Unclear:
|
The Industry is in a very competitive situation and focussed on short term needs |
At the March 1994 ARPA meeting in Washington, Boeing (Neves) endorsed parallel databases and not parallel simulation
|
Aerospace Engineers are just like University Faculty
|
There is perhaps some general decline of Supercomputer Industry
|
MAD (Multidisciplinary Analysis and Design) links:
|
(Includes MDO -- Multidisciplinary Optimization) |
Link Simulation and CAD Processes
|
This is a really important application of HPCC, as it addresses "Amdahl's Law" by using HPCC to support the full manufacturing cycle -- not just one part! Thus large improvements in manufacturers' time to market and product quality are possible. |
BUT must change and even harder integrate:
|
The limited near-term industrial use of HPCC implies that it is critical for Government and DoD to support and promote |
DoD Simulation: Dual-Use Philosophy implies
|
Manufacturing Support can lead to future US Industry leadership in advanced HPCC based manufacturing environments 10-20 years from now
|
An HPCC Software Industry is essential if the HPCC field is to become commercially successful |
The HPCC Simulation market is small |
This market is not used to paying true cost for software
|
There is a lot of excellent available public domain software (funded by federal government) |
Small Businesses are natural implementation of HPCC Software Industry
|
Two InfoMall Success Stories
|
Anecdotes from Thinking Machines (TMC) April 94 before the fall
|
Anecdote from Digital September 94:
|
These can be defined simply as those HPCC applications which have sufficient market to sustain a true balanced HPCC computing Industry with viable hardware and software companies
|
Alternatively one can define National Challenges by the HPCC technologies exploited
|
Partial Differential Equations |
Particle Dynamics and Multidisciplinary Integration |
Image Processing |
Some: |
Visualization |
Artificial Intelligence |
Not Much: |
Network Simulation |
Economic (and other complex system) modeling |
Scheduling |
Manufacturing |
Education |
Entertainment |
Information Processing |
BMC3IS (Command & Control in military war) |
Decision Support in global economic war |