Given by Geoffrey C. Fox at CPS615 Basic Simulation Track for Computational Science on Fall Semester 97. Foils prepared 27 August 1997
Outside Index
Summary of Material
This introduces the course which covers the essential programming and algorithmic skills needed for large scale (serious) computing (aka simulation) |
We start with an overview of the course itself and describe how it compares to other computational science offerings |
In the introductory material, we describe some large simulations of interest (the so-called Grand Challenges) |
An Overview of Technology trends driving the computer industry (in web and simulation areas!) |
Overview of computational science education program here and elsewhere |
Parallel Computing in Society and why it works in computers! |
http://www.npac.syr.edu/users/gcf/cps615intro97 |
Geoffrey Fox |
Syracuse University NPAC |
111 College Place, Syracuse NY 13244-4100 |
315-443-2163 |
Practical use of leading edge computer science technologies to address "real" applications |
CPS615/713: Simulation Track |
CPS606/616/640/714: Information Track |
Large Scale Parallel Computing -- Low Latency, Closely Coupled Components |
World Wide Distributed Computing -- Loosely Coupled Components |
[Diagram relating the two tracks to their technologies: Parallel Algorithms, Fortran90, HPF, MPI, Interconnection Networks; Transactions, Security, Compression, PERL, JavaScript, Multimedia, Wide Area Networks, Java, VRML, Collaboration, Integration (middleware), CORBA, Databases] |
These ideas come from Xiaoming Li, Peking University |
CPS615 |
Instructor: Geoffrey Fox -- gcf@npac.syr.edu, 315-443-2163, Room 3-131 CST |
Backup: Nancy McCracken -- njm@npac.syr.edu, 315-443-4687, Room 3-234 CST |
NPAC administrative support: Nora Downey-Easter -- nora@npac.syr.edu, 315-443-1722, Room 3-206 CST |
The above can be reached at cps615ad@npac.syr.edu |
Students will be on alias cps615@npac.syr.edu |
Homepage is: http://www.npac.syr.edu/projects/cps615fall97 |
There is a slightly out of date paper "Basic Issues and Current Status of Parallel Computing" by Fox (SCCS736 on NPAC technical reports) |
Graded on the basis of approximately 7 homework sets, which will be due on the Thursday of the week following the day (Tuesday or Thursday) on which they are given out |
There will be two projects -- one will start after message passing (MPI) is discussed, the other about 60% of the way through the course |
Total grade is 50% homework, 50% the sum of two projects |
All homework will be handled through the web, and indeed all computer access will be through VPL, the Virtual Programming Laboratory, which gives access to compilers, Java visualization, etc. through the web |
Status of High Performance Computing and Communications (HPCC) nationally |
What is Computational Science Nationally (and at Syracuse) |
Technology driving forces (until around 2010) |
Elementary discussion of Parallel Computing in Society and why it must obviously work in simulation! |
Sequential and Parallel Computer Architectures |
Comparison of Architecture of World Wide Web and Parallel Systems |
Simple base Example -- Laplace's Equation |
Then we discuss software and algorithms (mini-applications) intermixed |
Programming Models -- SPMD and Message Passing (MPI), Software Integration middleware (Java, WebFlow), Data Parallel (HPF, Fortran90) |
Visualization -- Java Applets |
Other tools: Collaboration , Performance Measurement |
This introduction is followed by a set of "vignettes" discussing problem classes which illustrate parallel programming and parallel algorithms |
Ordinary Differential Equations |
Numerical Integration including adaptive methods |
Floating Point Arithmetic |
Monte Carlo Methods including Random Numbers (see the sketch after this list) |
Full Matrix Algebra as in ... |
Partial Differential Equations implemented as sparse matrix problems (as in Computational Fluid Dynamics) |
Large Scale Simulations in Engineering |
Large Scale Academic Simulations (Physics, Chemistry, Biology) |
"Other Critical Real World Applications" |
For "Convential" MPP/Distributed Shared Memory Architecture |
Now (1996) Peak is 0.1 to 0.2 Teraflops in Production Centers |
In 1999, one will see production 1 Teraflop systems |
In 2003, one will see production 10 Teraflop Systems |
In 2007, one will see production 50-100 Teraflop Systems |
Memory is Roughly 0.25 to 1 Terabyte per 1 Teraflop |
If you are lucky/work hard: Realized performance is 30% of Peak |
HPCC is a maturing field with many organizations installing large scale systems |
These include NSF (academic computations), DoE (Dept of Energy) and DoD (Defense) |
There are new applications with new algorithmic challenges |
However these are not "qualitatively" new concepts |
Software ideas are "sound" but note simulation is only a few percent of the total computer market |
PC and workstation clusters are of growing importance, and this typically distributed-memory "people's technology" is contrasted with tightly coupled distributed shared memory MPPs. |
Computational science moving to multidisciplinary (multi-component) applications |
Corresponding growing use of databases (for data-intensive applications) |
Interoperability between disparate heterogeneous platforms, support of multidisciplinary applications, and metacomputing are three related important areas |
"full metacomputing" (decompose general problem on general networked resources) may not be relevant |
The Web is delivering a new operating environment (WebWindows) and a rich distributed computing software infrastructure with especially excellent support for software integration |
There is a need for a new scalable technical operating system (NT v UNIX v WebWindows) |
We can distinguish Decomposition and Integration |
Decomposition is performed by an HPF or other Parallelizing compiler; or by a user writing a Fortran + Message Passing code "by hand" |
MPI integrates decomposed parts together under tight bandwidth and latency constraints |
Systems such as AVS integrate larger modules together, and much of "software engineering" (a modular style of programming) is involved with this |
The Web is a powerful integration model suitable for large coarse modules with modest latency and sometimes modest bandwidth requirements |
Collaboration, computational steering, multidisciplinary science are all integration and not decomposition problems! |
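A toy sketch of the decomposition/integration distinction (illustrative only -- the course's real examples use MPI and HPF rather than Java threads): the data is decomposed into chunks owned by separate workers, and their partial results are then integrated into the final answer.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DecomposeIntegrate {
    public static void main(String[] args) throws Exception {
        final double[] data = new double[1 << 20];
        java.util.Arrays.fill(data, 1.0);
        int workers = 4;
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<Double>> parts = new ArrayList<>();
        int chunk = data.length / workers;
        for (int w = 0; w < workers; w++) {
            final int lo = w * chunk, hi = lo + chunk;
            // Decomposition: each worker owns one contiguous chunk of the data
            parts.add(pool.submit(() -> {
                double s = 0.0;
                for (int i = lo; i < hi; i++) s += data[i];
                return s;
            }));
        }
        // Integration: combine the partial results into the global answer
        double total = 0.0;
        for (Future<Double> p : parts) total += p.get();
        pool.shutdown();
        System.out.println("total = " + total);  // prints 1048576.0
    }
}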
By definition, Web Software will be the "best" software ever built because it has the largest market (and so the greatest leverage of investment dollars) and the most creative business model (harnessing the world's best minds together with open interfaces) |
One should build upwards from the "democratic Web" |
This allows you both to deliver your application to the general public (not always required but often desirable) and to use the best leveraged software |
Note Web Software tends to offer highest functionality as opposed to highest performance and HPCC often requires different trade-offs |
HPCC is a small field with limited resources for a very hard problem and must leverage as much software as possible |
Web Software provides an excellent pervasive user interface with Java Applets and WebWindows |
Web Software provides a potentially excellent high performance object oriented language (Java) for scientific and engineering computation |
Web Software provides a high functionality but modest performance distributed computing environment based on either Web Servers or Clients |
All(!?) we need to do is to add high performance to the Web! |
Web Technology is still uncertain and there may be major changes, but "enough" capabilities are in place to build very general (~all) applications |
Rapidly evolving Standards and a mechanism to get rapid consensus |
Fortran 77 --> Fortran90 --> HPF --> Fortran2000 (23 years) |
VRML Idea (1994) --> VRML1 deployed (95) --> VRML2 deployed (early 97) (2.3 years) |
Classic Web: HTTP Mime HTML CGI Perl etc. |
Java and JavaScript: ranging from compiled, to almost compiled (applets), to fully interpreted programming languages |
VRML2 as a dynamic 3D data structure for products and their simulation objects |
Java Database Connectivity (JDBC) and general Web linked databases |
Dynamic Java Servers and Clients |
Rich Web Collaboration environment building electronic societies |
Security -- still needs maturing, as it is very clumsy or nonexistent at present in many cases |
Compression / Quality of Service for Web Multimedia |
Emerging Web Object model including integration of CORBA (see JavaBeans and Orblets) |
See Original Foil |
Java for the User Interface: This is roughly the "WebWindows Philosophy" of building applications to Web Server/Client Standards |
Java for Coarse Grain Software Integration: see collaboration and metacomputing |
Java as a high performance scientific language: for "inner" (and outer) loops. Here parallelism is important, but sequential issues are also critical and are the first to examine! |
Building from bottom of Computing pyramid starts with high functionality software which has an architecture that can be augmented with high performance |
3 Levels of Software |
One universal language -- Java for all layers |
f is the feature size (the width of the basic lines on a chip); it is now 0.25 µm and will decrease inexorably to 100 nm in the 2005-2010 time period |
Roughly the number of Transistors is proportional to (1/f)^2 |
Speed of chip is proportional to (1/f) |
Figure of Merit is (1/f)^3 |
Die size is also increasing like (1/f) and this enhances the effect! |
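To make the scaling concrete (arithmetic added here, not on the original foil): shrinking f from 0.25 µm to 0.1 µm multiplies the transistor count by (0.25/0.1)^2 = 6.25, the speed by 2.5, and the (1/f)^3 figure of merit by about 15.6 -- before the further gain from the growing die size. |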
RAM density increases by about a factor of 50 in 8 years |
Supercomputers in 1992 have memory sizes around 32 gigabytes (giga = 10^9) |
Supercomputers in year 2000 should have memory sizes around 1.5 terabytes (tera = 10^12) |
Overall Roadmap Technology Characteristics from SIA (Semiconductor Industry Association) Report 1994 |
L=Logic, D=DRAM, A=ASIC, mP = microprocessor |
See Chapter 5 of Petaflops Report -- July 95 |
Transistors are getting cheaper and cheaper, and it only takes some 0.5 million transistors to make a very high quality CPU |
Already we build chips with some factor of ten more transistors than this, and this is used for "automatic" instruction level parallelism. |
However getting much more speedup than this requires use of "outer loop" or data parallelism. |
Actually memory bandwidth is an essential problem in any computer, as doing more computations per second requires accessing more memory cells per second! |
Data locality implies CPU finds information it needs in cache which stores most recently accessed information |
This means one reuses a given memory reference in many nearby computations e.g. |
A1 = B*C |
A2 = B*D + B*B |
.... Reuses B |
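The same reuse idea as a code sketch (an illustration added here, with hypothetical array names): fusing the two assignments lets one load of B[i] serve all three uses, whereas two separate passes stream B through the cache twice.

public class Locality {
    // Fused loop: b[i] is loaded once per iteration and reused immediately,
    // so the reference is almost certainly still in cache (temporal locality).
    static void fused(double[] a1, double[] a2, double[] b, double[] c, double[] d) {
        for (int i = 0; i < b.length; i++) {
            double bi = b[i];             // one memory reference to b ...
            a1[i] = bi * c[i];            // ... reused here
            a2[i] = bi * d[i] + bi * bi;  // ... and here
        }
    }
    // Two separate passes stream the whole of b through the cache twice,
    // doubling the memory traffic for the same arithmetic.
    static void twoPasses(double[] a1, double[] a2, double[] b, double[] c, double[] d) {
        for (int i = 0; i < b.length; i++) a1[i] = b[i] * c[i];
        for (int i = 0; i < b.length; i++) a2[i] = b[i] * d[i] + b[i] * b[i];
    }
    public static void main(String[] args) {
        int n = 1 << 22;
        double[] a1 = new double[n], a2 = new double[n];
        double[] b = new double[n], c = new double[n], d = new double[n];
        java.util.Arrays.fill(b, 2.0);
        fused(a1, a2, b, c, d);
        twoPasses(a1, a2, b, c, d);
        System.out.println(a2[0]);  // 2*0 + 2*2 = 4.0
    }
}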
[Figure: the memory hierarchy -- L3 cache, main memory, disk -- with memory capacity increasing and memory speed decreasing down the hierarchy (a factor of 100 difference between processor and main memory speed)] |
For both parallel and sequential computers, cost is accessing remote memories with some form of "communication" |
Data locality addresses this in both cases |
The differences are quantitative: the size of the effect, and what is done by the user versus what is done automatically |
Originally $2.9 billion over 5 years starting in 1992 and ... |
The Grand Challenges |
Nearly all grand challenges have industrial payoff but technology transfer NOT funded by HPCCI |
High Performance Computing Act of 1991 |
Computational performance of one trillion operations per second on a wide range of important applications |
Development of associated system software, tools, and improved algorithms |
A national research network capable of one billion bits per second |
Sufficient production of PhDs in computational science and engineering |
1992: Grand Challenges |
1993: Grand Challenges |
1994: Toward a National Information Infrastructure |
1995: Technology for the National Information Infrastructure |
1996: Foundation for America's Information Future |
Different machines |
New types of computers |
New libraries |
Rewritten Applications |
Totally new fields able to use computers .... ==> need new educational initiatives: Computational Science |
It will be a nucleus for the phase transition and will accelerate use of parallel computers in the real world |
Computational Science is an interdisciplinary field that integrates computer science and applied mathematics with a wide variety of application areas that use significant computation to solve their problems |
Includes the study of computational techniques |
Includes the study of new algorithms, languages and models in computer science and applied mathematics required by the use of high performance computing and communications in any (?) important application |
Includes computation of complex systems using physical analogies such as neural networks and genetic optimization. |
Formal Master's Program with reasonable curriculum and course material |
PhD is called Computer and Information Science, but one can choose computational science research |
Certificates(Minors) in Computational Science at both the Masters and PhD Level |
Undergraduate Minors in Computational Science |
All Programs are open to both computer science and application (computer user) students |
Currently have both a "Science and Engineering Track" ("parallel computing") and an "Information oriented Track" ("the web") |
Conclusions of DOE Conference on Computational Science Education, Feb 1994 |
Industry and government laboratories want graduates with Computational Science and Engineering training - don't care what degree is called |
Universities - want graduates with Computational Science and Engineering training - want degrees to have traditional names |
Premature to have BS Computational Science and Engineering |
Master's Degree in Computational Science Course Requirements: |
Core Courses: ... |
Application Area: ... |
It is required to take one course in 3 out of the following 4 areas: ... |
Minors in Computational Science |
Masters Level Certificate: ... |
Doctoral level Certificate: ... |
Doctoral level Certificate in Computational Neuroscience: ... |
Computer Science -- Nationally viewed as central activity |
Computer Engineering -- Historically Mathematics and Electrical Engineering have spawned Computer Science programs -- if from electrical engineering, the field is sometimes called computer engineering |
Applied Mathematics is a very broad field in the U.K., where it is equivalent to Theoretical Physics. In the USA applied mathematics is roughly mathematics associated with fluid flow |
Computational Physics -- Practitioners will be judged by their contribution to physics and not directly by algorithms and software innovations. |
The fundamental principles behind the use of concurrent computers are identical to those used in society - in fact they are partly why society exists. |
If a problem is too large for one person, one does not hire a SUPERman, but rather puts together a team of ordinary people... |
cf. Construction of Hadrian's Wall |
Domain Decomposition is Key to Parallelism |
Need "Large" Subdomains l >> l overlap |
AMDAHL"s LAW or |
Too many cooks spoil the broth |
Says that |
Speedup S is small if efficiency e small |
or for Hadrian's wall |
equivalently S is small if length l small |
But this is irrelevant, as we do not need parallel processing unless the problem is big! |
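For reference, a standard statement of Amdahl's law (the textbook form, added here since the original foil's formula is not reproduced in this text): if a fraction p of the work can be spread over N processors, |

S(N) = 1 / ( (1 - p) + p/N ),  with efficiency e = S/N

In the Hadrian's Wall picture (a rough model, assuming each of N masons builds a section of length l/N plus a fixed overlap l_overlap at its edges), e is roughly 1 / (1 + N*l_overlap/l), so the speedup S = e*N is small when the length l is small and approaches N as the wall grows long. |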
"Pipelining" or decomposition by horizontal section is:
|
Hadrian's Wall is one dimensional |
Humans represent a flexible processor node that can be arranged in different ways for different problems |
The lesson for computing is: |
Original MIMD machines used a hypercube topology. The hypercube includes several topologies including all meshes. It is a flexible concurrent computer that can tackle a broad range of problems. Current machines use different interconnect structure from hypercube but preserve this capability. |
Comparing Computer and Hadrian's Wall Cases |
At the finest resolution, collection of neurons sending and receiving messages by axons and dendrites |
At a coarser resolution |
Society is a collection of brains sending and receiving messages by sight and sound |
Ant Hill is a collection of ants (smaller brains) sending and receiving messages by chemical signals |
Lesson: All Nature's Computers Use Message Passing |
With several different Architectures |
Problems are large -- use domain decomposition; overheads are edge effects |
Topology of processor matches that of domain - processor with rich flexible node/topology matches most domains |
Regular homogeneous problems are easiest, but irregular or inhomogeneous problems can also be handled |
Can use local and global parallelism |
Can handle concurrent calculation and I/O |
Nature always uses message passing as in parallel computers (at lowest level) |
Now we go through a set of semi-realistic examples of parallel computing and show they use various forms of data parallelism |
Seismic Wave Simulation: Regular mesh of grid points |
Cosmology (Universe) Simulation: Irregular collection of stars or galaxies |
Computer Chess: "Data" is now parts of a game tree with major complication of pruning algorithms
|
Defending the Nation: Tracking multiple missiles achieves parallelism from set of missiles |
Finite Element (NASTRAN) structures problem: Irregular collection of nodal (grid) points clustered near a crack |
2 Different types of Mappings in Physical Spaces |
Both are static |
Different types of Mappings -- A very dynamic case without any underlying Physical Space |
c) Computer Chess with a dynamic game tree decomposed onto 4 nodes |
And the corresponding poor workload balance |
And excellent workload balance |
The case of Programming a Hypercube |
Each node runs software that is similar to sequential code |
e.g., FORTRAN with geometry and boundary value sections changed |
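A hedged sketch of such node code (written here in Java rather than FORTRAN, with an invented exchangeHalos stub standing in for real message passing): the computational loop is the ordinary sequential Jacobi update for Laplace's equation; only the geometry (this node's strip of rows) and the boundary handling differ from the sequential program. |

// One node's SPMD code for Jacobi iteration on Laplace's equation.
// Each node owns `rows` interior rows of the global grid plus two
// "halo" rows holding copies of its neighbours' edge rows.
public class JacobiNode {
    static void exchangeHalos(double[][] phi) {
        // Stub: on a real machine this would send row 1 and row `rows`
        // to the neighbouring nodes and receive their edge rows into
        // rows 0 and rows+1 (e.g. with MPI sends and receives).
    }
    public static void main(String[] args) {
        int rows = 16, cols = 64;                 // this node's share of the grid
        double[][] phi = new double[rows + 2][cols + 2];
        double[][] next = new double[rows + 2][cols + 2];
        for (int iter = 0; iter < 1000; iter++) {
            exchangeHalos(phi);                   // the only communication step
            for (int i = 1; i <= rows; i++)       // same loop as sequential code:
                for (int j = 1; j <= cols; j++)   // average of the four neighbours
                    next[i][j] = 0.25 * (phi[i-1][j] + phi[i+1][j]
                                       + phi[i][j-1] + phi[i][j+1]);
            double[][] t = phi; phi = next; next = t;  // swap old and new grids
        }
        System.out.println("node done; phi[1][1] = " + phi[1][1]);
    }
}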
Geometry irregular but each brick takes about the same amount of time to lay. |
Decomposition of wall for an irregular geometry involves equalizing number of bricks per mason, not length of wall per mason. |
Fundamental entities (bricks, gargoyles) are of different complexity |
Best decomposition dynamic |
Inhomogeneous problems run on concurrent computers but require dynamic assignment of work to nodes and strategies to optimize this |
(we use neural networks, simulated annealing, spectral bisection etc.) |
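As a deliberately simple illustration (a toy greedy scheme invented for these notes, far cruder than the neural network, simulated annealing, or spectral bisection methods just mentioned): give each brick to whichever mason currently has the least total assigned work. |

import java.util.PriorityQueue;

public class GreedyBalance {
    public static void main(String[] args) {
        double[] brickCost = {3, 1, 4, 1, 5, 9, 2, 6};  // per-brick work estimates
        int masons = 3;
        // Priority queue of (mason id, load so far), ordered by load
        PriorityQueue<double[]> q =
            new PriorityQueue<>((a, b) -> Double.compare(a[1], b[1]));
        for (int m = 0; m < masons; m++) q.add(new double[]{m, 0.0});
        for (double cost : brickCost) {
            double[] least = q.poll();  // currently least-loaded mason
            least[1] += cost;           // assign this brick to that mason
            q.add(least);
        }
        for (double[] m : q)
            System.out.println("mason " + (int) m[0] + " load " + m[1]);
    }
}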
Global Parallelism |
Local Parallelism |
Local and Global Parallelism should both be exploited |
Disk (input/output) Technology is better matched to several modest power processors than to a single sequential supercomputer |
Concurrent Computers are natural in databases and transaction analysis |