Given by Geoffrey C. Fox at Aachen Parallel Computing Workshop, Pallas Presentation Germany on April 21,23 97. Foils prepared April 27 97
Outside Index
Summary of Material
This talk was presented at the "Kolloquium uber Parallelverarbeitung in technisch-naturwissenschaftlichen" at Aachen April 21, 1997 and (without PetaFlop comments) at the Pallas Software company (Bruhl Germany) April 23, 1997 |
The visit was sponsored by GMD Bonn SCAI (Ulrich Trottenberg) |
We discussed the expected PetaFlop architectures with their challenges and then the new software approaches suggested by the Web |
Please go to URL http://www.npac.syr.edu/projects/javaforcse |
We describe 3 major areas where Java (and other Web Technologies) can have significant impact |
1) Java can be used to build user Interfaces and here we describe the Virtual Programming Laboratory VPL |
2) Java can support coarse grain integration and metacomputing
|
3) Java as a traditional compiled language for computational kernels
|
Talks in Germany April 21-23 1997 |
Geoffrey Fox |
Syracuse University |
111 College Place |
Syracuse |
New York 13244-4100 |
Conventional (Distributed Shared Memory) Silicon
|
Note Memory per Flop is much less than one to one |
Natural scaling says time steps decrease at same rate as spatial intervals and so memory needed goes like (FLOPS in Gigaflops)**.75
|
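The 0.75 exponent above can be recovered from a short scaling argument. This is only a sketch, assuming a three-dimensional grid with n points per side and a fixed wall-clock time to solution; the symbols n, M, W and F are introduced here for illustration and do not appear on the foil.

```latex
\begin{align*}
M &\propto n^{3}
  && \text{(memory: one set of values per grid point)} \\
W &\propto n^{3} \cdot n = n^{4}
  && \text{(work: grid points times time steps, with } \Delta t \propto \Delta x \propto 1/n \text{)} \\
F &\propto W
  && \text{(required flop rate at fixed run time)} \\
M &\propto F^{3/4}
\end{align*}
```

Taking the foil's normalization at face value, with F in gigaflops and M assumed to be in gigabytes, a petaflop machine (F = 10^6) would need about 10^4.5, or roughly 3 x 10^4 gigabytes, which is a few tens of terabytes.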
Superconducting Technology is promising but can it compete with silicon juggernaut? |
Should be able to build a simple 200 GHz Superconducting CPU with modest superconducting caches (around 32 Kilobytes) |
Must use same DRAM technology as for silicon CPU ? |
So tremendous challenge to build latency tolerant algorithms (as over a factor of 100 difference in CPU and memory speed) but advantage of factor 30-100 less parallelism needed |
Processor in Memory (PIM) Architecture is follow on to J machine (MIT) Execube (IBM -- Peter Kogge) Mosaic (Seitz)
|
One could take in year 2007 each two gigabyte memory chip and alternatively build as a mosaic of
|
12000 chips (Same amount of Silicon as in first design but perhaps more power) gives:
|
Performance data from uP vendors |
Transistor count excludes on-chip caches |
Performance normalized by clock rate |
Conclusion: Simplest is best! (250K Transistor CPU) |
[Figure: Normalized SPECINTS and Normalized SPECFLTS, each plotted against Millions of Transistors (CPU)] |
Fixing 10-20 Terabytes of Memory, we can get |
16000 way parallel natural evolution of today's machines with various architectures from distributed shared memory to clustered hierarchy
|
5000 way parallel Superconducting system with 1 Petaflop performance but terrible imbalance between CPU and memory speeds |
12 million way parallel PIM system with 12 petaflop performance and "distributed memory architecture" as off-chip access will have serious penalties |
There are many hybrid and intermediate choices -- these are extreme examples of "pure" architectures |
All proposed designs have VERY deep memory hierarchies which are a challenge to algorithms, compilers and even communication subsystems |
Major need for high-end performance computers comes from government (both civilian and military) applications
|
Government must develop systems using commercial suppliers but NOT relying on traditional industry applications to motivate |
So Currently Petaflop initiative is thought of as an applied development project whereas HPCC was mainly a research endeavour |
Nuclear Weapons Stewardship (ASCI) |
Cryptology and Digital Signal Processing |
Satellite Data Analysis |
Climate and Environmental Modeling |
3-D Protein Molecule Reconstruction |
Real-Time Medical Imaging |
Severe Storm Forecasting |
Design of Advanced Aircraft |
DNA Sequence Matching |
Molecular Simulations for nanotechnology |
Large Scale Economic Modelling |
Intelligent Planetary Spacecraft |
Well the rest of the Software World is Changing with emergence of WebWindows Environment! |
Current approaches (HPF,MPI) lack needed capability to address memory hierarchy of either today's or any future contemplated high performance architecture -- whether sequential or parallel |
Problem Solving Environments are needed to support complex applications implied by both Web and increasing capabilities of scientific simulations |
So I suggest rethinking High Performance Computing Software Models and Implementations! |
MPI represents data movement with the abstraction for a structure of machines with just two levels of memory
|
This was a reasonable model in the past but even today fails to represent complex memory structure of typical microprocessor node |
Note HPF Distribution Model has similar (to MPI) underlying relatively simple Abstraction for PEM |
This addresses memory hierarchy intra-processor as well as inter-processor
|
[Figure: memory hierarchy diagram with labels Level 1 Cache and Level 2 Cache; see original foil] |
By definition, Web Software will be the "best" software ever built because it has the largest market (and so greatest leverage of investment dollars) and most creative business model (harness the world's best minds together with open interfaces)
|
One should build upwards from the "democratic Web"
|
This allows you to both deliver your application to the general public (not always required but often desirable) and use the best leveraged software |
Note Web Software tends to offer highest functionality as opposed to highest performance and HPCC often requires different trade-offs |
Web Software MUST be cheaper and better than MPP software as more money invested! |
Therefore natural strategy is to get parallel computing environment by adding synchronization of parallel algorithms to loosely coupled Web distributed computing model |
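As a rough illustration of "adding synchronization" to otherwise independent workers, the sketch below uses modern Java threads and a barrier so that every worker finishes a step before any worker starts the next. The class name BarrierSketch, the ring-averaging update and the use of java.util.concurrent (which postdates this talk) are all assumptions made for the example, not something the foils specify.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical example: one thread per data point, lock-step updates on a ring.
final class BarrierSketch {
    // Runs `steps` synchronized sweeps; requires at least one element.
    static double[] run(double[] initial, int steps) {
        final int n = initial.length;
        final double[][] buf = { initial.clone(), new double[n] };
        final int[] cur = {0}; // index of the buffer holding the current values
        // The barrier action swaps buffers once all workers have arrived.
        final CyclicBarrier barrier = new CyclicBarrier(n, () -> cur[0] = 1 - cur[0]);
        Thread[] threads = new Thread[n];
        for (int w = 0; w < n; w++) {
            final int i = w;
            threads[w] = new Thread(() -> {
                try {
                    for (int s = 0; s < steps; s++) {
                        double[] in = buf[cur[0]];
                        double[] out = buf[1 - cur[0]];
                        // Each worker averages its two ring neighbours.
                        out[i] = 0.5 * (in[(i + n - 1) % n] + in[(i + 1) % n]);
                        barrier.await(); // the tight synchronization point
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return buf[cur[0]];
    }
}
```

Without the barrier the same workers would form a loosely coupled distributed computation; the barrier is the only piece that makes the sweeps behave like a conventional synchronous parallel algorithm.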
Web Technology is still uncertain and there may be major changes but "enough" capabilities are in place to build very general (~all) applications
|
Rapidly evolving Standards and a mechanism to get rapid consensus |
Fortran 77 -> Fortran90 --> HPF --> Fortran2000 (23 years) |
VRML Idea (1994) --> VRML1 deployed (95) --> VRML2 deployed (early 97) (2.3 years)
|
Classic Web: HTTP Mime HTML CGI Perl etc. |
Java and JavaScript Compiled to almost compiled (applet) to fully Interpreted Programming Language |
VRML2 as a dynamic 3D Datastructure for products and their simulation object |
Java Database Connectivity (JDBC) and general Web linked databases |
Dynamic Java Servers and Clients |
Rich Web Collaboration environment building electronic societies |
Security -- still needs maturing as very clumsy or non existent at present in many cases |
Compression/ Quality of Service for Web Multimedia
|
Emerging Web Object model including integration of Corba (see JavaBeans and Orblets) |
1)Compute Power ? Maybe |
2)Network Bandwidth? In some cases |
3)Implementing / Discovering new ways of doing Business? Usually the major issue
|
Which organizations will still be here 10 years from now ? Consider University education as an example |
4)Web Technologies are very rich and are perhaps 10 times as complicated as HPCC and Parallel Programming |
Much harder to match the drumbeat of web than drumbeat of HPCC |
I use a research <--> Teach iterative cycle to learn and understand significance of new technologies
|
5)Rapid evolution of technologies implies that any "product" is bound to be out of date
|
Problem with HPCC is not ideas but rather finding enough people to implement robust rich software
|
View parallel computing as a special case of distributed computing with tighter synchronization and lower latency
|
Java/JavaScript front ends for interoperability and visualization is first step
|
Java for the User Interface: This is roughly the "WebWindows Philosophy" of building applications to Web Server/Client Standards |
Java for Coarse Grain Software Integration: see collaboration and metacomputing |
Java as a high performance scientific language: for "inner" (and outer) loops Here parallelism is important but sequential issues also critical and first issues to examine! |
This is least controversial and is essentially WebWindows for User Interfaces |
Fortran was never good at user interfaces! |
Initially Aimed at education where usability higher priority than performance |
Teaching Java and JavaScript greatly aided by interpreted technology which allow integration of demonstrations into lectures |
VPL aimed at allowing embedding of F90, HPF and MPI (etc.) examples in lectures and convenient support of homeworks for transient inexperienced users. |
Features of VPL:
|
User registers data in both the Java Applet and the running HPF/MPI program; data is then transferred between the client applet and the running simulation in a fashion similar to AVS |
This interacts via wrappers to MPI/HPF/F90 running programs |
VRML naturally gives 3D visualization with usual Web advantage of running on PC's and Workstations |
Its universality implies it can be used in industry to specify products, so one can design, manufacture and market from the same (related) specification |
Should impact PDES/STEP and such industry product specification standards |
VRML will need extension to handle this but it is a good start and allows user defined types |
VRML and Parallel Computing?
|
NPAC Web Based Geographical Information System in Stand Alone Mode |
A GIS application is a specialized OpenInventor viewer, however it accepts any OpenInventor 2.1 scene model. That's why it's so easy to integrate it with third party applications, which produce IO/VRML output. The images show GIS integration with Weather Simulation application. A GIS viewer can also display animated objects controlled by Simulation Engine. |
Parallel Java is inevitable and indeed Java will replace Fortran and C++ in general scientific computing
|
The (commercial) Web itself will lead to "coarse grain software integration" in AVS like data flow environments
|
Web Collaboration technology can revolutionize computational steering
|
Can use network of Web Clients and/or Web Servers |
Not clear if distinction (in capability) between web server and client will remain |
Web Client Models Include SuperWeb from UCSB and hotwired article "Suck your Mips". |
More powerful but less pervasive is a pure Web Server model as in NPAC WebFlow |
Can either use in controlled (IntraNets or run a server on every node of your MPP) or uncontrolled (the whole world wide web) fashion
|
Note total compute power in all Web "clients" is about 1000 times that in all Central Supercomputers |
http://www.packet.com/packet/ Hot Wired Tuesday January 7 Edition |
Applet calculates pi while you read article! |
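The idea of a client computing pi in the background while the user does something else can be sketched in a few lines. This is not the HotWired applet's code, which is not shown in the foils; the class name BackgroundPi, the choice of the Leibniz series and the use of an ExecutorService from modern Java are assumptions made for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: compute pi on a worker thread while the main thread stays free.
final class BackgroundPi {
    // Leibniz series: pi/4 = 1 - 1/3 + 1/5 - 1/7 + ...
    static double leibnizPi(long terms) {
        double sum = 0.0;
        for (long k = 0; k < terms; k++) {
            double term = 1.0 / (2 * k + 1);
            sum += (k % 2 == 0) ? term : -term;
        }
        return 4.0 * sum;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Start the computation without blocking the "reader".
        Future<Double> result = pool.submit(() -> leibnizPi(10_000_000L));
        System.out.println("Reading the article while the worker computes...");
        System.out.println("pi is approximately " + result.get());
        pool.shutdown();
    }
}
```

A real client-side harvesting scheme would also need to return partial results to a server, which this sketch leaves out.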
High Level WebHPL (Interpreted Interface to parallel Java, Fortran, C++) |
or WebFlow (AVS on the Web) |
Low Level WebVM (MPI on the Web) is linked servers |
Using Servlets (Jeeves) or Resource Objects (Jigsaw) |
Java is basis of Web Collaboration Systems with Applets Coordinated by Java Server |
Habanero from NCSA was one of first |
TANGOsim uses more modern Web Technology and incorporates a Discrete Event Simulator |
[Figure: TANGOsim architecture. Labels: TANGOsim Basic Replicated Applications (1) Virtual Users, 2) Customized Views); TANGO Java Collaboratory Server; HTTP Server; Event Driven Simulation Engine; Simulation Engine Controller; Scripting Language; Message Routing; SW/Data Distrib.; Typical Clients (C2 Commander, C2 Radar Officer, C2 Weather Officer, Other Collaborators); All Clients (MultiMedia Mail, Chat); also VTC and 3D GIS] |
Entirely Web-based system (runtime implemented in Java) |
Able to tap any information resources |
Self-distributing software model (applets not applications) |
Unrestricted inter-applet communication |
Supports all basic collaboratory functions:
|
Language independent: support for non-Java applications
|
Archiving system for session replays
|
Dynamic and flexible master-slave mode |
Entirely open, extensible system with growing set of applications |
Multiplatform: SGI/Sun/Win 95/NT |
TANGOsim mode provides support for discrete event simulations |
Used in Command and Control, telemedicine, and weather applications in the Rome Laboratory project that funded this work. |
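For readers unfamiliar with discrete event simulation, the core of such an engine is a time-ordered event queue. The sketch below is a generic illustration and not TANGOsim's implementation; the class EventQueueSketch, its methods and the event labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical minimal discrete event engine: events fire in timestamp order.
final class EventQueueSketch {
    static final class Event implements Comparable<Event> {
        final double time;
        final String label;

        Event(double time, String label) {
            this.time = time;
            this.label = label;
        }

        @Override
        public int compareTo(Event other) {
            return Double.compare(time, other.time);
        }
    }

    private final PriorityQueue<Event> pending = new PriorityQueue<>();
    private double clock = 0.0;

    // Schedule an event `delay` time units after the current simulation time.
    void schedule(double delay, String label) {
        pending.add(new Event(clock + delay, label));
    }

    // Drain the queue, advancing the clock to each event's timestamp.
    List<String> run() {
        List<String> log = new ArrayList<>();
        while (!pending.isEmpty()) {
            Event e = pending.poll();
            clock = e.time;
            log.add(e.time + " " + e.label);
        }
        return log;
    }
}
```

In a collaborative setting each fired event would be routed to the participating applets rather than appended to a log.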
Chatboard |
Collaboratory Web browser |
Collaboratory search engine |
Mmail - TANGO multimedia mail
|
Weather with 2D and 3D views and simulation and sensor data displays |
All apps collaboratory and compatible with Simulation Engine, hence scriptable. |
Clearly Java Collaboration Systems are natural implementations of general environments that mix computers and people |
Computational Steering -- a simulation is like a participant in a Tango session which has
|
Need to link to Tango, Java data analysis/visualization front ends as well as distributed resource management systems such as ARMS from Cornell |
Note synergy with Java Server based distributed computing such as WebFlow which builds an AVS like environment with graphical interfaces to software Integration |
More ambitious to upgrade discrete event simulation component of TANGOsim to support full SIMNET/DSI (Distributed Simulation Internet) functionality. |
Note that Java is natural language for DSI/Forces Modelling because these typically use object parallelism which fits both language and applet/JavaBean capabilities. |
See discussion in http://www.npac.syr.edu/projects/javaforcse |
Java for User Interfaces and MetaComputing is natural from its design! |
Java for your favourite Conjugate Gradient routine (etc.) is less obvious ..... |
Java likely to be a dominant language as will be learnt and used by a broad group of users
|
Java may replace C++ as major system building language
|
Clearly Java can easily replace Fortran as a Scientific Computing Language as can be compiled as efficiently and has much better software engineering (object) and graphics (web) capabilities
|
Java can unify classic science and engineering computations with more qualitative macroscopic "distributed simulation and modelling" arena which is critical in military and to some extent industry |
Key question is performance of Java |
Note Web Software can be run on High Performance IntraNets such as Iway so hardware need NOT be a problem! |
Java is currently semi-interpreted and (as in Linpack online benchmark) is about 50 times slower than good C or Fortran
|
Java --> (javac) --> Downloadable Universal Bytecodes --> (Java Interpreter) --> Native Machine Code
|
However Language can be efficiently compiled with "native compilers" |
Java --> (native compiler) --> Native (for Particular Machine) Code |
Lots of Interesting Compiler issues for both compiled and scripted Java |
Syracuse Workshop saw no serious problem to High Performance Java on sequential or Shared Memory Machines |
Some restrictions are needed in programming model |
For instance, Avoid Complicated Exception handlers in areas compilers need to optimize! |
Should be able to get comparable performance on compiled Java C and Fortran starting with either Java Language or JavaVM bytecodes |
The Interpreted (Applet) JavaVM mode would always be slower than compiled Java/C/Fortran -- perhaps by a factor of two with best technology |
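The restriction on exception handlers mentioned above can be illustrated with two versions of the same kernel. This is a sketch under assumptions: the class KernelStyle and both method names are hypothetical, and whether a given compiler actually optimizes the second form better depends on that compiler.

```java
// Hypothetical example: the same dot product with and without a handler in the hot loop.
final class KernelStyle {
    // Handler inside the loop: every iteration sits inside a protected region.
    static double dotWithInnerHandler(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            try {
                sum += x[i] * y[i];
            } catch (ArrayIndexOutOfBoundsException e) {
                // y is shorter than x: treat the missing entries as zero
            }
        }
        return sum;
    }

    // Same result, but the length check is hoisted out so the loop body is plain arithmetic.
    static double dotWithHoistedCheck(double[] x, double[] y) {
        int n = Math.min(x.length, y.length);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }
}
```

Both methods return the same value; the second simply keeps control flow in the inner loop as simple as a Fortran or C loop would be.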
One can use "native classes" which is just a predownloaded library of optimized runtime routines which can be high performance compiled Java, C, C++, Fortran, HPF etc. modules invoked by interpreted or compiled Java
|
Use Native Classes selectively for
|
One can identify both decomposition and integration as key parts of parallel (high performance) computing |
Thus in HPF, we have distribute to address decomposition and the compiler uses MPI or equivalent to integrate |
Java brings objects and threads to help decomposition |
Java servers and applets really address integration and the greatest power of Web is in integration -- not decomposition
|
1)Classic solution of large scale PDE or Particle dynamics problem
|
2)Modest Grain size Functional Parallelism as seen in overlap of communication and computation in a node process of a parallel implementation.
|
3)Object parallelism seen in Distributed Simulation where "world" modelled (typically by event driven simulation) as set of interacting macroscopic (larger than grid points) objects
|
4)MetaProblems consisting of several large grain functionally distinct components such as
|
Java: 1) Not Supported, 2) is Thread mechanism, 3) is Java Objects or Applets, 4) is JavaBeans or equivalent |
Fortran: 1)is supported in HPF, 2--4) are not supported |
As we saw large scale Applications need many forms of parallelism and it is not needed/appropriate to use the same mechanism for each form
|
Thus Java needs (runtime and perhaps language) extension to support HPF/HPC++ like (shared memory model for programmer) data parallelism but "Java plus message passing" is already here
|
MPI (or equivalent message passing) done either as "pure Java" or as native class interface |
Threads allow overlap of communication and computation |
Higher Level Libraries such as those of DAGH (Adaptive Mesh Support) or PCRC (Compiler Runtime) |
Build in capabilities with classes designed for "ghost region" support etc. |
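The overlap of communication and computation through threads, together with "ghost region" support, can be sketched as follows. The class OverlapSketch is hypothetical, receiveGhostCells is a stand-in for a real message-passing receive (it just sleeps and returns fixed values), and CompletableFuture is a modern Java convenience rather than anything available when these foils were written.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical example: update interior points while boundary data is still "in flight".
final class OverlapSketch {
    // Stand-in for receiving the left and right neighbours' boundary values.
    static double[] receiveGhostCells() {
        try {
            Thread.sleep(50); // pretend network latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new double[] {1.0, 1.0};
    }

    // One averaging sweep over a 1D block; requires u.length >= 2.
    static double[] step(double[] u) {
        int n = u.length;
        double[] next = new double[n];
        // Start the communication on another thread.
        CompletableFuture<double[]> ghosts =
                CompletableFuture.supplyAsync(OverlapSketch::receiveGhostCells);
        // Interior points need no remote data, so compute them in the meantime.
        for (int i = 1; i < n - 1; i++) {
            next[i] = 0.5 * (u[i - 1] + u[i + 1]);
        }
        // Wait for the ghost values only when the boundary points need them.
        double[] g = ghosts.join();
        next[0] = 0.5 * (g[0] + u[1]);
        next[n - 1] = 0.5 * (u[n - 2] + g[1]);
        return next;
    }
}
```

The benefit grows with the ratio of interior to boundary work, which is the usual surface-to-volume argument for domain decomposition.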
Parallel C++ approach (Standard Template Libraries etc.) does not work
|
Could copy HPF directive approach but as this requires major compiler development, this does not seem appropriate in near future
|
In particular can use this with Java threads running on SMP as target i.e. use Java runtime to get parallelism automatically if we spawn appropriate threads |
This work can be done on .class (Bytecode) or .java (Java language) files |
Interpreted but limited (in functionality) Java client interface to Java wrapped HPF/C++ (not necessarily and perhaps best not parallel Java)
|
Note that we avoid many difficulties but lose elegance as we exchange information between the Host and running Parallel code using "text strings" |
Host and parallel node "synchronize" object reference by registering names with the communication broker |
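A minimal sketch of the registration idea follows: the running code registers arrays under names, and the host refers to them only through text strings. The class NameBroker and the GET/SET message format are invented for this illustration; the foils do not specify the actual protocol.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical broker: names are the only shared "object references".
final class NameBroker {
    private final Map<String, double[]> registry = new ConcurrentHashMap<>();

    // Called by the running simulation to expose an array under a name.
    void register(String name, double[] data) {
        registry.put(name, data);
    }

    // Handles "GET name index" or "SET name index value" and replies with text.
    String handle(String message) {
        String[] p = message.trim().split("\\s+");
        if (p.length < 3) {
            return "ERROR malformed";
        }
        double[] data = registry.get(p[1]);
        if (data == null) {
            return "ERROR unknown " + p[1];
        }
        int i;
        try {
            i = Integer.parseInt(p[2]);
        } catch (NumberFormatException e) {
            return "ERROR bad index";
        }
        if (i < 0 || i >= data.length) {
            return "ERROR bad index";
        }
        if (p[0].equals("GET") && p.length == 3) {
            return "VALUE " + data[i];
        }
        if (p[0].equals("SET") && p.length == 4) {
            try {
                data[i] = Double.parseDouble(p[3]);
            } catch (NumberFormatException e) {
                return "ERROR bad value";
            }
            return "OK";
        }
        return "ERROR malformed";
    }
}
```

Parsing strings is clumsy compared with typed remote calls, which is the loss of elegance the foil refers to, but it keeps the host and the parallel code decoupled.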
We can use Java as an interface to a Web-implemented simulation linking to either Server or Client |
This does not necessarily need one to use Java native class linkage to Fortran and C -- rather just to be able to send messages between running programs |
PreProcessors Can make this more "automatic"
|
More generally should study link between interpreted and compiled environments
|
Need an Interpreted Java -- JavaScript is interpreted but in limited domain |
Java raises issue of role of Interpreters versus Compilers |
Success of systems like MATLAB and languages like APL show relevance of interpreters in Scientific Computing |
PERL, JavaScript, Tcl, Visual Basic etc. indicate growing use of Interpreters in other domains
|
We suggest that integration of Interpreters and compilers is an important research issue and could suggest new models for parallelism
|
Optimizing Interpreters (as in JIT for Java) |
A library model where interpreted toolkits invoke lovingly parallelized high performance libraries |
Natural linkage to interpreted data analysis / visualization |
Numerical Objects in (C++/Fortran/C/Java) |
Expose the Coarse Grain Parallelism |
Expose All Levels of Memory Hierarchy |
a) Pure Script (Interpreted) |
b) Semi-Interpreted a la Applets |
c) High Level Language but Optimized Compilation |
d) Machine Optimized RunTime |
[Figure labels: Memory Levels in High Performance CPU; Nodes of Parallel/Distributed System] |
Applications require a range of capabilities in any language |
High level ("Problem Solving Environment") manipulating "large" objects
|
Intermediate level Compiled Code targeted at "sequential" (multi-threaded) architecture
|
Lower level runtime exploiting parallelism and memory hierarchies
|
We have proposed an approach which uses native classes for "compiler runtime" and follows an HPF style with an interpreted front-end like Matlab or APL or "host" programming model as in *LISP on CM-2 |
e.g. A = HParray.matmul(B,C)
|
This leads again to Java wrappers invoked by HPF-style Java(Script) interpreter which interfaces to native HPF or other implementations.
|
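To make the A = HParray.matmul(B,C) example concrete, here is a plain Java stand-in for such a wrapper class. In the approach described above the method body would hand off to a native HPF or other optimized implementation; that native linkage is not shown in the foils, so this sketch substitutes an ordinary triple loop and should be read as an assumption about the interface only.

```java
// Hypothetical wrapper: same call shape as the foil's example, pure Java body.
final class HParray {
    private HParray() {
    }

    // Returns the matrix product b * c for rectangular, non-empty matrices.
    static double[][] matmul(double[][] b, double[][] c) {
        int rows = b.length;
        int inner = c.length;
        int cols = c[0].length;
        if (b[0].length != inner) {
            throw new IllegalArgumentException("inner dimensions differ");
        }
        double[][] a = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int p = 0; p < inner; p++) {
                double bip = b[i][p];
                // Innermost loop walks a row of c, which is contiguous in Java.
                for (int j = 0; j < cols; j++) {
                    a[i][j] += bip * c[p][j];
                }
            }
        }
        return a;
    }
}
```

A caller would write double[][] A = HParray.matmul(B, C); and would not need to know whether the work ran in Java or in a native library, which is the point of the wrapper style.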
NPAC HPJava Activity -- Michael Chang and Bryan Carpenter |
WebFlow approach with Java Servers supporting metacomputing
|
Java suggests new approaches to distributed Event Driven Simulations
|
As usual most things work for "embarrassingly parallel" problems when integration and decomposition coincide. |