Given by Geoffrey C. Fox, Wojtek Furmanski at Int. Conf. on Parallel Computing in Minneapolis on Oct 3-4 1996. Foils prepared Sept 30 1996
We describe some of the forces and issues that we suggest will lead to Java emerging as the dominant language for scientific and engineering computation. |
One force is the new, complex architectures expected for future high performance (petaflop) computers |
This implies that other aspects of the Web will become important; in particular, Web Servers will be used as a network (web) of compute servers, allowing powerful integration of data and compute services as a "server-server" infrastructure
|
We discuss both intrinsic reasons why
|
Conventional (Distributed Shared Memory) Silicon
|
Note Memory per Flop is much less than one to one |
Natural scaling says time steps decrease at same rate as spatial intervals and so memory needed goes like (FLOPS in Gigaflops)**.75
|
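The scaling rule above can be made concrete with a short calculation. With three spatial dimensions and a time step that shrinks with the grid spacing, work grows as N^4 while memory grows as N^3, which gives the 0.75 exponent. The sketch below assumes the rule is normalized so that a 1 Gigaflop machine needs about 1 Gigabyte; that normalization and the names `MemoryScaling` and `memoryGigabytes` are illustrative assumptions, not something the foils state.

```java
final class MemoryScaling {
    /**
     * Memory needed, in Gigabytes, for a machine rated at the given number of
     * Gigaflops. ASSUMPTION: 1 Gigaflop corresponds to about 1 Gigabyte.
     */
    static double memoryGigabytes(double gigaflops) {
        return Math.pow(gigaflops, 0.75);
    }

    public static void main(String[] args) {
        double[] ratings = {1.0, 1.0e3, 1.0e6}; // 1 Gigaflop, 1 Teraflop, 1 Petaflop
        for (double g : ratings) {
            System.out.printf("%.0e Gigaflops -> about %.0f Gigabytes%n",
                    g, memoryGigabytes(g));
        }
    }
}
```

Under that assumed normalization a petaflop machine comes out at roughly 30 Terabytes, which is the same order of magnitude as the 10-20 Terabytes fixed later in these foils.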
Superconducting technology is promising, but can it compete with the silicon juggernaut? |
One should be able to build a simple 200 GHz superconducting CPU with modest superconducting caches (around 32 Kilobytes) |
Must it use the same DRAM technology as a silicon CPU? |
So there is a tremendous challenge in building latency-tolerant algorithms (as there is over a factor of 100 difference between CPU and memory speed), but the advantage is that a factor of 30-100 less parallelism is needed |
Processor in Memory (PIM) Architecture is the follow-on to J machine (MIT), Execube (IBM -- Peter Kogge), and Mosaic (Seitz)
|
In the year 2007 one could take each two-gigabyte memory chip and alternatively build it as a mosaic of
|
12000 chips (Same amount of Silicon as in first design but perhaps more power) gives:
|
Performance data from uP vendors |
Transistor count excludes on-chip caches |
Performance normalized by clock rate |
Conclusion: Simplest is best! (250K Transistor CPU) |
[Figure: Normalized SPECINTS and Normalized SPECFLTS plotted against Millions of Transistors (CPU)] |
Fixing 10-20 Terabytes of Memory, we can get |
16000 way parallel natural evolution of today's machines with various architectures from distributed shared memory to clustered hierarchy
|
5000 way parallel Superconducting system with 1 Petaflop performance but terrible imbalance between CPU and memory speeds |
12 million way parallel PIM system with 12 petaflop performance and "distributed memory architecture" as off-chip access will have serious penalties |
There are many hybrid and intermediate choices -- these are extreme examples of "pure" architectures |
All proposed hardware architectures have a complex memory hierarchy which should be abstracted with a software architecture
|
This implies a layered software architecture reflected in all components
|
The Software Architecture should be defined early on so that hardware and software respect it!
|
Users and Compilers must be able to have full control of data movement and placement in all parts of petaflop system |
Size and Complex Memory Structure of PetaFlop machines represent major challenges in scaling existing Software Concepts |
Domain Specific Application Problem Solving Environment |
Numerical Objects in (C++/Fortran/C/Java) High Level Virtual Problem |
Expose the Coarse Grain Parallelism of the Real Complex Computer |
Expose All Levels of Memory Hierarchy of the Real Complex Computer |
Virtual Problem / Appl. ADI |
Multi Level Machine ADI |
Well the rest of the Software World is Changing with emergence of WebWindows Environment! |
Current approaches (HPF,MPI) lack needed capability to address memory hierarchy of either today's or any future contemplated high performance architecture -- whether sequential or parallel |
Problem Solving Environments are needed to support complex applications implied by both Web and increasing capabilities of scientific simulations |
So I suggest rethinking High Performance Computing Software Models and Implementations! |
MPI represents data movement with the abstraction for a structure of machines with just two levels of memory
|
This was a reasonable model in the past but even today fails to represent complex memory structure of typical microprocessor node |
Note HPF Distribution Model has similar (to MPI) underlying relatively simple Abstraction for PEM |
This addresses memory hierarchy intra-processor as well as inter-processor
|
Level 2 Cache |
Level 1 Cache |
1) Classic solution of large scale PDE or Particle dynamics problem
|
2) Modest Grain size Functional Parallelism as seen in overlap of communication and computation in a node process of a parallel implementation.
|
3) Object parallelism seen in Distributed Simulation where "world" is modelled (typically by event driven simulation) as a set of interacting macroscopic (larger than grid points) objects
|
4) MetaProblems consisting of several large grain functionally distinct components such as
|
Java: 1) Not Supported, 2) is Applet mechanism, 3) is Java Objects or Applets, 4) is JavaBeans or equivalent |
Fortran: 1) is supported in HPF, 2) to 4) are not supported |
First the Caveat -- It is possible that Java will not "make it" but current momentum is hard to derail!
|
If Java is not the web language of future, then whatever replaces it must be better and our remarks should be applied to its replacement! |
Note that it is not clear whether the built-in thread mechanism of Java should be used in high performance implementations or "just" viewed as critical in supporting modest grain size functional parallelism (item 2) within an application
|
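As an illustration of item 2), the sketch below uses Java's built-in threads to overlap "communication" with computation in a single node process. It is a minimal example under stated assumptions: `fetchBlock` is a hypothetical stand-in for receiving data from a neighbouring node, `compute` stands in for the numerical work, and none of these names come from the foils.

```java
import java.util.Arrays;

final class OverlapSketch {
    /** Hypothetical stand-in for receiving block `id` from a neighbouring node. */
    static double[] fetchBlock(int id, int size) {
        double[] block = new double[size];
        Arrays.fill(block, id + 1.0);
        return block;
    }

    /** Stand-in for the numerical work done on one block. */
    static double compute(double[] block) {
        double sum = 0.0;
        for (double v : block) sum += v;
        return sum;
    }

    /** Processes the blocks in order, fetching block b+1 on a second thread while block b is computed. */
    static double run(int blocks, int size) {
        double total = 0.0;
        double[] current = fetchBlock(0, size);
        for (int b = 0; b < blocks; b++) {
            final int next = b + 1;
            final double[][] slot = new double[1][];
            Thread comm = null;
            if (next < blocks) {
                comm = new Thread(() -> slot[0] = fetchBlock(next, size));
                comm.start();              // "communication" proceeds in the background
            }
            total += compute(current);     // computation overlaps with the fetch
            if (comm != null) {
                try {
                    comm.join();           // wait for the next block before moving on
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for block", e);
                }
                current = slot[0];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(run(3, 4));
    }
}
```

The thread here carries only the modest grain size functional parallelism; the data parallel work inside `compute` would use a different mechanism.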
As we saw, large scale Applications need many forms of parallelism, and it is neither necessary nor appropriate to use the same mechanism for each form
|
Thus Java needs (runtime and perhaps language) extension to support HPF/HPC++ like (shared memory model for programmer) data parallelism but "Java plus message passing" is already here
|
Java is likely to be a dominant language as it will be learnt and used by a broad group of users
|
Java may replace C++ as the major system building language
|
Clearly Java can easily replace Fortran as a Scientific Computing Language, as it can be compiled as efficiently and has much better software engineering (object) and graphics (web) capabilities
|
Java can unify classic science and engineering computations with the more qualitative macroscopic "distributed simulation and modelling" arena, which is critical in the military and to some extent in industry |
Java is currently semi-interpreted and (as in Linpack online benchmark) is about 50 times slower than good C or Fortran
|
Java --> (javac) --> Downloadable Universal Bytecodes --> (Java Interpreter) --> Native Machine Code
|
However Language can be efficiently compiled with "native compilers" |
Java --> (native compiler) --> Native (for Particular Machine) Code |
Lots of Interesting Compiler issues for both compiled and scripted Java |
My SGI INDY gets .54 Megaflops for Java 100 by 100 Linpack |
It has a 200 MHz R4400 and the current Netlib benchmark for this chip is 32 Mflops for optimized Fortran |
see http://www.netlib.org/benchmark/linpackjava/ |
Note Just in Time Compilers are giving a factor of 10 improvement over the June 96 Measurements! |
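To show how a Linpack-style Megaflop figure is obtained, here is a small sketch in Java. It is not the Netlib benchmark code: it is a plain Gaussian elimination with partial pivoting, timed with `System.nanoTime`, and it uses the conventional Linpack operation count of 2n^3/3 + 2n^2. The class and method names are illustrative assumptions.

```java
final class MiniLinpack {
    /** Solves a*x = b by Gaussian elimination with partial pivoting (a and b are overwritten). */
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int k = 0; k < n; k++) {
            // Partial pivoting: pick the row with the largest entry in column k.
            int p = k;
            for (int i = k + 1; i < n; i++) {
                if (Math.abs(a[i][k]) > Math.abs(a[p][k])) p = i;
            }
            double[] rowTmp = a[k]; a[k] = a[p]; a[p] = rowTmp;
            double bTmp = b[k]; b[k] = b[p]; b[p] = bTmp;
            // Eliminate column k below the diagonal.
            for (int i = k + 1; i < n; i++) {
                double factor = a[i][k] / a[k][k];
                for (int j = k; j < n; j++) a[i][j] -= factor * a[k][j];
                b[i] -= factor * b[k];
            }
        }
        // Back substitution.
        double[] x = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double s = b[i];
            for (int j = i + 1; j < n; j++) s -= a[i][j] * x[j];
            x[i] = s / a[i][i];
        }
        return x;
    }

    /** Conventional Linpack operation count for an n by n solve. */
    static double flopCount(int n) {
        return 2.0 * n * n * n / 3.0 + 2.0 * n * n;
    }

    /** Megaflops for an n by n solve that took the given number of seconds. */
    static double megaflops(int n, double seconds) {
        return flopCount(n) / seconds / 1.0e6;
    }

    public static void main(String[] args) {
        int n = 100;
        java.util.Random rng = new java.util.Random(1);
        double[][] a = new double[n][n];
        double[] b = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = rng.nextDouble() - 0.5;
                b[i] += a[i][j];           // right-hand side chosen so the solution is all ones
            }
        }
        long start = System.nanoTime();
        double[] x = solve(a, b);
        double seconds = (System.nanoTime() - start) / 1.0e9;
        System.out.printf("x[0] = %.6f, about %.1f Mflops%n", x[0], megaflops(n, seconds));
    }
}
```

For reference, dividing the two figures quoted above gives 32 / 0.54, roughly 59, in line with the "about 50 times slower" estimate. A single untimed-warmup run like this one is only indicative on a modern JIT-compiled JVM.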
Applications require a range of capabilities in any language |
High level ("Problem Solving Environment") manipulating "large" objects
|
Intermediate level Compiled Code targeted at "sequential" (multi-threaded) architecture
|
Lower level runtime exploiting parallelism and memory hierarchies
|
Domain Specific Application Problem Solving Environment |
Numerical Objects in (C++/Fortran/C/Java) High Level Virtual Problem |
Expose the Coarse Grain Parallelism of the Real Complex Computer |
Expose All Levels of Memory Hierarchy of the Real Complex Computer |
Virtual Problem / Appl. ADI |
Multi Level Machine ADI |
Pure Script (Interpreted) |
High Level Language but Optimized Compilation |
Machine Optimized RunTime |
Semi-Interpreted |
a la Applets |
One can use "native classes", which are just a predownloaded library of optimized runtime routines; these can be high performance compiled Java, C, C++, Fortran, HPF etc. modules invoked by interpreted or compiled Java
|
Use Native Classes selectively for
|
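A minimal sketch of the native class idea, using Java's `native` method mechanism. The library name `hpkernels` and the routine `daxpyNative` are hypothetical; nothing in the foils names them. Because that library will not exist on a reader's machine, the sketch falls back to a pure Java loop when loading fails, so it runs as written.

```java
final class NativeKernels {
    private static final boolean NATIVE_AVAILABLE = tryLoad();

    // Would be implemented in the hypothetical native library (compiled C, C++ or Fortran).
    private static native void daxpyNative(int n, double alpha, double[] x, double[] y);

    private static boolean tryLoad() {
        try {
            System.loadLibrary("hpkernels"); // hypothetical library name
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false; // library not installed: fall back to pure Java
        }
    }

    /** Computes y = alpha * x + y, using the native routine when it is available. */
    static void daxpy(double alpha, double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        if (NATIVE_AVAILABLE) {
            daxpyNative(x.length, alpha, x, y);
        } else {
            for (int i = 0; i < x.length; i++) y[i] += alpha * x[i];
        }
    }

    static boolean usingNative() {
        return NATIVE_AVAILABLE;
    }

    public static void main(String[] args) {
        double[] y = {1.0, 1.0, 1.0};
        daxpy(2.0, new double[]{1.0, 2.0, 3.0}, y);
        System.out.println(java.util.Arrays.toString(y) + " native=" + usingNative());
    }
}
```

Callers see only `daxpy`, so the same interpreted or compiled Java code works whether or not the optimized routine has been predownloaded.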
In "WebWindows" Approach one naturally gets a Web Server and Client on every node
|
Several emerging technologies
|
In future one will NOT write software for either
|
Rather one will write software for WebWindows defined as the operating environment for World Wide Web |
WebWindows builds on top of Web Servers and Web Client open interfaces as in
|
Applications written for WebWindows will be portable to all computers running Web Servers or Clients which hide hardware and native O/S specifics |
WebWindows Interface |
Build initial experiments conservatively so insensitive to rapid evolution of Web |
Note Problem Solving Environments and "Forces Modelling" (Human/Instrument in the loop) applications require integration of computing and collaboration
|
Successful examples:
|
http://www.npac.syr.edu/factoring/status.html |
Web Sieving started in September 1995. |
On April 10, 1996, we found that |
RSA-130 = 1807082088687404805951656164405905566278102516769401349170127021450056662540244048387341127590812303371781887966563182013214880557 has the following factorization: |
RSA-130 = 39685999459597454290161126162883786067576449112810064832555157243 * 45534498646735972188403686897274408864356301263205069600999044599 |
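The factorization quoted above can be checked directly with arbitrary precision arithmetic. The short sketch below multiplies the two reported factors with `java.math.BigInteger` and compares the product with RSA-130; the class and method names are illustrative, and the digits are copied from the text above.

```java
import java.math.BigInteger;

final class Rsa130Check {
    static final BigInteger N = new BigInteger(
        "1807082088687404805951656164405905566278102516769401349170127021450056662540244048387341127590812303371781887966563182013214880557");
    static final BigInteger P = new BigInteger(
        "39685999459597454290161126162883786067576449112810064832555157243");
    static final BigInteger Q = new BigInteger(
        "45534498646735972188403686897274408864356301263205069600999044599");

    /** True when the product of the two reported factors equals RSA-130. */
    static boolean factorizationHolds() {
        return P.multiply(Q).equals(N);
    }

    public static void main(String[] args) {
        System.out.println("P * Q == RSA-130: " + factorizationHolds());
    }
}
```

Verifying a factorization takes one multiplication; finding it is what required the distributed Web sieving effort.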
An example of Web-based Computing |
It lets researchers author tools and leave them on the machine of choice on the web |
It allows multiple databases to communicate with each other and with the functional operators that the software tools represent, and it makes a web browser the window into this system. |
Web Servers and HTTP are not as efficient as PVM/MPI daemons and their messaging but
|
Deploy Web technology first in education and in program development where high functionality of "Web Productivity Environment" is more important than performance |
Then run production in classic "bare-bones" HPCC environment |
PCRC embodies the Parallel Computing Synchronization and collective parallel algorithms and runtime that will enable efficient Web-based computing |
Replace user interface of HPF or HPC++ with the Web(work) and use pervasive Web Technologies in infrastructure (World Wide Virtual Machine -- WWVM) |
Internet is quite slow and getting slower but in fact many Web activities focus on IntraNets -- domain and perhaps geographically specialized hardware running pervasive Web Software
|
Superficially one can state goal as adding to the distributed computing model of the Web, the HPCC lessons and algorithms needed for high performance and tight synchronization of multiple servers and clients (Web is typically loose coarse grained coupling).
|
http://kestrel1.npac.syr.edu:6151/vpl/ (Kivanc Dincer) |
Allows one to specify a program from a Web Client, invoke the HPF Compiler, and execute on a chosen set of networked Workstations |
Implemented as a network of HTTPD Web Servers using CGI scripts which replace PVM daemons and invoke communication implemented by modifying PVM software |
Supports HPF and Global Arrays (Chemistry full matrix primitives developed at Pacific NorthWest Lab)
|
Will be used in the Virtual Workshop (Cornell) and in Fox's introductory computational science class this fall (CPS615)
|
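The idea of a Web server acting as a compute server can be sketched in a few lines of Java. This is not the system described above, which uses HTTPD servers with CGI scripts and modified PVM software; as a stand-in it uses the JDK's built-in `com.sun.net.httpserver.HttpServer`. The `/square` endpoint, the query format, and all class and method names are assumptions made for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class WebComputeSketch {
    /** Starts a compute service on a free local port; GET /square?x=N returns N*N as text. */
    static HttpServer startServer() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/square", exchange -> {
                String query = exchange.getRequestURI().getQuery();   // e.g. "x=12"
                long x = Long.parseLong(query.substring(query.indexOf('=') + 1));
                byte[] body = Long.toString(x * x).getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Client side: asks the server on the given port to square x. */
    static long remoteSquare(int port, long x) {
        String url = "http://127.0.0.1:" + port + "/square?x=" + x;
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                URI.create(url).toURL().openStream(), StandardCharsets.UTF_8))) {
            return Long.parseLong(in.readLine().trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts a server, sends it one request, and shuts it down again. */
    static long demo(long x) {
        HttpServer server = startServer();
        try {
            return remoteSquare(server.getAddress().getPort(), x);
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(12));
    }
}
```

A real deployment would put a substantial computation behind the endpoint and run one such server per node; the sketch only shows the request and response path.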
Have implemented a large number (16) of Java Applets interfacing to SDDF (Self-Defining Data Format), provided by the Pablo Performance Analysis Environment, developed at UIUC by CRPC Associate Dan Reed |
Running Node Program --> SDDF Performance Monitoring Data --> Web server |
which can be accessed by full set of Web Tools including
|
see: http://www.npac.syr.edu/users/dincer/pablo/ |
Will add Java "wrapper" to HPF data-structures so can use Java for scientific visualization of applications that run in HPF |
This illustrates how use of the rich Web information improves HPCC programming environment for easy linkage of databases and logging and display of scientific and performance visualization |
Java is a convenient User Interface builder which allows one to quickly develop customized Interfaces
|
This gives AVS and Khoros like environments |
As part of black hole Grand Challenge, we are designing an interface to adaptive mesh (AMR) "Problem Solving environment" |
From Gregor von Laszewski |
We can use Java as an interface to a Web-implemented simulation linking to either Server or Client |
As Java lies "inbetween" Fortran and C++, one can expect that data parallel Java can learn from corresponding HPF and HPC++ studies |
"Parallel Compiler Runtime Consortium" produced a very rough draft
|
Java does not support templates, so the STL approach of C++ is not so natural |
Need to recognize that the performance of Objects in Java is poorer than that of "simple types" |
Java spans high level interpreted objects to low level optimally compiled "simple types" |
We have proposed an approach which uses native classes for "compiler runtime" and follows an HPF style with an interpreted front-end like Matlab or APL |
e.g. A = HParray.matmul(B,C)
|
This leads again to Java wrappers invoked by HPF-style Java(Script) interpreter which interfaces to native HPF or other implementations.
|
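A sketch of what the `HParray.matmul(B,C)` call could look like from the Java side. Only the name `HParray.matmul` comes from the foils; the constructor, `of`, `get`, and the storage layout are hypothetical. In the proposed approach `matmul` would dispatch to a native HPF-style runtime, whereas here a plain Java loop stands in for it so that the example runs on its own.

```java
final class HParray {
    private final int rows;
    private final int cols;
    private final double[] data; // row-major "simple type" storage, no per-element objects

    HParray(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    /** Builds an array from a rectangular double[][] (hypothetical convenience method). */
    static HParray of(double[][] values) {
        HParray a = new HParray(values.length, values[0].length);
        for (int i = 0; i < a.rows; i++) {
            for (int j = 0; j < a.cols; j++) a.data[i * a.cols + j] = values[i][j];
        }
        return a;
    }

    double get(int i, int j) {
        return data[i * cols + j];
    }

    /** A = B * C. Stand-in for a call into a native data parallel runtime. */
    static HParray matmul(HParray b, HParray c) {
        if (b.cols != c.rows) {
            throw new IllegalArgumentException("inner dimensions differ");
        }
        HParray a = new HParray(b.rows, c.cols);
        for (int i = 0; i < b.rows; i++) {
            for (int k = 0; k < b.cols; k++) {
                double bik = b.data[i * b.cols + k];
                for (int j = 0; j < c.cols; j++) {
                    a.data[i * a.cols + j] += bik * c.data[k * c.cols + j];
                }
            }
        }
        return a;
    }

    public static void main(String[] args) {
        HParray B = HParray.of(new double[][]{{1, 2}, {3, 4}});
        HParray C = HParray.of(new double[][]{{5, 6}, {7, 8}});
        HParray A = HParray.matmul(B, C);
        System.out.println(A.get(0, 0) + " " + A.get(0, 1) + " " + A.get(1, 0) + " " + A.get(1, 1));
    }
}
```

The user-level code handles whole arrays as single objects, while the element loops stay inside the runtime on primitive `double` storage.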
Establish bottom-up constituency by teaching Java in Middle and High Schools |
Start working groups/meetings to study requirements and issues
|
Build Prototype Web Coarse Grain Computing Environments
|
Design and build "Java Wrappers" to both sequential and parallel Fortran77/90
|
Link above technologies in the WebWindows Programming Laboratory
|
Build Expertise/Infrastructure on High Performance Optimizing Java Compilers aiming at data structures of importance to Science! |
Collaborate with Web/Compiler groups in China .... |