Fox Presentation Spring 96
Computing in 2007: Future PetaFlop Architectures

Java as the Language for High Performance Computational Science and Simulation
Invited Presentation: International Conference on Parallel Computing, Minnesota, Oct 3-4, 96
http://www.npac.syr.edu/users/gcf/javaforcsefall96/index.html
Geoffrey Fox, Wojtek Furmanski
NPAC, 111 College Place, Syracuse NY 13244-4100

Abstract of Java for Computational Science
We describe some of the forces and issues which we suggest will lead to Java emerging as the dominant language for scientific and engineering computation.
One force is the new complex architectures expected for future high performance (petaflop) computers.
This implies that other aspects of the Web will become important; in particular, Web Servers will be used as a network (web) of compute servers, allowing powerful integration of data and compute services as a "server-server" infrastructure.
Some of this is a natural consequence of the WebWindows picture of future software infrastructure.
We use the "HPF on the Web" Programming Laboratory as an example.
We discuss both intrinsic reasons why Java is more attractive than Fortran77/90 for Computational Science (i.e. Scientific and Engineering Computation) and issues in extending Java to support both coordination and data parallelism (HPJava).

Classes of Simulations and their High Performance Needs
1) Classic solution of a large scale PDE or particle dynamics problem: data parallelism over grid points or particles.
2) Modest grain size functional parallelism, as seen in the overlap of communication and computation in a node process of a parallel implementation; more generally, the overlap of I/O (disk, visualization) and computation.
3) Object parallelism, as seen in Distributed Simulation where the "world" is modelled (typically by event driven simulation) as a set of interacting macroscopic (larger than grid points) objects. Objects are weapons, military units etc. in SIMNET/DSI (Forces Modelling).
4) MetaProblems consisting of several large grain, functionally distinct components such as Structural Analysis, Airflow, Manufacturing Process, Pricing, Controls etc. in the MDO approach to manufacturing and design; more generally, these are components of a Problem Solving Environment.
Java: 1) is not supported, 2) is the Thread mechanism, 3) is Java Objects or Applets, 4) is JavaBeans or equivalent.
Fortran: 1) is supported in HPF; 2) to 4) are not supported.

Some Critical Features of Java and Parallelism - I
First the caveat: it is possible that Java will not "make it", but current momentum is hard to derail!
Limbo (AT&T) and ActiveX (Microsoft) are possibilities.
If Java is not the Web language of the future, then whatever replaces it must be better, and our remarks should be applied to its replacement!
Note that it is not clear whether the built-in thread mechanism of Java should be used in high performance implementations or "just" viewed as critical in supporting modest grain size functional parallelism (item 2) within an application.
Threads could be used to support parallel implementations on shared memory machines.

Some Critical Features of Java and Parallelism - II
As we saw, large scale applications need many forms of parallelism, and it is neither needed nor appropriate to use the same mechanism for each form.
Coarse grain software integration or coordination (item 4) is naturally built into Java through the Applet mechanism and networking classes.
Data parallelism (item 1) is needed for "massive parallelism"; although it is not directly supported, we can do it by hand!
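Item 1 "by hand" can be sketched in plain Java. The block below is a minimal illustration under our own assumptions (the class JacobiThreads and its methods are hypothetical, not part of any Java or HPJava API): a Jacobi iteration for Laplace's equation in which the interior rows of a shared grid are block-decomposed across Java threads, as on a shared memory machine. Each sweep reads only the old grid and writes only the new one, so no locking is needed, and joining the threads serves as the barrier between sweeps. A Java+MP version would instead give each node its own block plus ghost rows exchanged with its neighbors by messages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Illustrative sketch: hand-coded data parallelism for the Jacobi iteration
 * of Laplace's equation. Interior rows of a shared grid are block-decomposed
 * across Java threads (shared memory model; no message passing).
 */
final class JacobiThreads {

    /** Runs `iterations` Jacobi sweeps with `nThreads` workers; `initial` is not modified. */
    static double[][] solve(double[][] initial, int iterations, int nThreads) {
        final int n = initial.length;
        final int m = initial[0].length;
        double[][] u = copy(initial);
        double[][] next = copy(initial);   // boundary values are never overwritten in either buffer
        final int interior = n - 2;

        for (int it = 0; it < iterations; it++) {
            final double[][] src = u;
            final double[][] dst = next;
            List<Thread> workers = new ArrayList<>();
            for (int t = 0; t < nThreads; t++) {
                // Block decomposition: thread t owns interior rows [lo, hi).
                final int lo = 1 + interior * t / nThreads;
                final int hi = 1 + interior * (t + 1) / nThreads;
                Thread w = new Thread(() -> {
                    for (int i = lo; i < hi; i++) {
                        for (int j = 1; j < m - 1; j++) {
                            // Five-point stencil: reads only src, writes only dst, so no locks.
                            dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j]
                                              + src[i][j - 1] + src[i][j + 1]);
                        }
                    }
                });
                workers.add(w);
                w.start();
            }
            joinAll(workers);   // acts as a barrier between sweeps
            u = dst;            // swap buffers: the new values become the source
            next = src;
        }
        return u;
    }

    private static void joinAll(List<Thread> workers) {
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }

    private static double[][] copy(double[][] a) {
        double[][] c = new double[a.length][];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i].clone();
        }
        return c;
    }

    public static void main(String[] args) {
        int n = 9;
        double[][] grid = new double[n][n];
        Arrays.fill(grid[0], 1.0);   // top edge held at 1, other edges at 0
        double[][] u = solve(grid, 500, 4);
        System.out.printf(Locale.ROOT, "u[center] = %.6f%n", u[n / 2][n / 2]);
        // prints: u[center] = 0.250000
    }
}
```

With the top edge held at 1 and the other three edges at 0 on a 9 x 9 grid, the converged center value is exactly 0.25 by symmetry (the four rotated problems sum to the all-ones solution), which gives a simple check; the result is also identical whatever number of threads is used.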
Thus Java needs (runtime and perhaps language) extensions to support HPF/HPC++-like data parallelism (a shared memory model for the programmer), but "Java plus message passing" is already here.
Most examples of Java+MP are in the information arena (this is how you build Java Collaboratories), but scientific examples are emerging.
We can do Java+MP for "Laplace Equation Jacobi Iteration", and this is how we (at Caltech) started hypercube work in 1981.
Note that Fortran or C plus message passing (PVM, MPI) has been the dominant implementation technology for data parallelism over the last ten years.

Some Critical Features of Java as a Programming Language
Java is likely to be a dominant language, as it will be learnt and used by a broad group of users.
We have taught 3 full courses and several tutorials.
It is popular because it is widely applicable (a growing number of APIs etc.) and one gets good graphics output easily. Further, one can use the Web to exchange the results of one's program with peers.
We expect it to be very effective in middle and high school programming.
Kids will come to university and jobs knowing and expecting to use Java.
They will not accept Fortran, as it is unfamiliar and less attractive.
They may accept C++ as a later, more complicated language.
The bottom up revolution!
Java may replace C++ as the major system building language.
Perhaps the greater functionality (e.g. pointers) of C++ is critical, although "WebWindows" favors Java, but this is not the topic today!

Comparison of Java and Fortran 77/90
Clearly Java can easily replace Fortran as a scientific computing language, as it can be compiled as efficiently and has much better software engineering (object) and graphics (Web) capabilities.
Fortran90 is object oriented, but it has a very small user base and it is not clear whether it will replace Fortran77.
Note that Fortran90 discussion started in 1978 (after Fortran77 was agreed) and took fourteen years, and even now Cray's Fortran77 compiler is (on the C90, for numerical relativity) much better than their Fortran90 compiler.
Originally Fortran90 (as Fortran8X) was designed precisely for Cray architecture systems!
This illustrates that informal standards activities (as in the Web and HPF) are most appropriate for rapidly changing technologies.
Java can unify classic science and engineering computations with the more qualitative, macroscopic "distributed simulation and modelling" arena, which is critical in the military and to some extent in industry.

Isn't the Web hardware and software too slow to be interesting for HPCC? Java - I
Java is currently semi-interpreted and (as in the Linpack online benchmark) is about 50 times slower than good C or Fortran: http://www.netlib.org/benchmark/linpackjava/
Java --> (javac) --> Downloadable Universal Bytecodes --> (Java Interpreter) --> Native Machine Code
Just in Time compilers speed this up by a factor of 10.
However, the language can be efficiently compiled with "native compilers":
Java --> (native compiler) --> Native (for Particular Machine) Code
There are lots of interesting compiler issues for both compiled and scripted Java.

Isn't the Web hardware and software too slow to be interesting for HPCC? Java - II
Applications require a range of capabilities in any language:
High level ("Problem Solving Environment"), manipulating "large" objects: semi-interpreted (Java Applet) or interpreted (improved JavaScript).
Intermediate level: compiled code targeted at a "sequential" (multi-threaded) architecture, i.e. existing native compiled Java using simple types (arrays) for the numerically intensive parts. Note that as there are no pointers and no overloading of basic operators, Java code should be very efficient.
Lower level: a runtime exploiting parallelism and memory hierarchies, where "hints" from higher level languages (in HPF style?) reference a highly functional, efficient runtime optimized for high performance architectures. This requires extensions to both the message passing and data parallel interfaces of whatever language one uses.

Isn't the Web hardware and software too slow to be interesting for HPCC? Java - III
One can use "native classes", which are just a predownloaded library of optimized runtime routines; these can be high performance compiled Java, C, C++, Fortran, HPF etc. modules invoked by interpreted or compiled Java.
This does NOT violate the Web philosophy in our opinion!
Use native classes selectively for compiler runtime, matrix primitives, image processing and other engineering/science libraries, PDE primitives such as mesh generators, and optimization as needed in resource management or applications.

Issues in Use of Web Servers as a Compute Net - I
In the "WebWindows" approach one naturally gets a Web Server and Client on every node; this is automatic in JavaOS (an NT/UNIX "replacement").
The Web is a "server-server" and not a "client-server" architecture.
Several emerging technologies:
  Jigsaw (30,000 line Java Server from MIT)
  Habanero and other Java Collaboration technologies
  JRI (Java Runtime Interface) from Netscape hides changes in the Java world
  Java IDL links to CORBA, and JDBC to (all) databases
  Java RMI (Remote Method Invocation) and Object Serialization are distributed computing technologies from JavaSoft
  JavaBeans is a coarse grain object (a potential basic dataflow module in distributed/parallel computing, supporting standardized input/output) interoperable with Visual Basic, Borland Delphi, OpenDoc, OLE, CORBA etc.

Issues in Use of Web Servers as a Compute Net - II
Build initial experiments conservatively, so that they are insensitive to the rapid evolution of the Web.
Note that Problem Solving Environments and "Forces Modelling" (human/instrument in the loop) applications require integration of computing and collaboration; Java Servers (merge Jigsaw and Habanero!) provide this.
Successful examples:
  RSA-130 Factoring on the Web (embarrassingly parallel) completed (NPAC, Boston, Bellcore)
  NEOS (Argonne) and Netsolve (Tennessee)
  NCSA Biology Workbench uses the classic CGI Web to integrate many biology simulation packages

Isn't the Web hardware and software too slow to be interesting for HPCC? - IV
Web Servers and HTTP are not as efficient as PVM/MPI daemons and their messaging, but the technology is rapidly changing: HTTP-NG and new Java Servers will improve performance and further allow customization of services to HPCC, with high performance when necessary.
Don't customize now, as Web technology is not yet stable enough!
Deploy Web technology first in education and in program development, where the high functionality of the "Web Productivity Environment" is more important than performance.
Then run production in the classic "bare-bones" HPCC environment.

Isn't the Web hardware and software too slow to be interesting for HPCC? - V
The Internet is quite slow and getting slower, but in fact many Web activities focus on IntraNets: domain and perhaps geographically specialized hardware running pervasive Web software.
vBNS and I-Way or ATM connected PC/Workstation clusters are our typical targets as HPCC IntraNets.
Superficially, one can state the goal as adding to the distributed computing model of the Web the HPCC lessons and algorithms needed for high performance and tight synchronization of multiple servers and clients (the Web is typically loose, coarse grained coupling).
This is worth doing, as the Web has excellent productivity software.

Let Us Examine Issues with an Example: "HPF on the Web" - I
http://kestrel1.npac.syr.edu:6151/vpl/ (Kivanc Dincer)
Allows one to specify a program from a Web Client, invoke the HPF Compiler and execute on a chosen set of networked Workstations.
Implemented as a network of HTTPD Web Servers using CGI scripts, which replace PVM daemons and invoke communication implemented by modifying PVM software.
Supports HPF and Global Arrays (Chemistry full matrix primitives developed at Pacific Northwest Lab).
Will support MPI and some of the fuller NWChem package.
Will be used in the Virtual Workshop (Cornell) and in Fox's introductory Computational Science class CPS615 this fall.
This is Web Programming Lab technology: by naturally linking in manuals and tutorial material, it is a WebWindows implementation of a Programming Environment.

Let Us Examine Issues with an Example: "HPF on the Web" - II
We have implemented a large number (16) of Java Applets interfacing to SDDF (Self-Defining Data Format), provided by the Pablo Performance Analysis Environment, developed at UIUC by CRPC Associate Dan Reed.
Running Node Program --> SDDF Performance Monitoring Data --> Web server, which can be accessed by the full set of Web tools, including Java Applet (real-time or batch) displays.
Store SDDF data in Web-linked databases; see http://www.npac.syr.edu/users/dincer/pablo/
Will add a Java "wrapper" to HPF data structures, so that Java can be used for scientific visualization of applications that run in HPF.
This illustrates how use of the rich Web information improves the HPCC programming environment, with easy linkage of databases and logging and display of scientific and performance visualization.

Network of Web Servers and Clients
We can use Java as an interface to a Web-implemented simulation, linking to either Server or Client.

Applications of Java for Visualization/GUI Builder
Java is a convenient User Interface builder, which allows one to develop customized interfaces quickly.
See screendumps of a distributed computing environment built for NASA 4D data assimilation.
It allows mapping and linkage of programs, datasets and machines together; this gives AVS and Khoros like environments.
As part of the black hole Grand Challenge, we are designing an interface to an adaptive mesh (AMR) "Problem Solving Environment".

Remarks on HPJava: Data Parallel Java - I
As Java lies "in between" Fortran and C++, one can expect that data parallel Java can learn from the corresponding HPF and HPC++ studies.
The "Parallel Compiler Runtime Consortium" produced a very rough draft: http://www.npac.syr.edu/users/gcf/hpjava3.html
Java does not support templates, so the STL approach of C++ is not so natural.
We need to recognize that the performance of Objects in Java is poorer than that of "simple types".
Java spans high level interpreted objects to low level, optimally compiled "simple types".

Remarks on HPJava: Data Parallel Java - II
We have proposed an approach which uses native classes for the "compiler runtime" and follows an HPF style, with an interpreted front-end like Matlab or APL, or a "host" programming model as in *LISP on the CM-2, e.g.
A = HParray.matmul(B,C);
Technically this generalizes the HPF interpreter we prototyped in 1993.
Interpreters and objects are great as long as they are "coarse-grain", i.e. operate on arrays, not array elements.
This leads again to Java wrappers invoked by an HPF-style Java(Script) interpreter which interfaces to native HPF or other implementations, e.g. access elements of the HPF array Ahpf from Java with a wrapper object A:
HParray A = new HParrayConstructor("Ahpf");
A.grabelement(1,100);

Suggested Action Items at NPAC
Establish a bottom-up constituency by teaching Java in Middle and High Schools.
Start working groups/meetings to study requirements and issues; December 96 inaugural meeting at NPAC.
Build prototype Web coarse grain computing environments:
  WebFlow (Furmanski and Hariri)
  MetaWeb (Baker and Lifka, Cornell)
Design and build "Java Wrappers" to both sequential and parallel Fortran77/90, and link the wrapper classes to good (Java) scientific display (plot) packages.
Link the above technologies in the WebWindows Programming Laboratory.
Add Pablo and science visualization to the HPF on the Web Virtual Programming Laboratory.
Build expertise/infrastructure on high performance optimizing Java compilers aimed at data structures of importance to science!
Collaborate with Web/Compiler groups in China ....

Workshop on Java for Computational Science and Engineering Simulation and Modelling
December 16-17, 1996, Syracuse, New York
Geoffrey Fox
NPAC, 111 College Place, Syracuse NY 13244-4100

Some Motivations
Java will become a very popular and perhaps dominant language for scientific and engineering computation.
This will be true in both parallel and conventional environments.
Web technology will be the basis of many software systems (WebWindows), and in particular of scientific computing environments.
This encompasses various aspects of computation:
  "Integration": distributed metacomputing, wrappers and user interfaces
  More tightly coupled computing

Some Deductions
Some of us should focus attention on Java and not worry so much about Fortran and C++, even though these will be dominant in the immediate future.
Technical computing has always suffered from being too small to attract substantial industry attention; Sun and others will focus on more lucrative Java applications.
The technical community needs to ensure that its concerns/requirements for Java language and environments are heard.
We need to establish a framework in which research in such a new area will be funded by the federal government.
Make the area respectable and not "chasing fads".

Some Action Items
An initial attempt at a white paper in May 96 was of marginal success, as there was no agreement among participants as to the role of Java.
The first white paper focused on parallel Java; this workshop focuses on Java for technical computing in all aspects, including parallelism as needed for high performance.
A Birds of a Feather session at Supercomputing 96 saw agreement on the new theme.
Mailing list java-for-cse@npac.syr.edu; enroll at http://www.npac.syr.edu/projects/javaforcse
This workshop is to air technical issues and publish them as a special issue of the journal "Concurrency: Practice and Experience" (edited by Fox).
We need to interact with the Java juggernaut, with a White Paper on Java for technical computing and in other ways!

Approaches to Parallel Java - SPMD Model
i.e. the user writes a Node Program.
MPI (or equivalent message passing) is done either as "pure Java" or as a native class interface.
Threads allow overlap of communication and computation.
Higher level libraries such as those of DAGH (Adaptive Mesh Support) or PCRC (Compiler Runtime).
Build in capabilities with classes designed for "ghost region" support etc.

Approaches to Parallel Java - High Level - I
The Parallel C++ approach (Standard Template Libraries etc.) does not work, as Java cannot overload operators.
We could copy the HPF directive approach, but as this requires major compiler development, it does not seem appropriate in the near future.
Approaches that need only a simple preprocessor are probably acceptable.
The parallel Fortran 77 approach is easier, with identification of loop level parallelism.
In particular, one can use this with Java threads running on an SMP as target, i.e. use the Java runtime to get parallelism automatically if we spawn appropriate threads.
This work can be done on .class (bytecode) or .java (Java language) files.

Approaches to Parallel Java - High Level - II
An interpreted but limited (in functionality) Java client interface to Java-wrapped HPF/C++ (not necessarily, and perhaps best not, parallel Java).
Do visualization and simple data analysis support first.
Note that we avoid many difficulties but lose elegance, as we exchange information between the Host and the running parallel code using "text strings".
Host and parallel node "synchronize" object references by registering names with the communication broker.

More on Interpreted Java Front Ends
This does not necessarily require Java native class linkage to Fortran and C; rather, one just needs to be able to send messages between running programs.
Preprocessors can make this more "automatic".
Such registration is familiar in CORBA and in many visualization systems such as AVS.
Remote Method Invocation (RMI) and Object Serialization in Java 1.1 are important in "native" Java solutions.
More generally, we should study the link between interpreted and compiled environments.
The increasing performance of computers implies that interpreters are getting more attractive.
We need an interpreted Java; JavaScript is interpreted but in a limited domain.

Decomposition Versus Integration
One can identify both decomposition and integration as key parts of parallel (high performance) computing.
Thus in HPF, we have distribute to address decomposition, and the compiler uses MPI or equivalent to integrate.
Java brings objects and threads to help decomposition.
Java servers and applets really address integration, and the greatest power of the Web is in integration, not decomposition.
Most Web computing research focuses implicitly on integration.

Approaches to Parallel Java - High Level - III
WebFlow approach with Java Servers supporting metacomputing.
Note this is coarse grain software integration, NOT decomposition.
You decompose the problem implicitly as you start bottom-up to build Web aware modules, and link (integrate/coordinate) them with Java Servers.
Java suggests new approaches to distributed Event Driven Simulations: Java Objects or JavaBeans provide decomposition, and Java Servers provide integration.
As usual, most things work for "embarrassingly parallel" problems, where integration and decomposition coincide.
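The remark on distributed Event Driven Simulation can be illustrated, in a deliberately sequential form, by the sketch below. It is a hypothetical example (EventSim, Entity and Unit are our names, not part of WebFlow, JavaBeans or any JavaSoft API): each simulated object implements a small Entity interface, which is the decomposition, while a single time-ordered priority queue performs the integration. In a distributed version the entities would live on different Java Servers and the single queue would be replaced by a synchronization protocol (conservative or optimistic) between per-server queues, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Illustrative sketch of a sequential discrete event simulation.
 * Entities (objects) supply the decomposition; the time-ordered event
 * queue is the single point of integration. All names are hypothetical.
 */
final class EventSim {

    /** Anything that can receive events. */
    interface Entity {
        void handle(Event e, EventSim sim);
    }

    /** A timestamped message addressed to one entity. */
    static final class Event {
        final double time;
        final long seq;        // tie-breaker: equal-time events run in scheduling order
        final Entity target;
        final String message;

        Event(double time, long seq, Entity target, String message) {
            this.time = time;
            this.seq = seq;
            this.target = target;
            this.message = message;
        }
    }

    private final PriorityQueue<Event> queue = new PriorityQueue<>(
            Comparator.comparingDouble((Event e) -> e.time).thenComparingLong(e -> e.seq));
    private long nextSeq = 0;
    private double now = 0.0;

    double now() {
        return now;
    }

    /** Schedules a message for `target` at simulated time now + delay. */
    void schedule(double delay, Entity target, String message) {
        if (delay < 0) {
            throw new IllegalArgumentException("events cannot be scheduled in the past");
        }
        queue.add(new Event(now + delay, nextSeq++, target, message));
    }

    /** Processes events in timestamp order up to endTime; returns how many were processed. */
    int run(double endTime) {
        int processed = 0;
        while (!queue.isEmpty() && queue.peek().time <= endTime) {
            Event e = queue.poll();
            now = e.time;            // advance the simulation clock
            e.target.handle(e, this);
            processed++;
        }
        return processed;
    }

    /** Example entity: logs each event and replies to its peer after a fixed delay. */
    static final class Unit implements Entity {
        final String name;
        final double replyDelay;
        Unit peer;
        final List<String> log = new ArrayList<>();

        Unit(String name, double replyDelay) {
            this.name = name;
            this.replyDelay = replyDelay;
        }

        @Override
        public void handle(Event e, EventSim sim) {
            log.add(sim.now() + ":" + e.message);
            sim.schedule(replyDelay, peer, "from " + name);
        }
    }

    public static void main(String[] args) {
        EventSim sim = new EventSim();
        Unit a = new Unit("a", 1.0);
        Unit b = new Unit("b", 2.0);
        a.peer = b;
        b.peer = a;
        sim.schedule(0.0, a, "start");
        int n = sim.run(10.0);
        System.out.println(n + " events, a saw " + a.log);
        // prints: 8 events, a saw [0.0:start, 3.0:from b, 6.0:from b, 9.0:from b]
    }
}
```

Running main exchanges messages between two units with reply delays 1.0 and 2.0 up to simulated time 10.0; the sequence numbers make the ordering of simultaneous events deterministic.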