Fox Presentation Spring 96

Emerging Network Technologies for Scientific Computing
  CRPC Review Meeting, August 16 1996
  http://www.npac.syr.edu/users/gcf/crpcnetcomp2/index.html
  Geoffrey Fox, Wojtek Furmanski
  NPAC, 111 College Place, Syracuse NY 13244-4100

Abstract of Emerging Network Technologies for Scientific Computing
  We describe some of the forces driving the Web and its technologies of relevance to large scale distributed metacomputing
  We focus on two areas in this talk
    Role of Web Servers in forming a network (web) of computer servers which allow powerful integration of data and compute services as a "server-server" infrastructure
      We take the "HPF on the Web" Programming Laboratory as an example
    Issues in extending Java to support both coordination and data parallelism (HPJava)

CRPC Context for Presentation
  This is the second in a set of five talks that roughly correspond to layers in an integrated environment for high performance computing in a networked (meta)computing environment
    Ian Foster: Middleware enabling Wide Scale high performance communication -- Globus, Nexus -- and application motivation
    Geoffrey Fox: Role of Web Technology in Scientific Computing and Data Processing for Middleware and Programming
    Ken Kennedy, Jack Dongarra and Mani Chandy describe higher level Compilation and technology for Domain Specific Problem Solving Environments
  Much of the Syracuse work is in collaboration with CRPC Associate Marina Chen at Boston University

Some Critical Features of Java and Parallelism
  Large Scale Applications (as discussed by Foster) need many forms of parallelism
  Coarse Grain Software Integration or Coordination
    Naturally built into Java through the Applet mechanism and networking classes
  Data Parallelism -- needed for "massive parallelism"
    Java needs (runtime and perhaps language) extension to support HPF/HPC++ like data parallelism, but Foster's talk has shown that "Java plus message passing" is already here
    Note that Fortran or C plus message passing (PVM, MPI) is the dominant
      implementation technology for data parallelism over the last ten years
  It is possible that Java will not "make it" but current momentum is hard to derail!
    Limbo (AT&T) and ActiveX (Microsoft) are possibilities
  If Java is not the web language of the future, then whatever replaces it must be better and our remarks should be applied to its replacement!

Some Critical Features of Java and Scientific Computing
  Java is likely to be a dominant language as it will be learnt and used by a broad group of users
    We have taught 3 full courses and several tutorials
    Popular as it is widely applicable (growing number of APIs etc.) and one gets good graphics output easily
    Expect it to be very effective in middle and high school programming
    Kids will come to University and jobs knowing and expecting to use Java
    The bottom up revolution!
  Clearly Java can easily replace Fortran as a Scientific Computing Language as it can be compiled as efficiently and has much better software engineering (object) and graphics (web) capabilities
  Java may replace C++ as the major system building language
    Perhaps the greater functionality (e.g. pointers) of C++ is critical

Isn't the Web hardware and software too slow to be interesting for HPCC? - I
  Java is currently semi-interpreted and (as in the Linpack online benchmark) is about 50 times slower than good C or Fortran
    http://www.netlib.org/benchmark/linpackjava/
    Java --> (javac) --> Downloadable Universal Bytecodes --> (Java Interpreter) --> Native Machine Code
  Just in Time Compilers speed this up by a factor of 10
  However the language can be efficiently compiled with "native compilers"
    Java --> (native compiler) --> Native (for Particular Machine) Code
  Lots of interesting compiler issues for both compiled and scripted Java

Isn't the Web hardware and software too slow to be interesting for HPCC? - II
  One can use "native classes", which are just a predownloaded library of optimized runtime routines which can be high performance compiled Java, C, C++, Fortran, HPF etc.
    modules invoked by interpreted or compiled Java
  This does NOT violate Web Philosophy in our opinion!
  Use Native Classes selectively for Compiler Runtime, Matrix Primitives, Image Processing and other engineering/science libraries, PDE primitives such as mesh generators, and optimization as needed in resource management or applications

Isn't the Web hardware and software too slow to be interesting for HPCC? - III
  Web Servers and HTTP are not as efficient as PVM/MPI daemons and their messaging, but the technology is rapidly changing -- HTTP-NG and new Java Servers will improve and further allow customization of services to HPCC with high performance when necessary
  Don't customize now as Web Technology is not stable enough yet!
  Deploy Web technology first in education and in program development, where the high functionality of the "Web Productivity Environment" is more important than performance
  Then run production in the classic "bare-bones" HPCC environment

Isn't the Web hardware and software too slow to be interesting for HPCC? - IV
  The Internet is quite slow and getting slower, but in fact many Web activities focus on IntraNets -- domain and perhaps geographically specialized hardware running pervasive Web Software
    vBNS and I-Way or ATM connected PC/Workstation clusters are our typical targets as HPCC IntraNets
  Superficially one can state the goal as adding to the distributed computing model of the Web the HPCC lessons and algorithms needed for high performance and tight synchronization of multiple servers and clients (the Web is typically loose coarse grained coupling).
  This is worth doing as the Web has excellent productivity software

Let us Examine Issues with an Example -- "HPF on the Web" - I
  http://kestrel1.npac.syr.edu:3000/hpf-demo/ (Kivanc Dincer)
  Allows one to specify a program from a Web Client, invoke the HPF Compiler and execute on a chosen set of networked Workstations
  Implemented as a network of HTTPD Web Servers using CGI scripts which replace PVM daemons and invoke communication implemented by modifying PVM software
  Supports HPF and Global Arrays (Chemistry full matrix primitives developed at Pacific Northwest Lab)
    Will support MPI and some of the fuller NWChem package
  Will be used in the Virtual Workshop (Cornell) and Fox's introductory computational science class CPS615 this fall
  This is Web Programming Lab Technology
    Naturally link in manuals and tutorial material

Let us Examine Issues with an Example -- "HPF on the Web" - II
  Have implemented a large number (16) of Java Applets interfacing to SDDF (Self-Defining Data Format), provided by the Pablo Performance Analysis Environment, developed at UIUC by CRPC Associate Dan Reed
  Running Node Program --> SDDF Performance Monitoring Data --> Web server which can be accessed by the full set of Web Tools including Java Applet (Real-Time or Batch) Displays
  Store SDDF data in Web-linked databases
    see: http://www.npac.syr.edu/users/dincer/pablo/
  Will add a Java "wrapper" to HPF data-structures so one can use Java for scientific visualization of applications that run in HPF
  This illustrates how use of the rich Web information improves the HPCC programming environment, with easy linkage of databases and logging and display of scientific and performance visualization

Issues in Use of Web Servers as a Compute Net - I
  In the "WebWindows" Approach one naturally gets a Web Server and Client on every node
    Automatic in JavaOS (NT/UNIX "replacement")
    Web is a "server-server" and not a "client-server" architecture
  Several emerging technologies
    Jigsaw (30,000 line Java Server from MIT)
    Habanero and other Java Collaboration
      technologies
    JRI (Java Runtime Interface) from Netscape hides changes in Java World
    Java IDL links to CORBA and JDBC to (all) databases
    Java RMI -- Remote Method Invocation and Object Serialization are distributed computing technologies from JavaSoft
    JavaBeans is a coarse grain object (potential basic dataflow module in distributed/parallel computing supporting standardized input/output) interoperable with Visual Basic, Borland Delphi, OpenDoc, OLE, CORBA etc.

Issues in Use of Web Servers as a Compute Net - II
  Build initial experiments conservatively so they are insensitive to rapid evolution of the Web
  Note Problem Solving Environments and "Forces Modelling" (Human/Instrument in the loop) applications require integration of computing and collaboration
    Java Servers (merge Jigsaw and Habanero!) provide this
  Successful examples:
    RSA-130 Factoring on the Web (embarrassingly parallel) completed (NPAC, Boston, BellCore)
    NEOS (Argonne) and Netsolve (Tennessee)
    NCSA Biology Workbench uses classic CGI Web to integrate many biology simulation packages

Applications of Java for Visualization/GUI Builder
  Java is a convenient User Interface builder which allows one to quickly develop customized interfaces
  See Screendumps of a distributed computing environment built for NASA 4D data assimilation
    Allows mapping and linkage of programs, datasets and machines together
    This gives AVS and Khoros like environments
  As part of the black hole Grand Challenge, we are designing an interface to an adaptive mesh (AMR) "Problem Solving Environment"

Remarks on HPJava -- Data Parallel Java - I
  As Java lies "in between" Fortran and C++, one can expect that data parallel Java can learn from the corresponding HPF and HPC++ studies
  "Parallel Compiler Runtime Consortium" produced a very rough draft
    http://www.npac.syr.edu/users/gcf/hpjava3.html
  Java does not support templates and the STL approach of C++ is not so natural
  Need to recognize that performance of Objects in Java is poorer than that of "simple types"
  Java spans high level
    interpreted objects to low level optimally compiled "simple types"

Remarks on HPJava -- Data Parallel Java - II
  We have proposed an approach which uses native classes for "compiler runtime" and follows an HPF style with an interpreted front-end like Matlab or APL
    e.g. A = HParray.matmul(B,C)
    Technically generalizes the HPF Interpreter we prototyped in 1993
  Interpreters and objects are great as long as they are "coarse-grain", i.e. arrays not array-elements
  This leads to Java wrappers invoked by an HPF-style Java(Script) interpreter which interfaces to native HPF or other implementations
    e.g. access HPF array Ahpf elements from Java with wrapper object A
      HParray A = new HParrayConstructor("Ahpf");
      A.grabelement(1,100)
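
The coarse-grain wrapper idea above can be illustrated with a minimal sketch. It is not the HPJava design itself: the class name HParray and the matmul and element-access calls echo the slide, but register, wrap and putElement are hypothetical helpers introduced here, and plain Java arrays stand in for the native HPF runtime that a real wrapper would call. The point it shows is that one interpreted call (matmul) covers a whole loop nest, while per-element access is available but fine-grain.

```java
/**
 * Hypothetical coarse-grain array wrapper (illustrative names only).
 * A real version would forward each call to native HPF runtime code;
 * here plain Java arrays stand in for that runtime (assumption).
 */
final class HParray {
    // Assumed stand-in for arrays owned by the native side, keyed by HPF name.
    private static final java.util.Map<String, HParray> NATIVE_ARRAYS =
            new java.util.HashMap<String, HParray>();

    private final int rows;
    private final int cols;
    private final double[] data; // row-major storage

    HParray(int rows, int cols) {
        if (rows <= 0 || cols <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    /** Publish an array under an HPF-side name (simulates the native program owning it). */
    static void register(String hpfName, HParray array) {
        NATIVE_ARRAYS.put(hpfName, array);
    }

    /** Obtain the wrapper for a named HPF array, as in the "Ahpf" example. */
    static HParray wrap(String hpfName) {
        HParray a = NATIVE_ARRAYS.get(hpfName);
        if (a == null) {
            throw new IllegalArgumentException("unknown HPF array: " + hpfName);
        }
        return a;
    }

    /** Fine-grain element read with 1-based (Fortran-style) indices. */
    double grabElement(int i, int j) {
        return data[offset(i, j)];
    }

    /** Fine-grain element write with 1-based (Fortran-style) indices. */
    void putElement(int i, int j, double value) {
        data[offset(i, j)] = value;
    }

    /** Coarse-grain operation: a single call performs the whole matrix product. */
    static HParray matmul(HParray b, HParray c) {
        if (b.cols != c.rows) {
            throw new IllegalArgumentException("shape mismatch");
        }
        HParray a = new HParray(b.rows, c.cols);
        for (int i = 0; i < b.rows; i++) {
            for (int k = 0; k < b.cols; k++) {
                double bik = b.data[i * b.cols + k];
                for (int j = 0; j < c.cols; j++) {
                    a.data[i * c.cols + j] += bik * c.data[k * c.cols + j];
                }
            }
        }
        return a;
    }

    // Convert 1-based (i, j) to a row-major offset, checking bounds.
    private int offset(int i, int j) {
        if (i < 1 || i > rows || j < 1 || j > cols) {
            throw new IndexOutOfBoundsException("(" + i + "," + j + ")");
        }
        return (i - 1) * cols + (j - 1);
    }
}
```

In this sketch a caller would fill B and C with putElement, compute A with HParray.matmul(B, C), publish it with HParray.register("Ahpf", A), and later read single elements through HParray.wrap("Ahpf").grabElement(i, j). Keeping the loop nest inside one method is what would let an interpreted front-end stay cheap: the interpreter pays its overhead once per array operation rather than once per element.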