High Performance Commodity Computing

Geoffrey C. Fox and Wojtek Furmanski
Northeast Parallel Architectures Center, 111 College Place, Syracuse University, Syracuse NY
gcf, furm@npac.syr.edu

1. Introduction

In this paper, we describe an approach to high performance computing which makes extensive use of commodity technologies. In particular, we exploit web technology, distributed objects and Java. The use of commodity hardware (workstation and PC based MPPs) and operating systems (UNIX, Linux and Windows NT) is relatively well established. We propose extending this strategy to the programming and runtime environments supporting developers and users of both parallel computers and large scale distributed systems. We suggest that this will allow one to build systems that combine the functionality and attractive user environments of modern enterprise systems with delivery of high performance in those application components that need it. Critical to our strategy is the observation that HPCC applications are very complex but typically require high performance only in parts of the problem. These parts dominate when measured in compute cycles or data points, but are often a modest part of the problem when measured in lines of code or other measures of implementation effort. Thus, rather than building such systems heroically from scratch, we suggest starting with a modest-performance but user-friendly system and then selectively enhancing performance where needed.

In section 2, we describe key relevant concepts that are emerging in the innovative technology cauldron induced by the merger of multiple approaches to distributed objects and web system technologies. This cauldron is largely fueled by the development of corporate Intranets and broad-based Internet applications, including electronic commerce and multimedia.
We define the "Pragmatic Object Web" approach, which recognizes that there is not a single "best" approach but several co-existing technology bundles within an object based web. In particular, CORBA (Corporate Coalition), COM (Microsoft), Javabeans/RMI (100% pure Java), and XML/WOM/DOM (from the World Wide Web Consortium) have different tradeoffs. One can crudely characterize them as the most general, the highest performance, the most elegant, and the simplest distributed object models, respectively. This merger of web and distributed object capabilities is creating a remarkably powerful distributed systems architecture. However, the multiple standards -- each with critical capabilities -- imply that one cannot choose a single approach but rather must pragmatically pick and choose from diverse interoperating systems.

Another key community concept is that of a multi-tier enterprise system, where one no longer expects a simple client-server system. Rather, clients interact with a distributed system of servers, from which information is created by the interactions of modular services, such as the access to and filtering of data from a database. In the Intranet of a modern corporation, these multiple servers reflect both the diverse functionality and the geographical distribution of the components of the business information ecosystem. It has been estimated that typical Intranets support around 50 distinct applications, whose integration is an area of great current interest. This middle tier of distributed linked servers and services poses new operating system challenges and is the current realization of the WebWindows concept we described in an earlier RCI article. Note that Java is often the language of choice for building this tier, but the object model and communication protocols can reflect any of the different standards: CORBA, COM, Java or WOM. The linked continuum of servers shown in Fig. 1 reflects the powerful distributed information integration capabilities of the Pragmatic Object Web.

Fig. 1: Continuum tier multi-server model

In section 3, we present our basic approach to achieving high performance within the Pragmatic Object Web (POW) multi-tier model. We make the simple observation that high performance is not always needed; rather, one needs hybrid systems combining the modest-performance, high-functionality components of the commodity Intranet with selected high performance enhancements. We suggest a multi-tier approach exploiting the separation between invocation and implementation of a data transfer, a separation that is in fact natural in modern publish-subscribe messaging models. We illustrate this HPcc -- High Performance commodity computing -- approach with a Quantum Monte Carlo application integrating NPAC's WebFlow client and middle tier technology with Globus as the back end high performance subsystem.

In the following sections, we refine these basic ideas. In section 4, we describe the natural POW building block JWORB -- a server written in Java which supports all four object models. In particular, JWORB builds its Web Services in terms of basic CORBA capabilities. We present performance measurements which emphasize the need to enhance the commodity tier when high performance messaging is needed. We discuss how JWORB allows us to develop HPCC componentware by using the Javabean visual-computing model on top of this infrastructure. We describe application integration and multidisciplinary problems in this framework.

In section 5, we discuss the new DMSO (Defense Modeling and Simulation Office) HLA (High Level Architecture) and RTI (Run Time Infrastructure) standards. These can naturally be incorporated into HPcc, giving the WebRTI runtime and the WebHLA distributed object model. These concepts suggest a novel approach to general metacomputing, built in terms of a coarse grain, event based runtime to manage and schedule resources.
This defines an HPcc "virtual machine" which is used to define the coarse grain distributed structure of applications. Finally, in section 6, we present Java Grande: the application of web technologies to the finer grain aspects of scientific computing, including the use of the Java language to express sequential and parallel scientific kernels.

2. Pragmatic Object Web and Commodity Systems

DcciS - Distributed commodity computing and information System
Why use Java for Servers
Multi tier extending to continuum of servers (server on client)
COM CORBA RMI XML
Java servlet versus CGI
Define Java distributed object model with standard name
Discuss Enterprise Javabeans and approach to object data bases in software

3. Hybrid High Performance Systems

Publish Subscribe Model and "complicated picture"
Multi disciplinary example
Globus/WebFlow/Nanomaterials example
Database and NT support natural in HPcc

4. Servers, Distributed Objects and HPCC Componentware

JWORB architecture and performance -- discuss "ultimate" and even this too small
Compare COM performance
Haupt CORBAization projects
Application Integration and Multidisciplinary problems from distributed object point of view (Chapter 3 discussed high performance issues)
Javabeans and componentware

5. WebHLA Commodity Modeling and Simulation Environment

HLA and RTI described
WebHLA and JWORB Management Model
HLA resource management
Java framework for computing -- seamless computing
Metacomputing based on WebHLA for coarse grain components
SBA -- Simulation based Acquisition as HLA "multidisciplinary" application
Revisit WebFlow in terms of SBA

6. Java Grande

JavaMPI
HPJava
Java Grande Numerics Issues
Compiler issues
Java Grande RMI/scaling
Java VM
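
Section 6 concerns the use of the Java language to express sequential and parallel scientific kernels. As a rough illustration only, the sketch below shows what such a pair of kernels might look like for a dot product, using nothing beyond the standard java.util.concurrent library. It does not use JavaMPI or HPJava, whose interfaces are not described here; the class name ParallelKernelSketch, the method names, and the block decomposition are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; not part of JavaMPI, HPJava, or any Java Grande API.
final class ParallelKernelSketch {

    // Sequential kernel: dot product over the half-open index range [from, to).
    static double dotRange(double[] a, double[] b, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Parallel kernel: split the indices into contiguous blocks, run the
    // sequential kernel on each block as a separate task, then add the
    // partial sums in block order.
    static double parallelDot(double[] a, double[] b, int workers) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be at least 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int n = a.length;
            int block = (n + workers - 1) / workers; // ceiling of n / workers
            List<Future<Double>> partials = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(n, w * block);
                final int to = Math.min(n, from + block);
                Callable<Double> task = () -> dotRange(a, b, from, to);
                partials.add(pool.submit(task));
            }
            double total = 0.0;
            for (Future<Double> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for partial sums", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int n = 1000;
        double[] a = new double[n];
        double[] b = new double[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2.0;
        }
        System.out.println(parallelDot(a, b, 4));
    }
}
```

The parallel version reuses the sequential kernel unchanged and adds only the decomposition and the final reduction. This loosely mirrors the paper's strategy of starting from a simple implementation and enhancing performance only in the part that needs it. In general, the floating-point result of the parallel version may differ slightly from the sequential one, because the additions are grouped differently.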