HPcc as High Performance Commodity Computing

Geoffrey C. Fox, Wojtek Furmanski
gcf@npac.syr.edu, furm@npac.syr.edu
World Wide Web: http://www.npac.syr.edu

Abstract: We review the growing power and capability of commodity computing and communication technologies driven by the World Wide Web's distributed information model. We use a three tier model with largely independent clients connected to a distributed network of servers. These host various services including object and relational databases and, of course, parallel and sequential computing. High performance requires concurrency both at the middle server tier and in the back end services. This approach can be applied both to metacomputing and to providing improved parallel programming environments. We give several examples: web-based user interfaces; the use of Java as a scientific computing language; integration of interpreted and compiled environments; visual component based programming; distributed simulation; and the role of databases, CORBA and collaborative systems. We suggest the establishment of a set of frameworks that will provide the necessary standard interfaces between the users and HPcc services.

Figure 1: Industry View of Enterprise Computing

Three Tier High Performance Commodity Computing

We start with a modern industry view of commodity computing with the three tiers shown in fig 1. Here we have customizable client and middle tier systems accessing "traditional" back end services such as relational and object databases. A set of standard interfaces allows a rich set of custom applications to be built with appropriate client and middleware software. As indicated in the figure, both of these layers can use web technology such as Java and JavaBeans, distributed objects with CORBA, and standard interfaces such as JDBC (Java Database Connectivity). There are of course no rigid solutions, and one can get "traditional" client server solutions by collapsing two of the layers together.
For instance, with database access, one gets a two tier solution either by incorporating custom code into the "thick" client or, in analogy to Oracle's PL/SQL, by compiling the customized database access code for better performance and incorporating it into the back end server. The latter, like the general three tier solution, supports "thin" clients such as the currently popular network computer.

We wish to use this architecture as the basis of our HPcc environment. We need to incorporate (essentially) all services of the commodity web in a natural way and to use its protocols and standards wherever possible. This requires us to enhance certain services to get higher performance and to incorporate new capabilities such as high-end visualization (e.g. CAVEs) or massively parallel systems where needed. If successful, we will build an HPcc environment that offers the evolving functionality of commodity systems without significant re-engineering as advances in hardware and software lead to new and better commodity products.

HPcc, shown in fig 2, extends fig 1 in two natural ways. First, the middle tier is promoted to a distributed network of servers; in the "purest" model these are Java Web Servers, but obviously any protocol compatible server is possible. This middle tier extension provides networked servers for many different capabilities (increasing functionality) and also multiple servers to increase performance on a given service. The use of high functionality but modest performance communication protocols and interfaces at the middle tier limits the performance levels that can be reached in this fashion. However, this first step gives a scaling, if necessary parallel (in terms of multiple servers), HPcc implementation that includes all commodity services such as databases, object services, transaction processing and collaboratories. The next step is applied only to those services with insufficient performance.
Naively we "just" replace an existing back end (third tier) implementation of a commodity service by its natural HPcc version. Sequential or socket based messaging distributed simulations are replaced by MPI (or equivalent) implementations on low latency, high bandwidth dedicated parallel machines. These could be specialized architectures or "just" clusters of workstations.

Note that with the right high performance software and network connectivity, workstations can be used at tier three, just as the popular "LAN consolidation" use of parallel machines like the IBM SP-2 corresponds to using these in the middle tier. Further, a "middle tier" compute or database server could of course deliver its services using the same machine as the server or a different one. These caveats illustrate that, like many concepts, the clean architecture of fig 2 will at times become confused. In particular, the physical realization does not necessarily reflect the logical architecture shown in fig 2.

Note that we have dropped the "communications" referred to in the classic HPCC acronym. This is not because it is unimportant but rather because a commodity approach to high performance networking is already being adopted. Here we focus on high level services such as programming, data access and visualization, which we abstract to the rather wishy-washy "computing" in the HPcc acronym.

Figure 2: HPcc Architecture
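
The second extension described above, in which a back end implementation is replaced while the middle tier and clients stay unchanged, can be illustrated with a minimal Java sketch. This is not the authors' implementation. All names here (ComputeService, SequentialService, ParallelService, MiddleTier, HpccSketch) are hypothetical, the middle tier is reduced to an in-process registry rather than a network of web servers, and a Java parallel stream stands in for an MPI implementation on a dedicated parallel machine.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.stream.IntStream;

/** Hypothetical tier three interface seen by the middle tier. */
interface ComputeService {
    double sumOfSquares(int n);
}

/** Plain sequential back end: the "commodity" implementation. */
class SequentialService implements ComputeService {
    public double sumOfSquares(int n) {
        double total = 0.0;
        for (int i = 1; i <= n; i++) {
            total += (double) i * i;
        }
        return total;
    }
}

/**
 * Stand-in for a high performance back end. A parallel stream is used
 * here only for illustration; the paper has MPI (or equivalent) in mind.
 */
class ParallelService implements ComputeService {
    public double sumOfSquares(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .mapToDouble(i -> (double) i * i)
                .sum();
    }
}

/** Hypothetical middle tier: routes a named request to the registered back end. */
class MiddleTier {
    private final Map<String, ComputeService> services = new HashMap<>();

    void register(String name, ComputeService service) {
        services.put(name, service);
    }

    double request(String name, int n) {
        ComputeService service = services.get(name);
        if (service == null) {
            throw new IllegalArgumentException("unknown service: " + name);
        }
        return service.sumOfSquares(n);
    }
}

class HpccSketch {
    public static void main(String[] args) {
        MiddleTier tier = new MiddleTier();

        // Step one: commodity back end behind the middle tier.
        tier.register("simulation", new SequentialService());
        double before = tier.request("simulation", 1000);

        // Step two: swap in the higher performance back end.
        // The client-side call is unchanged.
        tier.register("simulation", new ParallelService());
        double after = tier.request("simulation", 1000);

        System.out.println("results agree: " + (before == after));
    }
}
```

The point of the sketch is the design choice rather than the arithmetic: the client only ever names a service, so the back end can be upgraded for just those services whose performance is insufficient.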