HPcc as High Performance Commodity Computing

                      Geoffrey C. Fox, Wojtek Furmanski

                     gcf@npac.syr.edu, furm@npac.syr.edu

                   World Wide Web: http://www.npac.syr.edu


Abstract:

We review the growing power and capability of commodity computing and
communication technologies driven by the World Wide Web's distributed
information model. We use a three tier model with largely independent
clients connected to a distributed network of servers. These host various
services including object and relational databases and of course parallel
and sequential computing. High performance requires concurrency at both the
middle server tier and in the back end services. This approach can be
applied to both metacomputing and to provide improved parallel programming
environments. We give several examples: web-based user interfaces; the use
of Java as a scientific computing language; integration of interpreted and
compiled environments; visual component based programming; distributed
simulation; and the role of databases, CORBA and collaborative systems. We
suggest the establishment of a set of frameworks that will provide the
necessary standard interfaces between the users and HPcc Services.

                                   [Image]
               Figure 1: Industry View of Enterprise Computing

Three Tier High Performance Commodity Computing

We start with a modern industry view of commodity computing with the three
tiers shown in fig 1. Here we have customizable client and middle tier
systems accessing "traditional" back end services such as relational and
object databases. A set of standard interfaces allows a rich set of custom
applications to be built with appropriate client and middleware software. As
indicated in the figure, both of these layers can use web technology such as
Java and JavaBeans, distributed objects with CORBA and standard interfaces
such as JDBC (Java Database Connectivity). There are of course no rigid
solutions and one can get "traditional" client server solutions by
collapsing two of the layers together. For instance with database access,
one gets a two tier solution either by incorporating custom code into the
"thick" client or, in analogy to Oracle's PL/SQL, by compiling the
customized database access for better performance and incorporating it into
the back end server. The latter, like the general three tier solution,
supports "thin" clients such as the currently popular network computer.
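
The layering described above can be illustrated with a minimal sketch. It
is not taken from any HPcc implementation: the class names (BackEnd,
InMemoryDatabase, MiddleTier, ThinClient, ThickClient) and the toy
"normalize, look up, upper-case" logic are hypothetical, and an in-memory
dictionary stands in for a real database reached through an interface such
as JDBC.

```python
from typing import Dict, Protocol


class BackEnd(Protocol):
    """Standard interface through which a back end service is reached."""

    def query(self, key: str) -> str: ...


class InMemoryDatabase:
    """Hypothetical tier 3 service standing in for a real database."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = rows

    def query(self, key: str) -> str:
        return self._rows[key]


class MiddleTier:
    """Tier 2 server that holds the custom application logic."""

    def __init__(self, backend: BackEnd) -> None:
        self._backend = backend

    def handle(self, request: str) -> str:
        # Custom logic lives here: normalize the request, then post-process.
        return self._backend.query(request.strip().lower()).upper()


class ThinClient:
    """Tier 1 client that only forwards requests to the middle tier."""

    def __init__(self, server: MiddleTier) -> None:
        self._server = server

    def ask(self, request: str) -> str:
        return self._server.handle(request)


class ThickClient:
    """Two tier variant: the custom logic is folded into the client."""

    def __init__(self, backend: BackEnd) -> None:
        self._backend = backend

    def ask(self, request: str) -> str:
        return self._backend.query(request.strip().lower()).upper()
```

Both clients return the same answer for the same request. The only
difference is where the custom code runs, which is the sense in which a
two tier system is a three tier system with two layers collapsed.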

We wish to use this architecture as the basis of our HPcc environment. We
need to naturally incorporate (essentially) all services of the commodity
web and to use its protocols and standards wherever possible. This requires
us to enhance certain services to get higher performance and to incorporate
new capabilities such as high-end visualization (e.g. CAVEs) or massively
parallel systems where needed. If successful, we will build an HPcc
environment that offers the evolving functionality of commodity systems
without significant re-engineering as advances in hardware and software lead
to new and better commodity products.

HPcc, shown in fig 2, extends fig 1 in two natural ways. Firstly the middle
tier is promoted to a distributed network of servers; in the "purest" model
these are Java Web Servers but obviously any protocol compatible server is
possible. This middle tier extension provides networked servers for many
different capabilities (increasing functionality) but also multiple servers
to increase performance on a given service. The use of high functionality
but modest performance communication protocols and interfaces at the middle
tier limits the performance levels that can be reached in this fashion.
However this first step gives a scalable and, if necessary, parallel (in
terms of multiple servers) HPcc implementation which includes all commodity
services such as databases, object services, transaction processing and
collaboratory. The next step is only applied to those services with
insufficient performance. Naively we "just" replace an existing back end
(third tier) implementation of a commodity service by its natural HPcc
version. Sequential or socket-based messaging distributed simulations are
replaced by MPI (or equivalent) implementations on low-latency,
high-bandwidth dedicated parallel machines. These could be specialized
architectures or "just" clusters of workstations. Note that with the right
high performance software and network connectivity, workstations can be used
at tier three, just as the popular "LAN consolidation" use of parallel
machines like the IBM SP-2 corresponds to using these in the middle tier.
Further, a "middle tier" compute or database server could of course deliver
its services using either the same machine as the server or a different one.
These caveats illustrate that, as with many concepts, there will be times
when the clean architecture of fig 2 becomes confused. In particular the
physical realization need not reflect the logical architecture shown in
fig 2.
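
The two extensions can be sketched in a few lines under stated assumptions.
The names below (MiddleTierServer, ServerPool, sum_of_squares_sequential,
sum_of_squares_parallel) are hypothetical, round-robin dispatch is just one
simple way to spread requests over replicated servers, and a thread pool is
used purely as a stand-in for an MPI job on a dedicated parallel machine.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle
from typing import Callable, Iterable, Sequence

# A back end service maps a sequence of inputs to a single result.
Service = Callable[[Sequence[int]], int]


def sum_of_squares_sequential(data: Sequence[int]) -> int:
    """Baseline tier 3 implementation: a single sequential pass."""
    return sum(x * x for x in data)


def sum_of_squares_parallel(data: Sequence[int], workers: int = 4) -> int:
    """Replacement with the same interface that splits the work.

    The thread pool only stands in for a real parallel back end.
    """
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares_sequential, chunks))


class MiddleTierServer:
    """One of several replicated middle tier servers."""

    def __init__(self, name: str, service: Service) -> None:
        self.name = name
        self.service = service
        self.handled = 0  # number of requests this server has processed

    def handle(self, data: Sequence[int]) -> int:
        self.handled += 1
        return self.service(data)


class ServerPool:
    """First extension: round-robin dispatch over replicated servers."""

    def __init__(self, servers: Iterable[MiddleTierServer]) -> None:
        self._servers = list(servers)
        self._next = cycle(self._servers)

    def handle(self, data: Sequence[int]) -> int:
        return next(self._next).handle(data)
```

Because both back end functions share one interface, the second extension
amounts to passing sum_of_squares_parallel instead of
sum_of_squares_sequential when the servers are constructed; the client and
the dispatch logic are unchanged.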

Note that we have dropped "communications" referred to in the classic HPCC
acronym. This is not because it is unimportant but rather because a
commodity approach to high performance networking is already being adopted.
Here we focus on high level services such as programming, data access and
visualization which we abstract to the rather wishy-washy "computing" in the
HPcc acronym.

                                   [Image]
                         Figure 2: HPcc Architecture