HPcc as High Performance Commodity Computing
Geoffrey C. Fox, Wojtek Furmanski
gcf@npac.syr.edu, furm@npac.syr.edu
World Wide Web: http://www.npac.syr.edu
Abstract:
We review the growing power and capability of commodity computing and communication technologies largely driven by commercial distributed information systems. CORBA, Microsoft’s COM, Javabeans, and less sophisticated web and network approaches build such systems. These can all be abstracted to a three-tier model with largely independent clients connected to a distributed network of servers. These servers host various services including object and relational databases and of course parallel and sequential computing. High performance can be obtained by combining concurrency at the middle server tier with optimized parallel back end services. The resultant system combines the needed performance for large-scale HPCC applications with the rich functionality of commodity systems. Further, the architecture, with distinct interface, server and specialized service implementation layers, naturally allows advances in each area to be easily incorporated. We show that this approach can be applied both to metacomputing and to providing improved parallel programming environments. We give several examples: web-based user interfaces; the use of Java as a scientific computing language; integration of interpreted and compiled environments; visual component based programming; distributed simulation; and the role of databases, CORBA and collaborative systems. We suggest the establishment of a set of frameworks that will provide the necessary standard interfaces between the users and HPcc Services.
1: Introduction
We believe that industry and the loosely organized worldwide collection of (freeware) programmers are developing a remarkable new software environment of unprecedented quality and functionality. We call this DcciS - Distributed commodity computing and information System. We believe that this can benefit HPCC in several ways and allow the development of both more powerful parallel programming environments and new distributed metacomputing systems. In the second section, we define what we mean by commodity technologies and explain the different ways that they can be used in HPCC. In the third and critical section, we define an emerging architecture of DcciS in terms of a conventional 3 tier commercial computing model. In the fourth section, we discuss the critical research issue: can high performance systems, called HPcc or High Performance Commodity Computing, be built on top of DcciS? In the final section, we illustrate the advantages and features of the proposed HPcc approach by discussing several applications. These include the linkage of collaboration to HPCC for computational steering and multi-disciplinary engineering; the emerging object-web approach to distributed simulation; integration of compiled and interpreted environments; visual interfaces and the HPCC component based programming paradigm; and finally the use of Java as a scientific programming language.
2: Commodity Technologies and their use in HPCC
The last three years have seen an unprecedented level of innovation and progress in commodity technologies driven largely by the new capabilities and business opportunities of the evolving worldwide network. The web is not just a document access system supported by the somewhat limited HTTP protocol. Rather it is the distributed object technology which can build general multi-tiered enterprise intranet and internet applications. CORBA is turning from a sleepy heavyweight standards initiative to a major competitive development activity that battles with COM and Javabeans to be the core distributed object technology.
There are many driving forces and many aspects to DcciS but we suggest that the three critical technology areas are the web, distributed objects and databases. These are being linked and we see them subsumed in the next generation of "object-web" technologies, which are illustrated by the recent Netscape and Microsoft version 4 browsers. Databases are older technologies but their linkage to the web and distributed objects is transforming their use and making them more widely applicable.
In each commodity technology area, we have impressive and rapidly improving software artifacts. However, equally and perhaps more importantly, we have a set of open interfaces enabling distributed modular software development. These interfaces are at both low and high levels and the latter generate a very powerful software environment in which large preexisting components can be quickly integrated into new applications. We suggest that HPCC applications can also benefit in the same way from this new level of productivity. Thus it is essential to build HPCC environments in a way that naturally inherits all the commodity capabilities. This is the goal of NPAC’s HPcc activity.
3: Three Tier High Performance Commodity Computing
Fig. 1: Industry 3-tier view of enterprise Computing
We start with a common modern industry view of commodity computing with the three tiers shown in fig 1. Here we have customizable client and middle tier systems accessing "traditional" back end services such as relational and object databases. A set of standard interfaces allows a rich set of custom applications to be built with appropriate client and middleware software. As indicated in the figure, both of these layers can use web technology such as Java and Javabeans, distributed objects with CORBA and standard interfaces such as JDBC (Java Database Connectivity). There are of course no rigid solutions and one can get "traditional" client server solutions by collapsing two of the layers together. For instance, with database access one gets a two tier solution either by incorporating custom code into the "thick" client or, in analogy to Oracle’s PL/SQL, by compiling the customized database access code for better performance and incorporating the compiled code with the back end server. The latter, like the general 3-tier solution, supports "thin" clients such as the currently popular network computer.
The commercial architecture is evolving rapidly and is exploring several approaches which co-exist in today’s (and any realistic future) distributed information system. The most powerful solutions involve distributed objects. There are three important commercial object systems - CORBA, COM and Javabeans. These have similar approaches and it is not clear if the future holds a single such approach or a set of interoperable standards. CORBA is a distributed object standard managed by the OMG (Object Management Group), which comprises over 700 companies. COM is Microsoft’s distributed object technology initially aimed at Windows machines. Javabeans (augmented with RMI and other Java 1.1 features) is the "pure Java" solution - cross platform but, unlike CORBA, not cross-language! Legion is an example of a major HPCC focused distributed object approach; currently it is not built on top of one of the three major commercial standards. The HLA/RTI standard for distributed simulations in the forces modeling community is another important domain specific distributed object system. It appears to be moving to integration with CORBA standards. Although a distributed object approach is attractive, most network services today are provided in a more ad-hoc fashion. In particular today’s web uses a "distributed service" architecture with HTTP middle tier servers invoking, via the CGI mechanism, C and Perl programs linking to databases, simulations or other custom services. There is a trend toward the use of Java servers with the servlet mechanism for the services. This is certainly object based but does not necessarily implement the standards defined by CORBA, COM or Javabeans. However, this illustrates an important trend as the web absorbs object technology with the evolution:
HTTP (low level network standard) --> Java Sockets --> IIOP or RMI (high level network standard)
Perl CGI Script --> Java Program --> Javabean distributed object
As an example consider the evolution of networked databases. Originally these were client-server systems with a proprietary network access protocol. Web linked databases produced a three tier distributed service model with an HTTP server using a CGI program (running Perl for instance) to access the database at the back end. Today we can build databases as distributed objects with a middle tier Javabean using JDBC to access the back end database.
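The last step of this evolution can be illustrated with a minimal Java sketch. It is not the implementation described in this paper: the names PartCatalog, JdbcPartCatalog, InMemoryPartCatalog and CatalogClient, the table "parts" and its columns "name" and "price" are all hypothetical, and a plain Java class stands in for a full Javabean. The sketch only shows a middle tier object that hides JDBC access to the back end behind an interface seen by the client.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical middle tier contract: the client tier sees only this interface.
interface PartCatalog {
    double priceOf(String partName);
}

// Middle tier object that forwards each request to a relational back end via JDBC.
// Assumption: a table "parts" with columns "name" and "price" exists, and a JDBC
// driver matching jdbcUrl is available at run time.
class JdbcPartCatalog implements PartCatalog {
    private final String jdbcUrl;

    JdbcPartCatalog(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    @Override
    public double priceOf(String partName) {
        String sql = "SELECT price FROM parts WHERE name = ?";
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, partName);
            try (ResultSet rows = stmt.executeQuery()) {
                if (!rows.next()) {
                    throw new IllegalArgumentException("unknown part: " + partName);
                }
                return rows.getDouble("price");
            }
        } catch (SQLException e) {
            // Back end failures are reported without exposing JDBC types to the client.
            throw new IllegalStateException("back end database failed", e);
        }
    }
}

// In-memory stand-in for the back end, so the sketch runs without a database.
class InMemoryPartCatalog implements PartCatalog {
    private final Map<String, Double> prices = new HashMap<>();

    void put(String partName, double price) {
        prices.put(partName, price);
    }

    @Override
    public double priceOf(String partName) {
        Double price = prices.get(partName);
        if (price == null) {
            throw new IllegalArgumentException("unknown part: " + partName);
        }
        return price;
    }
}

// Client tier code: it cannot tell which back end serves the request.
class CatalogClient {
    static String describe(PartCatalog catalog, String partName) {
        return partName + " costs " + catalog.priceOf(partName);
    }
}
```

Because the client depends only on the interface, the in-memory catalog could be exchanged for the JDBC-backed one (given a suitable driver and URL) without changing client code.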
Today we see a mixture of distributed service and distributed object architectures. CORBA, COM, Javabean, HTTP Server + CGI, Java Server and Servlets, Databases with specialized network accesses, and other services co-exist in a heterogeneous environment with common themes but disparate implementations. We believe that there will be significant convergence as a more uniform architecture is in everyone’s best interest. We also believe that the resultant architecture will be integrated with the web so that the latter will exhibit distributed object architecture. Most of our remarks are valid for all these approaches to a distributed set of services. Our ideas are however easiest to understand if one assumes an underlying architecture which is a CORBA or Javabean distributed object model integrated with the web.
We wish to use this evolving service/object 3-tier commodity architecture as the basis of our HPcc environment. We need to naturally incorporate (essentially) all services of the commodity web and to use its protocols and standards wherever possible. We insist on adopting the architecture of commodity distributed systems because complex HPCC problems require the rich range of services offered by the broader community systems. Perhaps we could "port" commodity services to a custom HPCC system but this would require continued upkeep with each new upgrade of the commodity service. By adopting the architecture of the commodity systems, we make it easier to track their rapid evolution and expect that this will give high functionality systems, which will naturally track the evolving Web/distributed object worlds. This requires us to enhance certain services to get higher performance and to incorporate new capabilities such as high-end visualization (e.g. CAVEs) or massively parallel systems where needed. This is the essential research challenge for HPcc, for we must not only enhance performance where needed but do it in a way that is preserved as we evolve the basic commodity systems. We certainly have not demonstrated this clearly but we have a simple strategy that we will elaborate later. Thus we exploit the three-tier structure and keep HPCC enhancements in the third tier, which is inevitably the home of specialized services in the object-web architecture. This strategy isolates HPCC issues from the control and interface issues in the middle layer. If successful we will build an HPcc environment that offers the evolving functionality of commodity systems without significant re-engineering as advances in hardware and software lead to new and better commodity products.
Fig. 2: HPcc Illustrated with Interoperating Hybrid Server Architecture
HPcc, shown in fig 2, extends fig 1 in two natural ways. Firstly the middle tier is promoted to a distributed network of servers; in the "purest" model these are CORBA/COM/Javabean object-web servers but obviously any protocol compatible server is possible. This middle tier extension provides networked servers for many different capabilities (increasing functionality) but also multiple servers to increase performance on a given service. The use of high functionality but modest performance communication protocols and interfaces at the middle tier limits the performance levels that can be reached in this fashion. However this first step gives a scalable and, if necessary, parallel (in terms of multiple servers) HPcc implementation which includes all commodity services such as databases, object services, transaction processing and collaboratories. The next step is only applied to those services with insufficient performance. Naively we "just" replace an existing back end (third tier) implementation of a commodity service by its natural HPCC version. Sequential or socket based messaging distributed simulations are replaced by MPI (or equivalent) implementations on low latency high bandwidth dedicated parallel machines. These could be specialized architectures or "just" clusters of workstations. Note that with the right high performance software and network connectivity, workstations can be used at tier three, just as the popular "LAN consolidation" use of parallel machines like the IBM SP-2 corresponds to using these in the middle tier. Further a "middle tier" compute or database server could of course deliver its services using the same machine as the server or a different one. These caveats illustrate that, like many concepts, there will be times when the clean architecture of fig 2 will become confused. In particular the physical realization does not necessarily reflect the logical architecture shown in fig 2.
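The second step, replacing a back end implementation while leaving the middle tier untouched, can be sketched in Java as follows. This is an illustration under stated assumptions rather than the system described here: the names IntegrationService, SequentialIntegration, ParallelIntegration, PiKernel and MiddleTier are hypothetical, and a local thread pool stands in for an MPI implementation on a dedicated parallel machine. The workload, a midpoint rule estimate of pi, is chosen only because it is easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical service contract that the middle tier exposes for a third tier service.
interface IntegrationService {
    // Midpoint rule estimate of the integral of 4/(1+x^2) over [0,1], which equals pi.
    double estimatePi(int intervals);
}

// Shared numerical kernel: sums the midpoint rule terms for indices [begin, end).
class PiKernel {
    static double partialSum(int begin, int end, int intervals) {
        double h = 1.0 / intervals;
        double sum = 0.0;
        for (int i = begin; i < end; i++) {
            double x = (i + 0.5) * h;
            sum += 4.0 / (1.0 + x * x);
        }
        return sum * h;
    }
}

// The existing "commodity" back end: a plain sequential implementation.
class SequentialIntegration implements IntegrationService {
    @Override
    public double estimatePi(int intervals) {
        return PiKernel.partialSum(0, intervals, intervals);
    }
}

// The "high performance" replacement. Assumption: a thread pool stands in here
// for an MPI (or equivalent) implementation on a parallel machine.
class ParallelIntegration implements IntegrationService {
    private final int workers;

    ParallelIntegration(int workers) {
        this.workers = workers;
    }

    @Override
    public double estimatePi(int intervals) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Double>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                // Split the index range into contiguous blocks, one per worker.
                final int begin = (int) ((long) intervals * w / workers);
                final int end = (int) ((long) intervals * (w + 1) / workers);
                Callable<Double> task = () -> PiKernel.partialSum(begin, end, intervals);
                parts.add(pool.submit(task));
            }
            double total = 0.0;
            for (Future<Double> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel back end failed", e);
        } finally {
            pool.shutdown();
        }
    }
}

// Middle tier: handles requests identically whichever back end is plugged in.
class MiddleTier {
    private final IntegrationService backEnd;

    MiddleTier(IntegrationService backEnd) {
        this.backEnd = backEnd;
    }

    double handle(int intervals) {
        return backEnd.estimatePi(intervals);
    }
}
```

In this sketch only the constructor argument of MiddleTier changes when the back end is upgraded, which mirrors the idea of confining performance enhancements to the third tier.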
Glossary
CORBA (Common Object Request Broker Architecture)
An approach to cross platform, cross language distributed object development by a broad industry group, the OMG. CORBA specifies basic services (such as naming, trading, persistence) and the protocol IIOP used by communicating ORBs, and is developing higher level facilities - object architectures for specialized domains such as banking.
COM (Component Object Model)
Microsoft’s Windows object model, which is being extended to distributed systems and multi-tiered architectures. ActiveX controls are an important class of COM objects.
HPcc (High Performance commodity computing)
NPAC project to develop a commodity computing based high performance computing software environment. Note that we have dropped "communications" referred to in the classic HPCC acronym. This is not because it is unimportant but rather because a commodity approach to high performance networking is already being adopted. Here we focus on high level services such as programming, data access and visualization that we abstract to the rather wishy-washy "computing" in the HPcc acronym.
HPCC (High Performance Computing and Communication)
Originally a formal federal initiative but even after this ended in 1996, this term is used to describe the field devoted to solving large-scale problems with powerful computers and networks.
HTTP (Hypertext Transfer Protocol)
A stateless transport protocol allowing control information and data to be transmitted between web clients and servers.
IIOP (Internet Inter-ORB Protocol)
A stateful protocol allowing CORBA ORBs to communicate with each other, carrying both the request for a desired service and the returned result.
JDBC (Java Database Connectivity)
A set of interfaces (Java methods and constants) in the Java 1.1 enterprise framework, defining uniform access to relational databases. JDBC calls from a client or server Java program link to a particular "driver" that converts these universal database access calls (establish a connection, SQL query, etc.) to the particular syntax needed to access essentially any significant database.
Javabean
Part of the Java 1.1 enhancements defining design frameworks (particular naming conventions) and inter-Javabean communication mechanisms for Java components with standard (Bean box) or customized visual interfaces (property editors). Enterprise Javabeans are Javabeans enhanced for server side operation with capabilities such as multi-user support.
OMG (Object Management Group)
OMG is the organization of over 700 companies that is developing CORBA through a process of call for proposals and development of consensus standards.
ORB (Object Request Broker)
Used in both clients and services in CORBA to enable remote access to objects. ORBs are available from many vendors and communicate with IIOP.
RMI (Remote Method Invocation)
A somewhat controversial part of Java 1.1 in the enterprise framework which specifies the remote access to Java objects with a generalization of the UNIX RPC (Remote Procedure Call).
Web Client
Originally web clients displayed HTML and related pages but now they support Java Applets that can be programmed to give web clients the necessary capabilities to support general enterprise computing. The support of signed applets in recent browsers has removed crude security restrictions, which handicapped previous use of applets.
Web Servers
Originally Web Servers supported HTTP requests for information - basically HTML pages - but also included the invocation of general server side programs using the very simple but arcane CGI - Common Gateway Interface. A new generation of Java servers has enhanced capabilities including server side Java program enhancements (Servlets) and support of stateful permanent communication channels.