HPcc as High Performance Commodity Computing
Geoffrey C. Fox, Wojtek Furmanski
gcf@npac.syr.edu, furm@npac.syr.edu
World Wide Web: http://www.npac.syr.edu
Abstract:
We review the growing power and capability of commodity computing and communication technologies, largely driven by commercial distributed information systems. These systems are built from CORBA, Microsoft’s COM, Javabeans, and less sophisticated web and networked approaches. One can abstract these to a three-tier model with largely independent clients connected to a distributed network of servers. The latter host various services including object and relational databases and, of course, parallel and sequential computing. High performance can be obtained by combining concurrency at the middle server tier with optimized parallel back end services. The resultant system combines the performance needed for large-scale HPCC applications with the rich functionality of commodity systems. Further, the architecture, with distinct interface, server and specialized service implementation layers, naturally allows advances in each area to be incorporated easily. We show that this approach can be applied both to metacomputing and to providing improved parallel programming environments.
1: Introduction
We believe that industry and the loosely organized worldwide collection of (freeware) programmers are developing a remarkable new software environment of unprecedented quality and functionality. We call this DcciS - Distributed commodity computing and information System. We believe that this can benefit HPCC in several ways and allow the development of both more powerful parallel programming environments and new distributed metacomputing systems. In the second section, we define what we mean by commodity technologies and explain the different ways that they can be used in HPCC. In the third and critical section, we define an emerging architecture of DcciS in terms of a conventional 3-tier commercial computing model. In an accompanying paper (ref. [31]), we discuss the critical research issue: can high performance systems, called HPcc or High Performance Commodity Computing, be built on top of DcciS? Here, in the final section, we just summarize the expected steps in the CORBA model for establishing HPcc as a community framework and CORBA facility.
2: Commodity Technologies and their use in HPCC
The last three years have seen an unprecedented level of innovation and progress in commodity technologies, driven largely by the new capabilities and business opportunities of the evolving worldwide network. The web is not just a document access system supported by the somewhat limited HTTP protocol. Rather, it is a distributed object technology that can build general multi-tiered enterprise intranet and internet applications. CORBA is turning from a sleepy heavyweight standards initiative into a major competitive development activity that battles with COM and Javabeans to be the core distributed object technology. CORBA competes with RMI, but it will likely be integrated with Javabeans.
There are many driving forces and many aspects to DcciS, but we suggest that the three critical technology areas are the web, distributed objects and databases. These are being linked, and we see them subsumed in the next generation of "object-web" technologies, as illustrated by the recent Netscape and Microsoft version 4 browsers. Databases are an older technology, but their linkage to the web and distributed objects is transforming their use and making them more widely applicable.
In each commodity technology area, we have impressive and rapidly improving software artifacts. As examples, at the lower level we have the collection of standards and tools such as HTML, HTTP, MIME, IIOP, CGI, Java, JavaScript, Javabeans, CORBA, COM, ActiveX, VRML, powerful new object request brokers (ORBs), and dynamic Java servers and clients including applets and servlets. At a higher level, collaboration, security, commerce, multimedia and other applications/services are rapidly developing using standard interfaces or frameworks/facilities. This emphasizes that, as important as the raw technologies and perhaps more so, we have a set of open interfaces enabling distributed modular software development. These interfaces exist at both low and high levels, and the high-level ones create a very powerful software environment in which large preexisting components can be quickly integrated into new applications. We believe that there are significant incentives to build HPCC environments in a way that naturally inherits all the commodity capabilities, so that HPCC applications can also benefit from the impressive productivity of commodity systems. NPAC’s HPcc activity is designed to demonstrate that this is possible and useful, so that one can achieve both high performance and the functionality of commodity systems simultaneously.
Note that commodity technologies can be used in several ways. This article concentrates on exploiting the natural architecture of commodity systems but, more simply, one could just use a few of them as "point solutions". We can term this a "tactical implication" of the set of emerging commodity technologies and illustrate it below with some examples:
However, probably more important is the strategic implication of DcciS, which dictates certain critical characteristics of the overall architecture for a high performance parallel or distributed computing system. First, we note that over the last 30 years we have seen many other major broad-based hardware and software developments (such as IBM business systems, UNIX, Macintosh/PC desktops and video games), but these have not had a profound impact on HPCC software. However, we suggest that DcciS is different, for it gives us a worldwide/enterprise-wide distributed computing environment. Previous software revolutions could help individual components of an HPCC software system, but DcciS can in principle be the backbone of a complete HPCC software system, whether it be for some global distributed application, an enterprise cluster or a tightly coupled large-scale parallel computer. In a nutshell, we suggest that "all we need to do" is to add "high performance" (as measured by bandwidth and latency) to the emerging commercial DcciS systems. This "all we need to do" may be very hard, but by using DcciS as a basis we inherit a multi-billion-dollar investment and what is in many respects the most powerful and productive software environment ever built. Thus we should look carefully into the design of any HPCC system to see how it can leverage this commercial environment.
3: Three Tier High Performance Commodity Computing
Fig. 1: Industry 3-tier view of enterprise computing
We start with a common modern industry view of commodity computing with the three tiers shown in fig. 1. Here we have customizable client and middle tier systems accessing "traditional" back end services such as relational and object databases. A set of standard interfaces allows a rich set of custom applications to be built with appropriate client and middleware software. As indicated in the figure, both of these layers can use web technology such as Java and Javabeans, distributed objects with CORBA, and standard interfaces such as JDBC (Java Database Connectivity). There are, of course, no rigid solutions, and one can get "traditional" client-server solutions by collapsing two of the layers together. For instance, with database access one gets a two-tier solution either by incorporating custom code into the "thick" client or, in analogy to Oracle’s PL/SQL, by compiling the customized database access code for better performance and incorporating the compiled code into the back end server. The latter, like the general 3-tier solution, supports "thin" clients such as the currently popular network computer.
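To make the tier boundaries concrete, the following Python sketch (our illustration, not part of any product mentioned above) separates a thin client, a middle tier holding the custom logic, and a back end database. Python's built-in sqlite3 module and its DB-API interface stand in, as an assumption, for a relational server and for JDBC respectively; the table, the account names and the AccountService class are hypothetical.

```python
import sqlite3


# Tier 3: a "traditional" back end service. An in-memory SQLite database
# stands in (as an assumption) for a relational server such as Oracle.
def make_backend() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (owner TEXT PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100.0), ("bob", 50.0)])
    conn.commit()
    return conn


# Tier 2: custom application logic. This is the only layer that speaks
# SQL, so clients never depend on the back end's schema or protocol.
class AccountService:
    def __init__(self, backend: sqlite3.Connection) -> None:
        self._db = backend

    def balance(self, owner: str) -> float:
        row = self._db.execute(
            "SELECT balance FROM accounts WHERE owner = ?", (owner,)).fetchone()
        if row is None:
            raise KeyError(owner)
        return row[0]

    def transfer(self, src: str, dst: str, amount: float) -> None:
        # Using the connection as a context manager commits both updates
        # together, or rolls them back if an exception is raised.
        with self._db:
            self.balance(dst)  # raises KeyError if the destination is unknown
            if self.balance(src) < amount:
                raise ValueError("insufficient funds")
            self._db.execute(
                "UPDATE accounts SET balance = balance - ? WHERE owner = ?",
                (amount, src))
            self._db.execute(
                "UPDATE accounts SET balance = balance + ? WHERE owner = ?",
                (amount, dst))


# Tier 1: a "thin" client that only issues requests and formats results.
# It calls the middle tier in-process here; in a deployed system the call
# would cross the network (HTTP, IIOP, RMI, ...).
def thin_client(service: AccountService) -> str:
    service.transfer("alice", "bob", 25.0)
    return f"alice={service.balance('alice')}, bob={service.balance('bob')}"


print(thin_client(AccountService(make_backend())))  # alice=75.0, bob=75.0
```

Moving the body of AccountService into thin_client would give the "thick" client variant, while pushing transfer into the database as a stored procedure would correspond to the PL/SQL variant; in both cases only the placement of the logic changes.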
The commercial architecture is evolving rapidly and is exploring several approaches that co-exist in today’s (and any realistic future) distributed information system. The most powerful solutions involve distributed objects. There are three important commercial object systems: CORBA, COM and Javabeans. These have similar approaches, and it is not clear whether the future holds a single such approach or a set of interoperable standards. CORBA is a distributed object standard managed by the OMG (Object Management Group), which comprises 700 companies. COM is Microsoft’s distributed object technology, initially aimed at Windows machines. Javabeans (augmented with RMI and other Java 1.1 features) is the "pure Java" solution: cross-platform but, unlike CORBA, not cross-language. Legion is an example of a major HPCC-focused distributed object approach; currently it is not built on top of any of the three major commercial standards. The HLA/RTI standard for distributed simulations in the forces modeling community is another important domain-specific distributed object system. It appears to be moving toward integration with CORBA standards. Although a distributed object approach is attractive, most network services today are provided in a more ad hoc fashion. In particular, today’s web uses a "distributed service" architecture, with HTTP middle tier servers using the CGI mechanism to invoke C and Perl programs that link to databases, simulations or other custom services. There is a trend toward the use of Java servers with the servlet mechanism for these services. This is certainly object-based but does not necessarily implement the standards defined by CORBA, COM or Javabeans. However, it illustrates an important evolution as the web absorbs object technology:
HTTP --> Java Sockets --> IIOP or RMI
(Low Level network standard) (High level network standard)
Perl CGI Script --> Java Program --> Javabean distributed object.
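The step from a CGI script to a distributed object is essentially the introduction of a client stub and a server skeleton, which turn method calls into messages and back. The sketch below shows this pattern in Python under loud assumptions: JSON stands in for the IIOP or RMI wire format, a direct function call stands in for the network, and Calculator is a hypothetical servant.

```python
import json
from typing import Any, Callable


class Skeleton:
    """Server side: unmarshal a request message and dispatch it to the servant."""

    def __init__(self, servant: Any) -> None:
        self._servant = servant

    def handle(self, request: bytes) -> bytes:
        msg = json.loads(request.decode("utf-8"))
        try:
            if msg["method"].startswith("_"):
                raise AttributeError("private methods are not exported")
            method = getattr(self._servant, msg["method"])
            reply = {"ok": True, "result": method(*msg["args"])}
        except Exception as exc:  # marshal the failure back to the caller
            reply = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
        return json.dumps(reply).encode("utf-8")


class Stub:
    """Client side: turn ordinary method calls into request messages."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport

    def __getattr__(self, name: str) -> Callable[..., Any]:
        # Only called for attributes not found normally, i.e. remote methods.
        def remote_call(*args: Any) -> Any:
            request = json.dumps({"method": name, "args": list(args)})
            reply = json.loads(self._transport(request.encode("utf-8")))
            if not reply["ok"]:
                raise RuntimeError(reply["error"])
            return reply["result"]
        return remote_call


class Calculator:
    """Hypothetical servant object hosted by a middle tier server."""

    def add(self, a: float, b: float) -> float:
        return a + b

    def divide(self, a: float, b: float) -> float:
        return a / b


# The "transport" is a direct function call here; in practice it would be
# a socket, an HTTP connection, or an ORB speaking IIOP.
calc = Stub(Skeleton(Calculator()).handle)
print(calc.add(2, 3))  # 5
```

A CGI program, by contrast, receives its request through environment variables and standard input and writes a document to standard output; the stub hides even that much from the caller, which is what lets a middle tier server be treated as an ordinary object.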
As an example, consider the evolution of networked databases. Originally these were client-server with a proprietary network access protocol. Web-linked databases produced a three-tier distributed service model, with an HTTP server using a CGI program (running Perl, for instance) to access the database at the back end. Today we can build databases as distributed objects, with a middle tier Javabean using JDBC to access the back end database. Thus a conventional database is naturally evolving toward the concept of managed persistent objects.
Today we see a mixture of distributed service and distributed object architectures. CORBA, COM, Javabeans, HTTP servers + CGI, Java servers and servlets, databases with specialized network access, and other services co-exist in a heterogeneous environment with common themes but disparate implementations. We believe that there will be significant convergence, as a more uniform architecture is in everyone’s best interest. We also believe that the resultant architecture will be integrated with the web, so that the latter will exhibit a distributed object architecture. Most of our remarks are valid for all these approaches to a distributed set of services. Our ideas are, however, easiest to understand if one assumes an underlying architecture that is a CORBA or Javabean distributed object model integrated with the web.
We wish to use this evolving service/object 3-tier commodity architecture as the basis of our HPcc environment. We need to naturally incorporate (essentially) all services of the commodity web and to use its protocols and standards wherever possible. We insist on adopting the architecture of commodity distributed systems, as complex HPCC problems require the rich range of services offered by the broader community systems. Perhaps we could "port" commodity services to a custom HPCC system, but this would require continued upkeep with each new upgrade of the commodity service. By adopting the architecture of the commodity systems, we make it easier to track their rapid evolution, and we expect to obtain high-functionality HPCC systems that naturally track the evolving Web/distributed object worlds. This requires us to enhance certain services to get higher performance and to incorporate new capabilities such as high-end visualization (e.g. CAVEs) or massively parallel systems where needed. This is the essential research challenge for HPcc, for we must not only enhance performance where needed but do so in a way that is preserved as the basic commodity systems evolve. We certainly have not yet demonstrated clearly that this is possible, but we have a simple strategy that we elaborate in ref. [31]. Thus we exploit the three-tier structure and keep HPCC enhancements in the third tier, which is inevitably the home of specialized services in the object-web architecture. This strategy isolates HPCC issues from the control or interface issues in the middle layer. If successful, we will build an HPcc environment that offers the evolving functionality of commodity systems without significant re-engineering as advances in hardware and software lead to new and better commodity products.
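A minimal sketch of the "managed persistent object" idea, assuming Python's sqlite3 module as a stand-in for a JDBC-accessed back end: the middle tier object below holds only a key, and every field access goes through to the database. The jobs table and the PersistentJob class are hypothetical.

```python
import sqlite3
from typing import Any


class PersistentJob:
    """A middle tier object whose state lives in a back end table.

    Every field read or write goes through to the database, so the object
    acts as a managed persistent object rather than a client-side copy.
    """

    FIELDS = ("name", "status")

    def __init__(self, conn: sqlite3.Connection, key: int) -> None:
        self._conn = conn
        self.key = key

    @staticmethod
    def create_schema(conn: sqlite3.Connection) -> None:
        conn.execute("CREATE TABLE IF NOT EXISTS jobs "
                     "(id INTEGER PRIMARY KEY, name TEXT, status TEXT)")

    @classmethod
    def create(cls, conn: sqlite3.Connection, name: str) -> "PersistentJob":
        with conn:  # commit the new row before handing out the object
            cur = conn.execute(
                "INSERT INTO jobs (name, status) VALUES (?, 'queued')", (name,))
        return cls(conn, cur.lastrowid)

    def get(self, field: str) -> Any:
        self._check(field)
        row = self._conn.execute(
            f"SELECT {field} FROM jobs WHERE id = ?", (self.key,)).fetchone()
        if row is None:
            raise KeyError(self.key)
        return row[0]

    def set(self, field: str, value: Any) -> None:
        self._check(field)
        with self._conn:
            self._conn.execute(
                f"UPDATE jobs SET {field} = ? WHERE id = ?", (value, self.key))

    def _check(self, field: str) -> None:
        # Column names cannot be bound as SQL parameters, so they are
        # whitelisted before being interpolated into the statement.
        if field not in self.FIELDS:
            raise AttributeError(field)


conn = sqlite3.connect(":memory:")
PersistentJob.create_schema(conn)
job = PersistentJob.create(conn, "cfd-run")
job.set("status", "running")
print(PersistentJob(conn, job.key).get("status"))  # running
```

A second handle built from the same key sees the update immediately, because neither handle caches state; that is the property that distinguishes a persistent object from a row fetched into a client.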
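The strategy of keeping HPCC enhancements in the third tier can be illustrated by programming the middle tier against an interface and swapping the implementation behind it. In this Python sketch (all class names hypothetical), a thread pool stands in for an MPI implementation on a parallel machine, and a sum of squares stands in for a real simulation kernel; the point is only that SimulationService is untouched when the back end is upgraded.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Protocol


class ComputeBackend(Protocol):
    """Contract between the middle tier and any third-tier implementation."""

    def sum_of_squares(self, n: int) -> int: ...


class SequentialBackend:
    """Original commodity implementation: a plain sequential loop."""

    def sum_of_squares(self, n: int) -> int:
        return sum(i * i for i in range(n))


class ParallelBackend:
    """Stand-in for an MPI code on a parallel machine: the index range is
    block-decomposed, each block is reduced by a worker, and the partial
    sums are combined. Threads keep the sketch self-contained; they do not
    give real CPU speedup in CPython."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def sum_of_squares(self, n: int) -> int:
        if n <= 0:
            return 0
        block = -(-n // self.workers)  # ceiling division
        ranges = [range(lo, min(lo + block, n)) for lo in range(0, n, block)]
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return sum(pool.map(lambda r: sum(i * i for i in r), ranges))


class SimulationService:
    """Middle tier: validation and bookkeeping only. It sees just the
    ComputeBackend contract, so upgrading the back end changes nothing here."""

    def __init__(self, backend: ComputeBackend) -> None:
        self.backend = backend
        self.requests = 0

    def run(self, n: int) -> int:
        if n < 0:
            raise ValueError("n must be non-negative")
        self.requests += 1
        return self.backend.sum_of_squares(n)


service = SimulationService(SequentialBackend())
before = service.run(10_000)
service.backend = ParallelBackend(workers=8)  # the "HPCC upgrade" in tier three
assert service.run(10_000) == before
```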
Figure 2: Today’s Heterogeneous Interoperating Hybrid Server Architecture. HPcc involves adding high performance to this system in the third tier.
Figure 2 elaborates fig. 1 in two natural ways. Firstly, the middle tier is promoted to a distributed network of servers; in the "purest" model these are CORBA/COM/Javabean object-web servers, but obviously any protocol-compatible server is possible. This middle tier layer includes not only networked servers with many different capabilities (increasing functionality) but also multiple servers to increase performance on a given service. The use of high-functionality but modest-performance communication protocols and interfaces at the middle tier limits the performance levels that can be reached in this fashion. However, this first step gives a parallel HPcc system with modest performance scaling (implemented, if necessary, in terms of multiple servers) that includes all commodity services such as databases, object services, transaction processing and collaboratories. The next step is applied only to those services with insufficient performance. Naively, we "just" replace an existing back end (third tier) implementation of a commodity service by its natural HPCC high performance version. Sequential or socket-based messaging distributed simulations are replaced by MPI (or equivalent) implementations on low latency, high bandwidth dedicated parallel machines. These could be specialized architectures or "just" clusters of workstations. Note that with the right high performance software and network connectivity, workstations can be used at tier three, just as the popular "LAN consolidation" use of parallel machines like the IBM SP-2 corresponds to using parallel computers in the middle tier. Further, a "middle tier" compute or database server could of course deliver its services using either the same machine as the server or a different one. These caveats illustrate that, as with many concepts, there will be times when the relatively clean architecture of fig. 2 becomes confused. In particular, the physical realization does not necessarily reflect the logical architecture shown in fig. 2.
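The first step in fig. 2, multiple middle tier servers for one service, can be sketched as a simple round-robin dispatcher. Round-robin is an assumed policy chosen for brevity, not one prescribed by the architecture; the replicas here are hypothetical local functions rather than networked servers, and the class is not thread-safe.

```python
import itertools
from typing import Callable, Dict, List

Server = Callable[[str], str]


class ReplicatedService:
    """Middle tier front end that spreads requests for one service over
    several identical servers in round-robin order (not thread-safe)."""

    def __init__(self, replicas: List[Server]) -> None:
        if not replicas:
            raise ValueError("need at least one replica")
        self._replicas = replicas
        self._order = itertools.cycle(range(len(replicas)))
        self.load: Dict[int, int] = {i: 0 for i in range(len(replicas))}

    def request(self, query: str) -> str:
        i = next(self._order)
        self.load[i] += 1
        return self._replicas[i](query)


def make_server(name: str) -> Server:
    """Hypothetical replica; a real one would be a separate process or host."""
    def serve(query: str) -> str:
        return f"{name}:{query.upper()}"
    return serve


front = ReplicatedService([make_server(f"s{i}") for i in range(3)])
print([front.request("q") for _ in range(4)])  # ['s0:Q', 's1:Q', 's2:Q', 's0:Q']
print(front.load)  # {0: 2, 1: 1, 2: 1}
```

This spreading of load is exactly the "modest performance scaling" of the first step: each request still pays the full cost of the middle tier protocol, which is why the second step moves performance-critical work into the third tier instead.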
Figures 3 and 4 give examples of database and object services implemented in different ways in the commodity 3-tier approach.
Fig. 3: Some example implementations of an Oracle Database in the 3-tier Approach
Fig. 4: More examples of the 3-tier Approach
4: HPcc as a CORBA Facility
CORBA is defined as a suite of software layers with the architecture illustrated in fig. 5.
Fig. 5: Software Layers in CORBA
We see (currently 15) basic services, such as naming and persistence, layered below a set of general capabilities, or horizontal facilities in the ComponentWare jargon. We suggest that HPcc is naturally placed here, as it is designed to provide high performance to essentially any application area. Application areas appear in fig. 5 as vertical or specialized facilities that provide domain-specific distributed object support. Here we see mainstream commercial applications such as manufacturing, banking and mapping. The vertical and horizontal facilities are associated with frameworks: the universal interfaces through which they are linked together and to user programs (or objects).
Note that CORBA currently supports only relatively simple computing models, including "embarrassingly parallel" as in transaction processing, or dataflow as in the CORBA workflow facility. The modeling and simulation community appears likely to evolve its HLA standard into a new vertical CORBA facility. HPcc therefore fills a gap and is defined as the HPCC (here we are using capital C’s) CORBA horizontal facility. In the following paragraph, we point out that this allows us to define a commercialization strategy for high performance computing technologies.
Academia and industry should now experiment with the design and implementation of HPcc as a general framework for providing high performance CORBA services. Then one or more industry-led groups propose HPcc specifications as a new horizontal facility. A process similar to the MPI or HPF forum activities leads to consensus and the definition of an HPcc facility standard. This then allows industry (and academia) to compete in the implementation of HPcc within an agreed interoperable standard.
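Basic services such as naming sit below the facilities and let a horizontal facility publish objects that vertical applications later locate by name. The Python sketch below is a hypothetical, in-process registry that mimics that role; it does not reproduce the interface of the actual CORBA naming service, and the strings it binds are placeholders for real object references.

```python
from typing import Any, Dict, List


class AlreadyBound(Exception):
    pass


class NotFound(Exception):
    pass


class NamingService:
    """Hypothetical in-process naming service: maps path-like names to
    object references. It mimics the role of a CORBA naming service, not
    its actual interface."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Any] = {}

    def bind(self, name: str, obj: Any) -> None:
        if name in self._bindings:
            raise AlreadyBound(name)
        self._bindings[name] = obj

    def rebind(self, name: str, obj: Any) -> None:
        self._bindings[name] = obj  # replace silently, e.g. on an upgrade

    def resolve(self, name: str) -> Any:
        try:
            return self._bindings[name]
        except KeyError:
            raise NotFound(name) from None

    def list_names(self, prefix: str = "") -> List[str]:
        return sorted(n for n in self._bindings if n.startswith(prefix))


# A horizontal facility such as HPcc would publish its services by name;
# vertical (domain) applications resolve them without knowing where or
# how they are implemented.
naming = NamingService()
naming.bind("hpcc/solvers/sequential", "solver@workstation")
naming.bind("hpcc/solvers/parallel", "solver@parallel-machine")
naming.bind("banking/accounts", "accounts@mainframe")
print(naming.list_names("hpcc/"))
# ['hpcc/solvers/parallel', 'hpcc/solvers/sequential']
```

Using rebind to point a name at a new implementation is one way the third-tier upgrade described earlier could stay invisible to applications.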
Related Work and References
The base material on CORBA can be found on the OMG Web site [1]. This includes OMG Formal Documentation [2], Recently Adopted Specifications and The Technical Committee Work in Progress [3], which offers up-to-date on-line information on the individual RFPs and their adoption process. One such RFP of particular relevance to this chapter, CORBA Components [4], has recently been posted by OMG in response to the Position Paper [5] by IBM, Netscape, Oracle and SunSoft, with mid November '97 as the first submission deadline. It is expected that this activity will result in a JavaBeans-based ComponentWare model for CORBA. The primary source of information on JavaBeans is the JavaSoft Web site [6]. See also the recent O'Reilly book by Robert Englander [7]. A good recent reference on Microsoft COM (Component Object Model) is the Microsoft Press book by Dale Rogerson [8]. The CORBA/COM integration model is specified in the Core CORBA 2.1 document [9], Chapters 14 (Interworking), 15 (COM/CORBA mapping) and 16 (OLE Automation/CORBA mapping). A good overview of CORBA/Java integration and the Object Web concepts can be found in the recent book by Robert Orfali and Dan Harkey [10]. The two currently most popular commercial Java ORBs are OrbixWeb by IONA [11] and VisiBroker for Java by Visigenic [12]. The first public domain ORBs have recently become available, such as JacORB [13] by the Freie Universität Berlin, omniORB2 by Olivetti and Oracle Research Laboratory [14] and Electra by Olsen & Associates [15]. These public domain ORBs facilitate several ongoing research projects on using CORBA for reliable distributed and/or high performance computing, which we list below.
Nile, a National Challenge Computing Project [16], is developing a distributed computing solution for the CLEO High Energy Physics experiment using a self-managing, fault-tolerant, heterogeneous system of hundreds of commodity workstations, with access to a distributed database in excess of 100 terabytes. These resources are spread across the United States and Canada at 24 collaborating institutions. Nile is CORBA-based and uses the Electra ORB.
Douglas Schmidt, Washington University, conducts research on high performance implementations of CORBA [17], geared towards real-time image processing and telemedicine applications on workstation clusters over ATM. His high performance ORB, TAO [18], based on an optimized version of the public domain IIOP implementation from SunSoft, outperforms commercial ORBs by a factor of 2-3. Steve Vinoski (IONA) and Douglas Schmidt address current R&D topics on the use of CORBA for distributed computing in their C++ Report column [19]. Richard Muntz, UCLA, explores the use of CORBA for building large-scale object-based data mining systems. His OASIS (Open Architecture Scientific Information System) [20] environment for scientific data analysis allows users to store, retrieve, analyze and interpret selected datasets from a large collection of scientific information scattered across the heterogeneous computational environments of Earth Science projects such as EOSDIS.
NPAC is developing a public domain Java-based IIOP and HTTP server, WORB [21], with the alpha release planned for the end of '97. New anticipated developments in CORBA-based distributed computing include emergent CORBA facilities in specialized areas such as Workflow [22] or Distributed Simulations [23].
An elementary overview of "classic" Web Technologies can be found in reference [24]. Reference [25] is one of a set of recent interesting Byte articles on COM, CORBA and Javabeans in the commercial 3-tier model. Reference [26] is the beginning of a compilation of input for defining interfaces for a horizontal CORBA HPcc facility or, more precisely, it is aimed at one aspect: a seamless interface for users to computing systems.
Reference [27] is a compilation of resources relevant to use of Java in Computational Science and Engineering and in particular has proceedings of two workshops in this area. References [28-31] are from NPAC and describe some aspects of HPcc activities there.
1. Object Management Group, http://www.omg.org
2. OMG Formal Documentation, http://www.omg.org/library/specindx.htm
3. OMG TC Work in Progress, http://www.omg.org/library/schedule.htm
4. "CORBA Component Model RFP", http://www.omg.org/library/schedule/CORBA_Component_Model_RFP.htm
5. "CORBA Component Imperatives" - a position paper by IBM, Netscape, Oracle and SunSoft, http://www.omg.org/news/610pos.htm
6. JavaBeans, http://www.javasoft.com/beans/
7. "Developing JavaBeans" by Robert Englander, O'Reilly & Associates, June '97, ISBN: 1-56592-289-1.
8. "Inside COM - Microsoft's Component Object Model" by Dale Rogerson, Microsoft Press, 1997, ISBN: 1-57231-349-8.
9. CORBA 2.0/IIOP Specification, http://www.omg.org/corba/c2indx.htm
10. "Client/Server Programming with Java and CORBA" by Robert Orfali and Dan Harkey, Wiley, Feb '97, ISBN: 0-471-16351-1.
11. OrbixWeb for Java from IONA, http://www.iona.com
12. VisiBroker for Java from Visigenic, http://www.visigenic.com
13. JacORB by Freie Universität Berlin, http://www.inf.fu-berlin.de/~brose/jacorb/
14. omniORB2 by Olivetti and Oracle Research Laboratory, http://www.orl.co.uk/omniORB/omniORB.html
15. The Electra Object Request Broker, http://www.olsen.ch/~maffeis/electra.html
16. Nile: National Challenge Computing Project, http://www.nile.utexas.edu/
17. "Research on High Performance and Real-Time CORBA" by Douglas Schmidt, http://www.cs.wustl.edu/~schmidt/corba-research-overview.html
18. "Real-time CORBA with TAO (The ACE ORB)" by Douglas Schmidt, http://www.cs.wustl.edu/~schmidt/TAO.html
19. "Object Interconnections" by Steve Vinoski, column in C++ Report, http://www.iona.com/hyplan/vinoski/
20. E. Mesrobian, R. Muntz, E. Shek, S. Nittel, M. LaRouche, and M. Krieger, "OASIS: An Open Architecture Scientific Information System," 6th International Workshop on Research Issues in Data Engineering, New Orleans, La., February 1996. See also http://techinfo.jpl.nasa.gov/JPLTRS/SISN/ISSUE36/MUNTZ.htm
21. WORB - Web Object Request Broker, http://osprey7.npac.syr.edu:1998/iwt98/projects/worb
22. Workflow Management Coalition, http://www.aiai.ed.ac.uk/project/wfmc/
23. High Level Architecture and Run-Time Infrastructure by DoD Modeling and Simulation Office (DMSO), http://www.dmso.mil/hla
24. Geoffrey Fox, "Introduction to Web Technologies and Their Applications", Syracuse report SCCS-790, http://www.npac.syr.edu/techreports/html/0750/abs-0790.html
25. Article in August 1997 Byte on 3-tier commercial computing model, http://www.byte.com/art/9708/sec5/art1.htm
26. Mark Baker (Portsmouth), Collection of Links relevant to HPcc Horizontal CORBA facility and seamless interfaces to HPCC computers, http://www.sis.port.ac.uk/~mab/Computing-FrameWork/
27. Compilation of References to use of Java in Computational Science and Engineering, including proceedings of Syracuse (December 96) and Las Vegas (June 97) meetings, http://www.npac.syr.edu/projects/javaforcse
28. Geoffrey Fox and Wojtek Furmanski, "Petaops and Exaops: Supercomputing on the Web", IEEE Internet Computing, 1(2), 38-46 (1997); http://www.npac.syr.edu/users/gcf/petastuff/petaweb
29. Geoffrey Fox and Wojtek Furmanski, "Java for Parallel Computing and as a General Language for Scientific and Engineering Simulation and Modeling", Concurrency: Practice and Experience 9(6), 4135-426 (1997). Web version in ref. [27].
30. Compilation of NPAC Activities in Web-based HPcc, http://www.npac.syr.edu/projects/webspace/webbasedhpcc/
31. Geoffrey Fox and Wojtek Furmanski, "Use of Commodity Technologies in a Computational Grid", chapter in book to be published by Morgan-Kaufmann and edited by Carl Kesselman and Ian Foster.
Glossary
CORBA (Common Object Request Broker Architecture)
An approach to cross-platform, cross-language distributed objects developed by a broad industry group, the OMG. CORBA specifies basic services (such as naming, trading and persistence), the protocol IIOP used by communicating ORBs, and is developing higher level facilities, which are object architectures for specialized domains such as banking (fig. 5).
COM (Component Object Model)
Microsoft’s Windows object model, which is being extended to distributed systems and multi-tiered architectures. ActiveX controls are an important class of COM object, which implement the component model of software. The distributed version of COM used to be called DCOM.
ComponentWare
An approach to software engineering with software modules developed as objects with particular design frameworks (rules for naming and module architecture) and with visual editors both to interface to properties of each module and also to link modules together.
HPcc (High Performance Commodity Computing)
NPAC project to develop a high performance computing software environment based on commodity computing. Note that we have dropped the "communications" referred to in the classic HPCC acronym. This is not because it is unimportant but rather because a commodity approach to high performance networking is already being adopted. We focus on high level services such as programming, data access and visualization, which we abstract to the rather wishy-washy "computing" in the HPcc acronym.
HPCC (High Performance Computing and Communication)
Originally a formal federal initiative; even after this ended in 1996, the term has been used to describe the field devoted to solving large-scale problems with powerful computers and networks.
HTTP (Hypertext Transfer Protocol)
A stateless transport protocol allowing control information and data to be transmitted between web clients and servers.
IIOP (Internet Inter Orb Protocol)
A stateful protocol allowing CORBA ORBs to communicate with each other, and transfer both the request for a desired service and the returned result.
JDBC (Java Database Connectivity)
A set of interfaces (Java methods and constants) in the Java 1.1 enterprise framework, defining uniform access to relational databases. JDBC calls from a client or server Java program link to a particular "driver" that converts these universal database access calls (establish a connection, SQL query, etc.) to the particular syntax needed to access essentially any significant database.
Javabean
Part of the Java 1.1 enhancements defining design frameworks (particular naming conventions) and inter Javabean communication mechanisms for Java components with standard (Bean box) or customized visual interfaces (property editors). Enterprise Javabeans are Javabeans enhanced for server side operation with capabilities such as multi user support. Javabeans are Java’s component technology and in this sense are more analogous to ActiveX than either COM or CORBA. However Javabeans augmented with RMI can be used to build a "pure Java" distributed object model.
Object Web
The evolving systems software middleware infrastructure obtained by merging CORBA with Java. Correspondingly, merging CORBA with Javabeans gives Object Web ComponentWare. This is expected to compete with the COM/ActiveX architecture from Microsoft.
OMG (Object Management Group)
OMG is the organization of over 700 companies that is developing CORBA through a process of call for proposals and development of consensus standards.
ORB (Object Request Broker)
Used in both clients and servers in CORBA to enable remote access to objects. ORBs are available from many vendors and communicate via the IIOP protocol.
RMI (Remote Method Invocation)
A somewhat controversial part of Java 1.1 in the enterprise framework which specifies the remote access to Java objects with a generalization of the UNIX RPC (Remote Procedure Call).
Web Client
Originally web clients displayed HTML and related pages, but now they support Java applets that can be programmed to give web clients the necessary capabilities to support general enterprise computing. The support of signed applets in recent browsers has removed crude security restrictions that handicapped previous use of applets.
Web Servers
Originally Web Servers supported HTTP requests for information (basically HTML pages), but they also included the invocation of general server side programs using the very simple but arcane CGI (Common Gateway Interface). A new generation of Java servers has enhanced capabilities, including server side Java program enhancements (servlets) and support of stateful permanent communication channels.