As in all areas of computing, the software support for HPDC is built up in a layered fashion with no clearly agreed architecture for these layers. We will describe software for a target metacomputer consisting of a set of workstations. The issues are similar if you include more general nodes, such as PCs or MPPs. Many of the existing software systems for HPDC only support a subset of possible nodes, although ``in principle'' all could be extended to a general set.
In a layered approach, we start with a set of workstations running a particular variant of UNIX with a communication capability set up on top of TCP/IP. As high-speed networks such as ATM have become available, there has been substantial research into optimizing communication protocols to take advantage of the increased hardware capability. It seems likely that the flexibility of TCP/IP will ensure that it remains a building block of all but the most highly optimized and specialized HPDC communication systems. On top of these base distributed computing operating system services, two classes of HPDC software support have been developed.
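The TCP/IP layer that these services build on can be illustrated with a minimal sketch: a message makes a round trip over a TCP connection on the loopback interface. The function names and the payload are illustrative, not part of any HPDC system.

```python
import socket
import threading

def echo_server(sock):
    """Accept one connection and echo its payload back unchanged."""
    conn, _ = sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)

def tcp_round_trip(payload: bytes) -> bytes:
    """Send payload over a loopback TCP connection and return the echo."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    t = threading.Thread(target=echo_server, args=(server,))
    t.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(payload)
        reply = client.recv(1024)
    t.join()
    server.close()
    return reply
```

Everything above this socket interface, from message-passing libraries to Web servers, is layered software in the sense described here.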
The first, and most important, of these we call the MCPE (Metacomputing Programming Environment), which is the basic application programming interface (API). The second class of software can be termed the MCMS (Metacomputing Management System), which provides the overall scheduling and related services.
For MCMS software, the best known products are probably LoadLeveler (produced by IBM for the SP-2 parallel machine), LSF (Load Sharing Facility, from Platform Computing), DQS (Distributed Queuing System), and Condor (from the University of Wisconsin-Madison). Facilities provided by this software class include batch and other queues, scheduling and node allocation, process migration, load balancing, and fault tolerance. None of the current systems is very mature, and none integrates HPDC with parallel computing or with such standards as HPF and MPI.
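The node-allocation and load-balancing service at the heart of an MCMS can be sketched with a toy policy (not the actual algorithm of any of the products named above): jobs are drawn from a queue and each is placed on the currently least-loaded node.

```python
from collections import deque

def schedule(jobs, node_load):
    """Place each queued (job, cost) pair on the least-loaded node.

    node_load maps node name -> current load; it is updated in place.
    Returns a dict mapping job name -> assigned node.
    """
    queue = deque(jobs)
    placement = {}
    while queue:
        job, cost = queue.popleft()
        node = min(node_load, key=node_load.get)  # least-loaded node
        node_load[node] += cost
        placement[job] = node
    return placement

# Hypothetical workstation cluster and job stream:
nodes = {"ws1": 0.0, "ws2": 0.0, "ws3": 0.0}
plan = schedule([("a", 2.0), ("b", 1.0), ("c", 1.0), ("d", 2.0)], nodes)
```

A production MCMS layers the other listed facilities, such as process migration and fault tolerance, on top of a placement loop of roughly this shape.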
We can illustrate two possible MCPEs, or APIs, with PVM and the World Wide Web. PVM offers basic message-passing support for heterogeneous nodes. It also has the utilities needed to run ``complete jobs'' with the appropriate processes spawned.
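The PVM usage pattern, a master spawning workers and exchanging messages with them, can be sketched as follows. This simulation uses Python's multiprocessing queues in place of the real PVM C API (pvm_spawn, pvm_send, pvm_recv); the squaring task is purely illustrative.

```python
from multiprocessing import Process, Queue

def worker(rank, inbox, outbox):
    """Receive one task, compute, and send the result back to the master."""
    task = inbox.get()                # analogue of pvm_recv
    outbox.put((rank, task * task))   # analogue of pvm_send

def master(ntasks=4):
    """Spawn workers, distribute tasks, and gather results."""
    inbox, outbox = Queue(), Queue()
    procs = [Process(target=worker, args=(r, inbox, outbox))
             for r in range(ntasks)]
    for p in procs:
        p.start()                     # analogue of pvm_spawn
    for t in range(ntasks):
        inbox.put(t + 1)
    results = dict(outbox.get() for _ in range(ntasks))
    for p in procs:
        p.join()
    return sorted(results.values())
```

The essential point is the same as in PVM: processes on (possibly heterogeneous) nodes are started under program control and communicate only by explicit messages.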
The World Wide Web is, of course, a much more sophisticated environment, which currently offers HPDC support in the information sector. However, recent developments, such as Java, support embedded downloaded applications, which naturally enables embarrassingly parallel HPDC computations. We have proposed a more ambitious approach, termed WebWork, in which the most complex HPDC computations would be supported. This approach has the attractive feature that one can integrate HPDC support for database and compute capabilities; all three examples discussed in Section 2 required this. One can be concerned that the World Wide Web will carry too much overhead, given its use of the HTTP communication protocol and the substantial Web-server processing involved. We expect that the benefits of the flexibility of WebWork will outweigh the disadvantages of these additional overheads. We can also expect the major Web technology development around the world to lead to much more efficient server and communication systems.
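The Web-based model, and the HTTP overhead it incurs per task, can be sketched as follows. A ``compute server'' receives each task as an HTTP request and returns the result in the response; the /square endpoint and the loopback deployment are assumptions for illustration, not part of WebWork itself.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class ComputeHandler(BaseHTTPRequestHandler):
    """Serve paths of the form /square/<n>, returning n*n in the body."""
    def do_GET(self):
        n = int(self.path.rsplit("/", 1)[1])
        body = str(n * n).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # silence per-request logging
        pass

def submit_tasks(values):
    """Ship each task to the compute server over HTTP and collect results."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), ComputeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    results = [int(urllib.request.urlopen(
        f"http://127.0.0.1:{port}/square/{v}").read()) for v in values]
    server.shutdown()
    return results
```

Each task pays for connection setup, HTTP headers, and server dispatch, which is exactly the per-request overhead discussed above; the compensating benefit is that any Web client on any node can participate.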