This paper can be viewed as a feasibility study of the success and viability of HPCC. We analyze the majority of large-scale real-world problems and find it relatively clear that all ``can'' perform well on large-scale parallel and distributed systems. We put ``can'' in quotes because most applications ``will not,'' in fact, ``run'' well today, and it is quite hard to find the return on investment necessary to justify significant industrial investment in the use of HPCC systems.
We believe that this is not a failure of the concept or the work on
HPCC up to now, but rather that, now that we know what to do, we must
build a robust HPCC software and systems infrastructure. In an
accompanying paper [Fox:96c], we have suggested that the key to HPCC is
implementation of the essential technologies, concepts, and capabilities
on top of a pervasive technology and application base. In contrast,
today's HPCC is ``top-down,'' with a set of beautiful ``niche''
technologies that we cannot afford to build, maintain, and integrate into a
robust infrastructure. Our proposed bottom-up approach is illustrated in
Figure , which shows both ``high-end''
applications---typically, the so called Grand Challenges
[HPCC:96a]---and the ``integrated'' metaproblems---or national
challenges. In the analysis that follows, we group possible industrial
uses of parallel computers into 33 broad classes, which include both
Grand and National Challenges. In the final section, we describe five
particular applications to illustrate the analysis of the relevance of
HPCC in their solution. These include Grand Challenges, such as Monte
Carlo simulation, computational fluid dynamics, and molecular dynamics,
as well as National Challenges, such as multimedia (Web) information
systems, manufacturing, and command and control. The latter three
areas are ``metaproblems'' (defined precisely in Section ), which
integrate several distributed
applications including component grand challenges, such as vehicle and
process simulation in manufacturing, and weather prediction in command
and control.
Figure: Integration of Grand Challenges and Pervasive Technologies
The bottom-up approach of Figure is proposed so that
one can build HPCC applications and software on a commercially viable
base [Fox:95k]. There are two such natural technology
springboards---firstly, shared memory multiprocessors, and secondly,
Web or distributed computing. The first choice leads to the
interesting distributed shared memory environments, whereas the second
is naturally a message passing environment. We expect that both these
``viable bases'' should and will be explored. One important feature
of the broader distributed computing base is that it ``by definition''
includes ``everything,'' and so one can build complete metaproblems in
terms of a single technology framework.
From this point of view, this paper can be considered a summary of results and requirements for ``top of the pyramid'' software, algorithms, and applications that need to be used in designing and building the bottom-up HPCC technology.
Section reviews our general study of the structure of
problems, as this is helpful in understanding the appropriate hardware
and software system in each case.
In Section , we show how the different problem
categories or architectures are addressed by parallel software systems
with different capabilities. We give illustrative examples, but not
an exhaustive list of existing software systems with these
characteristics. We consider High Performance Fortran and its
extensions as a data parallel language; message passing systems, such
as those supplied with commercial multicomputers; as well as
approaches to software integration. In Section , we point out that
our earlier classification of problems omitted
metaproblems---problems built up from several components---each
of which could have its own parallelization issues.
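To make this concrete, the following is a deliberately toy sketch in Python (the component names and cost functions are invented for illustration, not taken from the survey): a hypothetical ``manufacturing'' metaproblem coupling a regular computation, which would parallelize naturally in data parallel style, with an irregular search better suited to task parallelism.

```python
# Toy metaproblem sketch (hypothetical components, illustrative only):
# a "manufacturing" workflow coupling two independently parallelizable parts.

def simulate_airflow(design):
    """Component 1: regular grid computation -- naturally data parallel."""
    return sum((design - i) ** 2 for i in range(8))  # stand-in for a CFD cost

def schedule_assembly(cost):
    """Component 2: combinatorial search -- better suited to task parallelism."""
    return min(cost / k for k in range(1, 5))        # stand-in for scheduling

# The metaproblem integrates both components; in a full HPCC setting each
# could run on different hardware under different parallel software.
best_design = min(range(10), key=lambda d: schedule_assembly(simulate_airflow(d)))
```

The point is only structural: each component has its own natural parallelization, and the metaproblem is their composition.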
Note that our discussion is far more complete in the classic HPCC (parallel processing, MPP) areas, as we have far more examples and a clearer understanding of the programming paradigms there than for the National (NII) challenges. However, we indicate where NII (Web) concepts, such as Java or VRML, might fit as our environments evolve to include these rapidly developing technologies.
In Section , we combine the previous sections and
describe informally the problem architecture, and possible software and
hardware needs for a selection of ``real-world'' applications. We have
grouped our discussion into five broad case studies: Monte Carlo
methods; computational chemistry; manufacturing and computational fluid
dynamics; command and control; and InfoVISiON, or the delivery of
multimedia information on the ``digital'' superhighway. These cover a
range of software issues including, as we discussed, both the Grand
and National Challenges, and span both the bottom and top of the
pyramid. We conclude with a glossary of some terms used here and in
the accompanying paper [Fox:96c].
The applications and their classification come from a survey of New York State industry [Fox:92e], [Fox:94a], [Fox:94b], [Fox:94c], [Fox:94h], [Fox:94i], [Mills:93a]. Tables 1--4 summarize the industrial opportunities for parallel computing in the form in which we will use them. Some 80 different applications from the survey have been broken up into 33 distinct areas. This division is certainly somewhat arbitrary, and there are many overlaps (and omissions). The importance, difficulty of implementation, and degree of risk also differ from case to case. However, these issues will not be discussed here.
Table 1 describes the general guidelines used in organizing Table 4. Note that we did not directly cover academic areas; a more complete list (which included our industrial table) was produced by the Petaflops meeting [Peta:94a]. Notice that Tables 1--4 are organized around the concept of ``information.'' This corresponded to an early realization from the survey that the major industrial opportunities for HPCC in New York State were information related. Thus, for instance, simulation is subtitled ``Information Production,'' with, say, computational fluid dynamics simulations providing information to be used in either manufacturing (application 32) or education (application 33). It is not directly relevant to this paper, but the results of this survey caused the ACTION program to refocus its efforts and evolve into InfoMall [Fox:93c], [Fox:94f], [Fox:94h], [Fox:95b], [Infourl:95a], [Mills:94a]. Here, ``Info'' in InfoMall refers to the information-based application focus, and ``Mall'' to the use of a virtual corporation (groups of ``storeholders'') to produce the complex integrated applications enabled by HPCC.
The first column of Table 4 contains the area label and some sample
applications. There is also a pointer to Section , if
appropriate. Algorithmic and other comments are in column two. The
third and fourth columns describe, respectively, the problem
architecture and an estimate of appropriate parallel software
approach. The background for these two columns is described in the
following two sections.
Note on Language: HPF and MPF use Fortran for illustration;
one can substitute parallel Java, C, C++, or any similar extension of
data parallel or message passing languages.
This paper is not intended to advocate a particular parallel software environment or language. Rather, we want to describe the broad capabilities of, and give examples of, the parallel programming paradigm needed for the applications of Table 4. We believe that the programming functionality needed by a particular application is broadly determined by the problem architecture described in the following section. In discussing software needs, we do not cover all the components of the parallel software environment, but only those relevant for expressing problems.
For this reason, we use broad software classifications using, for instance, MPF (Fortran plus message passing) as typical of all similar explicit messaging systems---one could substitute here C plus message passing, or Fortran M programming environments. Again, PVM, MPI, or any such message passing system could be used without changing the significance of the tables. High Performance Fortran is used as a typical data parallel language, although this has an evolving definition and similar C++, or even Java, environments could well be more attractive, and can be substituted in the table.
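To make the contrast between the two paradigms concrete, here is a minimal sketch (in Python rather than Fortran, purely for illustration) of the same reduction expressed both ways: as a single global operation in the data parallel (HPF-like) style, and as explicit sends and receives between process ranks in the message passing (MPF/MPI-like) style, with queues standing in for the communication layer.

```python
# Illustrative sketch only: Python stands in for HPF/MPF here.
import queue

data = list(range(16))

# Data parallel style (HPF-like): one global operation over the whole
# array; the system would handle distribution and the parallel reduction.
total_dp = sum(x * x for x in data)

# Message passing style (MPF/MPI-like): each "process" owns a chunk,
# computes a partial result, and explicitly sends it to rank 0.
NPROC = 4
inbox = [queue.Queue() for _ in range(NPROC)]  # stand-in for the network

def worker(rank):
    chunk = data[rank::NPROC]           # owner-computes decomposition
    partial = sum(x * x for x in chunk)
    inbox[0].put(partial)               # explicit "send" to rank 0

for rank in range(NPROC):
    worker(rank)

total_mp = sum(inbox[0].get() for _ in range(NPROC))  # rank 0 "receives"
assert total_dp == total_mp
```

Either notation can express the computation; as the tables emphasize, it is the problem architecture that determines which is the more natural fit.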