Motivation


Recent advances in hardware technology have introduced a variety of high performance architectures, ranging from pipelined uniprocessor systems and shared memory multiprocessors to tightly- and loosely-coupled distributed memory machines. As heterogeneity in computer hardware appears to be a permanent factor, the supporting software should exhibit a high degree of homogeneity, despite all differences in the underlying hardware. We are therefore looking for a software framework able to transform a wide range of heterogeneous architectures into a single, uniform operational environment. One way to satisfy this requirement is a uniform language layer; another is a portable library which can then be used from many standard languages. Both solutions screen the programmer from architectural differences, either through compilers or through runtime libraries. Unfortunately, a common language layer is in most cases not sufficient, as the language semantics cannot express arbitrary programming needs. The best example is the newly introduced High Performance Fortran standard, which can express regular (synchronous and loosely-synchronous) parallel problems, but leaves asynchronous, irregular tasks, best expressed by functional parallelism, simply out of reach. A portable runtime library, on the other hand, provides a defined set of lower level operations which can be used from whichever language best suits a given problem. A serious drawback of a library is its low abstraction level, which can diminish the expressiveness of the whole programming environment.

Following these conclusions, we decided to provide a runtime library that is portable across different, heterogeneous architectures and accessible from different programming languages. To make the library more expressive, we looked for programming paradigms that could enhance its functionality. We found multithreading to be a powerful mechanism, offering both performance gains and a much more modular and convenient programming environment. Interestingly enough, these benefits can be enjoyed equally by programs written for uniprocessor and shared memory multiprocessor systems. The main benefit on single-processor machines comes from the additional concurrency at the programming level: multiple threads allow IO operations to overlap with CPU activity, and make it easier to implement concurrent and asynchronous activities. On multiprocessors, multithreading makes true concurrency very efficient, since different threads within a single process can be executed in parallel by different physical processors. On both architectures, using multiple threads is much more efficient than using multiple heavy-weight processes, because context switching between threads is cheaper and inter-thread communication is fast (through data structures shared within a process). Furthermore, a multithreaded library can be naturally exploited by a rich class of procedural languages, as procedures (functions or subroutines) map intuitively onto concurrent threads.
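The overlap of IO and computation described above can be sketched with POSIX threads. This is a minimal illustration, not part of the library itself; the function names and the payload value are purely hypothetical:

```c
/* Sketch: overlap a simulated IO wait with computation using POSIX threads.
   All names here are illustrative, not part of the library described. */
#include <pthread.h>
#include <unistd.h>

static void *io_task(void *arg)
{
    usleep(50000);               /* stands in for a blocking read or receive */
    *(int *)arg = 42;            /* deliver a hypothetical payload */
    return NULL;
}

/* Runs the IO in a separate thread while the calling thread keeps
   computing; returns the sum of both results. */
long overlap_io_and_compute(void)
{
    pthread_t tid;
    int io_result = 0;
    long sum = 0;

    pthread_create(&tid, NULL, io_task, &io_result);
    for (long i = 1; i <= 1000; i++)   /* useful work during the IO wait */
        sum += i;
    pthread_join(tid, NULL);           /* blocking join, no busy polling */
    return sum + io_result;
}
```

Note that the computing thread never polls for completion; the blocking join suffices, which is exactly the convenience multithreading buys on a uniprocessor.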

Although multithreaded processes have been around for quite some time, they have not been fully exploited on the most popular class of parallel architectures: distributed memory parallel machines. Since inter-node communication can be treated as a special case of IO, all the benefits of multithreading in shared memory systems should apply equally to their distributed memory counterparts. Parallel programs for multicomputers are also simplified: non-blocking and asynchronous communication calls can be removed, while the familiar blocking semantics, applied per thread, still assures latency hiding and an overlap between computation and communication.

Following the conclusions of our research, we decided that our library should be multithreaded. The multithreading capabilities can be exploited directly on any uniprocessor or shared memory multiprocessor system. For tightly- and loosely-coupled distributed memory architectures, the library provides communication channels, accessible to all threads attached to a given channel. Channels may be used to send and receive messages (message passing) between threads belonging to the same or to two different processes. Finally, channels, as abstract communication links, allow interactions between two processes placed arbitrarily on any Internet-accessible computation node. An application can therefore be composed of cooperating modules, each placed on a different, possibly heterogeneous, hardware platform.
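Within a single process, the channel idea can be sketched as a thread-safe queue with a blocking receive, built on POSIX threads. This is a minimal single-process sketch under assumed names (`channel`, `chan_send`, `chan_recv`); it is not the library's actual channel API, and it omits the inter-process and inter-node cases:

```c
/* Sketch of the channel idea: a thread-safe queue with blocking receive,
   using POSIX threads. Names are illustrative, not the library's API. */
#include <pthread.h>

#define CHAN_CAP 16

typedef struct {
    int buf[CHAN_CAP];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
} channel;

void chan_init(channel *c)
{
    c->head = c->count = 0;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->nonempty, NULL);
}

/* Send a message (for brevity, assumes the channel is not full). */
void chan_send(channel *c, int msg)
{
    pthread_mutex_lock(&c->lock);
    c->buf[(c->head + c->count) % CHAN_CAP] = msg;
    c->count++;
    pthread_cond_signal(&c->nonempty);
    pthread_mutex_unlock(&c->lock);
}

/* Blocking receive: the calling thread sleeps until a message arrives. */
int chan_recv(channel *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->count == 0)
        pthread_cond_wait(&c->nonempty, &c->lock);
    int msg = c->buf[c->head];
    c->head = (c->head + 1) % CHAN_CAP;
    c->count--;
    pthread_mutex_unlock(&c->lock);
    return msg;
}

static void *producer(void *arg)
{
    chan_send((channel *)arg, 7);   /* hypothetical message value */
    return NULL;
}

/* Demo: one thread sends, the caller blocks in chan_recv until delivery. */
int chan_demo(void)
{
    static channel ch;
    pthread_t tid;
    chan_init(&ch);
    pthread_create(&tid, NULL, producer, &ch);
    int msg = chan_recv(&ch);
    pthread_join(tid, NULL);
    return msg;
}
```

The receiving thread simply blocks; when the endpoints sit in different processes or on different nodes, the same blocking semantics lets other threads of the process run during the wait, which is the latency-hiding argument made above.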

In designing and implementing our library we have reached our goal: presenting a set of different, heterogeneous architectures in a uniform way. The library supports simple uniprocessor programs as well as complex parallel applications, possibly scattered across a collection of various parallel and sequential computers.