Abstract:
The integration of compiled and interpreted HPF gives us an opportunity to design a powerful application development environment targeted at high-performance parallel and distributed systems. This Web-based system follows a three-tier model. The Java front-end holds proxy objects which can be manipulated by an interpreted Web client (a Java applet) interacting dynamically with the compiled code through a tier-2 server. Although targeted at an HPF back-end, the system's architecture is independent of the back-end language and can be extended to support other high-performance languages.
The situation is different for high-performance, parallel, or distributed architectures. Performance tuning, debugging, and data analysis are more difficult there, and yet tools for these tasks are not widely available. We also face a problem of software integration, as different software components often follow different parallel programming paradigms. On the other hand, we witness rapid progress in Web-based technologies that are inherently distributed, heterogeneous, and platform-independent. Of particular interest are the definition and standardization of interfaces that enable cross-platform software interoperability. In this report we describe a system that borrows from these Web technologies.
The integration of compiled and interpreted HPF gives us an opportunity to design a powerful application development environment targeted at high-performance parallel and distributed systems. This DARP environment includes a source-level debugger, data visualization and data analysis packages, and an HPF interpreter. The capability to alternate between compiled and interpreted modes provides the means for interacting with the code in real time, while preserving acceptable performance. Thanks to interpreted interfaces, typical of Web technologies, we can use our system as a software integration tool.
The fundamental feature of our system is that the user can interrupt execution of the compiled code at any point and gain interactive access to the data. For visualizations, the execution is resumed as soon as the data transfer is completed. For data analysis, the interrupted code pauses and waits for the user's commands. The set of available commands closely reproduces the functionality of a typical debugger (setting breakpoints, displaying or modifying values of variables, etc.). A unique feature of our system, however, is that values of distributed arrays can be modified by issuing HPF commands; in this sense, our system can be thought of as an HPF interpreter. For more complex data transformations, the user can dynamically link precompiled functions written in HPF or other languages. This enables rapid prototyping. In particular, parallel libraries that do not necessarily follow the HPF computational model can in this way be dynamically integrated with the HPF code through the HPF extrinsic interface.
The functionality of our system is further increased by implementing proxy libraries in Java. This allows us to design and develop the DARP system as a three-tier system rather than a traditional client-server one. We can now treat components of the DARP system as distributed objects to be implemented as CORBA ORBlets or JavaBeans. We use this mechanism for dynamic embedding of calls to a visualization system (such as SciViz[2]), or for coupling the system with WebFlow[3].
The paper is organized as follows. In Section 2 we discuss the overall architecture of the system in the context of the High Performance Commodity Computing paradigm. Sections 3-6 describe the three-tier design of the DARP system and its components: the tier-2 DARP server, the instrumentation server, the DARP front-end, and the HPF interpreter, respectively. Section 7 demonstrates the integration of the DARP system with a visualization package using a proxy library. Finally, in Section 8 we give our summary and conclusions.
The growing power and capability of commodity computing and communication technologies is largely driven by commercial distributed information systems[1]. Such systems are built with CORBA, Microsoft's DCOM, JavaBeans, and less sophisticated Web and network approaches. These can all be abstracted to a three-tier model with largely independent clients connected to a distributed network of servers. The servers host various services, including object and relational databases and, of course, parallel and sequential computing. High performance can be obtained by combining concurrency at the middle server tier with optimized parallel back-end services. The resultant system combines the performance needed for large-scale HPCC applications with the rich functionality of commodity systems.
The design of the DARP system follows this idea of High Performance commodity computing (HPcc). Conceptually, the architecture of this three-tier system can be described as follows (cf. Figure 1): the DARP system uses an interpreted Web client interacting dynamically with compiled code. At this time the system uses an HPF back-end, but the architecture is independent of the back-end language. The Java or JavaScript front-end holds proxy objects produced by an HPF front-end operating on the back-end code. These proxy objects can be manipulated with interpreted Java commands to request additional processing, visualization, and other interactive computational steering and analysis.
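The proxy-object idea above can be sketched in a few lines of Java. This is a minimal illustration, not the actual DARP API: the class and method names (`ArrayProxy`, `DarpConnection`) and the command strings are assumptions, and the connection is mocked to record commands instead of contacting a tier-2 server.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the tier-2 connection; records requests instead of sending them.
class DarpConnection {
    final List<String> sent = new ArrayList<>();
    void send(String command) { sent.add(command); }
}

// Front-end proxy: holds the name of a back-end distributed array and forwards
// interpreted commands (breakpoints, display, HPF assignments) to the server.
class ArrayProxy {
    private final String name;
    private final DarpConnection conn;
    ArrayProxy(String name, DarpConnection conn) { this.name = name; this.conn = conn; }

    void breakpointAt(int line)  { conn.send("break " + line); }
    void display()               { conn.send("display " + name); }
    void assign(String hpfExpr)  { conn.send("assign " + name + " = " + hpfExpr); }
}

public class ProxyDemo {
    public static void main(String[] args) {
        DarpConnection conn = new DarpConnection();
        ArrayProxy u = new ArrayProxy("u", conn);
        u.breakpointAt(42);
        u.display();
        u.assign("0.5 * (u + cshift(u, 1, 1))"); // an HPF expression, evaluated back-end
        System.out.println(conn.sent);
    }
}
```

The essential point is that the front-end object carries only metadata; the data itself stays on the back-end until a command such as `display` requests a transfer.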
Since HPF follows a simple data-parallel paradigm with a global name space, the DARP server can be implemented simply as an extrinsic HPF LOCAL procedure. In this case, the server is part of the application and comes into existence only after the application is launched. In this scenario, the application code is instrumented in such a way that the initialization of the DARP server is the first executable statement of the application. Once initialized, the server blocks the application while waiting for the client to connect. From that point on, the execution is controlled by the client. Optionally, the initialization of the server may include processing a script that sets action points and breakpoints and forces resuming the execution without waiting for the user's commands.
In a general SPMD paradigm this simplistic implementation of the DARP server is not sufficient: the client loses control over the application when the code on a single node dies. We therefore extended the server architecture. Now, as shown in Figure 3, a manager that is independent of the application accepts requests from the client and multicasts them to all nodes participating in the computations. The application is packaged as a Java object with two Java threads: the instrumented application itself and a control thread that shares the context of the application.
Figure 2: Current architecture of DARP. The DARP server is part of the instrumented HPF application and is replicated over the nodes participating in the computation. The client communicates with only one server, on a selected node. On this node the server acts as a manager.
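The two-thread packaging can be sketched as follows. This is a simplified model under assumed names (`SharedContext`, `actionPoint`), not the DARP implementation: the application thread calls a shared context at each action point, where the control thread can hold it paused on behalf of the client.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Context shared between the instrumented application thread and the control thread.
class SharedContext {
    final AtomicBoolean paused = new AtomicBoolean(false);

    // Called by the instrumented application at each action point;
    // blocks while the control thread holds the application paused.
    synchronized void actionPoint() throws InterruptedException {
        while (paused.get()) wait();
    }
    // Called by the control thread on client requests.
    synchronized void pause()  { paused.set(true); }
    synchronized void resume() { paused.set(false); notifyAll(); }
}

public class DarpNodeDemo {
    public static void main(String[] args) throws Exception {
        SharedContext ctx = new SharedContext();
        Thread app = new Thread(() -> {
            try {
                for (int step = 0; step < 3; step++) {
                    ctx.actionPoint();          // may block here if paused
                    System.out.println("step " + step);
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        ctx.pause();        // control thread: hold the application at its first action point
        app.start();
        Thread.sleep(100);  // application is now blocked in actionPoint()
        ctx.resume();       // control thread: let it run to completion
        app.join();
    }
}
```

Because the control thread shares the application's context, it can inspect or modify data while the application is parked at an action point.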
The interprocessor communication required by the distributed application is not implemented using Web-based protocols (such as CORBA IIOP), as is the case for the client-manager interactions. Instead, we use the native HPF runtime support or MPI directly. For meta-computations, which in our approach are controlled by a network of managers, we consider replacing low-level MPI with Nexus[5] and other services provided by Globus[6] as the high-performance communication layer.
We build the preprocessor using the HPF Front End (HPFfe)[7] developed at NPAC within the PCRC consortium[8]. HPFfe is based on the SAGE++ system[9], which, in addition to parsing, provides the means to access and modify the abstract parse tree (AST), the symbol table, the type tables, and source-level program annotations. For our purposes, we developed functions that identify the attributes of all variables used in the HPF application (including the data type and runtime memory addresses) and operate on the AST to insert variable "registration" calls (allowing the server to determine the size and location of the data to be sent) and calls to the server.
Since HPF is a superset of Fortran 90, we can apply our preprocessor to any sequential Fortran code, in particular to the node code of any parallel Fortran application that makes explicit calls to a message-passing library such as MPI or PVM. The capability to process HPF compiler directives enhances our system in that we can preserve information on the intended data distributions and assertions on the (lack of) data dependencies.
Note that since the code is instrumented at the source level, our "debugger" gives access only to source-level data. In particular, we are not able to provide the complete state of the machine (registers, buffers, etc.) at any given time, as many commercial debuggers do and as is recommended by the High Performance Debugging Forum[11]. Also, since at this time we address exclusively applications in HPF, we ignore several important features that are necessary to support a more general SPMD paradigm; in particular, we assume that interprocessor communications are facilitated by a bug-free HPF runtime system. However, the more advanced implementation of the DARP system with the independent DARP manager (cf. Section 3) makes it possible to control applications that use explicit message passing. In any case, the DARP system is not designed to be a system-level debugger. The intended functionality of the system is the manipulation of large distributed data objects to, for example, investigate the convergence and stability of algorithms used in scientific simulations.
Typically, a client-server architecture is used to implement a portable debugger for distributed systems (cf. [12-17]). Our approach is unique in that we use a three-tier architecture. We can therefore easily integrate our source-level debugger with the HPF interpreter and a visualization tool, which together comprise a powerful application development environment.
The interaction between the running application and the user's commands is based on dynamic linking of UNIX shared objects with the application. This way, any precompiled stand-alone or library routine with a conforming interface can be called interactively at a breakpoint or at selected action points. In order to execute new code entered in the form of HPF source, it must first be converted to a shared object. To this end, we use HPFfe to generate a subroutine from a single HPF statement or a block of statements, and then compile it using an HPF compiler, as shown in Figure 4. Since any "interpreted" code is in fact compiled, the efficiency of the resulting code is as good as that of the application itself. Nevertheless, the time needed to create the shared object is prohibitively long to attempt to run complete applications, statement by statement, in the interpreted mode. On the other hand, the capability to manipulate and visualize data at any time during the execution of the application, without recompiling and rerunning the whole application, proves to be very time-effective.
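The first step of this "interpreter" path, wrapping a single HPF statement in a subroutine, can be illustrated with a small generator. This is a sketch under simplifying assumptions (the subroutine name, the declaration format `TYPE ... :: name`, and the compiler command mentioned in the comment are all illustrative, not the HPFfe output format):

```java
public class HpfWrapper {
    // Wrap one HPF statement in a subroutine taking the referenced arrays,
    // given declarations of the form "REAL, DIMENSION(:,:) :: u".
    static String wrapStatement(String name, String[] argDecls, String stmt) {
        StringBuilder argNames = new StringBuilder();
        for (int i = 0; i < argDecls.length; i++) {
            String argName = argDecls[i].substring(argDecls[i].lastIndexOf("::") + 2).trim();
            if (i > 0) argNames.append(", ");
            argNames.append(argName);
        }
        StringBuilder src = new StringBuilder();
        src.append("SUBROUTINE ").append(name).append("(").append(argNames).append(")\n");
        for (String d : argDecls) src.append("  ").append(d).append("\n");
        src.append("  ").append(stmt).append("\n");
        src.append("END SUBROUTINE ").append(name).append("\n");
        return src.toString();
    }

    public static void main(String[] args) {
        String src = wrapStatement("darp_tmp1",
                new String[] {"REAL, DIMENSION(:,:) :: u"},
                "u = u * 0.5");
        System.out.println(src);
        // The server would then compile this into a shared object, e.g.
        //   <hpf-compiler> -shared -o darp_tmp1.so darp_tmp1.f90
        // and dynamically link it into the paused application.
    }
}
```

The generated subroutine is then compiled once; subsequent invocations at action points run at full compiled speed, which is why only the creation of the shared object, not its execution, carries the overhead discussed above.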
The architecture of the SciVis system makes it particularly attractive for integration with the DARP system. The SciVis client API allows us to design a proxy library in Java with a simple and very intuitive interface. The library, on the user's behalf, automatically creates a SciVis client routine that corresponds to the data type requested by the user. The client is then dynamically linked with the running application and executed at a specified action point. This results in sending the data to the SciVis server, which in turn displays them on the user's workstation screen.
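The type-driven dispatch of the proxy library can be sketched as follows. The names (`SciVisProxy`, `plot1D_double`, `VisTransport`) are illustrative and the transport is mocked; the actual SciVis client API is not reproduced here. Java overloading selects the client routine that matches the data type:

```java
import java.util.ArrayList;
import java.util.List;

// Records what would be shipped to the visualization server.
class VisTransport {
    final List<String> log = new ArrayList<>();
    void send(String routine, int length) { log.add(routine + ":" + length); }
}

// Proxy library: picks a client routine matching the data type, then ships the data.
class SciVisProxy {
    private final VisTransport transport;
    SciVisProxy(VisTransport transport) { this.transport = transport; }

    void plot(double[] data) { transport.send("plot1D_double", data.length); }
    void plot(int[] data)    { transport.send("plot1D_int", data.length); }
}

public class VisDemo {
    public static void main(String[] args) {
        VisTransport t = new VisTransport();
        SciVisProxy vis = new SciVisProxy(t);
        // Invoked at an action point; execution resumes once the transfer completes.
        vis.plot(new double[] {1.0, 2.0, 3.0});
        System.out.println(t.log);
    }
}
```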
The same mechanism, with dedicated proxy libraries, can be used to integrate the DARP system with other software packages, such as computational libraries, data storage systems, or other visualization systems. Also, by using proxy libraries, the DARP system may request services from, or provide services to, other tier-2 servers, or become a module in data-flow computations[20].
Consistent with our HPcc strategy, the system implements a three-tier architecture: the Java front-end holds proxy objects produced by an HPF front-end operating on the back-end code. These proxy objects can be manipulated with an interpreted Web client interacting dynamically with the compiled code through a typical tier-2 server (middleware). Although targeted at an HPF back-end, the system's architecture is independent of the back-end language and can be extended to support other high-performance languages such as HPC++[18] or HPJava[19]. Finally, since we follow a distributed-objects approach, the DARP system can easily be incorporated into a collaboratory environment such as Tango[4] or Habanero[21].