TOOLS SUMMARY

The enhancement of the programming environment at the CEWES MSRC through the identification and introduction of programming tools, computational tools, visualization tools, and collaboration/information tools is a major emphasis of the CEWES MSRC PET effort. Tools introduced into the CEWES MSRC by the PET team during Year 2 are listed in Table __. The CEWES MSRC PET team has provided training courses on many of these tools at the CEWES MSRC and at remote user sites, and the on-site team continually provides guidance and assistance in their use. The purpose of this section is to discuss the function of these tools and their importance to CEWES MSRC users. Work on many of these tools spanned several components of the CEWES MSRC PET team, both on-site and at the universities.

I. Programming Tools

A preliminary evaluation of the state of the art in parallel debuggers and performance analysis tools has been carried out by the University of Tennessee, Knoxville (UTK), as part of the National HPCC Software Exchange (NHSE) effort. Based on these findings, UTK recommended, installed, tested and evaluated such tools for the CEWES MSRC user community. At present, five parallel performance analysis tools (VAMPIR, MPE logging and nupshot, AIMS, SvPablo, and ParaDyn) and one debugger (TotalView) have been made available to CEWES MSRC users on the appropriate platforms. Performance analysis tools typically require code to be added to a program (known as instrumentation) so that it writes trace information to a file, which is later interpreted and displayed through a graphical user interface (GUI). The purpose of such tools is to record significant events (e.g., subroutine calls, message sends and receives, I/O) during the execution of the user's code. Debuggers allow a programmer to examine a code as it executes in order to determine the cause of catastrophic errors and fix them.
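The instrument-then-analyze cycle described above can be illustrated with a small sketch. It is written in Python purely for brevity and is not how VAMPIR, AIMS or any other tool named here is implemented; the `instrument` decorator, the JSON-lines trace format and the `summarize` pass are illustrative assumptions. Wrapping a routine records an "enter" and an "exit" event with timestamps; after the run, the trace is written to a file and a post-mortem pass totals the inclusive time spent in each routine.

```python
import functools
import json
import time

TRACE = []  # in-memory event buffer, written to a trace file after the run


def instrument(func):
    """Wrap func so that every call records an 'enter' and an 'exit' event."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append({"event": "enter", "name": func.__name__, "t": time.perf_counter()})
        try:
            return func(*args, **kwargs)
        finally:
            # Record the exit even if the routine raises an exception.
            TRACE.append({"event": "exit", "name": func.__name__, "t": time.perf_counter()})
    return wrapper


def write_trace(path):
    """Write one JSON record per line (a hypothetical trace format)."""
    with open(path, "w") as f:
        for record in TRACE:
            f.write(json.dumps(record) + "\n")


def summarize(path):
    """Post-mortem pass: total inclusive time per routine, matching enter/exit pairs."""
    totals, stack = {}, []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            if rec["event"] == "enter":
                stack.append(rec)
            else:
                start = stack.pop()
                totals[rec["name"]] = totals.get(rec["name"], 0.0) + rec["t"] - start["t"]
    return totals


@instrument
def solve(n):
    """Stand-in for a user routine being measured."""
    return sum(i * i for i in range(n))
```

For example, after calling `solve(100_000)`, `write_trace("trace.jsonl")` followed by `summarize("trace.jsonl")` returns a dictionary of the form `{"solve": <seconds>}`. Production tools typically record much richer events (message sizes, source and destination ranks) and use compact binary trace formats.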

In January 1998, Dr Shirley Browne and Dr Clay Breshears taught a workshop at CEWES MSRC to introduce users to the performance analysis tools chosen for installation there. Each of these analysis tools, the debugger, and the other programming tools is described below, along with the work done to port the tools to the various HPC machines at CEWES MSRC.

1. VAMPIR

VAMPIR is a commercial trace-based performance analysis tool from Pallas in Germany. UTK installed, tested and evaluated VAMPIR on the Cray T3E, IBM SP and SGI Origin 2000 at CEWES MSRC. Dr Browne and Dr Breshears have been working with members of the NRC Computational Migration Group (CMG) at the CEWES MSRC and with AFRL researchers to use VAMPIR effectively on codes of interest. With the AFRL researchers, VAMPIR was used to identify improvements in the communication performance of the DoD Challenge Project code ICEPIC. Dr Browne has written CEWES MSRC PET webpages on the use of VAMPIR on the CEWES MSRC platforms.

2. nupshot & MPE Logging Library

nupshot is a trace visualization tool with a very simple, easy-to-use interface that gives users a quick overview of the message-passing behavior and performance of their application. nupshot analyzes trace files produced by the MPE logging library. Both nupshot and the MPE logging library were originally designed as extensions of MPICH (the Mississippi State/Argonne implementation of MPI-1), so UTK has made minor modifications to the MPE logging library to make it work with the native MPI (Message Passing Interface) libraries on the IBM SP and SGI Origin 2000. UTK is currently debugging library routines that do not work properly with MPI programs that define their own communicators or topologies. A new version of nupshot, written with help from UTK, functions under multiple versions of the Tcl and Tk graphics libraries.
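nupshot draws each process as a timeline on which colored bars mark the interval between a logged start event and the matching end event (a "state" in MPE terms). The sketch below shows how such bars could be rebuilt from a flat list of logged events. It is written in Python for illustration only: the `(time, rank, event_id)` record layout and the `STATES` table are hypothetical and do not reproduce MPE's actual log-file format.

```python
from collections import defaultdict

# Hypothetical state table: state name -> (start event id, end event id).
STATES = {"Send": (1, 2), "Recv": (3, 4), "Compute": (5, 6)}


def build_intervals(events, states=STATES):
    """Turn (time, rank, event_id) records into (rank, state, t_start, t_end) bars."""
    start_of = {s: name for name, (s, _) in states.items()}
    end_of = {e: name for name, (_, e) in states.items()}
    open_states = defaultdict(list)  # (rank, state) -> stack of pending start times
    bars = []
    # Stable sort on time only, so events logged at the same instant keep file order.
    for t, rank, ev in sorted(events, key=lambda rec: rec[0]):
        if ev in start_of:
            open_states[(rank, start_of[ev])].append(t)
        elif ev in end_of:
            key = (rank, end_of[ev])
            if open_states[key]:  # ignore an end event with no matching start
                bars.append((rank, end_of[ev], open_states[key].pop(), t))
    return sorted(bars)
```

For example, with the default table, the events `(0.0, 0, 5)` and `(1.0, 0, 6)` produce the bar `(0, "Compute", 0.0, 1.0)`: rank 0 spent one time unit in the Compute state.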

3. AIMS

AIMS is a freely available trace-based performance analysis tool developed by the NASA Ames NAS Division that provides flexible automatic instrumentation, monitoring and performance analysis of Fortran 77 and ANSI C message-passing applications that use MPI or PVM. UTK has reported bugs in the current release of AIMS, including missing instrumentation of MPI_Sendrecv and incorrect reporting of send and receive blocking times when MPI_Waitall is used. Feedback from users at CEWES MSRC and the University of Tennessee indicates that the AIMS interface, particularly the source-code clickback feature, is very helpful in analyzing parallel codes. However, AIMS would need Fortran90 support before it could attract a wide user base at CEWES MSRC. Jeff Brown at LANL has told UTK that his tools group will add Fortran90 support to AIMS.

4. SvPablo

SvPablo, another trace-based performance analysis tool, was developed by the Pablo group at the University of Illinois and is freely available. SvPablo has a GUI that lets the user select which portions of the source code to instrument, and then automatically produces the instrumented source code. After the code executes and a trace file has been produced, the SvPablo GUI displays the resulting performance data alongside the source code. SvPablo runs on Sun Solaris and SGI workstations and on the SGI PCA and Origin 2000, where it can access and report results from the MIPS R10000 hardware performance counters. UTK has found SvPablo too fragile on the CEWES MSRC Origin 2000 (it crashes frequently) to be evaluated fairly. The Pablo group reports that SvPablo runs robustly on its own Origin 2000 and is investigating the problem. UTK is working on porting SvPablo to the IBM SP.

5. ParaDyn

ParaDyn, from the University of Wisconsin, is a performance measurement tool designed to scale to long-running programs on large parallel and distributed systems. Unlike the other tools, which perform post-mortem analysis of trace files, ParaDyn performs interactive analysis at run time. For twelve or fewer processes, ParaDyn appears fairly robust; for more than twelve processes, however, it failed catastrophically in about 80 percent of the runs attempted. Because ParaDyn works only in interactive mode, it cannot be used through ordinary batch submission. ParaDyn currently runs on IBM SP systems, and Dr Breshears has been able to run it within an interactive batch job on the CEWES MSRC SP. The ParaDyn developers are currently working on versions that will run on the other HPC machines available at CEWES MSRC.

6. TotalView

TotalView is a commercial debugger from Dolphin Interconnect Solutions that comes with a GUI but does not have a command-line interface. TotalView runs on Unix workstations, the IBM SP and (with the newest release) the SGI PCA and Origin 2000. However, TotalView for SGI IRIX 6 requires SGI MPI 3.1, while the CEWES MSRC PCA and Origin 2000 are still at SGI MPI 3.0. Dr Browne has run TotalView successfully on the UTK SP2, which does not use PBS, but Dr Browne and Dr Breshears have so far been unable to get TotalView working on the CEWES MSRC IBM SP under PBS. Both Dolphin and Bob Henderson (developer of PBS) have been contacted about the TotalView-PBS problem; Dolphin is investigating, and Henderson is willing to help from the PBS end. Since CEWES MSRC users have shown considerable demand for a good cross-platform debugger, and TotalView has received good reports elsewhere, current plans are to push hard to get TotalView working on the CEWES MSRC platforms and then offer CEWES MSRC users a one-month evaluation period.

7. MPE Graphics Library

The MPE graphics library is part of the MPICH package (developed by Mississippi State and Argonne), which Argonne National Laboratory distributes as an implementation of MPI-1. This graphics library gives the MPI programmer an easy-to-use, minimal set of routines that can asynchronously draw color graphics to an X11 window during a numerical simulation. A user can call these routines to monitor the accuracy and progress of the solution as the code executes, or use them as a debugging aid. Dr Steve Bova and Dr Clay Breshears developed a Fortran90 module (Fortran90 module for parallel message passing environment [MPE] graphics) that uses these routines to draw colored contour plots for 2D unstructured grids. To support this, they wrote additional routines to draw filled triangles and polygons, augmenting the original MPE graphics library routines. (This was a joint effort of the CFD and SPPT teams.)
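One simple way to turn nodal solution values into a filled-triangle contour plot is to color each triangle by the band into which its mean value falls. The Python sketch below computes those color indices; it illustrates the general technique under that assumption and is not a transcription of the Bova-Breshears Fortran90 module, and the function name `color_bands` is hypothetical. Each triangle and its color index would then be handed to a filled-triangle drawing routine such as the ones added to the MPE graphics library.

```python
def color_bands(values, triangles, n_colors, vmin=None, vmax=None):
    """Assign each triangle a color index in [0, n_colors) from its mean nodal value.

    values: scalar solution value at each grid node.
    triangles: (i, j, k) node indices for each element.
    vmin, vmax: optional fixed color range (defaults to the data range).
    """
    lo = min(values) if vmin is None else vmin
    hi = max(values) if vmax is None else vmax
    span = (hi - lo) or 1.0  # avoid division by zero for a constant field
    colors = []
    for i, j, k in triangles:
        mean = (values[i] + values[j] + values[k]) / 3.0
        band = int((mean - lo) / span * n_colors)
        colors.append(min(max(band, 0), n_colors - 1))  # clamp to a valid index
    return colors
```

Fixing `vmin` and `vmax` across time steps keeps the same color meaning the same value throughout a run, which matters when the plot is used to watch a solution evolve.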

8. ScaLAPACK

ScaLAPACK is a library of routines for the solution of dense, band and tridiagonal linear systems of equations and other numerical computations. Data sharing between distributed processors is accomplished using the Basic Linear Algebra Communication Subprograms (BLACS). The Netlib version of ScaLAPACK from UTK has been installed, tested and timed on all the CEWES MSRC platforms and compared with vendor versions of ScaLAPACK, which were also tested for correctness. Netlib ScaLAPACK has been modified to run on the SGI/Cray T3E, and a T3E patch for ScaLAPACK 1.6 is available on Netlib. Susan Blackford and Dr Victor Eijkhout taught a training course on numerical linear algebra libraries at CEWES MSRC in June 1997. Blackford assisted Dr Breshears in preparing webpages on the use of the Netlib and vendor versions of LAPACK and ScaLAPACK on CEWES MSRC platforms. ScaLAPACK has been used in the Ocean Modeling DoD Challenge Project application of Wallcraft (NRL-South) to achieve a significant improvement in performance.

9. PETSc

The Portable, Extensible Toolkit for Scientific Computation (PETSc), available from Argonne National Lab, is a suite of data structures and routines for the solution of large-scale scientific application problems modeled by partial differential equations. PETSc was developed within an object-oriented framework and is fully usable from Fortran, C and C++ codes. Dr Ehtesham Hayder (Rice University) has installed the PETSc library on the CEWES MSRC IBM SP in order to compare the relative merits of parallel libraries versus parallel programming languages (specifically HPF) for the computational solution of numerical problems. This evaluation will help CEWES MSRC users to choose better methods for parallel computations on different parallel platforms.

10. Repository in a Box (RIB)

Repository in a Box (RIB) is a toolkit for setting up and maintaining a software repository. It provides a uniform interface to a software catalog and facilitates interoperability between repositories by sharing catalog information and/or software files. At SC97, RIB was used at the HPCMO booth to demonstrate software repository interoperability among the CEWES, ARL, and ASC MSRCs. RIB is currently in production use at ASC MSRC for CCM and parallel tools repositories. The UTK RIB installation supports prototype CFD (CEWES MSRC) and SIP (ARL MSRC) repositories. RIB is also being adopted by NASA HPCC sites, the National Computational Science Alliance (NCSA) and NPACI (San Diego) for their HPC software repositories.

11. Fortran Interface for Pthreads

Pthreads is a POSIX standard established to control the spawning, execution and termination of multiple threads within a single process. Using Pthreads on a shared-memory system is an attractive approach to parallel programming, due in part to its low system overhead. A disadvantage of Pthreads for high performance computing, however, is that the POSIX standard defines no Fortran interface. We are defining and implementing a full set of Fortran 90 bindings for the POSIX threads functions. The potential impact on the CEWES MSRC is to give our users an alternative approach to higher-performance shared-memory parallel programming. We are using the DoD Fortran code MAGI (from the AF Phillips Lab) as a test bed for this interface. (This effort was carried out by the CFD component of the CEWES MSRC PET team.)

II. Computational Tools

1. Unstructured Message-Passing Toolkit

For any finite element (or finite volume) application that solves a partial differential equation, a message-passing parallel implementation requires partitioning the element mesh among the processors. Furthermore, point-to-point communication is required because of the local nature of the finite element approximation. If a structured mesh is used to discretize the domain, the resulting point-to-point communication patterns are also structured and therefore straightforward to implement. The situation is quite different with an unstructured mesh: in general, each processor has a different number of neighbors with which it must exchange data, and the length of each message can also vary. To manage such unstructured communication patterns, each processor must store:

* the number of neighboring processors for point-to-point communication;
* the destinations (origins) of the messages it sends (receives);
* the number of grid points along each inter-processor boundary;
* the identities of these grid points;
* arrays (buffers) in which the incoming and outgoing messages are stored;
* the identity of the processor itself.

The CFD team helped develop a Fortran90 module that exploits language features such as modules, dynamic memory allocation, global variables, and user-defined data types to bundle this data with functions in the spirit of a C++ class. This tool gives CEWES MSRC users a model data structure and associated functions that simplify the development of scalable message-passing software.
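The per-processor data listed above maps naturally onto a single record. The sketch below mirrors that bundle as a Python class purely for illustration; the actual tool is a Fortran90 module whose names and interfaces are not given here, so `CommPattern`, `pack` and `unpack_add` are hypothetical. To keep the example self-contained, the MPI sends and receives are replaced by direct copies between buffers, and it assumes both neighbors list their shared boundary nodes in the same order. `unpack_add` sums the received values into the shared nodes, as in finite element assembly; an application that keeps ghost copies would overwrite them instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CommPattern:
    """Per-processor data for unstructured point-to-point exchange (sketch)."""
    my_rank: int                          # identity of this processor
    neighbors: List[int]                  # ranks this processor exchanges data with
    boundary_nodes: Dict[int, List[int]]  # neighbor rank -> local ids of shared grid points
    send_buf: Dict[int, List[float]] = field(default_factory=dict)  # outgoing messages
    recv_buf: Dict[int, List[float]] = field(default_factory=dict)  # incoming messages

    @property
    def n_neighbors(self) -> int:
        return len(self.neighbors)

    def message_length(self, rank: int) -> int:
        """Number of grid points on the boundary shared with `rank`."""
        return len(self.boundary_nodes[rank])

    def pack(self, values: List[float]) -> Dict[int, List[float]]:
        """Copy boundary values into one outgoing buffer per neighbor."""
        for rank in self.neighbors:
            self.send_buf[rank] = [values[n] for n in self.boundary_nodes[rank]]
        return self.send_buf

    def unpack_add(self, values: List[float]) -> List[float]:
        """Add each received contribution to the matching shared grid point."""
        for rank in self.neighbors:
            for n, v in zip(self.boundary_nodes[rank], self.recv_buf[rank]):
                values[n] += v
        return values
```

Two processors sharing two boundary nodes would each call `pack`, exchange their `send_buf` entries (in a real code, with matching MPI sends and receives), store the incoming message in `recv_buf`, and then call `unpack_add` so that each copy of a shared node holds the summed contribution.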

2. Unstructured Mesh Element Graph Finder

When partitioning unstructured grids for message-passing applications, it is often necessary to construct the graph of element connectivity. Given only an arbitrary list of elements, this tool (a Fortran90 routine to find the adjacency list for an arbitrary 2D mesh of linear elements) returns, for each element, the list of up to three (for a triangle) or four (for a quad) elements that share an edge with it. It also returns a list of the edges that lie on domain boundaries. The algorithm is fast: its computational cost grows only linearly with the element count. This tool gives CEWES MSRC users a robust and fast means of obtaining the element connectivity graph for an arbitrary, unstructured 2D mesh. (This was a joint effort of the CFD and SPPT teams.)
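The routine's own algorithm is not spelled out here, but one common way to obtain linear cost is to key each edge on its sorted pair of node numbers in a hash table: the second element to present an edge is a neighbor of the first, and edges presented only once lie on the domain boundary. The Python sketch below illustrates that hash-based technique; it is an assumed approach, not a transcription of the Fortran90 routine, and `element_graph` is a hypothetical name. Hash-table operations give expected, rather than worst-case, linear time.

```python
def element_graph(elements):
    """Return (neighbors, boundary_edges) for a 2D mesh of triangles and/or quads.

    elements: node-index tuples, 3 nodes (triangle) or 4 nodes (quad),
              listed in order around each element.
    neighbors[e]: elements sharing an edge with element e.
    boundary_edges: sorted (element, (a, b)) pairs for edges owned by one element only.
    """
    edge_owner = {}  # sorted edge -> element that first presented it
    neighbors = [[] for _ in elements]
    for e, nodes in enumerate(elements):
        for i in range(len(nodes)):
            a, b = nodes[i], nodes[(i + 1) % len(nodes)]
            key = (a, b) if a < b else (b, a)
            other = edge_owner.pop(key, None)
            if other is None:
                edge_owner[key] = e         # first time this edge is seen
            else:
                neighbors[e].append(other)  # second time: the two elements are adjacent
                neighbors[other].append(e)
    # Edges left in the table were seen only once, so they lie on the boundary.
    boundary_edges = sorted((e, key) for key, e in edge_owner.items())
    return neighbors, boundary_edges
```

For two triangles `(0, 1, 2)` and `(0, 2, 3)` sharing the edge 0-2, the function reports each triangle as the other's only neighbor and returns the four outer edges as boundary edges.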

III. Visualization Tools

In Year 2, the Scientific Visualization (SV) team helped bring five new tools into use at the CEWES MSRC. These are described briefly below.

1. VisGen

The EQM VisGen tool supports visual analysis of output from the CEWES code CEQUAL-IQM. The tool provides an easy-to-use, point-and-click interface for reading data and displaying it in a variety of forms, including isosurfaces, colored slices, and volumes. A glyph representation is also available, so that multiple variables can be combined in a single graphic. The tool supports animation through the time series of a run.

2. Damaged Structures Visualization Tool

The Damaged Structures visualization tool displays the output of the CTH and Dyna3D simulations used in the CEWES DoD Challenge application that simulates building deformation in response to a bomb blast. The tool supports viewing slices of the blast's pressure field or the time series of the 1.40e5 pascal pressure isosurface (the shock front), as well as the time series of the building's structural response.

3. Visual Collaboration

Many CEWES MSRC researchers have expressed a need to share visualizations easily with colleagues at other locations. The SV team has provided code that captures images and movies in a form that can be shared on the web. This code is modular and can easily be included in many visualization tools; it is included in both the EQM VisGen tool and the Damaged Structures visualization tool described above.

4. NCSA vss Audio Library

The NCSA vss audio library is designed to add non-speech audio to applications, particularly virtual reality applications. Audio can be particularly important in creating a sense of "presence" in a virtual environment. The vss library is based on a client-server model, and provides an API that simplifies creation and addition of audio to an application.

5. VTK (the Visualization ToolKit)

Finally, VTK (the Visualization ToolKit), from GE Corporate Research and Development, is an increasingly popular visualization library. This year the SV team worked with our CEWES MSRC counterparts to introduce this toolkit and share expertise in its use. In addition to a 2-day training class, we informally shared code segments, example programs, and guidance to help CEWES MSRC personnel build their own expertise.

IV. Collaboration/Information Tools

1. Tango & WebWisdom

TANGO is a Java-based web collaboratory developed at Syracuse University (with funding from AF Rome Lab). It is implemented with standard Internet technologies and protocols, and runs inside an ordinary Netscape browser window (support for other browsers is in progress). TANGO delivers real-time multimedia content in an authentic, two-way interactive format.

TANGO was originally designed to support collaborative workgroups, though synchronous distance education, which can be thought of as a particular kind of collaborative workgroup, has become one of the key application areas of the system.

The primary TANGO window is called the control application (CA). From the CA, participants have access to many tools, including:

* SharedBrowser, a special-purpose browser window that "pushes" Web documents onto remote client workstations;
* WebWisdom, a presentation environment for lectures, foilsets and similar materials;
* WhiteBoard, for interactive text and graphics display;
* 2D and 3D Chat tools;
* RaiseHand, a tool used to signal one's desire to ask a question;
* BuenaVista, for two-way streaming audio and video.

WebWisdom is a tool for showing lecture slides (or foils). Each foil may have an "addon", which is a link (or links) to supporting material such as online documentation or example programs. WebWisdom was originally developed for use in courses at Syracuse University, and was later interfaced with Tango to support the same kind of presentations to distant audiences.

Tango and WebWisdom were deployed at the CEWES MSRC and at Jackson State University (JSU) for use in the joint Syracuse-Jackson State distance education work begun as part of the CEWES MSRC PET effort in Year 2. The system was first used to present a three-day training class in the Java programming language simultaneously to a local audience at the CEWES MSRC and a remote audience at JSU. Shortly thereafter, instructors located in Syracuse used Tango and WebWisdom to deliver full-semester academic credit courses to students at JSU in the Fall 1997 and Spring 1998 terms. The effort has been quite successful, and building on the experience gained, expanded efforts are planned, including graduate-level course offerings, additional recipient sites, and CEWES MSRC PET training classes offered through this mechanism.

A separate effort at Syracuse focused on enhancing Tango to better support collaborative software development and remote consulting. In this project, capabilities for shared browsing and modification of source code, shared debugging, and basic shared on-line computer access were added to Tango. The new tools will be available at CEWES MSRC shortly, in conjunction with a training class to introduce CEWES MSRC users to the Tango collaborative system.

2. Grid Generation Search Engine

The ease of publishing information on the world-wide web has rapidly made it an invaluable source of information on a great many topics. At the same time, its popularity and the resulting growth have made it harder to sift the desired information out of the rapidly growing flood of data. Topical search engines are one way of dealing with this flood. By starting from a set of "master documents", perhaps identified by an expert in a given knowledge domain, it is possible to construct a specialized search engine that focuses on resources in that domain. This gives researchers a higher "signal-to-noise" ratio when searching for information in a particular knowledge domain. Using a search engine framework previously developed at Syracuse University, this idea has been applied to produce a specialized resource on computational grid generation as part of the CEWES MSRC PET effort. This resource has the potential to benefit researchers in the various "grid-based" computational technology areas, and the concept can and will be applied to other knowledge domains.
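The internals of the Syracuse framework are not described here, but the general idea of growing a topical index from expert-chosen seeds can be sketched as a focused breadth-first crawl. In the Python example below, the in-memory `web` dictionary stands in for fetching pages over HTTP, and the relevance test (a page must mention at least `min_hits` distinct domain terms before it is indexed or its links are followed) is an illustrative assumption; `focused_crawl` is a hypothetical name.

```python
from collections import deque


def focused_crawl(web, seeds, domain_terms, min_hits=2, max_pages=1000):
    """Return the pages to index, crawling outward from expert-chosen seeds.

    web: stand-in for the Web, mapping url -> (page text, list of linked urls).
    seeds: the "master documents"; these are always kept.
    domain_terms: single-word vocabulary that marks a page as on-topic.
    """
    terms = {t.lower() for t in domain_terms}
    keep, queue, seen = [], deque(seeds), set(seeds)
    while queue and len(keep) < max_pages:
        url = queue.popleft()
        if url not in web:
            continue  # dead link
        text, outlinks = web[url]
        hits = terms & set(text.lower().split())
        if url not in seeds and len(hits) < min_hits:
            continue  # off-topic: neither indexed nor expanded
        keep.append(url)
        for nxt in outlinks:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return keep
```

Not expanding off-topic pages is what keeps the index focused: a page about recipes that happens to link to a meshing site cuts that branch off, trading some recall for a much higher proportion of relevant pages.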

3. CEWES MSRC Search Engine

The search engine framework previously developed at Syracuse has also been used to deploy a search engine facility for the CEWES MSRC website (which includes the CEWES MSRC PET website). Based on a relational database system for speed, the search engine can handle complex queries and return relevant documents with search terms highlighted to aid use.
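The text states only that the search engine sits on a relational database and highlights search terms. The sketch below shows one way those two pieces can fit together, using Python's built-in `sqlite3` module in place of whatever database the Syracuse framework actually uses. The schema (a `docs` table plus a term-to-document `postings` table), the rule that a document must contain every query term, and the `<b>` highlighting markup are all assumptions chosen for illustration.

```python
import re
import sqlite3

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(docs):
    """Load {doc_id: text} into an in-memory relational inverted index."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)")
    db.execute("CREATE TABLE postings (term TEXT, doc_id TEXT)")
    db.execute("CREATE INDEX idx_term ON postings (term)")  # fast term lookup
    for doc_id, body in docs.items():
        db.execute("INSERT INTO docs VALUES (?, ?)", (doc_id, body))
        terms = set(TOKEN.findall(body.lower()))
        db.executemany("INSERT INTO postings VALUES (?, ?)", [(t, doc_id) for t in terms])
    return db


def search(db, query):
    """Return (doc_id, highlighted body) for documents containing every query term."""
    terms = sorted(set(TOKEN.findall(query.lower())))
    if not terms:
        return []
    placeholders = ",".join("?" * len(terms))
    rows = db.execute(
        "SELECT d.id, d.body FROM docs d JOIN postings p ON p.doc_id = d.id "
        "WHERE p.term IN (" + placeholders + ") "
        "GROUP BY d.id HAVING COUNT(DISTINCT p.term) = ? ORDER BY d.id",
        (*terms, len(terms)),
    ).fetchall()
    # Wrap whole-word matches (case-insensitive) so the user can spot them.
    pattern = re.compile(
        r"(?<![A-Za-z0-9])(" + "|".join(map(re.escape, terms)) + r")(?![A-Za-z0-9])",
        re.IGNORECASE,
    )
    return [(doc_id, pattern.sub(r"<b>\1</b>", body)) for doc_id, body in rows]
```

The index on `postings.term` is what lets the database answer multi-term queries without scanning every document; ranking by relevance would be a natural next step but is omitted here.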

4. Web Site Management System

Websites such as those run by the CEWES MSRC and the PET program offer a sizable amount of information, with a large number of individuals contributing content. In addition, some of the information is time-sensitive, and some is subject to security review prior to release, among other concerns. The Web Site Management System is designed to support the routine operation of such a site by providing a framework for managing not only the information content itself but also the "metadata" (e.g., author, revision date, review status) that is not easily maintained or used in a traditionally maintained website. The Web Site Management System is currently under development and shares the database used by the CEWES MSRC Search Engine. The plan is to transition the system to CEWES MSRC once the appropriate infrastructure is in place.
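The kind of metadata-driven rules described above can be sketched in a few lines. The Python example below is illustrative only: the `Page` record, the review-status strings and the `expires` field are assumptions, not the schema of the system under development. It shows how metadata kept alongside each page lets the site answer questions such as which pages are cleared for release today and which are still waiting for review.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Page:
    path: str
    author: str
    revised: date
    review_status: str               # hypothetical values: "pending", "approved", "rejected"
    expires: Optional[date] = None   # time-sensitive pages stop being published after this


def publishable(pages: List[Page], today: date) -> List[str]:
    """Pages that have passed review and have not yet expired."""
    return [p.path for p in pages
            if p.review_status == "approved"
            and (p.expires is None or today <= p.expires)]


def awaiting_review(pages: List[Page]) -> List[str]:
    """Oldest-first queue of pages still pending review."""
    return [p.path for p in sorted(pages, key=lambda p: p.revised)
            if p.review_status == "pending"]
```

Keeping these rules as queries over metadata, rather than as manual edits to the pages, is what lets expired or unreviewed material drop out of the published site automatically.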