VI. TOOLS INTRODUCED INTO CEWES MSRC

The enhancement of the programming environment at CEWES MSRC through the identification and introduction of programming tools, computational tools, visualization tools, and collaboration/information tools is a major emphasis of the CEWES MSRC PET effort. Tools introduced into CEWES MSRC by the PET team during Year 3 are listed in Table 4. The CEWES MSRC PET team has provided training courses at the CEWES MSRC and at remote user sites for many of these tools (see Section VII), and continually provides guidance and assistance in their use through the on-site team. The purpose of this section is to discuss the function of these tools and their importance to CEWES MSRC users. Many of these tools came from collaborative efforts across various components of the CEWES MSRC PET team, both on-site and at the universities.

I. Programming Tools

An ongoing evaluation of the state of the art in parallel debuggers and performance analysis tools is carried out by the PET SPPT team at Tennessee, as part of the National HPCC Software Exchange (NHSE) effort. From these findings Tennessee is able to recommend, install, test, and evaluate the utility of such tools within the CEWES MSRC user community. At present, five parallel performance analysis tools (Vampir, the MPE Logging Library and nupshot, AIMS, SvPablo, and ParaDyn) and one debugger (TotalView) have been made available to CEWES MSRC users on appropriate platforms.

Performance analysis tools typically require the addition of code to a program (known as instrumentation) in order to output trace information to a file that is later interpreted and displayed with a GUI. Their purpose is to document significant events (e.g., subroutine calls, sending or receiving messages, I/O) within the execution of the user's code. Debuggers allow a programmer to examine codes as they execute, in order to determine the cause of errors and then fix them.

TotalView

TotalView is a multiprocess and multithread source-level debugger with a graphical user interface. Versions are available for all CEWES MSRC platforms. Tennessee evaluated TotalView and recommended its purchase, worked with CEWES MSRC systems staff to get TotalView properly installed and integrated with the queueing system, and provided site-specific usage information, training, and a Web-based tutorial for TotalView. Tennessee also served as a beta-tester for new releases of TotalView, tested the CEWES MSRC TotalView installations, and reported bugs to the TotalView developer.

Being able to quickly learn and start using a full-featured parallel debugger that works properly in the CEWES MSRC environment has enabled the NRC Computational Migration Group (CMG) to understand the execution of parallel codes and to find and fix elusive bugs. For example, TotalView was used to isolate Cray pointer bugs in a code called MSPHERE that would have been difficult to find otherwise. Having a single debugger that works across platforms has given CEWES MSRC users a higher payoff for the time invested in learning the debugger and has enabled the CMG to more effectively support users in their debugging tasks.

Vampir

Vampir is a performance analysis tool for MPI parallel programs. Vampir consists of two parts: the Vampirtrace library, which can be linked with an application to produce a trace file during program execution, and the Vampir visualization tool for analyzing the resulting trace file.
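Trace libraries of this kind typically capture message-passing events by interposing on the standard MPI profiling interface: each MPI routine is redefined in the trace library to record an event and then call the real implementation through its PMPI_ name. The sketch below illustrates only that general mechanism (it is not taken from Vampirtrace), and the log_event routine is a hypothetical stand-in for whatever the trace library uses to record timestamped events.

#include <stdio.h>
#include <mpi.h>

/* Hypothetical stand-in for the trace library's event recorder. */
static void log_event(const char *name, double t0, double t1)
{
    fprintf(stderr, "%s %.6f %.6f\n", name, t0, t1);
}

/* Wrapper using the MPI-3 C binding (earlier MPI versions declare buf
   as a plain void *).  Linked ahead of the MPI library, it intercepts
   every call to MPI_Send made by the application. */
int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    double t0 = MPI_Wtime();                 /* time on entry                  */
    int rc = PMPI_Send(buf, count, type,
                       dest, tag, comm);     /* forward to the real MPI_Send   */
    log_event("MPI_Send", t0, MPI_Wtime());  /* record the event for the trace */
    return rc;
}

Because the interception happens at link time, the application source is not modified; the user simply relinks with the trace library, which is consistent with the way Vampirtrace is described above.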
Pallas, the developer of Vampir, is working with OpenMP compiler vendors to develop Vampir support for tracing and visualizing the execution of OpenMP applications. Users will be allowed to mix MPI and OpenMP, as long as the MPI implementation is sufficiently thread-safe. Tennessee has evaluated Vampir, recommended its purchase, and made suggestions for improvements to Vampir 1.0 that were implemented in Vampir 2.0. Tennessee has provided site-specific usage information, training, a Web-based tutorial, and user assistance for Vampir.

Having a robust performance analysis tool that works across platforms has enabled CEWES MSRC users to quickly and easily collect performance trace data and analyze that data visually to spot communication bottlenecks in their codes. Vampir has been used by CEWES MSRC user David Rhodes to analyze and improve the performance of the CEN-1 Harmonic Balance Simulation code, and by the RF Weapons Challenge team of AF Phillips Lab to achieve a significant performance improvement in the ICEPIC code.

MPI_Connect

MPI_Connect, a communication system developed at Tennessee, enables multiple HPC systems to be used effectively on the same MPI application, thus enabling larger problems to be solved more quickly. Because tuned vendor MPI implementations are used with only small overhead, communication within a single system remains highly efficient. Together with OpenMP and MPI, MPI_Connect has been used to reduce the runtime required by the CGWAVE harbor response simulation code of CEWES from months to days; this project won the Most Effective Engineering Methodology award for its SC'98 HPC Challenge entry. MPI_Connect is being made available to other MSRC users and application developers who have similar application coupling and metacomputing needs.

Virtue

The University of Illinois developed the Virtue system for visualization and analysis of performance data using virtual reality immersion. In the CEWES MSRC PET effort, Tennessee has written a vampir2virtue converter and is working with Illinois to implement additional features in Virtue that will provide more effective visualization of parallel programs (e.g., better labelling of Virtue objects). Virtue will enable effective visualization of the execution of large-scale applications through the use of 3-D and immersive virtual reality, thus enabling performance bottlenecks to be found and fixed more easily. Virtue can be used with the ImmersaDesk or with graphics workstations. Virtue has multimedia tools that allow remote collaborators to view and manipulate scaled-down versions of Virtue displays from their desktop workstations and to interact using voice and video.

PAPI

With input from users, researchers, and vendors, Tennessee has written a specification for a portable API for accessing hardware performance counters, called PAPI. Tennessee is currently working on reference implementations of PAPI for the SGI/Cray Origin2000 and Linux, with an IBM SP implementation planned for the near future. The PAPI portable interface to hardware performance counters will enable CEWES MSRC users to use the same set of routines to access comparable performance data across platforms. Hardware performance counters can provide valuable information for tuning the cache and memory performance of applications. PAPI will also enable CHSSI code developers to obtain the data needed for performance reporting requirements more quickly and easily.
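Since PAPI is still a specification with reference implementations in progress, no CEWES MSRC-installed calling sequence can be quoted here. The sketch below is a minimal illustration of counter access in the style later published by the PAPI project; the routine and event names (PAPI_library_init, PAPI_create_eventset, PAPI_add_event, PAPI_start, PAPI_stop, PAPI_TOT_CYC, PAPI_L1_DCM) are assumptions drawn from that API and may differ in detail from the specification described above. Error checking is omitted for brevity.

#include <stdio.h>
#include <papi.h>

int main(void)
{
    int eventset = PAPI_NULL;
    long long counts[2];

    /* Initialize the library and build an event set containing two
       preset events: total cycles and level-1 data-cache misses. */
    PAPI_library_init(PAPI_VER_CURRENT);
    PAPI_create_eventset(&eventset);
    PAPI_add_event(eventset, PAPI_TOT_CYC);
    PAPI_add_event(eventset, PAPI_L1_DCM);

    PAPI_start(eventset);
    /* ... section of the application to be measured ... */
    PAPI_stop(eventset, counts);

    printf("cycles = %lld, L1 data-cache misses = %lld\n",
           counts[0], counts[1]);
    return 0;
}

The same source would be expected to run on any platform with a PAPI implementation, which is the portability benefit described above.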
dyninst

The University of Maryland has produced the dyninst library, which provides an API for attaching to and instrumenting an executable running as a single process. In the CEWES MSRC PET effort, Tennessee has tested dyninst on the UTK IBM SP2 and is currently testing it on the CEWES IBM SP pandion. After testing, plans are to install dyninst as unsupported software and to provide site-specific usage information. dyninst will enable users to attach to a running application and monitor or even change application behavior dynamically.

DPCL

The Dynamic Probe Class Library (DPCL), developed by IBM, is a client-server extension of dyninst for use with parallel and distributed applications. IBM has provided Tennessee with a beta release of DPCL for the IBM SP and with the DPCL source code. Tennessee will test DPCL on the CEWES IBM SP pandion and will port DPCL to the Origin2000. Tennessee will develop some demonstration end-user tools on top of DPCL and will provide site-specific usage information. DPCL will provide a cross-platform infrastructure for runtime application instrumentation for the purposes of performance analysis, computational steering, and data visualization. DPCL can be used directly by users but also by developers of end-user tools. For example, DPCL will be used to enable Vampir tracing to be turned on and off at runtime and to enable runtime selection and control of access to hardware performance counters. The DPCL infrastructure reduces the amount of effort needed to develop new tools, allowing end users to develop their own special-purpose tools quickly and easily.

Repository in a Box (RIB)

RIB is a toolkit developed at Tennessee for setting up and maintaining an interoperable distributed collection of software repositories. RIB provides an easy-to-use interface for CTA on-site leads to enter and maintain information about software being made available to CEWES MSRC users as a result of PET and CHSSI efforts. RIB's interoperation capabilities allow software cataloging done at one site to be made available easily to other sites and allow virtual views of a distributed collection of repositories to be created. The software deployment feature of RIB allows users to quickly access information about what software is available on which MSRC machines and about how to use the software. So far, RIB has been used to set up repositories for CEWES MSRC for SPP Tools, CFD, and grid generation software.

CAPTools

The Computer Aided Parallelisation Tools (CAPTools) package is a semi-automatic tool to assist in the process of parallelizing Fortran codes. The tool is under development at the University of Greenwich. Its main components are:

* A detailed control and dependence analysis of the source code, including the acquisition and embedding of user-supplied knowledge.
* User definition of the parallelization strategy.
* Implementation.
* Automatic migration, merging, and generation of all required communications.
* Code optimization, including loop interchange, loop splitting, and communication/calculation overlap.

To run a CAPTools-generated code, users need to link the compiled code with the CAP library.

Previously-Evaluated Tools

The following programming tools were previously evaluated and made available as appropriate at CEWES MSRC:

nupshot

nupshot is a trace visualization tool that has a very simple, easy-to-use interface and gives users a quick overview of the message-passing behavior and performance of their application.
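The MPE logging library (whose trace files nupshot displays, as described next) provides a small set of calls with which a user can instrument a code by hand. A minimal sketch is given below; the event numbers, state name, color, and output file name are arbitrary choices for this illustration, and return codes are ignored for brevity.

#include <mpi.h>
#include "mpe.h"

/* Event numbers are arbitrary; each begin/end pair defines one state. */
#define EV_COMPUTE_BEGIN 1
#define EV_COMPUTE_END   2

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    MPE_Init_log();

    /* Describe the event pair as a named, colored state so the viewer
       can draw it as a bar on the per-process timeline. */
    MPE_Describe_state(EV_COMPUTE_BEGIN, EV_COMPUTE_END, "compute", "red");

    MPE_Log_event(EV_COMPUTE_BEGIN, 0, "start");
    /* ... computational kernel being timed ... */
    MPE_Log_event(EV_COMPUTE_END, 0, "end");

    MPE_Finish_log("app_trace");   /* write the trace file for later viewing */
    MPI_Finalize();
    return 0;
}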
nupshot analyzes trace files that are produced by the MPE logging library. Both nupshot and the MPE Logging Library were originally designed as extensions of MPICH. Tennessee made minor modifications to the MPE Logging Library so that it works with the native MPI libraries on the IBM SP and SGI Origin2000.

AIMS

AIMS is a freely available trace-based performance analysis tool developed by the NASA Ames NAS Division that provides flexible automatic instrumentation, monitoring, and performance analysis of Fortran 77 and C message-passing applications that use MPI or PVM.

SvPablo

SvPablo, another trace-based performance analysis tool, has a GUI that lets users determine which portions of their source code are selected for instrumentation and then automatically produces the instrumented source code. After the code executes and a trace file has been produced, the SvPablo GUI displays the resulting performance data alongside the source code. SvPablo runs on Sun Solaris and SGI workstations and on the SGI PCA and Origin2000. It can access and report results from the MIPS R10000 hardware performance counters on the Origin2000.

ParaDyn

ParaDyn is designed to provide a performance measurement tool that scales to long-running programs on large parallel and distributed systems. Unlike the other tools, which do post-mortem analysis of trace files, ParaDyn does interactive run-time analysis.

MPE Graphics Library

The MPE Graphics Library is part of the MPICH package distributed by Argonne National Laboratory. This graphics library gives the MPI programmer an easy-to-use, minimal set of routines that can asynchronously draw color graphics to an X11 window during the course of a numerical simulation. A user can employ the graphics routines to monitor the accuracy and progress of the solution as the code executes, or as a debugging aid.

ScaLAPACK

ScaLAPACK is a library of routines for the solution of dense, banded, and tridiagonal linear systems of equations and other numerical linear algebra computations. Data sharing between distributed processors is accomplished using the Basic Linear Algebra Communication Subroutines (BLACS).

PETSc

The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines for the solution of large-scale scientific application problems modeled by partial differential equations. PETSc was developed within an object-oriented framework and is fully usable from Fortran, C, and C++ codes.

II. Visualization Tools

Several visualization tools were introduced into the CEWES MSRC during Year 3. We conducted an in-depth survey of the software tools available for computational monitoring and interactive steering. These tools include CUMULVS (from Oak Ridge National Lab), DICE (from the Army Research Lab), pV3 (from MIT), and others.

Computational monitoring tools can be very valuable for the MSRC user. First, visualizing intermediate results can identify a run that is not going well, so the user can cancel the run without wasting their allocation. Further, computational monitoring is invaluable for debugging new codes; this is particularly useful as more teams move towards coupled models. Finally, for codes that produce very large data sets, computational monitoring might be the only practical way to view the simulation output. We experimented with applying each of these tools to a CEWES MSRC code. Results from this study are provided in a CEWES MSRC PET Technical Report.
We also conducted a one-day training session to introduce these tools. This has resulted in several follow-up contacts from CEWES MSRC users interested in the benefits of applying these techniques to their codes.

CbayVisGen & CbayTransport

CbayVisGen was specially designed by the PET SV team at NCSA to support the visualization and collaboration needs of Carl Cerco and his EQM team at CEWES. This tool enables them to visualize the hydrodynamics and nutrient transport activity in Chesapeake Bay over 10-year and 20-year time periods. A follow-on tool, CbayTransport, experimented with alternatives for visualizing the transport flux data. Cerco had no mechanism for viewing this part of his data, so this tool has added a new and needed capability.

Collaborative Data Analysis Toolsuite (CDAT)

A prototype toolsuite for collaborative visualization was also introduced to CEWES MSRC by NCSA. The Collaborative Data Analysis Toolsuite (CDAT) is a multi-platform collection of collaborative visualization tools. Using components from this toolsuite, participants on an ImmersaDesk, a desktop workstation, and a laptop are able to explore simulation output simultaneously. Collaborative visualization is potentially very useful to CEWES MSRC users. In particular, Carl Cerco is eager to try out these capabilities in support of his work.

volDG

The visualization tool volDG was introduced to Raju Namburu of CEWES to explore the usefulness of wavelet representations and structure-significant encoding of very large structures data. volDG results from a GUI-based software project developed in collaboration between the ERC at Mississippi State and Mitsubishi Electric Research Laboratories. The tool includes both volume rendering capabilities and an inverse design algorithm that allows searching for structures in the data. The latter feature makes the user aware of the inherent features in the dataset, while the former allows the volume to be previewed in a semi-transparent mode. The impact of this tool is currently limited to the use of the first feature, namely volume rendering. Since CSM datasets are composed of material interfaces, semi-transparency can be useful for visualizing structures in a juxtaposed way, allowing better understanding of the datasets. This is the first time such a capability has been provided to a CSM user at CEWES MSRC.

ISTV

The visualization tool ISTV from the ERC at Mississippi State was introduced to the CWO user community of CEWES MSRC. ISTV was used to visualize the output of a CH3D model of the lower Mississippi River and to show WAM output. Robert Jensen of CEWES reports that ISTV has been useful for looking at correlations between variables; animating through the time series with ISTV was especially revealing. These tools have potential for facilitating understanding of the forces encountered during littoral operations.

III. Communication/Collaboration Tools

Tango Interactive

Tango Interactive is a Java-based Web collaboratory developed by NPAC at Syracuse University (with initial funding from AF Rome Lab). It is implemented with standard Internet technologies and protocols, and runs inside an ordinary Netscape browser window (support for other browsers is in progress). Tango delivers real-time multimedia content in an authentic two-way interactive format.
Tango was originally designed to support collaborative workgroups, though synchronous distance education and training, which can be thought of as a highly structured kind of collaboration, has become one of the key application areas of the system. The primary Tango window is called the Control Application (CA). From the CA, participants have access to many tools, including:

* SharedBrowser, a special-purpose Web browser window that "pushes" Web documents onto remote client workstations.
* WebWisdom, a presentation environment for lectures, foilsets, and similar materials.
* Whiteboard, for interactive text and graphics display.
* Several different kinds of chat tools.
* BuenaVista, for two-way audio/video conferencing.

Tango has been deployed for some time at the CEWES MSRC and at Jackson State University for use in the joint Syracuse-Jackson State distance education work begun in Year 2. More recently, Tango Interactive has been used to deliver PET training, through installations at all four of the MSRCs and at other sites as well. During Year 3, a substantial investment was made in the Tango Interactive system to increase its stability and robustness to a level commensurate with the demands of routine use in education, training, and collaboration. These efforts resulted in the first release suitable for general deployment, in May 1998, and a new release with additional improvements at the end of Year 3.

IV. Computational Tools

WebHLA

The WebHLA set of tools is being introduced into CEWES MSRC by NPAC at Syracuse in connection with PET support of the FMS CTA. WebHLA is a collection of tools, packaged as HLA federates and used for integrating Web/commodity-based, HLA-compliant, and HPC-enabled distributed applications. WebHLA tools/federates that are being completed now and will be transitioned soon to the CEWES and ARL MSRCs include:

* JWORB (Java Web Object Request Broker) - a universal middleware server written in Java that integrates the HTTP, IIOP, and DCE RPC protocols; i.e., it can act simultaneously as a Web server, CORBA broker, and DCOM server.
* OWRTI (Object Web RTI) - a Java CORBA implementation of DMSO RTI 1.3, packaged as a JWORB service and used as the general-purpose federation and collaboration layer of the WebHLA framework.
* JDIS - a Java-based DIS/HLA bridge that allows legacy DIS applications to be rapidly converted to HLA federates so that they can play in any standard-compliant HLA federation environment.
* PDUDB - SQL/XML-based support for simulation logging and playback; all DIS PDUs, or the equivalent HLA interaction events, are passed as XML messages to and recorded in an SQL database, to be replayed later for training, demonstration, or other analysis purposes.
* SimVis - a commodity-based (Microsoft DirectX/Direct3D) 3D real-time battlefield visualizer for DIS/HLA simulations.

CTH & EPIC Tools

Several software modules and programming tools have been developed by TICAM at Texas in CEWES MSRC PET support of the CSM CTA. These include:

* Software modules for error indicator computation in CTH.
* Software modules for error indicator computation in EPIC.
* Software to incorporate block refinement in parallel simulations for CTH.
* Software to adaptively refine and update the data structure of EPIC.
* Software for Morton/Hilbert space-filling curve generation (an illustrative sketch of Morton ordering follows this list).
* A utility tool for code testing and validation.
* Software for advanced front tracking in CTH under adaptation.
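The Morton (Z-order) half of the space-filling-curve module can be conveyed with a short sketch. The routine below is a generic C illustration of 3-D Morton ordering, not the C++ module delivered by TICAM: it interleaves the bits of three cell indices into a single key, so that sorting cells by key yields a one-dimensional ordering that tends to keep spatially nearby cells together, which is what makes such curves useful for partitioning adaptively refined meshes.

#include <stdio.h>
#include <stdint.h>

/* Interleave the low 10 bits of x, y, and z into a 30-bit Morton key:
   bit i of x, y, and z become bits 3i, 3i+1, and 3i+2 of the key. */
static uint32_t morton3d(uint32_t x, uint32_t y, uint32_t z)
{
    uint32_t key = 0;
    int i;
    for (i = 0; i < 10; i++) {
        key |= ((x >> i) & 1u) << (3 * i);
        key |= ((y >> i) & 1u) << (3 * i + 1);
        key |= ((z >> i) & 1u) << (3 * i + 2);
    }
    return key;
}

int main(void)
{
    /* Cells with nearby (i,j,k) indices receive nearby Morton keys. */
    printf("%u %u %u\n", morton3d(1, 2, 3), morton3d(1, 2, 4), morton3d(7, 0, 0));
    return 0;
}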
Some of the software modules for error indicators are more broadly applicable (with perhaps minor modification) to other PET program analysis components. The software for adaptive refinement and data structure modification is more specific to the application codes in question, but the concepts are general. The module for the Morton/Hilbert curves is written in C++ and has quite general applicability. The utility tool approach can be applied to other applications and codes. The integrated effect of these developments on the CEWES MSRC is significant, since they will provide a new adaptive analysis capability and spearhead similar extensions to the DoD application codes both in CSM and in the other CTA areas.

USC Integrated Memory Hierarchy (IMH) Model

The Integrated Memory Hierarchy (IMH) Model is a simple, easy-to-use model of the current HPC platforms available at the MSRCs. The IMH model allows end-users to predict the cost of data movement between the various levels of the memory hierarchy: communication between the processor and its memory, other processors, and secondary storage. The model integrates the impact of the architectural features, operating system characteristics, communication environment, and compiler features that the user interacts with in implementing an algorithm. The model consists of a uniform (across several platforms) set of key parameters that captures the performance of the underlying platform from a user's perspective.

Using the IMH Model, end-users can analyze and predict the performance of their algorithms on a particular HPC platform. This allows them to make intelligent trade-offs in their algorithm design without actual coding, and thus to quickly develop optimized applications that are both scalable and portable across various HPC platforms.
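The report does not reproduce the IMH parameter set itself, but the flavor of the model can be conveyed with a hedged sketch. The program below assumes a simple linear cost model in which each level of the hierarchy is summarized by a start-up latency and a sustained bandwidth; the levels and numerical values are placeholders chosen for illustration, not measured CEWES MSRC parameters.

#include <stdio.h>

/* Each level is summarized by two user-visible parameters: a start-up
   latency (seconds) and a sustained bandwidth (bytes/second).  The
   values below are placeholders, not measured machine parameters. */
struct level { const char *name; double latency; double bandwidth; };

static double transfer_time(const struct level *l, double nbytes)
{
    return l->latency + nbytes / l->bandwidth;   /* simple linear cost model */
}

int main(void)
{
    struct level levels[4] = {
        { "cache to processor",  1.0e-8, 4.0e9 },
        { "memory to processor", 5.0e-7, 4.0e8 },
        { "remote processor",    2.0e-5, 1.0e8 },
        { "secondary storage",   1.0e-2, 5.0e7 },
    };
    double nbytes = 8.0e6;   /* e.g. a 1000 x 1000 array of doubles */
    int i;

    for (i = 0; i < 4; i++)
        printf("%-20s %10.4f s\n", levels[i].name,
               transfer_time(&levels[i], nbytes));
    return 0;
}

Comparing such estimates for alternative data layouts or communication patterns is the kind of trade-off analysis the IMH model is intended to support before any code is written.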