VI. TOOLS INTRODUCED INTO CEWES MSRC

The enhancement of the programming environment at CEWES MSRC through the identification and introduction of programming tools, computational tools, visualization tools, and collaboration/information tools is a major emphasis of the CEWES MSRC PET effort. Tools introduced into CEWES MSRC by the PET team during Year 3 are listed in Table 4. The CEWES MSRC PET team has provided training courses at the CEWES MSRC and at remote user sites for many of these tools (see Section VII), and continually provides guidance and assistance in their use through the on-site team. The purpose of this section is to discuss the function of these tools and their importance to CEWES MSRC users. Many of these tools came from collaborative efforts across various components of the CEWES MSRC PET team, both on-site and at the universities.

I. Programming Tools

An ongoing evaluation of the state of the art in parallel debuggers and performance analysis tools is carried out by the PET SPPT team at Tennessee, as part of the National HPCC Software Exchange (NHSE) effort. From these findings Tennessee is able to recommend, install, test, and evaluate the utility of such tools within the CEWES MSRC user community. At present, five parallel performance analysis tools (Vampir, the MPE Logging Library and nupshot, AIMS, SvPablo, and ParaDyn) and one debugger (TotalView) have been made available to CEWES MSRC users on appropriate platforms.

Performance analysis tools typically require the addition of code to a program (known as instrumentation) in order to output trace information to a file that is later interpreted and displayed with a GUI. Their purpose is to document significant events (e.g., subroutine calls, sending or receiving messages, I/O) within the execution of the user's code. Debuggers allow a programmer to examine codes as they execute, in order to determine the cause of errors and then fix them.

TotalView

TotalView is a multiprocess and multithread source-level debugger with a graphical user interface. Versions are available for all CEWES MSRC platforms. Tennessee evaluated TotalView and recommended its purchase, worked with CEWES MSRC systems staff to get TotalView properly installed and integrated with the queueing system, and provided site-specific usage information, training, and a Web-based tutorial for TotalView. Tennessee also served as a beta-tester for new releases of TotalView, tested the CEWES MSRC TotalView installations, and reported bugs to the TotalView developer.

Being able to quickly learn and start using a full-featured parallel debugger that works properly in the CEWES MSRC environment has enabled the NRC Computational Migration Group (CMG) to understand the execution of parallel codes and to find and fix elusive bugs. For example, TotalView was used to isolate Cray pointer bugs in a code called MSPHERE that would have been difficult to find otherwise. Having a single debugger that works across platforms has given CEWES MSRC users a higher payoff for the time invested in learning the debugger and has enabled the CMG to more effectively support users in their debugging tasks.

Vampir

Vampir is a performance analysis tool for MPI parallel programs. Vampir consists of two parts: the Vampirtrace library, which can be linked with an application to produce a trace file during program execution, and the Vampir visualization tool for analyzing the resulting trace file.
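Trace libraries of this kind typically capture message-passing events by interposing on the standard MPI profiling interface: each MPI routine is redefined in the trace library to record an event and then call the real implementation through its PMPI_ name. The sketch below illustrates only that general mechanism (it is not taken from Vampirtrace), and the log_event routine is a hypothetical stand-in for whatever the trace library uses to record timestamped events.

#include <stdio.h>
#include <mpi.h>

/* Hypothetical stand-in for the trace library's event recorder. */
static void log_event(const char *name, double t0, double t1)
{
    fprintf(stderr, "%s %.6f %.6f\n", name, t0, t1);
}

/* Wrapper using the MPI-3 C binding (earlier MPI versions declare buf
   as a plain void *).  Linked ahead of the MPI library, it intercepts
   every call to MPI_Send made by the application. */
int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    double t0 = MPI_Wtime();                 /* time on entry                  */
    int rc = PMPI_Send(buf, count, type,
                       dest, tag, comm);     /* forward to the real MPI_Send   */
    log_event("MPI_Send", t0, MPI_Wtime());  /* record the event for the trace */
    return rc;
}

Because the interception happens at link time, the application source is not modified; the user simply relinks with the trace library, which is consistent with the way Vampirtrace is described above.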
Pallas, the developer of Vampir, is working with OpenMP compiler vendors to develop Vampir support for tracing and visualizing the execution of OpenMP applications. Users will be allowed to mix MPI and OpenMP, as long as the MPI implementation is sufficiently thread-safe. Tennessee has evaluated Vampir, recommended its purchase, and made suggestions for improvements to Vampir 1.0 that were implemented in Vampir 2.0. Tennessee has provided site-specific usage information, training, a Web-based tutorial, and user assistance for Vampir.

Having a robust performance analysis tool that works across platforms has enabled CEWES MSRC users to quickly and easily collect performance trace data and analyze that data visually to spot communication bottlenecks in their codes. Vampir has been used by CEWES MSRC user David Rhodes to analyze and improve the performance of the CEN-1 Harmonic Balance Simulation code, and by the RF Weapons Challenge team of AF Phillips Lab to achieve a significant performance improvement in the ICEPIC code.

MPI_Connect

MPI_Connect, a communication system developed at Tennessee, enables multiple HPC systems to be used effectively on the same MPI application, thus enabling larger problems to be solved more quickly. Because tuned vendor MPI implementations are used with only small overhead, communication within a single system remains highly efficient. Together with OpenMP and MPI, MPI_Connect has been used to reduce the runtime required by the CGWAVE harbor response simulation code of CEWES from months to days; this project won the Most Effective Engineering Methodology award for its SC'98 HPC Challenge entry. MPI_Connect is being made available to other MSRC users and application developers who have similar application coupling and metacomputing needs.

Virtue

The University of Illinois developed the Virtue system for visualization and analysis of performance data using virtual reality immersion. In the CEWES MSRC PET effort, Tennessee has written a vampir2virtue converter and is working with Illinois to implement additional features in Virtue that will provide more effective visualization of parallel programs (e.g., better labelling of Virtue objects). Virtue will enable effective visualization of the execution of large-scale applications through the use of 3-D and immersive virtual reality, thus enabling performance bottlenecks to be found and fixed more easily. Virtue can be used with the ImmersaDesk or with graphics workstations. Virtue has multimedia tools that allow remote collaborators to view and manipulate scaled-down versions of Virtue displays from their desktop workstations and to interact using voice and video.

PAPI

With input from users, researchers, and vendors, Tennessee has written a specification for a portable API for accessing hardware performance counters, called PAPI. Tennessee is currently working on reference implementations of PAPI for the SGI/Cray Origin2000 and Linux, with an IBM SP implementation planned for the near future. The PAPI portable interface to hardware performance counters will enable CEWES MSRC users to use the same set of routines to access comparable performance data across platforms. Hardware performance counters can provide valuable information for tuning the cache and memory performance of applications. PAPI will also enable CHSSI code developers to obtain the data needed for performance reporting requirements more quickly and easily.
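Since PAPI is still a specification with reference implementations in progress, no CEWES MSRC-installed calling sequence can be quoted here. The sketch below is a minimal illustration of counter access in the style later published by the PAPI project; the routine and event names (PAPI_library_init, PAPI_create_eventset, PAPI_add_event, PAPI_start, PAPI_stop, PAPI_TOT_CYC, PAPI_L1_DCM) are assumptions drawn from that API and may differ in detail from the specification described above. Error checking is omitted for brevity.

#include <stdio.h>
#include <papi.h>

int main(void)
{
    int eventset = PAPI_NULL;
    long long counts[2];

    /* Initialize the library and build an event set containing two
       preset events: total cycles and level-1 data-cache misses. */
    PAPI_library_init(PAPI_VER_CURRENT);
    PAPI_create_eventset(&eventset);
    PAPI_add_event(eventset, PAPI_TOT_CYC);
    PAPI_add_event(eventset, PAPI_L1_DCM);

    PAPI_start(eventset);
    /* ... section of the application to be measured ... */
    PAPI_stop(eventset, counts);

    printf("cycles = %lld, L1 data-cache misses = %lld\n",
           counts[0], counts[1]);
    return 0;
}

The same source would be expected to run on any platform with a PAPI implementation, which is the portability benefit described above.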
dyninst

The University of Maryland has produced the dyninst library, which provides an API for attaching to and instrumenting an executable running as a single process. In the CEWES MSRC PET effort, Tennessee has tested dyninst on the UTK IBM SP2 and is currently testing it on the CEWES IBM SP pandion. After testing, plans are to install dyninst as unsupported software and to provide site-specific usage information. dyninst will enable users to attach to a running application and monitor or even change application behavior dynamically.

DPCL

The Dynamic Probe Class Library (DPCL), developed by IBM, is a client-server extension of dyninst for use with parallel and distributed applications. IBM has provided Tennessee with a beta release of DPCL for the IBM SP and with the DPCL source code. Tennessee will test DPCL on the CEWES IBM SP pandion and will port DPCL to the Origin2000. Tennessee will develop some demonstration end-user tools on top of DPCL and will provide site-specific usage information. DPCL will provide a cross-platform infrastructure for runtime application instrumentation for the purposes of performance analysis, computational steering, and data visualization. DPCL can be used directly by users but also by developers of end-user tools. For example, DPCL will be used to enable Vampir tracing to be turned on and off at runtime and to enable runtime selection and control of access to hardware performance counters. The DPCL infrastructure reduces the amount of effort needed to develop new tools, allowing end users to develop their own special-purpose tools quickly and easily.

Repository in a Box (RIB)

RIB is a toolkit developed at Tennessee for setting up and maintaining an interoperable distributed collection of software repositories. RIB provides an easy-to-use interface for CTA on-site leads to enter and maintain information about software being made available to CEWES MSRC users as a result of PET and CHSSI efforts. RIB's interoperation capabilities allow software cataloging done at one site to be made available easily to other sites and allow virtual views of a distributed collection of repositories to be created. The software deployment feature of RIB allows users to quickly access information about what software is available on which MSRC machines and about how to use the software. So far, RIB has been used to set up repositories for CEWES MSRC for SPP Tools, CFD, and grid generation software.

CAPTools

The Computer Aided Parallelisation Tools (CAPTools) package is a semi-automatic tool to assist in the process of parallelizing Fortran codes. The tool is under development at the University of Greenwich. Its main components are:

* A detailed control and dependence analysis of the source code, including the acquisition and embedding of user-supplied knowledge.
* User definition of the parallelization strategy.
* Implementation.
* Automatic migration, merging, and generation of all required communications.
* Code optimization, including loop interchange, loop splitting, and communication/calculation overlap.

To run a CAPTools-generated code, users need to link the compiled code with the CAP library.

Previously-Evaluated Tools

The following programming tools were previously evaluated and made available as appropriate at CEWES MSRC:

nupshot

nupshot is a trace visualization tool that has a very simple, easy-to-use interface and gives users a quick overview of the message-passing behavior and performance of their application.
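The MPE logging library (whose trace files nupshot displays, as described next) provides a small set of calls with which a user can instrument a code by hand. A minimal sketch is given below; the event numbers, state name, color, and output file name are arbitrary choices for this illustration, and return codes are ignored for brevity.

#include <mpi.h>
#include "mpe.h"

/* Event numbers are arbitrary; each begin/end pair defines one state. */
#define EV_COMPUTE_BEGIN 1
#define EV_COMPUTE_END   2

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    MPE_Init_log();

    /* Describe the event pair as a named, colored state so the viewer
       can draw it as a bar on the per-process timeline. */
    MPE_Describe_state(EV_COMPUTE_BEGIN, EV_COMPUTE_END, "compute", "red");

    MPE_Log_event(EV_COMPUTE_BEGIN, 0, "start");
    /* ... computational kernel being timed ... */
    MPE_Log_event(EV_COMPUTE_END, 0, "end");

    MPE_Finish_log("app_trace");   /* write the trace file for later viewing */
    MPI_Finalize();
    return 0;
}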
nupshot analyzes trace files that are produced by the MPE logging library. Both nupshot and the MPE Logging Library were originally designed as extensions of MPICH. Tennessee made minor modifications to the MPE Logging Library so that it works with the native MPI libraries on the IBM SP and SGI Origin2000.

AIMS

AIMS is a freely available trace-based performance analysis tool developed by the NASA Ames NAS Division that provides flexible automatic instrumentation, monitoring, and performance analysis of Fortran 77 and C message-passing applications that use MPI or PVM.

SvPablo

SvPablo, another trace-based performance analysis tool, has a GUI that lets users determine which portions of their source code are selected for instrumentation and then automatically produces the instrumented source code. After the code executes and a trace file has been produced, the SvPablo GUI displays the resulting performance data alongside the source code. SvPablo runs on Sun Solaris and SGI workstations and on the SGI PCA and Origin2000. It can access and report results from the MIPS R10000 hardware performance counters on the Origin2000.

ParaDyn

ParaDyn is designed to provide a performance measurement tool that scales to long-running programs on large parallel and distributed systems. Unlike the other tools, which do post-mortem analysis of trace files, ParaDyn does interactive run-time analysis.

MPE Graphics Library

The MPE Graphics Library is part of the MPICH package distributed by Argonne National Laboratory. This graphics library gives the MPI programmer an easy-to-use, minimal set of routines that can asynchronously draw color graphics to an X11 window during the course of a numerical simulation. A user can employ the graphics routines to monitor the accuracy and progress of the solution as the code executes, or as a debugging aid.

ScaLAPACK

ScaLAPACK is a library of routines for the solution of dense, banded, and tridiagonal linear systems of equations and other numerical linear algebra computations. Data sharing between distributed processors is accomplished using the Basic Linear Algebra Communication Subroutines (BLACS).

PETSc

The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines for the solution of large-scale scientific application problems modeled by partial differential equations. PETSc was developed within an object-oriented framework and is fully usable from Fortran, C, and C++ codes.

II. Visualization Tools

Several visualization tools were introduced into the CEWES MSRC during Year 3. We conducted an in-depth survey of the software tools available for computational monitoring and interactive steering. These tools include CUMULVS (from Oak Ridge National Lab), DICE (from the Army Research Lab), pV3 (from MIT), and others.

Computational monitoring tools can be very valuable for the MSRC user. First, visualizing intermediate results can identify a run that is not going well, so the user can cancel the run without wasting their allocation. Further, computational monitoring is invaluable for debugging new codes; this is particularly useful as more teams move towards coupled models. Finally, for codes that produce very large data sets, computational monitoring might be the only practical way to view the simulation output. We experimented with applying each of these tools to a CEWES MSRC code. Results from this study are provided in a CEWES MSRC PET Technical Report.
We also conducted a one-day training session to introduce these tools. This has resulted in several follow-up contacts from CEWES MSRC users interested in the benefits of applying these techniques to their codes.

CbayVisGen & CbayTransport

CbayVisGen was specially designed by the PET SV team at NCSA to support the visualization and collaboration needs of Carl Cerco and his EQM team at CEWES. This tool enables them to visualize the hydrodynamics and nutrient transport activity in Chesapeake Bay over 10-year and 20-year time periods. A follow-on tool, CbayTransport, experimented with alternatives for visualizing the transport flux data. Cerco had no mechanism for viewing this part of his data, so this tool has added a new and needed capability.

Collaborative Data Analysis Toolsuite (CDAT)

A prototype toolsuite for collaborative visualization was also introduced to CEWES MSRC by NCSA. The Collaborative Data Analysis Toolsuite (CDAT) is a multi-platform collection of collaborative visualization tools. Using components from this toolsuite, participants on an ImmersaDesk, a desktop workstation, and a laptop are able to explore simulation output simultaneously. Collaborative visualization is potentially very useful to CEWES MSRC users. In particular, Carl Cerco is eager to try out these capabilities in support of his work.

volDG

The visualization tool volDG was introduced to Raju Namburu of CEWES to explore the usefulness of wavelet representations and structure-significant encoding of very large structures data. volDG results from a GUI-based software project developed in collaboration between the ERC at Mississippi State and Mitsubishi Electric Research Laboratories. The tool includes both volume rendering capabilities and an inverse design algorithm that allows searching for structures in the data. The latter feature makes the user aware of the inherent features in the dataset, while the former allows the volume to be previewed in a semi-transparent mode. The impact of this tool is currently limited to the use of the first feature, namely volume rendering. Since CSM datasets are composed of material interfaces, semi-transparency can be useful for visualizing structures in a juxtaposed way, allowing better understanding of the datasets. This is the first time such a capability has been provided to a CSM user at CEWES MSRC.

ISTV

The visualization tool ISTV from the ERC at Mississippi State was introduced to the CWO user community of CEWES MSRC. ISTV was used to visualize the output of a CH3D model of the lower Mississippi River and to show WAM output. Robert Jensen of CEWES reports that ISTV has been useful for looking at correlations between variables; animating through the time series with ISTV was especially revealing. These tools have potential for facilitating understanding of the forces encountered during littoral operations.

III. Communication/Collaboration Tools

Tango Interactive

Tango Interactive is a Java-based Web collaboratory developed by NPAC at Syracuse University (with initial funding from AF Rome Lab). It is implemented with standard Internet technologies and protocols, and runs inside an ordinary Netscape browser window (support for other browsers is in progress). Tango delivers real-time multimedia content in an authentic two-way interactive format.
Tango was originally designed to support collaborative workgroups, though synchronous distance education and training, which can be thought of as a highly structured kind of collaboration, has become one of the key application areas of the system. The primary Tango window is called the Control Application (CA). From the CA, participants have access to many tools, including:

* SharedBrowser, a special-purpose Web browser window that "pushes" Web documents onto remote client workstations.
* WebWisdom, a presentation environment for lectures, foilsets, and similar materials.
* Whiteboard, for interactive text and graphics display.
* Several different kinds of chat tools.
* BuenaVista, for two-way audio/video conferencing.

Tango has been deployed for some time at the CEWES MSRC and at Jackson State University for use in the joint Syracuse-Jackson State distance education work begun in Year 2. More recently, Tango Interactive has been used to deliver PET training, through installations at all four of the MSRCs and at other sites as well. During Year 3, a substantial investment was made in the Tango Interactive system to increase its stability and robustness to a level commensurate with the demands of routine use in education, training, and collaboration. These efforts resulted in the first release suitable for general deployment, in May 1998, and a new release with additional improvements at the end of Year 3.

IV. Computational Tools

WebHLA

The WebHLA set of tools is being introduced into CEWES MSRC by NPAC at Syracuse in connection with PET support of the FMS CTA. WebHLA is a collection of tools, packaged as HLA federates and used for integrating Web/commodity-based, HLA-compliant, and HPC-enabled distributed applications. WebHLA tools/federates that are being completed now and will be transitioned soon to the CEWES and ARL MSRCs include:

* JWORB (Java Web Object Request Broker) - a universal middleware server written in Java that integrates the HTTP, IIOP, and DCE RPC protocols; i.e., it can act simultaneously as a Web server, CORBA broker, and DCOM server.
* OWRTI (Object Web RTI) - a Java CORBA implementation of DMSO RTI 1.3, packaged as a JWORB service and used as the general-purpose federation and collaboration layer of the WebHLA framework.
* JDIS - a Java-based DIS/HLA bridge that allows legacy DIS applications to be rapidly converted to HLA federates so that they can play in any standard-compliant HLA federation environment.
* PDUDB - SQL/XML-based support for simulation logging and playback; all DIS PDUs, or the equivalent HLA interaction events, are passed as XML messages to and recorded in an SQL database, to be replayed later for training, demonstration, or other analysis purposes.
* SimVis - a commodity-based (Microsoft DirectX/Direct3D) 3D real-time battlefield visualizer for DIS/HLA simulations.

CTH & EPIC Tools

Several software modules and programming tools have been developed by TICAM at Texas in CEWES MSRC PET support of the CSM CTA. These include:

* Software modules for error indicator computation in CTH.
* Software modules for error indicator computation in EPIC.
* Software to incorporate block refinement in parallel simulations for CTH.
* Software to adaptively refine and update the data structure of EPIC.
* Software for Morton/Hilbert space-filling curve generation (an illustrative sketch of Morton ordering follows this list).
* A utility tool for code testing and validation.
* Software for advanced front tracking in CTH under adaptation.
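The Morton (Z-order) half of the space-filling-curve module can be conveyed with a short sketch. The routine below is a generic C illustration of 3-D Morton ordering, not the C++ module delivered by TICAM: it interleaves the bits of three cell indices into a single key, so that sorting cells by key yields a one-dimensional ordering that tends to keep spatially nearby cells together, which is what makes such curves useful for partitioning adaptively refined meshes.

#include <stdio.h>
#include <stdint.h>

/* Interleave the low 10 bits of x, y, and z into a 30-bit Morton key:
   bit i of x, y, and z become bits 3i, 3i+1, and 3i+2 of the key. */
static uint32_t morton3d(uint32_t x, uint32_t y, uint32_t z)
{
    uint32_t key = 0;
    int i;
    for (i = 0; i < 10; i++) {
        key |= ((x >> i) & 1u) << (3 * i);
        key |= ((y >> i) & 1u) << (3 * i + 1);
        key |= ((z >> i) & 1u) << (3 * i + 2);
    }
    return key;
}

int main(void)
{
    /* Cells with nearby (i,j,k) indices receive nearby Morton keys. */
    printf("%u %u %u\n", morton3d(1, 2, 3), morton3d(1, 2, 4), morton3d(7, 0, 0));
    return 0;
}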
Some of the software modules for error indicators are more broadly applicable (with perhaps minor modification) to other PET program analysis components. The software for adaptive refinement and data structure modification is more specific to the application codes in question, but the concepts are general. The module for the Morton/Hilbert curves is written in C++ and has quite general applicability. The utility tool approach can be applied to other applications and codes. The integrated effect of these developments on the CEWES MSRC is significant, since they will provide a new adaptive analysis capability and spearhead similar extensions to the DoD application codes both in CSM and in the other CTA areas.

USC Integrated Memory Hierarchy (IMH) Model

The Integrated Memory Hierarchy (IMH) Model is a simple, easy-to-use model of the current HPC platforms available at the MSRCs. The IMH model allows end-users to predict the cost of data movement between the various levels of the memory hierarchy: communication between the processor and its memory, other processors, and secondary storage. The model integrates the impact of the architectural features, operating system characteristics, communication environment, and compiler features that the user interacts with in implementing an algorithm. The model consists of a uniform (across several platforms) set of key parameters that captures the performance of the underlying platform from a user's perspective.

Using the IMH Model, end-users can analyze and predict the performance of their algorithms on a particular HPC platform. This allows them to make intelligent trade-offs in their algorithm design without actual coding, and thus to quickly develop optimized applications that are both scalable and portable across various HPC platforms.
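The report does not reproduce the IMH parameter set itself, but the flavor of the model can be conveyed with a hedged sketch. The program below assumes a simple linear cost model in which each level of the hierarchy is summarized by a start-up latency and a sustained bandwidth; the levels and numerical values are placeholders chosen for illustration, not measured CEWES MSRC parameters.

#include <stdio.h>

/* Each level is summarized by two user-visible parameters: a start-up
   latency (seconds) and a sustained bandwidth (bytes/second).  The
   values below are placeholders, not measured machine parameters. */
struct level { const char *name; double latency; double bandwidth; };

static double transfer_time(const struct level *l, double nbytes)
{
    return l->latency + nbytes / l->bandwidth;   /* simple linear cost model */
}

int main(void)
{
    struct level levels[4] = {
        { "cache to processor",  1.0e-8, 4.0e9 },
        { "memory to processor", 5.0e-7, 4.0e8 },
        { "remote processor",    2.0e-5, 1.0e8 },
        { "secondary storage",   1.0e-2, 5.0e7 },
    };
    double nbytes = 8.0e6;   /* e.g. a 1000 x 1000 array of doubles */
    int i;

    for (i = 0; i < 4; i++)
        printf("%-20s %10.4f s\n", levels[i].name,
               transfer_time(&levels[i], nbytes));
    return 0;
}

Comparing such estimates for alternative data layouts or communication patterns is the kind of trade-off analysis the IMH model is intended to support before any code is written.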