VI. TOOLS INTRODUCED INTO CEWES MSRC

The enhancement of the programming environment at CEWES MSRC through the
identification and introduction of programming tools, computational tools,
visualization tools, and collaboration/information tools is a major emphasis of
the CEWES MSRC PET effort. Tools introduced into CEWES MSRC by the PET team 
during Year 3 are listed in Table 4. The CEWES MSRC PET team has provided 
training courses at the CEWES MSRC and at remote user sites for many of these 
tools (see Section VII), and continually provides guidance and assistance in 
their use through the on-site team. This section discusses the function of 
these tools and their importance to CEWES MSRC users. 
Many of these tools came from collaborative efforts across various components of
the CEWES MSRC PET team, both on-site and at the universities.


I. Programming Tools

An ongoing evaluation of the state of the art in parallel debuggers and 
performance analysis tools is carried out by the PET SPPT team at Tennessee, 
as part of the National HPCC Software Exchange (NHSE) effort. From these 
findings Tennessee is able to recommend, install, test and evaluate the utility
of such tools within the CEWES MSRC user community.  At present, five parallel 
performance analysis tools (Vampir, MPE Logging Library and nupshot, AIMS, 
SvPablo, and ParaDyn) and one debugger (TotalView) have been made available to 
CEWES MSRC users on appropriate platforms.  

Performance analysis tools typically require the addition of code to a program 
(known as instrumentation) so that it writes trace information to a file that 
is later interpreted and displayed in a graphical interface.  Their purpose is 
to document significant events (e.g., subroutine calls, sending or receiving 
messages, and I/O) within the execution of the user's code.  Debuggers allow a 
programmer to examine codes as they execute in order to find the cause of 
catastrophic errors and then fix them.
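
As a rough illustration of what instrumentation means in practice, the 
following Python sketch wraps a routine so that entry and exit events are 
appended to an in-memory trace, which a display tool could later interpret. 
The names traced, TRACE, solve_step, and summarize are hypothetical and are 
not part of any of the tools discussed in this section.

```python
import functools
import time

TRACE = []  # in-memory event log; a real tool would write this to a trace file


def traced(func):
    """Instrument func so that entry and exit events are recorded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        TRACE.append(("enter", func.__name__, time.perf_counter()))
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if func raises, so enter/exit events stay paired.
            TRACE.append(("exit", func.__name__, time.perf_counter()))
    return wrapper


@traced
def solve_step(n):
    """Stand-in for a user subroutine (hypothetical)."""
    return sum(i * i for i in range(n))


def summarize(trace):
    """Total inclusive time per routine, computed from enter/exit pairs."""
    totals, stack = {}, []
    for kind, name, stamp in trace:
        if kind == "enter":
            stack.append(stamp)
        else:
            started = stack.pop()
            totals[name] = totals.get(name, 0.0) + (stamp - started)
    return totals
```

Calling solve_step a few times and then summarize(TRACE) gives the kind of 
per-routine timing summary that the trace-based tools present graphically.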

TotalView

TotalView is a multiprocess and multithread source level debugger with a 
graphical user interface.  Versions are available for all CEWES MSRC platforms.

Tennessee evaluated TotalView and recommended purchase. Tennessee has worked 
with CEWES MSRC systems staff to get TotalView properly installed and integrated
with the queueing system.  Tennessee has provided site-specific usage 
information, training, and a Web-based tutorial for TotalView. Tennessee served 
as a beta-tester for new releases of TotalView, tested the CEWES MSRC TotalView 
installations, and reported bugs to the TotalView developer.

Being able to quickly learn and start using a full-featured parallel debugger 
that works properly in the CEWES MSRC environment has enabled the NRC 
Computational Migration Group to understand the execution of parallel codes and 
find and fix elusive bugs.  For example, TotalView was used to isolate Cray 
pointer bugs in a code called MSPHERE that would have been difficult to find 
without TotalView. Having a single debugger that works across platforms has 
given CEWES MSRC users a higher payoff for the time invested in learning the 
debugger and has enabled the CMG to more effectively support users in their
debugging tasks.

Vampir

Vampir is a performance analysis tool for MPI parallel programs. Vampir consists
of two parts: the Vampirtrace library, which can be linked with an application 
to produce a trace file during program execution, and the Vampir visualization 
tools for analyzing the resulting trace file. Pallas, the developer of Vampir, is
working with OpenMP compiler vendors to develop Vampir support for tracing and 
visualizing execution of OpenMP applications.  Users will be allowed to mix MPI 
and OpenMP, as long as the MPI implementation is sufficiently thread-safe.
 
Tennessee has evaluated Vampir, recommended purchase, and made suggestions for
improvements to Vampir 1.0 that were implemented in Vampir 2.0. Tennessee has 
provided site-specific usage information, training, a Web-based tutorial, and 
user assistance for Vampir.

Having a robust performance analysis tool that works across platforms has enabled
CEWES MSRC users to quickly and easily collect performance trace data and 
analyze that data visually to spot communication bottlenecks in their codes.
Vampir has been used by CEWES MSRC user David Rhodes to analyze and improve 
performance of the CEN-1 Harmonic Balance Simulation code. Vampir has been used 
by the RF Weapons Challenge team of AF Phillips Lab to achieve a significant 
performance improvement in the ICEPIC code.

MPI_Connect

MPI_Connect, a communication system developed at Tennessee, enables 
multiple HPC systems to be used effectively on the same MPI application, thus 
enabling larger problems to be solved more quickly.  The capability of using 
tuned vendor MPI implementations with small overhead permits highly efficient 
communication within a single system.  Together with OpenMP and MPI, MPI_Connect
has been used to reduce the runtime required for the CGWAVE harbor response 
simulation code of CEWES from months to days. This project won the Most 
Effective Engineering Methodology award for its SC'98 HPC Challenge entry.
MPI_Connect is being made available to other MSRC users and application
developers who have similar application coupling and metacomputing needs.

Virtue

The University of Illinois developed the Virtue system for visualization and 
analysis of performance data using virtual reality immersion.  In the CEWES MSRC
PET effort, Tennessee has written a vampir2virtue converter and is working with 
Illinois to implement additional features in Virtue that will provide more 
effective visualization of parallel programs (e.g., better labelling of Virtue
objects).

Virtue will enable effective visualization of the execution of large-scale 
applications through the use of 3-D and immersive virtual reality, thus enabling
performance bottlenecks to be found and fixed more easily.  Virtue can be used 
with the ImmersaDesk or with graphics workstations.  Virtue has multimedia tools
that allow remote collaborators to view and manipulate scaled-down versions of 
Virtue displays from their desktop workstations and interact using voice and 
video. 

PAPI

With input from users, researchers, and vendors, Tennessee has written a 
specification for a portable API for accessing hardware performance counters, 
called PAPI. Tennessee is currently working on reference implementations of PAPI
for the SGI/Cray Origin2000 and Linux, with an IBM SP implementation planned for
the near future.

The PAPI portable interface to hardware performance counters will enable CEWES
MSRC users to use the same set of routines to access comparable performance
data across platforms.  Hardware performance counters can provide valuable 
information for tuning the cache and memory performance of applications.  PAPI 
will also enable CHSSI code developers to more quickly and easily obtain data 
needed for performance reporting requirements.

dyninst

The University of Maryland has produced the dyninst library, which provides an 
API for attaching to and instrumenting an executable running on a single 
process.
In the CEWES MSRC PET effort, Tennessee has tested dyninst on the UTK IBM SP2 
and is currently testing on the CEWES IBM SP pandion.  After testing, plans are 
to install dyninst as unsupported software and to provide site-specific usage 
information.

dyninst will enable users to attach to a running application and monitor or 
even change application behavior dynamically.

DPCL

The Dynamic Probe Class Library (DPCL), developed by IBM, is a client-server 
extension of dyninst for use with parallel and distributed applications. IBM 
has provided Tennessee with a beta release of DPCL for the IBM SP and
with the DPCL source code.  Tennessee will test DPCL on the CEWES IBM SP pandion
and will port DPCL to the Origin2000.  Tennessee will develop some demonstration
end-user tools on top of DPCL and will provide site-specific usage information.

DPCL will provide a cross-platform infrastructure for runtime application 
instrumentation for the purposes of performance analysis, computational
steering, and data visualization. DPCL can be used directly by users but also by
developers of end-user tools.  For example, DPCL will be used to enable Vampir 
tracing to be turned on and off at runtime and to enable runtime selection and 
control of access to hardware performance counters. The DPCL infrastructure 
reduces the amount of effort needed to develop new tools, allowing end users to 
develop their own special-purpose tools quickly and easily.

Repository in a Box (RIB)

RIB is a toolkit developed at Tennessee for setting up and maintaining an 
interoperable distributed collection of software repositories.

RIB provides an easy-to-use interface for CTA on-site leads to enter and 
maintain information about software being made available to CEWES MSRC users as 
a result of PET and CHSSI efforts. RIB's interoperation capabilities allow 
software cataloging done at one site to easily be made available to other sites 
and allow virtual views of a distributed collection of repositories to be 
created. The software deployment feature of RIB allows users to quickly access
information about what software is available on what MSRC machines and about how
to use the software.  So far, RIB has been used to set up repositories for CEWES
MSRC for SPP Tools, CFD, and grid generation software.

CAPTools

The Computer Aided Parallelisation Tools (CAPTools) package is a semi-automatic 
tool to assist in the process of parallelizing Fortran codes. It is under 
development at the University of Greenwich. The main components of the tool 
are:

  * A detailed control and dependence analysis of the source code, including the
    acquisition and embedding of user supplied knowledge.

  * User definition of the parallelization strategy.

  * Implementation.

  * Automatic migration, merger and generation of all required communications.

  * Code optimization including loop interchange, loop splitting, and 
    communication/calculation overlap.

To run CAPTools-generated code, users need to link the compiled code with the 
CAP library.

Previously-Evaluated Tools

The following programming tools were previously evaluated and made available 
as appropriate at CEWES MSRC:

  nupshot

  nupshot is a trace visualization tool that has a very simple, easy-to-use 
  interface and gives users a quick overview of the message-passing behavior and
  performance of their application.  nupshot analyzes trace files that are 
  produced by the MPE logging library.  Both nupshot and the MPE Logging Library
  were originally designed as extensions of MPICH.  Tennessee made minor 
  modifications to the MPE Logging Library in order that it work with the native
  MPI libraries on the IBM SP and SGI Origin2000.
 
  AIMS

  AIMS is a freely available trace-based performance analysis tool developed by 
  the NASA Ames NAS Division that provides flexible automatic instrumentation, 
  monitoring and performance analysis of Fortran 77 and ANSI C message-passing 
  applications that use MPI or PVM.   

  SvPablo

  SvPablo, another trace-based performance analysis tool, has a GUI that lets 
  the user determine which portions of his source code are selected for 
  instrumentation and then automatically produces the instrumented source code.
  After the code executes and a trace file has been produced, the SvPablo GUI 
  displays the resulting performance data alongside the source code.
  SvPablo runs on Sun Solaris and SGI workstations and on the SGI PCA and 
  Origin2000.  It can access and report results from the MIPS R10000 hardware 
  performance counters on the Origin2000.
 
  ParaDyn

  ParaDyn is designed to provide a performance measurement tool that scales to 
  long-running programs on large parallel and distributed systems.  Unlike the 
  other tools that do post-mortem analysis of trace files, ParaDyn does 
  interactive run-time analysis. 

  MPE Graphics Library

  The MPE Graphics Library is part of the MPICH package distributed by Argonne 
  National Laboratory.  This graphics library gives the MPI programmer an 
  easy-to-use, minimal set of routines that can asynchronously draw color 
  graphics to an X11 window during the course of a numerical simulation.  A user
  can use the graphics routines to scrutinize the execution of a code with
  respect to monitoring the accuracy and progress of the solution or as a 
  debugging aid.  

  ScaLAPACK 

  ScaLAPACK is a library of routines for the solution of dense, banded and 
  tri-diagonal linear systems of equations and other numerical linear algebra 
  computations.  Data sharing between distributed processors is accomplished 
  using the Basic Linear Algebra Communication Subroutines (BLACS). 

  PETSc

  The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a 
  suite of data structures and routines for the solution of large-scale 
  scientific application problems modeled by partial differential equations.  
  PETSc was developed within an object-oriented framework and is fully usable 
  from Fortran, C and C++ codes. 


II. Visualization Tools

Several visualization tools were introduced into the CEWES MSRC during Year
3.  We conducted an in-depth survey of the software tools available for
computational monitoring and interactive steering.  These tools include
CUMULVS (from Oak Ridge National Lab), DICE (from the Army Research Lab),
pV3 (from MIT), and others.  Computational monitoring tools can be very
valuable for the MSRC user.  First, visualizing intermediate results can
identify a run that is not going well, so that the user can cancel the
run without wasting their allocation.  Further, computational
monitoring is invaluable for debugging new codes.  This is particularly
useful as more teams move towards coupled models.  Finally, for codes that
produce very large data sets, computational monitoring might be the only
practical way to view the simulation output.  We experimented with applying
each of these tools to a CEWES MSRC code.  Results from this study are
provided in a CEWES MSRC PET Technical Report.  We also conducted a one-day 
training session to introduce these tools.  This has resulted in several 
follow-up contacts from CEWES MSRC users interested in the benefits offered
by applying these techniques to their codes.

CbayVisGen & CbayTransport

CbayVisGen was specially designed by the PET SV team at NCSA to support the 
visualization and collaboration needs of Carl Cerco and his EQM team at CEWES.
This tool enables them to visualize the hydrodynamics and nutrient transport 
activity in Chesapeake Bay over 10-year and 20-year time periods. 
A follow-on tool, CbayTransport, experimented with alternatives for visualizing 
the transport flux data.  Cerco previously had no mechanism for viewing this 
part of his data, so this tool has added a new and needed capability.

Collaborative Data Analysis Toolsuite (CDAT)

A prototype toolsuite for collaborative visualization was also introduced to
CEWES MSRC by NCSA. The Collaborative Data Analysis Toolsuite (CDAT) is a 
multi-platform collection of collaborative visualization tools. Using components
from this toolsuite, participants on an ImmersaDesk, a desktop workstation, and 
a laptop are able to simultaneously explore simulation output.  Collaborative
visualization is potentially very useful to CEWES MSRC users.  In particular, 
Carl Cerco is eager to try out these capabilities in support of his work.

volDG

The visualization tool volDG was introduced to Raju Namburu of CEWES to explore
the usefulness of wavelet representations and structure-significant encoding of
very large structures data. volDG results from a GUI-based software project 
developed by the ERC at Mississippi State in collaboration with Mitsubishi 
Electric Research Laboratories. This tool includes both volume rendering 
capabilities and an inverse design algorithm that allows for the search for 
structures in the data. The latter feature of this tool allows the user to be 
cognizant of the inherent features in the dataset, while the former feature 
allows preview of the volume in a semi-transparent mode.

The impact of this tool is currently limited to the use of the first feature, 
namely volume rendering. Since CSM datasets are composed of material interfaces,
the use of semi-transparency can be useful in visualizing structures in a 
juxtaposed way. Such a tool can allow better understanding of the datasets. This
is the first time such a capability has been provided to a CSM user at CEWES 
MSRC.

ISTV

The visualization tool ISTV from the ERC at Mississippi State was introduced to 
the CWO user community of CEWES MSRC. ISTV was used to visualize the output of a
CH3D model of the lower Mississippi River and to show WAM output.  Robert Jensen
of CEWES reports that ISTV has been useful for looking at correlations between 
variables. Animating through the time-series with ISTV was especially 
revealing. These tools have potential for facilitating understanding of the 
forces encountered during littoral operations.


III. Communication/Collaboration Tools

Tango Interactive

Tango Interactive is a Java-based Web collaboratory developed by NPAC at Syracuse
University (with initial funding from AF Rome Lab).  It is implemented with 
standard Internet technologies and protocols, and runs inside an ordinary 
Netscape browser window (support for other browsers is in progress).  Tango 
delivers real-time multimedia content in an authentic two-way interactive 
format.  Tango was originally designed to support collaborative workgroups,
though synchronous distance education and training, which can be thought of as 
a highly structured kind of collaboration, has become one of the key
application areas of the system.

The primary Tango window is called the Control Application (CA).  From
the CA, participants have access to many tools, including:

  * SharedBrowser, a special-purpose Web browser window that "pushes"
    Web documents onto remote client workstations.

  * WebWisdom, a presentation environment for lectures, foilsets, and
    similar materials.

  * Whiteboard, for interactive text and graphics display.

  * Several different kinds of chat tools.

  * BuenaVista, for two-way audio/video conferencing.

Tango has been deployed for some time at the CEWES MSRC and at Jackson
State University for use in joint Syracuse-Jackson State distance
education work begun in Year 2.  More recently, Tango Interactive has
been used to deliver PET training, through installations at all four
of the MSRCs and other sites as well.

During Year 3, a substantial investment has been made in the Tango
Interactive system to increase its stability and robustness to a level
commensurate with demands of routine use in education, training, and
collaboration.  These efforts resulted in the first release suitable
for general deployment, in May 1998, and a new release with additional
improvements at the end of Year 3.


IV. Computational Tools


WebHLA

The WebHLA set of tools is being introduced into CEWES MSRC by NPAC at Syracuse
in connection with PET support of the FMS CTA. WebHLA is a collection of tools, 
packaged as HLA federates and used for integrating Web/commodity-based, 
HLA-compliant, and HPC-enabled distributed applications. WebHLA tools/federates 
that are now being completed and will soon be transitioned to the CEWES and ARL 
MSRCs include:

* JWORB (Java Web Object Request Broker) - a universal middleware
  server written in Java that integrates HTTP, IIOP and DCE RPC
  protocols, i.e., it can act simultaneously as a Web server, CORBA
  broker and DCOM server.

* OWRTI (Object Web RTI) - a Java CORBA implementation of DMSO RTI
  1.3, packaged as JWORB service and used as a general purpose
  federation and collaboration layer of the WebHLA framework.

* JDIS - a Java-based DIS/HLA bridge that allows legacy DIS
  applications to be rapidly converted to HLA federates so that they
  can play in any standard-compliant HLA federation environment.

* PDUDB - SQL/XML-based support for simulation logging and
  playback; all DIS PDUs or the equivalent HLA interaction events are
  passed as XML messages to, and recorded in, an SQL database to be
  replayed later for training, demo or other analysis purposes.

* SimVis - a commodity (Microsoft DirectX/Direct3D) based 3D
  real-time battlefield visualizer for DIS/HLA simulations.
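
The logging-and-playback idea described for PDUDB can be sketched with the 
Python standard library. This is only an illustration under stated 
assumptions: the table layout, the XML element and attribute names, and the 
functions open_log, record, and replay are hypothetical and are not taken 
from WebHLA, and an in-memory SQLite database stands in for the SQL database.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_log():
    """Create an in-memory database with one table of logged events."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE events (seq INTEGER PRIMARY KEY, sim_time REAL, xml TEXT)"
    )
    return db


def record(db, sim_time, kind, fields):
    """Encode one simulation event as an XML message and store it."""
    msg = ET.Element("event", {"kind": kind})
    for name, value in fields.items():
        ET.SubElement(msg, name).text = str(value)
    db.execute(
        "INSERT INTO events (sim_time, xml) VALUES (?, ?)",
        (sim_time, ET.tostring(msg, encoding="unicode")),
    )


def replay(db):
    """Yield (sim_time, kind, fields) in simulation-time order."""
    rows = db.execute("SELECT sim_time, xml FROM events ORDER BY sim_time, seq")
    for sim_time, text in rows:
        msg = ET.fromstring(text)
        yield sim_time, msg.get("kind"), {child.tag: child.text for child in msg}
```

Because each event is stored as a self-describing XML message, playback needs 
no knowledge of the event types beyond what the message itself carries; field 
values come back as strings.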


CTH & EPIC Tools

Several software modules and programming tools have been developed by TICAM at
Texas in CEWES MSRC PET support of the CSM CTA. These include:

* Software modules for error indicator computation in CTH.
* Software modules for error indicator computation in EPIC.
* Software to incorporate block refinement in parallel simulations for CTH.
* Software to adaptively refine and update the data structure of EPIC.
* Software for Morton/Hilbert space-filling curve generation.
* A utility tool for code testing and validation.
* Software for advanced front tracking in CTH under adaptation.

Some of the software modules for error indicators are more broadly applicable
(with perhaps minor modification) to other PET program analysis components.  The
software for adaptive refinement and data structure modification is more 
specific to the application codes in question, but the concepts are general. The 
module for the Morton/Hilbert curves is written in C++ and has quite general 
applicability. The utility tool approach can be applied to other applications 
and codes.  The integrated effect of these developments on the CEWES MSRC is 
significant, since it will provide a new adaptive analysis capability and
spearhead similar extensions to the DoD application codes both in CSM and in
the other CTAs.
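
Of the modules listed above, space-filling curve generation is the easiest to 
illustrate. The following is a minimal Python sketch of the Morton (Z-order) 
index for two-dimensional integer cell coordinates, computed by bit 
interleaving, together with its use for cutting a set of cells into 
contiguous pieces. The TICAM module itself is written in C++ and its 
interface is not described here, so the function names and the partitioning 
helper are hypothetical.

```python
def morton_encode_2d(x, y, bits=16):
    """Interleave the low `bits` bits of x and y (x in the even positions)."""
    if x < 0 or y < 0 or x >= (1 << bits) or y >= (1 << bits):
        raise ValueError("coordinates must fit in the requested bit width")
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_decode_2d(code, bits=16):
    """Invert morton_encode_2d, returning the (x, y) pair."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


def partition(cells, nparts):
    """Sort cells along the curve and cut them into contiguous chunks."""
    ordered = sorted(cells, key=lambda c: morton_encode_2d(*c))
    size = max(1, -(-len(ordered) // nparts))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]
```

Sorting cells by Morton index gives a one-dimensional ordering in which cells 
that are close on the curve tend to be close in space, which is why such 
curves are commonly used to distribute adaptively refined meshes across 
processors.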


USC

Integrated Memory Hierarchy (IMH) Model

The Integrated Memory Hierarchy (IMH) Model is a simple, easy-to-use
model of current HPC platforms available at MSRCs.  The IMH model
allows end-users to predict the cost of data movement between the
various levels of the memory hierarchy.  This includes communication
between the processor and its memory, other processors, and secondary
storage.  The model integrates the impact of various architectural
features, operating system characteristics, communication environment,
and compiler features that the user interacts with in implementing an
algorithm.  The model consists of a uniform (over several platforms)
set of key parameters that captures the performance of the underlying
platform from a user's perspective.  Using the IMH Model, end-users
can analyze and predict the performance of their algorithms for a
particular HPC platform.  This allows them to make intelligent
trade-offs in their algorithm design without actual coding, and
thus to quickly develop optimized applications that are both
scalable and portable across various HPC platforms.
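
This report does not list the IMH parameters, so the following Python sketch 
only illustrates the general idea of predicting data movement cost before 
coding. It assumes a simple latency-plus-bandwidth cost for each level of the 
hierarchy; the class and function names are hypothetical, and the parameter 
values are placeholders rather than measurements of any MSRC platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """One level of the hierarchy, as seen from the user's side."""
    name: str
    latency_s: float     # fixed cost per transfer, in seconds
    bytes_per_s: float   # sustained transfer rate, in bytes per second


def transfer_cost(level, nbytes, ntransfers=1):
    """Predicted time to move nbytes in total, split over ntransfers transfers."""
    return ntransfers * level.latency_s + nbytes / level.bytes_per_s


def predict(plan, platform):
    """Sum the cost of every (level_name, nbytes, ntransfers) entry in plan."""
    return sum(transfer_cost(platform[name], nbytes, n) for name, nbytes, n in plan)


# Placeholder values for illustration only; not measurements of any machine.
PLATFORM = {
    "memory": Level("memory", 1e-7, 1e9),
    "network": Level("network", 1e-5, 1e8),
    "disk": Level("disk", 1e-2, 1e7),
}

# Two ways to send the same 1 MB between processors.
many_small = [("network", 1_000_000, 1000)]
one_large = [("network", 1_000_000, 1)]
```

Comparing predict(many_small, PLATFORM) with predict(one_large, PLATFORM) 
shows the kind of trade-off such a model exposes: under these assumed 
parameters, aggregating messages roughly halves the predicted communication 
time, and swapping in a different parameter set repeats the comparison for 
another platform.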