VIII. OUTREACH TO CEWES MSRC USERS


Since the great majority of users of the CEWES MSRC are off-site, the CEWES 
MSRC PET effort places emphasis on outreach to remote users, as well as to
users located on-site at CEWES. Table 3 lists the contacts made with CEWES MSRC 
users by the CEWES MSRC PET team during Year 3, and Table 2 lists all travel by 
the CEWES MSRC PET team in connection with the Year 3 effort. A major component 
of outreach to CEWES MSRC users is the training courses (described in Section 
VII) conducted by the CEWES MSRC PET team, some of which are conducted at remote
user sites and some of which are web-based. The CEWES MSRC PET website, 
accessible from the CEWES MSRC website, is also a major medium for outreach to 
CEWES MSRC users, and all material from the training courses is posted on the 
PET website. A CD-ROM of training material has also been prepared.
 
Specific outreach activities conducted in Year 3 are described in this section,
which is organized by individual components of the CEWES MSRC PET effort.


CFD: Computational Fluid Dynamics CTA 
     (ERC - Mississippi State)

Interactions with CEWES MSRC users have been initiated by a variety of means.
Telephone, e-mail, and personal visits have all resulted in opportunities for
user support and more specific collaborative efforts.  Face-to-face
visits have resulted from training participation such as the Parallel
Programming Workshop for Fortran Programmers (BYOC) wherein users are
introduced to parallel programming within the context of their own code.
This is a particularly effective vehicle for user outreach and training
since it gives the on-site CTA lead an opportunity to meet and interact with
users on an individual basis and learn about their work within a semi-formal
classroom environment. Some of the more significant outreach interactions
are described below.

In collaboration with Henry Gabb of the NRC Computational Migration Group,
assistance was provided to J. C. T. Wang of the Aerospace Corporation by
analyzing the message flow in a section of his PVM code and providing general
support, resulting in a successful port of the code to the IBM SP system. As a
result of PET consultation, David Medina of the AF Phillips Lab implemented a
graph-based reordering strategy within the MAGI solver. The objective is to
improve cache performance and interprocessor data locality. Reported results
show roughly a 30% reduction in execution time on two, four, and sixteen
processors compared to execution without reordering.

Extensive interaction with Fernando Grinstein of NRL has continued in the third 
contract year. Collaboration via phone, e-mail and personal visits occurred in 
order to provide user assistance in development of a parallel version of 
NSTURB3D capable of efficient execution on all CEWES MSRC computing platforms. 

A dual-level parallel algorithm using both MPI and OpenMP was designed and
implemented in the CGWAVE solver supporting Zeki Demirbilek of CEWES CHL. This
resulted in a dramatic reduction in turnaround time: turnaround for the
demonstration case was reduced from 2.1 days to 12 minutes using 256 SGI O2K
processors. This project also involved extensive collaboration with the SPP
Tools team at Tennessee and served as a testbed for their MPI_Connect tool.
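
The CGWAVE source itself is not reproduced in this report; the sketch below
only illustrates the general dual-level pattern described above, with MPI
ranks dividing a set of independent parameter cases among themselves while
OpenMP threads share the loop within each case. The case count, problem size,
and routine names are hypothetical.

  /* Sketch of dual-level MPI + OpenMP parallelism: MPI ranks split a set of
   * independent parameter cases, OpenMP threads share the loop inside each
   * case.  Illustrative pattern only, not the CGWAVE source. */
  #include <mpi.h>
  #include <omp.h>
  #include <stdio.h>

  #define NCASES  32      /* hypothetical number of independent cases */
  #define NPOINTS 100000  /* hypothetical grid size per case          */

  static double solve_case(int c)
  {
      double sum = 0.0;
      /* Inner (OpenMP) level: threads share the per-case grid loop. */
      #pragma omp parallel for reduction(+:sum)
      for (int i = 0; i < NPOINTS; i++)
          sum += (double)(c + 1) / (i + 1);   /* stand-in for real work */
      return sum;
  }

  int main(int argc, char **argv)
  {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* Outer (MPI) level: each rank takes a round-robin share of cases. */
      double local = 0.0, total = 0.0;
      for (int c = rank; c < NCASES; c += size)
          local += solve_case(c);

      MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
      if (rank == 0)
          printf("aggregate result over %d cases: %f\n", NCASES, total);
      MPI_Finalize();
      return 0;
  }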

Collaboration was initiated with Bob Robins of NW Research Associates as
a result of the BYOC workshop, via follow-up email and phone contact. His
code has been analyzed for inherent parallelism, and continued collaboration
is planned to produce a parallel version of his solver. 

Specific support was provided to both the COBALT and OVERFLOW CHSSI development 
teams. Coordination with Hank Kuehn resulted in a special queue priority for
the COBALT development team to assist in timely debugging of their solver.
This helped identify an implementation problem that only manifested itself
for applications using more than forty processors. The CFD team worked to
secure FY99 allocation on CEWES MSRC hardware for the OVERFLOW development
team.


CSM: Computational Structural Mechanics CTA 
     (TICAM - Texas & ERC - Mississippi State)

The first step in outreach was the short course taught by Graham Carey at the 
June 1998 DoD HPCMP Users Group Meeting.  Rick Weed, the CSM On-Site Lead, has 
been instrumental in coordinating our interactions with the applications 
analysts at CEWES and elsewhere.  For example, during the CAM Workshop in 
November at CEWES, we met with on-site engineering analysts involved in 
application studies using CTH and EPIC. Carey and Littlefield followed up with
the EPIC analysts at the February PET Annual Review to discuss our recent
adaptive grid capability.  As a result of this meeting we mapped out
a strategy for collaborating with the users and testing their nonlinear material
models in our adaptive EPIC code. We also developed a goal for a parallel 
roadmap task and CEWES MSRC code migration plan to be undertaken in Year 4 or 
the following year.
  
David Littlefield has been working closely with the Sandia applications software
group (D. Crawford, G. Hertel).  He has also been in regular close contact with 
the major ARL users (K. Kimsey, D. Scheffler, D. Kleponis).  We have had
several discussions with Raju Namburu at CEWES, most recently on technology
transfer of the adaptive CTH code version. This transfer is being
coordinated with Gene Hertel and the Sandia group. Carey has been
interacting with Rob Leland's group at Sandia on grid partitioning, grid
quality, and parallel partitioning issues. Following his February visit to
Sandia, the group obtained the CHACO software and installed it at Texas.
We are currently experimenting with this partitioning software and will be
incorporating our space-filling curve scheme into it.
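
As a rough illustration of the space-filling curve idea (the Texas scheme
itself is not described in this report), the sketch below orders the cells of
a small 2-D grid along a Morton (Z-order) curve and then cuts the ordered list
into equal-sized partitions. The choice of the Morton curve rather than
another curve, and all names, are assumptions made for the example.

  /* Sketch of space-filling-curve partitioning: order 2-D cells along a
   * Morton (Z-order) curve, then cut the ordered list into equal chunks.
   * Illustrative only; not the actual scheme used at Texas. */
  #include <stdio.h>
  #include <stdlib.h>

  typedef struct { unsigned key; int cell; } Entry;

  /* Interleave the bits of (x, y) to get the Morton key of a cell. */
  static unsigned morton_key(unsigned x, unsigned y)
  {
      unsigned key = 0;
      for (int b = 0; b < 16; b++) {
          key |= ((x >> b) & 1u) << (2 * b);
          key |= ((y >> b) & 1u) << (2 * b + 1);
      }
      return key;
  }

  static int cmp_entry(const void *a, const void *b)
  {
      unsigned ka = ((const Entry *)a)->key, kb = ((const Entry *)b)->key;
      return (ka > kb) - (ka < kb);
  }

  int main(void)
  {
      const int nx = 8, ny = 8, nparts = 4;
      const int ncells = nx * ny;
      Entry *order = malloc(ncells * sizeof(Entry));
      int *part = malloc(ncells * sizeof(int));

      /* Key every cell by its position on the Morton curve. */
      for (int j = 0; j < ny; j++)
          for (int i = 0; i < nx; i++) {
              int c = j * nx + i;
              order[c].key = morton_key(i, j);
              order[c].cell = c;
          }
      qsort(order, ncells, sizeof(Entry), cmp_entry);

      /* Equal-sized contiguous pieces of the curve become partitions. */
      for (int k = 0; k < ncells; k++)
          part[order[k].cell] = k * nparts / ncells;

      for (int c = 0; c < 8; c++)
          printf("cell %d -> partition %d\n", c, part[c]);
      free(order); free(part);
      return 0;
  }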


CWO: Climate/Weather/Ocean Modeling CTA
     (Ohio State)

Sadayappan and Welsh of Ohio State had three major co-ordination meetings with
CEWES MSRC user Bob Jensen during PET Year 3. These meetings were to plan
ongoing efforts on the parallelization and migration of the WAM wind-wave
model, and the coupling of WAM with the CH3D marine circulation model. SGI on-
site representative Carol Beaty was present at the first and second meetings,
and government CWO monitor Alex Carillo attended the second and third
meetings. There were numerous e-mail and telephone contacts with Bob Jensen
throughout the year, and Welsh made four additional one-week trips to CEWES
MSRC to provide CWO core support.

At the May 1998 WAM co-ordination meeting a major topic was the disagreement
between the predictions of the pre-existing (MPI-based) parallel WAM code and
those of the original sequential WAM code. Subsequent debugging traced the problems
in MPI WAM to shallow water current-related propagation effects and inter-grid
communication in nested grid runs. The coding errors responsible for the
problems were not found, however, and it was agreed shortly after the August
1998 co-ordination meeting to replace MPI WAM with an OpenMP version of WAM.
The OSU team then deployed OpenMP WAM in the coupled CH3D/WAM system.

Apparent errors in the WAM treatment of current-induced wave refraction were
also discussed at the May 1998 WAM co-ordination meeting. Welsh subsequently
examined the WAM propagation scheme and found a sign error repeated several
times in the main propagation subroutine. A corrected version of the
subroutine was delivered to Bob Jensen.

In May 1998 Welsh also met with Bob Jensen, Don Resio, and Linwood Vincent of
CEWES to review the coupling physics being implemented in the
CH3D/WAM system. This meeting resulted in a modification of the WAM bottom
friction algorithm.

At the October 1998 WAM co-ordination meeting the major issue was the
unexpected termination of OpenMP WAM simulations on the SGI Origin2000.
Follow-up work with Carol Beaty and NRC Computational Migration Group staff 
Henry Gabb and Trey White showed that explicit stack size specification was 
necessary, and that aggressive optimization during code compilation caused small
errors in the WAM results. Welsh also found that if the WAM grid is trivially 
split into one block (the computational grid is divided into a user-specified 
number of blocks to save memory), the simulation will terminate unexpectedly.

In May 1998, Sadayappan and Zhang of Ohio State travelled to Mississippi
State to meet with CEWES MSRC user Billy Johnson. The purpose of
the trip was to discuss the physics required for the coupling of CH3D and
WAM at the atmospheric boundary layer, and the coupling of CH3D-SED and WAM
at the bottom boundary layer. The parallelization of CH3D-SED was also
planned with Puri Bangalore of MSU. CH3D-SED was subsequently parallelized
by MSU staff, and the parallelization was verified by OSU staff.

In March 1999, CWO On-Site Lead Steve Wornom met with CEWES MSRC users Lori
Hadley and Bob Jensen, SPPT On-Site Lead Clay Breshears, and Henry Gabb
and Ben Willhoite of the NRC Computational Migration Group to discuss the
parallelization of the SWAN wind-wave model. This has resulted in a joint
CWO/CMG/SPP Tools effort to migrate the code to the CEWES MSRC parallel
platforms.


EQM: Environmental Quality Modeling CTA
     (TICAM - Texas)

The EQM team working with Mark Noel of CEWES had verified version 1.0 of
parallel CE-QUAL-ICM in December of 1997, and in March of 1998 the first
10-year Chesapeake Bay calibration run had been completed for the EPA.
Starting in April 1998, we interacted mainly with Mark Noel to add features
to the parallel code needed to run the EPA scenarios.  We also trained him
on how to write routines for post-processing parallel runs.
From March through September 1998, Noel was able to run 53 10-year
calibration runs and 53 actual runs for the EPA Chesapeake Bay project,
using a total of 24,663 CPU hours.

In November 1998, Carl Cerco of CEWES asked us to help improve the scalability 
of CE-QUAL-ICM for future planned large-scale work.  We analyzed the parallel
code using a 10-year Chesapeake Bay run as a benchmark, and improved
parallel CE-QUAL-ICM in several respects.  These improvements required
interaction with Mark Dortch, Carl Cerco, Mark Noel, and Barry Bunch of CEWES.
We improved the I/O performance and developed techniques which allowed the
code to run on more CPUs (up to 110) with good parallel performance.  The
new code was verified by the end of February 1999.

Discussions with Mark Dortch, Carl Cerco, Barry Bunch, and Mark Noel of CEWES
led us to improve the grid decomposition algorithms crucial to our method
of parallelization.  We investigated the use of the graph partitioning package
METIS 4.0.  We built an interface to this package, and it greatly improved
the scalability of CE-QUAL-ICM. Mark Dortch asked the EQM team to help transfer
the parallel computing technology to the toxic version of the code.  We trained
Terry Gerald and Ross Hall of CEWES in the basic techniques, and they were able
on their own to code a parallel version of CE-QUAL-ICM/TOXI.
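
The interface actually built for parallel CE-QUAL-ICM is not reproduced here;
for reference, the sketch below shows a minimal call to the METIS 4.0 k-way
partitioner on a small, purely illustrative graph of grid cells.

  /* Minimal sketch of a METIS 4.0 interface: partition a small graph of
   * grid cells into nparts pieces with METIS_PartGraphKway.  The example
   * graph is illustrative; it is not the Chesapeake Bay grid. */
  #include <stdio.h>
  #include <metis.h>   /* METIS 4.0: idxtype, METIS_PartGraphKway */

  int main(void)
  {
      /* 6-cell chain graph 0-1-2-3-4-5 in compressed adjacency form. */
      int nvtxs = 6;
      idxtype xadj[]   = {0, 1, 3, 5, 7, 9, 10};
      idxtype adjncy[] = {1, 0, 2, 1, 3, 2, 4, 3, 5, 4};
      idxtype part[6];

      int wgtflag = 0;      /* no vertex or edge weights          */
      int numflag = 0;      /* C-style (0-based) numbering        */
      int nparts  = 2;      /* number of partitions requested     */
      int options[5] = {0}; /* default METIS options              */
      int edgecut;          /* number of edges cut, returned      */

      METIS_PartGraphKway(&nvtxs, xadj, adjncy, NULL, NULL,
                          &wgtflag, &numflag, &nparts, options,
                          &edgecut, part);

      printf("edge cut = %d\n", edgecut);
      for (int i = 0; i < nvtxs; i++)
          printf("cell %d -> subdomain %d\n", i, (int)part[i]);
      return 0;
  }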


FMC: Forces Modeling and Simulation/C4I CTA 
     (NPAC - Syracuse)

Currently, our main DoD FMS user group that provides the application focus
and testbed for the WebHLA framework is the Night Vision Lab at
Ft. Belvoir, which develops CMS as part of its R&D in the area of
countermine engineering. Our support for CMS includes:

* Building a Parallel CMS module by porting sequential CMS to the Origin2000
  (and later on also to a commodity cluster).

* Integrating the Parallel CMS module with the other WebHLA federates
  described above (JDIS, PDUDB, SimVis).

* Planning future joint work with Ft. Belvoir. This includes
  development of joint proposals such as a currently pending CHSSI
  proposal on WebHLA for Metacomputing CMS.

We are also interacting closely with, and providing PET support for, the
current FMS CHSSI projects, including:

* FMS-3, where we are building Web-based interactive training for the
  SPEEDES simulation kernel.

* FMS-4, where we are acting as external technical reviewer and recently
  developed a joint CHSSI proposal with SPAWAR for a follow-on
  project on using SPEEDES, Parallel IMPORT and WebHLA to build
  Intelligent Agent support for FMS.

* FMS-5, where we expect to participate directly and have been asked to
  provide our Object Web RTI as a test implementation of the RTI 1.3
  standard, to be certified by DMSO and used by FMS-5 as a fully
  compliant reference prototype.


C/C: Collaboration and Communications 
     (NPAC - Syracuse)

Since our primary C/C effort in Year 3 centered on the Tango Interactive
electronic collaboration system for use in education, training, and
small-group collaboration, our outreach focused on enlarging the base of
knowledgeable and experienced users of the system to help support broader
deployment of the tools.  This work was
conducted with on-site Training and C/C (or equivalent) support
personnel at the four MSRCs, as well as with staff at the Naval Research
Lab in DC.  Our collaborators at the Ohio Supercomputer Center gained
experience not only in operation and support of Tango from the point of
view of course recipients, but also as instructors.  All four MSRCs, as
well as OSC, now have Tango server installations, and along with NRL
have substantial experience with the client side of the system.  This
group of experienced users will form a core of support that will
facilitate wider use of Tango for both training and other
collaborative applications in the coming years.


SPPT: Scalable Parallel Programming Tools
      (CRPC - Rice/Tennessee)

Outreach to CEWES MSRC users during Year 3 by the SPP Tools team included the
following specific assists: 

* Marvin Moulton: Contact made through Steve Bova, on-site CFD lead, for 
  information about the HELIX code.

* David Medina: Various e-mail exchanges and telephone contacts for 
  information on the MAGI code.  The MAGI code is being studied at Rice as a 
  test bed for compiler optimization studies.

* Fred Tracy (CEWES): On possible collaboration on optimization of FEMWATER.

* Ann Sherlock, Jane Smith: On possible collaboration on parallelization of 
  STWAVE and a tutoring program at Rice.

* Zeki Demirbilek (CEWES Coastal Hydraulics Lab): Clay Breshears implemented 
  the MPI_Connect version of the CGWAVE code.  This effort was in collaboration 
  with other members of CEWES MSRC PET, the NRC Computational Migration Group, 
  and the CEWES MSRC Computational Science and Engineering group. The code went 
  on to win the Most Effective Engineering Methodology Award in the SC'98 HPC 
  Challenge competition.

* Bob Robins (Northwest Research Associates, Inc.): Found potential Fast Poisson
  Solver libraries during the BYOC workshop held at CEWES MSRC.

Chuck Koelbel, Gina Goff, and Ehtesham Hayder offered tutorials at CEWES 
MSRC and at NRL and discussed with the participants their codes and 
possible approaches to parallelizing them.

Clay Breshears assisted the CEWES MSRC CSE group members with the use of the 
Vampir performance analysis tool and the TotalView debugger.  Also, he was able 
to help members of the RF Weapons Challenge Project (Kirtland AFB) with these 
tools.

CHSSI Project Support:

Tennessee has worked with CEWES MSRC user and CEN-1 CHSSI code developer
David Rhodes to use the Vampir performance analysis tool to analyze
and improve the performance of the Harmonic Balance Simulation code.
Mr. Rhodes reports:

    "The Vampir toolset provided just what I was looking for in tuning
    my parallel application. This application contains tasks that range
    from small to large granularity. Vampir allowed me to view dynamic
    program execution with a very low level of intrusion. After
    making some significant algorithmic changes - e.g. changing from a
    dynamic to static scheduling approach - I was able to achieve much
    better levels of scalability and parallel efficiency. The data needed
    to determine existing problem areas would have been much harder
    to gather without Vampir."

Challenge Project Support:

Tennessee has worked with two members of the RF Weapons Challenge Project team,
Gerald Sasser and Shari Collela of the AF Phillips Lab, in their use of Vampir.
Vampir was used by Sasser to find and fix a bottleneck in the Icepic code and to
significantly improve the communication performance of that code.
Collela has been unsuccessful so far in using Vampir on the Mach3 code
because the code is very large and produces huge and unwieldy trace files.
Tennessee plans to use the Mach3 code as a test case for new dynamic
instrumentation techniques that can be used to turn Vampir tracing on and off
at run time and thus reduce the size of the trace files while still collecting
trace data for "interesting" parts of program execution.
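
One generic way to keep trace files manageable, shown in the sketch below, is
to bracket only the phase of interest with MPI's standard profiling-control
call, which tracing libraries built on the MPI profiling interface commonly
interpret as trace-off/trace-on. Whether this or the planned dynamic
instrumentation is what gets applied to Mach3 is not specified here, and the
routine names in the sketch are hypothetical.

  /* Sketch: bracket only the "interesting" phase of a run with the standard
   * MPI profiling-control call, which MPI-based tracing tools may interpret
   * as trace-off / trace-on.  Generic pattern only; the routines below are
   * hypothetical stand-ins. */
  #include <mpi.h>

  static void setup_phase(void)    { /* long, already-understood code */ }
  static void critical_solve(void) { /* the phase we want traced      */ }

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);

      MPI_Pcontrol(0);   /* ask the tracing layer to stop collecting    */
      setup_phase();

      MPI_Pcontrol(1);   /* resume collection for the phase of interest */
      critical_solve();
      MPI_Pcontrol(0);   /* stop again before the rest of the run       */

      MPI_Finalize();
      return 0;
  }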

SC'98 HPC Challenge Support:

Graham Fagg of Tennessee worked with the CEWES MSRC team to use Tennessee's 
MPI_Connect system, along with OpenMP and MPI, to achieve multiple levels of 
parallelism in the CGWAVE harbor response simulation code, reducing the 
runtime for this code from months to days.  The team won the Most Effective
Engineering Methodology award for their SC'98 HPC Challenge entry.

User Support:

Tennessee has put together an SPP Tools repository at

  http://www.nhse.org/rib/repositories/cewes_spp_tools/catalog/

which lists programming tools being made available and/or supported
as part of PET efforts.  The tools include parallel debuggers,
performance analyzers, compilers and language analyzers, math libraries,
and parallel I/O systems.  In addition to giving information about
the available tools, the repository includes a concise matrix
view of what tools are available on what platforms with links to
site-specific usage information and tips and web-based tutorials.
Because CEWES MSRC users typically use multiple MSRC platforms,
an emphasis has been placed on providing cross-platform tools, such
as the TotalView debugger and the Vampir performance analysis tool,
which both work on all MSRC platforms.  Information has also been
provided about platform-specific tools in case the special features
of these tools are needed.

Tennessee has tested the installed versions of the tools and has worked with
CEWES MSRC systems staff to ensure that the tools are working correctly
in the programming environments used by CEWES MSRC users, including
the PBS queuing system, and has reported any bugs discovered to
the tool developers and followed up on getting them fixed.
An e-mail message about the repository was sent to CEWES MSRC users, and
an article about it has been written for the CEWES MSRC Journal.
The repository and the tools have already had an impact on the
NRC Computational Migration Group (CMG), which has started using some of
the tools, such as TotalView, on a daily basis to do its work more
effectively and is encouraging other CEWES MSRC users to do likewise.


SV: Scientific Visualization
    (NCSA-Illinois & ERC-Mississippi State)

During Year 3, the PET visualization team worked with users, the PET
on-site staff for each CTA, and the CEWES MSRC visualization staff.  

Early in the year, we surveyed several users to discuss their data
management strategies.  This led to the summary report in which we
recommended that HDF, a data management package in use by NASA EOSDIS and
DOE's ASCI project, be introduced to the CEWES MSRC.  In July, we arranged for
the HDF project lead, Dr. Mike Folk, to visit the CEWES MSRC and present an
overview of HDF.

We also worked with a variety of users to assist in visualization
production.  We worked with Andrew Wissink to produce a
visualization of his store separation problem.  Still imagery and
time-series animations were produced for Robert Jensen of CEWES.  Additional
visualization production was undertaken with Carl Cerco's data (CEWES).  This
includes a movie sequence that can take advantage of the very-wide screen
(the Panoram) installed at the CEWES MSRC.  

The PET Vis team has had a long-term relationship with Robert Jensen,
particularly as it relates to customizing visualization tools to support
his wave modeling work.  We also work with Raju Namburu's team (CEWES) on the
novel application of wavelet techniques to build structure-significant
representations of his data.  This is particularly important because
Namburu's data sets are very large.

In another long-term collaboration, the PET Visualization team has ongoing 
communication with Carl Cerco, Mark Dortch, and Mark Noel of CEWES, in relation 
to their Chesapeake Bay project and visual analysis of the output of the CEWES 
CEQUAL-IQM code.  This is a continuation of the relationship that was begun in 
Year 1.  This year, we have worked with them on defining their requirements for 
desktop visualization support, prototyping solutions for those needs, and 
iterating on the design process to refine their specifications.  We have 
provided them a production-quality version of a tool that they are currently 
using to view data from their 10- and 20-year production runs of the Chesapeake
Bay model.  This tool also supports a limited form of collaboration that
they are using to share their results with their project monitor at the
Environmental Protection Agency.

The CEWES PET Vis team has had a variety of contacts with the PET CTA
on-site staff, including Steve Bova (CFD) and Rick Weed (CSM), to consult on
how to best assist their users in the areas of computational monitoring and
visualization.  We have also worked with our PET academic counterparts,
Mary Wheeler (EQM, Texas), Keith Bedford (CWO, Ohio State), and Geoffrey Fox
(FMS & C/C, Syracuse).

In Year 3, we also had numerous contacts with CEWES MSRC visualization 
personnel, including Michael Stephens, Richard Strelitz, Kent Eschenberg,
Richard Walters, and John West.  We have advised on new software
packages and techniques for visualization and virtual environments.  We
also have continuing contact with Milti Leonard, a member of the visualization
staff at Jackson State University, consulting on visualization software and
mentoring her in her own skill development.

MSU's major SV interaction has been with CSM scientist Raju Namburu of CEWES.
He has guided the project and has strongly influenced the quality of its
deliverables.  In addition, he has explored the use of several algorithms,
including volume rendering, for his datasets and has spurred the development
of tools to aid the main project.


University of Southern California

The USC team met with Rama Valisetty at CEWES as well as at USC to
discuss the computation and communication structure of key
applications in Computational Structural Mechanics (CSM).  This
interaction helped us focus our benchmarking and modeling efforts to
accommodate the problems that real end-users face in parallelizing
their code.

The USC team interacted with a DoD end-user through Valisetty.  An
unoptimized CFD application was given to USC.
USC employed the benchmark results and the IMH model to initially
analyze the computation and communication requirements of the original
algorithm.  A flow diagram was then created and the major bottleneck
sections of the code were identified.  Using the IMH model, the USC
team analyzed the performance of various data reorganization
techniques, communication latency hiding techniques, and computation
scheduling choices.  This allowed USC to optimize the algorithm to
minimize the communication overhead of parallelization while
distributing the workload evenly among the processors.  The optimized
algorithm developed by USC was scalable and portable.  Using 30
processing nodes, USC improved performance approximately fivefold
over the original algorithm.
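
The USC optimizations themselves are not detailed in this report; the sketch
below merely illustrates one of the general techniques named above, hiding
communication latency by overlapping a nonblocking halo exchange with
computation on interior data. The array sizes and update formula are
illustrative only.

  /* Sketch of communication latency hiding: post nonblocking halo sends and
   * receives, compute on interior points while messages are in flight, then
   * wait and finish the boundary points.  Illustrative pattern only. */
  #include <mpi.h>
  #include <stdio.h>

  #define N 1024   /* local array size, illustrative */

  int main(int argc, char **argv)
  {
      double u[N + 2];                 /* local data plus two halo cells */
      double unew[N + 2];
      int rank, size;
      MPI_Request req[4];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
      int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

      for (int i = 0; i < N + 2; i++) u[i] = rank + i * 0.001;

      /* Post the halo exchange first ... */
      MPI_Irecv(&u[0],     1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
      MPI_Irecv(&u[N + 1], 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
      MPI_Isend(&u[1],     1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
      MPI_Isend(&u[N],     1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

      /* ... overlap it with work that needs no halo data ... */
      for (int i = 2; i <= N - 1; i++)
          unew[i] = 0.5 * (u[i - 1] + u[i + 1]);

      /* ... then wait and finish the two boundary points. */
      MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
      unew[1] = 0.5 * (u[0] + u[2]);
      unew[N] = 0.5 * (u[N - 1] + u[N + 1]);

      if (rank == 0)
          printf("boundary values: %f %f\n", unew[1], unew[N]);
      MPI_Finalize();
      return 0;
  }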