YEAR 2 ACCOMPLISHMENTS

As noted above, the PET effort at the CEWES MSRC operates by providing core support to CEWES MSRC users, performing specific focused efforts designed to introduce new tools and computational technology into the CEWES MSRC, conducting training courses and workshops for CEWES MSRC users, and operating an HBCU/MI enhancement program.

The major accomplishments of the CEWES MSRC PET effort in enhancing the programming environment at the CEWES MSRC are described in the present section. The presentation is organized by CTA and by technical infrastructure support area, although there is much overlap in effort. Finally, the cross-CTA Grid Workshop conducted as part of the CEWES MSRC PET effort in Year 2 is described.

Tools introduced into the CEWES MSRC in the course of Year 2 of the PET effort are listed in Table 4, and are described in Section VII. Specific CEWES MSRC codes impacted by the PET effort during Year 2 are listed in Table 5, and items of technology transfer into the CEWES MSRC are listed in Table 6. More detail on the Year 2 effort is given in the specific Focused Efforts described in the Appendix.

Training during Year 2 is described specifically in Section VIII, and specific outreach to CEWES MSRC users in Section IX. The accomplishments in the HBCU/MI component of the CEWES MSRC PET effort are covered in Section X.

------------------------------------------------------------------------------

CFD: Computational Fluid Dynamics CTA

During Year 2, the CFD team from the ERC at Mississippi State, with both on-site and at-university support, logged over 45 days of in-person consulting and collaboration with CEWES MSRC users. This contact took place through visits to CEWES MSRC and to remote user sites, reciprocal visits by users to the ERC at Mississippi State, participation in CEWES MSRC technical meetings, and participation in relevant national meetings to make further contact with CEWES MSRC users and to foster technology transfer to and from the CEWES MSRC. Numerous phone and email contacts were also made with CEWES MSRC users. The CFD team led the compilation of the CEWES MSRC user taxonomy reported in Section IX, which serves to guide the PET team in outreach to CEWES MSRC users.

Specific technical contributions were made by the CFD team in the following categories:

(a) Collation of parallel benchmarks. (b) Local support of CHSSI CFD codes. (c) Code migration assistance. (d) Development of appropriate parallel tools.

Delivery of tools and technology of immediate applicability is only one component of the CEWES MSRC PET program. We also assist CEWES MSRC users by investigating less mature technologies that have significant potential for improving DoD CFD simulation capability and applicability. Specifically in this regard, the CFD team provided individual training and worked in collaboration with DoD personnel during Year 2 to evaluate and demonstrate techniques of computational design by coupling CFD simulation capability, direct differentiation techniques, and nonlinear optimization.

Major Year 2 efforts on specific CEWES MSRC CFD user codes were as follows:

1. Demonstration of Computational Design Technology: HIVEL2D

A free-surface, incompressible Navier-Stokes solver (HIVEL2D, from the CEWES Coastal and Hydraulics Lab) was selected as the CFD code with which to demonstrate computational design technology. Computational design is applicable to a wide range of engineering problems relevant to DoD. HIVEL2D was selected because of general user interest within the DoD civil works user base, because of related military engineering applications such as maintenance of dam spillways and drainage networks, and because of the technology the code exhibits. HIVEL2D is a true finite element solver developed primarily for the simulation of supercritical flows. The techniques demonstrated can, in principle, be coupled with virtually any CFD solver, and HIVEL2D thus serves as a convenient demonstration test bed.

Accomplishments included successful coupling and demonstration of direct differentiation calculation of design-space gradients from the HIVEL2D solver. To our knowledge, this is the first published example of coupling direct differentiation concepts with a true finite element solver. The technique was demonstrated for open-channel configurations that included viscous, supercritical flow simulation. This is significant because of the complexities introduced into the direct differentiation method by the poor differentiability of fluid properties across hydraulic jumps and by CFD turbulence models. Simple design optimization examples were generated using both gradient and genetic optimization algorithms. Technical summaries of this work were provided in a CEWES MSRC pre-print, a conference presentation, and a CEWES MSRC seminar. Two abstracts were accepted for presentation in 1998.
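For reference, the direct differentiation approach can be summarized in generic notation (this is the standard formulation, not necessarily the exact HIVEL2D implementation): if R(u,b) = 0 denotes the discretized flow equations with state u and design variable b, and J(u,b) is the design objective, then

    \frac{\partial R}{\partial u}\,\frac{du}{db} = -\frac{\partial R}{\partial b},
    \qquad
    \frac{dJ}{db} = \frac{\partial J}{\partial b} + \frac{\partial J}{\partial u}\,\frac{du}{db}.

The linear system for du/db reuses the solver's Jacobian, and the resulting design-space gradient dJ/db is what feeds the gradient-based optimizer; the differentiability issues noted above enter through the partial derivatives of R.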

2. MPI Parallelization of Hydraulic Simulation: CH3D

During Year 2, the CFD team continued to collaborate with CEWES Coastal and Hydraulics Lab personnel on parallelization of CH3D using MPI for maximum portability. The CH3D solver is a three-dimensional hydraulic simulation code that has been used to model various coastal and estuarine phenomena. It is a complex legacy code of more than 18,000 lines. The code is capable of handling highly irregular geometric domains involving coastlines and rivers, and it has been widely used by the Army Corps of Engineers both for hydrodynamic simulations and as a foundation for many environmental quality modeling and sediment transport models, as well as for remediation of land around DoD bases, environmental impact studies, and keeping waterways navigable. As such, reducing the execution time of CH3D is critical to various DoD hydrodynamic simulation and environmental quality modeling projects.

This collaboration has been successful in that we have been able to (1) provide specific HPC support and training to a major CEWES MSRC user, (2) develop CH3D expertise that enables use as an HPC test and evaluation platform, and (3) utilize CH3D as a mechanism to promote collaboration with the CWO and EQM CTAs in the CEWES MSRC PET effort.

The initial MPI parallelization of CH3D was completed in 1997. Numerical experiments verify that the results for velocity, water surface elevation, salinity, and temperature distributions from the parallel code are identical to results from the sequential code (to 16 digits, using free-format output in Fortran). Almost every subroutine in the original code was modified, and more than 2,000 lines were added to the code for parallelization. The execution time has been significantly reduced on multiple processors. For example, a simulation that would have taken five days on a single processor can now be done in one day on 7 processors. For historical reasons, a large number of legacy production codes were written in Fortran for vector computers. The data structure, controlling logic, and numerical algorithms in these codes are in general not suitable for parallel computers. The parallelization work on CH3D not only reduces the execution time of this particular code, but also provides valuable experience in migrating other complex application codes with similar data structures and numerical schemes from vector computers to parallel computers.
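As an illustration of the kind of message passing introduced by such a migration, the following is a minimal, hypothetical halo (ghost-cell) exchange for a one-dimensional strip decomposition, written here in C with MPI; it is not taken from CH3D, and all names and the data layout are illustrative.

    #include <mpi.h>

    /* Exchange one-column-wide halos with left/right neighbors in a 1-D
       strip decomposition.  'u' holds nloc interior columns plus two
       ghost columns (column 0 and column nloc+1), each of height ny,
       stored column by column. */
    static void exchange_halos(double *u, int ny, int nloc, MPI_Comm comm)
    {
        int rank, size;
        MPI_Status status;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* Send first interior column left; receive into right ghost column. */
        MPI_Sendrecv(&u[1 * ny],          ny, MPI_DOUBLE, left,  0,
                     &u[(nloc + 1) * ny], ny, MPI_DOUBLE, right, 0,
                     comm, &status);

        /* Send last interior column right; receive into left ghost column. */
        MPI_Sendrecv(&u[nloc * ny],       ny, MPI_DOUBLE, right, 1,
                     &u[0],               ny, MPI_DOUBLE, left,  1,
                     comm, &status);
    }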

Year 2 efforts helped to improve the parallel efficiency by reducing memory usage, minimizing interprocessor communication, and maintaining good load balance. Collaboration with the CWO and EQM teams was initiated to integrate hydraulic, sediment transport, and wave models for large-scale simulations. As a long-term goal, different numerical integration algorithms will also be tested and compared with the ADI-type scheme used in CH3D, and a better scheme will be developed and implemented in CH3D to further improve its efficiency on parallel computers.

3. Assistance with CHSSI Codes: OVERFLOW and FAST3D

Immediate HPC assistance was provided to CEWES MSRC users by direct collaboration on targeted codes. We are continuing to collaborate with developers of the CHSSI CFD codes to ensure that these codes are available to users at the CEWES MSRC. Contacts were made with each of the CHSSI CFD teams, with varying levels of interaction established, and we expect to continue and expand this effort in Year 3. Particularly successful collaborations have been established with the CHSSI OVERFLOW (Army Aeroflightdynamics Directorate at NASA Ames Research Center) and FAST3D (NRL) development teams. In this regard, we provided the FAST3D team with the first port of their code to an IBM SP, and we assisted the OVERFLOW team with their first port of OVERFLOW-D1 to a Cray T3E.

4. Evaluation of Parallel Programming Models

There is an ongoing effort in the CFD team to evaluate the trade-offs associated with the various parallel programming models available. We are using two DoD codes as testbeds to this end. More specifically, we are collaborating with CEWES MSRC users David Medina of AF Phillips Lab and Fernando Grinstein of NRL. Medina has a Smooth Particle Hydrodynamics (SPH) code - MAGI - which is used by the Air Force for hypervelocity impact studies. We have assisted him with optimizing a shared-memory version using PCF directives on the CEWES MSRC Origin 2000. We have also assisted him with an HPF implementation, and have developed a Fortran binding for the POSIX Pthreads standard in an attempt to further optimize the shared-memory version. We feel that it is important to investigate the feasibility of the Pthreads model for high-performance parallel computing. Grinstein has a three-dimensional, chemically reacting Navier-Stokes code - TURB3D - which is used by the Navy for investigating mixing phenomena associated with noncircular jets. We are assisting him in developing a scalable, parallel version of TURB3D. In this case, both MPI and HPF approaches are being used to evaluate the relative merits of each tool within the context of an actual DoD application.

---------------------------------------------------------------------------

CSM: Computational Structural Mechanics CTA

Year 2 support of the CSM CTA at CEWES MSRC consisted of on-site support of the CTH and Dyna3D codes for simulation of damaged structures, provided by the ERC at Mississippi State, and evolving Focused Efforts performed by NCSA at Illinois, which include

(a) Developing a Dyna3D-to-EPIC translator. (b) Parallelizing EPIC on the SGI Origin 2000. (c) Developing a tool to monitor CTH simulations while running on a high performance computer.

In addition, the on-site CSM lead assisted in the coordination of the SciVis effort for the damaged structures Challenge Project and supported the Grid Capabilities Enhancement Focused Effort and the grid workshop by coordinating the acquisition and evaluation of CSM-related grid codes.

Specific effort on Computational Structural Mechanics tools for the CEWES MSRC during Year 2 follows:

1. On-Site Support for Damaged Structures Challenge Project: CTH and Dyna3D

The on-site CSM lead, Rick Weed, provided support for Raju Namburu's (CEWES) damaged structures Challenge Project by assisting with the resolution of problems encountered with the CTH computer code. Visualization support of this same CSM effort was provided by NCSA and is discussed below as the second item in the Scientific Visualization part of this section.

2. Dyna3D-to-EPIC Translator

CEWES MSRC scientists use both Dyna3D and EPIC for analysis of CSM problems involving large deflection of structures with material nonlinearities. EPIC provides similar functionality to Dyna3D, with special emphasis in the area of projectile impact and penetration. Dyna3D input has become a standard for this type of analysis and is easily produced by many pre-processors. A Dyna3D-to-EPIC translator was delivered to CEWES MSRC users by LeRay Dandy of NCSA at Illinois; it allows models to be created in Dyna3D format and then translated to EPIC format with minimal user intervention.

3. EPIC Optimization on Origin2000

EPIC is increasingly being used for projectile impact and penetration problems by CEWES MSRC users. Most EPIC runs are made on a single processor of the Cray C90. This machine is scheduled to be decommissioned during Year 3, which will pose a problem for EPIC users at CEWES MSRC. LeRay Dandy and Bruce Loftis at NCSA are developing a parallel version of EPIC that will run on the SGI Origin 2000. A single-processor optimization of this code has been performed and verified using the test suite included in the EPIC distribution. NCSA engineers are working with CEWES engineers to verify the code with real engineering data. Future work will involve completion of the single-processor verifications and parallelization using OpenMP. In addition, NCSA will investigate introducing a new solver into the code.
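The planned parallelization would apply OpenMP directives of roughly the following form to EPIC's independent element loops. This C fragment is purely illustrative (EPIC itself is Fortran, and the variable names here are made up); it shows only the directive pattern, not EPIC source.

    #include <omp.h>

    /* Illustrative element-loop parallelization: each element's force
       contribution is independent of the others, so the loop iterations
       can be divided among the Origin 2000's processors. */
    void compute_internal_forces(int nelem, const double *strain, double *force)
    {
        #pragma omp parallel for schedule(static)
        for (int e = 0; e < nelem; e++) {
            force[e] = 2.0e11 * strain[e];   /* placeholder constitutive update */
        }
    }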

4. Monitoring CTH Simulations with CUMULVS

Simulations using codes like CTH, Dyna3D, or EPIC can take many CPU hours, and often it is not possible to discern whether the simulation is progressing appropriately until the end of the run. NCSA is developing a tool that will monitor specific variables during the simulation, allowing the CEWES MSRC user to track the progress of the simulation as it is running. During Year 2, CUMULVS and EnSight were installed on systems at NCSA, and the NCSA CSM team has been developing its skills with the respective packages. During Year 3, CUMULVS will be linked into CTH at the source code level and will be used to track variables during CTH simulations. The variables will be sent to a client workstation, where EnSight will be used to visualize them.

-------------------------------------------------------------------------------

CWO: Climate/Weather/Ocean Modeling CTA

Year 2 effort in support of CWO at CEWES MSRC concentrated on the physics and coupling of circulation, wave, and sediment codes of interest to CEWES MSRC users, on performance improvements to the WAM code involved, and on support of a major CEWES MSRC user at NRL-Stennis.

1. Deployment of CH3D, WAM, and CH3D-SED

A major focus of the Year 2 CWO effort was the coupling of circulation, wave, and sediment codes of interest to CEWES MSRC users Robert Jensen and Billy Johnson (Coastal and Hydraulics Lab at CEWES). Implementation and testing was carried out using Lake Michigan as the modeling domain. This selection simplified the specification of accurate boundary conditions, and the lake is also the site of the NSF EEGLE project, an ongoing extensive data collection program.

Three codes have been deployed for Lake Michigan:

CH3D (marine circulation model): The deployment of CH3D involved the following tasks:

(a) The generation of an appropriately formatted grid (mesh) for Lake Michigan. (b) The modification of hardwired codings previously set for the New York Bight deployment. (c) Adaptation of the heat transfer computation to function correctly for Lake Michigan conditions. (d) Modification of the wind input codings.

We initially used the CH3D code to simulate the internal Kelvin waves and coastal upwelling fronts commonly observed in Lake Michigan. The simulation objectives were twofold: first, to examine the capability of the modified CH3D code to reproduce robust thermodynamic phenomena, and second, to tune the model's coefficients and dynamic options to the Lake Michigan environment.

WAM (wave model): The Lake Michigan WAM deployment uses a 3-minute longitude by 2-minute latitude spherical-coordinate computational grid. This corresponds to approximately 4-km square cells; the dimensions of the grid are 67 cells in the East-West direction by 139 cells in the North-South direction, and the grid is equivalent to the curvilinear CH3D grid. The grid's depths were re-sampled from the NOAA 9-arc-second Lake Michigan bathymetry database. The basic WAM deployment was tested using idealized winds to generate typical fetch-limited wave growth, sea rotation in response to wind rotation, and wave decay.
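As a rough check of the quoted cell size (assuming one minute of latitude is about 1.85 km and taking a representative Lake Michigan latitude of about 44 degrees N):

    \Delta y \approx 2 \times 1.85\ \mathrm{km} \approx 3.7\ \mathrm{km},
    \qquad
    \Delta x \approx 3 \times 1.85\ \mathrm{km} \times \cos 44^{\circ} \approx 4.0\ \mathrm{km},

consistent with the approximately 4-km square cells quoted above.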

CH3D-SED (an extension of CH3D, into which a sediment module has been coupled): The sediment module computes mobile-bed processes, including aggradation and scour, bed-material sorting, and movement of bedload and suspended load of nonuniform sediment mixtures. The tasks completed for the Lake Michigan deployment of CH3D-SED were the same as those for the basic CH3D deployment. In addition, further examination was made of the suitability of the sediment transport equations. It was found that at locations where the bathymetry has a large gradient, the parameterized thickness of the active layer can be greater than that of the lowest sigma layer. It will, therefore, be necessary to calibrate and modify the sediment transport equations in Year 3 so that they better reflect conditions in active bedload transport regions.

2. Coupling of CH3D, WAM, and CH3D-SED

The principal reasons for coupling a marine circulation model with a surface wave model are found in the dynamical interactions taking place in the surface and bottom boundary layers of a water body. At the surface, the wave model's wave stress and radiation stress terms provide additional adjustments to water levels and mean horizontal currents in the circulation model's barotropic mode, while the wave model sees changes due to current-induced wave refraction and unsteady depth adjustments to depth-induced refraction. The bottom boundary layer is also influenced by wave-induced bottom stresses, resulting in a feedback to both the wave and circulation models. In the bottom boundary layer, the current-induced and wave-induced bottom stresses may be directed in different, even opposing, directions, and the re-suspension, entrainment, and deposition processes are nonlinearly related to the relative magnitudes of the wave- and current-induced bottom stresses. Thus, properly resolving the relative magnitudes of the two stresses is important to proper application of the wave-current bottom boundary condition. So far, the wave-current interactions have been effectively coupled in the surface boundary layer; the wave-current coupling in the bottom boundary layer is being implemented in the CH3D/WAM/SED coupling system.

In order to investigate the dynamical interaction between the waves and the current circulation, we implemented both simple one-way and more sophisticated two-way couplings between WAM and CH3D, with extension of the model physics as described above. The one-way coupling was designed to isolate the individual effects of wave motion (or current) from the effects of current (or waves). The two-way coupling more realistically represents wave-current interaction by including the feedback effects due to the dynamical interactions.

The most crucial technical point in developing a realistically interactive two-way coupling lies in the synchronization of the CH3D and WAM computations at the desired time instants. This has been achieved by inserting a synchronization module into both CH3D and WAM. A series of tests has been performed for various synchronization frequencies, and the coupling synchronization between CH3D and WAM can now be performed at an arbitrary, user-specified frequency.
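A minimal sketch of this kind of synchronization (hypothetical names, written in C with MPI; the actual CH3D/WAM module is Fortran and is not reproduced here): each model advances independently, and the two sides swap coupling fields every ncouple time steps, with the exchange frequency supplied by the user.

    #include <mpi.h>

    /* Hypothetical coupling loop run by each model: advance the local model,
       and every 'ncouple' steps exchange coupling fields (e.g. wave stresses
       one way, currents and water levels the other) with the matching rank
       of the other model through an intercommunicator.  Equal group sizes
       on the two sides are assumed. */
    void coupled_run(int nsteps, int ncouple, int nfield,
                     double *my_fields, double *their_fields,
                     MPI_Comm intercomm)
    {
        int rank;
        MPI_Status status;
        MPI_Comm_rank(intercomm, &rank);   /* rank within this model's group */

        for (int step = 1; step <= nsteps; step++) {
            /* model_step(my_fields);   advance one time step (omitted) */

            if (step % ncouple == 0) {
                /* Synchronization point: both models block here until the
                   paired exchange completes, then continue. */
                MPI_Sendrecv(my_fields,    nfield, MPI_DOUBLE, rank, step,
                             their_fields, nfield, MPI_DOUBLE, rank, step,
                             intercomm, &status);
            }
        }
    }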

3. Performance Improvement of WAM

Since WAM is a key code for CEWES MSRC users such as Robert Jensen, and is very computationally intensive, optimization of its performance has been a high priority. A parallel version of the WAM code had previously been developed under a CHSSI project. The CWO PET team at CEWES MSRC has worked on improving the performance of the code. Both single-node optimization issues and parallel performance aspects were investigated.

Parallel performance optimization involved two improvements: 1) improved load balancing and 2) reduced communication overhead. Improved load balancing was achieved through an iterative process of optimizing the choice of blocking parameters in WAM to make the number of grid points mapped to each of the processors as even as possible. The reduction of communication overhead was achieved by analyzing detailed timing traces of parallel executions to identify bottlenecks. By introducing some additional synchronization, it was possible to change the pattern of communication and improve performance. For example, a two-fold improvement in performance was achieved for the LUIS dataset on 8 processors of the SGI Origin 2000 at CEWES MSRC.
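The load-balancing objective can be stated very simply; a C sketch (not the actual WAM blocking code) of the target distribution, in which no processor holds more than one grid point above the minimum:

    /* Distribute 'npoints' active grid points over 'nproc' processors as
       evenly as possible: every processor gets npoints/nproc points, and
       the first npoints%nproc processors get one extra. */
    int points_on_processor(int npoints, int nproc, int rank)
    {
        int base = npoints / nproc;
        int rem  = npoints % nproc;
        return base + (rank < rem ? 1 : 0);
    }

In WAM the mapping is constrained by the code's blocking parameters, so the even distribution above is a target approached iteratively rather than imposed directly.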

4. Optimization of the Navy Layered Ocean Model (NLOM)

The Navy Layered Ocean Model (NLOM) of Allan Wallcraft (NRL-Stennis), running at CEWES MSRC, is extremely compute-intensive. One of the most computationally demanding steps in the code is that of inverting a large dense matrix. As part of CWO effort at CEWES MSRC, a parallel matrix inversion routine was deployed on the Cray T3E that resulted in significant improvement for this bottleneck computation.

The NLOM model uses a pre-processor module to compute and invert a matrix which represents the boundary conditions to be subsequently applied in the simulation module. Preliminary runs indicated that pre-processing demands were excessive. Attempts were made to improve pre-processor performance by implementing the ScaLAPACK library routines in the T3E deployment of NLOM.

ScaLAPACK is a scalable version of the LAPACK simultaneous-equation solution library, built upon the PBLAS and BLACS libraries. Difficulties arose in obtaining T3E versions of ScaLAPACK, PBLAS, and BLACS that were compatible with the Cray MPI message-passing library. Complete Cray releases were not available, and the Oak Ridge National Laboratory versions designed for this application encountered problems related to differences in default data types between the T3E f90 and cc compilers. The incompatibilities were resolved by redefining certain data types and using a special MPI flag, resulting in a consistent 64-bit package.

This implementation of ScaLAPACK resulted in significant performance enhancements, and the differences between the matrices generated by the original and modified inversion procedures were consistently acceptable. Subsequent upgrades of the Cray T3E ScaLAPACK and MPI libraries mean that additional effort will be required to secure a stable ScaLAPACK version of the NLOM pre-processor. Ongoing work of this kind is therefore being performed by personnel at the University of Tennessee, also as part of the CEWES MSRC PET effort, as described below in the SPPT part of this section.
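For reference, a minimal sketch of how a distributed inversion of this kind is typically driven from ScaLAPACK, here called from C (illustrative only; the actual NLOM pre-processor is not reproduced, and details such as workspace sizing, BLACS grid setup, and the 64-bit type mappings on the T3E are assumed to be handled by the caller):

    /* Invert a dense N x N matrix distributed in ScaLAPACK's 2-D
       block-cyclic layout: PDGETRF computes the LU factorization and
       PDGETRI forms the inverse from it.  The caller supplies the local
       matrix block, pivot array, and adequately sized workspaces. */
    extern void Cblacs_gridinfo(int ictxt, int *nprow, int *npcol,
                                int *myrow, int *mycol);
    extern int  numroc_(int *n, int *nb, int *iproc, int *isrcproc, int *nprocs);
    extern void descinit_(int *desc, int *m, int *n, int *mb, int *nb,
                          int *irsrc, int *icsrc, int *ictxt, int *lld, int *info);
    extern void pdgetrf_(int *m, int *n, double *a, int *ia, int *ja,
                         int *desca, int *ipiv, int *info);
    extern void pdgetri_(int *n, double *a, int *ia, int *ja, int *desca,
                         int *ipiv, double *work, int *lwork,
                         int *iwork, int *liwork, int *info);

    void invert_distributed(double *a_local, int *ipiv, int n, int nb, int ictxt,
                            double *work, int lwork, int *iwork, int liwork)
    {
        int nprow, npcol, myrow, mycol, lld, info;
        int izero = 0, ione = 1;
        int desc[9];

        Cblacs_gridinfo(ictxt, &nprow, &npcol, &myrow, &mycol);
        lld = numroc_(&n, &nb, &myrow, &izero, &nprow);   /* local row count */
        if (lld < 1) lld = 1;
        descinit_(desc, &n, &n, &nb, &nb, &izero, &izero, &ictxt, &lld, &info);

        pdgetrf_(&n, &n, a_local, &ione, &ione, desc, ipiv, &info);  /* LU   */
        pdgetri_(&n, a_local, &ione, &ione, desc, ipiv,
                 work, &lwork, iwork, &liwork, &info);               /* A^-1 */
    }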

-----------------------------------------------------------------------------

EQM: Environmental Quality Modeling CTA

Year 2 efforts in support of the EQM CTA at CEWES MSRC were directed principally at parallelization of CE-QUAL-ICM, web-based launching of ParSSim, and parallelization of ADCIRC:

1. Parallelization of CE-QUAL-ICM

Our primary effort in Year 2 was the parallelization of CE-QUAL-ICM. This project involved a close collaboration between the EQM team at the University of Texas (UT) and CEWES MSRC users, in particular Carl Cerco, Barry Bunch and Mark Noel of the Coastal and Hydraulics Lab at CEWES. After discussions between the two groups, we decided to pursue a single program, multiple data (SPMD) approach, which would involve

(a) Development of a preprocessor code which decomposes the computational domain, using a mesh partitioning algorithm, and correspondingly decomposes all global input files.

(b) Modification of the ICM source code by incorporating MPI calls to pass information from one processor to another.

(c) Development of a postprocessor code that assembles the local output generated by each processor into global output files.

This approach to parallelization, which the UT team has used successfully on other projects, allows for scalability of the parallel computation and hides many of the details of the parallelization from the everyday user. Scripts have been developed that the user can execute to run the pre-processor, compile and run the ICM code in parallel, and then run the post-processor. Thus, from the user's perspective, the input and output files are the same on both serial and parallel machines.

The pre-processor and post-processor codes were developed by the UT team. The pre-processor uses a space-filling curve algorithm to partition the mesh into sub-meshes for each processor. This algorithm is easy to implement and preserves locality of the mesh, so that subdomains with a good "surface-to-volume" ratio can be obtained.
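A minimal sketch of the idea (using a Morton/Z-order key as a stand-in for whatever space-filling curve the UT pre-processor actually employs; the cell structure and names are illustrative):

    #include <stdlib.h>

    /* Interleave the bits of (ix, iy) to get a Morton (Z-order) key; cells
       that are close on the curve tend to be close in space. */
    static unsigned long morton_key(unsigned x, unsigned y)
    {
        unsigned long key = 0;
        for (int b = 0; b < 16; b++) {
            key |= (unsigned long)((x >> b) & 1u) << (2 * b);
            key |= (unsigned long)((y >> b) & 1u) << (2 * b + 1);
        }
        return key;
    }

    struct cell { unsigned ix, iy; int id; unsigned long key; };

    static int by_key(const void *a, const void *b)
    {
        unsigned long ka = ((const struct cell *)a)->key;
        unsigned long kb = ((const struct cell *)b)->key;
        return (ka > kb) - (ka < kb);
    }

    /* Assign 'ncell' cells to 'nproc' subdomains: sort the cells along the
       curve, then cut the curve into nproc nearly equal pieces. */
    void sfc_partition(struct cell *cells, int ncell, int nproc, int *part)
    {
        for (int i = 0; i < ncell; i++)
            cells[i].key = morton_key(cells[i].ix, cells[i].iy);
        qsort(cells, ncell, sizeof *cells, by_key);
        for (int i = 0; i < ncell; i++)
            part[cells[i].id] = (int)(((long)i * nproc) / ncell);
    }

Because cells that are adjacent along the curve are usually adjacent in space, cutting the sorted list into equal pieces yields compact subdomains with the good "surface-to-volume" ratio noted above.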

The parallel version of CE-QUAL-ICM was developed in stages. After being given the source code by CEWES in early 1997, we developed a preliminary parallel code based on this version, which was completed in late summer of 1997. Simultaneously, the CEWES group was adding new features to the code, more components, new input files, etc., and these had to be incorporated into the parallel framework. This took an additional 3-4 months. By mid-December, testing began on this new code and proceeded through February.

As of late March 1998, the parallel code is being used in production mode by the CEWES team. A ten-year Chesapeake Bay simulation using 32 processors on the T3E has been completed. Model calibration will be completed by mid-April, and twenty-year scenarios will then commence. It was estimated that such scenarios would have been infeasible on a single processor, taking over two weeks of CPU time.

2. Web-based Launching of ParSSim

The UT team developed a web-based, code "launching" capability, and demonstrated this capability on a UT parallel groundwater code, ParSSim. This simulator can handle multicomponent and multiphase flow and transport involving one fluid phase and an arbitrary number of mineral phases. It is scalable, has been fully tested, and is being employed on a collection of realistic applications.

This launching capability allows a remote user to log in to a website, choose from a suite of datasets, submit a remote job on a parallel machine, and obtain graphical output of the results. A Perl script was written to drive the launching. Two groundwater remediation data sets, representative of typical data sets in such problems, were constructed. The ParSSim code was then executed on four processors of an IBM SP2 located at UT. Graphical output was generated using Tecplot and returned to the user's machine.

A demonstration of this capability was given during the CEWES MSRC PET annual review in February. This effort is meant to serve as a prototype of web-based launching, which could be incorporated, for example, into the Groundwater Modeling System at CEWES.

3. Parallelization of ADCIRC

The UT team began an effort on the parallelization of ADCIRC, a shallow water circulation model used by several CEWES MSRC users (Norm Scheffner, Rao Vemulokanda, and others of the CEWES Coastal and Hydraulics Lab). The approach being taken here is very similar to the approach used for CE-QUAL-ICM. In particular, pre- and post-processors are being developed, and MPI calls are being added to the existing serial code. This development is being carried out in close collaboration with the authors of ADCIRC - Rick Luettich (North Carolina) and Joannes Westerink (Notre Dame) - who are funded by CEWES through other projects. Our goal is to incorporate the parallelism in such a way that all future versions of ADCIRC will have a parallel capability.

The accomplishments to date include the development of a pre-processor and a parallel version of ADCIRC. These codes are being tested on a number of data sets which exercise various aspects of the simulator. The pre-processor and parallel codes execute correctly for a number of the simpler data sets. Some of the more difficult cases involve large wind stress data sets and wetting and drying. The pre-processing of wind stress data and the parallelization of the wetting and drying code are proceeding, but substantial debugging and testing remains to be done.

------------------------------------------------------------------------------

FMS: Forces Modeling and Simulation/C4I CTA

Year 2 effort in support of the FMS CTA at CEWES MSRC consisted of on-site support of a major battle simulation program (SF Express) and focused effort on run-time infrastructure by NPAC at Syracuse.

1. Battle Simulations: SF Express Demos

During Year 2, the CEWES MSRC IBM SP played a major role in two record-setting military battle simulations. This was accomplished with the assistance of the CEWES MSRC PET on-site team and the NRC infrastructure staff.

The Synthetic Forces (SF) Express application is based on the Modular Semi-Automated Forces (ModSAF) simulation engine with a scalable communications architecture running on SPPs from multiple vendors. The SF Express project is funded under the DARPA Synthetic Theater of War (STOW) program, and is supported by development teams at JPL, Caltech, and NRaD.

A record simulation of 66,239 entities, including fixed-wing aircraft, rotary-wing aircraft, fighting vehicles, and infantry, was conducted on November 20, 1997 at the DoD HPCMP booth as part of the DoD contribution to the SC97 Conference. The simulation was executed by the JPL team led by David Curkendall. The computers used for this simulation were the 256-processor IBM SP at CEWES MSRC, the 256-processor IBM SP at ASC MSRC, and two 64-processor SGI Origin 2000s at ASC MSRC, all interconnected over the DREN.

Another record was set at the Technology Area Review and Assessment (TARA) briefings at NRaD in San Diego on March 20, 1998. Again, the CEWES MSRC SP was a major player. This simulation was conducted by the Caltech team led by Sharon Brunett. A total of 13 computers from 9 different sites were used to host the 100,298-vehicle entity-level simulation, using a total of 1,386 processors. The simulation made use of software developed in the Globus project - a research effort funded by DARPA, DoE, and NSF to investigate and develop software components for next-generation high-performance internet computing. A list of the sites, numbers of processors, and vehicles simulated appears in the following table:

Site          Computer   Processors   Vehicles
----          --------   ----------   --------
ASC MSRC      SP            130        10,818
ARL MSRC      SGI            60         4,333
              SGI            60         3,347
Caltech       HP            240        21,951
CEWES MSRC    SP            232        17,049
HP            HP            128         8,599
MHPCC         SP            148         9,485
              SP            100         6,796
NAVO MSRC     SGI            60         4,238
NCSA          SGI           128         6,693
UCSD          SP            100         6,989
----------    --------   ----------   --------
Totals                    1,386       100,298

2. Object Web Run-Time Infrastructure (RTI) Prototype

DMSO recently introduced a new integration framework for advanced simulation networking, called the High Level Architecture (HLA), based on the Run-Time Infrastructure (RTI) software bus model. RTI enables federations of real-time/time-stepped and logical-time/event-driven simulations, and it optimizes communication via event filtering and publisher/subscriber region/interest matching, supported by the Data Distribution Management (DDM) service.

Full and rapid DoD-wide transition to the HLA is strongly advocated by DMSO and is facilitated by open public specifications of all HLA components, extensive nation-wide tutorial programs, and prototype RTI implementations.

Given the systematic shift of the DoD training, testing and wargaming activities from physical to synthetic environments, and the ever-increasing computational demands imposed on advanced modeling and simulation systems, high performance distributed computing support for HLA will likely play a crucial role in the DoD Modernization Program.

At NPAC, we are currently developing a Java-based Web Object Request Broker (WORB) server that will support the HTTP and IIOP protocols and act as a universal node of our HPcc (High Performance commodity computing) environment. Given that the RTI object bus model is strongly influenced by CORBA, and that DMSO is in fact interacting with OMG toward proposing HLA as a CORBA simulation facility/framework, an early Java/CORBA-based RTI prototype seems a natural effort in the domain of interactive HLA training. Our Object Web/WORB-based RTI subset would support and integrate Web DIS (Java- and VRML-based) applications under development at the Naval Postgraduate School at Monterey, CA, as well as more traditional and substantial simulation codes such as ModSAF and perhaps also SPEEDES, TEMPO, or IMPORT, currently at the planning stage as possible FMS training targets for our PET activities at ARL.

By the end of this project, we will deliver a prototype Object Web (CORBA) based RTI kernel (subset) capable of running a simple demonstration application to be developed locally. This would serve as a demonstration of the integration of DMSO and web technologies and provide a freely available tool. A follow-on project could further develop the system into a full RTI implementation, at which point it would be possible to run real RTI applications as a demonstration of this tool.

------------------------------------------------------------------------------

SPPT: Scalable Parallel Programming Tools

The SPP Tools team of the PET effort at the CEWES MSRC has mounted a coordinated, sustained effort in Year 2 to provide DoD users with the best possible programming environment and with knowledge and skills for effective use of HPC platforms. This effort can be divided into four major thrusts:

(1) Working directly with users to understand requirements and provide direct help on their codes.

(2) Supplying essential software to meet those requirements.

(3) Training in the use of that software and in parallelism issues generally.

(4) Tracking and transferring technology with enhanced capabilities.

Our Year 2 effort in each of these categories is described below:

1. Working with Users: Code Migration, Pthreads, HPF

A key tactic for SPP Tools is to engage users in working on their codes. It is only through such interactions that we can identify real requirements for new tools, thus ensuring that they will be relevant to CEWES MSRC. (For purposes of this discussion, we include the NRC Code Migration Group [CMG] at CEWES MSRC as "users": in fact, many of our closest contacts with parallel programmers are through that group.) In addition to directing future SPP Tools efforts, these interactions often lead to direct collaborations on specific codes. These have the potential for double success stories: the code improvements lead the DoD user directly to better science, and the lessons learned about tools provide experience for future endeavors. The latter type of success is particularly important when the user is a member of the CMG, since CMG members will work directly with many CEWES MSRC users in the future.

Out of many projects with users in Year 2, we mention only two. Clay Breshears collaborated with members of the CMG to develop Fortran 90 bindings for Pthreads on the Power Challenge Array. In principle, the same bindings could be used on other shared-memory machines with Fortran 90 compilers, although the detailed implementation might well differ. The importance of these bindings is that they provide a means for using fine-grain parallelism from modern Fortran. Henry Gabb of the CMG is using these bindings to parallelize David Medina's MAGI code (AF Phillips Lab) on the PCA and Origin 2000, and the relevance of the bindings extends over a wide range of potential applications. A paper on the implementation and use of the bindings will be presented at the June 1998 DoD HPC Users Group meeting.
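The pattern the bindings expose to Fortran is the standard Pthreads create/join model, shown here in its native C form (the Fortran 90 binding itself and the MAGI code are not reproduced; the work function is a placeholder):

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    /* Each thread processes a contiguous slice of a shared array. */
    static double data[1000];

    static void *worker(void *arg)
    {
        long t = (long)arg;
        long chunk = 1000 / NTHREADS;
        for (long i = t * chunk; i < (t + 1) * chunk; i++)
            data[i] = data[i] * 2.0;          /* placeholder fine-grain work */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, worker, (void *)t);
        for (long t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);
        printf("done\n");
        return 0;
    }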

Fortran 90 code development and testing has also been partly a user-driven activity. Breshears worked with Steve Bova of the CFD team to produce a Fortran 90/MPI code and an algorithm for computing shared edges on finite-element grids. This is a very generic component of CFD codes, and its efficient migration to parallelism bodes well for production codes. It also illustrates the close interaction between SPP Tools and the CTA on-site support members of the PET effort at CEWES MSRC.
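A sketch of one common way to compute shared edges on an unstructured grid (a generic technique, not necessarily the Bova/Breshears algorithm, and the data layout is illustrative): each triangle contributes its three edges as sorted node pairs; after sorting the list, edges that appear twice are the interior edges shared by two elements.

    #include <stdlib.h>

    struct edge { int n0, n1, elem; };

    static int edge_cmp(const void *a, const void *b)
    {
        const struct edge *ea = a, *eb = b;
        if (ea->n0 != eb->n0) return ea->n0 - eb->n0;
        return ea->n1 - eb->n1;
    }

    /* tri[3*e .. 3*e+2] holds the node numbers of triangle e.  Returns the
       number of edges shared by exactly two elements. */
    int count_shared_edges(const int *tri, int nelem)
    {
        struct edge *edges = malloc(3 * (size_t)nelem * sizeof *edges);
        int ne = 0, shared = 0;

        for (int e = 0; e < nelem; e++) {
            for (int k = 0; k < 3; k++) {
                int a = tri[3 * e + k], b = tri[3 * e + (k + 1) % 3];
                edges[ne].n0 = a < b ? a : b;   /* store as a sorted pair  */
                edges[ne].n1 = a < b ? b : a;
                edges[ne].elem = e;
                ne++;
            }
        }
        qsort(edges, ne, sizeof *edges, edge_cmp);
        for (int i = 0; i + 1 < ne; i++) {
            if (edges[i].n0 == edges[i + 1].n0 && edges[i].n1 == edges[i + 1].n1) {
                shared++;   /* each interior edge appears exactly twice */
                i++;
            }
        }
        free(edges);
        return shared;
    }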

2. Supplying Essential Software: Parallel Debuggers, Performance Analysis

To build codes efficiently, it is vital to have basic systems software available. The SPP Tools team has been active in the effort to ensure that essential system software - such as BLAS libraries, MPI implementations, and compilers - is installed on CEWES MSRC machines. These implementations are then tested and bugs reported, so that CEWES MSRC users can build their applications on a firm foundation of properly functioning systems and library software. In particular, a number of bugs were discovered in the SGI/Cray implementation of MPI in the process of installing and testing the BLACS, which underlie ScaLAPACK, on the SGI/Cray T3E.

Although correctly functioning software is necessary, correctness alone is not sufficient for effective use of HPC platforms by large-scale, computationally intensive applications. These applications also require a high level of performance from the underlying software, necessitating the development and use of benchmarks. Thus, over the past year Susan Blackford and Clint Whaley have carried out timings on a portion of the ScaLAPACK timing suite running on CEWES MSRC platforms. In addition, Phil Mucci has been active in developing a suite of low-level benchmarks for measuring application-critical performance of key linear algebra operations, of the cache and memory hierarchy, and of the communication subsystem and then running them on CEWES MSRC platforms. Gina Goff has performed similar work on a set of benchmarks to evaluate HPF constructs. The results of these performance evaluations are being made available in technical reports and in on-line performance data repositories and will be presented at the June 1998 DoD HPC Users Group meeting.
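As an illustration of what the communication part of such a low-level benchmark measures, a minimal MPI ping-pong sketch (not one of the actual UTK benchmark codes; message size and repetition count are arbitrary):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Ping-pong between ranks 0 and 1: time 'reps' round trips of an
       'nbytes' message and report the average one-way time and bandwidth. */
    int main(int argc, char **argv)
    {
        int rank, nbytes = 1 << 20, reps = 100;
        double t0, t1;
        char *buf;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = malloc(nbytes);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {
                MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();
        if (rank == 0) {
            double oneway = (t1 - t0) / (2.0 * reps);
            printf("one-way time %g s, bandwidth %g MB/s\n",
                   oneway, nbytes / oneway / 1.0e6);
        }
        free(buf);
        MPI_Finalize();
        return 0;
    }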

As important as bringing software to DoD sites is having a good facility for storing and announcing it. Shirley Browne and others at the University of Tennessee-Knoxville (UTK) on the SPP Tools team are putting in place a repository infrastructure at CEWES MSRC that will enable sharing of software, algorithms, technical information, and experiences within and across CTAs and MSRCs and with the broader HPC community. The repository infrastructure is based on an IEEE standard software cataloging format that is implemented in the Repository in a Box (RIB) toolkit produced by the National High-performance Software Exchange (NHSE) project. Use of a standard format and of RIB provides a uniform interface for constructing, browsing, and searching virtual repositories that draw on resources selected, evaluated, and maintained by experts in autonomous discipline-oriented repositories. Interoperation mechanisms allow catalog information and software to be shared between repositories. Appropriate labeling and access control mechanisms support and enforce intellectual property rights and access restrictions where needed. The repository infrastructure will facilitate easy discovery and widespread use of tools such as those described above, as well as of application codes developed by CTA teams.

3. Training: Parallel Programming Techniques and Tools

Training in the use of tools, including both specific tools and general parallel programming techniques, is a vital part of our PET mission. The SPP Tools training program in Year 2 included workshops on Performance Analysis Tools, to acquaint users with software that could enhance their productivity as well as their codes; ScaLAPACK, to help users efficiently solve large dense linear systems; PETSc, to introduce users to modern template-oriented scientific libraries; and "Bring Your Own Code" (often taught in conjunction with the CMG), where CEWES MSRC users have a chance to receive assistance in dealing with their individual code problems.

In the interest of brevity, we will only discuss two of the courses we offered in Year 2. A highly tuned and efficient systems and library software base is essential for good application performance. However, for the results of performance evaluation studies to be of maximal benefit to DoD application developers and users, they need to be incorporated into an application performance analysis framework that provides the skills and tools needed to model, analyze, and tune application performance. To this end, UTK researchers have taught two courses in this area during the past year at CEWES MSRC - one on benchmarking and performance modeling, and one on code optimizations for MPPs. The first course shows how benchmark results can be used as a starting point and combined with application instrumentation, scalability analysis, and statistical analysis techniques to model and predict application performance on various HPC architectures. The second course covers performance optimization techniques ranging from tuning single-processor performance to tuning of communication patterns.

4. Tracking and Transferring Technology

As a comprehensive technology transfer program, the PET effort at the CEWES MSRC has a strong responsibility to evaluate and offer new technologies for DoD use. The SPP Tools team has been aggressive in both seeking out recent research and advertising it within DoD. The tools mentioned above (see also Table 4 and Section VII) are all examples of tools identified by the SPP Tools team, and their availability through RIB is a fine example of how such technologies can be made available to DoD. We mention here a few other examples of technology transfer.

Because CEWES MSRC users work on multiple platforms, they need portable performance analysis tools that allow them to analyze and compare application performance on different platforms without the burden of learning a different tool for each platform. Consequently, Shirley Browne led an effort at UTK to evaluate currently available portable performance analysis tools, including both research and commercial offerings; a CEWES MSRC technical report describes the results. On the basis of this evaluation, the SPP Tools team chose a handful of tools for porting and further evaluation on CEWES MSRC platforms. VAMPIR, a commercial trace-based performance analysis tool, runs on all CEWES MSRC platforms, is highly robust and relatively scalable, and has already been used to achieve significant performance improvement on a challenge application. nupshot, a freely available trace visualization tool, has an intuitive, easy-to-use interface that can quickly provide information about application communication performance. We are currently working on a robust version of nupshot and of the MPE logging library that produces tracefiles for it. Furthermore, the SPP Tools team is working with developers to debug and implement Fortran 90 support for two other promising trace-based performance analysis tools, AIMS and SvPablo, as requested by CEWES MSRC users.
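For reference, instrumenting an MPI code so that nupshot can display it typically means linking with the MPE logging library and bracketing regions of interest with log events; a minimal sketch (the state name, color, and file name are arbitrary):

    #include <mpi.h>
    #include "mpe.h"

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        MPE_Init_log();

        /* A logged "state" is delimited by a pair of event numbers. */
        int ev_start = MPE_Log_get_event_number();
        int ev_end   = MPE_Log_get_event_number();
        MPE_Describe_state(ev_start, ev_end, "compute", "red");

        MPE_Log_event(ev_start, 0, "begin compute");
        /* ... computation and communication to be traced ... */
        MPE_Log_event(ev_end, 0, "end compute");

        MPE_Finish_log("trace");   /* write the log file for later viewing */
        MPI_Finalize();
        return 0;
    }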

In the area of introducing cutting-edge research into CEWES MSRC production use, Graham Fagg and others at UTK are bringing MPI-Connect to CEWES MSRC. Although good communication performance can be achieved on a single platform by using the vendor's optimized MPI implementation, and metacomputing using MPI can be achieved using portable implementations such as MPICH, CEWES MSRC users now need the ability to couple MPI applications on multiple platforms while retaining vendor-tuned MPI implementations. To achieve this goal, the MPI-Connect system developed at UTK - which enables transparent intercommunication between different MPI implementations - is being ported to MSRC platforms and deployed to couple MPI applications in application areas such as FMS, CWO, and IMT. Without a high-level metacomputing system like this or Legion (in use at some other MSRCs), developing such multi-disciplinary applications is virtually hopeless.

CEWES MSRC application developers also need the best available debugging and performance tools to enable them to quickly find and fix bugs and performance problems. However, the tools that are available have not been put into wide use, for various reasons including lack of awareness by users, poor user interfaces, and steep learning curves for different tools on different platforms. Ideally, a portable parallel debugging interface should be available across all CEWES MSRC platforms. Browne and Koelbel participated during the past year in a standardization effort, the High Performance Debugging Forum (HPDF), which has produced a specification for a command-line parallel debugging interface that addresses important issues such as scalability and asynchronous control. They solicited CEWES MSRC user input throughout the HPDF process and used it to help guide choices about what features to include and how the interface should appear. Vendor participation in HPDF indicates that the standard will be incorporated into commercial debuggers, and public-domain reference implementations are planned for the IBM SP and the SGI Origin. Thus, the efforts of HPDF promise to eventually provide a portable, easy-to-use command-line debugger for CEWES MSRC platforms.

The HPDF experience is only one of a number of efforts by the SPP Tools team to form better connections between CEWES MSRC users and the wider HPC community. Team members have attended approximately one meeting per month, publishing trip reports that have been widely circulated within DoD. Many CEWES MSRC users have commented on the value of these reports in keeping them abreast of the computational science field, even when the conference is not in their direct line of interest.

-----------------------------------------------------------------------------

SV: Scientific Visualization

Scientific Visualization training efforts during Year 2 included development and delivery of a class on using the new visualization package VTK (the Visualization ToolKit). This 2-day class was delivered on-site at the CEWES MSRC, and future work will include building a web-based tutorial for this package. We also directed effort toward furthering the skills of two individuals. John West, a CEWES MSRC employee currently on long-term training leave, spent the summer with us at NCSA, working as a member of the NCSA Visualization team on a prototype system for delivering visualization capability to users who are remote from the CEWES MSRC. Milti Leonard of Jackson State University, a member of the CEWES MSRC PET team, worked with us at NCSA from September through December building her skills in C++ programming, HDF, and VTK. We will continue to work with Leonard, at least through the life of the PET program; she will concentrate on techniques for supporting visualization among remote users.

Specific effort on Scientific Visualization tools for the CEWES MSRC during Year 2 follows:

1. Collaborative Visualization: VisGen

User-directed, technology-transfer efforts in Year 2 included delivering two end-user tools. The VisGen tool (currently in an alpha release) is in use by Carl Cerco and his team at the CEWES Coastal and Hydraulics Lab in analyzing data from their simulations of phenomena in the Chesapeake Bay. In this work, Cerco's team simulates 10- and 20-year time periods, modeling the concentration and transport of over 20 components (such as chlorophyll, dissolved oxygen, and nitrogen). Data of this complexity could not be analyzed without visual techniques, and the VisGen tool was designed and delivered to assist Cerco's team in analyzing it. The tool also allows the team to capture the visualization in web-based media, so that Cerco can share his results with his colleagues and project managers at the Environmental Protection Agency.

2. Damaged Structures Challenge Project: Structures Visualization

The second end-user tool is an application for viewing output from the structures codes CTH and Dyna3D. This was in support of the DoD Challenge Project in large-scale shock physics and structural deformation, directed by Raju Namburu of the CEWES Structures Lab. The goals of the visualization activity were to (1) provide support for this particular Challenge application, (2) highlight the science by showing the visualization at SC97, and (3) work toward advancing the methodology for managing and interpreting data from very large-scale calculations. We followed a strategy of sub-sampling the CTH data to make it more manageable, mapping that data to geometric form, and supporting interactive exploration of the playback of that geometry. Similarly, for the Dyna3D data, we extracted the region of highest interest, decimated the geometry where possible, and supported interactive playback of the geometry. To support the researchers' need to share results with their colleagues, we incorporated support for image and movie capture. This application was used by the researchers to visually validate the results of their simulations, and it was also used by the CEWES researchers to show and explain their work at SC97.

3. Visualization ToolKit: VTK

In addition to end-user tools, the CEWES MSRC PET team transferred certain technologies to the visualization specialists at the CEWES MSRC. We shared our expertise in VTK, and we also made available a variety of NCSA-developed tools, including the audio development library vss, image- and movie-capture code, and a VTK-to-Performer utility for using VTK on the ImmersaDesk.

4. Multiresolutional Representation: Terascale Visualization

Preliminary investigations into terascale visualization were started in December by Raghu Machiraju of the ERC at Mississippi State, in coordination with the NCSA Visualization team. Basic investigations into the use of multiresolutional representation for computational datasets and into comparative visualization were conducted. As a result of these investigations, a journal paper was accepted for publication in a special issue of IEEE Transactions, and another paper was submitted to Visualization '98, to be held in Research Triangle Park, North Carolina.

Trips were made to IBM TJ Watson and other companies engaged in analysis and visualization of large datasets. Visualization experts at DoE labs (Livermore and Los Alamos) and at NASA Ames were contacted, and trips have been scheduled. A state-of-the-art report on terascale visualization is being prepared.

-----------------------------------------------------------------------------

C/C: Collaboration/Communication

During Year 2, the Collaboration/Communication technical infrastructure at the CEWES MSRC was supported by CEWES MSRC PET effort at both NCSA at Illinois and NPAC at Syracuse.

1. Website and Collaborative Environment: NCSA (Illinois) Efforts

The primary objectives of the C/C effort at NCSA were to promote better information dissemination to CEWES MSRC PET users and to provide team collaboration tools for PET management and researchers. This was accomplished in Year 2 through

(a) Refinement of the CEWES MSRC PET website framework and modification of the Collaboration/Communication webpages.

(b) Deployment of an initial collaboration environment - netWorkPlace - enabling asynchronous postings of status reports, meetings and discussions between CEWES MSRC PET team members.

(c) Building a sense of community among PET webmasters to facilitate technology transfer across the four MSRC sites.

The activities in (a) and (c) are described briefly below; the netWorkPlace Focused Effort in (b) is discussed in the Appendix.

The CEWES MSRC PET website serves as a mechanism to provide timely information to users on PET program activities. NCSA developed the framework for implementing pages on the CEWES MSRC PET website in a consistent and uniform manner. This framework was developed to ensure ease of navigation throughout the site regardless of the web browser used, as well as accessibility to visually impaired users. It included HTML structures for presenting frame and no-frame views of the documents as appropriate for the end-user browser software, and it provided recommendations for universal access considerations. Additional website support was provided through discussion with the CEWES MSRC PET webmaster, and a report was written describing website log analysis tools and how they could be used to better administer the site and to provide information to web content providers on the effectiveness of their pages.

NCSA provides website support to three of the MSRC PET sites: CEWES, ASC, and ARL. Because of the similarity of support provided to each site, it is beneficial to leverage these activities across MSRCs. The mechanism chosen to support this leveraging was to build a sense of community among the MSRC PET webmasters to facilitate communication and sharing across sites. NCSA held an MSRC PET webmasters meeting in February 1998 to begin fostering this sense of community. Presentations on technology areas of interest to the webmasters were given, and discussions were held on how to foster further collaboration among the webmasters. The meeting attendees felt that the meeting was very beneficial and expressed a desire for continuation of this effort. An online discussion forum moderated by NCSA was initiated in February 1998 and has been well received by the webmasters. Additional face-to-face and online meetings will be hosted by the individual sites.

2. Tango and Search Engines: NPAC (Syracuse) Efforts

The growth of web technologies offers some special opportunities to facilitate the work of the CEWES MSRC PET team and CEWES MSRC researchers. Researchers at Syracuse University's Northeast Parallel Architectures Center (NPAC) have long been on the cutting edge of using web technology in support of high-performance computing. Tango, developed previously with support from the DoD's Rome Laboratory and SU's L.C. Smith College of Engineering and Computer Science, is a Java-based collaborative tool that offers chat, whiteboard, and shared web browser capabilities, as well as two-way audio and video conferencing. Besides deploying Tango for a CEWES MSRC PET-supported distance education project with Jackson State University (see Sections VIII and X), NPAC researchers have also expanded the capabilities of Tango to support consulting and software development activities in geographically separated groups, adding a shared tool to view and modify source code as well as to debug and analyze performance.

Advances in web and internet technologies have also greatly increased the amount of information that can be found on the network for a broad range of subjects. This can be an important resource for CEWES MSRC researchers, but only if it is possible to locate the desired information in the first place. To facilitate access to and management of networked information, NPAC is introducing relational database systems coupled with web servers to the CEWES MSRC PET program. Initial applications include the management of large websites and the development of search engines focused on particular knowledge domains. In the latter case, a search engine focusing on grid generation (a technology cutting across several CTA areas) was developed as a prototype (see Section VII), and plans are in place for another search engine focused on Climate/Weather/Ocean (CWO) resources.

-----------------------------------------------------------------------------

Cross-CTAs: Gridding Workshop

As a part of the CEWES MSRC PET Year 2 effort, an evaluation of currently available grid codes (COTS, freeware, and research codes) was conducted at CEWES MSRC, and a workshop on the utility of grid generation systems for MSRC users was held at the University of Texas in Austin in February 1998. This grid workshop was targeted specifically at the five "grid-related" CTAs: CFD, CSM, CWO, EQM, and CEA.

A total of 42 attendees participated in this grid workshop:

   8 from CEWES MSRC
   2 from ASC MSRC
   1 from NAVO MSRC
   2 from NRL
   1 from LLNL
   2 from Sandia
  17 from CEWES MSRC PET
   1 from ASC/ARL MSRC PET
   1 from ARL MSRC PET
   1 from NAVO MSRC PET
   6 from Texas, but not PET

The purpose of this grid workshop was discovery and strategy:

(a) Identify the needs of CTA users that are not being met with currently available grid (mesh) generation systems.

(b) Formulate strategy to work with grid code developers and/or vendors to meet those needs.

The mode of this workshop was evaluation and focused discussion:

(a) At CEWES MSRC, evaluate all currently available grid generation systems of potential interest to CTA users and report the results at the WS.

(b) Report at the WS the capabilities of currently available geometry interfaces to grid generation systems.

(c) Report at the WS the capabilities of currently available domain decomposition and other parallel considerations for grid generation.

(d) Hear from the CTA users at the WS the grid-related needs that are not being met with grid generation systems now in use.

(e) Through focused and directed discussion at the WS, formulate strategy to meet the grid-related needs identified.

The development of a new grid generation package from scratch was specifically *not* a purpose of this workshop.

This grid workshop also served to broaden the awareness of the availability of grid generation resources in the CEWES MSRC user community (see Section VII).

This grid workshop focused on four specific gridding issues:

(a) CAD and other input interfaces.

(b) Adaptation driven by solution systems.

(c) Coupling among grid systems and with solution systems.

(d) Scalable parallel concerns, including decomposition.

The intent of this workshop was to compare the currently available grid generation capabilities with the identified needs of CEWES MSRC users in the "grid-related" CTAs, and then to formulate a strategy to advance grid generation capability to meet those needs. This strategy - a Year 3 effort - may include interactions with commercial and/or research grid code developers to add features to existing grid codes, or the development of add-ons, wrappers, translators, etc. for attachment to existing grid codes.

This evaluation and strategy must first be confirmed through extended interactions with the CTA Leaders and the CEWES MSRC users during Year 3 of the CEWES MSRC PET effort. Execution of the strategy for enhancing grid generation capability in the MSRCs must then proceed during Years 3 through 5 of the PET effort. This effort should clearly involve all four MSRCs, with at least coordination - and possible collaboration - with the DoE ASCI effort. Clear stages of delivery, user training, and user evaluation of enhancing components must be delineated as part of the strategy that emerges.

-------------------------------------------------------------------------------