YEAR 2 ACCOMPLISHMENTS

The major accomplishments of the CEWES MSRC PET effort in enhancing the programming environment at the CEWES MSRC are described in this section. The grouping is according to CTAs and technical infrastructure support areas, but there is much overlap in effort. Finally, the cross-CTA Grid Workshop conducted as part of the CEWES MSRC PET effort in Year 2 is described. More detail on Year 2 effort is given in the specific Focused Efforts in the Appendix.

Training during Year 2 is described specifically in a later section, as is specific outreach to CEWES MSRC users.

Specific CEWES MSRC codes impacted by the PET effort during Year 2 are listed in Table __, and items of technology transfer into the CEWES MSRC are listed in Table __.

------------------------------------------------------------------------------

CFD: Computational Fluid Dynamics CTA

During Year 2, the CFD team logged over 45 days of in-person user consulting and collaboration through visits to user sites, reciprocal visits by users, participation in CEWES MSRC technical meetings, and participation in relevant national meetings. Numerous phone and email contacts were made with CEWES MSRC users. Specific technical contributions were also made, including:

(a) collation of parallel benchmarks,

(b) local support of CHSSI CFD codes,

(c) code migration assistance, and

(d) development of appropriate parallel tools.

Delivery of tools and technology of immediate application is only one component of the CEWES MSRC PET program; we also have a responsibility to help our DoD customers investigate less mature technologies that have significant potential for improving DoD CFD simulation capability and applicability. During Year 2 the CFD team provided individual training and collaborated with DoD personnel to evaluate and demonstrate computational design by coupling CFD simulation capability, direct differentiation techniques, and nonlinear optimization.

Description of effort on specific CEWES MSRC user codes follows:

1. HIVEL2D

A shallow-water solver (HIVEL2D, from CEWES CHL) was selected as the CFD code used to demonstrate this technology. HIVEL2D was chosen because of general user interest within the DoD civil works user base and because of the technology the code embodies: it is a true finite element solver developed primarily for simulation of supercritical flows. This test bed consequently includes hydraulic jumps, which are analogous to the shock waves of interest in compressible flows. Accomplishments included successful coupling and demonstration of direct differentiation calculation of design-space gradients from the HIVEL2D solver. To our knowledge, this is the first published example of coupling direct differentiation concepts with a true finite element solver such as HIVEL2D. The technique was demonstrated for open-channel configurations that included viscous, supercritical flow simulation. This is significant because of the complexities introduced into the direct differentiation method by the poor differentiability of fluid properties across hydraulic jumps and by CFD turbulence models. Simple design optimization examples were generated using both gradient-based and genetic optimization algorithms. Technical summaries of this work were provided in a CEWES MSRC preprint, a conference presentation, and a CEWES MSRC seminar. Two abstracts were accepted for presentation in 1998.
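
The direct differentiation idea can be illustrated on a small discrete system. The sketch below (Python with NumPy) is a hypothetical illustration, not the HIVEL2D implementation: it solves a nonlinear residual system R(u, d) = 0 for the state u, then obtains the design-space gradient dJ/dd of an objective J(u) by solving the linear sensitivity system (dR/du)(du/dd) = -dR/dd rather than by finite differencing.

    import numpy as np

    # Hypothetical 2-state "flow" residual R(u, d) = 0 with one design variable d.
    # This is NOT HIVEL2D; it only illustrates the direct differentiation idea.
    def residual(u, d):
        return np.array([u[0]**2 + d * u[1] - 4.0,
                         u[0] + u[1]**2 - d])

    def dR_du(u, d):           # Jacobian of R with respect to the state u
        return np.array([[2.0 * u[0], d],
                         [1.0,        2.0 * u[1]]])

    def dR_dd(u, d):           # partial of R with respect to the design variable d
        return np.array([u[1], -1.0])

    def dJ_du(u):              # gradient of an example objective J(u) = u0^2 + u1^2/2
        return np.array([2.0 * u[0], u[1]])

    # 1. Solve the nonlinear state equations by Newton iteration.
    d = 2.0
    u = np.array([1.0, 1.0])
    for _ in range(50):
        u -= np.linalg.solve(dR_du(u, d), residual(u, d))

    # 2. Direct differentiation: solve (dR/du)(du/dd) = -dR/dd for the state
    #    sensitivities, then chain-rule to the design gradient.
    du_dd = np.linalg.solve(dR_du(u, d), -dR_dd(u, d))
    dJ_dd = dJ_du(u) @ du_dd   # J here has no explicit dependence on d

    print("state u          :", u)
    print("sensitivity du/dd:", du_dd)
    print("gradient  dJ/dd  :", dJ_dd)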

2. CH3D

During Year 2, the CFD team continued to collaborate with CEWES Coastal and Hydraulics Lab personnel on parallelization of CH3D using MPI for maximum portability. This collaboration has been successful in that we have been able to (1) provide specific HPC support and training to a major MSRC user, (2) develop CH3D expertise that enables its use as an HPC test and evaluation platform, and (3) use CH3D as a mechanism to promote collaboration with the CWO and EQM CTAs. The CH3D solver is a three-dimensional hydraulic simulation code that has been used to model various coastal and estuarine phenomena. It is a complex legacy code of more than 18,000 lines. The code can handle highly irregular geometric domains involving coastlines and rivers and has been widely used by the Army Corps of Engineers both for hydrodynamics simulations and as a foundation for many environmental quality modeling and sediment transport models. As such, reduction of the execution time of CH3D is critical to various DoD hydrodynamics simulation and environmental quality modeling projects.

The initial MPI parallelization of CH3D was completed in 1997. Numerical experiments verify that the velocity, water surface elevation, salinity, and temperature distributions from the parallel code are identical to those from the sequential code (to 16 digits, using free-format output in Fortran). Almost every subroutine in the original code was modified, and more than two thousand lines were added to the code for parallelization. The execution time has been reduced significantly on multiple processors: for example, a simulation that would have taken five days on a single processor can now be completed in one day using 7 processors. For historical reasons, there is a large number of legacy production codes written in Fortran for vector computers, and the data structures, controlling logic, and numerical algorithms in these codes are in general not suitable for parallel computers. The parallelization work on CH3D not only reduces the execution time of this code, but also provides valuable experience in porting other complex application codes with similar data structures and numerical schemes from vector computers to parallel computers.
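
The kind of message passing added to CH3D can be illustrated by a minimal one-dimensional domain-decomposition sketch using mpi4py. This is a hypothetical example, not CH3D code: each processor owns a slab of the grid plus one layer of ghost cells, and ghost values are refreshed from the neighboring processors before each sweep.

    # Minimal 1-D domain decomposition with ghost-cell exchange (mpi4py).
    # Illustrative only; CH3D's actual decomposition and data structures differ.
    # Run with, e.g.:  mpiexec -n 4 python halo_demo.py
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    n_local = 100                       # interior cells owned by this rank
    u = np.zeros(n_local + 2)           # +2 ghost cells (one on each side)
    u[1:-1] = rank                      # dummy initial data

    left  = rank - 1 if rank > 0 else MPI.PROC_NULL
    right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    for step in range(10):
        # Exchange ghost cells with neighbors (Sendrecv avoids deadlock).
        comm.Sendrecv(sendbuf=u[1:2],   dest=left,  recvbuf=u[-1:], source=right)
        comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
        # Simple explicit smoothing sweep over interior cells.
        u[1:-1] = 0.5 * u[1:-1] + 0.25 * (u[:-2] + u[2:])

    print(f"rank {rank}: mean interior value = {u[1:-1].mean():.4f}")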

Year 2 efforts helped to improve the parallel efficiency by reducing memory usage, minimizing interprocessor communication, and maintaining good load balance. Collaboration with CWO and EQM was initiated to integrate hydraulic, sediment transport, and wave models for large-scale simulations. As a long-term goal, it is also planned that different numerical integration algorithms will be tested and compared with the ADI-type scheme used in CH3D, and that a better scheme will be developed and implemented in CH3D to further improve its efficiency on parallel computers.

3. CHSSI Codes

Immediate HPC assistance was provided to CEWES MSRC users by direct collaboration on targeted codes. Efforts are continuing to collaborate with developers of the CHSSI CFD codes to ensure that these codes are available to users at the CEWES MSRC. Contacts were made with each of the CHSSI teams with varying levels of interaction established. We expect to continue and expand this effort in Year 3. A particularly successful collaboration has been established with the CHSSI OVERFLOW development team.

---------------------------------------------------------------------------

CSM: Computational Structural Mechanics CTA

Year 2 efforts in support of the CSM CTA at CEWES MSRC were divided between on-site support of CEWES MSRC users with the CTH and Dyna3D codes applied to the simulation of damaged structures and visualization support of these same users. This visualization support of CSM is discussed below as the second item in the Scientific Visualization section.

-------------------------------------------------------------------------------

CWO: Climate/Weather/Ocean Modeling CTA

Year 2 effort in support of CWO at CEWES MSRC concentrated on the physics and coupling of circulation, wave, and sediment codes of interest to CEWES MSRC users (Dr Bob Jensen and Dr Billy Johnson) at CEWES for coastal region dynamics, with extension of the model physics as required. The implementation and computational testing of the coupled codes were carried out using Lake Michigan as the modeling domain. This selection simplifies the specification of accurate boundary conditions, and the lake is also the site of the NOAA EEGLE project, an ongoing extensive data collection program.

Three codes - CH3D (marine circulation model), WAM (wave model) and CH3D-SED (bottom boundary layer model) - have been modified and deployed for the lake. Both one-way and two-way couplings have been made between WAM and CH3D. The dynamical effects of wave-current interactions at the surface are reported for demonstration examples. Wave-current coupling in the benthic boundary layer will be included in the follow-on Year 3 of this project. The following table gives a brief summary of the tasks carried out during Year 2:

Task                                           Completion Date
----                                           ---------------
Deploy WAM-s in Lake Michigan                  10/97
Deploy CH3D-s in Lake Michigan                 10/97
Parallel WAM (WAM-p)                           2/98
Deploy CH3D-p in Lake Michigan                 3/98
Modify CH3D-s to include heat transport        11/98
Deploy, modify CH3D-SED-s for Lake Michigan    12/97
Modify WAM-s to include unsteady current       1/98
Modify CH3D-s to include wave's effects        1/98
One-way coupling of WAM-s with CH3D-s          1/98
Two-way coupling of WAM-s with CH3D-s          3/98

Here WAM-s refers to the sequential version of WAM, CH3D-p refers to the parallel version of CH3D, etc.

-----------------------------------------------------------------------------

EQM: Environmental Quality Modeling CTA

Year 2 efforts in support of the EQM CTA were directed principally at parallelization of CE-QUAL-ICM, web-based launching of ParSSim, and parallelization of ADCIRC:

1. Parallelization of CE-QUAL-ICM:

Our primary effort in Year 2 was the parallelization of CE-QUAL-ICM. This project involved a close collaboration between the UT (University of Texas) team and CEWES MSRC users, in particular Carl Cerco, Barry Bunch and Mark Noel at CEWES. After discussions between the two groups, we decided to pursue a single program, multiple data (SPMD) approach, which would involve

(a) development of a preprocessor code which decomposes the computational domain, using a mesh partitioning algorithm, and correspondingly decomposes all global input files,

(b) modification of the ICM source code to incorporate MPI calls that pass information from one processor to another, and

(c) development of a postprocessor code which assembles the local output generated by each processor into global output files.

This approach to parallelization, which the UT team has used successfully on other projects, allows for scalability of the parallel computation and hides many of the details of the parallelization from the everyday user. Scripts have been developed which the user can execute to run the preprocessor, compile and run the ICM code in parallel, and then run the postprocessor. Thus, from the user's perspective, the input and output files are the same on both serial and parallel machines.

The preprocessor and postprocessor codes were developed by the UT team. The preprocessor uses a space-filling curve algorithm to partition the mesh into sub-meshes for each processor. This algorithm is easy to implement and preserves locality of the mesh, so that subdomains with a good ``surface-to-volume'' ratio can be obtained.
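
A space-filling curve partitioner of this kind can be sketched in a few lines. The example below (Python) is a hypothetical illustration, not the UT preprocessor itself: it orders cell centroids along a Morton (Z-order) curve and then cuts the ordered list into equal-sized contiguous chunks, one per processor, which tends to keep each subdomain spatially compact.

    import numpy as np

    def morton_key(ix, iy, bits=16):
        """Interleave the bits of integer coordinates (ix, iy) -> Z-order key."""
        key = 0
        for b in range(bits):
            key |= ((ix >> b) & 1) << (2 * b)
            key |= ((iy >> b) & 1) << (2 * b + 1)
        return key

    def partition_cells(centroids, nproc, bits=16):
        """Assign each cell to a processor by cutting the Z-order curve into chunks."""
        xy = np.asarray(centroids, dtype=float)
        lo, hi = xy.min(axis=0), xy.max(axis=0)
        # Scale centroids onto an integer grid so their bits can be interleaved.
        scaled = ((xy - lo) / np.where(hi > lo, hi - lo, 1.0) * (2**bits - 1)).astype(int)
        keys = np.array([morton_key(ix, iy, bits) for ix, iy in scaled])
        order = np.argsort(keys)                  # cells sorted along the curve
        part = np.empty(len(xy), dtype=int)
        for p, chunk in enumerate(np.array_split(order, nproc)):
            part[chunk] = p                       # contiguous chunk -> processor p
        return part

    # Example: 1000 random cell centroids split among 8 processors.
    rng = np.random.default_rng(0)
    cells = rng.random((1000, 2))
    owner = partition_cells(cells, nproc=8)
    print("cells per processor:", np.bincount(owner))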

The parallel version of CE-QUAL-ICM was developed in stages. We were given source code in early 1997 and developed a preliminary parallel code based on this version, completed in late summer of 1997. Simultaneously, the CEWES group was adding new features to the code (more components, new input files, etc.), and these had to be incorporated into the parallel framework, which took an additional 3-4 months. Testing of this new code began in mid-December and proceeded through February.

As of late March 1998, the parallel code is being used in production mode by the CEWES team. A ten-year Chesapeake Bay simulation using 32 processors on the T3E was completed. Model calibration will be completed by mid-April, and twenty-year scenarios will then commence. It was estimated that such scenarios would have been infeasible on a single processor, requiring over two weeks of CPU time. Therefore, we feel, and CEWES personnel agree, that this project has been a huge success.

2. Web-based Launching of ParSSim:

The UT team developed a web-based code ``launching'' capability and demonstrated it on a UT parallel groundwater code, ParSSim. This simulator can handle multicomponent and multiphase flow and transport involving one fluid phase and an arbitrary number of mineral phases. It is scalable, has been fully tested, and is being employed on a collection of realistic applications.

This launching capability allows a remote user to log in to a website, choose from a suite of datasets, submit a remote job on a parallel machine, and obtain graphical output of the results. A Perl script was written to drive the launching. Two groundwater remediation data sets were constructed, which are representative of typical data sets in such problems. The ParSSim code was then executed on an IBM SP2 located at UT, on four processors. Graphical output was obtained using Tecplot and returned to the user's machine.
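
The launching workflow (choose a prepared dataset, submit a parallel job, return the graphics) can be sketched as a small CGI-style script. The example below is a hypothetical Python illustration of the pattern; the actual driver was a Perl script, and the dataset names, executables, and paths shown here are made up.

    #!/usr/bin/env python
    # Hypothetical sketch of a web-based "launching" script: the dataset names,
    # paths, and commands are illustrative, not the actual UT setup.
    import cgi, subprocess, sys

    DATASETS = {                      # prepared groundwater remediation inputs
        "case1": "/data/parssim/case1.in",
        "case2": "/data/parssim/case2.in",
    }

    def launch(dataset, nprocs=4):
        infile = DATASETS[dataset]
        # Submit the parallel job (here simply via mpiexec; a batch system
        # could be substituted).
        subprocess.run(["mpiexec", "-n", str(nprocs), "parssim", infile], check=True)
        # Post-process the results into an image the browser can display
        # ("render_results" is a placeholder for the plotting step).
        subprocess.run(["render_results", infile, "/www/results/%s.gif" % dataset],
                       check=True)
        return "/results/%s.gif" % dataset

    if __name__ == "__main__":
        form = cgi.FieldStorage()
        choice = form.getfirst("dataset", "case1")
        if choice not in DATASETS:
            choice = "case1"
        url = launch(choice)
        sys.stdout.write("Content-Type: text/html\r\n\r\n")
        sys.stdout.write('<html><body><img src="%s"></body></html>\n' % url)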

A demonstration of this capability was given during the CEWES MSRC PET annual review in February. This effort is meant to serve as a prototype of web-based launching, which could be incorporated into the Groundwater Modeling System, for example, at CEWES.

3. Parallelization of ADCIRC:

The UT team began an effort on the parallelization of ADCIRC, a shallow water circulation model used by several CEWES MSRC users (Norm Scheffner, Rao Vemulokanda, and others at CEWES). The approach being taken here is very similar to that used for CE-QUAL-ICM. In particular, pre- and postprocessors are being developed, and MPI calls are being added to the existing serial code. This development is being carried out in close collaboration with the authors of ADCIRC, Rick Luettich and Joannes Westerink, who also collaborate with and are funded by CEWES. Our goal is to incorporate the parallelism in such a way that all future versions of ADCIRC will have a parallel capability.

The accomplishments to date include the development of a preprocessor and a parallel version of ADCIRC. These codes are being tested on a number of data sets which exercise various aspects of the simulator. The preprocessor and parallel codes execute correctly for a number of the simpler data sets. Some of the more difficult cases involve large wind stress data sets and wetting and drying. The preprocessing of wind stress data and the parallelization of the wetting and drying code are proceeding, but substantial debugging and testing remains to be done.

------------------------------------------------------------------------------

FMS: Forces Modeling and Simulation/C4 CTA

Year 2 effort in support of the FMS CTA consisted of on-site support of a major battle simulation program (SF Express) and focused effort on run-time infrastructure at NPAC at Syracuse.

1. SF Express Demos

During Year 2, the CEWES MSRC IBM SP played a major role in two record setting military battle simulations. This was accomplished with the assistance of CEWES MSRC PET and the NRC infrastructure staff:

The Synthetic Forces (SF) Express application is based on the Modular Semi-Automated Forces (ModSAF) simulation engine with a scalable communications architecture running on SPPs from multiple vendors. The SF Express project is funded under the DARPA Synthetic Theater of War (STOW) program, and is supported by development teams at JPL, Caltech, and NRaD.

A record simulation of 66,239 entities, including fixed wing aircraft, rotary wing aircraft, fighting vehicles, and infantry, was conducted on November 20, 1997 at the DoD HPCMP Booth at the San Jose McEnery Convention Center as part of the DoD contribution to the ninth annual SC97 Conference. The simulation was executed by the JPL team led by Dr. David Curkendall. The computers used for this simulation were the 256-processor IBM SP at CEWES MSRC, the 256-processor IBM SP at ASC, and two 64-processor SGI Origin2000s at ASC, all interconnected over the DREN.

Another record was set at the Technology Area Review and Assessment (TARA) briefings at NRaD in San Diego on March 20, 1998. Again, the CEWES MSRC SP was a major player. This simulation was conducted by the Caltech team led by Dr. Sharon Brunett. A total of 13 computers from 9 different sites were used to host the 100,298-vehicle entity-level simulation, using a total of 1,386 processors. The simulation made use of software developed in the Globus project, a research effort funded by DARPA, DoE, and NSF to investigate and develop software components for next-generation high-performance internet computing. A list of the sites, numbers of processors, and vehicles simulated appears in the following table:

Site          Computer   Processors   Vehicles
----          --------   ----------   --------
ASC MSRC      SP            130        10,818
ARL MSRC      SGI            60         4,333
              SGI            60         3,347
Caltech       HP            240        21,951
CEWES MSRC    SP            232        17,049
HP            HP            128         8,599
MHPCC         SP            148         9,485
              SP            100         6,796
NAVO MSRC     SGI            60         4,238
NCSA          SGI           128         6,693
UCSD          SP            100         6,989
----          --------   ----------   --------
Totals                     1,386      100,298

2. Object Web RTI Prototype

DMSO recently introduced a new integration framework for advanced simulation networking called the High Level Architecture (HLA), based on the Run-Time Infrastructure (RTI) software bus model. The RTI enables federations of real-time/time-stepped and logical-time/event-driven simulations (federates), and it optimizes communication via event filtering and publisher/subscriber region/interest matching, supported by the Data Distribution Management (DDM) service.
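
The publisher/subscriber region matching performed by the DDM service can be illustrated with a toy example. The sketch below (Python) is purely conceptual and uses none of the DMSO RTI API: it represents publication and subscription regions as axis-aligned extents in a routing space and routes an update only to those subscribers whose regions overlap the publisher's.

    from dataclasses import dataclass

    @dataclass
    class Region:
        """Axis-aligned extent in a routing space, e.g. (x, y) bounds."""
        lo: tuple
        hi: tuple

        def overlaps(self, other):
            return all(a_lo <= b_hi and b_lo <= a_hi
                       for a_lo, a_hi, b_lo, b_hi
                       in zip(self.lo, self.hi, other.lo, other.hi))

    # Toy federation: subscribers declare the regions they are interested in.
    subscriptions = {
        "federate_A": Region(lo=(0, 0),   hi=(50, 50)),
        "federate_B": Region(lo=(40, 40), hi=(100, 100)),
        "federate_C": Region(lo=(200, 0), hi=(250, 50)),
    }

    def route_update(update_region):
        """Deliver an attribute update only to federates whose subscription
        region overlaps the publisher's update region (DDM-style filtering)."""
        return [name for name, sub in subscriptions.items()
                if sub.overlaps(update_region)]

    # A publisher updates an entity located in the region (45,45)-(46,46):
    receivers = route_update(Region(lo=(45, 45), hi=(46, 46)))
    print("update delivered to:", receivers)   # federate_A and federate_B only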

Full and rapid DoD-wide transition to the HLA is strongly advocated by DMSO and facilitated by open public specifications of all HLA components, extensive nation-wide tutorial programs and prototype RTI implementations.

Given the systematic shift of DoD training, testing, and wargaming activities from physical to synthetic environments, and the ever-increasing computational demands imposed on advanced modeling and simulation systems, high performance distributed computing support for HLA will likely play a crucial role in the DoD Modernization Program.

At NPAC, we are currently developing a Java-based Web Object Request Broker (WORB) server that will support the HTTP and IIOP protocols and will act as a universal node of our HPcc (High Performance commodity computing) environment. Given that the RTI object bus model is strongly influenced by CORBA, and that DMSO is in fact interacting with the OMG toward proposing HLA as a CORBA simulation facility/framework, an early Java/CORBA-based RTI prototype seems a natural effort in the domain of interactive HLA training. Our Object Web/WORB-based RTI subset would support and integrate Web DIS (Java- and VRML-based) applications under development at the Naval Postgraduate School in Monterey, CA, as well as more traditional and substantial simulation codes such as ModSAF and perhaps also SPEEDES, TEMPO, or IMPORT, currently at the planning stage as possible FMS training targets for our PET activities at ARL.

By the end of this project, we will deliver a prototype Object Web (CORBA) based RTI kernel (subset), capable of running a simple demonstration application to be developed locally. This would serve as a demonstration of the integration of DMSO and web technologies and provide a freely available tool. A follow-on project could further develop the system into a full RTI implementation, at which point it would be possible to run real RTI applications as a demonstration of this tool.

Aspects of this work have been presented at or submitted to a variety of conferences, including the Workshop on Object Orientation and VRML at the Virtual Reality Modeling Language Symposium 1998 (VRML98), the Seventh International Symposium on High Performance Distributed Computing (HPDC7), the Workshop on Web-based Infrastructures for Collaborative Enterprises (WETICE'98), and the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing (Middleware'98).

------------------------------------------------------------------------------

SV: Scientific Visualization

Scientific Visualization training efforts during Year 2 included development and delivery of a class in using the new visualization package VTK (the Visualization ToolKit). This 2-day class was delivered on-site at the CEWES MSRC. Future work will include building a Web-based tutorial for this package. We also directed effort toward furthering the skills of two individuals. John West, a CEWES MSRC employee currently on long-term training leave, spent the summer with us at NCSA. Mr West worked as a member of the NCSA Visualization team on a prototype system for delivering visualization capability to users who are remote from the CEWES MSRC. Ms Milti Leonard of Jackson State University, a member of the CEWES MSRC PET team, worked with us at NCSA from September through December building her skills in C++ programming, HDF, and VTK. We will continue to work with Ms Leonard, at least through the life of the PET program. She will concentrate on techniques for supporting visualization among remote users.

Specific effort on Scientific Visualization tools for the CEWES MSRC during Year 2 follows:

1. VisGen

User-directed, technology-transfer efforts in Year 2 included delivering two end-user tools. The VisGen tool (currently in an alpha release) is in use by Dr Carl Cerco and his team at CEWES in analyzing data from their simulations of phenomena in the Chesapeake Bay. In this work, Dr Cerco's team models 10- and 20-year time periods, simulating the concentration and transport of over 20 components (chlorophyll, dissolved oxygen, nitrogen, etc.). Data of this complexity could not be analyzed without visual techniques. The VisGen tool was designed and delivered to assist Dr Cerco's team in analyzing these data. It also allows the team to capture the visualization in a web-based medium, allowing Dr Cerco to share his results with his colleagues and project managers at the Environmental Protection Agency.

2. Structures Visualization

The second end-user tool is an application for viewing output from the structures codes CTH and Dyna3D. This was in support of the DoD Challenge in large-scale shock physics and structural deformation, directed by Dr. Raju Namburu of the CEWES Structures Lab. The goals of the visualization activity were to (1) provide support for this particular Challenge application, (2) highlight the science by showing the visualization at SC97, and (3) work toward advancing the methodology for managing and interpreting data from very large-scale calculations. We followed a strategy of sub-sampling the CTH data to make it more manageable, mapping that data to geometric form, and supporting interactive exploration of the playback of that geometry. Similarly, for the Dyna3D data, we extracted the region of highest interest, decimated the geometry where possible, and supported interactive playback of the geometry. To support the researchers' needs for sharing results with their colleagues, we incorporated support for image and movie capture. This application was used by the researchers to visually validate the results of their simulations. It was also used by the researchers to show and explain their work at SC97.
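
The data-reduction pipeline described above (sub-sample the volume data, map it to geometry, then decimate the geometry for interactive playback) can be sketched with VTK's Python interface. This is a hypothetical, modern-VTK illustration of the approach rather than the application delivered to the Challenge team; the file names and field values are made up.

    # Hypothetical VTK (Python) sketch of the reduction pipeline used for the
    # structures visualizations: sub-sample, contour to geometry, decimate, save.
    import vtk

    # Assume one timestep of volumetric output is available as a VTK image file.
    reader = vtk.vtkXMLImageDataReader()
    reader.SetFileName("cth_step_0100.vti")          # made-up file name

    # 1. Sub-sample the volume to make it more manageable.
    subsample = vtk.vtkExtractVOI()
    subsample.SetInputConnection(reader.GetOutputPort())
    subsample.SetSampleRate(4, 4, 4)                 # keep every 4th point

    # 2. Map the data to geometric form (an isosurface of a material fraction).
    contour = vtk.vtkContourFilter()
    contour.SetInputConnection(subsample.GetOutputPort())
    contour.SetValue(0, 0.5)                         # made-up isovalue

    # 3. Decimate the geometry where possible for interactive playback.
    decimate = vtk.vtkDecimatePro()
    decimate.SetInputConnection(contour.GetOutputPort())
    decimate.SetTargetReduction(0.8)                 # try to remove 80% of triangles
    decimate.PreserveTopologyOn()

    # 4. Write the reduced geometry for playback in the viewer.
    writer = vtk.vtkXMLPolyDataWriter()
    writer.SetInputConnection(decimate.GetOutputPort())
    writer.SetFileName("cth_step_0100_reduced.vtp")
    writer.Write()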

3. VTK (Visualization ToolKit)

In addition to end-user tools, the CEWES MSRC PET team transferred certain technologies to the visualization specialists at the CEWES MSRC. We shared our expertise in VTK. We also made available a variety of NCSA-developed tools, including the audio development library vss, image- and movie-capture code, and a VTK-to-Performer utility for using VTK on the ImmersaDesk.

-----------------------------------------------------------------------------

SPPT: Scalable Parallel Programming Tools

The SPP Tools team has mounted a coordinated, sustained effort in Year 2 to provide DoD users with the best possible programming environment and with knowledge and skills for effective use of HPC platforms. This effort can be divided into four major thrusts:

(a) Working directly with users to understand requirements and provide direct help on their codes.

(b) Supplying essential software to meet those requirements.

(c) Training in the use of that software and in parallelism issues generally.

(d) Tracking and transferring technology with enhanced capabilities.

Year 2 effort in each of these categories is described below:

1. Working with Users

A key tactic for SPP Tools is to engage users in working on their codes. It is only through such interactions that we can identify real requirements for new tools, thus ensuring that they will be relevant to CEWES MSRC. (For purposes of this discussion, we include the Code Migration Group as "users"; in fact, many of our closest contacts with parallel programmers are through that group.) In addition to directing future SPP Tools efforts, these interactions often lead to direct collaborations on specific codes. These have the potential for double success stories: the code improvements lead the DoD user directly to better science, and the lessons learned about tools provide experience for future endeavors. The latter type of success is particularly important when the user is a member of the NRC Code Migration Group (CMG) at the CEWES MSRC, who will work directly with many users in the future.

Out of many projects with users in Year 2, we mention only two. Dr Clay Breshears collaborated with members of the CMG to develop Fortran90 bindings for Pthreads on the Power Challenge Array. In principle, the same bindings could be used on other shared-memory machines with Fortran90 compilers; however, the detailed implementation might well differ. The importance of these bindings is that they provide a means for using fine-grain parallelism in a modern Fortran compiler. Henry Gabb of the CMG is using these bindings to parallelize David Medina's MAGI code (AF Phillips Lab) on the PCA and Origin 2000. The applicability of the bindings, and of their implementation, is much broader, however, and they have been described in a paper submitted to the 1998 DoD HPC Users Group Conference.

High Performance Fortran (HPF) code development and testing has also been partially a user-driven activity. Breshears worked with Steve Bova to produce an HPF implementation of an algorithm for computing shared edges on finite-element grids. This is a very generic component of CFD codes, and its efficient translation to HPF bodes well for data parallelism as a useful tool for production codes. The translation also enabled the overall code to be migrated to a parallel machine without resorting to explicit message passing. This effort also led to a submission to the DoD HPC Users Group meeting.
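
The shared-edge computation itself is simple to state: an edge is shared when it appears in two elements, and it lies on a partition boundary when those two elements are owned by different processors. A minimal serial sketch in Python is given below; an HPF version would distribute the same element loop across processors. This is only a hypothetical illustration of the algorithm, not the Bova/Breshears code.

    from collections import defaultdict

    def shared_edges(elements, owner):
        """elements: list of triangles as node-index triples.
        owner[e]: processor owning element e.
        Returns edges that lie on a boundary between two processors."""
        edge_elems = defaultdict(list)
        for e, (a, b, c) in enumerate(elements):
            for edge in ((a, b), (b, c), (c, a)):
                edge_elems[tuple(sorted(edge))].append(e)

        boundary = []
        for edge, elems in edge_elems.items():
            if len(elems) == 2 and owner[elems[0]] != owner[elems[1]]:
                boundary.append(edge)
        return boundary

    # Small example: four triangles forming a strip, split between two processors.
    elements = [(0, 1, 2), (1, 3, 2), (2, 3, 4), (3, 5, 4)]
    owner    = [0, 0, 1, 1]
    print(shared_edges(elements, owner))   # [(2, 3)] -- the inter-processor edge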

2. Supplying Essential Software

To efficiently build codes, it is vital to have basic systems software available. The SPP Tools team has been active in the effort to ensure that essential system software, such as BLAS libraries, MPI implementations, and compilers, is installed on CEWES MSRC machines. These implementations are then tested and bugs reported, so that users can build their applications on a firm foundation of properly functioning systems and library software. In particular, a number of bugs in the Cray/SGI implementation of MPI were discovered in the process of installing and testing the BLACS, which underlie ScaLAPACK, on the Cray T3E.

Although correctly functioning software is necessary, correctness alone is not sufficient for effective use of HPC platforms by large-scale, computationally intensive applications. These applications also require a high level of performance from the underlying software, necessitating the development and use of benchmarks. Thus, over the past year Susan Blackford and Clint Whaley of UTK have carried out timings of a portion of the ScaLAPACK timing suite on MSRC platforms. In addition, Phil Mucci has been active in developing a suite of low-level benchmarks for measuring application-critical performance of key linear algebra operations, of the cache and memory hierarchy, and of the communication subsystem, and in running them on MSRC platforms. Gina Goff has performed similar work on a set of benchmarks to evaluate HPF constructs. The results of these performance evaluations are being made available in technical reports and in on-line performance data repositories and will be presented at the June 1998 DoD HPC Users Group meeting.
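
As one concrete example of the kind of low-level measurement involved, the sketch below (mpi4py) times an MPI ping-pong between two processes to estimate message latency and bandwidth of the communication subsystem. It is a hypothetical illustration, not part of the actual UTK benchmark suite.

    # Simple MPI ping-pong benchmark sketch (run on exactly 2 processes):
    #   mpiexec -n 2 python pingpong.py
    # Illustrative only; not the UTK benchmark suite itself.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    reps = 100

    for nbytes in (8, 1024, 1024 * 1024):
        buf = np.zeros(nbytes, dtype=np.uint8)
        comm.Barrier()
        t0 = MPI.Wtime()
        for _ in range(reps):
            if rank == 0:
                comm.Send(buf, dest=1)
                comm.Recv(buf, source=1)
            else:
                comm.Recv(buf, source=0)
                comm.Send(buf, dest=0)
        dt = (MPI.Wtime() - t0) / (2 * reps)     # one-way time per message
        if rank == 0:
            print(f"{nbytes:>8d} bytes: {dt*1e6:8.1f} us/msg, "
                  f"{nbytes / dt / 1e6:8.1f} MB/s")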

Bringing software to DoD sites is important, but it is also necessary to have a good facility for storing and announcing it. Browne and others at UTK are putting in place a repository infrastructure that will enable sharing of software, algorithms, technical information, and experiences within and across CTAs and MSRCs and with the broader HPC community. The repository infrastructure is based on an IEEE standard software cataloging format that is implemented in the Repository in a Box (RIB) toolkit produced by the National High-performance Software Exchange (NHSE) project. Use of a standard format and of RIB provides a uniform interface for constructing, browsing and searching virtual repositories that draw on resources selected, evaluated, and maintained by experts in autonomous discipline-oriented repositories. Interoperation mechanisms allow catalog information and software to be shared between repositories. Appropriate labeling and access control mechanisms support and enforce intellectual property rights and access restrictions where needed. The repository infrastructure will facilitate easy discovery and widespread use of tools such as those described above, as well as of application codes developed by CTA teams.

3. Training

Training in the use of tools, including both specific tools and general parallel programming techniques, is a vital part of our mission. The SPP Tools training program in Year 2 included workshops on Performance Analysis Tools, to acquaint users with software that could enhance their productivity as well as their codes; ScaLAPACK, to help users efficiently solve large dense linear systems; PETSc, to introduce users to modern template-oriented scientific libraries; and "Bring Your Own Code" (sometimes taught in conjunction with the Code Migration Group), where users have a chance to receive assistance in dealing with their individual code problems.

In the interest of brevity, we will only discuss two of the courses we offered in Year 2. A highly tuned and efficient systems and library software base is essential for good application performance. However, for the results of performance evaluation studies to be of maximal benefit to DoD application developers and users, they need to be incorporated into an application performance analysis framework that provides the skills and tools needed to model, analyze, and tune application performance. To this end, UTK researchers have taught two courses in this area during the past year at CEWES MSRC - one on benchmarking and performance modeling and one on code optimizations for MPPs. The first course shows how benchmark results can be used as a starting point and combined with application instrumentation, scalability analysis, and statistical analysis techniques to model and predict application performance on various HPC architectures. The second course covers performance optimization techniques ranging from tuning single-processor performance to tuning of communication patterns.

4. Tracking and Transferring Technology

As a comprehensive technology transfer program, the PET effort at the CEWES MSRC has a strong responsibility to evaluate and offer new technologies for DoD use. The SPP Tools team has taken this responsibility seriously, and has been aggressive in both seeking out recent research and advertising it within DoD. The tools mentioned above are all examples of tools identified by the SPP Tools team, and their availability through RIB is a fine example of how such technologies can be made available to DoD. We mention here a few other examples of technology transfer.

Because CEWES MSRC users work on multiple platforms, they need portable performance analysis tools that allow them to analyze and compare application performance on different platforms without the burden of learning a different tool for each platform. Consequently, Dr Shirley Browne led an effort at UTK to carry out an evaluation of currently available portable performance analysis tools, including both research and commercial offerings. A CEWES MSRC technical report describes the results. On the basis of this evaluation, the SPP Tools team chose a handful of tools for porting and further evaluation on CEWES MSRC platforms. VAMPIR, a commercial trace-based performance analysis tool, runs on all CEWES MSRC platforms, is highly robust and relatively scalable, and has already been used to achieve significant performance improvement on a challenge application. Nupshot, a freely available trace visualization tool, has an intuitive easy-to-use interface that can quickly provide information about application communication performance. We are currently working on a robust version of nupshot and the MPE logging library that produces tracefiles for it. Furthermore, SPP Tools is working with developers to debug and implement Fortran90 support for two other promising trace-based performance analysis tools, AIMS and SvPablo, as requested by DoD users.

In the area of introducing cutting-edge research into DoD production use, Graham Fagg and others at UTK are bringing MPI-Connect to CEWES MSRC. Although good communication performance can be achieved on a single platform by using the vendor's optimized MPI implementation, and metacomputing using MPI can be achieved using portable implementations such as MPICH, users now need the ability to couple MPI applications on multiple platforms while retaining vendor-tuned MPI implementations. To achieve this goal, the MPI-Connect system developed at UTK, which enables transparent intercommunication between different MPI implementations, is being ported to MSRC platforms and deployed to couple MPI applications in application areas such as FMS, CWO, and IMT. Without a high-level metacomputing system like this or Legion (in use at some other MSRCs), developing such multi-disciplinary applications is virtually hopeless.
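
The coupling model (two independently launched MPI applications exchanging data through an intercommunicator while each keeps its own MPI_COMM_WORLD) can be illustrated with standard MPI-2 dynamic process calls. This is only a conceptual sketch in mpi4py; MPI-Connect itself works differently, bridging separate vendor MPI implementations rather than relying on MPI-2 connect/accept.

    # Conceptual sketch of coupling two separately started MPI applications.
    # (Uses MPI-2 connect/accept; MPI-Connect achieves a similar effect between
    #  different vendor MPI implementations.)  Run as:
    #   mpiexec -n 2 python couple.py server      (prints a port name)
    #   mpiexec -n 2 python couple.py client <port-name>
    import sys
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    role = sys.argv[1]

    if role == "server":
        port = MPI.Open_port()
        if comm.Get_rank() == 0:
            print("port name:", port, flush=True)
        inter = comm.Accept(port)          # intercommunicator to the client app
    else:
        port = sys.argv[2]
        inter = comm.Connect(port)

    # Exchange a small array between rank 0 of each application.
    data = np.arange(4, dtype="d") if role == "server" else np.empty(4, dtype="d")
    if comm.Get_rank() == 0:
        if role == "server":
            inter.Send(data, dest=0)
        else:
            inter.Recv(data, source=0)
            print("client received:", data)

    inter.Disconnect()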

DoD application developers also need the best available debugging and performance tools to enable them to quickly find and fix bugs and performance problems. However, those tools that are available have not been put into wide use for various reasons, including lack of awareness by users, poor user interface, and steep learning curves for different tools on different platforms. Ideally, a portable parallel debugging interface should be available across all CEWES MSRC platforms. Shirley Browne and Chuck Koelbel participated during the past year in a standardization effort called the High Performance Debugging Forum (HPDF), which has produced a specification for a command-line parallel debugging interface that addresses important issues such as scalability and asynchronous control. They have solicited user input from DoD throughout the HPDF process and used it to help guide choices about what features to include and how the interface should appear. Vendor participation in HPDF indicates that the standard will be incorporated into commercial debuggers, and public-domain reference implementations are planned for the IBM SP and the SGI Origin 2000. Thus, HPDF promises to provide a portable, easy-to-use command-line debugger for CEWES MSRC platforms.

The HPDF experience is only one of a number of efforts by the SPP Tools team to form better connections between DoD users and the wider HPC community. Team members have attended approximately one meeting per month, publishing trip reports that have been widely circulated within DoD. Many CEWES MSRC users have commented on the value of these reports in keeping them abreast of the computational science field, even when the conference is not in their direct line of interest.

-------------------------------------------------------------------------------

C/C: Collaboration/Communication

During Year 2, the Collaboration/Communication technical infrastructure at the CEWES MSRC was supported by CEWES MSRC PET effort at both NCSA at Illinois and NPAC at Syracuse.

1. NCSA (Illinois) Efforts

The primary objectives of the C/C effort at NCSA are to promote better dissemination of information to CEWES MSRC PET users and to provide team collaboration tools for PET management and researchers. This was accomplished in Year 2 through

(a) refinement of the PET website framework and modification of the Collab/Communication webpages,

(b) deployment of an initial collaboration environment, netWorkPlace, enabling asynchronous postings of status reports, meetings and discussions between team members, and

(c) building a sense of community among PET webmasters to facilitate technology transfer across the four MSRC sites.

The activities in (a) and (c) are described briefly in the following sections; the netWorkPlace focused effort in (b) is discussed in the Appendix to this report. The CEWES MSRC PET website serves as a mechanism to provide timely information to users on PET program activities. NCSA developed the framework for implementing pages on the CEWES MSRC PET website in a consistent and uniform manner. This framework was developed to ensure ease of navigation throughout the site regardless of the web browser used, as well as accessibility to visually impaired users. The framework included HTML structures for presenting frame and no-frame views of the documents as appropriate for the end-user browser software and provided recommendations for universal access considerations. Additional website support was provided through discussion with the CEWES MSRC PET webmaster and a report. The report described website log analysis tools and how they could be used to better administer the site and to provide information to web content providers on the effectiveness of their pages.

NCSA provides website support to three of the MSRC PET sites: CEWES, ASC, and ARL. Because of the similarity of the support provided to each site, it is beneficial to leverage these activities across the MSRCs. The mechanism chosen to support this leveraging was to build a sense of community among the MSRC PET webmasters to facilitate communication and sharing across sites. NCSA held an MSRC PET webmasters meeting in February 1998 to begin fostering this sense of community. A related effort involves the selection of state-of-the-art asynchronous training tools for use in developing multimedia short courses, and assistance in developing courses using the recommended tools. This assistance will include development of course materials to demonstrate capability, training on how to use the tools, and guidelines on course development using web-based training tools.

2. NPAC (Syracuse) Efforts

The growth of web technologies offers some special opportunities to facilitate the work of the CEWES MSRC PET team and DoD researchers. Researchers at Syracuse University's Northeast Parallel Architectures Center (NPAC) have long been on the cutting edge of using web technology in support of high-performance computing. Tango, developed previously with support from the DoD's Rome Laboratory and SU's L.C. Smith College of Engineering and Computer Science, is a Java-based collaboratory tool that offers chat, whiteboard, and shared web browser capabilities, as well as two-way audio and video conferencing. Besides deploying Tango for a CEWES MSRC PET-supported distance education project with Jackson State University, NPAC researchers have also expanded the capabilities of Tango to support consulting and software development activities in geographically separated groups, with the addition of a shared tool to view and modify source code as well as to debug and analyze performance.

Even more apparent than their role in facilitating collaboration, advances in web and Internet technologies have enormously increased the amount of information on a huge range of subjects that can be found on the network. This can be an important resource for DoD researchers, but only if it is possible to locate the desired information in the first place. To facilitate access to and management of networked information, NPAC is introducing relational database systems coupled with web servers to the CEWES MSRC PET program. Initial applications include the management of large websites and the development of search engines focused on particular knowledge domains. In the latter case, a search engine focusing on grid generation (a technology cutting across several CTA areas) was developed as a prototype, and plans are in place for another search engine focused on Climate/Weather/Ocean (CWO) resources.
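
A minimal version of such a database-backed, domain-focused search engine can be sketched with SQLite's full-text indexing. The example below is purely illustrative (Python with the standard sqlite3 module and its FTS5 extension, where available); it is not the NPAC grid generation search engine, and the sample records are invented.

    # Hypothetical sketch of a domain-focused search engine backed by a
    # relational database (SQLite FTS5); not the NPAC implementation.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE pages USING fts5(url, title, body)")

    # In a real system these records would come from a web crawl restricted to
    # the knowledge domain (here, grid generation resources).
    sample_pages = [
        ("http://example.org/gridgen/intro", "Structured grid generation",
         "Elliptic and algebraic methods for structured grid generation."),
        ("http://example.org/gridgen/unstruct", "Unstructured meshing",
         "Delaunay and advancing-front methods for unstructured meshes."),
        ("http://example.org/cfd/solvers", "CFD solvers",
         "Finite volume solvers for compressible flow."),
    ]
    conn.executemany("INSERT INTO pages VALUES (?, ?, ?)", sample_pages)

    def search(query, limit=10):
        """Return (url, title) pairs ranked by FTS5 relevance."""
        cur = conn.execute(
            "SELECT url, title FROM pages WHERE pages MATCH ? ORDER BY rank LIMIT ?",
            (query, limit))
        return cur.fetchall()

    print(search("grid generation"))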

-----------------------------------------------------------------------------

Cross-CTA: Gridding Workshop

As a part of the CEWES MSRC PET Year 2 effort, an evaluation of currently available grid codes (COTS, freeware, and research codes) was conducted at CEWES MSRC, and a workshop on the utility of grid generation systems for MSRC users was held at the University of Texas at Austin on 11-12 February 1998. This workshop was targeted specifically at the five "grid-related" CTAs (CFD, CSM, CWO, EQM, CEA).

A total of 42 attendees participated in this grid workshop:

8 from CEWES MSRC
2 from ASC MSRC
1 from NAVO MSRC
2 from NRL
1 from LLNL
2 from Sandia
17 from CEWES MSRC PET
1 from ASC/ARL MSRC PET
1 from ARL PET
1 from NAVO MSRC PET
6 from Texas, but not PET

The purpose of this workshop was discovery and strategy:

(a) Identify the needs of CTA users that are not being met by currently available grid (mesh) generation systems, and

(b) Formulate a strategy to work with grid code developers and/or vendors to meet those needs.

The mode of this workshop was evaluation and focused discussion:

(a) At CEWES MSRC, evaluate all currently available grid generation systems of potential interest to CTA users and report the results at the WS,

(b) Report at the WS the capabilities of currently available geometry interfaces to grid generation systems,

(c) Report at the WS the capabilities of currently available domain decomposition and other parallel considerations for grid generation,

(d) Hear from the CTA users at the WS the grid-related needs that are not being met with grid generation systems now in use,

(e) Through focused and directed discussion at the WS, formulate strategy to meet the grid-related needs identified.

The development of a new grid generation package from scratch was specifically *not* a purpose of this workshop.

The workshop also served to broaden awareness of the availability of grid generation resources in the CEWES MSRC user community.

This workshop focused on four specific gridding issues:

* CAD and other input interfaces,

* Adaptation driven by solution systems,

* Coupling among grid systems and with solution systems, and

* Scalable parallel concerns, including decomposition.

The intent of this workshop was to compare currently available grid generation capabilities with the identified needs of CEWES MSRC users in the "grid-related" CTAs, and then to formulate a strategy to advance grid generation capability to meet those needs. This strategy may include interactions with commercial and/or research grid code developers to add features to existing grid codes, or the development of add-ons, wrappers, translators, etc., for attachment to existing grid codes. The strategy may be executable within PET resources or may require additional support. In any case, the outcome of the evaluation and workshop will be the formulation of a solution path to meet the grid generation needs of the CEWES MSRC users.

This evaluation and strategy must first be confirmed through extended interactions with the CTA Leaders and the CEWES MSRC users during Year 3 of the CEWES MSRC PET effort.

Execution of the strategy for enhancing grid generation capability in the MSRCs must then proceed during Years 3 through 5 of the PET effort. This effort should clearly involve all four MSRCs, with at least coordination - and possibly collaboration - with the DoE ASCI effort.

Clear stages of delivery, user training, and user evaluation of the enhancing components must be delineated as a part of the strategy that emerges.

Effort now under consideration for Year 3 of the CEWES MSRC PET includes:

* Build interfaces/wrappers. Here the PET team would build GUIs that would allow users to access various grid generation systems from a common graphical interface. Also included would be translators for data formats between grid systems, and between grid and solution systems, and perhaps the assignment of solution boundary and material conditions.

* Assemble viable existing tools for interfacing with CAD systems and for repairing and filtering CAD files for input into grid generators. Perhaps make needed extensions or additions, working with vendors or developers.

* Set up a web-based central information source on grids that is easily accessible to MSRC users, both to get information and to provide feedback on grid system usage and effectiveness. Also included would be a grid generation system repository providing access to codes and to training thereon.

* Form a High Performance Grid Generation Forum (HPGGF) of users and grid generation people in DoD, DoE, NASA, and the supporting universities. Use this forum to define needed grid generation objects and operations, and to standardize characteristics of these objects and interfaces among these operations. The idea here would be to define and standardize the needed elements, and then to locate, or encourage the development of, tools to implement such.

* Provide support for the operation of grid generation systems on scalable computing systems in order to enable larger numbers of grid points. One possible approach here might be to use the Origin 2000 as the grid generation engine, then translate and assemble the grids generated thereon for use in solution systems on the T3E and the SP. Also included here would be the assembly of existing domain decomposition tools into a form that is readily applied.

* Interact with the SV team to improve visualization capability for grids and solutions on the various grid forms. Here existing diagnostic tools would be assembled into common interfaces for use as grid debuggers.

* Support special CTA and Challenge Project needs in grid generation.

* Set up a grid generation benchmark suite.

Beyond the CEWES MSRC PET resources, more work is needed in the following areas:

* Automation of the grid generation process.

* Improved domain decomposition methods.

* Improved grid quality analysis.

* Improved adaptive grid/solution coupling.

* Better support for grid generation systems.

-------------------------------------------------------------------------------