VIII. OUTREACH TO CEWES MSRC USERS

Since the great majority of users of the CEWES MSRC are off-site, the CEWES MSRC PET effort places emphasis on outreach to remote users, as well as to users located on-site at CEWES. Table 3 lists the contacts made with CEWES MSRC users by the CEWES MSRC PET team during Year 3, and Table 2 lists all travel by the CEWES MSRC PET team in connection with the Year 3 effort.

A major component of outreach to CEWES MSRC users is the training courses (described in Section VII) conducted by the CEWES MSRC PET team, some of which are conducted at remote user sites and some of which are web-based. The CEWES MSRC PET website, accessible from the CEWES MSRC website, is also a major medium for outreach to CEWES MSRC users, and all material from the training courses is posted on the PET website. A CD-ROM of training material has also been prepared.

Specific outreach activities conducted in Year 3 are described in this section, which is organized by individual components of the CEWES MSRC PET effort.

CFD: Computational Fluid Dynamics CTA (ERC - Mississippi State)

Interactions with CEWES MSRC users have been initiated by a variety of means. Telephone, e-mail, and personal visits have all resulted in opportunities for user support and more specific collaborative efforts. Face-to-face visits have resulted from training participation such as the Parallel Programming Workshop for Fortran Programmers (BYOC), wherein users are introduced to parallel programming within the context of their own code. This is a particularly effective opportunity for user outreach and training, since it gives the on-site CTA lead an opportunity to meet and interact with users on an individual basis and learn about their work within a semi-formal classroom environment. Some of the more significant outreach interactions are described below.

In collaboration with Henry Gabb of the NRC Computational Migration Group, assistance was provided to J. C. T. Wang of the Aerospace Corporation by analyzing the message flow in a section of his PVM code and by providing general support, resulting in a successful port of the code to the IBM SP system.

As a result of PET consultation, David Medina of AF Phillips Lab implemented a graph-based reordering strategy within the MAGI solver. The objective is to improve cache performance and interprocessor data locality. Reported results show roughly a 30% reduction in execution time on two, four, and sixteen processors compared to execution without reordering.

Extensive interaction with Fernando Grinstein of NRL has continued in the third contract year. Collaboration via phone, e-mail, and personal visits provided assistance in developing a parallel version of NSTURB3D capable of efficient execution on all CEWES MSRC computing platforms.

In support of Zeki Demirbilek of CEWES CHL, a dual-level parallel algorithm using both MPI and OpenMP was designed and implemented in the CGWAVE solver. This resulted in a dramatic reduction in turnaround time: turnaround for the demonstration case was reduced from 2.1 days to 12 minutes using 256 SGI O2K processors. This project also involved extensive collaboration with the SPP Tools team at Tennessee and served as a testbed for their MPI_Connect tool. A sketch of this kind of dual-level structure is given below.

Collaboration was initiated with Bob Robins of NW Research Associates as a result of the BYOC workshop, via follow-up e-mail and phone contact. His code has been analyzed for inherent parallelism, and continued collaboration is planned to produce a parallel version of his solver.
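The following is a minimal, hedged sketch of the dual-level MPI/OpenMP pattern described above: independent cases are distributed across MPI processes, while each individual solve is threaded with OpenMP. The routine and parameter names (solve_component, NUM_COMPONENTS, GRID_POINTS) are hypothetical stand-ins for illustration only, not the CGWAVE source.

    /* Hedged sketch of the dual-level (MPI + OpenMP) pattern described above.
     * Names such as solve_component() and NUM_COMPONENTS are hypothetical;
     * this is not the CGWAVE source, only an illustration of the structure. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    #define NUM_COMPONENTS 256   /* independent cases to solve (illustrative) */
    #define GRID_POINTS    100000

    /* Stand-in for one complete solve over the computational grid. */
    static double solve_component(int comp)
    {
        double sum = 0.0;
        int i;
        /* Inner level: OpenMP threads share the work of a single solve. */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < GRID_POINTS; i++)
            sum += (double)(comp + 1) / (i + 1);
        return sum;
    }

    int main(int argc, char **argv)
    {
        int rank, size, comp;
        double local = 0.0, total = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Outer level: MPI ranks take independent components round-robin. */
        for (comp = rank; comp < NUM_COMPONENTS; comp += size)
            local += solve_component(comp);

        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("aggregate result %g\n", total);

        MPI_Finalize();
        return 0;
    }

The two levels let the same executable scale both across and within shared-memory nodes such as the SGI O2K; in the actual CGWAVE work described here and under SPP Tools below, MPI_Connect was additionally used to couple runs on separate machines.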
Specific support was provided to both the COBALT and OVERFLOW CHSSI development teams. Coordination with Hank Kuehn resulted in a special queue priority for the COBALT development team to assist in timely debugging of their solver. This helped identify an implementation problem that only manifested itself for applications using more than forty processors. The CFD team worked to secure an FY99 allocation on CEWES MSRC hardware for the OVERFLOW development team.

CSM: Computational Structural Mechanics CTA (TICAM - Texas & ERC - Mississippi State)

The first step in outreach was the short course taught by Graham Carey at the June 1998 DoD HPCMP Users Group Meeting. Rick Weed, the CSM On-Site Lead, has been instrumental in coordinating our interactions with the applications analysts at CEWES and elsewhere. For example, during the CAM Workshop in November at CEWES, we met with on-site engineering analysts involved in application studies using CTH and EPIC. Carey and Littlefield followed up with the EPIC analysts at the February PET Annual Review to discuss our recent adaptive grid capability. As a result of this meeting, we mapped out a strategy for collaborating with the users and testing their nonlinear material models in our adaptive EPIC code. We also developed a goal for a parallel roadmap task and a CEWES MSRC code migration plan to be undertaken in Year 4 or the following year.

David Littlefield has been working closely with the Sandia applications software group (D. Crawford, G. Hertel). He has also been in regular close contact with the major ARL users (K. Kimsey, D. Scheffler, D. Kleponis). We have had several discussions with Raju Namburu at CEWES and recently discussed technology transfer to transition the adaptive CTH code version. This transfer is being coordinated with Gene Hertel and the Sandia group.

Carey has been interacting with Rob Leland's group at Sandia on grid partitioning, grid quality, and parallel partitioning issues. Following his February visit to Sandia, the group obtained the CHACO software and has implemented it at Texas. We are currently experimenting with this partitioning software and will be incorporating our space-filling curve scheme into it.

CWO: Climate/Weather/Ocean Modeling CTA (Ohio State)

Sadayappan and Welsh of Ohio State had three major co-ordination meetings with CEWES MSRC user Bob Jensen during PET Year 3. These meetings were to plan ongoing efforts on the parallelization and migration of the WAM wind-wave model, and the coupling of WAM with the CH3D marine circulation model. SGI on-site representative Carol Beaty was present at the first and second meetings, and government CWO monitor Alex Carillo attended the second and third meetings. There were numerous e-mail and telephone contacts with Bob Jensen throughout the year, and Welsh made four additional one-week trips to CEWES MSRC to provide CWO core support.

At the May 1998 WAM co-ordination meeting, a major topic was the disagreement between the predictions of the pre-existing (MPI-based) parallel WAM code and those of the original sequential WAM code. Subsequent debugging traced the problems in MPI WAM to shallow-water current-related propagation effects and inter-grid communication in nested grid runs. The coding errors responsible for the problems were not found, however, and it was agreed shortly after the August 1998 co-ordination meeting to replace MPI WAM with an OpenMP version of WAM. The OSU team then deployed OpenMP WAM in the coupled CH3D/WAM system; a sketch of the loop-level shared-memory style used in such a port is given below.
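As an illustration only, the fragment below shows the kind of loop-level OpenMP parallelism typical of a shared-memory port such as OpenMP WAM. The routine name, arrays, and the simplified advection update are hypothetical placeholders, not the actual WAM propagation scheme.

    /* Minimal, hypothetical illustration of loop-level OpenMP parallelism of
     * the kind used in a shared-memory port such as OpenMP WAM.  The arrays
     * and the update rule are placeholders, not the WAM propagation scheme. */
    #include <omp.h>

    void propagate(double *spec_new, const double *spec_old,
                   const double *cg, int npts, double dt, double dx)
    {
        int i;
        /* Each grid point is updated independently, so the loop can be split
         * across threads; no explicit message passing is required. */
        #pragma omp parallel for
        for (i = 1; i < npts - 1; i++) {
            double flux = cg[i] * (spec_old[i] - spec_old[i - 1]) / dx;
            spec_new[i] = spec_old[i] - dt * flux;
        }
    }

Because all threads share one address space, this style avoids the inter-grid message-passing logic that proved error-prone in MPI WAM, at the cost of being limited to a single shared-memory machine such as the Origin2000.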
Apparent errors in the WAM treatment of current-induced wave refraction were also discussed at the May 1998 WAM co-ordination meeting. Welsh subsequently examined the WAM propagation scheme and found a sign error repeated several times in the main propagation subroutine. A corrected version of the subroutine was delivered to Bob Jensen. In May 1998 Welsh also met with Bob Jensen, Don Resio, and Linwood Vincent of CEWES to review the coupling physics being implemented in the CH3D/WAM system. This meeting resulted in a modification of the WAM bottom friction algorithm.

At the October 1998 WAM co-ordination meeting, the major issue was the unexpected termination of OpenMP WAM simulations on the SGI Origin2000. Follow-up work with Carol Beaty and NRC Computational Migration Group staff Henry Gabb and Trey White showed that explicit stack size specification was necessary, and that aggressive optimization during code compilation caused small errors in the WAM results. Welsh also found that if the WAM grid is trivially split into one block (the computational grid is divided into a user-specified number of blocks to save memory), the simulation terminates unexpectedly.

In May 1998, Sadayappan and Zhang of Ohio State travelled to Mississippi State to meet with CEWES MSRC user Billy Johnson. The purpose of the trip was to discuss the physics required for the coupling of CH3D and WAM at the atmospheric boundary layer, and the coupling of CH3D-SED and WAM at the bottom boundary layer. The parallelization of CH3D-SED was also planned with Puri Bangalore of MSU. CH3D-SED was subsequently parallelized by MSU staff, and the parallelization was verified by OSU staff.

In March 1999, CWO On-Site Lead Steve Wornom met with CEWES MSRC users Lori Hadley and Bob Jensen, SPPT On-Site Lead Clay Breshears, and Henry Gabb and Ben Willhoite of the NRC Computational Migration Group to discuss the parallelization of the SWAN wind-wave model. This has resulted in a joint CWO/CMG/SPP Tools effort to migrate the code to the CEWES MSRC parallel platforms.

EQM: Environmental Quality Modeling CTA (TICAM - Texas)

The EQM team, working with Mark Noel of CEWES, had verified version 1.0 of parallel CE-QUAL-ICM in December of 1997, and in March of 1998 the first 10-year Chesapeake Bay calibration run had been completed for the EPA. Starting in April 1998, we interacted mainly with Mark Noel to add features to the parallel code needed to run the EPA scenarios. We also trained him on how to write routines for post-processing parallel runs. From March through September 1998, Noel was able to run 53 10-year calibration runs, and 53 actual runs for the EPA Chesapeake Bay project, using a total of 24,663 CPU hours.

In November 1998, Carl Cerco of CEWES asked us to help improve the scalability of CE-QUAL-ICM for future planned large-scale work. We analyzed the parallel code using a 10-year Chesapeake Bay run as a benchmark, and improved parallel CE-QUAL-ICM in several respects. These improvements required interaction with Mark Dortch, Carl Cerco, Mark Noel, and Barry Bunch of CEWES. We improved the I/O performance and developed techniques which allowed the code to run on more CPUs (up to 110) with good parallel performance. The new code was verified by the end of February 1999. Discussions with Mark Dortch, Carl Cerco, Barry Bunch, and Mark Noel of CEWES led us to improve the grid decomposition algorithms crucial to our method of parallelization. We investigated the use of the grid partitioning package METIS 4.0.
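To illustrate, the sketch below shows a call to the METIS 4.0 graph-partitioning routine METIS_PartGraphKway on a toy mesh, following the METIS 4.0 C interface. The tiny six-cell chain graph and the two-way split are purely illustrative; the actual interface builds the CSR adjacency arrays from the Chesapeake Bay grid, and its details are not reproduced here.

    /* Hedged sketch of partitioning a grid's dual graph with METIS 4.0.
     * The six-cell chain graph is illustrative only; the real CE-QUAL-ICM
     * interface builds these CSR arrays from the Chesapeake Bay grid. */
    #include <stdio.h>
    #include <metis.h>   /* METIS 4.0 header; idxtype is defined here */

    int main(void)
    {
        int nvtxs = 6, nparts = 2, wgtflag = 0, numflag = 0, edgecut, i;
        int options[5] = {0, 0, 0, 0, 0};        /* 0 => METIS defaults */
        /* CSR adjacency for a 6-cell chain: 0-1-2-3-4-5 */
        idxtype xadj[7]    = {0, 1, 3, 5, 7, 9, 10};
        idxtype adjncy[10] = {1, 0, 2, 1, 3, 2, 4, 3, 5, 4};
        idxtype part[6];

        METIS_PartGraphKway(&nvtxs, xadj, adjncy, NULL, NULL,
                            &wgtflag, &numflag, &nparts, options,
                            &edgecut, part);

        for (i = 0; i < nvtxs; i++)
            printf("cell %d -> subdomain %d\n", i, (int)part[i]);
        printf("edge cut = %d\n", edgecut);
        return 0;
    }

METIS seeks a partition that balances the number of cells per subdomain while minimizing the edge cut, which corresponds to the interprocessor communication the decomposition induces.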
We built an interface to this package, and it greatly improved the scalability of CE-QUAL-ICM. Mark Dortch asked the EQM team to help transfer the parallel computing technology to the toxic version of the code. We trained Terry Gerald and Ross Hall of CEWES in the basic techniques, enabling a parallel version of CE-QUAL-ICM/TOXI to be coded at CEWES without further assistance.

FMS: Forces Modeling and Simulation/C4I CTA (NPAC - Syracuse)

Currently, our main DoD FMS user group that provides the application focus and testbed for the WebHLA framework is the Night Vision Lab at Ft. Belvoir, which develops CMS as part of its R&D in the area of countermine engineering. Our support for CMS includes:

* Building the Parallel CMS module by porting sequential CMS to the Origin2000 (and later also to a commodity cluster).
* Integrating the Parallel CMS module with the other WebHLA federates described above (JDIS, PDUDB, SimVis).
* Planning future joint work with Ft. Belvoir. This includes development of joint proposals such as a currently pending CHSSI proposal on WebHLA for Metacomputing CMS.

We are also interacting closely with, and providing PET support for, the current FMS CHSSI projects, including:

* FMS-3, where we are building Web-based interactive training for the SPEEDES simulation kernel;
* FMS-4, where we are acting as external technical reviewer; we recently developed a joint CHSSI proposal with SPAWAR for a follow-on project on using SPEEDES, Parallel IMPORT, and WebHLA to build Intelligent Agent support for FMS;
* FMS-5, where we expect to participate directly; we were asked to provide our Object Web RTI as a test implementation of the RTI 1.3 standard, to be certified by DMSO and used by FMS-5 as a fully compliant reference prototype.

C/C: Collaboration and Communications (NPAC - Syracuse)

Since our primary C/C effort in Year 3 focused on the Tango Interactive electronic collaboration system for use in education, training, and small-group collaboration, our outreach efforts focused on enlarging the pool of knowledgeable and experienced users of the system to help support broader deployment of the tools. This work was conducted with on-site Training and C/C (or equivalent) support personnel at the four MSRCs, as well as with staff at the Naval Research Lab in DC. Our collaborators at the Ohio Supercomputer Center gained experience not only in operation and support of Tango from the point of view of course recipients, but also as instructors. All four MSRCs, as well as OSC, now have Tango server installations, and along with NRL have substantial experience with the client side of the system. This group of experienced users will form a core of support that will facilitate wider use of Tango for both training and other collaborative applications in the coming years.

SPPT: Scalable Parallel Programming Tools (CRPC - Rice/Tennessee)

Outreach to CEWES MSRC users during Year 3 by the SPP Tools team included the following specific assists:

* Marvin Moulton: Contact made through Steve Bova, on-site CFD lead, for information about the HELIX code.
* David Medina: Various e-mail exchanges and telephone contacts for information about the MAGI code, which is being studied at Rice as a testbed for compiler optimization studies.
* Fred Tracy (CEWES): On possible collaboration on optimization of FEMWATER.
* Ann Sherlock, Jane Smith: On possible collaboration on parallelization of STWAVE and a tutoring program at Rice.
* Zeki Demirbilek (CEWES Coastal Hydraulics Lab): Clay Breshears implemented an MPI_Connect version of the CGWAVE code. This effort was in collaboration with other members of CEWES MSRC PET, the NRC Computational Migration Group, and the CEWES MSRC Computational Science and Engineering group. The code went on to win the Most Effective Engineering Methodology Award in the SC'98 HPC Challenge competition.
* Bob Robins (Northwest Research Associates, Inc.): Found potential Fast Poisson Solver libraries during the BYOC workshop held at CEWES MSRC.

Chuck Koelbel, Gina Goff, and Ehtesham Hayder offered tutorials at CEWES MSRC and at NRL and discussed with the participants their codes and possible approaches to parallelization. Clay Breshears assisted the CEWES MSRC CSE group members with the use of the Vampir performance analysis tool and the TotalView debugger. He was also able to help members of the RF Weapons Challenge Project (Kirtland AFB) with these tools.

CHSSI Project Support: Tennessee has worked with CEWES MSRC user and CEN-1 CHSSI code developer David Rhodes to use the Vampir performance analysis tool to analyze and improve the performance of the Harmonic Balance Simulation code. Mr. Rhodes reports: "The Vampir toolset provided just what I was looking for in tuning my parallel application. This application contains tasks that range from small to large granularity. Vampir allowed me to view dynamic program execution with a very low level of intrusion. After making some significant algorithmic changes - e.g. changing from a dynamic to static scheduling approach - I was able to achieve much better levels of scalability and parallel efficiency. The data needed to determine existing problem areas would have been much harder to gather without Vampir." (A generic sketch contrasting dynamic and static scheduling is given below.)

Challenge Project Support: Tennessee has worked with two members of the RF Weapons Challenge Project team, Gerald Sasser and Shari Collela of AF Phillips Lab, in their use of Vampir. Vampir was used by Sasser to find and fix a bottleneck in the Icepic code and to significantly improve the communication performance of that code. Collela has been unsuccessful so far in using Vampir on the Mach3 code because the code is very large and produces huge and unwieldy trace files. Tennessee plans to use the Mach3 code as a test case for new dynamic instrumentation techniques that can be used to turn Vampir tracing on and off dynamically and thus reduce the size of the trace files while still collecting trace data for "interesting" parts of program execution.

SC'98 HPC Challenge Support: Graham Fagg of Tennessee worked with the CEWES MSRC team to use Tennessee's MPI_Connect system, along with OpenMP and MPI, to achieve multiple levels of parallelism in the CGWAVE harbor response simulation code, which reduced the runtime for this code from months to days. The team won the Most Effective Engineering Methodology award for their SC'98 HPC Challenge entry.

User Support: Tennessee has put together an SPP Tools repository at http://www.nhse.org/rib/repositories/cewes_spp_tools/catalog/ which lists programming tools being made available and/or supported as part of PET efforts. The tools include parallel debuggers, performance analyzers, compilers and language analyzers, math libraries, and parallel I/O systems. In addition to giving information about the available tools, the repository includes a concise matrix view of which tools are available on which platforms, with links to site-specific usage information and tips and web-based tutorials.
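Picking up the scheduling point from Mr. Rhodes's comments above, the fragment below is a generic illustration of the dynamic-versus-static trade-off using OpenMP loop scheduling clauses. It is not the Harmonic Balance code, and his application's own scheduling mechanism may have been quite different; the routine and its arguments are hypothetical.

    /* Generic illustration of the static-vs-dynamic scheduling trade-off;
     * not the Harmonic Balance code.  With tasks of mixed granularity,
     * dynamic scheduling adds per-task overhead, while a static assignment
     * avoids it when the work can be balanced up front. */
    #include <omp.h>

    void run_tasks(int ntasks, void (*do_task)(int))
    {
        int i;

        /* Dynamic: threads grab tasks one at a time; flexible, but incurs
         * scheduling overhead and run-time coordination. */
        #pragma omp parallel for schedule(dynamic, 1)
        for (i = 0; i < ntasks; i++)
            do_task(i);

        /* Static: the iteration space is divided once, up front; lowest
         * overhead when per-task costs are known or balanced in advance. */
        #pragma omp parallel for schedule(static)
        for (i = 0; i < ntasks; i++)
            do_task(i);
    }

Tracing tools such as Vampir make this kind of choice visible by showing how long each thread or process spends idle under each strategy.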
Because CEWES MSRC users typically use multiple MSRC platforms, an emphasis has been placed on providing cross-platform tools, such as the TotalView debugger and the Vampir performance analysis tool, both of which work on all MSRC platforms. Information has also been provided about platform-specific tools in case the special features of these tools are needed. Tennessee has tested the installed versions of the tools and has worked with CEWES MSRC systems staff to ensure that the tools are working correctly in the programming environments used by CEWES MSRC users, including the PBS queuing system, and has reported any bugs discovered to the tool developers and followed up on getting them fixed. An e-mail message about the repository was sent to CEWES MSRC users, and an article about it has been written for the CEWES MSRC Journal. The repository and the tools have already had an impact on the NRC Computational Migration Group (CMG), which has started using some of the tools, such as TotalView, on a daily basis to do their jobs more effectively, and the CMG is encouraging other CEWES MSRC users to do likewise.

SV: Scientific Visualization (NCSA - Illinois & ERC - Mississippi State)

During Year 3, the PET visualization team worked with users, the PET on-site staff for each CTA, and the CEWES MSRC visualization staff. Early in the year, we surveyed several users to discuss their data management strategies. This led to the summary report in which we recommended that HDF, a data management package in use by NASA EOSDIS and DOE's ASCI project, be introduced to the CEWES MSRC. In July, we arranged for the HDF project lead, Dr. Mike Folk, to visit the CEWES MSRC and present an overview of HDF. (A hedged sketch of the kind of self-describing data file HDF provides appears below.)

We also worked with a variety of users to assist in visualization production. We worked with Andrew Wissink to produce a visualization of his store separation problem. Still imagery and time-series animations were produced for Robert Jensen of CEWES. Additional visualization production was undertaken with Carl Cerco's data (CEWES). This includes a movie sequence that can take advantage of the very-wide screen (the Panoram) installed at the CEWES MSRC.

The PET Vis team has had a long-term relationship with Robert Jensen, particularly as it relates to customizing visualization tools to support his wave modeling work. We also worked with Raju Namburu's team (CEWES) on a novel application of wavelet techniques to build structure-significant representations of his data. This is particularly important as Namburu's data sets are very large.

In another long-term collaboration, the PET Visualization team has ongoing communication with Carl Cerco, Mark Dortch, and Mark Noel of CEWES, in relation to their Chesapeake Bay project and visual analysis of the output of the CEWES CE-QUAL-ICM code. This is a continuation of the relationship that was begun in Year 1. This year, we have worked with them on defining their requirements for desktop visualization support, prototyping solutions for those needs, and iterating on the design process to refine their specifications. We have provided them a production-quality version of a tool that they are currently using to view data from their 10- and 20-year production runs of the Chesapeake Bay model. This tool also supports a limited form of collaboration that they are using to share their results with their project monitor at the Environmental Protection Agency.
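As a hedged illustration of the HDF recommendation mentioned above, the fragment below writes a small gridded field to a self-describing file using the HDF5 C API. The file name, dataset name, and array are hypothetical, and the calls shown follow the later HDF5 interface; the HDF flavor actually evaluated at CEWES MSRC may have been the earlier HDF4 library.

    /* Hedged sketch of writing a gridded field to a self-describing HDF5
     * file.  File and dataset names are hypothetical; the HDF library
     * actually adopted at CEWES MSRC may have differed from this API. */
    #include <hdf5.h>

    int main(void)
    {
        float field[100][200] = {{0.0f}};      /* placeholder model output */
        hsize_t dims[2] = {100, 200};

        /* ... fill field[][] from the simulation ... */

        hid_t file  = H5Fcreate("wave_height.h5", H5F_ACC_TRUNC,
                                H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(2, dims, NULL);
        hid_t dset  = H5Dcreate2(file, "/wave_height", H5T_NATIVE_FLOAT,
                                 space, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

        H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, field);

        H5Dclose(dset);
        H5Sclose(space);
        H5Fclose(file);
        return 0;
    }

Self-describing files of this kind carry their own structural metadata (names, ranks, dimensions, types), which is what makes a common data management package attractive for exchanging model output with visualization tools.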
The CEWES PET Vis team has had a variety of contacts with the PET CTA on-site staff, including Steve Bova (CFD) and Rick Weed (CSM), to consult on how best to assist their users in the areas of computational monitoring and visualization. We have also worked with our PET academic counterparts, Mary Wheeler (EQM, Texas), Keith Bedford (CWO, Ohio State), and Geoffrey Fox (FMS & C/C, Syracuse). In Year 3, we also had numerous contacts with CEWES MSRC visualization personnel, including Michael Stephens, Richard Strelitz, Kent Eschenberg, Richard Walters, and John West. We have advised on new software packages and techniques for visualization and virtual environments. We also have continuing contact with Milti Leonard, a member of the visualization staff at Jackson State University, consulting on visualization software and mentoring her in her own skill development.

MSU's major SV interaction has been with CSM scientist Raju Namburu of CEWES. He has guided the project and has strongly influenced the quality of its deliverables. In addition, he has explored the use of several algorithms, including volume rendering, for his datasets and has spurred the development of tools to aid the main project.

University of Southern California

The USC team met with Rama Valisetty at CEWES as well as at USC to discuss the computation and communication structure of key applications in Computational Structural Mechanics (CSM). This interaction helped us focus our benchmarking and modeling efforts to accommodate the problems that real end-users face in parallelizing their code.

The USC team interacted with a DoD end-user through Valisetty, and an unoptimized CFD application was given to USC. USC employed the benchmark results and the IMH model to initially analyze the computation and communication requirements of the original algorithm. A flow diagram was then created and the major bottleneck sections of the code were identified. Using the IMH model, the USC team analyzed the performance of various data reorganization techniques, communication latency hiding techniques, and computation scheduling choices. This allowed USC to optimize the algorithm to minimize the communication overhead of parallelization while distributing the workload evenly among the processors. The optimized algorithm developed by USC was scalable and portable. Using 30 processing nodes, performance was improved approximately five-fold over the original algorithm.
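As a generic illustration of one of the technique classes named above, the sketch below overlaps a nonblocking MPI halo exchange with independent interior work so that communication latency is hidden behind computation. The routine and its arguments are hypothetical; this is not USC's optimized code or the end-user's CFD application.

    /* Generic sketch of communication latency hiding with nonblocking MPI:
     * interior work proceeds while halo data is in flight.  This illustrates
     * the class of technique named above, not USC's actual optimization. */
    #include <mpi.h>

    void exchange_and_update(double *halo_send, double *halo_recv, int halo_n,
                             double *interior, int interior_n,
                             int left, int right, MPI_Comm comm)
    {
        MPI_Request reqs[2];
        int i;

        /* Post the halo exchange first ... */
        MPI_Irecv(halo_recv, halo_n, MPI_DOUBLE, left,  0, comm, &reqs[0]);
        MPI_Isend(halo_send, halo_n, MPI_DOUBLE, right, 0, comm, &reqs[1]);

        /* ... then overlap it with work that does not need the halo. */
        for (i = 0; i < interior_n; i++)
            interior[i] *= 0.5;            /* placeholder interior update */

        /* Wait only when the boundary update actually needs the data. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }

The same idea underlies most latency-hiding restructurings: post communication as early as possible, perform all work that does not depend on the incoming data, and synchronize only at the point where the data is actually needed.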