VII. TRAINING

Since its inception, the CEWES MSRC PET training program has faced two challenges. One is to provide training in an anytime, anypace, anyplace environment. That goal has not been reached, but the PET program has continued to support efforts in remote training and distance education. Those efforts are now bearing fruit, as this report shows. The second challenge is to meet the needs of CEWES MSRC users faced with a rapid change in available hardware and software systems. The attempts to meet this challenge are evident when the courses listed in Table 7 are compared with the corresponding list in the Year 2 Annual Report. Training is now offered on products like OpenMP that did not exist in 1997, and courses on topics such as C++ have been replaced by Java courses.

The PET training program continued its evolution in Year 3 with more emphasis on distance training technology and service to remote users. The first full-blown distance training course was offered through the Tango Interactive distance consulting system. This was followed by a second Tango course that was broadcast to all three of the other MSRCs. This year has also seen continuous development of Tango as a distance education tool through its application as a vehicle for offering graduate computer science courses from Syracuse University to Jackson State University. The Fortran 90 course offered in September 1998 was our first Tango-based course offered to remote users. The course was broadcast over the Internet to OSC, the course provider, and users at the ARL MSRC. Additional Tango courses have been scheduled for 1999.

The PET training team not only supports training activities but also provides logistical and technical support to all areas of the PET program. PET played a major role in supporting the workshop on Recent Advances in Computational Structural Mechanics and High Performance Computing held at CEWES in November 1998.
Training Curriculum

PET training is designed to assist the CEWES MSRC user in transitioning to new programming environments and efficiently using the present and future SPP (Scalable Parallel Processing) hardware acquired under the HPCM program. The training curriculum is a living document with new topics being added continually to keep up with the fast pace of research and development in the field of HPC. The curriculum contains courses in the following general categories:

* Parallel programming
* Architecture and software specific topics
* Visualization and performance
* CTA targeted courses, workshops, and forums

Table 7 gives a list of all training courses taught during Year 3 with the organization offering the course, the number of students attending the course, and the overall evaluation score of the course on a scale of 1 (poor) to 5 (excellent). Unless otherwise noted, the courses were held in the CEWES MSRC Training and Education Facility (TEF).

Training at the DoD HPCMP Users Group Conference

The CEWES MSRC PET program sponsored training activities at the DoD HPCMP Users Group Conference at Rice in June 1998. Five training courses were held and are included in the list of courses in Table 7. These training courses had the largest attendance in the history of the User Group Conferences. The PET program also sponsored a PET Training Colloquium on Distance Learning and Collaboration. The colloquium was organized by Geoffrey Fox, PET Academic Lead for Training and Collaboration/Communication, from Syracuse University. The speakers were Dr. Anoop Gupta of Microsoft Research, Dr. Don Johnson of the DoD Advanced Distributed Learning Initiative, and Prof. Fox. The moderator was Dr. Louis Turcotte of CEWES MSRC.

Seminars

The CEWES MSRC PET program offers seminars on an irregular basis. These are presentations by experts in their field and are designed to introduce the CEWES MSRC users to current research topics in HPC.
The following seminar presentations were made during Year 3 at CEWES MSRC:

Managing Scientific Data with HDF
Dr. Michael Folk
National Center for Supercomputing Applications (NCSA)
University of Illinois

Web-Based Instruction
Prof. Geoffrey Fox
Director, Northeast Parallel Architectures Center (NPAC)
Syracuse University

Web-Based Training

During Year 3, four distance education courses were conducted over the Web. Syracuse delivered one undergraduate course (Web Programming) and two graduate courses (Computational Science for Simulation Applications and Advanced Web Programming) to Jackson State, and Jackson State delivered one undergraduate course (Web Programming) to Morgan State University. Syracuse also delivered the Advanced Web Programming course to Mississippi State and Clark Atlanta. All these offerings were full semester, for-credit courses delivered over the Web using the Tango collaborative software environment.

TRAINING COURSE DESCRIPTIONS

This material appears on the CEWES MSRC PET Website as training course descriptions in advance of courses, hence the future tense.

Parallel Programming Workshop for Fortran Programmers

The workshop will begin with a one-day lecture on strategy, tools, and examples in parallel programming. On the remaining days participants will work with their own codes. There will be no attempt to prescribe a particular solution to the problem of porting a code from the C90 to the scalable systems. Rather, the instructors will work with the user to find the best overall strategy, whether that best strategy is message passing via MPI or PVM, or data parallel via HPF or OpenMP. It may not be possible to parallelize a full-blown application program in a week, but the process can get started and a continuing relationship can be established between the users and the parallelization experts at the CEWES MSRC.
Using the Message Passing Interface (MPI) Standard

The Message-Passing Interface (MPI) is the de facto standard for message passing, developed by the Message-Passing Interface Forum (MPIF). MPI provides many features needed to build portable, efficient, scalable, and heterogeneous message-passing code. These features include point-to-point and collective communication, support for datatypes, virtual topologies, process-group and communication context management, and language bindings for the FORTRAN and C languages. In this tutorial we will cover the important features supported by MPI with examples and illustrations. An introduction to extensions of MPI (MPI-2) and message passing in real time (MPI/RT) will also be provided.

Large Deformation Computational Structural Mechanics Applications on High Performance Computers using ParaDyn/DYNA3D

This course will begin with a DYNA3D lecture reviewing the features added to the program since 1993. Some of the recent features include techniques for switching materials from rigid to deformable and back, new material models and equations of state, recent developments in element technology, and new contact methods. This lecture will include time for questions and answers about modeling and using any of the features in DYNA3D. The MSRC will provide attendees a summary of steps required for submitting batch jobs to run parallel problems on the Origin2000, Cray T3E and IBM SP. This will include the design of script files for the batch system, a discussion of the batch queues, and running the batch utilities to follow the progress of a job. The ParaDyn lecture will feature discussions on the automated software for domain decomposition, running the ParaDyn program, post-processing the results for visualization, and the performance on parallel computers. Techniques for efficiently handling contact boundary conditions and future parallel capability releases will be discussed.
The lectures will finish with a discussion of applications illustrating the power of parallel computers in modeling problems of DoD interest. On the second day the instructor will demonstrate a sample problem preparation and execution of a ParaDyn calculation on one of the parallel systems at CEWES MSRC. Attendees will be able to run their own examples and work with the instructor directly at this time.

Grid Generation for Complex Configurations

This course will provide an in-depth review of the current state-of-the-art and state-of-practice in geometry/grid generation applicable to complex problems. A step-by-step process starting from the initial CAD definition or drawing of a configuration and proceeding to the generation of a curvilinear, hexahedral or Cartesian grid and grid adaptation techniques will be presented in detail. Demonstrations and hands-on computer lab exercises will be conducted to explore the use of GUM-B, VGRID, CAGI, GENIE++, TrueGrid, PMAG, CUBIT, and Hybrid2d systems for practical applications of interest to CEWES MSRC users.

Java for Scientific Computing

The objective of this course is to provide the participant with (a) an understanding of the high performance computing architecture, including the World Wide Web for visualization, (b) an overview of the Java language and its capabilities, and (c) enough programming details to do some examples.

High Performance Fortran (HPF) in Practice

This course will introduce programmers to the most important features of HPF, including features inherited from Fortran 90, the data parallel FORALL statement and INDEPENDENT assertion, and data mapping by ALIGN and DISTRIBUTE directives. The instructor will illustrate how these features can be used in practice on algorithms for scientific computation such as LU decomposition and the conjugate gradient method.

Performance Optimization

This course will focus on the optimization of numerically intensive codes for HPC systems.
The course will begin with a quick overview of the basics of performance and processor architecture. Then it will cover a wide variety of optimizations geared towards enhancing processor performance. Topics will include efficient use of the memory hierarchy and functional units, amortization of loop overhead, and dependency analysis. Common bottlenecks and caveats will be discussed, along with proposed solutions and the logic behind them.

Topics in Finite Element Methodology for Nonlinear Problems

This course is broadly structured to cover different types of applications from structures to fluids to heat transfer and coupled problems that are of general interest to DoD. Methodology rather than specific applications is stressed. Topics covered include algorithms, nonlinear solution strategies, and integrating solution with adaptive refinement.

A Tutorial on Designing and Building Parallel Programs

In this tutorial, the instructors will provide a comprehensive introduction to the techniques and tools used to write parallel programs. First, the instructors will introduce principles of parallel program design, touching upon relevant topics in architecture, algorithms, and performance modeling. Examples from well-established parallel programming systems (HPF and MPI) will be included. After the basic material is covered, we will examine two new programming systems for parallel machines, OpenMP and PETSc.

An Introduction to Fortran 90

This course is aimed at introducing engineers and scientists familiar with Fortran 77 to the new features and capabilities available in Fortran 90. These new features include free form source code, the CASE control structure, the ability to create new data types, modules (similar to C++ classes), array processing shortcuts, dynamic memory allocation, pointers, improved I/O handling, and a host of new intrinsic functions. Source code compatibility between Fortran 77 and Fortran 90 will also be discussed.
Scalable OpenMP Programming on Origin2000

This is an advanced course. Topics to be covered are:

1. Overview of OpenMP programming model
2. Review of execution model
3. Moving beyond incremental parallelization mode
4. Domain decomposition
5. Comparisons with message passing
6. Performance optimization on Origin2000
7. Preview of OpenMP C/C++ specification

Introduction to MSC/PATRAN - Modeling for Design Analysis

This is an introductory course for new and/or infrequent MSC/PATRAN users. Students will master the basic skills required to use MSC/PATRAN in a typical MCAE application. This course emphasizes practical skills development through comprehensive, hands-on laboratory sessions. Students will learn to build analysis models using MSC/PATRAN by defining material properties, creating boundary conditions, submitting their problems for analysis, and post-processing the results using a variety of graphical formats. Specific topics such as CAD integration, geometry editing, meshing, grouping, and customization will be covered. Users of all FEA codes are encouraged to attend since MSC/PATRAN supports all the popular FEA codes such as MSC/NASTRAN, MSC/DYTRAN, HKS/ABAQUS, ANSYS, LS-DYNA and many more.

Tango for Remote Consulting

NPAC's Tango is a Web collaboratory. The system extends capabilities of Web browsers towards a fully interactive, multimedia, collaborative environment. Tango is also a framework for building collaboratory systems. In this tutorial we will show how to use Tango and will cover applications of Tango for remote consulting, including all the critical software development phases: coding, compiling, testing and debugging, and result analysis.
Parallel Debugging and Performance Analysis Tools: TotalView and Vampir

The goal of this course is to introduce parallel application developers to parallel debugging and performance analysis tools available on CEWES MSRC platforms, and to provide more in-depth coverage of the TotalView debugger and Vampir performance analysis tool. The course will cover the basics of using the tools as well as provide pointers to further information. A lab session will include practice on using the tools on some example programs. Debuggers to be covered include Dolphin TotalView 3.8 for the SGI/Cray Origin2000 and IBM SP, Cray TotalView for the Cray T3E, SGI dbx for the Origin2000, and pdbx for the IBM SP. Dolphin TotalView has a graphical interface while dbx and pdbx provide command-line debugging interfaces. The Cray version of TotalView for the Cray T3E has both graphical and command-line interfaces. An overview will be given of the various performance analysis tools available on CEWES MSRC platforms, but the performance analysis portion of the course will focus in detail on the Vampir tool, which has been recently acquired and is now available on CEWES MSRC machines.

Techniques in Code Parallelization

The techniques needed to parallelize an algorithm and code are described. These include discretization methods, domain decomposition, linear and nonlinear solver issues, mesh partitioning, load balancing, preprocessing, and postprocessing. Examples of parallelization efforts carried out at the University of Texas will be given. Participants will also have a chance to bring their "dusty deck" codes for discussion on how best to migrate them to parallel platforms.

Parallel Programming on the Origin2000 using OpenMP

This "how-to" workshop is designed to train the participants in the techniques and tools required to perform parallel programming using OpenMP directives on the Origin2000 (O2K).
After a discussion of the MIPS R10000 processor and the O2K architecture, and an introduction to how the IRIX operating system creates and schedules parallel threads, the OpenMP directives will be discussed in detail along with examples of their use. The course will conclude with the equally important topic of how to distribute the data used by parallelized OpenMP regions among the local memories on the O2K.

Computational Monitoring Using CUMULVS

Computational monitoring lets you visualize simulation output while your computation executes. This can be useful for users with codes that produce very large output files, or when you want to stop a run that is not progressing satisfactorily. This "how-to" workshop will educate participants in the techniques and procedures required to perform interactive computation using currently available tools. The workshop will include:

1. An overview and discussion of available tools, commercial or freeware
2. An introduction to CUMULVS from Oak Ridge National Lab
3. Detailed steps for how to instrument your code to use CUMULVS
4. Discussion of possibilities for computational monitoring for participants' codes

Interactive Structured Time-varying Visualizer (ISTV)

This tutorial gives an introduction to the Interactive Structured Time-varying Visualizer (ISTV), an OpenGL-based scientific visualization package available on IRIX and Solaris. ISTV is an interactive visualization system that visualizes time-varying multiblock (or multigrid) simulations on time-varying grids. ISTV's genesis was in the need for a toolkit to visualize data from high-resolution ocean models. By exploiting modularity and plug-ins, the scientist has the ability to tailor the ISTV visualization system to the needs of a particular discipline or problem without having to write a completely new system.
WebFlow: Web Interfaces for Computational Modules

In this course we will present the WebFlow system developed at the Northeast Parallel Architectures Center (NPAC) at Syracuse University. This system addresses the need for high level programming environments and tools to support distance computing on heterogeneous, distributed platforms. During the course we will describe and demonstrate the WebFlow system. This will include background information on CORBA and developing CORBA objects in Java. We will present the architecture of WebFlow and discuss its security model and its methods of providing seamless access to remote resources. The course will be focused on applying WebFlow to the users' applications. We will explain how to customize the WebFlow front-end to the needs of a particular application, and how to invoke and control the users' computational modules.