VII. TRAINING

Since its inception, the CEWES MSRC PET training program has faced
two challenges.  One is to provide training in an anytime, anypace,
anyplace environment.  That goal has not been reached, but the
PET program has continued to support efforts in remote training
and distance education.  Those efforts are now bearing fruit as
can be seen from this report.  The second challenge is to meet
the needs of CEWES MSRC users faced with a rapid change in available
hardware and software systems.  The efforts to meet this challenge
are evident from a comparison of the courses listed in Table 7 with
the corresponding list in the Year 2 Annual Report.
Training is now offered on products like OpenMP
that did not exist in 1997, and courses on topics such as
C++ have been replaced by Java courses.

The PET training program continued its evolution in Year 3 with more
emphasis on distance training technology and service to remote
users. The first full-blown distance training course was offered through
the Tango Interactive distance consulting system.  This was followed
by a second Tango course that was broadcast to all three of the
other MSRCs.  This year has also seen continuous development of Tango as 
a distance education tool through its application as a vehicle for offering 
graduate computer science courses from Syracuse University to Jackson State 
University.

The Fortran 90 course offered in September 1998 was our first Tango-based
course offered to remote users.  The course was broadcast over
the Internet to OSC, the course provider, and users at the ARL MSRC.
Additional Tango courses have been scheduled for 1999.

The PET training team not only supports training activities but also
provides logistic and technical support to all areas of the PET
program.  PET played a major role in supporting the workshop
on Recent Advances in Computational Structural Mechanics and High
Performance Computing held at CEWES in November 1998.


Training Curriculum

PET training is designed to assist the CEWES MSRC user
in transitioning to new programming environments and efficiently
using the present and future SPP (Scalable Parallel Processing)
hardware acquired under the HPCM program.  The training curriculum 
is a living document with new topics being added continually to keep 
up with the fast pace of research and development in the field of HPC.  
The curriculum contains courses in the following general categories:

 * Parallel programming

 * Architecture and software specific topics

 * Visualization and performance

 * CTA targeted courses, workshops, and forums

Table 7 gives a list of all training courses taught during Year 3
with the organization offering the course, the number of students attending
the course, and the overall evaluation score of the course on a scale of
1 (poor) to 5 (excellent).  Unless otherwise noted, the courses were
held in the CEWES MSRC Training and Education Facility (TEF).


Training at the DoD HPCMP Users Group Conference

The CEWES MSRC PET program sponsored training activities at the
DoD HPCMP Users Group Conference at Rice in June 1998.  Five training courses
were held and are included in the list of courses in
Table 7.  These training courses had the
largest attendance in the history of the Users Group Conferences.

The PET program also sponsored a PET Training Colloquium on
Distance Learning and Collaboration.
The colloquium was organized by Geoffrey Fox, PET Academic
Lead for Training and Collaboration/Communication, from Syracuse
University.  The speakers were Dr. Anoop Gupta of Microsoft
Research, Dr. Don Johnson of the DoD Advanced
Distributed Learning Initiative, and Prof. Fox.
The moderator was Dr. Louis Turcotte of CEWES MSRC.


Seminars

The CEWES MSRC PET program offers seminars on an irregular basis.  These are
presentations by experts in their field and are designed to introduce
the CEWES MSRC users to current research topics in HPC.  The following
seminar presentations were made during Year 3 at CEWES MSRC:

  Managing Scientific Data with HDF
  Dr. Michael Folk
  National Center for Supercomputing Applications (NCSA)
  University of Illinois

  Web-Based Instruction
  Prof. Geoffrey Fox
  Director, Northeast Parallel Architectures Center (NPAC)
  Syracuse University


Web-Based Training

During Year 3, four distance education courses were conducted over the Web.  
Syracuse delivered one undergraduate course (Web Programming) and two graduate 
courses (Computational Science for Simulation Applications and Advanced Web 
Programming) to Jackson State, and Jackson State delivered one undergraduate 
course (Web Programming) to Morgan State University.  Syracuse also delivered 
the Advanced Web Programming course to Mississippi State and Clark Atlanta. All 
these offerings were full semester, for-credit courses delivered over the Web 
using the Tango collaborative software environment.  


TRAINING COURSE DESCRIPTIONS

This material appears on the CEWES MSRC PET Website as training course 
descriptions in advance of courses, hence the future tense.

Parallel Programming Workshop for Fortran Programmers

The workshop will begin with a one-day lecture on strategy,
tools, and examples in parallel programming. On the remaining
days participants will work with their own codes.  There will
be no attempt to prescribe a particular solution to the
problem of porting a code from the C90 to the scalable systems.
Rather, the instructors will work with the user to find the
best overall strategy, whether that best strategy is message
passing via MPI or PVM, or data parallel via HPF or OpenMP.
It may not be possible to parallelize a full-blown application
program in a week, but the process can get started and a
continuing relationship can be established between the users
and the parallelization experts at the CEWES MSRC.

Using the Message Passing Interface (MPI) Standard

Message-Passing Interface (MPI) is the de facto standard
for message-passing developed by the Message-Passing
Interface Forum (MPIF). MPI provides many features needed
to build portable, efficient, scalable, and heterogeneous
message-passing code. These features include point-to-point
and collective communication, support for datatypes,
virtual topologies, process-group and communication
context management, and language bindings for the FORTRAN
and C languages. In this tutorial we will cover the
important features supported by MPI with examples and
illustrations. An introduction to extensions of MPI
(MPI-2) and message-passing in real-time (MPI/RT) will
also be provided.

Large Deformation Computational Structural Mechanics Applications
  on High Performance Computers using ParaDyn/DYNA3D

This course will begin with a DYNA3D lecture reviewing the features
added to the program since 1993.  Some of the recent features include
techniques for switching materials from rigid to deformable and back,
new material models and equations of state, recent developments in
element technology, and new contact methods.  This lecture will include
time for questions and answers about modeling and using any of the
features in DYNA3D.   The MSRC will provide attendees a summary of
steps required for submitting batch jobs to run parallel problems on the
Origin2000, Cray T3E and IBM SP.  This will include the design of script
files for the batch system, a discussion of the batch queues, and running
the batch utilities to follow the progress of a job.  The ParaDyn lecture
will feature discussions on the automated software for domain
decomposition, running the ParaDyn program, post-processing the
results for visualization, and the performance on parallel computers.
Techniques for efficiently handling contact boundary conditions and
future parallel capability releases will be discussed.  The lectures will
finish with a discussion of applications illustrating the power of parallel
computers in modeling problems of DoD interest.  On the second day the
instructor will demonstrate a sample problem preparation and execution
of a ParaDyn calculation on one of the parallel systems at CEWES MSRC.
Attendees will be able to run their own examples and work with the
instructor directly at this time.

Grid Generation for Complex Configurations

This course will provide an in-depth review of the
current state of the art and state of practice in
geometry/grid generation applicable to complex
problems.  A step-by-step process, starting from the
initial CAD definition or drawing of a configuration
and proceeding to the generation of a curvilinear,
hexahedral, or Cartesian grid and to grid adaptation
techniques, will be presented in detail.  Demonstrations
and hands-on computer lab exercises will be conducted
to explore the use of GUM-B, VGRID, CAGI, GENIE++,
TrueGrid, PMAG, CUBIT, and Hybrid2d systems for
practical applications of interest to CEWES MSRC users.

Java for Scientific Computing

The objective of this course is to provide the participant with

  (a) an understanding of the high performance computing architecture,
      including the World Wide Web for visualization,

  (b) an overview of the Java language and its capabilities, and

  (c) enough programming details to do some examples.

High Performance Fortran (HPF) in Practice

This course will introduce programmers to the most important
features of HPF, including features inherited from Fortran 90,
the data parallel FORALL statement and INDEPENDENT assertion,
and data mapping by ALIGN and DISTRIBUTE directives.  The
instructor will illustrate how these features can be used
in practice on algorithms for scientific computation such as
LU decomposition and the conjugate gradient method.

Performance Optimization

This course will focus on the optimization of numerically
intensive codes for HPC systems.  The course will begin with a
quick overview of the basics of performance and processor
architecture. Then it will cover a wide variety of optimizations
geared towards enhancing processor performance. Topics
will include efficient use of the memory hierarchy and
functional units, amortization of loop overhead, and dependency
analysis. Common bottlenecks and caveats will be
discussed, as well as proposed solutions and the
logic behind them.
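
As an illustration only (it is not taken from the course materials),
the C sketch below shows two of the topics named above: traversing a
row-major array with unit stride to make better use of the memory
hierarchy, and unrolling a loop to amortize loop overhead.  The
function names are hypothetical, and any actual speedup depends on
the compiler and the machine.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an n-by-n matrix stored in row-major order.  The inner loop
   runs over columns, so consecutive iterations touch adjacent
   memory locations (stride 1). */
double sum_row_order(const double *a, size_t n)
{
    double s = 0.0;
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += a[i * n + j];
    return s;
}

/* Same sum with the loops interchanged.  The inner loop now strides
   through memory n elements at a time, which typically makes poorer
   use of cache lines when n is large. */
double sum_column_order(const double *a, size_t n)
{
    double s = 0.0;
    assert(a != NULL);
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += a[i * n + j];
    return s;
}

/* Dot product unrolled by four.  One loop test and one index update
   are shared by four multiply-adds, and the four partial sums are
   independent of one another. */
double dot_unrolled(const double *x, const double *y, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    assert(x != NULL && y != NULL);
    for (; i + 3 < n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; i++)          /* remainder when n is not a multiple of 4 */
        s0 += x[i] * y[i];
    return (s0 + s1) + (s2 + s3);
}
```

Both summation routines return the same value; they differ only in
the order in which memory is accessed.  Note that the unrolled dot
product adds terms in a different order than a simple loop, so its
result can differ in the last bits for general floating-point data.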

Topics in Finite Element Methodology for Nonlinear Problems

This course is broadly structured to cover different types of
applications from structures to fluids to heat transfer and
coupled problems that are of general interest to DoD.  Methodology
rather than specific applications is stressed.  Topics covered
include algorithms, nonlinear solution strategies, and integrating
solution with adaptive refinement.

A Tutorial on Designing and Building Parallel Programs

In this tutorial, the instructors will provide a comprehensive
introduction to the techniques and tools used to write
parallel programs.  First, the instructors will introduce
principles of parallel program design, touching upon relevant
topics in architecture, algorithms, and performance modeling.
Examples from well-established parallel programming
systems (HPF and MPI) will be included.  After the basic
material is covered, we will examine two new programming
systems for parallel machines, OpenMP and PETSc.

An Introduction to Fortran 90

This course is aimed at introducing engineers and scientists
familiar with Fortran 77 to the new features and capabilities
available in Fortran 90.  These new features include free
form source code, the CASE control structure, the ability to
create new data types, modules (similar to C++ classes), array
processing shortcuts, dynamic memory allocation, pointers,
improved I/O handling, and a host of new intrinsic functions.
Source code compatibility between Fortran 77 and Fortran 90
will also be discussed.

Scalable OpenMP Programming on Origin2000

This is an advanced course.  Topics to be covered are:

  1. Overview of OpenMP programming model
  2. Review of execution model
  3. Moving beyond incremental parallelization mode
  4. Domain decomposition
  5. Comparisons with message passing
  6. Performance optimization on Origin2000
  7. Preview of OpenMP C/C++ specification

Introduction to MSC/PATRAN - Modeling for Design Analysis

This is an introductory course for new and/or infrequent
MSC/PATRAN users. Students will master the basic skills required
to use MSC/PATRAN in a typical MCAE application. This course
emphasizes practical skills development through comprehensive,
hands-on laboratory sessions. Students will learn to build
analysis models using MSC/PATRAN by defining material properties
and creating boundary conditions, to submit their problems for
analysis, and to post-process the results using a variety of
graphical formats. Specific topics such as CAD integration,
geometry editing, meshing, grouping, and customization will be
covered. Users of all FEA codes are encouraged to attend since
MSC/PATRAN supports all the popular FEA codes such as MSC/NASTRAN,
MSC/DYTRAN, HKS/ABAQUS, ANSYS, LS-DYNA and many more.

Tango for Remote Consulting

NPAC's Tango is a Web collaboratory. The system extends
capabilities of Web browsers towards a fully interactive,
multimedia, collaborative environment. Tango is also a
framework for building collaboratory systems. In this
tutorial we will show how to use Tango and will
cover applications of Tango for remote consulting,
including all the critical software development phases:
coding, compiling, testing and debugging, and result analysis.

Parallel Debugging and Performance Analysis Tools: TotalView and Vampir

The goal of this course is to introduce parallel application
developers to parallel debugging and performance analysis
tools available on CEWES MSRC platforms, and to provide more
in-depth coverage of the TotalView debugger and Vampir
performance analysis tool.  The course will cover the basics
of using the tools as well as provide pointers to further
information.  A lab session will include practice on using
the tools on some example programs.  Debuggers to be covered
include Dolphin TotalView 3.8 for the SGI/Cray Origin2000
and IBM SP, Cray TotalView for the Cray T3E, SGI dbx for the
Origin2000, and pdbx for the IBM SP.  Dolphin TotalView has
a graphical interface while dbx and pdbx provide command-line
debugging interfaces.  The Cray version of TotalView for
the Cray T3E has both graphical and command-line interfaces.
An overview will be given of the various performance analysis
tools available on CEWES MSRC platforms, but the performance
analysis portion of the course will focus in detail on the
Vampir tool which has been recently acquired and is now
available on CEWES MSRC machines.

Techniques in Code Parallelization

The techniques needed to parallelize an algorithm and code are
described.  These include discretization methods, domain
decomposition, linear and nonlinear solver issues, mesh partitioning,
load balancing, preprocessing, and postprocessing.  Examples of
parallelization efforts carried out at the University of Texas will
be given.   Participants will also have a chance to bring their
"dusty deck" codes for discussion on how best to migrate them
to parallel platforms.
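
As an illustration only (it is not taken from the course materials),
the C sketch below shows the simplest form of domain decomposition
with load balancing: n grid points are divided among nprocs processes
in contiguous blocks whose sizes differ by at most one.  The type and
function names are hypothetical.  Real mesh partitioners must also
consider communication between neighboring subdomains, which this
sketch ignores.

```c
#include <assert.h>

/* Half-open index range [begin, end) owned by one process. */
typedef struct {
    long begin;
    long end;
} block_range;

/* Divide n items among nprocs processes in contiguous blocks.
   Every process receives n / nprocs items, and the first
   n % nprocs processes each receive one extra item. */
block_range block_partition(long n, int nprocs, int rank)
{
    block_range r;
    long base, extra;

    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);

    base  = n / nprocs;
    extra = n % nprocs;

    /* Ranks below 'extra' each hold one additional item, so the
       start of this block is shifted by min(rank, extra). */
    r.begin = rank * base + (rank < extra ? rank : extra);
    r.end   = r.begin + base + (rank < extra ? 1 : 0);
    return r;
}
```

For example, 10 points on 3 processes yield the ranges [0, 4),
[4, 7), and [7, 10).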

Parallel Programming on the Origin2000 using OpenMP

This "how-to" workshop is designed to train the participants
in the techniques and tools required to perform parallel
programming using OpenMP directives on the Origin2000 (O2K).
After a discussion of the MIPS R10000 processor and the O2K
architecture, and an introduction to how the IRIX operating system
creates and schedules parallel threads, the OpenMP
directives will be discussed in detail along with examples
of their use. The course will conclude with the equally important
topic of how to distribute the data used by parallelized OpenMP
regions among the local memories on the O2K.
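
As an illustration only (it is not taken from the course materials),
the C sketch below shows the kind of loop-level OpenMP directives
this workshop discusses.  The function names are hypothetical.  The
comment about data placement is an assumption: it applies only to
systems that place a memory page near the processor that first
touches it, and the sketch does not use any Origin2000-specific
placement directives.  A compiler without OpenMP support ignores
the directives and runs the loops serially.

```c
#include <assert.h>
#include <stddef.h>

/* Initialize the arrays in parallel with a static schedule.
   Assumption: on a system with first-touch page placement, using
   the same static schedule here and in the compute loops tends to
   keep each thread's portion of the data in nearby memory. */
void init_parallel(int n, double *x, double *y)
{
    int i;
    assert(n >= 0 && x != NULL && y != NULL);
    #pragma omp parallel for schedule(static)
    for (i = 0; i < n; i++) {
        x[i] = 1.0;
        y[i] = 2.0;
    }
}

/* y = y + alpha * x.  The iterations are independent, so the loop
   can be divided among threads with a single directive. */
void axpy(int n, double alpha, const double *x, double *y)
{
    int i;
    assert(n >= 0 && x != NULL && y != NULL);
    #pragma omp parallel for schedule(static)
    for (i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Dot product.  The reduction clause gives each thread a private
   partial sum and combines the partial sums at the end of the loop. */
double dot(int n, const double *x, const double *y)
{
    double sum = 0.0;
    int i;
    assert(n >= 0 && x != NULL && y != NULL);
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

Because the order in which a reduction combines partial sums is not
fixed, the result of dot can differ in the last bits from a serial
run for general floating-point data.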

Computational Monitoring Using CUMULVS

Computational monitoring lets you visualize simulation output
while your computation executes. This can be useful for users
with codes that produce very large output files, or when you
want to stop a run that is not progressing satisfactorily.
This "how-to" workshop will educate participants in the
techniques and procedures required to perform interactive
computation using currently available tools. The workshop
will include:

  1. An overview and discussion of available tools, commercial or freeware
  2. An introduction to CUMULVS from Oak Ridge National Lab
  3. Detailed steps for how to instrument your code to use CUMULVS
  4. Discussion of possibilities for computational monitoring
     for participants' codes.

Interactive Structured Time-varying Visualizer (ISTV)

This tutorial gives an introduction to the Interactive
Structured Time-varying Visualizer (ISTV), an OpenGL-based
scientific visualization package available on IRIX and
Solaris.  ISTV is an interactive visualization system that
visualizes time-varying multiblock (or multigrid) simulations
on time-varying grids.  ISTV's genesis was in the need for a
toolkit to visualize data from high-resolution ocean models.
By exploiting modularity and plug-ins, the scientist has the
ability to tailor the ISTV visualization system to the needs
of a particular discipline or problem without having to write
a completely new system.

WebFlow: Web Interfaces for Computational Modules

In this course we will present the WebFlow system developed at
the Northeast Parallel Architectures Center (NPAC) at Syracuse
University. This system addresses the need for high-level
programming environments and tools to support distance
computing on heterogeneous, distributed platforms.
During the course we will describe and demonstrate the WebFlow
system. This will include background information on CORBA
and developing CORBA objects in Java. We will present the
architecture of WebFlow, discuss its security model, and describe
methods of providing seamless access to remote resources. The
course will focus on applying WebFlow to the users'
applications. We will explain how to customize the WebFlow
front-end to the needs of a particular application, and how to
invoke and control the users' computational modules.