Tuesday: PSE

PSE is NOT domain specific: algorithms, software deployment, standards
Linux is the future!!! (as the multiple-vendor O/S), e.g. SGI O/S work "wasted"
POOMA not part of milepost; it is part of PSE
Snow: small, same O/S, 128 CPUs
White: 2176 nodes, merged February 2001 as White
Frost: 6016 nodes
Upgrade: 16 gigabytes per 16-processor node
2 more small machines coming: Troutbeck, Mohonk
Mohonk2 is the software machine
Stable and 100% used
Unclear planning process
POOMA not in, as all left
No timeline for what works when
Unclear analysis of what "holds users up"
Zosel slide 7 is guidelines
ALE3D: should be flat at 2 OpenMP, flat at 6 MPI -- doesn't scale
LANL: lots of F90; UPS and PGSLib are high-level messaging
SAGE scaling looks rubbish (Brown talk); Y axis is number of cells (i.e. 1/time), NOT time
Sandia code CTH: F77, "easy"
Allegra -- doesn't work
MPQC chemistry: MPI blocks threads when waiting; multi-threaded to overlap communication/calculation (a generic overlap sketch follows these notes)
Is this Monte Carlo and so little communication? Similar to Global Arrays
Not very thoughtful analysis of (MPI) scaling results
MPI analysis not part of milepost; working performance issues not part of milepost
3 OpenMP/MPI, 6 MPI only, 1 Pthreads
F90 stresses features; most difficulties are C++ or F90 "flaky features"
Don't use PGI and GNU?
Some mileposts satisfied rather perfunctorily
Descriptions focused on convincing us the milepost was met, not on what the intellectual issue was
Use XML as tool interface
---- These are essentially serial
Profile 5 applications, 3 at scale
ZeroFault: originally serial, slows down by a factor of 5 to 20; NT / AIX only
GreatCircle: factor of 1.1 to 2
----- End essentially serial
TotalView scaling? Is it that important?
Success story
Lifecycle: good enough? Across enough platforms?
Relevance of mileposts; other committees reviewed mileposts
Frameworks are not very successful
Hardware counter multiplexing: 4 counters on Power604e/Power3, each can only count a subset of events -- you choose which (a counter-multiplexing sketch follows these notes)
Software from IBM makes counters track processes, not CPUs (shared-memory migration)
Sphinx test suite: added MPI collective and Pthreads tests to SkaMPI; Allreduce problems; 2 conference papers (an Allreduce timing sketch follows these notes)
mpiP: MPI hotspot identification tool
Art Hale / Steven Humphreys

GCE **********
Pamela Cotes *********
Sphinx to PEMCS *******
Gannon Dzierba ********
STI Physics ASCI call ***********
Derek Moses *********
ASCI DisCom System to PET *******
Gaitros 3 students *********
Anabas local operation ********
Parcel to mailbox *************
IEEE Hussaini sorry ********
April 25-27 trip
April 30 - May 1 trip
Send Garnet to Art/Stephen ******

Design of Replay ****************
User on vertical scale, time on horizontal scale
Events marked with color code
Can zoom and click to see application
See my email to ho/yin

CPandE referees ******************
Sandia Kerberos delivered with Globus *************

PET Quality *******
So I think we can get material of the right quality. Roughly we want 1 per Focus Area. I suggest:
a) Agree on a small committee -- how about you, me, Pritchard, Perkins? This committee somehow writes the "Overview of HPCMO/PET" paper.
b) Invite each FA lead to suggest 1 or 2 papers.
c) Invite each MSRC to suggest up to 5.
d) Integrate b) and c) and contact potential authors in some ranked order.
Depending on interest, we should get from 6-15 papers.
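The MPQC item above is really a general pattern: rather than sitting in a blocking MPI call, post the communication, do independent work, and only then wait. A minimal generic sketch in C of that overlap idea (my illustration, with made-up buffer names; not MPQC's actual thread-based scheme, which the talk did not detail):

```c
/* Generic sketch of overlapping communication with computation using
 * nonblocking MPI.  This is NOT the MPQC code; it only illustrates the
 * idea noted above (avoid blocking in MPI while useful work remains). */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

static double halo[N];   /* data arriving from a neighbour (illustrative)       */
static double local[N];  /* data we can work on while the halo is in flight     */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Request rreq, sreq;
    double sum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    /* Post the receive early and start the matching send ... */
    MPI_Irecv(halo, N, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &rreq);
    MPI_Isend(local, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &sreq);

    /* ... and do purely local work while the messages are in flight. */
    for (int i = 0; i < N; i++)
        sum += local[i] * local[i];

    /* Only now block, when the halo data is really needed. */
    MPI_Wait(&rreq, MPI_STATUS_IGNORE);
    MPI_Wait(&sreq, MPI_STATUS_IGNORE);
    for (int i = 0; i < N; i++)
        sum += halo[i];

    printf("rank %d partial sum %g\n", rank, sum);
    MPI_Finalize();
    return 0;
}
```

Build and run with the usual mpicc/mpirun; the whole point is where MPI_Wait sits relative to the local loop.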
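The counter discussion above (4 counters, each restricted to a subset of events) is what multiplexing works around: time-slice more events than the hardware can count at once and scale up the totals. A hedged sketch using the portable PAPI library as a stand-in -- PAPI is my choice for illustration, not the IBM counter software mentioned in the notes:

```c
/* Illustration of hardware-counter multiplexing with PAPI.  PAPI is used
 * here only as a familiar, portable stand-in for the vendor counter
 * software discussed above; the event choices are arbitrary. */
#include <papi.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int evset = PAPI_NULL;
    long long counts[3];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) exit(1);
    PAPI_multiplex_init();                   /* enable time-sliced counting    */
    PAPI_create_eventset(&evset);
    PAPI_assign_eventset_component(evset, 0);
    PAPI_set_multiplex(evset);               /* more events than real counters */

    PAPI_add_event(evset, PAPI_TOT_CYC);     /* cycles                         */
    PAPI_add_event(evset, PAPI_TOT_INS);     /* instructions                   */
    PAPI_add_event(evset, PAPI_FP_INS);      /* floating-point instructions    */

    PAPI_start(evset);
    volatile double x = 0.0;
    for (long i = 0; i < 10000000; i++) x += i * 1e-9;   /* work to measure    */
    PAPI_stop(evset, counts);

    printf("cyc=%lld ins=%lld fp=%lld (multiplexed, so these are estimates)\n",
           counts[0], counts[1], counts[2]);
    return 0;
}
```

Because the events are sampled in rotation, the reported totals are statistical estimates, which is the price of getting around the 4-counter limit.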
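The Sphinx/SkaMPI item above is about measuring MPI collectives; the Allreduce problems were found with essentially this kind of timing loop. A much-simplified sketch (SkaMPI itself adds repeated measurements, outlier control, and careful process synchronization that are omitted here):

```c
/* Very simplified MPI_Allreduce timing loop in the spirit of the
 * Sphinx/SkaMPI collective tests noted above (no outlier handling,
 * no adaptive repetition -- just the basic idea). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 100;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int n = 1; n <= 1 << 20; n *= 4) {          /* message sizes in doubles */
        double *in  = malloc(n * sizeof(double));
        double *out = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) in[i] = 1.0;

        MPI_Barrier(MPI_COMM_WORLD);                  /* rough synchronization   */
        double t0 = MPI_Wtime();
        for (int r = 0; r < reps; r++)
            MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        double t = (MPI_Wtime() - t0) / reps;

        double tmax;                         /* slowest rank defines the time    */
        MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("%8d doubles: %.3e s per Allreduce\n", n, tmax);

        free(in); free(out);
    }
    MPI_Finalize();
    return 0;
}
```

Reporting the maximum over ranks reflects the usual convention that the slowest process defines the time of a collective.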
OpenMP/MPI integration: issue with MPI being done in single-threaded mode (a minimal hybrid sketch follows these notes)
Stress test GPFS (General Parallel File System): network, NOT disk, is the limit
Would like something higher level than MPI-IO, such as HDF5
PSE is the dominant HPSS funding source
Use Gigabit Ethernet from machine to HPSS
91% writes, 9% reads
Don't go through GPFS -- direct to HPSS
HPSS not necessarily backed up; GPFS not automatically backed up
Sierra DCE: not high performance
Sierra has the problem of the Kerberos ticket expiring while a long job runs
Aiming to build glued applications from a single code source
PSE local software: no databases, no portable GPFS
No more new codes for 2004 -- maybe new codes for 2010
Budgets (per year):
PSE $51.7M, split three ways
DisCom $46.5M, mainly Sandia (including cluster computing)
VIEWS $70.5M, plus $10M platform at Livermore
PathForward $32M
ASCI total $700M: 1/3 PSE/DisCom/VIEWS/PathForward, 1/3 platforms, 1/3 apps
$25M Alliances
"Grid Services"?
Scripts embody old site-specific UIs
Dollars!
Why is Q called Q and not a color?
Is debug local or remote? Mainly local
--- PSE
Problem to get unified tools? Is this a technical or a political decision?
New Ice machine at LLNL for code testing so White stays in production; IBM can't deliver a compatible machine
Telecons -- why not WebEx?
VIEWS -- move data locally, then analyse

** Peer-to-Peer Earthquake Simulation ****************
Science scenarios
Computational Science Environment
Peer-to-Peer, Object Web, Computational Grid
Garnet
----------------------------------------------------------------

Visualization: REMOTE/collaborative visualization
Need to use parallel channels to HPSS
Used to go through gateways -- bottleneck
Currently 30 users ONLY
NSA provides chips for encryption; as Top Secret, by law must use NSA
4-way parallel; unique IP on each path
Networking is state of the art -- software is NOT
HPSS needs a lot of acknowledgements, as tapes can run out!
Modify this, as the WAN has a much longer round-trip time
FTP versus NFS (security, parallelism, optimized protocols removing round-trip blocking messages)
Encrypters do not encrypt ATM headers
April 00 RFQ: 3-year contract, as prices are going down
2 links are about $3M a year
ATM and GigE connections; implies Cisco 8540, not Juniper
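The single-threaded-MPI issue noted at the top of this section is what MPI's thread-support levels address. A minimal hybrid sketch, assuming the funneled model (OpenMP threads do the node-local work, only the master thread calls MPI), which is the usual workaround when full MPI_THREAD_MULTIPLE is not available:

```c
/* Minimal hybrid MPI + OpenMP sketch illustrating the thread-mode issue
 * noted above: request MPI_THREAD_FUNNELED so OpenMP threads compute but
 * only the master thread makes MPI calls. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    int provided, rank;
    static double a[N];
    double local = 0.0, global = 0.0;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (provided < MPI_THREAD_FUNNELED && rank == 0)
        fprintf(stderr, "warning: MPI library is effectively single threaded\n");

    /* Threads share the node-local work ... */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < N; i++) {
        a[i] = (double)i / N;
        local += a[i];
    }

    /* ... but MPI is called only from the (single) master thread. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %g\n", global);

    MPI_Finalize();
    return 0;
}
```

Compile with something like mpicc -fopenmp; if `provided` comes back as MPI_THREAD_SINGLE, the implementation makes no thread guarantees at all, which is exactly the integration problem noted above.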
A machine a year!!!
April 2002: 30T Q machine, Los Alamos
2003: 20T Red Storm, Sandia
2004: 60T, Livermore
2005: 100T, Los Alamos
Linux needed to make this more rational
Wallach petaflop has Java
Do we have the money to afford so much hardware?
Use Remedy database for trouble tickets
Not using WebEx or equivalent for remote desktop access -- security problem
Grid Services: 16 software developers
DRM currently not at LLNL
NASA: brokers used most in parameter studies
DPCS is the LC scheduling system
DRM services separate from Globus services; DRM is CORBA based
Added Kerberos to Globus
GSF is the lower-level accredited security system; put Kerberos or PKI on top of it
Currently XML scripts built by developers
GALE access language
WFMQ industry workflow
Condor
Browne-like dataflow graphs
Automated backup/restart wanted by users
XML for TotalView and other tool events ****
Computational steering -- can't be validated
Grid Service rollout
Join GCE WG
Tutorial by user
DRM as a global file system
What does the Sandia CORBA system do? (It is aimed at smaller computations)
"ASCI does not do databases"
Vision is a "continuous process"
Users manual same as a requirements document
Hero-driven 5-person code teams; can't make 20-40 person teams work
$250M a year on codes -- is there a better way?
POOMA; Quinlan A++
Large groups have failed; frameworks have failed
PSE 30 FTE at LLNL, mainly HPSS
Purify liked by everybody; does not work on AIX
Limited by people, not dollars
Distance Computing Solutions $20.5M: $4M routers, $3.7M WAN SecureNet, $2M NSA, $2M plants (Y12, KC), rest people
Integration of Computing Resources $9.5M: $2.5M CR process, $4.3M Grid Services, $1M security
Computational Capacity $17M: $10M Cplant hardware, $5M software, $2M Cplant integration
Program Management $1M
Parallel FTP?
Napster BDS

I appreciate your very reasonable concerns and I am happy to either talk on the phone or visit the department -- I am on travel early next week but I could visit next Thursday, for instance. Please call my cell phone 315 254 6387. Here is a partial set of remarks addressing some of your issues.

All my work in computer science is application driven and exploits my physics training and expertise. A major activity of this type in physics was the work I did in numerical relativity (a Syracuse speciality), where I helped design and implement the simulation environment. Over the last few years, I have been collaborating with colleagues (from my Caltech days) at JPL and others on earthquake simulations (GEM). This has substantial involvement from the physics community, as earthquakes are a very interesting "complex system" -- an area I view as promising for physics. I expect this work to grow in interest and importance.

I have submitted a proposal with Alex Dzierba to develop an advanced environment for the HallD experiment. This activity makes extensive use of my physics knowledge and illustrates my goal to perform interdisciplinary research that exploits my understanding of both applications and computer science. I hope we can make this a major thrust of the "IPCRES lab", and I would be honored to be a member of his experiment. This research will benefit other experiments such as Atlas. Physics graduate students for both GEM and the experimental science "informatics" would be great.

I have done a lot of work on web-based education and would certainly like to work on this with the department. I am happy to serve on committees as long as the total among the multiple departments is satisfactory.
I was a reasonable department chair in physics at Caltech (called executive officer) and understand the importance of this. Teaching for an "IPCRES" position seems a little peculiar, and I am not sure I understand this correctly, but I intend to continue teaching. For physics, a web-based course on "Computational Physics" or "Physics Informatics" would be natural. I would appreciate a joint appointment including physics, as this reflects my interest in and commitment to interdisciplinary research and education.