Tuesday: PSE

PSE is NOT domain specific: algorithms, software deployment, standards
Linux is the future!!! (as the multiple-vendor O/S), e.g. SGI O/S work "wasted"
POOMA not part of milepost; it is part of PSE
Snow: small, same O/S, 128 CPUs
White: 2176 nodes, merged February 2001 as White
Frost: 6016 nodes
Upgrade: 16 gigabytes per 16-processor node
2 more small machines coming: Troutbeck, Mohonk
Mohonk2 is the software machine
Stable and 100% used
Unclear planning process
POOMA not in, as all left
No timeline for what works when
Unclear analysis of what "holds users up"
Zosel slide 7 is guidelines
ALE3D: should be flat at 2 OpenMP, flat at 6 MPI -- doesn't scale
LANL: lots of F90; UPS and PGSLib are high-level messaging
SAGE scaling looks rubbish (Brown talk); Y axis is number of cells (i.e. 1/time), NOT time
Sandia code CTH: F77, "easy"
Allegra -- doesn't work
MPQC chemistry: MPI blocks threads when waiting; multi-threaded to overlap communication/calculation (a generic overlap sketch follows these notes)
Is this Monte Carlo and so little communication? Similar to Global Arrays
Not very thoughtful analysis of (MPI) scaling results
MPI analysis not part of milepost; working performance issues not part of milepost
3 OpenMP/MPI, 6 MPI only, 1 Pthreads
F90 stresses features; most difficulties are C++ or F90 "flaky features"
Don't use PGI and GNU?
Some mileposts satisfied rather perfunctorily
Descriptions focused on convincing us the milepost was met, not on what the intellectual issue was
Use XML as tool interface
---- These are essentially serial
Profile 5 applications, 3 at scale
ZeroFault: originally serial, slows down by a factor of 5 to 20; NT / AIX only
GreatCircle: factor of 1.1 to 2
----- End essentially serial
TotalView scaling? Is it that important?
Success story
Lifecycle: good enough? Across enough platforms?
Relevance of mileposts; other committees reviewed mileposts
Frameworks are not very successful
Hardware counter multiplexing: 4 counters on Power604e/Power3, each can only count a subset of events -- you choose which (a counter-multiplexing sketch follows these notes)
Software from IBM makes counters track processes, not CPUs (shared-memory migration)
Sphinx test suite: added MPI collective and Pthreads tests to SkaMPI; Allreduce problems; 2 conference papers (an Allreduce timing sketch follows these notes)
mpiP: MPI hotspot identification tool
Art Hale / Steven Humphreys

GCE **********
Pamela Cotes *********
Sphinx to PEMCS *******
Gannon Dzierba ********
STI Physics ASCI call ***********
Derek Moses *********
ASCI DisCom System to PET *******
Gaitros 3 students *********
Anabas local operation ********
Parcel to mailbox *************
IEEE Hussaini sorry ********
April 25-27 trip
April 30 - May 1 trip
Send Garnet to Art/Stephen ******

Design of Replay ****************
User on vertical scale, time on horizontal scale
Events marked with color code
Can zoom and click to see application
See my email to ho/yin

CPandE referees ******************
Sandia Kerberos delivered with Globus *************

PET Quality *******
So I think we can get material of the right quality. Roughly we want 1 per Focus Area. I suggest:
a) Agree on a small committee -- how about you, me, Pritchard, Perkins? This committee somehow writes the "Overview of HPCMO/PET" paper.
b) Invite each FA lead to suggest 1 or 2 papers.
c) Invite each MSRC to suggest up to 5.
d) Integrate b) and c) and contact potential authors in some ranked order.
Depending on interest, we should get from 6-15 papers.
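The MPQC item above is really a general pattern: rather than sitting in a blocking MPI call, post the communication, do independent work, and only then wait. A minimal generic sketch in C of that overlap idea (my illustration, with made-up buffer names; not MPQC's actual thread-based scheme, which the talk did not detail):

```c
/* Generic sketch of overlapping communication with computation using
 * nonblocking MPI.  This is NOT the MPQC code; it only illustrates the
 * idea noted above (avoid blocking in MPI while useful work remains). */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

static double halo[N];   /* data arriving from a neighbour (illustrative)       */
static double local[N];  /* data we can work on while the halo is in flight     */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Request rreq, sreq;
    double sum = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    /* Post the receive early and start the matching send ... */
    MPI_Irecv(halo, N, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &rreq);
    MPI_Isend(local, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &sreq);

    /* ... and do purely local work while the messages are in flight. */
    for (int i = 0; i < N; i++)
        sum += local[i] * local[i];

    /* Only now block, when the halo data is really needed. */
    MPI_Wait(&rreq, MPI_STATUS_IGNORE);
    MPI_Wait(&sreq, MPI_STATUS_IGNORE);
    for (int i = 0; i < N; i++)
        sum += halo[i];

    printf("rank %d partial sum %g\n", rank, sum);
    MPI_Finalize();
    return 0;
}
```

Build and run with the usual mpicc/mpirun; the whole point is where MPI_Wait sits relative to the local loop.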
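The counter discussion above (4 counters, each restricted to a subset of events) is what multiplexing works around: time-slice more events than the hardware can count at once and scale up the totals. A hedged sketch using the portable PAPI library as a stand-in -- PAPI is my choice for illustration, not the IBM counter software mentioned in the notes:

```c
/* Illustration of hardware-counter multiplexing with PAPI.  PAPI is used
 * here only as a familiar, portable stand-in for the vendor counter
 * software discussed above; the event choices are arbitrary. */
#include <papi.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int evset = PAPI_NULL;
    long long counts[3];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) exit(1);
    PAPI_multiplex_init();                   /* enable time-sliced counting    */
    PAPI_create_eventset(&evset);
    PAPI_assign_eventset_component(evset, 0);
    PAPI_set_multiplex(evset);               /* more events than real counters */

    PAPI_add_event(evset, PAPI_TOT_CYC);     /* cycles                         */
    PAPI_add_event(evset, PAPI_TOT_INS);     /* instructions                   */
    PAPI_add_event(evset, PAPI_FP_INS);      /* floating-point instructions    */

    PAPI_start(evset);
    volatile double x = 0.0;
    for (long i = 0; i < 10000000; i++) x += i * 1e-9;   /* work to measure    */
    PAPI_stop(evset, counts);

    printf("cyc=%lld ins=%lld fp=%lld (multiplexed, so these are estimates)\n",
           counts[0], counts[1], counts[2]);
    return 0;
}
```

Because the events are sampled in rotation, the reported totals are statistical estimates, which is the price of getting around the 4-counter limit.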
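The Sphinx/SkaMPI item above is about measuring MPI collectives; the Allreduce problems were found with essentially this kind of timing loop. A much-simplified sketch (SkaMPI itself adds repeated measurements, outlier control, and careful process synchronization that are omitted here):

```c
/* Very simplified MPI_Allreduce timing loop in the spirit of the
 * Sphinx/SkaMPI collective tests noted above (no outlier handling,
 * no adaptive repetition -- just the basic idea). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int reps = 100;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (int n = 1; n <= 1 << 20; n *= 4) {          /* message sizes in doubles */
        double *in  = malloc(n * sizeof(double));
        double *out = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) in[i] = 1.0;

        MPI_Barrier(MPI_COMM_WORLD);                  /* rough synchronization   */
        double t0 = MPI_Wtime();
        for (int r = 0; r < reps; r++)
            MPI_Allreduce(in, out, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        double t = (MPI_Wtime() - t0) / reps;

        double tmax;                         /* slowest rank defines the time    */
        MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("%8d doubles: %.3e s per Allreduce\n", n, tmax);

        free(in); free(out);
    }
    MPI_Finalize();
    return 0;
}
```

Reporting the maximum over ranks reflects the usual convention that the slowest process defines the time of a collective.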
OpenMP/MPI integration: issue with MPI being done in single-threaded mode (a minimal hybrid sketch follows these notes)
Stress test GPFS (General Parallel File System): network, NOT disk, is the limit
Would like something higher level than MPI-IO, such as HDF5
PSE is the dominant HPSS funding source
Use Gigabit Ethernet from machine to HPSS
91% writes, 9% reads
Don't go through GPFS -- direct to HPSS
HPSS not necessarily backed up; GPFS not automatically backed up
Sierra DCE: not high performance
Sierra has the problem of the Kerberos ticket expiring while a long job runs
Aiming to build glued applications from a single code source
PSE local software: no databases, no portable GPFS
No more new codes for 2004 -- maybe new codes for 2010
Budgets (per year):
PSE $51.7M, split three ways
DisCom $46.5M, mainly Sandia (including cluster computing)
VIEWS $70.5M, plus $10M platform at Livermore
PathForward $32M
ASCI total $700M: 1/3 PSE/DisCom/VIEWS/PathForward, 1/3 platforms, 1/3 apps
$25M Alliances
"Grid Services"?
Scripts embody old site-specific UIs
Dollars!
Why is Q called Q and not a color?
Is debug local or remote? Mainly local
--- PSE
Problem to get unified tools? Is this a technical or a political decision?
New Ice machine at LLNL for code testing so White stays in production; IBM can't deliver a compatible machine
Telecons -- why not WebEx?
VIEWS -- move data locally, then analyse

** Peer-to-Peer Earthquake Simulation ****************
Science scenarios
Computational Science Environment
Peer-to-Peer, Object Web, Computational Grid
Garnet
----------------------------------------------------------------

Visualization: REMOTE/collaborative visualization
Need to use parallel channels to HPSS
Used to go through gateways -- bottleneck
Currently 30 users ONLY
NSA provides chips for encryption; as Top Secret, by law must use NSA
4-way parallel; unique IP on each path
Networking is state of the art -- software is NOT
HPSS needs a lot of acknowledgements, as tapes can run out!
Modify this, as the WAN has a much longer round-trip time
FTP versus NFS (security, parallelism, optimized protocols removing round-trip blocking messages)
Encrypters do not encrypt ATM headers
April 00 RFQ: 3-year contract, as prices are going down
2 links are about $3M a year
ATM and GigE connections; implies Cisco 8540, not Juniper
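The single-threaded-MPI issue noted at the top of this section is what MPI's thread-support levels address. A minimal hybrid sketch, assuming the funneled model (OpenMP threads do the node-local work, only the master thread calls MPI), which is the usual workaround when full MPI_THREAD_MULTIPLE is not available:

```c
/* Minimal hybrid MPI + OpenMP sketch illustrating the thread-mode issue
 * noted above: request MPI_THREAD_FUNNELED so OpenMP threads compute but
 * only the master thread makes MPI calls. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    int provided, rank;
    static double a[N];
    double local = 0.0, global = 0.0;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (provided < MPI_THREAD_FUNNELED && rank == 0)
        fprintf(stderr, "warning: MPI library is effectively single threaded\n");

    /* Threads share the node-local work ... */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < N; i++) {
        a[i] = (double)i / N;
        local += a[i];
    }

    /* ... but MPI is called only from the (single) master thread. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %g\n", global);

    MPI_Finalize();
    return 0;
}
```

Compile with something like mpicc -fopenmp; if `provided` comes back as MPI_THREAD_SINGLE, the implementation makes no thread guarantees at all, which is exactly the integration problem noted above.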
A machine a year!!!
April 2002: 30T Q machine, Los Alamos
2003: 20T Red Storm, Sandia
2004: 60T, Livermore
2005: 100T, Los Alamos
Linux needed to make this more rational
Wallach petaflop has Java
Do we have the money to afford so much hardware?
Use Remedy database for trouble tickets
Not using WebEx or equivalent for remote desktop access -- security problem
Grid Services: 16 software developers
DRM currently not at LLNL
NASA: brokers used most in parameter studies
DPCS is the LC scheduling system
DRM services separate from Globus services; DRM is CORBA based
Added Kerberos to Globus
GSF is the lower-level accredited security system; put Kerberos or PKI on top of it
Currently XML scripts built by developers
GALE access language
WFMQ industry workflow
Condor
Browne-like dataflow graphs
Automated backup/restart wanted by users
XML for TotalView and other tool events ****
Computational steering -- can't be validated
Grid Service rollout
Join GCE WG
Tutorial by user
DRM as a global file system
What does the Sandia CORBA system do? (It is aimed at smaller computations)
"ASCI does not do databases"
Vision is a "continuous process"
Users manual same as a requirements document
Hero-driven 5-person code teams; can't make 20-40 person teams work
$250M a year on codes -- is there a better way?
POOMA; Quinlan A++
Large groups have failed; frameworks have failed
PSE 30 FTE at LLNL, mainly HPSS
Purify liked by everybody; does not work on AIX
Limited by people, not dollars
Distance Computing Solutions $20.5M: $4M routers, $3.7M WAN SecureNet, $2M NSA, $2M plants (Y12, KC), rest people
Integration of Computing Resources $9.5M: $2.5M CR process, $4.3M Grid Services, $1M security
Computational Capacity $17M: $10M Cplant hardware, $5M software, $2M Cplant integration
Program Management $1M
Parallel FTP?
Napster BDS

I appreciate your very reasonable concerns and I am happy to either talk on the phone or visit the department -- I am on travel early next week but I could visit next Thursday, for instance. Please call my cell phone 315 254 6387. Here is a partial set of remarks addressing some of your issues.

All my work in computer science is application driven and exploits my physics training and expertise. A major activity of this type in physics was the work I did in numerical relativity (a Syracuse speciality), where I helped design and implement the simulation environment. Over the last few years, I have been collaborating with colleagues (from my Caltech days) at JPL and others on earthquake simulations (GEM). This has substantial involvement from the physics community, as earthquakes are a very interesting "complex system" -- an area I view as promising for physics. I expect this work to grow in interest and importance.

I have submitted a proposal with Alex Dzierba to develop an advanced environment for the HallD experiment. This activity makes extensive use of my physics knowledge and illustrates my goal to perform interdisciplinary research that exploits my understanding of both applications and computer science. I hope we can make this a major thrust of the "IPCRES lab", and I would be honored to be a member of his experiment. This research will benefit other experiments such as Atlas. Physics graduate students for both GEM and the experimental science "informatics" would be great.

I have done a lot of work on web-based education and would certainly like to work on this with the department. I am happy to serve on committees as long as the total among the multiple departments is satisfactory.
I was a reasonable department chair in physics at Caltech (called executive officer) and understand the importance of this. Teaching for an "IPCRES" position seems a little peculiar, and I am not sure I understand this correctly, but I intend to continue teaching. For physics, a web-based course on "Computational Physics" or "Physics Informatics" would be natural. I would appreciate a joint appointment including physics, as this reflects my interest in and commitment to interdisciplinary research and education.