CRPC Book ****************************
Bill Gropp examples including figures
Pthreads PVM MPI
Give John Salmon Hell
John Rice on PSE Chapter
Give CRPC Web Site

CpandE *********************
Problem reflects field.
There is a group of people.
No clear technology focus
Name Information Tech: P and E
Section in an Issue
Number of Issues 15 by Administrative Process
Software Change Mechanism
Ninja:
a) Get short Whitepaper on collaborative portals
b) Contact Matt Welch
Presentationpro.com PowerPoint Templates
Windows NT Laptop from Dell
Delta Crown room
Review Sprint 30 minute limit
Light for PC
White paper on what to do
Invite people to FSU
Heather meet friends and school
Many ITR
Coupled Ocean
Tom Green C little Java DQS Masters in CS BS in business
Jim Carr Java Applets -- Web Education
Rutherford Scattering; Baseball home runs
Teaching Science Tally Gifted Middle School
Geography GIS
Fuzzy Searches
Salary Scales Staff Untenured Faculty
Joe Thompson says Dennis Duke saved SDSC Why didn't NSF reward him.
(at NCAR) Wally at FSU Good (Ken Kennedy)
ISI at USC Doing distance education using AI and multimedia
Lori (Ken's Student) doing new math education
Mascagni doing Analysis of Algorithms in distance education
Dennis Duke says very good
IP and Indemnification issues
FSU was on original team
Indemnification moot for state university due to separation of federal and state laws

CEWES $5K ****************
e books
GPS
Contact Sia re ESCAPE 2000
Alo Java Academy working with Florida A and M
Jim Bottum: why go to FSU
RCI: Check CDROM OK (20% blank)

Sun Meeting November 14 99 **********************
Erik Hagersten: Scalability
Based on TMC and Cray technology
One Scalable Interconnect: 2 Personalities
SMPplus Single Solaris Image across multiple boxes
NUMA coherent caching not efficient
Coherent Memory (as opposed to coherent cache) Replication: pages replicated for performance by viewing local memory as a cache
Pages that could benefit from this are identified automatically
Fat Nodes are important here as overheads lower
Good TPC-C results
High Performance Clustering Fast MPI and OpenMP < 2000 CPU
Use special switch combined with CMR hardware memory to memory block transfer primitives
10,000 nodes: Use extra SMP boxes as switch nodes with backplane on each at 43 Gigabytes/sec
Bill Nesheim: Wildcat Scalable Systems
Wildcat has link ASIC with optical interconnects
Links up to 50 meters with decreasing performance as length increases
Can link through PCI or CPU interfaces.
Also High performance 8 port switch for more than 3 to 4 units
In between extremes use multiple switches
Use Starcat as a Switch for your 5 teraflop system!
Wildcat supports 2 personalities
Fault Tolerance
1 (coherent memory): up to 16 Serengeti nodes, max 384 nodes
2 (clustering): up to 162 Serengeti/Starcat
3 (600 ns) to 1 (200 ns) ratio remote to local latency
Solaris has special locality support
Emerging standard in O/S bypass messaging -- not yet
VIA not really this
So use internal standard
Available 2001
CMR mode will support single Java VM
Base Systems are:
Serengeti 24 way maximum, next generation of E3500-E6000
Starcat 72 way maximum (18 boards, 4 CPUs per board), this is next generation E10000
Performance Lisa Noordergraaf
Lot of OpenMP directives
Red-Black iteration. Performance increases with time due to migration.
Serengeti and Starcat Tai Quan
3 times faster than EX500
Fault Isolation/Detection/Tolerance
SP 8 nodes rack mountable, MD 12, ME 12, DC 24, Starcat 72
Cheetah UltraSparc III 600 MHz used initially, III+ 750 MHz, IV 900 MHz, IV+ 1050 MHz
2 flops per cycle
SP: 8 CPU, 64 gigabytes memory, 4 per rack, 12 hot pluggable CompactPCI (no PCI)
MD: Deskside machine runs off 110 volts, 12 CPU, 96 gigabytes memory, PCI or cPCI
ME: Datacenter Rack similar to MD, 220 volts
DC: Datacenter, 24 CPU, 192 gigabytes memory
Starcat: 72 to 104 CPUs, 576 gigabytes, 72 PCI or cPCI slots, 18 domains
Starcat SAME CPU board as Serengeti
CPU board up to 4 CPUs per board -- can be UltraSparc III or IV, 32 gigabytes memory per board
Memory 3.3 volts, CPU 1.8 volts
Memory Bus 256 bits data wide plus ECC etc
I/O bus 128 bits
First Customer Ship End 00.
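A rough check on the figures above: theoretical peak rate can be estimated as CPU count times clock rate times flops per cycle. The sketch below is an illustration added to these notes, not something from the Sun talks. It takes the clock rates, the 2 flops per cycle figure, and the maximum CPU counts as listed; the function names are made up for this example, and the estimate ignores memory and interconnect limits, so sustained performance would be lower. The last line asks how many 600 MHz Starcat boxes would be needed to reach a peak of 5 teraflops, the system size mentioned earlier in the Wildcat notes.

```python
import math

FLOPS_PER_CYCLE = 2  # from the notes: "2 flops per cycle"

# Clock rates in MHz, as listed in the notes
CLOCK_MHZ = {"III": 600, "III+": 750, "IV": 900, "IV+": 1050}

# Maximum CPU counts per system, as listed in the notes
MAX_CPUS = {"SP": 8, "MD": 12, "ME": 12, "DC": 24, "Starcat": 72}


def peak_gflops(cpus, clock_mhz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in gigaflops (assumption: no memory or network limits)."""
    return cpus * clock_mhz * flops_per_cycle / 1000.0


def boxes_for_target(target_gflops, cpus, clock_mhz):
    """Smallest number of identical boxes whose combined peak reaches the target."""
    return math.ceil(target_gflops / peak_gflops(cpus, clock_mhz))


if __name__ == "__main__":
    for system, cpus in MAX_CPUS.items():
        gflops = peak_gflops(cpus, CLOCK_MHZ["III"])
        print(f"{system}: {gflops:.1f} gigaflops peak at 600 MHz")
    # Hypothetical question: Starcat boxes for a 5 teraflop peak
    print(boxes_for_target(5000, MAX_CPUS["Starcat"], CLOCK_MHZ["III"]))
```

On these assumptions a single 600 MHz CPU peaks at 1.2 gigaflops, so a full 72 way Starcat is well under 100 gigaflops and a 5 teraflop system needs dozens of boxes.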
Wildcat 6 months later
Paced by software
Less I/O per CPU than PC (Chien)

Swiss Center of Scientific Computing: Karsten Decker **************************
1992: Service Center
1999: Science Center (Swiss center of excellence in Scientific Computing)
More emphasis on Biological Sciences and so will procure large SMP
Storage and Archiving
Meteorology and Climatology strong at the moment
SDSC Experience
E10K 64 processors UltraSparc II 400 MHz
Hoped for:
Solaris 8
HPC 3.1
Workshop 6.0 has openMP
LSF version specific for Sun
SSSL Libraries
Veritas Volume Manager
Sun Resource Manager
Hippi ATM Gigabit Ethernet (GBE)
Easier to install software on E10K than on SP with same number of nodes
Performance .75 times T3E Alpha 21164; .4 on another code
Good gravitational performance with better 32 node efficiency than SP2
30% slower than Tera on volume renderer of visible human
Several Security fixes

Interval Arithmetic Walster ****************************
1954 IBM first introduced floating point support in 704
F95 Workshop 6.0 includes interval arithmetic
Later C++ C Java
Bill Joy interested in Hardware support
fellow is not a scientist
Appears not to understand that errors are not always intervals but often distributions which are approximated by intervals
Confuses software/design bugs with floating point bugs (e.g.
aircraft safety is a logical not a floating point bug)
Will intervals replace turbulence model uncertainty
Newton laws integration and volume rendering are more plausible
In general he appears not to understand role of errors

EPCC ***********************************
Java Grande Benchmark
Compiled code off a factor of 2
openMP for Java based on native threads
Looked in general at openMP -- technology report and courses
performance comparison of different paradigms
MPI versus openMP (slower but easier to produce)
HPF v openMP
collective MPI versus point to point MPI
Good benchmarks

Buffalo Russ Miller **************************
64 node Ultra5 running Linux connected with 100 megabit/sec ethernet
5% of cost of Origin2000 and SP2
New cluster with gigabit ethernet
good Talk

Trotter MSU ERC *******************************
Wants to upgrade JavaCADD
MSU gets infrastructure from university; other annual budget about $14.5M
Has large oceanography effort at Stennis
What about E10K?
Compare to R10000 based Origin
CFD Applications Single Processor.
SGI Best
4 Processor: Sun Best

SLAC ****************************
Asymmetric B factory
C++ and OODBMS Objectivity
310 Ultra5's being replaced by a denser packaging
Lots of million line compiles
7 seconds per event on Ultra 5 and 100 events per second

ANU ***************************************
Scientific Supercomputer center for Australia

SGI Meeting ********************************
Linux Cluster Itanium
Norman Universe adaptive Mesh requires conventional supercomputers
Their NT Cluster 550 MHz MPI almost as fast as Origin
Linux at UNM
3 Supercomputer eras
Vector Processors
RISC Workstation with proprietary UNIX
IA-64 EPIC PC Servers
HPC Open Source Movement
Extreme Linux
FFT, Linear Algebra
Globus, HPVM, MPICH
Applications -- Cactus, QCD
open source VTK Visualization
ASCI Red glimpse of future at Sandia 10,000 IA-32 Pentium Pro
demo: Cactus on 4 node Itanium
4 megabytes on board memory
4 double precision floats per cycle
2 gigaflops on Linpack
128 integer registers
128 floating point registers
512 megabytes memory per processor
100 megabit ethernet
First Silicon 3 months ago
SGI 512 node system
New business model: Modular Open Systems
Clustered and Tightly coupled systems
UNICOS and IRIX Software to Open Source
Palm Pilot Software 2.5 cents cost per transaction $3 software -- Sold 6 million copies

Chien **** NT Clusters *********************************
VIA user level messages without buffer management/flow control
His fast messages add buffer management/flow control
Fall 1997 persuaded Smarr to build NT cluster (Myrinet)
256 Processors in Alliance Cluster (now 50% 300 MHz, 50% 550 MHz)
Good discussion of communication bandwidth and communication round trip time per flop
T3E Beowulf O2K SP2 NT Cluster
550 MHz parts have faster memory subsystems and get better than a factor of 2 over 300 MHz
Sandia Kudzu NT Cluster jointly with Compaq
Cornell Theory Center: 64 4 node Xeon and several other smaller clusters
Gigabit Network
moved codes from SP2 to NT
Info Web Sites: NCSA
Site, Chien link, Cornell
IDE disks are much cheaper than SCSI and now almost as fast
16 to 128 node PC clusters are quite straightforward
Itanium or X86
1K to 16K Itanium 3 gigaflop nodes
64 node cluster $500 nodes $32K
64 Intelligent Disks $6K
Jini Universal plug and play
64 network devices
http://www-csag.ucsd.edu

***** eSCape 2000 ***********************************
Wireless Tablet
Iway in SC95
Stephen Jones John West
Interact with HPC Systems
Portals
Aggregate
Dallas
Currently 3 megabit/sec wireless network, 100 megabit/sec planned
Laptops Palm PC's Pagers etc
Not a competition
Posters + Demos
Escape from desktop
HPC Anywhere
"any" HPC vaguely
eSCape Infrastructure
eSCape Applications
One place with theater, posters
Al Geist PVM on Palm
Obviously look at Java MPI on Palm
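
Going back to the cluster sizing figures in the Chien notes above (64 nodes at $500 each, 64 intelligent disks for $6K, 1K to 16K Itanium nodes at 3 gigaflops each), the arithmetic can be checked with a short sketch. This is an illustration added to the notes, not part of the talk. The helper names are made up, and reading "1K" as 1024 is an assumption; with 1000 the totals shift slightly.

```python
K = 1024  # assumption: "1K" nodes read as 1024, not 1000


def cluster_cost(nodes, cost_per_node):
    """Total hardware cost in dollars for a cluster of identical nodes."""
    return nodes * cost_per_node


def aggregate_gflops(nodes, gflops_per_node):
    """Summed per-node peak in gigaflops; ignores communication overhead."""
    return nodes * gflops_per_node


if __name__ == "__main__":
    print(cluster_cost(64, 500))         # 64 nodes at $500 each
    print(6000 / 64)                     # implied cost per intelligent disk
    print(aggregate_gflops(1 * K, 3))    # low end of the 1K to 16K range
    print(aggregate_gflops(16 * K, 3))   # high end of the range
```

The $32K figure in the notes matches 64 nodes at $500. On the 1024 assumption, the Itanium range runs from about 3 to about 49 teraflops of summed node peak.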