JISC NTI Review
December 13, 1996
Geoffrey Fox
Syracuse University
Recommendations/Comments
- Cluster software evaluation projects did good work, but dissemination
and continuing evaluation were not part of NTI. This means that even
as clusters get more attractive, there is no natural way for potential
purchasers to tap into the results of NTI.
- Most cluster evaluation projects terminated 1-2
years ago, but the JISC-funded general and very complete review
led by Mark Baker could be sharpened up for the operational task
of aiding an organization that wants to buy cluster management
software today!
- Recent technology watch review in this area by
Edinburgh is helpful input
- Need to include ongoing evaluation of Windows NT and Linux
PC clusters in the cluster evaluation program. These are likely
to be the clustering solution with the broadest appeal.
- This seems an urgent need, as many organizations
face the need to replace aging equipment, while new institutions in
the HPC field are most likely to use the PC solution as it offers
the best cost/performance ratio.
- This should include cluster management software
(public domain and commercial) as well as PVM, MPI and HPF.
- Coordinate with new JTAP activity led by Chester
College
- Should look again at a CHEST university-wide purchase
agreement for key identified software.
- The NTI Cluster and HPC training and education work would
have been improved in some cases with greater involvement of individuals
active in the research arena. HPCC is an emerging technology which needs
leading-edge research expertise in projects involving it.
- Short-term non-academic appointments are not
effective if one wishes to put together teams of individuals at
the leading edge.
- This recommendation would tend to change the balance
of some teams towards academics and away from some computer service
organizations. However, two of the most successful projects (Manchester
and Edinburgh HPCTE) were led by computer services, so this
remark should be interpreted carefully. In general one should
carefully examine the credentials of bidding teams!
- Part of HPCTE should be focused on integration of advanced
computation into groups, organizations and disciplines which are
not using it now. This is likely to be more attractive than particular
courses on PVM, HPF etc.
- Alternatively, view this as promotion of the academic discipline
of computational science rather than the narrower goal of using HPC
in research, i.e. grow the use of computation as a respected academic
methodology to complement theory and experiment
- This generalized goal will be particularly helpful
in enhancing outreach to institutions and disciplines which have
not traditionally used HPC and/or are not so active in research.
- Successful HPCTE centres were "creative" (broad)
in their interpretation of HPC as "advanced computation".
This should be encouraged
- Some Examples are:
- Manchester: Visualization, Java, VRML
- Belfast: Datamining
- Edinburgh: Business, Computational Science
- The HPCTE teams typically documented well the outreach effort
in terms of courses taught and similar activities but there was
little documentation of impact which could be measured in terms
of:
- Research Projects initiated
- Academic courses (modules) and programs (such
as Computational Engineering and Design Centre at Southampton)
established and/or enhanced (my preferred impact area)
- Belfast did assess their project well
- We recommend an effort to assess more precisely JISC HPCTE
impact as this assessment could indicate those approaches which
were most effective
- Our prejudice is that courses in technologies
such as MPI and HPF are appropriate to enhance leading research
but that broader outreach material (computational science) would
be more effective in integrating computation into academic programs.
The latter is our suggestion for most promising focus of JISC
funding.
- Establishment of "Centres" focused on computation
is a good indication of the integration of computation into HE
institutions.
- Need to make the JISC work live on in terms of activities
that are naturally funded by the HE institutions -- teaching of
computationally augmented courses is one of the best survivable mechanisms
- Some of the projects seemed to have had disappointing institutional
commitment to follow up and continue efforts "jump-started"
by JISC. This follow-up commitment should be part of evaluation
of new bids.
- A focused effort to collect and catalog material produced
by JISC NTI on a single Web resource would be very valuable
- The software component should be coordinated
with the US "National High Performance Software Exchange".
- Educational material should be annotated as to
type and level, and course material from different
sources on the same general area should be contrasted.
- Integrate the two SEL-HPC Web catalogs into this
new Web resource and clarify Intellectual property issues in these
catalogs.
- Include both HPCTE and other NTI sources of educational
material.
- Some of the US activities (e.g. the Virtual Workshop at Cornell)
have been more aggressive than the UK in using advanced Web technology
(e.g. Java applets) as a teaching tool.
- More generally I recommend that JISC identify
and use international "Best of Practice" in rapidly
improving Web dissemination technology. This should involve several
JISC teams and should preferably involve a few relevant experts
from the US. This could naturally be an Edinburgh Technology Watch
report but obviously other mechanisms are possible
- The least successful class of projects were those that focused
on developing novel software, as JISC funding was insufficient
to produce software that could make a significant impact
- Some of the software development projects were essentially
research and not "capitalization" efforts, so the UK
benefited from the activity even though funding would have more naturally
come from the Research Councils
- NTI projects developing software need a plan for supporting
it after the end of the project
- There are non-trivial IPR issues in making education material
available on the Web but, as the field changes so rapidly,
there is probably more to be lost than gained by "protecting"
training material either with passwords or by using an inconvenient
(postscript) dissemination method.
Notes for gcf
- JISC pays no overhead to Universities - typically this is
40% of non-capital costs
- JISC is part of HEFCE (Education support for England) which
tries to raise valleys
- Research councils focus on enhancing peaks i.e. leading edge
research
- Note US ambivalence about HPCC peaked in January 95 with "Pasadena2
Meeting" and negative government reports - mainly on lack
of industry uptake
- However recent NSF solicitation and responses
was upbeat and focused on successes!
- More focus on computational science education in the US than in the UK
- 2 significant previous HPCTE reviews (Lavington, Davis)
- No previous formal cluster reviews
- The Lavington review was obviously unfortunate as it confused some
HPCTE's by suggesting the "goal posts" had changed, and
further by the controversial style of the actual review
- Maybe JISC should in retrospect have addressed
this damage?
- The actual report has a peculiar quantitative measure of performance
which seemed unjustified, but it did not press the outreach issue too
hard
- Lavington review should have added some modest number of additional
goals with broader discipline participation
- Several different cluster software systems are being used - DQS,
NQS, LSF, Codine (Condor) - is there any value in focusing? JISC
did not, though, spend a lot of money on multiple reviews.
Rather, the point is a "CHEST" issue - would deployment of
one focused system be useful?
A) 8 Training and Education Projects
HPC Training and Education Centre Queen's University
Belfast (DENI funded)
- Large (but not largest) size HPCTE project funded by DENI
(Northern Ireland) at 118,000 pounds per year
- DENI also funded an 8 Indy Cluster with FORE ATM switch
- Presented by Dr. R.Rankin of Computer Services at Queens
- Remember Burke (Physics) and Perrott (Computer Science) indicate
quality at Queen's
- Burke is using T3D for some codes
- Had a substantial HPCC effort of 10 people with NTI/Queens/"Contract"
(Europe, Ordnance Survey, Cray) support. Now about 2!
- Also machines - even the cluster - are aging
- There is a freeze on University staff and a strange "protect
the Catholic minority" law which made it hard to hire staff
with the rolling funds associated with NTI
- Northern Ireland is unusual in that it has only 2 HEI's, but Queens did
establish good relations with Southern Ireland - Trinity College
Dublin (2 megabits/sec link)
- Queens and Ulster mainly have local students
- This project did not exhibit the tremendous productivity of
Manchester and Edinburgh measured by course material produced
but had some very important features where the Belfast activity
was the leader
- Clear collaboration in using and producing courses
- Innovative identification of areas such as Law and Datamining
where HPCC interesting but novel
- Integration of material into curricula of several areas
- Helped 7 groups set up own (cluster) HPCC resources
- Belfast worked closely with University of Ulster (other Northern
Ireland University), Shorts (aircraft) and Strathclyde/Glasgow
- Some comments on course material:
- Used Shorts code in HPC for Aeronautic Engineering
Course
- Produced Networking module
- Added practical material to existing PVM courses from EPCC
and Manchester
- Datamining material was 2-3 hours and customizable - would
like to extend with future funding
- Note 49% of those attending courses were undergraduates, not trainers
(London removed such people from its counts!)
- students can work in HPCTE for "industrial experience
credit" and there was a summer program
- There is an MSc in DAMTP entitled "computational science"
which is associated with Burke's effort in "Atomic and Molecular
Physics"
- Chemists use GAMESS and Gaussian but have little interest in the parallel
versions - this is consistent with the chemistry community worldwide
- PC expertise will be important in the future due to the dearth of
HPCC hardware funds
HPCTE: Training and Education Centre University of
Wales College of Cardiff
- Much smaller than other HPCTE's at 40,000 pounds per year
- Presented by John Martin, head of Computer Centre at Cardiff
and Rob Davis who was the one full time funded person on project
- Previous reviews were generally positive with some concern
about the lack of broad outreach - especially to relevant regions of
Western England
- Understood that hard to connect to North Wales
which is quite hard to reach from Cardiff!
- This time Cardiff came over as the Centre with the least achievements
(albeit the smallest in size), and Martin gave a poor presentation
in which he presented few or no successes and answered questions
very vaguely
- Note the Simon Lavington review rated Cardiff highly (in its
controversial table), and I gather Martin found it much easier to handle
Lavington's controversial style than, say, Mark Cross of Greenwich did
- Martin explicitly said Lavington review did not cause change
of direction by Centre
- Of main contributors, one Nelson Stephens of Cardiff Computer
Science has moved to Goldsmith's in London. Others are
- Alistair Nelson Cardiff physics (galaxy simulation
- note Rob Davies did his astrophysics Ph.D. on Barnes-Hut type
algorithms)
- Hywel Thomas, Engineering, Cardiff
- Phil Grant, Computer Science, Swansea
- The presenter John Martin
- HEI's such as Gwent College and Cardiff Institute of Higher
Education had no activity
- Rather unexciting article on use of video links in education
and conferencing
- Centre funded 50% of the Paramid system support person
- Swansea had Cray EL98 as well as Digital alpha and SGI clusters;
main machine at Cardiff was the Paramid
- Expertise of group is largely applications - especially in
Science and Engineering
- Industry is being established near Cardiff but not very sophisticated
yet.
- One interesting example is BT Lab
- Interested in applications and not vague computer science
issues
- Very unsuccessful outreach to other institutions
- Appears to have been very passive - write a letter and send
UNIX programmer Rob Davis to teach "HPF"
- This seems unlikely to persuade institutions
such as Gwent college to integrate computation into their curricula!
- Targets in England such as Bristol are too snotty to be interested
- Very positive about collaboration with other Centres
- Remember Liverpool negative about Cardiff (Belfast,
London) introductory Fortran 90 course
- Seemed to have taken a long time not developing parallel Oracle
course
- Canceled last general Oracle course due to lack
of interest
- BT research Centre interested in Oracle
- Note the Paramid is an idiosyncratic i860/Transputer system which
has no native PVM
- This explains why there was no PVM course
- There is MPI on a Sun cluster as result of outreach
- Some courses were taught by Paramid expert who is funded 50%
by project
- There is a genetic algorithm grouping in University which
fits Centre goals
- Rob Davis expects to stay as Computer Science department UNIX
systems person
HPCTE: HPC Training and Education Centre University
of Edinburgh
- Large three year project totaling 420,000 pounds
- Presented by Trew and Simpson (in charge of the HPCTE program)
- In general a rather arrogant, inward-looking program which
did not appear to listen to what others said
- However, the results were impressive and the only negative
is that it could have been even better!
- EPCC has 45 professional staff
- T3D support is 8 people, TRAC (Europe-wide visitors) 6 people,
with HPCTE employing from 2.5 to 4 people
- Roughly one person manages summer undergraduate
research activity which supports 15 students
- 40% of EPCC funding is from Industry (due to focus on this
in Europe?)
- EPIC courseware allows Web page to spawn client-side
applications (editor, HPF etc.)
- Presenter and managers had not heard of Habanero
and were not aware of related work
- T3D is totally saturated
- Excellent set of courses and Technology Watch activities
- Latter include one on clustered computing
- Courses include computational science, computational chemistry
and HPC in Business
- An excellent set of courses in both number and breadth
- Probably partially financed by T3D support and
other grants, as otherwise the development cost per course (20,000
pounds) seems too low (obtained by dividing about half of the JISC grant
by the number of courses)
- Maybe some of the courses are short?
- HPF course sold to PGI and packaged with PGI's T3D offering
- Their measure of course downloads is for registered downloads
and so should be reliable
- Trew gave a diatribe against the confused UK funding picture, with
HEFCE and the Research Councils not cooperating well: HEFCE funds
hardware and the Research Councils fund training (on the T3D)
- I complained that the EPCC NTI final report made little or no
mention of curricula integration of material, and Trew, in impassioned
fashion, said this was his goal but he was afraid to discuss it in the report
- Integration of Computational Science into courses
takes longer than training
- Trew wanted no negative statements in the final report even though
he believed the program was stopped just when it was gathering steam
- Note the outside courses offered by EPCC were all to "elitist"
universities and all training-type material (HPF, PVM, MPI, Scientific
Visualization)
- It appears EPCC made no effort to tackle the broad base of HEI's,
although their course material seems better suited to this than that of
most HPCTE's!
- Computational Science and other "broad courses"
were only offered at Edinburgh
HPCTE: HPC Training and Education Centre SEL(London)-HPC
Consortium
- Large (220,000 pounds per year) 3 year project presented by
John Steel of Queen Mary's College (QMW) Computer Services and
Chris Walshaw from Greenwich
- Steel used to be in London Parallel Applications
Centre (LPAC which is now a company) and UK DAP service
- Walshaw is a postdoc in mesh generation in Mark
Cross's research group at Greenwich
- Note Steel was the only one of 6 PI's present
- this in itself indicates the state of the project
- Project finished in July 96
- Strengths of partnership were
- Greenwich - Engineering and Computational Science
- Kent - Embedded Systems
- LPAC/QMW - DAP
- ULCC (University of London Computing Centre)
which has extensive HPCC training material
- Gave a very negative presentation and their report was similar
in tone!
- They had done some good things which prompted me to tell them:
"You were a success but didn't realize it!"
- This Centre was severely criticized in both previous reviews
- that of Lavington and that of Davis
- The major problem was that they had more money than
anybody else but developed fewer courses and taught fewer people than,
say, Manchester or Edinburgh
- Also complaints that some courses were too low
a level such as "Introduction to X Windows"
- Also seemingly poor outreach to other institutions
even though region has many universities
- They claim that lack of parallel equipment was
a serious problem as central sites only had aging uninteresting
machines
- They focused on training the trainers and claimed to have
excluded students from attendee counts
- Has this been done consistently in all HPCTE
projects?
- Project funded staff and some hardware and maintenance
- 3 funded Ph.D. level people came from neural network, computer
science and math/parallel computing background
- This doesn't explain how they spent the large budget!
- Complained about failure of expected associated hardware funds
from DTI/Research Councils to materialize
- Note Darlington at Imperial College did get an interesting
Fujitsu Machine but split from proposing group!
- Did teach a special course at Social Science Summer School
at Essex with a captive audience
- This is University where Lavington comes from!
- Claimed that clustering itself was not of interest - only the
more practical use of PVM and MPI on clusters
- Probably the real issue was that they didn't proactively
help users install clusters, as Belfast did very effectively
- Good Application courses as illustrated by Walshaw from Greenwich
- Proud of two Web archives which run off hensa
and cover, respectively, HPCC articles (1800 available) and software
- Note hensa mirrors netlib
- The parallel software archive (which should be
coordinated with NHSE) needs ongoing funding to continue
- Uses a CGI script with BibTeX format for the articles
- Franklin says there are IPR issues with article
archive
- A not very satisfactory "Crisis in HPCC" workshop was
held on 11 September 1995 and typifies the negative nature of the Centre
- Chris Walshaw described the excellent state of computational science
at Greenwich (used to be Thames Polytechnic), with material integrated
into curricula
- But JISC wanted such uptake outside the proposing
institutions
- Imperial College is too elitist but Birkbeck
is possible
- We suggested redoing annual report which did not even describe
successes at Greenwich
HPCTE: HPC Training and Education Centre University
of Manchester
- Large 450,000 pound project over 3 years
- Presented by Andrew Grant (energetic extrovert leader of effort)
and his boss W.T. Hewitt
- Probably, on balance, the most successful NTI project as measured
by the large number of courses developed and pro-active, successful
outreach
- Project finished in July 96 and staff re-assigned (not laid
off as in some other projects)
- Some left during the project since, as always, contracts
were at best one year (note that St. Andrews was the only University
to clearly trust JISC and give three-year contracts)
- NTI funded some 4-6 people and this gave critical mass
- Although the Manchester supercomputer Centre is unlikely to continue,
this effort is centred in the very healthy graphics unit (= Manchester
Visualization Centre)
- 23 course packages developed which each last from 1-4 days
- First courses were basic HPC training - HPF,
Visualization, Vector computing, MPI etc.
- Second set of courses were more application oriented
and computational science rather than HPC - Geography, JAVA/VRML,
Medical Visualization, bioinformatics etc.
- By agreement, courses such as "Introduction
to X-Windows" were not taught as most computer service departments
in UK can do this
- 55 courses (at Manchester) and 40 courses (outside Manchester)
offered
- The outside courses had substantial representation
from the polytechnics (the new universities) and institutes of higher
education.
- Several offerings to Industry who paid for these
courses
- Claim that could not reap full benefits as integration into
curricula takes about 2-3 years after material produced
- Unlike Edinburgh, they did not hide goal of curricula
integration
- No evidence they had studied Web technology and courses outside
UK
- Worried about expected lack of central facilities at Manchester
- Do have 10 SGI systems on an ATM switch and emphasis
on clustering and so not clear if this is a valid concern
- Continuing under JTAP with cluster based activities (led by
Chester)
- Short term (1 year contracts) as JISC never gave multiyear
commitment and this made it hard to get best staff
- However success has led to several funded follow-on
projects
- Were worried about Intellectual Property issues and only distributed
Web material in inconvenient PostScript
- Not clear to Manchester if this policy should
be preserved
- Manchester runs MIDAS, which hosts JISC datasets
HPCTE: HPC Training and Education Centre University
of Southampton
- Tony Hey presented quickly as there were only 2 hours for this and
the shpf effort
- Very poor final report by Pritchard
- Poor management, first by the Computer Centre (employing Mark Baker,
Ade Miller etc.) and then by Pritchard.
- These problems were not explored
- Not clear what the funds were spent on
- Training Centre did little outreach outside Southampton and
claimed this was due to lack of HPC facilities for outside users.
This does not seem plausible to Franklin (as Edinburgh and Manchester
had no problem) and as they don't list any, it is not obvious
Southampton gave any courses outside their University!
- Good set of general computational science activities (seminars
etc.) benefiting Southampton
- List of conventional HPC courses but not clear who developed
them.
- Novel curricula include "Ship Science" and "Parallel
Database" as well as Case Studies on how to get applications
on T3D and SP2 (latter from separate T3D training funds)
- There was, under separate course development funding (TLTP),
a CDROM produced, but it is not clear what is on it.
- Three major new activities are attributed to the training activities
- Centre for Computational Engineering and Design
- Parallel Database and Transaction Processing
Research Group
- Digital Library Research Group ( Parallel Database
is HPC tie)
- So we see excellent justification of computational science
as theme but no testing outside Southampton
Use of Fortran90 and HPF
University of Liverpool
(Page 11)
- Dr. Steve Morgan, Computing Services (substantial F90 X3J3
experience)
- Quite expensive, as an undetermined amount of 'far' project
money was also used
- Have a no-cost extension to October 97
- Did not have pro-active outreach as one of its deliverables
(as did HPCTE's) even though passive placement on Web has been
very successful.
- Note the MANTEC course is being used in local universities (as MANTEC
had outreach in its mandate), whereas it is not clear that the Liverpool
course was used even at Liverpool, as the computational science activities
(IASC) closed
- Mechanical Engineering does teach F90 in one
of their courses
- Delves, head of math and computing and IASC (Institute for
Advanced Scientific Computing), was an HPF expert but left to set
up the company NA Software
- Chuck Koelbel may use F90 notes in DoD modernization lectures
- The HPCTE (SEL, Belfast, Cardiff) basic Fortran 90 course is
inferior, as its developers did not have the experience of the Liverpool
team. The MANTEC conversion course is high quality
- Note that the EPC (Edinburgh Portable Compilers) F90 compiler for
SGI is much better than the one SGI provides
Parallel Computing in Higher Education University
of Oxford
(Page 15)
- NOT reviewed
- The final report doesn't say what the JISC project did but, from Tom
Franklin, I gather that the project was some training and outreach
in their BSP message passing software environment
- Small 25,000 pound total project that finished in 1994
- Of course, BSP - fairly or unfairly -- is viewed
as irrelevant by most in the USA!
- Other Education projects did not feature BSP
B) One (Simple) HPF Compiler
High Performance Fortran Translation - University
of Southampton
- Tony Hey
- Modest 50K (originally 25K) pound activity which builds on
Research Council (EPSRC) funded activity
- Produced the ADLIB compiler runtime library, written in C++ by
Carpenter, whose development Carpenter continues at Syracuse
University
- shpf is a translator (which is of course typical of
most parallel "compilers") from subset HPF to F90 +
MPI
- The compiler work was performed by Merlin and is continued by
him at Vienna, home of the major European parallel compiler group
headed by Zima
- Southampton experience helped international HPF2 activity
as they had implemented (unlike initial commercial compilers)
the complete set of distribution mappings allowed in subset HPF
- shpf had inlining of locality tests and subscript conversion,
which made it in some cases faster than commercial compilers (a sketch
of this idea appears at the end of this section)
- Technology used to port New Hampshire C* compiler to the IBM
SP2
- Not clear if any UK HEI is actually using shpf
- The hope was that it would be a free HPF starter kit
for use by HPCTE target sites
- Possible to bundle with Edinburgh EPIC software especially
if EPIC modified to use server not client execution of HPF
- Does shpf run on PC Windows NT platforms?
- This was the most successful of the JISC-funded software development
projects, but it is not clear if even shpf was useful to JISC!
- The moral is to steer clear of software development,
as significant work is outside the scope of JISC funding
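To make the shpf technique above concrete: shpf emits Fortran 90 + MPI, but the idea of compiling a BLOCK distribution into inlined locality tests (which reduce to local loop bounds plus a global-to-local subscript conversion) can be sketched in C + MPI. This is my illustration of the general technique under assumed names and sizes, not shpf output:

```c
/* Illustrative only: how an HPF-style BLOCK distribution can compile
 * into inlined locality tests.  Not shpf output (shpf emits F90+MPI);
 * the array name a and size N are invented for the example. */
#include <mpi.h>
#include <stdio.h>

#define N 1000                         /* global array size           */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* BLOCK distribution: rank p owns global indices [lo, hi). */
    int block = (N + size - 1) / size; /* ceiling(N/size)             */
    int lo = rank * block;
    int hi = (lo + block < N) ? lo + block : N;

    double a[block];                   /* local piece of the array    */

    /* The inlined "locality test" is just the loop bounds: each rank
     * iterates only over indices it owns, and the subscript conversion
     * g -> g - lo is done in place rather than via a runtime call.   */
    for (int g = lo; g < hi; g++)
        a[g - lo] = 2.0 * g;           /* e.g. FORALL (g) A(g) = 2*g  */

    printf("rank %d owns [%d,%d)\n", rank, lo, hi);
    MPI_Finalize();
    return 0;
}
```

Avoiding a runtime ownership call per element is precisely why inlining of this kind could beat the early commercial compilers in some cases.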
C) 5 Projects in Research and Development
of Nifty Distributed and Clustered Computing Technology
Project AUTALS (Amoeba/UNIX Teaching and Learning System)
Southampton Institute of Higher Education
(Page 20)
- NOT reviewed
- Modest 23,000 pound activity developing an Amoeba-based
distributed computing environment
- No uptake, because nobody wanted to use Amoeba given the easy
availability of more conventional systems
- Software was used internally according to Tom Franklin - presumably
to teach parallel computing
Network of Workstations - 'far' utilities --
University of Liverpool
(Page 10)
- Dr. Steve Morgan, Computing services
- Very modest effort, as some (undetermined) amount of the already
small effort was actually spent on the HPF/F90 education activity!
- far was a very simple, lightweight system designed
for reserving a block of Sun workstations for a parallel run.
- far did not support migration or checkpointing (not
so important in Imperial Codine evaluation) or batch queues (critical
in most cases)
- far depends on NFS and automount on Sun
- far is easier to set up than Codine
- However, it is not clear if an evaluation of the pluses and minuses of
far versus NQS/Codine etc. was produced
- far was installed at Durham but is essentially no longer used
anywhere, as Liverpool switched to a 14-processor SGI R10000 solution.
- The Liverpool-based (NA Software) HPF compiler was installed as part
of this project. Delves set this company up.
- Probably should not have been funded!
Distributed Supercomputing and Scalable High Speed
Networking University of St. Andrews (Page 24)
- Mike Livesey
- Misleading title should have been changed
- This system Warp has C and Tcl versions and consists
of roughly 5000 lines of code
- JISC project funded enhanced capability and the hardening
of software
- This is based on the Time Warp event-driven simulation concept
but is simplified and applied to cases where serialization of
activities is required; real time, as opposed to virtual time,
is used to schedule events
- Uses Lamport's clock mechanism (a minimal sketch appears at the end of this entry)
- Coordinates "atoms of computation" and not the full
(heavy weight) encapsulated objects supported in most Time Warp
implementation
- Could in principle be used as part of a database concurrency
control or distributed shared memory system but these applications
not explored in detail
- Warp supports both atoms of computation and objects
which are locked when allocated to an atom
- Has a nice example of a shared spreadsheet which is funded
(by UMI) for further work supporting use in Education with Dundee
- Web technology will do this much better but UMI
project could be a way to develop interesting educational material
which can be ported later on to more pervasive technology
- The only other example was a Dining Philosophers simulation
- remember this (in more attractive packaging) was an early Java
Applet!
- 2 people employed on stable 3 year contracts (thanks to University)
- One (the best) has found a good job with a Cambridge
software company (Corba) but the other is looking while finishing
the contract
- New project will fund one, probably new, employee
- Proposal promised dissemination but none attempted
- It is relatively clear that nobody would actually want
to use this technology, as it is too specialized and needs to be put
into a higher-level tool or application before it would be useful
to the community
- The concept of dissemination did not appear to
have occurred to Livesey!
- Reasonable research but inappropriate for JISC funding
- Not clear if this should have been obvious at
time of proposal
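As an aside on the mechanism: Warp's serialization rests on Lamport's logical clocks. A minimal sketch of the clock rule (mine, not taken from the Warp C or Tcl sources) is:

```c
/* Minimal sketch of Lamport's logical clock rule; illustrative only,
 * not taken from the Warp sources. */
#include <stdio.h>

typedef struct { int clock; } process_t;

/* Local event or send: tick the clock and use it as the timestamp. */
int local_event(process_t *p)
{
    return ++p->clock;
}

/* Receive a message stamped ts: jump past both histories. */
int receive_event(process_t *p, int ts)
{
    p->clock = (p->clock > ts ? p->clock : ts) + 1;
    return p->clock;
}

int main(void)
{
    process_t a = {0}, b = {0};
    int ts = local_event(&a);      /* a sends at logical time 1        */
    receive_event(&b, ts);         /* b's clock becomes max(0,1)+1 = 2 */
    printf("a=%d b=%d\n", a.clock, b.clock);
    return 0;
}
```

The key point is the receive rule: a process's clock jumps past both its own history and the sender's, so timestamps give an order consistent with causality, which is what lets Warp serialize its "atoms of computation" in real time rather than virtual time.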
A Parallel System that Exploits the Spare Capacity
of Open Networked Workstations University of Reading
(Page 16)
- Presented by Roger Loader from Computer Science department
- Very misleading title should be changed
- This work had three components
- Development of PVM with Meiko CS1 and system
management tools
- Outreach to Reading scientific users
- Development of a higher level message passing
interface to MPI or PVM which is where most of activity is now
- Although most of the JISC-funded activity has not much to do with
clusters and is so-so computer science research, the funding is
supporting a computer science group that seems a genuine advocate
of HPC at Reading
- The early PVM work seemed nearer to the mission of NTI and developed
enhanced PVM system management tools which will be incorporated
into the next release of PVM
- Activity stopped when Ph.D. student involved
was hired by Dongarra
- Also implemented PVM on the Meiko CS1 as part of this
early activity
- The high-level message passing environment supports groups and
message passing templates and produces augmented sends and receives
in PVM or MPI (an illustrative sketch appears at the end of this entry)
- Uses Sun's rpcgen to compile
the high-level code
- Loader did not seem familiar with related work such as groups
in MPI and template activities of Caltech, Oak Ridge etc.
- No follow-on funding, and it is unclear if the software built can be
maintained - especially as it is doubtful anybody wants it as a deployed
product
- The original computing officer was a Ph.D. student hired by
Dongarra on basis of early PVM work
- Current implementation is the responsibility of an individual with a
computer science undergraduate degree who is employed full-time
and keen on Z, which is the new implementation language (groan!)
- Graham, Loader and Williams are faculty at Reading in a Numerical
Methods Group
- Note Very good computational expertise at local Meteorological
office
- Will have 12 processor SGI Challenge machine
- No formal Computational Science oriented curricula
- Loader is working with Physics and Chemistry applications
where computer science department is parallelizing codes
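To illustrate what an "augmented send" over MPI might look like: the sketch below wraps MPI with a small header carrying group and template identifiers. This is a hypothetical reconstruction of the idea (Reading generated their interface with rpcgen, and I have not seen the generated code); all names below are invented.

```c
/* Hypothetical "augmented" send/receive: a header carrying group and
 * template identifiers travels with each payload.  My reconstruction
 * of the idea only; all names are invented. */
#include <mpi.h>
#include <string.h>

typedef struct {
    int group_id;     /* logical process group within the template   */
    int template_id;  /* e.g. ring, master/worker, pipeline          */
    int len;          /* payload length in bytes                     */
} msg_header;

/* Pack header plus payload into one buffer and send it. */
static void aug_send(const void *buf, int len, int dest, int group, int templ)
{
    char packed[sizeof(msg_header) + len];
    msg_header h = { group, templ, len };
    memcpy(packed, &h, sizeof h);
    memcpy(packed + sizeof h, buf, len);
    MPI_Send(packed, (int)sizeof h + len, MPI_BYTE, dest, templ,
             MPI_COMM_WORLD);
}

/* Receive, strip the header, and hand back payload plus metadata. */
static void aug_recv(void *buf, int maxlen, int src, int templ, msg_header *h)
{
    char packed[sizeof(msg_header) + maxlen];
    MPI_Recv(packed, (int)sizeof(msg_header) + maxlen, MPI_BYTE, src, templ,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    memcpy(h, packed, sizeof *h);
    memcpy(buf, packed + sizeof *h, h->len);
}
```

The point of such wrappers is that a template library can route and check messages by group and template id without the application touching raw PVM/MPI tags.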
High Performance Computing using Spare Capacity on
a Network of Workstations University of Manchester (Page 12)
- Presented by W. T. Hewitt (head of Computer Graphics Unit
at Manchester)
- Claimed to develop a thread based distributed virtual shared
memory but essentially delivered nothing relevant
- Did not finish the two exemplar applications that were promised
after the Hey review
- Matrix algorithm run on prototype had terrible performance
with massive slowdown
- A project that should never have been funded. The PI (Kelly)
was clearly unqualified for the job, as he hadn't done research for 10
years and was unaware of research in the area.
- Kelly also has a reputation for not delivering and did not work
with one researcher (McLean) on the project. Hewitt was put in charge
halfway through
- A braver JISC would have terminated the project earlier, but the major
error was in the initial evaluation
- Note Sumner (at the time on the JISC committee) was understood to
have "promised" to oversee the work, but he was not Principal
Investigator of either the proposal or the implemented project, so this
is not very relevant.
- Sumner left for other reasons as head of department
- Manchester has learnt some if not all lessons
D) 6 Projects in Deployment of Clustered
Computing Systems (Hardware/Software) Including Evaluation, Development
and Proactive Support of Parallel Applications
A Dynamic Self-Configuring Distributed Computing Facility
University of Durham
(Page 1)
- NOT reviewed
- Used Condor but found it not robust enough
HPC-Alpha Workstation Farm - University of East Anglia
(Page 4)
- Dr. Andrew Boswell, Computing Centre, University of East Anglia
(Norwich)
- University has about 12,000 students of which over half are
part time.
- Have two good memos describing activity to be published in
AXIS - which is a UK journal sent nationally (only) to computer
service directors
- Franklin thinks Keith Woods (deputy director of the Computer Centre),
Boswell's manager, would have given a more upbeat presentation
- Boswell gave a very professional talk which described a declining
interest in parallel computing and a very uncertain future
- JISC funded two people half time - one, Matt Beare, has returned
to his Ph.D. work in Applied Mathematics, which is centred on
parallel oceanography, and he is one of the major users of the parallel
cluster
- The other half-time person was Boswell, who is 50% UNIX support;
provisionally his 50% cluster work is being carried by the University
(see later)
- The alpha workstation farm replaced central VAX's and is only
major central compute capability. (The University also has PC
clusters )
- In 1993, the Computer Board funded the 5 servers and 16 workstations,
connected with a GIGAswitch using FDDI and Ethernet.
- Later 5 faster servers were added
- Typically 200 users
- There is currently no charge to faculty
for use of the facility (will this change?)
- There are some 40 other UNIX systems around the
University in departments
- The Computer Centre supports essentially no disk
space (each user gets 10 megabytes), so disk storage is the responsibility
of departments and information is accessed from the cluster by NFS
- Note the 16-workstation cluster was dedicated to parallel
batch (mainly PVM, but will go to MPI as well), with no
sequential batch allowed in the overnight run
- A little evaluation work on HPF (Digital compiler)
by oceanography
- Users have to use the cluster in non-dedicated (to parallel)
mode during the day to develop any parallel code - a somewhat surprising
set-up
- Used DQS (from Florida State) version 2
- Did not switch to version 3 when it came out
as had built support utilities around version 2 (typical problem
in in-house under funded activity)
- Liked feature that only ran one job at a time
(no paging!)
- Only DQS user in the UK
- 1 modification to actual DQS software to change algorithm
used to choose job to run
- DQS is Public Domain C code
- There was also a set of auxiliary programs enhancing
DQS, written in C or UNIX shell (surprisingly not in Perl)
- 4 fast servers run sequential batch at roughly 80% utilization (in a
busy week), with growing use
- 16 original alphas run parallel batch at night with up to
50% use which is declining
- Number of parallel projects used to be 4 but
now 2
- UEA is part of major T3D project with oceanography
effort
- Sequential use features long MATLAB jobs doing satellite image
processing
- No classes using system
- Courses based on Liverpool/Manchester (Fortran90) and Edinburgh
(HPF)
- Will offer Fortran90 once per year
- Not enough demand for PVM or MPI
- Life after JISC: currently Boswell's 50% salary is continued
by UEA, but an internal Computer Centre study is considering 3 possibilities
- Update and keep in the Computer Centre
- Update using some mix of University and user
research grant funds and let contributing users run - would like
university to contribute 50% as a "top-slice"
- Terminate
- Note 16 alphas only have about another year of useful life
- Note the Computer Centre has about 30 people and a budget of about a
million pounds per year
- Leeds going to a rather similar model
HPC using Spare Capacity on a Network of Workstations
- University of Glasgow
(Page 6)
- Large 190,000 pound 2.5 year activity presented by Bryan Richards
and Bill McMillan of the department of aerospace engineering
- A clearly successful project which seemed very confused as
to what follow-ons would be attractive to funding agencies
- There is a natural collaboration between Glasgow and Edinburgh
where the latter's EPIC Web technology is modified to use server
not client for execution of HPF and programs. The Glasgow cluster
computing expertise can manage server use by multiple clients.
This allows several students to use a single backend compute facility
and learn parallel computing.
- Franklin indicated that JTAP follow-on was viewed as too inward
looking and that one with more outreach would have been more attractive
- Not clear why research grants cannot bear cost of maintaining
system established by JISC
- NTI funded 3 staff
- Initially there were 6 workstations - now more
- Looked at Codine but chose LSF from Platform Computing, as
Glasgow works extensively with industry and LSF had much better
industry use
- LSF is more expensive than Codine at 500 pounds per seat, but the cost
of LSF is modest compared with industrial engineering codes
- LSF has better scheduling than Codine and Richards claims
that aerospace engineering at Imperial College were not enthusiastic
about their Codine installation
- Platform Computing uses their site
- In contact with USA NSCP (Maryland, Penn etc.) which also
uses LSF
- Also talk to VHPC (Canada: Calgary, Toronto) and GIBN - the
G7 initiative
- Has a good Web page
- The group has been doing parallel computing for about 7 years
- Remember Pratt and Whitney discarded mainframes for such clustered
computing
- Part of a CFD for aerospace simulation effort on the T3D
- Here 5 organizations share some 2-4% of the T3D -
note this means Glasgow's main resource is the cluster, which now has 25
nodes (some better than T3D nodes)
- T3D is of course for large memory jobs
- Current use is 60% serial and 40% parallel with new users
tending to be serial
- Cluster is 80% utilized
- A large number (over 20) of applications were cited
- 120 people came to a half day seminar
- Courses (some with EPCC and other places): 5 cluster computing,
5 LSF, 4 PVM, 1 MPI, 1 HPF, 2 AVS
- They have a PGI site license
- Follow on proposals include
- JTAP (turned down as too Glasgow centric) involving
PC based computing with industry
- With Rover and Westlands to Europe
- SHEFC to support CFD instruction (as part of
UMI use of Metropolitan area network initiative) is funded with
Strathclyde
- Foresight is the DTI program for University-Industry
partnerships leading to "Realizing Our Potential" (ROP)
awards - can't remember the status of this proposal
- LSF version 3 runs on Windows NT and is the same price (500 pounds)
for both UNIX and NT operating systems (note NT is on workstations!)
- Of the three NTI staff, two will lose their jobs when the project ends,
and Glasgow is trying to keep one on for systems management
Development of Parallel Finite Difference Time Domain
Application - Brunel University
(Not in final Report)
- NOT reviewed
- 50,000 pound activity developing a parallel computational
electromagnetic simulation on a network of Sun's (using PVM)
- Incorporated into parallel computing course given to electrical
engineering students.
Support of Distributed Batch Systems for UNIX - University
of Sheffield
(Page 18)
- Presented by Chris Cartledge of Computer Services, but young
programmer Stuart Herbert did all the work -- he was a great hacker!
- Herbert now works for a local company; he is also
a LINUX expert and is doing the multiprocessor support for it.
- Sheffield had no interest in parallel computing but lots of
users
- NQS was originally 300,000 lines of C code -- GNQS as cleaned
up by Sheffield only has 150,000 lines.
- Will ask Herbert about a possible NT version of GNQS
- Note easy to put GNQS on a LINUX PC cluster
- Note that as GNQS is Public Domain, it attracts modules from
people around the world (as with WWW free software)!
- NQS stores information in POSIX nodes (in disk filesystems)
and is perhaps sensitive to unstated assumptions that this information
is preserved.
- Note Digital has better mission critical NT servers than Compaq!
- Sheffield now supports GNQS and it is used by 35 UK HEI's
including Exeter and Bristol
- Imperial College has a harder job than Sheffield, as its strong
departments make central control harder; Sheffield can control its
departments more easily! So it is not surprising that Imperial needed
Codine's robustness. Similar comments apply to Glasgow, who chose LSF
Serial Work and Parallel Computing Techniques in a
Workstation Cluster Environment - Imperial College of Science,
Technology and Medicine
(Page 8)
- R.J. Hynds (presenter) and P. Belli, describing a modest, short
26,000 pound activity finished 2 years ago
- ICSTM has a lot of Workstations supporting very large funded
activity
- Looked at NQS and Condor -- rejected as not robust enough
for production environment and hard to keep up to date with numerous
O/S changes
- Codine and LoadLeveller are commercial versions of Condor
(which itself was built on NQS) with comparable capabilities
- Codine has a nice graphical user interface, but it is in fact not used
by users
- Initial tests showed Codine effective, as it reduced paging on
heavily loaded machines
- LoadLeveller was looked at on 2 IBM machines but not continued,
as the price was prohibitive
- Codine failed to negotiate CHEST (UK wide University bulk
rate) pricing
- Codine cost about $6000 a year at ICSTM for an unlimited number
of machines of all types. They have now doubled the price
- Head of Computer Services R.J. Hynds was a Space Physicist
- Codine even better choice if parallel computing needed
- This was a 9-month project in 94-95 and, after the initial Southampton
conference, had no contact with the national scene. Note JISC NTI did
not have dissemination as part of the project
- They believe conclusions on comparison of systems are still
valid.