Subject:
Resent-Date: Sun, 23 Jan 2000 16:29:19 -0500
Resent-From: Geoffrey Fox <gcf@npac.syr.edu>
Resent-To: p_gcf@npac.syr.edu
Date: Sun, 23 Jan 2000 16:19:16 -0500 (EST)
From: Gordon Erlebacher <erlebach@scri.fsu.edu>
To: gcf@npac.syr.edu
1 Summary of Requested Experimental Facilities
In this section we list the facilities requested under this
grant. They fall into three categories that map onto three
different classes of research and educational activities: the
Scalable Cluster Machine (SCM), the Experimental Parallel Machine
(EPM), and the Information and Pervasive Infrastructure.
In the following sections, we first describe the infrastructure
we are requesting, then give the five-year schedule over which
the equipment will be acquired.
1.0.1 Scalable Cluster Machine (SCM)
A very high performance, cost-effective computer cluster will be purchased
with the following characteristics:
- 32 to 48 processors
- 1/2 Gbyte of memory per processor
- Peak speed of 1-3 Gflops per CPU
- An interconnect bandwidth of at least 1 Gigabit/sec
- An L2 cache of 1 to 4 Mbytes for each processor
- 1/2 Terabyte of disk
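As a rough illustration of what the ranges above imply for the machine as a whole, the following Python sketch multiplies them out. The function name and the pairing of the smallest processor count with the lowest peak speed (and the largest with the highest) are assumptions made here for illustration. Peak figures are theoretical, and sustained rates will be lower.

```python
# Back-of-envelope aggregates for the SCM ranges listed above.
# Names are illustrative and do not come from the proposal.

def scm_aggregates(processors, mem_gbyte_per_proc, peak_gflops_per_cpu):
    """Return (total memory in Gbytes, aggregate peak speed in Gflops)."""
    total_memory = processors * mem_gbyte_per_proc
    total_peak = processors * peak_gflops_per_cpu
    return total_memory, total_peak

# Low end: 32 processors at 1 Gflops; high end: 48 processors at 3 Gflops.
low = scm_aggregates(32, 0.5, 1.0)
high = scm_aggregates(48, 0.5, 3.0)

print("memory (Gbytes):", low[0], "to", high[0])   # 16.0 to 24.0
print("peak (Gflops):  ", low[1], "to", high[1])   # 32.0 to 144.0
```

Under these assumptions the cluster would hold 16 to 24 Gbytes of memory in total and reach an aggregate peak of 32 to 144 Gflops.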
To illustrate the above requirements, we briefly describe
the current and future technologies offered by two vendors
well known for their cluster technologies (Compaq and IBM),
and SGI, a newcomer to the field of clusters. We contrast
the offerings of these three vendors with a hand-crafted cluster
built from off-the-shelf components.
Compaq:
IBM:
SGI:
1.0.2 Experimental Parallel Machine (EPM)
A machine aimed at algorithm development will be purchased and
upgraded within 24 months. This system will in all likelihood
be a distributed system of 16 nodes with at least 4 processors per node
and a peak performance of ???. The most probable candidate vendors
are Compaq, IBM, and SGI, given their track records, proposed chip
architectures over the next several years, and low cost.
The system will have the following characteristics:
- 16 to 24 nodes
- 1/2 Gbyte of memory per processor
- Peak speed of 1-3 Gflops per CPU
- A switching infrastructure with an aggregate bandwidth of at least 2 Gbytes/sec
- An L2 cache of at least 4 Mbytes per processor
- 1/2 Terabyte of disk
The above configuration will be purchased in year 2, with an upgrade
in year 3 that replaces the chips with next-generation processors.
We anticipate a factor of 5 improvement in sustained floating
point performance over this period, regardless of the architecture chosen.
We have found that although Compaq is the leader in absolute
peak performance, both SGI and IBM have technologies on the horizon
that make the eventual leader uncertain at this time. We have compared
configurations from Compaq, IBM, and SGI and established that their
cost/performance ratios agree to within 30 percent.
Compaq:
IBM:
SGI:
1.1 Information and Pervasive Infrastructure
Pervasive infrastructure refers to equipment that does not fit in the
above two categories. This equipment will mostly support the core
technologies and the educational components of this proposal.
It is classified into three sections: visualization and user interfaces,
mobile support, and information infrastructure. Technology is
evolving at a very fast pace, and it is difficult to predict with any
reliability the availability, quality, and pricing of almost
any hardware device. All that is certain, based on current trends, is
that for a given technology, prices will continue to fall.
Thus, we will spread purchases in this category evenly
over the 5 year period, and the amount spent in each category will remain
approximately constant from year to year.
1.1.1 Visualization and user interfaces (VUI)
We will purchase two raster managers for our SGI Onyx 2
visualization machine to feed our second pipe (year 2).
These will support video broadcast for dissemination of presentations,
workshops, and tutorials to the desktop. Several high quality digital
video recorders, digital cameras, scanners, and MPEG encoders/decoders
are also necessary to support this activity. New input devices will
be purchased and integrated into the visualization research. These consist
of haptic devices from SensAble Technologies, head trackers, and head mounted
displays. We will track the technology and upgrade our components at
least once every two years.
More detail:
1.1.2 Mobile Support (MS)
To support research in education and in the core technologies, we
require equipment that lets computers communicate over broadband
links. We will purchase PCI cards for handheld Palm
Pilots (or equivalent PDAs) and laptop computers that enable researchers
to interact remotely with their simulations, debuggers, problem
solving environments, etc. We anticipate that the PCI cards will
become available within the next twelve months, so the first purchases
will occur in year 2. We expect continued improvement in transmission
quality, broadcast distance, protocols, etc. To keep abreast of this
anticipated evolution, we will purchase equipment upgrades every two
years.
We anticipate the convergence of laptops, palmtops, PDAs, and cell phones
into a single integrated system (GIVE AN EXAMPLE). Java Virtual Machines
will play an integral role in developing applications for these devices.
Recently, Transmeta announced a 700 MHz Crusoe chip that has
a decreased transistor count, and thus lower power requirements.
In addition, novel software allows the power consumption to vary
according to the actual use of the processor, keyboard, etc. The
initial impact on handheld systems appears to be substantial, allowing
much longer usage without the need to recharge. This development underscores
the need to avoid premature decisions on equipment acquisition that
would lock us into a particular, soon outdated technology.
1.1.3 Information Infrastructure (II)
We will purchase an Oracle database in year 1 to support
storage of simulation data, metadata, input files, etc.
Upgrades to the database and support tools will be acquired
in years 2 and beyond.
Fox brings to FSU (from Syracuse) several Oracle databases. These
will be upgraded in year 3 with enhancements to support the new technologies
used for education and research (video on demand, XML interfaces, objects,
cube structures, etc.). He also brings several Sun servers to support
research in information technology and education. After they are integrated
into our current environment, we will evaluate the need to add new hardware
support in the form of more powerful servers.
1.2 Storage
Secondary and tertiary storage systems are requested to store the data
in the short and medium term. To accommodate the high frequency
of file storage and the large files produced by the simulations,
we will install a one Terabyte disk subsystem built from
four IBM 7133 Serial Disk System Advanced Model D40 enclosures. Three
D40s will have sixteen 18.2 Gigabyte disks each, while the fourth
will have eight 18.2 Gigabyte disk drives. Each D40 is rack-mountable.
These disks have a sustained transfer rate to memory of 35 Megabytes/sec,
and are capable of achieving upwards of 80 Megabytes/sec via a loop
configuration.
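The one Terabyte figure can be checked from the drive counts given above. The sketch below is only an illustration: it assumes decimal units (1 Terabyte = 1000 Gigabytes) and counts raw capacity, ignoring any formatting, spare, or redundancy overhead, none of which is specified here. The variable names are hypothetical.

```python
# Raw capacity of the four D40 enclosures described above.
# Assumption: decimal units and no formatting or redundancy overhead.

DRIVE_GBYTE = 18.2                      # capacity of one drive
drives_per_enclosure = [16, 16, 16, 8]  # three full D40s and one half-populated

total_drives = sum(drives_per_enclosure)
raw_gbyte = total_drives * DRIVE_GBYTE
raw_tbyte = raw_gbyte / 1000.0

print(f"{total_drives} drives, {raw_gbyte:.1f} Gbytes raw, {raw_tbyte:.2f} Tbytes")
```

With 56 drives the raw total comes to about 1019 Gigabytes, which is consistent with the one Terabyte subsystem described.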
Although there is a need for an archival storage system, we
will leverage the 100+ Terabyte system purchased with the large
Teraflop machine (TFM). We anticipate the system will be in place
by October 2000 with 60 Terabytes of initial storage capacity. Additional
storage will be added within the following 18 months, together with
a major system upgrade.
The archival system (of which IBM's is typical)
has a sustained rate between 7 and 12 Megabytes/sec from the disk
subsystem. A one Gigabyte file can therefore be retrieved in its
entirety in roughly 1.5 to 2.5 minutes, which is sufficient for batch
postprocessing of many time-dependent datasets over a
period of hours.
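The retrieval time follows from dividing the file size by the sustained rate. A minimal sketch, assuming 1 Gigabyte = 1000 Megabytes and ignoring tape mount and seek latency (the function name is illustrative):

```python
# Transfer time for a one Gigabyte file at the quoted sustained rates.
# Assumption: 1 Gbyte = 1000 Mbytes; mount and seek latency are ignored.

def retrieval_seconds(file_mbyte, rate_mbyte_per_sec):
    """Time in seconds to move file_mbyte at a constant sustained rate."""
    return file_mbyte / rate_mbyte_per_sec

FILE_MBYTE = 1000.0
slow = retrieval_seconds(FILE_MBYTE, 7.0)    # lower bound of the quoted rate
fast = retrieval_seconds(FILE_MBYTE, 12.0)   # upper bound of the quoted rate

print(f"at  7 Mbytes/sec: {slow:.0f} s ({slow / 60:.1f} min)")
print(f"at 12 Mbytes/sec: {fast:.0f} s ({fast / 60:.1f} min)")
```

At 12 Megabytes/sec the transfer takes a little under a minute and a half; at 7 Megabytes/sec it takes a little under two and a half minutes.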
1.3 Software
The requested software consists of the compilers, profilers, and
performance enhancement tools
necessary to maximize the efficiency of the application programs. Compilers
for C/C++/Fortran and Java are requested. The cost of software is dominated
by the parallel environment, the set of low level OS routines that manage the
parallel system. Finally, Automatic Data Storage Management tools (ADSM)
are necessary to provide a high degree of flexibility and automation in
the area of data storage and retrieval. This software can also be used to
back up both the SP system and other computers on the network to the
archival tape system. This class of software is available from all
vendors. The operating system on the SCM will be Linux, which gives access
to a large variety of open source software. We anticipate collaborative
agreements with computer vendors to help enhance their software in
the areas of high performance code tuning and Gigabit (and beyond) networking.