Subject:
Resent-Date: Sun, 23 Jan 2000 16:29:19 -0500
Resent-From: Geoffrey Fox <gcf@npac.syr.edu>
Resent-To: p_gcf@npac.syr.edu
Date: Sun, 23 Jan 2000 16:19:16 -0500 (EST)
From: Gordon Erlebacher <erlebach@scri.fsu.edu>
To: gcf@npac.syr.edu

1  Summary of Requested Experimental Facilities

In this section we list the facilities requested under this grant. They fall into three categories, each corresponding to a distinct class of research and educational activity: the Scalable Cluster Machine (SCM), the Experimental Parallel Machine (EPM), and the Information and Pervasive Infrastructure. Below, we first describe the requested infrastructure, followed by the five-year schedule over which the equipment will be acquired.

1.0.1  Scalable Cluster Machine (SCM)

A high-performance, cost-effective computer cluster will be purchased with the following characteristics:

  1. 32 to 48 processors.
  2. 0.5 Gbyte of memory per processor.
  3. A peak speed of 1 to 3 Gflops per CPU.
  4. An interconnect bandwidth of at least 1 Gigabit/sec.
  5. An L2 cache of 1 to 4 Mbytes per processor.
  6. 0.5 Terabyte of disk.
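As a rough check on what these ranges imply, the short Python sketch below multiplies out the per-processor figures for the smallest and largest configurations. It assumes, hypothetically, one 1 Gigabit/sec interconnect link per processor; the function name `scm_aggregates` is illustrative only and not part of any vendor tool.

```python
# Back-of-envelope aggregates for the Scalable Cluster Machine (SCM),
# using the per-processor ranges listed above.
# Assumption (hypothetical): one 1 Gigabit/sec link per processor.

def scm_aggregates(n_procs: int, gflops_per_cpu: float,
                   mem_gbyte_per_cpu: float = 0.5,
                   link_gbit_per_s: float = 1.0) -> dict:
    """Aggregate peak speed, total memory, and interconnect balance."""
    return {
        "peak_gflops": n_procs * gflops_per_cpu,
        "total_memory_gbyte": n_procs * mem_gbyte_per_cpu,
        # Link bandwidth in Gbytes/s (8 bits per byte) per Gflop/s of peak speed.
        "bytes_per_flop": (link_gbit_per_s / 8.0) / gflops_per_cpu,
    }

low = scm_aggregates(32, 1.0)   # smallest configuration, slowest CPUs
high = scm_aggregates(48, 3.0)  # largest configuration, fastest CPUs
print(low)
print(high)
```

With 32 processors at 1 Gflops the aggregate peak is 32 Gflops with 16 Gbytes of memory; with 48 processors at 3 Gflops it is 144 Gflops with 24 Gbytes. The bytes-per-flop ratio falls from 0.125 to about 0.04 as CPUs get faster, which is why interconnect bandwidth is listed as a hard requirement.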

To illustrate the above requirements, we briefly describe the current and future technologies offered by two vendors well known for their cluster technologies (Compaq and IBM) and by SGI, a newcomer to the field of clusters. We contrast the offerings of these three vendors with a hand-crafted cluster built from off-the-shelf components.

Compaq: 

IBM: 

SGI: 

1.0.2  Experimental Parallel Machine (EPM)

A machine aimed at algorithm development will be purchased and upgraded within 24 months. This system will most likely be a distributed system of 16 nodes, with at least 4 processors per node and a peak performance of ???. The most probable vendors are Compaq, IBM, and SGI, given their track records, their proposed chip architectures over the next several years, and their low cost. The system will have the following characteristics:

  1. 16 to 24 nodes.
  2. 0.5 Gbyte of memory per processor.
  3. A peak speed of 1 to 3 Gflops per CPU.
  4. A switching infrastructure with an aggregate bandwidth of at least 2 Gbytes/sec.
  5. An L2 cache of at least 4 Mbytes per processor.
  6. 0.5 Terabyte of disk.
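The switch requirement can be restated in per-node terms. The sketch below assumes, hypothetically, that the 2 Gbytes/sec aggregate is shared evenly among nodes (with 1 Gbyte = 1000 Mbytes) and that each node carries the 4 processors mentioned above; real switches need not divide bandwidth this way, and `per_node_share` is an illustrative name.

```python
# Share of the EPM switch bandwidth available to each node and processor,
# assuming (hypothetically) an even split of the aggregate bandwidth.

AGGREGATE_MBYTE_PER_S = 2000.0  # "at least 2 Gbytes/sec", 1 Gbyte = 1000 Mbytes


def per_node_share(n_nodes: int, procs_per_node: int = 4) -> tuple[float, float]:
    """Return (Mbytes/s per node, Mbytes/s per processor) under an even split."""
    per_node = AGGREGATE_MBYTE_PER_S / n_nodes
    return per_node, per_node / procs_per_node


for nodes in (16, 24):
    node_bw, proc_bw = per_node_share(nodes)
    print(f"{nodes} nodes: {node_bw:.1f} Mbytes/s per node, "
          f"{proc_bw:.1f} Mbytes/s per processor")
```

At 16 nodes this gives 125 Mbytes/sec per node (about 31 per processor); at 24 nodes, about 83 Mbytes/sec per node.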

The above configuration will be purchased in year 2 and upgraded in year 3 by replacing its processors with the next generation of chips. We anticipate a factor of 5 improvement in sustained floating-point performance over this period, regardless of the architecture chosen.

We have found that although Compaq currently leads in absolute peak performance, both SGI and IBM have technologies on the horizon that make the eventual leader uncertain at this time. We have compared configurations from Compaq, IBM, and SGI and found that their cost/performance ratios agree to within 30 percent.
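One way to make the 30 percent comparison precise is to compute a cost per sustained Gflop/s for each configuration and check that the largest ratio is within 30 percent of the smallest. The sketch below encodes that interpretation only; the function names are hypothetical, and the figures in the example are placeholders, not vendor quotes.

```python
# Hypothetical check that several configurations' cost/performance ratios
# lie within a given band (0.30 = 30 percent) of the best one.

def cost_per_gflop(price_dollars: float, sustained_gflops: float) -> float:
    """Dollars per sustained Gflop/s for one configuration."""
    return price_dollars / sustained_gflops


def within_band(ratios: dict[str, float], band: float = 0.30) -> bool:
    """True if the worst ratio exceeds the best by at most `band`."""
    best, worst = min(ratios.values()), max(ratios.values())
    return worst <= (1.0 + band) * best


# Placeholder inputs (price in dollars, sustained Gflop/s); NOT real quotes.
configs = {
    "vendor_a": (1_000_000.0, 10.0),
    "vendor_b": (1_150_000.0, 10.0),
    "vendor_c": (1_250_000.0, 10.0),
}
ratios = {name: cost_per_gflop(p, g) for name, (p, g) in configs.items()}
print(ratios, within_band(ratios))
```

Measuring the spread relative to the best ratio, rather than to the average, is one design choice among several; it gives the stricter reading of "within a 30 percent range".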

Compaq: 

IBM: 

SGI: 

1.1  Information and Pervasive Infrastructure

Pervasive infrastructure refers to equipment that does not fit into the above two categories. This equipment will mostly support the core technologies and the educational components of this proposal. It is divided into three areas: visualization and user interfaces, mobile support, and information infrastructure. Technology is evolving very rapidly, and it is difficult to predict with any reliability the availability, quality, and pricing of almost any hardware device. All that is certain, based on current trends, is that prices for a given technology will continue to fall sharply. We will therefore spread purchases in this category evenly over the five-year period, keeping the amount spent in each area approximately constant from year to year.
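The rationale for spreading purchases evenly can be illustrated numerically: with a fixed annual allocation and falling prices, each year's purchase buys more capability than the last, and no more than one-fifth of the budget is committed at year-1 prices. The sketch below assumes a hypothetical constant fractional price decline per year; neither the budget nor the decline rate comes from this proposal, and `units_per_year` is an illustrative name.

```python
# Capability bought each year when a fixed total budget is spread evenly
# over several years and unit prices fall by a constant fraction per year.
# The budget, starting price, and decline rate are hypothetical parameters.

def units_per_year(total_budget: float, unit_price_year1: float,
                   annual_decline: float, years: int = 5) -> list[float]:
    per_year = total_budget / years  # even spending, as proposed
    units = []
    for year in range(years):
        price = unit_price_year1 * (1.0 - annual_decline) ** year
        units.append(per_year / price)
    return units


print(units_per_year(500.0, 10.0, 0.5))
```

For example, spreading 500 budget units over five years, with a unit price of 10 that halves each year, buys 10, 20, 40, 80, and 160 units in successive years.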

1.1.1  Visualization and user interfaces (VUI)

In year 2 we will purchase two raster managers for our SGI Onyx2 visualization machine to drive its second graphics pipe. These will support video broadcast of presentations, workshops, and tutorials to the desktop. Several high-quality digital video recorders, digital cameras, scanners, and MPEG encoders/decoders are also necessary to support this activity. New input devices will be purchased and integrated into the visualization research: haptic devices from SensAble Technologies, head trackers, and head-mounted displays. We will track the technology and upgrade these components at least once every two years.

More detail:

1.1.2  Mobile Support (MS)

To support research in education and in the core technologies, we require equipment that lets computers communicate over broadband links. We will purchase PCI cards for handheld Palm Pilots (or equivalent PDAs) and laptop computers that enable researchers to interact remotely with their simulations, debuggers, problem solving environments, etc. We anticipate that the PCI cards will become available within the next twelve months, so the first purchases will occur in year 2. We expect continued improvements in transmission quality, broadcast distance, protocols, etc. To keep abreast of this evolution, we will upgrade this equipment every two years.

We also anticipate the convergence of laptops, palmtops, PDAs, and cell phones into a single integrated system (GIVE AN EXAMPLE). Java Virtual Machines will play an integral role in developing applications for these devices.

Transmeta recently announced the 700 MHz Crusoe chip, whose reduced transistor count lowers its power requirements. In addition, novel software allows power consumption to vary according to the actual use of the processor, keyboard, etc. The initial impact on handheld systems appears to be substantial, allowing much longer use between recharges. This development underscores the need to avoid premature commitments to a particular acquisition path, which could lock us into an outdated technology.

1.1.3  Information Infrastructure (II)

We will purchase an Oracle database in year 1 to support storage of simulation data, metadata, input files, etc. Upgrades to the database and support tools will be acquired in years 2 and beyond.
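To suggest how run metadata and input files might be organized, the sketch below uses Python's built-in `sqlite3` module as a stand-in for the proposed Oracle database. The table names, columns, and sample rows are hypothetical; an Oracle deployment would differ in SQL dialect and tooling.

```python
import sqlite3

# Minimal metadata catalog for simulation runs. sqlite3 stands in for the
# proposed Oracle database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE run (
    run_id     INTEGER PRIMARY KEY,
    code_name  TEXT NOT NULL,
    started_at TEXT NOT NULL
);
CREATE TABLE run_file (
    run_id INTEGER NOT NULL REFERENCES run(run_id),
    kind   TEXT NOT NULL CHECK (kind IN ('input', 'output')),
    path   TEXT NOT NULL,
    bytes  INTEGER
);
""")

# Illustrative rows only: one run with one input file and one output file.
conn.execute("INSERT INTO run VALUES (1, 'demo_solver', '2000-01-23')")
conn.executemany(
    "INSERT INTO run_file VALUES (?, ?, ?, ?)",
    [(1, "input", "runs/1/params.in", 2048),
     (1, "output", "runs/1/field.dat", 1_000_000_000)],
)

# Find every output file of a given run, e.g. to feed batch postprocessing.
rows = conn.execute(
    "SELECT path, bytes FROM run_file WHERE run_id = ? AND kind = 'output'",
    (1,),
).fetchall()
print(rows)
```

Keeping files on disk or tape and storing only their paths and sizes in the database is one plausible split; the proposal does not specify whether bulk simulation data would live inside the database itself.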

Fox brings to FSU (from Syracuse) several Oracle databases. These will be upgraded in year 3 with enhancements to support the new technologies used for education and research (video on demand, XML interfaces, objects, cube structures, etc.). He also brings several Sun servers to support research in information technology and education. After they are integrated into our current environment, we will evaluate the need for additional, more powerful servers.

1.2  Storage

Secondary and tertiary storage systems are requested to hold data in the short and medium term. To accommodate the high frequency of file storage and the large files produced by the simulations, we will install a one-Terabyte disk subsystem built from four IBM 7133 Serial Disk System Advanced Model D40 units. Three D40s will hold sixteen 18.2-Gbyte disks each, while the fourth will hold eight 18.2-Gbyte disks. Each D40 is rack-mountable. These disks have a sustained transfer rate to memory of 35 Mbytes/sec and can achieve upwards of 80 Mbytes/sec in a loop configuration.
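The one-Terabyte figure follows directly from the drive counts. This sketch checks raw capacity only; it ignores formatting, RAID, and spare-drive overhead, which would reduce usable space.

```python
# Raw capacity of four IBM 7133 Model D40 drawers: three with sixteen
# 18.2-Gbyte disks and one with eight.
DISK_GBYTE = 18.2
drives_per_drawer = [16, 16, 16, 8]

total_drives = sum(drives_per_drawer)     # 56 drives
raw_gbyte = total_drives * DISK_GBYTE     # raw capacity, before any overhead
print(total_drives, round(raw_gbyte, 1))  # 56 1019.2
```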

Although an archival storage system is needed, we will leverage the 100+ Terabyte system purchased with the large Teraflop machine (TFM) rather than request one. We anticipate that this system will be in place by October 2000 with 60 Terabytes of initial capacity. Additional storage will be added within the following 18 months, together with a major system upgrade.

The archival system (of which IBM's is typical) sustains between 7 and 12 Mbytes/sec from its disk subsystem. A one-Gbyte file can therefore be retrieved in its entirety in roughly 1.5 to 2.5 minutes, which is sufficient for batch postprocessing of many time-dependent datasets over a period of hours.
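The retrieval estimate is file size divided by sustained rate. The sketch below takes 1 Gbyte as 1000 Mbytes, a convention assumed here rather than stated in the text; `retrieval_seconds` is an illustrative name.

```python
# Time to retrieve a file from the archival system at a sustained rate.
FILE_MBYTE = 1000.0  # one Gbyte, taking 1 Gbyte = 1000 Mbytes (assumption)


def retrieval_seconds(size_mbyte: float, rate_mbyte_per_s: float) -> float:
    """Seconds to stream `size_mbyte` at a constant sustained rate."""
    return size_mbyte / rate_mbyte_per_s


for rate in (7.0, 12.0):
    t = retrieval_seconds(FILE_MBYTE, rate)
    print(f"{rate:4.1f} Mbytes/s -> {t:6.1f} s ({t / 60:.2f} min)")
```

At 12 Mbytes/sec the file arrives in about 83 seconds and at 7 Mbytes/sec in about 143 seconds; under this convention the two-minute mark corresponds to a rate of roughly 8.3 Mbytes/sec.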

1.3  Software

The requested software consists of the compilers, profilers, and performance-enhancement tools necessary to maximize the efficiency of the application programs. Compilers for C/C++/Fortran and Java are requested. The cost of software is dominated by the parallel environment, the low-level OS routines that manage the parallel system. Finally, Automatic Data Storage Management (ADSM) tools are necessary to provide a high degree of flexibility and automation in data storage and retrieval. This software can also be used to back up both the SP system and other computers on the network to the archival tape system. This class of software is available from all the vendors considered above. The operating system on the SCM will be Linux, which gives access to a large variety of open source software. We anticipate collaborative agreements with computer vendors to help enhance their software in the areas of high-performance code tuning and Gigabit (and beyond) networking.

