
The IBM SP-1 System at ANL

The IBM SP-1 is a new parallel computer designed to make the best use of IBM's powerful RISC technology combined with a high-speed switch.

The hardware configuration of the IBM SP-1 at Argonne National Laboratory (the Argonne SP-1) consists of 128 nodes and two compile servers. Each node is essentially an RS/6000 model 370. This model has a 62.5 MHz clock speed, a 32-KB data cache, and a 32-KB instruction cache. Key features of this system are:
  1. 128 Mbytes of memory per node
  2. GBytes local disk on each node (400 Mbytes available to users, the rest for paging)
  3. Full Unix on each node (IBM AIX 3.2.4)
  4. Each node accessible by Ethernet from the internet
  5. High-performance Omega switch (50 µsec latency, 8.5 Mbytes/sec bandwidth when using EUI-H)
In addition, the ANL SP-1 will soon have a large high-performance file system (220 Gbytes of RAID disk and a 6-TByte automated tape library).

The peak performance of each node is 125 Mflops (one 64-bit floating-point add and one floating-point multiply in each clock cycle). In practice, each node can achieve between 15 and 70 Mflops on Fortran code. Higher performance can be reached by using BLAS or ESSL routines.
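The peak figure follows directly from the clock rate and the number of floating-point operations issued per cycle. The short Python sketch below reproduces that arithmetic. The constant and function names are illustrative, and the aggregate system peak is derived here from the node count rather than quoted from the text.

```python
# Figures taken from the text; names are illustrative only.
CLOCK_MHZ = 62.5        # RS/6000 model 370 clock speed
FLOPS_PER_CYCLE = 2     # one add plus one multiply per cycle
NODES = 128             # nodes in the Argonne SP-1

# Peak per node in Mflops: cycles per microsecond times flops per cycle.
peak_per_node_mflops = CLOCK_MHZ * FLOPS_PER_CYCLE

# Aggregate peak in Gflops (derived, not stated in the text).
peak_system_gflops = peak_per_node_mflops * NODES / 1000.0


def fraction_of_peak(sustained_mflops):
    """Return sustained performance as a fraction of the per-node peak."""
    return sustained_mflops / peak_per_node_mflops


print(f"peak per node : {peak_per_node_mflops} Mflops")
print(f"peak, 128 nodes: {peak_system_gflops} Gflops")
for sustained in (15, 70):
    print(f"{sustained} Mflops is {fraction_of_peak(sustained):.0%} of peak")
```

On these numbers, the 15 to 70 Mflops range quoted for Fortran code corresponds to roughly one eighth to somewhat over half of the per-node peak.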

Each SP-1 node runs a full Unix, and most of the usual Unix tools are available. Users may log directly into any SP-1 node using telnet, rlogin, or rsh. The software of the ANL SP-1 includes multiple parallel programming environments, IBM's ESSL library, and performance debugging tools.

Communication between nodes can be carried out over any of several transport layers. Most users will not use these directly; rather, they will use one of the portable programming libraries. However, because the programming libraries use these transport layers to accomplish the actual communication, it is important to understand them so that the proper transport layer can be chosen.

The available transport layers are Ethernet, Switch/IP, EUI, and EUI-H. Only the first two support multiple parallel jobs on the same node. Both versions of EUI can run only one process per node. In addition, EUI-H is incompatible with EUI and Switch/IP on the same nodes (though the SP-1 can be configured so that EUI-H runs on some nodes and EUI and Switch/IP run on the others; this is a common daytime configuration at ANL).
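The constraints in this paragraph can be summarized as a small lookup table. The Python sketch below is only an illustration of the stated rules, reading the list as the four layers Ethernet, Switch/IP, EUI, and EUI-H. The names `MULTIPLE_JOBS_PER_NODE`, `INCOMPATIBLE_ON_SAME_NODE`, and `conflicts` are hypothetical and do not correspond to any SP-1 tool.

```python
# Whether a layer supports multiple parallel jobs on one node (from the text).
MULTIPLE_JOBS_PER_NODE = {
    "Ethernet": True,
    "Switch/IP": True,
    "EUI": False,    # one process per node
    "EUI-H": False,  # one process per node
}

# Pairs of layers that the text says cannot be configured on the same node.
INCOMPATIBLE_ON_SAME_NODE = {
    frozenset({"EUI-H", "EUI"}),
    frozenset({"EUI-H", "Switch/IP"}),
}


def conflicts(layer_a, layer_b):
    """Return True if the text states the two layers cannot share a node.

    A False result only means that no conflict is stated in the text.
    """
    return frozenset({layer_a, layer_b}) in INCOMPATIBLE_ON_SAME_NODE


for pair in (("EUI-H", "EUI"), ("EUI-H", "Switch/IP"), ("EUI", "Switch/IP")):
    print(pair, "conflict stated:", conflicts(*pair))
```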

Using the Ethernet transport layer with all nodes connected by Ethernet, the SP-1 looks just like a collection of workstations. This method does suffer from the same drawbacks as any Ethernet-connected system: high latency (about 1 msec) and low bandwidth (1 MByte/sec is shared among all processors). The IP transport layer provides enhanced performance to code written using Unix sockets for interprocessor communication.

EUI is IBM's message-passing interface to the high-performance switch. There are two versions: one that works with the Parallel Operating Environment (POE) and one that does not. Here, EUI refers to the POE version. POE supports a parallel symbolic debugger (xpdbx) and a performance visualization tool (vt). However, the performance of EUI is inferior to that of EUI-H (latency is about 405 µsec), and at most 64 nodes are available to EUI. EUI-H is an experimental, low-overhead implementation of the EUI interface. It supports neither xpdbx nor vt, it is difficult to provide standard input to EUI-H programs, and gprof-style profiling information cannot be produced from them. All 128 nodes may be accessed using EUI-H when the machine is configured for it. Note that this is the IBM Research version of EUI-H; it contains only the Fortran bindings for EUI.
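To see what latency and bandwidth figures mean for a given message size, a common first-order estimate is time = latency + size / bandwidth. The Python sketch below applies that model to the EUI-H figures quoted for the switch (a latency of 50, assumed here to be in microseconds, and 8.5 Mbytes/sec, with 1 Mbyte taken as 10^6 bytes). The linear model and the function names are assumptions made for illustration; they are not measurements or part of any SP-1 library.

```python
def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time in seconds to send one message of nbytes bytes."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def half_performance_length(latency_s, bandwidth_bytes_per_s):
    """Message size in bytes at which half the peak bandwidth is achieved.

    Under the linear model this is latency times bandwidth.
    """
    return latency_s * bandwidth_bytes_per_s


# EUI-H figures from the text; the microsecond unit is an assumption.
EUI_H_LATENCY_S = 50e-6
EUI_H_BANDWIDTH = 8.5e6  # bytes per second

for size in (8, 1024, 1_000_000):
    t = transfer_time(size, EUI_H_LATENCY_S, EUI_H_BANDWIDTH)
    print(f"{size:>9} bytes: {t * 1e6:12.1f} microseconds")

n_half = half_performance_length(EUI_H_LATENCY_S, EUI_H_BANDWIDTH)
print(f"half-performance message length: about {n_half:.0f} bytes")
```

Under this model, messages much shorter than the half-performance length are dominated by latency, which is why the choice of transport layer matters most for codes that exchange many small messages.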

IP and EUI applications may share the switch; multiple IP applications may share both nodes and the switch. IP and EUI run under the "Parallel Operating Environment," or POE. POE includes a number of tools, such as a parallel debugger and a ParaGraph-like visualization tool (vt). These two transport layers share a common interface to the switch known as lightspeed.

Parallel Libraries:





xshen@
Sat Dec 3 17:51:03 EST 1994