
The CM-5 System Overview

This section gives a brief overview of the CM-5 system. A CM-5 system contains four major parts: processing nodes, control processors, I/O nodes, and the internal networks.

The CM-5 is available in configurations of 32 to 1024 processing nodes. Each node is a RISC microprocessor with optional attached vector units. Current implementations use a SPARC microprocessor with 32 to 128 Mbytes of memory and four optional vector units. Each processing node operates at 33 MHz and is rated at 22 Mips and 5 MFlops. When equipped with vector units, each node is rated at 128 Mips (peak) and 128 MFlops (peak).
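The per-node ratings above imply a theoretical aggregate peak for a whole machine. The following is a minimal sketch of that arithmetic, assuming the aggregate is simply the per-node peak multiplied by the node count (an idealization that ignores communication and memory limits). The function name `aggregate_peak_gflops` is hypothetical and introduced only for illustration.

```python
# Per-node ratings quoted in the text.
PEAK_MFLOPS_PER_NODE = 128    # with vector units (peak)
SCALAR_MFLOPS_PER_NODE = 5    # SPARC microprocessor alone


def aggregate_peak_gflops(nodes, mflops_per_node=PEAK_MFLOPS_PER_NODE):
    """Return the idealized aggregate peak, in GFlops, for `nodes` nodes.

    Assumption: aggregate peak = nodes * per-node peak.
    """
    if not 32 <= nodes <= 1024:
        raise ValueError("the text describes configurations of 32 to 1024 nodes")
    return nodes * mflops_per_node / 1000.0


if __name__ == "__main__":
    for n in (32, 256, 1024):
        print(n, "nodes:", aggregate_peak_gflops(n), "GFlops (peak, with vector units)")
```

Under this assumption, a full 1024-node configuration with vector units has a peak of roughly 131 GFlops, against about 5 GFlops for the same machine without vector units.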

The processing nodes can be divided into several partitions and each partition contains a control processor. The control processors run a UNIX-based operating system. They can download programs into the processing nodes in their partition, control the program execution on the processing nodes, and handle sequential I/O for processing nodes. The control processors also participate in computation when the program is executed in the host-node programming mode.

The I/O nodes, which also connect to the network, handle high-performance parallel I/O for the processing nodes. Besides ordinary NFS file systems, the I/O nodes support HIPPI (High Performance Parallel Interface), SDA (Scalable Disk Array), and VME interfaces, thus allowing connections to a wide range of computers and I/O devices. A CMIO interface supports mass storage devices such as the Data Vault. Peak throughput ranges from 20 to 200 Mbytes/sec.

The CM-5 internal networks (Figure C.1) include two components: a data network and a control network. The data network contains two channels, a left data network and a right data network. The control network contains three different networks: a broadcast network, a combine network, and a global network. The CM-5 also has a separate diagnostics network, visible only to the system administrator, which detects and isolates errors throughout the system.

The data network, a 32-bit-wide data path, provides high-performance data communication among all system components. The network has a peak bandwidth of 5 Mbytes/sec for node-to-node communication. However, if the destination is within the same cluster of 4 or 16 nodes, the peak bandwidth is 20 Mbytes/sec or 10 Mbytes/sec, respectively. The topology of the CM-5 data network is a fat tree, and the communication mechanism is wormhole routing. Figure C.2 shows the data network with 16 nodes. The fat-tree topology allows more than one path to be used for data transmission, so a data path is not blocked unless all links are occupied. A data packet can take any link that leads to its destination, which reduces the chance of link contention with other data packets.

Each packet is 5 words long, so a large message must be divided into many packets for transmission. Packets from different nodes can be interleaved on the same link and thus are not blocked behind a large message. Link contention is not a problem on the CM-5 unless the amount of data exceeds the capacity of the data network. Because packets may travel over different links, the receiving order may differ from the sending order. A sequencing mechanism is therefore needed when transmitting data larger than 5 words.
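The need for sequencing can be illustrated with a small simulation. This is only a sketch, not the CM-5 packet format: it assumes that one of the 5 words in each packet carries a sequence number and the remaining 4 carry payload. The names `packetize` and `reassemble` are hypothetical, and a seeded shuffle stands in for packets taking different links through the fat tree.

```python
import random

PACKET_WORDS = 5                               # packet length stated in the text
HEADER_WORDS = 1                               # assumption: one word holds the sequence number
PAYLOAD_WORDS = PACKET_WORDS - HEADER_WORDS    # words of user data per packet


def packetize(words):
    """Split a list of words into (sequence_number, payload) packets."""
    return [
        (seq, words[start:start + PAYLOAD_WORDS])
        for seq, start in enumerate(range(0, len(words), PAYLOAD_WORDS))
    ]


def reassemble(packets):
    """Rebuild the original message from packets received in any order."""
    message = []
    for _, payload in sorted(packets, key=lambda packet: packet[0]):
        message.extend(payload)
    return message


if __name__ == "__main__":
    message = list(range(18))              # an 18-word message
    packets = packetize(message)           # 5 packets; the last holds 2 words
    random.Random(0).shuffle(packets)      # model out-of-order arrival
    assert reassemble(packets) == message
    print(len(packets), "packets reassembled in order")
```

Without the sequence number, the receiver would have no way to tell which arrival order corresponds to the original message.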

The control network handles operations requiring the cooperation of many or all processors. The broadcast network broadcasts messages to all nodes. The combine network supports parallel prefix, reduction operations, and network-done tests; it accelerates cooperative arithmetic and logical operations. The global network handles synchronization among the nodes of the CM-5; it supports both synchronous and asynchronous interfaces for this purpose.
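To make the two combine operations concrete, the sketch below computes what a reduction and a parallel prefix deliver to each node. It runs sequentially on one machine and models only the results, not the hardware or any CM-5 library call; the names `combine_reduce` and `combine_scan` are hypothetical, and an inclusive prefix is assumed.

```python
from functools import reduce
from itertools import accumulate
import operator


def combine_reduce(values, op=operator.add):
    """Reduction: each node contributes one value, every node gets the combined result."""
    total = reduce(op, values)
    return [total] * len(values)


def combine_scan(values, op=operator.add):
    """Inclusive parallel prefix: node i gets op applied over values[0..i]."""
    return list(accumulate(values, op))


if __name__ == "__main__":
    node_values = [3, 1, 4, 1, 5, 9, 2, 6]        # one value per node
    print(combine_reduce(node_values))            # every node sees the sum
    print(combine_scan(node_values))              # running sums, one per node
    print(combine_reduce(node_values, max))       # every node sees the maximum
```

The last element of the prefix result always equals the reduction result, which is a handy consistency check when experimenting with other operators.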

CMMD, which provides a full range of message-passing facilities, is the standard communication library of the CM-5. CMMD provides both host-node and hostless programming modes. The front-end control processor acts as the host when the host-node programming mode is used.

CMMD provides functions for

I/O operations can be done by each node independently or by all nodes working globally.

The CM-5 also provides an active message layer, CMAML, an asynchronous communication mechanism that works as follows: each message header contains the address of a user-level handler, which is executed at the receiving node upon message arrival with the message body as its argument(s).
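The active message scheme can be sketched in a few lines. This is a single-process simulation, not the CMAML interface: a Python function reference stands in for the handler address in the header, an explicit `poll` call stands in for message arrival, and all names (`Node`, `send_active_message`, `add_to_counter`) are hypothetical.

```python
from collections import deque


class Node:
    """A simulated processing node with an inbox of pending active messages."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.inbox = deque()
        self.counter = 0

    def poll(self):
        """Run the handler named in each pending message, passing the body as arguments."""
        while self.inbox:
            handler, args = self.inbox.popleft()
            handler(self, *args)


def send_active_message(dest, handler, *args):
    """Queue a message whose header is `handler` and whose body is `args`."""
    dest.inbox.append((handler, args))


def add_to_counter(node, amount):
    """Example user-level handler: fold the message body into node state."""
    node.counter += amount


if __name__ == "__main__":
    nodes = [Node(i) for i in range(4)]
    send_active_message(nodes[2], add_to_counter, 5)
    send_active_message(nodes[2], add_to_counter, 7)
    nodes[2].poll()                    # handlers run on the receiving node
    assert nodes[2].counter == 12
    print("node 2 counter:", nodes[2].counter)
```

The point of the design is that the receiver needs no matching receive call: the message itself names the code that consumes it.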





xshen@
Sat Dec 3 17:51:03 EST 1994