
Glossary

Applets

An application interface where referencing (perhaps by a mouse click) a remote application as a hyperlink to a server causes it to be downloaded and run on the client.

Asynchronous Transfer Mode (ATM)

ATM is expected to be the primary networking technology for the NII to support multimedia communications. ATM uses fixed-length 53-byte messages (cells) and can run over any medium, with the cells transmitted asynchronously. Typically, ATM is associated with Synchronous Optical Network (SONET) optical fiber digital networks running at rates from OC-1 (51.84 megabits/sec) and OC-3 (155.52 megabits/sec) up to OC-48 (2,488.32 megabits/sec).
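The rates above follow a simple rule: an OC-n link runs at n times the OC-1 rate of 51.84 megabits/sec. Each 53-byte ATM cell carries a 5-byte header and 48 bytes of payload. The sketch below does this arithmetic; it ignores SONET framing overhead, so the cell rates it prints are upper bounds rather than achievable figures.

```python
# Sketch: SONET OC-n line rates and ATM cell arithmetic.
# Assumption: SONET framing overhead is ignored, so cell rates are upper bounds.

OC1_MBPS = 51.84                            # OC-1 line rate, megabits/sec
CELL_BYTES = 53                             # fixed ATM cell size
HEADER_BYTES = 5                            # ATM cell header
PAYLOAD_BYTES = CELL_BYTES - HEADER_BYTES   # 48 bytes of user data per cell


def oc_rate_mbps(n: int) -> float:
    """Line rate of an OC-n link in megabits/sec (n times OC-1)."""
    return n * OC1_MBPS


def max_cells_per_second(n: int) -> float:
    """Upper bound on ATM cells/sec over OC-n, ignoring SONET overhead."""
    return oc_rate_mbps(n) * 1e6 / (CELL_BYTES * 8)


def payload_efficiency() -> float:
    """Fraction of each cell that carries user data."""
    return PAYLOAD_BYTES / CELL_BYTES


for n in (1, 3, 48):
    print(f"OC-{n}: {oc_rate_mbps(n):.2f} Mbit/s, "
          f"at most {max_cells_per_second(n):,.0f} cells/s")
print(f"payload efficiency: {payload_efficiency():.1%}")
```

The fixed 5-byte header is the price paid for small, uniform cells, which make switching simple and keep delays short for voice and video.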

Bandwidth

The communications capacity (measured in bits per second) of a transmission line or of a specific path through the network.

Clustered Computing

A commonly found computing environment consists of many workstations connected by a local area network. The workstations, which have become increasingly powerful over the years, can together be viewed as a significant computing resource. This resource is commonly known as a cluster of workstations, and the idea can be generalized to a heterogeneous collection of machines with arbitrary architectures.

Command and Control

This refers to the computer-supported decision-making environment used by military commanders and intelligence officers. It is described in Section 2.2.

COW or NOW

Clusters of Workstations (COWs) are a particular HPDC environment in which one often uses optimized network links and interfaces to achieve high performance. A COW, if homogeneous, is particularly close to a classic homogeneous MPP built with the same CPU chipsets as workstations. Proponents of COWs claim that the use of commodity workstation nodes allows them to track technology better than MPPs. MPP proponents note that their optimized designs deliver higher performance, which outweighs both the increased cost of low-volume designs and the effective performance loss caused by the MPP adopting a given technology later (perhaps only by months) than commodity markets.

Network of Workstations (NOW, at http://now.cs.berkeley.edu/) and SHRIMP (Scalable High-Performance Really Inexpensive Multi Processor, at http://www.cs.princeton.edu/Shrimp/) are well-known research projects developing COWs.

Data Locality and Caching

A key to sequential, parallel, and distributed computing is data locality. This concept involves minimizing the ``distance'' between processor and data. In sequential computing, this implies ``caching'' data in fast memory and arranging the computation to minimize access to data not in cache. In parallel and distributed computing, one uses migration and replication to minimize the time a given node spends accessing data stored on another node.
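A classic way of ``arranging computation to minimize access to data not in cache'' is loop tiling (blocking). The sketch below multiplies two matrices tile by tile, so that small sub-blocks are reused many times while they would still be in fast memory. The tile size of 32 is an assumed, machine-dependent parameter; in Python, interpreter overhead hides the cache effect, so what carries over to C or Fortran is the loop structure, not the timing.

```python
def matmul_tiled(a, b, tile=32):
    """Multiply an n x m matrix a by an m x p matrix b using square tiles.

    Matrices are lists of rows. The three outer loops walk over tiles; the
    three inner loops do the arithmetic for one tile triple, touching only
    a small working set that a real cache could hold.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, m)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, p)):
                            row_c[j] += aik * row_b[j]
    return c


print(matmul_tiled([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]], tile=2))
```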

Data Mining

This describes the search for, and extraction of, unexpected information from large databases. In a database of credit card transactions, conventional database search will generate monthly statements for each customer, whereas data mining uses ingenious algorithms to discover, for example, a linked set of records corresponding to fraudulent activity.

Data Parallelism

A model of parallel or distributed computing in which a single operation can be applied to all elements of a data structure simultaneously. Often, these structures are arrays.
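As a minimal sketch of the data parallel model, assuming a thread pool as a stand-in for the nodes of a parallel machine, the function below applies the same operation, y[i] = alpha*x[i] + y[i] (the BLAS ``SAXPY'' operation), to contiguous chunks of the arrays. CPython threads do not execute this arithmetic truly simultaneously, so the sketch shows the decomposition, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def saxpy_chunk(alpha, x, y, lo, hi):
    """Apply y[i] = alpha*x[i] + y[i] to the index range [lo, hi)."""
    return [alpha * x[i] + y[i] for i in range(lo, hi)]


def saxpy_parallel(alpha, x, y, workers=4):
    """Split the index range into one contiguous chunk per worker."""
    n = len(x)
    if n == 0:
        return []
    chunk = -(-n // workers)  # ceiling division
    bounds = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Every worker runs the same operation on its own slice of the data.
        parts = pool.map(lambda b: saxpy_chunk(alpha, x, y, *b), bounds)
    return [v for part in parts for v in part]


print(saxpy_parallel(2, [1, 2, 3, 4, 5], [10, 10, 10, 10, 10], workers=2))
```

On a distributed memory machine each chunk would live on a different node, which is exactly the mapping that data parallel languages such as HPF let the programmer specify.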

Data Fusion

A common command and control approach in which the disparate sources of information available to a military or civilian commander or planner are integrated (or fused) together. Often, a GIS is used as the underlying environment.

Distributed Computing

The use of networked heterogeneous computers to solve a single problem. The nodes (individual computers) are typically loosely coupled.

Distributed Computing Environment

The OSF Distributed Computing Environment (DCE) is a comprehensive, integrated set of services that supports the development, use and maintenance of distributed applications. It provides a uniform set of services, anywhere in the network, enabling applications to utilize the power of a heterogeneous network of computers. http://www.osf.org/dce/

Distributed Memory

A computer architecture in which the memory of the nodes is distributed as separate units. Distributed memory hardware can support either a distributed memory programming model, such as message passing, or a shared memory programming model.

Distributed Queuing System (DQS)

An experimental UNIX-based queuing system being developed at the Supercomputer Computations Research Institute (SCRI) at The Florida State University. DQS is designed as a management tool to aid in distributing computational resources across a network, and it provides architecture transparency for both users and administrators across a heterogeneous environment. http://www.scri.fsu.edu/~pasko/dqs.html

Embarrassingly Parallel

A class of problems that can be broken up into parts, which can be executed essentially independently on a parallel or distributed computer.
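A small example of an embarrassingly parallel problem is counting primes: each range of integers can be tested without any data from the other ranges, and the only coordination needed is a final sum. The sketch below assumes a thread pool as a stand-in for independent machines; because the parts never communicate, the same decomposition would work over a slow wide area network.

```python
from concurrent.futures import ThreadPoolExecutor


def is_prime(k):
    """Trial division; adequate for a small illustration."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def count_primes_in(lo, hi):
    """Count primes in [lo, hi); needs nothing from any other range."""
    return sum(1 for k in range(lo, hi) if is_prime(k))


def count_primes_below(n, parts=8):
    """Split [0, n) into independent parts and sum their counts."""
    if n <= 0:
        return 0
    step = -(-n // parts)  # ceiling division
    ranges = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor() as pool:
        counts = pool.map(lambda r: count_primes_in(*r), ranges)
    # The only coordination step: combine the independent results.
    return sum(counts)


print(count_primes_below(10_000))
```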

Geographical Information System (GIS)

A user interface in which information is displayed at locations on a digital map. Typically, this involves several possible overlays with different types of information. Functions such as image processing and planning (for example, shortest path) can be invoked.
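A shortest-path planning function of the kind a GIS might invoke can be sketched with Dijkstra's algorithm. That algorithm is one standard choice assumed here; the text does not name one. The road network below is hypothetical, with nodes as map locations and edge weights as distances.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm for non-negative edge weights.

    graph maps each node to a dict {neighbour: distance}.
    Returns (total_distance, [source, ..., target]), or (inf, []) if
    the target cannot be reached.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in visited:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical one-way road map (distances in kilometres).
roads = {"A": {"B": 4, "C": 2}, "B": {"D": 5}, "C": {"B": 1, "D": 8}, "D": {}}
print(shortest_path(roads, "A", "D"))
```

In a GIS the resulting path would then be drawn as one more overlay on the digital map.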

Gigabit

A measure of network performance: one Gigabit/sec is a bandwidth of $10^9$ bits per second.

Gigaflop

A measure of computer performance: one Gigaflop is $10^9$ floating point operations per second.

Global Information Infrastructure (GII)

The GII is the natural worldwide extension of the NII, with a comparably exciting vision and an equally uncertain and vague definition.

High-Performance Computing and Communications (HPCC)

Refers generically to the federal initiatives, and associated projects and technologies that encompass parallel computing, HPDC, and the NII.

High-Performance Distributed Computing (HPDC)

The use of distributed networked computers to achieve high performance on a single problem, i.e., the computers are coordinated and synchronized to achieve a common goal.

HPF

A language specification published in 1993 by experts in compiler writing and parallel computation, the aim of which is to define a set of directives that will allow a Fortran 90 program to run efficiently on a distributed memory machine. At the time of writing, many hardware vendors have expressed interest, a few have preliminary compilers, and a few independent compiler producers also have early releases. If successful, HPF would mean that data parallel programs can be written portably for various multiprocessor platforms.
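HPF directives such as `!HPF$ DISTRIBUTE A(BLOCK)` or `!HPF$ DISTRIBUTE A(CYCLIC)` tell the compiler how to map array elements onto processors. The sketch below computes that mapping for a one-dimensional array. It uses 0-based indices and processor numbers (HPF itself is 1-based) and the default BLOCK size of ceiling(n/p). The helper names are hypothetical.

```python
def block_owner(i, n, p):
    """Processor owning element i of an n-element array distributed
    BLOCK over p processors, with block size ceil(n/p)."""
    b = -(-n // p)  # ceiling division
    return i // b


def cyclic_owner(i, p):
    """Processor owning element i under a CYCLIC distribution."""
    return i % p


def local_indices(owner_fn, n, rank):
    """Global indices held by processor `rank` under a given mapping."""
    return [i for i in range(n) if owner_fn(i) == rank]


n, p = 10, 4
for rank in range(p):
    print(rank,
          local_indices(lambda i: block_owner(i, n, p), n, rank),
          local_indices(lambda i: cyclic_owner(i, p), n, rank))
```

BLOCK keeps neighbouring elements together, which suits nearest-neighbour stencils; CYCLIC spreads them out, which balances load when the work per element varies along the array.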

Hyperlink

The user-level mechanism (a remote address specified in an HTML or VRML object) by which remote services are accessed by Web Clients or Servers.

Hypertext Markup Language (HTML)

A syntax for describing documents to be displayed on the World Wide Web.

Hypertext Transfer Protocol (HTTP)

The protocol used for communication between Web servers and clients.
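HTTP is a text-based request/response protocol. A client sends a request line and headers, and the server replies with a status line, headers (including a `Content-Type` that names the MIME type of the body), a blank line, and the body. The sketch below formats a minimal HTTP/1.0 GET request and parses a canned response without touching the network. A real client would use a library such as Python's `http.client`; the host name and response shown are hypothetical.

```python
def build_get_request(host, path="/"):
    """Format a minimal HTTP/1.0 GET request as bytes."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), headers, body


# Hypothetical server reply.
reply = (b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n"
         b"Content-Length: 13\r\n\r\n<p>hello</p>\n")
print(build_get_request("example.org"))
print(parse_response(reply))
```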

InfoVISiON

Information, Video, Imagery, and Simulation ON demand is the scenario described in Section 3, in which multimedia servers deliver multimedia information to clients on demand, at the click of the user's mouse.

Integrated Services Digital Network (ISDN)

A digital multimedia service standard with a performance of typically 128 kilobits/sec, but with the possibility of higher performance. ISDN can be implemented using existing telephone (POTS) wiring, but it does not have the 1--20 megabits/sec needed for full-screen TV display at either VHS or high definition TV (HDTV) resolution. Digital video can usefully be sent over ISDN by using quarter-screen resolution and/or a frame rate lower than 30 per second.
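The trade-off in the last sentence can be checked with rough arithmetic. The sketch assumes that full-screen VHS-quality digital video needs about 1.5 megabits/sec (the low end of the 1--20 megabits/sec range quoted above) and that the bit rate scales linearly with pixels per frame and with frame rate. Both are crude assumptions for illustration only.

```python
ISDN_KBPS = 128  # typical ISDN rate quoted above


def scaled_video_kbps(full_kbps, screen_fraction, fps, full_fps=30):
    """Crude estimate: bit rate proportional to pixels/frame and frames/sec."""
    return full_kbps * screen_fraction * (fps / full_fps)


FULL_SCREEN_VHS_KBPS = 1500  # assumption: low end of the 1-20 Mbit/s range

for fps in (30, 15, 10):
    need = scaled_video_kbps(FULL_SCREEN_VHS_KBPS, 0.25, fps)
    print(f"quarter screen at {fps:2d} fps: {need:6.1f} kbit/s, "
          f"fits ISDN: {need <= ISDN_KBPS}")
```

Under these assumptions, quarter-screen video fits in 128 kilobits/sec only once the frame rate drops to about 10 per second.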

Internet

A complex set of interlinked national and global networks that use the IP messaging protocol to transfer data, electronic mail, and World Wide Web traffic. In 1995, some 20 million people could access the Internet, typically over POTS. The Internet has some high-speed links, but the majority of transmissions achieve (as of 1995) bandwidths of at best 100 kilobytes/sec. The Internet could be used as the network to support a metacomputer, but its limited bandwidth means that HPDC could only be achieved for embarrassingly parallel problems.

Internet Protocol (IP)

The network-layer communication protocol used in the DARPA Internet. IP is responsible for host-to-host addressing and routing, packet forwarding, and packet fragmentation and reassembly.
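Routing and forwarding come down to choosing a next hop for each packet's destination address. The sketch below assumes CIDR-style longest-prefix matching, the rule used by modern IP routers: among all table entries whose prefix contains the address, the most specific one wins. It uses Python's standard `ipaddress` module. The table entries and next-hop names are hypothetical, and the addresses come from the ranges reserved for documentation.

```python
import ipaddress


def build_table(routes):
    """routes: iterable of (prefix string, next hop) pairs."""
    return [(ipaddress.ip_network(prefix), hop) for prefix, hop in routes]


def next_hop(table, destination):
    """Longest-prefix match: the most specific containing route wins."""
    addr = ipaddress.ip_address(destination)
    matches = [(net.prefixlen, hop) for net, hop in table if addr in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0])[1]


table = build_table([
    ("0.0.0.0/0", "default-gw"),       # matches every IPv4 address
    ("192.0.2.0/24", "router-a"),
    ("192.0.2.128/25", "router-b"),    # more specific part of the /24
])
for dest in ("192.0.2.200", "192.0.2.5", "198.51.100.7"):
    print(dest, "->", next_hop(table, dest))
```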

Java

A distributed computing language (Web Technology) developed by Sun, which is based on C++ but supports Applets.

Latency

The portion of the time taken to service a request or deliver a message that is independent of the size or nature of the operation. The latency of a message passing system is the minimum time to deliver a message, even one of zero length that does not have to leave the source processor. The latency of a file system is the time required to decode and execute a null operation.
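A common first-order model, assumed here, treats the time to send an n-byte message as a fixed latency plus n divided by the bandwidth. The message size at which both terms are equal (latency times bandwidth, often written n_{1/2}) indicates how large messages must be before bandwidth rather than latency dominates. The link parameters below are hypothetical. The wide area figure uses the 1995 Internet bandwidth quoted under ``Internet'' together with an assumed 100 ms latency.

```python
def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Linear model: fixed start-up latency plus size / bandwidth."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def half_performance_length(latency_s, bandwidth_bytes_per_s):
    """Message size at which latency and transfer time contribute equally."""
    return latency_s * bandwidth_bytes_per_s


links = {
    "tightly coupled (hypothetical)": (10e-6, 100e6),  # 10 us, 100 MB/s
    "1995 Internet path (assumed)": (0.1, 100e3),      # 100 ms, 100 KB/s
}
for name, (lat, bw) in links.items():
    times = [transfer_time(n, lat, bw) for n in (0, 1_000, 1_000_000)]
    print(f"{name}: {[f'{t:.6f} s' for t in times]}, "
          f"n_1/2 = {half_performance_length(lat, bw):,.0f} bytes")
```

A zero-length message still costs the full latency, matching the definition above, and the large gap between the two rows is what the loose/tight coupling distinction refers to.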

LAN, MAN, WAN

Local, Metropolitan, and Wide Area Networks can be built from any one, or several, of the different physical network media, and can run different protocols. LANs are typically confined to departments (less than a kilometer), MANs to distances of order 10 kilometers, and WANs can extend worldwide.

Loose and Tight Coupling

Here, coupling refers to the linking of computers in a network. Tight refers to low latency and high bandwidth; loose refers to high latency and/or low bandwidth. There is no clear dividing line between ``loose'' and ``tight.''

Massively Parallel Processing (MPP)

The strict definition of MPP is a machine with many interconnected processors, where `many' is dependent on the state of the art. Currently, the majority of high-end machines have fewer than 256 processors. A more practical definition of an MPP is a machine whose architecture is capable of having many processors---that is, it is scalable. In particular, machines with a distributed memory design (in comparison with shared memory designs) are usually synonymous with MPPs since they are not limited to a certain number of processors. In this sense, ``many'' is a number larger than the current largest number of processors in a shared-memory machine.

Megabit

A measure of network performance: one Megabit/sec is a bandwidth of $10^6$ bits per second. Note that eight bits, called a byte, represent one character.

Message Passing

A style of inter-process communication in which processes send discrete messages to one another. Some computer architectures are called message passing architectures because they support this model in hardware, although message passing has often been used to construct operating systems and network software for sequential processors, shared memory machines, and distributed computers.
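The sketch below shows the message passing style. A worker receives discrete messages, accumulates them, and sends back a single reply, with a sentinel marking the end of the stream. It assumes a thread and two `queue.Queue` objects as stand-ins for separate processes and the network between them. Python threads could share variables, so the discipline of communicating only through messages is kept by convention here. In PVM or MPI the sends and receives would be library calls such as `pvm_send` or `MPI_Send`.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive numbers until a None sentinel arrives; reply with their sum."""
    total = 0
    while True:
        msg = inbox.get()           # blocking receive
        if msg is None:             # end-of-stream marker
            break
        total += msg
    outbox.put(("sum", total))      # send one discrete reply message


def run(values):
    to_worker, from_worker = queue.Queue(), queue.Queue()
    t = threading.Thread(target=worker, args=(to_worker, from_worker))
    t.start()
    for v in values:
        to_worker.put(v)            # send
    to_worker.put(None)
    reply = from_worker.get()       # receive the worker's answer
    t.join()
    return reply


print(run([1, 2, 3, 4]))
```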

Message Passing Interface (MPI)

The parallel programming community recently organized an effort to standardize the communication subroutine libraries used for programming on massively parallel computers such as Intel's Paragon and Cray's T3D, as well as on networks of workstations. MPI not only unifies within a common framework programs written in a variety of existing (and currently incompatible) parallel languages, but also allows for future portability of programs between machines.

Metacomputer

This term describes a collection of heterogeneous computers networked by a high-speed wide area network. Such an environment would recognize the strengths of each machine in the Metacomputer, and use it accordingly to efficiently solve so-called Metaproblems. The World Wide Web has the potential to be a physical realization of a Metacomputer.

Metaproblem

This term describes a class of problems that lie outside the scope of a single computer architecture and are instead best run on a Metacomputer with many disparate designs. These problems consist of many constituent subproblems. An example is the design and manufacture of a modern aircraft, which presents problems in geometry grid generation, fluid flow, acoustics, structural analysis, operational research, visualization, and database management. The Metacomputer for such a Metaproblem would be networked workstations, array processors, vector supercomputers, massively parallel processors, and visualization engines.

Multimedia Server or Client

Multimedia refers to information (digital data) with different modalities, including text, images, video, and computer generated simulation. Servers dispense this data, and clients receive it. Some form of browsing, or searching, establishes which data is to be transferred. See also InfoVISiON.

Multiple-Instruction/Multiple-Data (MIMD)

A parallel computer architecture where the nodes have separate instruction streams that can address separate memory locations on each clock cycle. All HPDC systems of interest are MIMD when viewed as a metacomputer, although the nodes of this metacomputer could have SIMD architectures.

Multipurpose Internet Mail Extension (MIME)

The format, borrowed from the one defined for electronic mail, that is used to send multimedia messages between Web clients and servers.
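MIME labels each piece of content with a type such as `text/plain` or `text/html`, and it can bundle several differently typed parts into one `multipart` message separated by a boundary string. The sketch below builds such a message with Python's standard `email` package, which implements the MIME format. The subject, text, and attachment contents are hypothetical.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message():
    """Bundle three differently typed parts into one multipart message."""
    msg = MIMEMultipart("mixed")
    msg["Subject"] = "Simulation results"                   # hypothetical
    msg.attach(MIMEText("Summary of the run.", "plain"))    # text/plain part
    msg.attach(MIMEText("<h1>Results</h1>", "html"))        # text/html part
    # Binary data goes out as application/octet-stream, base64-encoded.
    msg.attach(MIMEApplication(b"\x00\x01\x02", Name="data.bin"))
    return msg


m = build_message()
print([part.get_content_type() for part in m.get_payload()])
print(m.as_string()[:200])
```

A Web server uses the same type names in its `Content-Type` header to tell the client how to interpret the data it sends back.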

National Information Infrastructure (NII)

The collection of ATM, cable, ISDN, POTS, satellite, and wireless networks connecting the collection of computers that will be deployed across the U.S.A. as set-top boxes, PCs, workstations, and MPPs in the future.

The NII can be viewed as just the network infrastructure or as the full collection of networks, computers, and overlaid software services. The Internet and World Wide Web are a prototype of the NII.

Network

A physical communication medium. A network may consist of one or more buses, a switch, or the links joining processors in a multicomputer.

Node

A parallel or distributed system is made up of many nodes, or fundamental computing units, which are typically fully fledged computers in the MIMD architecture.

(N)UMA

UMA---Uniform Memory Access---refers to shared memory in which all locations have the same access characteristics, including the same access time. NUMA (Non-Uniform Memory Access) refers to the opposite scenario.

Parallel Computer

A computer in which several functional units are executing independently. The architecture can vary from SMP to MPP and the nodes (functional units) are tightly coupled.

POTS

The conventional twisted pair based Plain Old Telephone Service.

Protocol

A set of conventions and implementation methodologies defining the communication between nodes on a network. There is a famous seven-layer OSI standard model, going from the physical link (optical fiber to satellite) to the application layer (such as Fortran subroutine calls). Any given system, such as PVM or the Web, implements a particular set of protocols.

PVM

PVM was developed at Emory University, the University of Tennessee, and Oak Ridge National Laboratory. It supports the message passing programming model on a network of heterogeneous computers (http://www.epm.ornl.gov/pvm/).

Shared Memory

Memory that appears to the user to be contained in a single address space that can be accessed by any process or any node (functional unit) of the computer. Shared memory may have UMA or NUMA structure. Distributed computers can have a shared memory model implemented in either hardware or software---this would always be NUMA. Shared memory parallel computers can be either NUMA or UMA.

Virtual or Distributed Shared Memory is (the illusion of) a shared memory built with physically distributed memory.
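In the shared memory model, by contrast with message passing, every thread reads the input directly and updates one shared result, so concurrent updates must be synchronized. The sketch below assumes Python threads as the processes and a lock to serialize updates to the shared total. Whether the underlying memory is UMA or NUMA is invisible at this level.

```python
import threading


def parallel_sum_shared(values, nthreads=4):
    """Each thread sums its slice, then adds it to one shared total."""
    shared = {"total": 0}          # one object in the single address space
    lock = threading.Lock()

    def work(lo, hi):
        partial = sum(values[lo:hi])     # reads shared input directly
        with lock:                        # serialize the shared update
            shared["total"] += partial

    n = len(values)
    step = max(1, -(-n // nthreads))      # ceiling division, at least 1
    threads = [threading.Thread(target=work, args=(lo, min(lo + step, n)))
               for lo in range(0, n, step)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]


print(parallel_sum_shared(list(range(101))))
```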

Single-Instruction/Multiple-Data (SIMD)

A parallel computer architecture in which every node runs in lockstep accessing a single global instruction stream, but with different memory locations addressed by each node. Such synchronous operation is very unnatural for the nodes of an HPDC system.

Supercomputer

The most powerful computer available at any given time. As performance is roughly proportional to cost, this is not very well defined for a scalable parallel computer. Traditionally, computers costing some $10M--$30M are termed supercomputers.

Symmetric Multiprocessor (SMP)

A Symmetric Multiprocessor supports a shared memory programming model---typically with a UMA memory system, and a collection of up to 32 nodes connected with a bus.

Televirtual

The ultimate computer illusion where the user is fully integrated into a simulated environment and so can interact naturally with fellow users distributed around the globe.

Teraflop

A measure of computer performance: one Teraflop is $10^{12}$ floating point operations per second.

Transmission Control Protocol (TCP)

A connection-oriented transport protocol used in the DARPA Internet. TCP provides for the reliable transfer of data, as well as the out-of-band indication of urgent data.

VBNS

A high-speed experimental ATM network (WAN) maintained by the National Science Foundation (NSF) to link its four supercomputer centers at Cornell, Illinois, Pittsburgh, and San Diego, as well as the National Center for Atmospheric Research (NCAR) in Boulder.

Virtual Reality Modeling Language (VRML)

A ``three-dimensional'' HTML that can be used to give a universal description of three-dimensional objects that supports hyperlinks to additional information.

Web Clients and Servers

A distributed set of clients (requesters and receivers of services) and servers (receiving and satisfying requests from clients) using Web Technologies.

WebWindows

The operating environment created on the World Wide Web to manage a distributed set of networked computers. WebWindows is built from Web clients and Web servers.

WebWork

(Fox:95a) An environment proposed by Boston University, Cooperating Systems Corporation, and Syracuse University, which integrates computing and information services to support a rich distributed programming environment.

World Wide Web and Web Technologies

A very important software model for accessing information on the Internet, based on hyperlinks supported by Web technologies such as HTTP, HTML, MIME, Java, Applets, and VRML.


Geoffrey Fox, Northeast Parallel Architectures Center at Syracuse University, gcf@npac.syr.edu