Given by Mark Baker, Geoffrey Fox at Tutorial for CRPC MRA Meeting at Cornell on May 7 1996. Foils prepared May 7 1996
Outside Index
Summary of Material
The Challenge
Don't Want to Reinvent the "Wheel"
General Introduction to Cluster Computing
Some Terminology
CMS Interaction with the OS
The Workings of a Typical CMS Package
Special Note - The Ownership Hurdle
Cluster/Metacomputing Environments
Features and Functionality of CMS Packages
Status of CMS Packages - Basic Problems
Related Projects
Near and Future Projects
Mark Baker and Geoffrey Fox
Northeast Parallel Architectures Center
Syracuse University
111 College Place
Syracuse, NY 13244-4100, USA
tel: +1 (315) 443 2083
fax: +1 (315) 443 1973
email: mab@npac.syr.edu
URL: http://www.npac.syr.edu/
Overview
The Challenge
Don't Want to Reinvent the "Wheel"
General Introduction to Cluster Computing
Some Terminology (1 - 9)
CMS Interaction with the OS
The Workings of a Typical CMS Package
Special Note - The Ownership Hurdle
Cluster/Metacomputing Environments
Features and Functionality of CMS Packages
Status of CMS Packages - Basic Problems
Related Projects
Near and Future Projects
Metacomputing in the future!
Transparent Utilisation of a Distributed Heterogeneous Computing Environment
Want to fully utilise a heterogeneous computing environment where different types of processing resources and inter-connection technologies are effectively and efficiently used.
Fully Utilise Available Resources
Low utilisation rates of high-performance workstations (LLNL/Los Alamos: 7-10%); as their performance grows, utilisation will become worse.
Build a Metacomputer
The use of distributed resources in this framework is known as Metacomputing, and such an environment has the potential to maximise the performance and cost effectiveness of a wide range of scientific and distributed applications.
Understand what we are trying to achieve - throughput and/or processor utilisation?
Learn from experiences with current LAN-based Cluster Management Software (CMS) packages |
Extend existing knowledge to design and develop a WAN-based Metacomputing Management package. |
New and emerging technologies may help us solve some of the existing problems. |
The use of clusters of workstations to increase the throughput of user applications is becoming more commonplace throughout the US and Europe.
A significant number of CMS packages exist - almost all originate from research projects, and many have now been taken up/adopted by commercial vendors.
The importance of cluster software can be seen by both the commercial take-up and also by the widespread installation of this software at most of the major computing facilities around the world. |
Not clear that CMS is being used to take advantage of spare CPU cycles, but it is evident that much effort is being expended to increase throughput on networks of workstations by load balancing the work that needs to be done. |
Nearly all the CMS packages are designed to run on Unix workstations and MPPs. Some of the PD packages support Linux, which runs on PCs. Support for Windows NT is planned by many vendors.
WWW software and HTTP protocols could clearly be used as part of an integrated CMS package. Little software of this type has so far been developed - several of the packages use a WWW browser as an alternative GUI. |
See - http://www.npac.syr.edu/techreports/hypertext/sccs-748/index.html
Commercial Packages
Research Packages
Application Programming Interface (API) - An interface that enables third-party software developers to write portable programs. Examples are the Berkeley Sockets and those published by Microsoft for the Windows GUI.
Batching System - A batching system is one that controls applications' access to computing resources. Typically a user will send a request to the batch manager to run an application. The batch manager will then place the job in a queue (normally FIFO).
Distributed Computing Environment - The OSF Distributed Computing Environment (DCE) is a comprehensive, integrated set of services that supports the development, use and maintenance of distributed applications. It provides a set of services, anywhere in the network, enabling applications to use a heterogeneous network of computers.
Fault Tolerance - In this context, the guarantee that a job will complete after a system crash or network failure.
Heterogeneous - Containing components of more than one kind. A heterogeneous architecture may be one in which some components are processors and others memories, or it may be one that uses different types of processor together.
Homogeneous - Made up of identical components. A homogeneous architecture is one in which each element is of the same type - processor arrays and multicomputers are usually homogeneous.
Homogeneous vs Heterogeneous - Often a cluster of workstations is viewed as either homogeneous or heterogeneous. These terms are ambiguous, as they refer not only to the make of the workstation but also to the operating system being used on it.
For example, it is possible to have a homogeneous cluster running various operating systems (SunOS/Solaris, or Irix 4/5).
Inter-connection network - The system of logic and conductors that connects the processors in a distributed computer system. Some examples are bus, mesh, hypercube and Omega networks.
Inter-processor communication - The passing of data and information among the processors of a parallel computer during the execution of a parallel program.
Job - This term generally refers to an application sent to a batching system - a job finishes when the application has completed its run.
Latency - The time taken to service a request or deliver a message which is independent of the size or nature of the operation. The latency of a message passing system is the minimum time to deliver a message, even one of zero length that does not have to leave the source processor. The latency of a file system is the time required to decode and execute a null operation.
Load balance - The degree to which work is evenly distributed among available processors. A program executes most quickly when it is perfectly load balanced, that is when every processor has a share of the total amount of work to perform so that all processors complete their assigned tasks at the same time.
Multitasking - Executing many processes on a single processor. This is usually done by time-slicing the execution of individual processes and performing a context switch each time a process is swapped in or out - supported by special-purpose hardware in some computers.
Network - A physical communication medium. A network may consist of one or more buses, a switch, or the links joining processors in a multicomputer.
NFS - The Network File System is a protocol developed to use IP and allow a set of computers to access each other's file systems as if they were on the local host.
Network Information Services (NIS) - Developed by Sun Microsystems, NIS is a means of storing network-wide information in central databases (NIS servers), where it can be accessed by any of the clients. Typically, a NIS database will be used to store the user password file, mail aliases, group identification numbers, and network resources.
Parallel Job - This can be defined as a single application (job) that has multiple processes that run concurrently. Generally each process will run on a different processor (workstation) and communicate boundary, or other, data between the processes at regular intervals. Typically a parallel job would utilise a message passing interface, such as MPI or PVM, to pass data between the processes.
Process - The fundamental entity of the software implementation on a computer system. A process is a sequentially executing piece of code that runs on one processing unit of the system.
Queuing - The method by which jobs are ordered to access some computer resource. Typically the batch manager will place a job in the queue. A particular compute resource could possibly have more than one queue; for example, queues could be set up for sequential and parallel jobs, or for short and long job runs.
Sequential Job - Defined as a job that does not pass data to remote processes. Typically such a job would run on a single workstation - it is possible for a sequential process to spawn multiple threads on its processor.
Single Point of Failure - This is where the failure of one part of a system will make the whole system fail. In cluster computing this is typically the batch manager; if it fails, the compute resources are no longer accessible to users.
Outside Kernel - Works completely outside the kernel and on top of a machine's existing operating system. Installation does not require modification of the kernel - similar to other software packages.
Within Kernel - Other cluster environments are installed in place of, or as a module within, the existing kernel to support the desired environment.
This is necessary to support functionality, such as virtual shared memory, that is difficult to achieve efficiently outside the kernel.
Combinations
Some packages, such as BSP and PVM, can exist in two forms:
- The PD versions of these packages are installed on top of an existing operating system.
- Vendor versions of the software are often integrated into the kernel to optimise performance.
Step 1 - Job Description File
Produce some type of resource description file.
This file is generally an ASCII text file (produced using a normal text editor or with the aid of a GUI) which contains a set of keywords to be interpreted by the CMS.
The nature and number of keywords available depends on the CMS package, but will at least include the job name, the maximum runtime and the desired platform.
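A minimal sketch of such a file and a parser for it; the keyword names (`job_name`, `max_runtime`, `platform`) follow the examples above, but the exact syntax is an invented illustration rather than that of any particular CMS package.

```python
# Hypothetical job description file; real CMS keyword sets differ per package.
SAMPLE = """\
# batch job description
job_name    = rsa_sieve
max_runtime = 02:00:00
platform    = sunos
"""

def parse_job_description(text):
    """Return a dict of keyword -> value, skipping blank and comment lines."""
    keywords = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        keywords[key.strip()] = value.strip()
    return keywords

print(parse_job_description(SAMPLE))
```

A real CMS would validate the keywords against the queue configuration before accepting the job.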
Step 2 - Submit Job
Once completed, the job description file is sent by the client software resident on the user's workstation to a master scheduler.
The Master Scheduler
The master scheduler is the part of the CMS that has an overall view of the cluster resources available.
On each of the resource workstations, daemons are present that communicate their state at regular intervals to the master scheduler.
One of the tasks of the master scheduler is to evenly balance the load on the resources that it is managing.
So, when a new job is submitted, it not only has to match the requested resources with those that are available, but also needs to ensure that the resources being used are load balanced.
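The matching-plus-balancing step can be sketched as follows; the node names, load figures and memory requirement are invented for illustration, not taken from any real scheduler.

```python
# Sketch: filter nodes that satisfy the job's resource request,
# then pick the least-loaded candidate to keep the cluster balanced.

def pick_node(nodes, required_mem):
    """Among nodes with enough free memory, choose the least loaded."""
    candidates = [n for n in nodes if n["free_mem"] >= required_mem]
    if not candidates:
        return None          # job stays queued until a node frees up
    return min(candidates, key=lambda n: n["load"])

nodes = [
    {"name": "ws1", "load": 0.9, "free_mem": 64},
    {"name": "ws2", "load": 0.2, "free_mem": 32},
    {"name": "ws3", "load": 0.4, "free_mem": 128},
]
print(pick_node(nodes, required_mem=64)["name"])  # ws3: ws2 lacks memory
```

A production scheduler would weigh many more metrics (CPU count, owner activity, queue priorities) reported by the per-workstation daemons.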
Multiple Queues
Typically a batch system will have multiple queues, each being appropriate for a different type of job.
For example:
- A homogeneous cluster which is primarily used to service parallel jobs,
- A powerful server for CPU-intensive jobs,
- Jobs that need a rapid turnaround.
The number of possible queue configurations is large and will depend on the typical throughput of jobs on the system being used.
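The queue split above can be sketched as a simple routing rule; the queue names and thresholds here are invented for illustration.

```python
from collections import deque

# Three hypothetical queues matching the example configurations above.
queues = {"parallel": deque(), "cpu_intensive": deque(), "fast_turnaround": deque()}

def route(job):
    """Place a job on a queue based on its resource request."""
    if job["nprocs"] > 1:
        queues["parallel"].append(job)
    elif job["max_runtime_min"] <= 5:
        queues["fast_turnaround"].append(job)
    else:
        queues["cpu_intensive"].append(job)

for j in [{"name": "mpi_sim", "nprocs": 8, "max_runtime_min": 120},
          {"name": "quick_plot", "nprocs": 1, "max_runtime_min": 2},
          {"name": "long_run", "nprocs": 1, "max_runtime_min": 600}]:
    route(j)

print({k: [j["name"] for j in v] for k, v in queues.items()})
```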
Fault Tolerance
The master scheduler is also tasked with the responsibility of ensuring that jobs complete successfully.
It does this by monitoring jobs until they successfully finish.
If a job fails, due to problems other than an application runtime error, it will reschedule the job to run again.
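A minimal sketch of that monitor-and-reschedule loop, distinguishing application errors (not retried) from system failures (retried); the job outcomes are simulated here, whereas a real CMS would act on daemon reports.

```python
def run_with_rescheduling(job, max_attempts=3):
    """Re-run `job` on system failure; give up on application errors."""
    for attempt in range(1, max_attempts + 1):
        status = job()
        if status == "ok":
            return ("completed", attempt)
        if status == "app_error":
            return ("failed", attempt)   # user's bug: do not reschedule
        # otherwise a node/network failure: reschedule and try again
    return ("gave_up", max_attempts)

# Simulated job that survives two node crashes before finishing.
outcomes = iter(["node_crash", "node_crash", "ok"])
result = run_with_rescheduling(lambda: next(outcomes))
print(result)  # ('completed', 3)
```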
Generally a workstation will be "owned" by, for example, an individual, a group, a department, or an organisation.
They are dedicated to the exclusive use of the "owners".
This ownership often brings problems when attempting to form a cluster of workstations.
Typically, there are three types of "owner":
Ones who use their workstations for sending and receiving mail or preparing papers, such as administrative staff, librarians, theoreticians, etc.
Ones involved in software development, where the usage of the workstation revolves around the edit, compile, debug and test cycle.
Ones involved with running large numbers of simulations, often requiring powerful computational resources.
It is the latter type of "owner" that needs additional compute resources, and it is possible to fulfil their needs by fully utilising spare CPU cycles from the former two types of "owner".
However, this may be easier said than done and often requires delicate negotiation to become reality.
A Systems Administrator's Perspective...
Good documentation (on-line and manual)
Vendor support
Plug and Play Installation
Easy maintenance and support
Easy to manage, reconfigure and administrate
Truly heterogeneous platform support (OS & platforms)
Security
Statistics
No single point of failure
Fault tolerant
A User's Perspective...
Good on-line documentation
User support
Easy-to-use GUI
Ease of submitting, monitoring and controlling jobs
Support for all programming paradigms
Job statistics
Batch and interactive usage
Fault tolerance
Checkpointing
Computing Environments Supported
Commercial/Research - Is it a commercial or a research product?
Platforms Supported - Heterogeneous platform support
Operating Systems - Heterogeneous operating system support
Additional Hardware/Software - Is there any need for additional hardware or software to be able to run the CMS package?
Application support
Batch jobs - Are batch submissions of jobs supported?
Interactive Support - Are jobs that would normally be run interactively supported?
Parallel Support - Is there support for running parallel programs on the cluster?
Application support
Queue Type - Are multiple, configurable queues supported?
- This feature is necessary for managing large multi-vendor clusters where jobs ranging from short interactive sessions to compute-intensive parallel applications need to run.
Job Scheduling and Allocation Policy
Dispatching Policy - Is there a configurable dispatching policy?
Impact on Workstation Owner - What is the impact on the owner of the workstation?
Job Scheduling and Allocation Policy
Impact on the Workstation - What is the impact of running the CMS package on a workstation?
- Obvious impact when a job is running, but there may also be an undesirable impact when a job is suspended, checkpointed or migrated to another workstation.
EX - Process migration requires that a job saves its state and then is physically moved over the local network to another workstation.
Job Scheduling and Allocation Policy
- This will impact the workstation (CPU/memory and diskspace) while the state is saved, and then the network bandwidth when tens of Mbytes of data are transferred across the network.
Load Balancing - The CMS should load balance the resources that it is managing.
Job Scheduling and Allocation Policy
Check Pointing - Saving a job's state at regular intervals during its execution. If the workstation fails, the job can be restarted at its last checkpointed position.
- Generally a useful means of saving state, but can be costly in terms of resources.
Job Scheduling and Allocation Policy
Checkpointing Needs
Additional diskspace per workstation is needed.
The home filestore may be remotely mounted; this will have an impact on NFS performance and the network bandwidth.
Existing clusters may not have the physical resources (local diskspace) to support checkpointing.
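A minimal sketch of the checkpoint/restart idea, assuming the job's state is small enough to pickle to local disk; a real CMS checkpoints the whole process image rather than application-level state like this.

```python
import os
import pickle
import tempfile

def run(total, ckpt_path, crash_at=None):
    """Sum 0..total-1, checkpointing (i, partial sum) every 10 steps."""
    if os.path.exists(ckpt_path):            # resume from last checkpoint
        with open(ckpt_path, "rb") as f:
            i, acc = pickle.load(f)
    else:
        i, acc = 0, 0
    while i < total:
        if crash_at is not None and i == crash_at:
            raise RuntimeError("simulated workstation failure")
        acc += i
        i += 1
        if i % 10 == 0:                      # checkpoint interval
            with open(ckpt_path, "wb") as f:
                pickle.dump((i, acc), f)
    return acc

path = os.path.join(tempfile.mkdtemp(), "job.ckpt")
try:
    run(100, path, crash_at=57)              # fails mid-run...
except RuntimeError:
    pass
print(run(100, path))                        # ...restarts from i = 50
```

The restarted job resumes from the last saved position (i = 50) rather than from scratch, at the cost of the diskspace and I/O noted above.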
Job Scheduling and Allocation Policy
Process Migration - Migrating an executing process from one workstation to another.
Useful for:
- Minimising the impact on a workstation - a process is migrated when the owner takes back control of his/her workstation. Here the job running on the workstation will be suspended first, and then migrated onto another workstation after a certain time interval.
- Load balancing the cluster - migrating jobs from heavily loaded workstations and running them on lightly loaded ones.
- Potentially becomes very complicated for anything other than sequential jobs.
- Like checkpointing, this can be a very useful feature. The impact is similar to checkpointing; an additional disadvantage is that large state files are moved around the network, which can have a serious impact on users of the network.
Job Scheduling and Allocation Policy
Job Monitoring and Rescheduling - The CMS should monitor that jobs are running and, in the event of a job failure, reschedule them.
Suspension/Resumption of Jobs
- This feature is particularly useful to minimise the impact of a job on the owner of a workstation, but may also be useful in the event of a system- or network-wide problem.
Configurability
Resource Administration - Control over the resources available.
- The administrator should be able to, for example, control who has access to what resources and also what resources are used (CPU load, diskspace, memory).
Job Runtime Limits - Enforce job runtime limits.
- Otherwise it will be difficult to fairly allocate resources amongst users.
Configurability
Forked Child Management - It is common for a job to fork child processes.
Process Management - Configure the resources to be either shared or exclusive to a given job.
- Efficient use of resources may require close control over the number of processes running on a workstation; it may even be desirable to allow exclusive access to workstations by a particular job.
- It should also be possible to control the priority of jobs running on a workstation to help load balancing (nice) and minimise the impact of jobs on the owner of the workstation.
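The shared/exclusive and priority controls above can be sketched as follows; the class, job names and nice values are hypothetical, with lower nice meaning higher priority as in Unix.

```python
class Workstation:
    """Toy model of per-workstation process management in a CMS."""
    def __init__(self, name):
        self.name, self.exclusive, self.jobs = name, None, []

    def start(self, job, nice=0, exclusive=False):
        if self.exclusive is not None:
            return False                 # machine reserved for one job
        if exclusive:
            if self.jobs:
                return False             # can't go exclusive while busy
            self.exclusive = job
        self.jobs.append((nice, job))
        self.jobs.sort()                 # lower nice = higher priority
        return True

ws = Workstation("ws1")
ws.start("owner_editor", nice=0)         # owner's interactive work
ws.start("cms_batch", nice=19)           # polite background CMS job
print([j for _, j in ws.jobs])           # owner's job keeps priority
```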
Configurability
Job Scheduling Control - The user and/or administrator should be able to schedule when a job will be run.
GUI/Command-line - What user interface do users/administrators have?
- The interface of a software package will often determine its popularity. In general, a Motif GUI is standard.
- Dramatic increase in the usage and popularity of the HTTP protocol and the WWW, so a GUI based on this technology seems likely to be a common standard in the future.
Configurability
Ease of Use - How easy and/or intuitive is it for users and administrators to use the CMS?
User Allocation of Jobs - Can a user specify the resources that they require?
User Job Status Query
Configurability
Job Statistics - Are statistics provided to the user and administrator about the jobs that have run?
Dynamics of Resources
Runtime Configuration - Reconfigure at runtime.
- Dynamically, at runtime, change the resources available, queues and other configurable features of the CMS, i.e. it is not necessary to restart the CMS.
Dynamic Resource Pool
- Is it possible to add and withdraw resources (workstations) dynamically during runtime?
Dynamics of Resources
Single Point of Failure (SPF) - Is there one?
Fault Tolerance - Is there fault tolerance built in?
- Also, if a machine running a queue or CMS scheduler fails, the CMS should be able to recover and continue to run.
- The real need for fault tolerance is determined by the level of service that is being provided by the cluster. However, fault tolerance is a useful feature in any system.
Security Issues
Truly heterogeneous platform support, across:
Good documentation (on-line and manual)
Vendor support
Plug and Play Installation
Batch and interactive usage
Easy-to-use GUI to the package
Easy maintenance and support
Easy to manage, reconfigure and administrate
Ease of submitting, monitoring and controlling jobs
Support for all programming paradigms
Security
Statistics
No single point of failure
Fault tolerant
Checkpointing/Process-Migration
LAN-Based
Limited platform and operating system support - not truly heterogeneous
Do not support all programming paradigms
Load balancing is generally naive
Single points of failure
Limited fault tolerance
The Information Wide Area Year (I-WAY)
The I-WAY is an experimental high-performance network linking dozens of the country's fastest computers and advanced visualisation environments.
Based on ATM technology.
Supports both TCP/IP over ATM and direct ATM-oriented protocols.
Provides the wide-area high-performance backbone for various experimental networking activities at SC'95.
Built from a combination of existing networks and some additional connectivity and services provided by national service providers.
The Information Wide Area Year (I-WAY)
The I-WAY is a testbed to prototype the following:
Teraflop-class wide area computing:
- Nodes consist of the top supercomputing sites, with a combined peak computing power approaching a teraflop.
- Work is under way to make this distributed environment behave as one facility.
The Information Wide Area Year (I-WAY)
Close coupling of immersive virtual environments and supercomputing:
- Applications combine state-of-the-art interactive environments and supercomputing.
An advanced application development resource:
- The I-WAY is envisioned as a resource for advanced application development and demonstrations.
The I-WAY is a testbed
Testbed to identify future network research issues:
- Goal to uncover the areas requiring further study and development.
- Highlight security mechanisms for wide-area computing,
- Advanced end-to-end network management,
- Mapping of infrastructure to emerging application environments,
- Mapping of applications to emerging infrastructure environments.
Sites - Level 0
Argonne National Laboratory (ANL)
ARPA HPC Enterprise (ARPA)
California Institute of Technology (CIT)
Cornell Theory Center (CTC)
Lawrence Livermore National Laboratory/NERSC (LLNL)
Los Alamos National Laboratory (LANL)
NASA Goddard Space Flight Center (GSFC)
National Center for Supercomputing Applications (NCSA)
Pittsburgh Supercomputing Center (PSC)
San Diego Supercomputer Center (SDSC)
University of Illinois/EVL (EVL)
Sites - Level 1
Georgia Institute of Technology (GAT)
Lawrence Berkeley Laboratory (LBL)
Lockheed Martin Missiles & Space Co. (LMSC)
National Center for Atmospheric Research (NCAR)
Naval Command, Control and Ocean Surveillance Center (NCCOSC)
Naval Oceanographic Office
Naval Research Laboratory (Washington DC) (NRLW)
Oak Ridge National Laboratory (ORNL)
Pacific Northwest Laboratory (PNL)
Sandia National Laboratory (SNL)
University of Maryland (UMD)
University of Minnesota (UMN)
University of Virginia
University of Wisconsin (UWI)
The Information Wide Area Year (I-WAY)
How Long Will I-WAY Last?
I-WAY began as a project to support Supercomputing 1995, but...
Phase I, from now until January 1, 1996
Phase II, Jan 1, 1996 - Jan 1, 1997
Phase III ?
The Information Wide Area Year (I-WAY)
Application Software
This software is intended to provide a functional and uniform environment across different I-WAY systems. It comprises:
- Single node software: compilers, shells, editors, etc.
- Communication libraries
- Parallel languages
- Scalable Unix tools
- Performance tools
- Graphics libraries
The Information Wide Area Year (I-WAY)
URL: http://www.iway.org/
WAMM (Wide Area Metacomputer Manager)
WAMM is a graphical tool, built on top of PVM.
- It provides the user with a GUI to assist in tasks such as: host add, check and removal; process management; compilation on remote hosts; remote command execution.
WAMM (Wide Area Metacomputer Manager)
Sites Involved (Italy)
CINECA - Interuniversity Consortium of Northeast Italy for Automatic Computing - Bologna
CASPUR - University and Research Consortium for Supercomputing Apps - Rome
CRS4 - Centre for Advanced Studies, Research and Development - Sardinia
CNUCE - Institute of the Italian National Research Council - Pisa
Scuola Normale Superiore - Pisa
Connection - Networked by GARR, the Italian research network - 2 Mbps
WAMM (Wide Area Metacomputer Manager)
WAMM (Wide Area Metacomputer Manager) - GUI
All functions are accessible via menus and buttons.
Geographical View of the System
Hosts are grouped following a tree structure.
WAMM (Wide Area Metacomputer Manager) - WAMM Tree
WAMM (Wide Area Metacomputer Manager)
Remote Command Execution
UNIX commands (e.g. ls, uptime, who, etc.) as well as X11 programs (e.g. xload, xterm, etc.) can be executed on remote hosts.
WAMM takes care of showing command output (for UNIX ones) and windows (for X11 ones) on the user's display.
WAMM (Wide Area Metacomputer Manager)
Remote Compilation
Compilation of modules on remote nodes is greatly simplified.
The user selects a group of hosts to compile onto and a set of source files to be compiled.
WAMM copies sources onto the remote nodes, compiles them in parallel and shows progress in separate windows, one for each host.
WAMM (Wide Area Metacomputer Manager)
Configuration
The Metacomputer configuration is specified through an external file, written in a simple declarative language.
Number and grouping of hosts, remote commands for each node, and icons can be specified.
The graphical aspect (colours, fonts, etc.) can be customised via standard X11 resource files.
WAMM (Wide Area Metacomputer Manager)
Software Requirements
- PVM version 3.3 or higher
- X11 Release 5 or higher
- Motif version 1.2 or higher
- XPM version 3.4 or higher
WAMM (Wide Area Metacomputer Manager)
Supported Platforms - WAMM was developed and tested on:
- HP9000/700 running HP-UX 9.01
- Sun SparcStation 2 running SunOS 4.1.3
- IBM RISC/6000 running AIX 3.2
- IBM SP2 running AIX 3.2
WAMM has also been compiled (but not sufficiently tested) on:
- IBM RISC/6000 running AIX 4.1
- Silicon Graphics Indigo2 running IRIX 5.3
- DEC AlphaStation running OSF/1.3.2
WAMM (Wide Area Metacomputer Manager)
URL: http://miles.cnuce.cnr.it/pp/wamm/
NSF Centers Form National MetaCenter for Computational Science and Engineering
NCSA's View
MetaCenter, n.: a coalescence of intellectual and physical resources unlimited by geographical constraint; a synthesis of individual centers that by combining resources creates a new resource greater than the sum of its parts.
NSF Metacenter
Objective
Based on the concept of distributed heterogeneous computing, or Metacomputing, the MetaCenter provides scientists and engineers the capability to move portions of their problems directly to appropriate computer architectures without regard for where the computers are located.
Enlarge the research base by facilitating collaboration among researchers, no matter where in the world they are located.
NSF Metacenter
Resources
Through heterogeneous networking technology, interactive communication is made possible from the desktops of individuals and groups of scientists and engineers.
The research environment no longer needs to be a single lab, but will invoke distributed intelligence and machinery, seamlessly networked together.
Combine the expertise and talents of the individual Centers' staffs and focus them on collaborative projects.
NSF Metacenter
Resources
Co-operation will create an environment where not only academic users, but also industrial scientists and engineers, can evaluate a greater variety of systems.
Of particular importance to industry will be the ability to assess the advantages and disadvantages of the combined high-performance computing resources offered in the MetaCenter at a far lower risk and cost than a company would assume acquiring systems and building expertise on its own.
NSF Metacenter
MetaCenter participants
- Cornell Theory Center
- National Center for Atmospheric Research
- National Center for Supercomputing Applications (University of Illinois)
- Pittsburgh Supercomputing Center
- San Diego Supercomputer Center
NSF Metacenter
MetaCenter Regional Alliance participants
- California Institute of Technology - Los Angeles Regional Gigabit Environment
- University of Illinois at Chicago
- Rice University - Center for Research on Parallel Computation
- MCNC/North Carolina Supercomputing Center
- Ohio Supercomputer Center
- Arctic Region Supercomputing Center
- PhAROH Metacenter Web Server
NSF Metacenter
URL: http://www.sdsc.edu/SDSC/Metacenter/MetaCenterHome.html
WWW/CGI Computing
The World Wide Web is now the most promising candidate for the universal access core component of the NII.
The current Web is ~15,000 servers and expands at the rate of ~1 new server/hour.
The software industry is starting to add value (Netscape, Netsite, Mosaic licenses, HotMetal, Netforce, Web support in OS/2 Warp and Windows95).
WWW/CGI Computing
So far, the Web has mainly been used for static hypermedia such as local information pages, digital libraries, Internet directories etc. However, the WWW model also offers extension mechanisms (CGI, CCI) towards dynamic services and, in fact, arbitrary computation.
Early interactive Web services are appearing. Examples include: WebCalc (NASA Goddard), Easy HTML (NCSA), WebChat (Internet Society), Virtual Doors (Unique, Inc.), Visioneering's Imaging Machine (VRL, Inc.)
WWW/CGI Computing - Web Technology Overview
Browsers have the SAME interface on ALL computers - clients (such as Mosaic and Netscape) support browsing of hyperlinked documents but have no internal interactive/compute capability.
Servers read HTTP and deliver the requested service to the client.
WWW/CGI Computing - Web Technology Overview
PERL -- a rapid prototyping language (script) aimed at text and file manipulation - CGI programs are typically written in PERL but can be essentially ANY UNIX process, and so do simulation, database access, advanced document processing etc.
Web search engines such as YAHOO, HARVEST, WAIS -- early distributed database access technology supporting search and indexing.
net.Thread, WebTools, RealAudio are early Web interactive services.
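The CGI mechanism described above can be sketched in Python (rather than PERL, which the foils note as typical): the server passes the request's query string to a program via environment variables, and whatever the program writes (headers, blank line, body) goes back to the client. The parameter name `n` and the toy computation are invented for illustration.

```python
import io
from urllib.parse import parse_qs

def cgi_program(environ, out):
    """Toy CGI program: sum 1..n, where n comes from the query string."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    n = int(params.get("n", ["0"])[0])
    out.write("Content-Type: text/html\r\n\r\n")        # header, blank line
    out.write("<html><body>sum(1..%d) = %d</body></html>"
              % (n, sum(range(1, n + 1))))

# Simulate the server side: set the environment, capture the output.
buf = io.StringIO()
cgi_program({"QUERY_STRING": "n=10"}, buf)
print(buf.getvalue())
```

In place of the arithmetic, the same hook can launch a simulation or database query, which is exactly the "arbitrary computation" extension the foils describe.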
WWW/CGI Computing - Key points in Web Technology
Characteristics
Current main components: HTTP; HTML; CGI; Fillout Forms
Client-server communication model - (flat hierarchical UNIX) file system as the major file (data) management system
WWW/CGI Computing - Key points in Web Technology
Strengths
Established the Internet as the major vehicle in the networking industry
Universal, hyperlinked information access and dissemination
Transparent network navigation and GUI with multimedia information access for information dissemination - a killer networking application
WWW/CGI Computing - Key points in Web Technology
Weaknesses
Static, browser-oriented client
Document update done manually, hard to automate
The flat UNIX file system supports only primitive information system functions such as open, read/write and close.
WWW/CGI Computing - Some Technologies to be Integrated into the Web
ATM, ISDN, Wireless and Satellite will form the hybrid physical implementation of the NII
CORBA, OpenDoc, OLE, SGML and HyTime are critical file and document standards
High-performance multimedia servers to enable digital information delivery on demand
Data transport from MPI/PVM etc.
WWW/CGI Computing - Some Technologies to be Integrated into the Web
Windows95/NT -- the last of the non-social (Web) operating systems - will follow the dinosaurs (IBM mainframes) into extinction, except as WebServer/Client platforms with only base operating system services.
Personal Digital Assistants -- WebNewtons done right - learn from Telescript (agent-based communication) and the Magic Cap operating system.
A WWW Based Computing Project Undertaken at NPAC
In collaboration with Boston University and Cooperating Systems, NPAC has been developing concepts and prototypes of "Compute-Webs" over the last year.
This work is partly motivated by the integration of information processing and computation, both for a better programming environment and for natural support of data-intensive computing.
Further, the Web itself represents the largest available computer, with some 20 million potential nodes worldwide, which is expected to grow by a factor of 10 as the Information Superhighway is deployed fully.
A WWW Based Computing Project Undertaken at NPAC |
Our first prototype was built on compute-extended Web servers using the standard CGI mechanism, and was applied successfully to the factorisation of RSA-130, a 130-decimal-digit number, using the latest sieving algorithm, distributed to a net of Web servers and clients in a load-balanced, fault-tolerant fashion.
This work was presented at Supercomputing '95, where it won the award for the most geographically dispersed and heterogeneous metacomputing solution in the Teraflop Challenge contest.
A WWW Based Computing Project Undertaken at NPAC |
RSA Factoring Challenge - Introduction |
RSA is a public-key cryptosystem for both encryption and authentication; it was invented in 1977 by Rivest, Shamir, and Adleman (hence RSA).
In a public-key cryptosystem each party has two keys: a public key and a corresponding secret key.
The public key is made public; the secret key is kept secret.
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring |
Anyone can encrypt a message using the public key that belongs to the intended recipient, but only parties that know the corresponding secret key can decrypt the encrypted message. |
The secrecy of the secret key, and therefore the security of the public-key cryptosystem, depends on the fact that it is computationally infeasible to derive the secret key from the public key.
If that were easy, anyone could decrypt intercepted messages; if it is infeasible, the system is secure.
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring |
In the RSA public-key cryptosystem the secret key can be derived from the public key if one can find the factorisation of a number that is part of the public key. Thus the security of RSA depends on the difficulty of factoring.
Since factoring large numbers is believed to be hard, RSA is believed to be secure.
RSA is interested in factoring in order to evaluate the security of RSA implementations: how large should the numbers be so that RSA becomes practically impossible to break?
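The dependence of RSA's security on factoring can be seen in a toy example. The sketch below uses deliberately tiny, illustrative primes (nothing like real key sizes) to show that whoever factors the public modulus can reconstruct the secret exponent:

```python
# Toy RSA with tiny primes -- an illustration only, not a secure scheme.
# All numbers here are hypothetical examples, not taken from the talk.

def egcd(a, b):
    # Extended Euclid: returns (g, x, y) with a*x + b*y == g
    if b == 0:
        return (a, 1, 0)
    g, x, y = egcd(b, a % b)
    return (g, y, x - (a // b) * y)

def modinv(a, m):
    # Modular inverse of a modulo m (requires gcd(a, m) == 1)
    g, x, _ = egcd(a, m)
    assert g == 1
    return x % m

# Key generation from two secret primes p and q
p, q = 61, 53
n = p * q                      # public modulus (part of the public key)
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = modinv(e, phi)             # secret exponent

# Encrypt with the public key (n, e); decrypt with the secret key d
m = 42
c = pow(m, e, n)
assert pow(c, d, n) == m

# An attacker who factors n recovers p and q, hence phi and d:
for cand in range(2, n):
    if n % cand == 0:
        p2, q2 = cand, n // cand
        break
d2 = modinv(e, (p2 - 1) * (q2 - 1))
assert pow(c, d2, n) == m      # factoring n breaks the secret key
```

The trial-division loop stands in for whatever factoring method the attacker uses; for the 100-plus-digit moduli discussed here, only algorithms such as the Number Field Sieve make the attack even conceivable.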
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring |
In the early eighties some people thought that 100-digit numbers would offer enough security; 100-digit numbers can now routinely be factored. |
In the August 1977 issue of Scientific American the inventors of RSA posed the 129-digit RSA challenge, and predicted that it would take 40 quadrillion years to factor; it was factored in April 1994 after 8 months of work on the Internet.
Right now many people are still protecting their data and money using 155-digit (i.e., 512-bit) numbers.
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring |
The progress in factoring is due both to better factoring methods and to more and faster hardware. |
The www-factoring project is the first large scale project that makes use of a new and faster factoring method: the Number Field Sieve (NFS). |
The first goal is to factor a 130-digit number (known as RSA-130, part of the RSA-factoring challenge). After that we go for bigger numbers, with the ultimate goal of evaluating how hard it would be to break a 155-digit number.
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring |
RSA Factoring Components - FAFNER |
FAFNER is a collection of Perl scripts, HTML pages, and associated documentation which together comprise the "server-side" of the Web factoring effort. |
The FAFNER software doesn't actually make any progress towards factoring RSA130; rather, it provides interactive registration, task assignment, and solution database services to sieving clients. |
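The three services FAFNER provides - registration, task assignment, and a solution database - could be sketched, very loosely, as follows. The real FAFNER is a collection of Perl CGI scripts; the function names and data structures below are purely illustrative, not taken from the project:

```python
# Hypothetical sketch of FAFNER-style server-side services.
# Registration, task assignment, and result collection are modelled
# as plain functions over in-memory state; the real system exposes
# these through CGI and persists its state on disk.

import itertools

clients = {}                 # client_id -> contact info
pending = []                 # task specifications not yet handed out
results = {}                 # task_id -> list of reported relations
_ids = itertools.count(1)

def register(email):
    # Interactive registration: record the client, hand back an id
    cid = next(_ids)
    clients[cid] = {"email": email}
    return cid

def assign_task(cid):
    # Task assignment: give the next unclaimed sieving range
    # to a registered client, or None if nothing is available
    if cid not in clients or not pending:
        return None
    return pending.pop(0)

def submit_result(task_id, relations):
    # Solution database: store relations reported back by a client
    results.setdefault(task_id, []).extend(relations)

# One task flowing through the cycle (all values illustrative)
pending.append({"id": 1, "q_start": 1000, "q_end": 1100})
cid = register("sieve@example.edu")
task = assign_task(cid)
submit_result(task["id"], ["rel-a", "rel-b"])
```

Note that, as the text says, none of this makes progress on the factorisation itself; the server only bookkeeps work for the sieving clients.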
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring |
GNFS (General Number Field Sieve) |
The GNFS client package implements the sieving algorithm that converts a task specification into a set of useful results (called "relations").
GNFS is implemented in C, and has been ported to most Unix environments. |
GNFS performs relatively little I/O, does not use the network, and has large (but configurable and constant for an entire run) memory requirements. |
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring |
This makes it a good code to run on idle workstations, because it is almost entirely CPU-bound. GNFS is the original sieving client, but it executes exactly one task specified on its command line, and is not network-aware. |
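The notion of a sieving task that turns a range of Q-values into relations can be illustrated with a drastically simplified stand-in: keep a value when a toy polynomial evaluates to a number that is smooth over a small factor base. The polynomial and factor base below are invented for illustration and bear no relation to the actual RSA-130 parameters:

```python
# Toy "sieving" sketch: real NFS sieving hunts for polynomial values
# that factor completely over a factor base. Everything here is a
# simplified, illustrative stand-in for the real algorithm.

FACTOR_BASE = [2, 3, 5, 7, 11, 13]

def is_smooth(n, base):
    # Divide out factor-base primes; n is smooth if nothing remains
    for p in base:
        while n % p == 0:
            n //= p
    return n == 1

def sieve_range(q_start, q_end):
    # One "task": test f(q) = q*q + 1 over an assigned Q-range
    # (this polynomial is made up, not the RSA-130 one)
    relations = []
    for q in range(q_start, q_end):
        if is_smooth(q * q + 1, FACTOR_BASE):
            relations.append(q)
    return relations
```

As in the real client, the work is pure CPU-bound arithmetic over an assigned range, with no network or I/O needed until the relations are reported back.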
GNFSD (General Number Field Sieving Daemon) |
GNFSD is an augmented sieving client (also written in C) that allows a GNFS process to interact with a "task server" over the net, rather than requiring task specification on the GNFS command line. |
Other key features are automatic failure detection and restart via a watchdog timer, persistent configuration state, and a TCP/IP monitor interface at port 5453. |
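The watchdog-timer idea behind GNFSD's failure detection can be sketched as follows; the class, method names, and timeout are hypothetical illustrations, not the actual GNFSD implementation:

```python
# Hypothetical watchdog sketch in the spirit of GNFSD's automatic
# failure detection: a monitor deems the sieving process failed
# (and restarts it) if it stops reporting progress within a timeout.

import time

TIMEOUT = 2.0          # seconds without progress before a restart

class Watchdog:
    def __init__(self):
        self.last_kick = time.monotonic()

    def kick(self):
        # Called by the sieving loop whenever it makes progress
        self.last_kick = time.monotonic()

    def expired(self):
        # True when the worker has gone silent for too long
        return time.monotonic() - self.last_kick > TIMEOUT

wd = Watchdog()
wd.kick()
assert not wd.expired()   # a fresh kick means no restart is needed
```

A supervising loop would poll `expired()` periodically and restart the sieving process when it returns true, restoring state from the persistent configuration the text mentions.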
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring |
How It All Fits Together |
The FAFNER servers are hierarchical; there is a root server, plus several major subservers. Each of these in turn has subservers, and so forth. |
A FAFNER subserver depends on its parent for sieving tasks. As tasks pass from server to server, they are broken up into smaller and smaller pieces. By the time they reach clients, there may be as few as 100 Q-values per task.
The sieving clients (GNFS or GNFSD processes) are the leaves of the FAFNER tree; they get a single task from a FAFNER server, and then spend anywhere from 15 minutes to several days computing the problem. |
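The hierarchical subdivision of sieving ranges down the server tree might be sketched like this; the splitting rule and the sizes are illustrative assumptions, not FAFNER's actual policy:

```python
# Sketch of how a large sieving range could be subdivided as it
# passes down a FAFNER-style server tree, ending in small per-client
# tasks of roughly 100 Q-values each. Sizes are illustrative.

def split(q_start, q_end, parts):
    # Divide [q_start, q_end) into `parts` near-equal subranges
    step = (q_end - q_start + parts - 1) // parts
    return [(s, min(s + step, q_end))
            for s in range(q_start, q_end, step)]

# The root server holds a big range; each level splits it further
root = (0, 1_000_000)
subservers = split(*root, 10)             # 100,000 Q-values each
leaf_tasks = split(*subservers[0], 1000)  # ~100 Q-values per client
```

Each leaf task is then what a GNFS or GNFSD client picks up and sieves, taking anywhere from minutes to days depending on the machine.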
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring |
When the answers are ready (in the form of a text file containing a few hundred or a few thousand relations), the clients send them back to their FAFNER server.
There, they are distilled, archived, and ultimately sent back to Bellcore, where they are integrated into the final solution -- the factoring of RSA130. |
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring |
The Problem with Fafner |
We found that a major problem with our CGI-enhanced Web servers supporting RSA-130 factoring was that they did not provide the standard support one expects from cluster computing packages.
That is: load balancing, fault tolerance, process management, automatic minimisation of job impact on user workstations, security, and accounting support.
A Scalable Metacomputer and Cluster Management Package |
The overall goal of this project is to design, develop and implement a WWW-based Metacomputer management package - MetaWeb. |
The project will build on existing knowledge of and experience with the management of LAN-based computing clusters to produce a software package capable of managing a potentially globally distributed Metacomputer.
The primary objective of MetaWeb is to increase the throughput of user applications by utilising the wealth of existing networked computing resources efficiently and effectively together. A side product of this objective is the encouragement of individuals, groups and organisations to collaborate in setting up and utilising MetaWeb to build a global Metacomputer.
MetaWeb will be a truly heterogeneous package capable of managing resources ranging from personal computers running Windows 95 or NT through to vector/MPP supercomputers - this capability will be based on the use of pervasive WWW software such as HTML, HTTP and Java. |
MetaWeb will be designed to be fully fault tolerant. Not only will it be able to reboot itself and retain its previous status, but it will also be able to resume or restart failed application jobs. This ability is enabled by the fully duplicated design of MetaWeb and by the use of a persistent database to maintain the Metacomputer's current status.
MetaWeb will be prototyped using existing WWW-based technologies such as Perl CGI-scripts, C-modules and HTTPD servers. This prototyping phase of the project will allow the MetaWeb design to be proven and adapted, if necessary, thus ensuring that the implemented version of the package will be functional, robust, and work as intended.
MetaWeb will replace the need to use existing research and commercial cluster management packages, such as Codine, LoadLeveler, and DQS, by exploiting emerging technologies and the ubiquitous nature of the WWW. MetaWeb will exhibit all the best features of the existing management packages, with the added advantage of being specifically designed and developed with the latest and emerging WWW technologies at hand.
The long term is hard to predict - see the changes over the last 5 years!!
Can see trends, however... |
Hardware Trends (5 - 10 Years) - Computers |
Hundreds of millions (100 - 300 million) of "settop" boxes
One in every US household |
More worldwide |
Ranging from Supercomputer to Personal Digital Assistants |
Hardware Trends (5 - 10 Years) - Networks |
Networks (1 - 20 MBytes/s) will fulfil the needs of the "home" entertainment industry.
Technologies ranging from high-bandwidth fibre to electromagnetic media such as microwave.
Hardware Trends (5 - 10 Years) - Software |
Very hard to predict, even in the relatively short term - JAVA has been around for only about a year!!
Ubiquitous and pervasive (WWW/JAVA-like). |
Can forget about underlying hardware and operating system. |
Metacomputing "plug-ins" |
Micro-kernel-like JAVA based servers with add-on services that can support Metacomputing (load balancing, migration, checkpointing, etc...) |