Full HTML for

Basic foilset MetaComputing -- the Informal Supercomputer -- MRA Meeting Part II:The Practical Issues

Given by Mark Baker, Geoffrey Fox at Tutorial for CRPC MRA Meeting at Cornell on May 7 1996. Foils prepared May 7 1996
Outside Index Summary of Material


The Challenge
  • - Transparent Utilisation- Full Utilisation- Construct Metacomputer
Don't Want to Reinvent "Wheel"
General Introduction to Cluster Computing
Some Terminology
CMS Interaction with the OS
The Workings of Typical CMS Package
Special Note - The Ownership Hurdle
Cluster/Metacomputing Environments
  • - A System Administrators Perspective- A Users Perspective
Features and Functionality of CMS Packages
  • - Computer Environments Supported- Application Support- Job Scheduling and Allocation Policy- Configurability- Dynamics of Resources
Status of CMS Packages - Basic Problems
Related Projects
  • - The Information Wide Area Year (I-WAY)- Wide Area Metacomputer Manager (WAMM)- National MetaCenter for Computational Science and Engineering
Near and Future Projects
  • - WWW/CGI - RSA Factoring- JAVA based systems.

Table of Contents for full HTML of MetaComputing -- the Informal Supercomputer -- MRA Meeting Part II:The Practical Issues

Denote Foils where HTML is sufficient

1 MetaComputing: The Informal Supercomputer
2 Lecture 2: Metacomputing: The Practical Issues
3 Lecture 2: Metacomputing: The Practical Issues
4 Lecture 2: Metacomputing: The Practical Issues
5 Lecture 2: Metacomputing: The Practical Issues
6 The Challenge
7 The Challenge
8 Do Not Want to Reinvent the "Wheel", So Must...
9 General Introduction
10 General Introduction
11 Some Currently Available CMS Packages
12 Some Currently Available CMS Packages
13 Some Terminology - 1
14 Some Terminology - 2
15 Some Terminology - 3
16 Some Terminology - 4
17 Some Terminology - 5
18 Some Terminology - 6
19 Some Terminology - 7
20 Some Terminology - 8
21 Some Terminology - 9
22 Cluster Software and Its Interaction With the Operating System
23 Cluster Software and Its Interaction With the Operating System
24 The Workings of Typical Cluster Management Software - 1
25 The Workings of Typical Cluster Management Software - 2
26 The Workings of Typical Cluster Management Software - 3
27 The Workings of Typical Cluster Management Software - 4
28 The Workings of Typical Cluster Management Software - 5
29 The Workings of Typical Cluster Management Software
30 Special Note - The Ownership Hurdle.
31 Special Note - The Ownership Hurdle.
32 Cluster/Metacomputing Environments:
33 Cluster/Metacomputing Environments:
34 Features and Functionality Desired in a CMS Package
35 Features and Functionality Desired in a CMS Package
36 Features and Functionality Desired in a CMS Package
37 Features and Functionality Desired in a CMS Package
38 Features and Functionality Desired in a CMS Package
39 Features and Functionality Desired in a CMS Package
40 Features and Functionality Desired in a CMS Package
41 Features and Functionality Desired in a CMS Package
42 Features and Functionality Desired in a CMS Package
43 Features and Functionality Desired in a CMS Package
44 Features and Functionality Desired in a CMS Package
45 Features and Functionality Desired in a CMS Package
46 Features and Functionality Desired in a CMS Package
47 Features and Functionality Desired in a CMS Package
48 Features and Functionality Desired in a CMS Package
49 Features and Functionality Desired in a CMS Package
50 Features and Functionality Desired in a CMS Package
51 Features and Functionality Desired in a CMS Package
52 Features and Functionality Desired in a CMS Package
53 Features and Functionality Desired in a CMS Package
54 Features and Functionality Desired in a CMS Package
55 Features and Functionality Desired in a CMS Package
56 Summary of Desirable Cluster/Metacomputing Features
57 Summary of Desirable Cluster/Metacomputing Features
58 Status of CMS Packages - Basic Problems
59 Metacomputing - Related Projects
60 Metacomputing - Related Projects
61 Metacomputing - Related Projects
62 Metacomputing - Related Projects
63 Metacomputing - Related Projects
64 Metacomputing - Related Projects
65 Metacomputing - Related Projects
66 Metacomputing - Related Projects
67 Metacomputing - Related Projects
68 Metacomputing - Related Projects
69 Metacomputing - Related Projects
70 Metacomputing - Related Projects
71 Metacomputing - Related Projects
72 Metacomputing - Related Projects
73 Metacomputing - Related Projects
74 Metacomputing - Related Projects
75 Metacomputing - Related Projects
76 Metacomputing - Related Projects
77 Metacomputing - Related Projects
78 Metacomputing - Related Projects
79 Metacomputing - Related Projects
80 Metacomputing - Related Projects
81 Metacomputing - Related Projects
82 Metacomputing - Related Projects
83 Metacomputing - Related Projects
84 Metacomputing - Related Projects
85 Metacomputing - Related Projects
86 Metacomputing - Related Projects
87 Metacomputing - Related Projects
88 Near and Future Projects
89 Near and Future Projects
90 Near and Future Projects
91 Near and Future Projects
92 Near and Future Projects
93 Near and Future Projects
94 Near and Future Projects
95 Near and Future Projects
96 Near and Future Projects
97 Near and Future Projects
98 Near and Future Projects
99 Near and Future Projects
100 Near and Future Projects
101 Near and Future Projects
102 Near and Future Projects
103 Near and Future Projects
104 Near and Future Projects
105 Near and Future Projects
106 Near and Future Projects
107 Near and Future Projects
108 Near and Future Projects
109 Near and Future Projects
110 Near and Future Projects - MetaWeb
111 Near and Future Projects - MetaWeb
112 Near and Future Projects - MetaWeb
113 Near and Future Projects - MetaWeb
114 Near and Future Projects - MetaWeb
115 Near and Future Projects - MetaWeb
116 Metacomputing in the Future
117 Metacomputing in the Future
118 Metacomputing in the Future
119 Metacomputing in the Future

Outside Index Summary of Material



HTML version of Basic Foils prepared May 7 1996

Foil 1 MetaComputing: The Informal Supercomputer

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Mark Baker and Geoffrey Fox
Northeast Parallel Architectures Center
Syracuse University
111 College Place
Syracuse, NY 13244-4100, USA
tel: +1 (315) 443 2083
fax: +1 (315) 443 1973
email: mab@npac.syr.edu
URL: http://www.npac.syr.edu/

HTML version of Basic Foils prepared May 7 1996

Foil 2 Lecture 2: Metacomputing: The Practical Issues

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Overview
The Challenge
  • - Transparent Utilisation- Full Utilisation- Construct Metacomputer
Don't Want to Reinvent "Wheel"
General Introduction to Cluster Computing
Some Terminology (1 - 9)

HTML version of Basic Foils prepared May 7 1996

Foil 3 Lecture 2: Metacomputing: The Practical Issues

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Overview
CMS Interaction with the OS
The Workings of Typical CMS Package
Special Note - The Ownership Hurdle
Cluster/Metacomputing Environments
  • - A System Administrators Perspective- A Users Perspective

HTML version of Basic Foils prepared May 7 1996

Foil 4 Lecture 2: Metacomputing: The Practical Issues

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Overview
Features and Functionality of CMS Packages
  • - Computer Environments Supported- Application Support- Job Scheduling and Allocation Policy- Configurability- Dynamics of Resources
Status of CMS Packages - Basic Problems

HTML version of Basic Foils prepared May 7 1996

Foil 5 Lecture 2: Metacomputing: The Practical Issues

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Overview
Related Projects
  • - The Information Wide Area Year (I-WAY)- Wide Area Metacomputer Manager (WAMM)- National MetaCenter for Computational Science and Engineering
Near and Future Projects
  • - WWW/CGI - RSA Factoring- JAVA based systems.
Metacomputing in the future !

HTML version of Basic Foils prepared May 7 1996

Foil 6 The Challenge

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Transparent Utilisation of a Distributed Heterogeneous Computing
Environment
Want to fully utilise a heterogeneous computing environment where
different types of processing resources and inter-connection
technologies are effectively and efficiently used.
Fully Utilise Available Resources
Low utilisation rates of high-performance workstations (LLNL/Los
Alamos 7- 10%), as their performance grows utilisation will become
worse.

HTML version of Basic Foils prepared May 7 1996

Foil 7 The Challenge

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Build a Metacomputer
The use of distributed resources in this framework is known as
Metacomputing and such an environment has the potential to
maximise performance and cost effectiveness of a wide range of
scientific and distributed applications.

HTML version of Basic Foils prepared May 7 1996

Foil 8 Do Not Want to Reinvent the "Wheel", So Must...

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Understand what we are trying to achieve - through-put and/or processor utilisation !?
Learn from experiences with current LAN-based Cluster Management Software (CMS) packages
Extend existing knowledge to design and develop a WAN-based Metacomputing Management package.
New and emerging technologies may help us solve some of the existing problems.

HTML version of Basic Foils prepared May 7 1996

Foil 9 General Introduction

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Use of clusters of workstations to increase the throughput of user applications is becoming more common place throughout the US and Europe.
A significant number of CMS packages exist - almost all originate from research projects, many have now been taken-up/adopted by commercial vendors.
The importance of cluster software can be seen by both the commercial take-up and also by the widespread installation of this software at most of the major computing facilities around the world.

HTML version of Basic Foils prepared May 7 1996

Foil 10 General Introduction

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Not clear that CMS is being used to take advantage of spare CPU cycles, but it is evident that much effort is being expended to increase throughput on networks of workstations by load balancing the work that needs to be done.
Nearly all the CMS packages are designed to run on Unix workstations and MPP. Some of the PD packages support Linux, which runs on PCs. Support for Windows NT is planned by many vendors.
WWW software and HTTP protocols could clearly be used as part of an integrated CMS package. Little software of this type has so far been developed - several of the packages use a WWW browser as an alternative GUI.

HTML version of Basic Foils prepared May 7 1996

Foil 11 Some Currently Available CMS Packages

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
See - http://www.npac.syr.edu/techreports/hypertext/sccs-748/index.html
Commercial Packages

HTML version of Basic Foils prepared May 7 1996

Foil 12 Some Currently Available CMS Packages

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Research Packages

HTML version of Basic Foils prepared May 7 1996

Foil 13 Some Terminology - 1

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Application Programming Interface (API) - An interface which enables a third party software developers can write portable programs. Examples are the Berkeley Sockets and those published by Microsoft for the Windows GUI.
Batching System - A batching system is one that controls the access to computing resources of applications. Typically a user will send a request the batch manager to run an application. The batch manager will then place the job in a queue (normally FIFO).

HTML version of Basic Foils prepared May 7 1996

Foil 14 Some Terminology - 2

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Distributed Computing Environment - The OSF Distributed Computing Environment (DCE) is a comprehensive, integrated set of services that supports the development, use and maintenance of distributed applications. It provides a set of services, anywhere in the network, enabling applications to use a heterogeneous network of computers.
Fault Tolerance - In this context, the guarantee that a job will complete after a system crash or network failure.
Heterogeneous - Containing components of more than one kind. A heterogeneous architecture may be one in which some components are processors, and others memories, or it may be one that uses different types of processor together.

HTML version of Basic Foils prepared May 7 1996

Foil 15 Some Terminology - 3

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Homogeneous - Made up of identical components. A homogeneous architecture is one in which each element is of the same type - processor arrays and multicomputers are usually homogeneous.
Homogeneous vs Heterogeneous - Often a cluster of workstations is viewed as either homogenous or heterogeneous. These terms are ambiguous, as they refer to not only the make of the workstation but also to the operating system being used on them.
For example, it is possible to have a homogenous clusterrunning various operating systems (SunOS/Solaris, or Irix 4/5).

HTML version of Basic Foils prepared May 7 1996

Foil 16 Some Terminology - 4

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Inter-connection network - The system of logic and conductors that connects the processors in a distributed computer system. Some examples are bus, mesh, hypercube and Omega networks.
Inter-processor communication - The passing of data and information among the processors of a parallel computer during the execution of a parallel program.
Job - This term generally refers to an application sent to a batching system - a job finishes when the application has completed its run.

HTML version of Basic Foils prepared May 7 1996

Foil 17 Some Terminology - 5

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Latency - The time taken to service a request or deliver a message which is independent of the size or nature of the operation. The latency of a message passing system is the minimum time to deliver a message, even one of zero length that does not have to leave the source processor. The latency of a file system is the time required to decode and execute a null operation.
Load balance - The degree to which work is evenly distributed among available processors. A program executes most quickly when it is perfectly load balanced, that is when every processor has a share of the total amount of work to perform so that all processors complete their assigned tasks at the same time.

HTML version of Basic Foils prepared May 7 1996

Foil 18 Some Terminology - 6

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Multitasking - Executing many processes on a single processor. Usually done by time-slicing the execution of individual processes and performing a context switch each time a process is swapped in or out - supported by special-purpose hardware in some computers.
Network - A physical communication medium. A network may consist of one or more buses, a switch, or the links joining processors in a multicomputer.
NFS - Network Filing System is a protocol developed to use IP and allow a set of computers to access each other's file systems as if they were on the local host

HTML version of Basic Foils prepared May 7 1996

Foil 19 Some Terminology - 7

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Network Information Services (NIS) - Developed by Sun Microsystems, NIS is a means of storing network wide information in central databases (NIS servers), where they can be accessed by any of the clients. Typically, a NIS database will be used to store the user password file, mail aliases, group identification numbers, and network resources.
Parallel Job - This can be defined as a single application (job) that has multiple processes that run concurrently . Generally each process will run on a different processor (workstation) and communicate boundary, or other data, between the processes at regular intervals. Typically a parallel job would utilise a message passing interface, such as MPI or PVM, to pass data between the processes.

HTML version of Basic Foils prepared May 7 1996

Foil 20 Some Terminology - 8

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Process - Fundamental entity of the software implementation on a computer system. A process is a sequentially executing piece of code that runs on one processing unit of the system.
Queuing - Method by which jobs are ordered to access some computer resource. Typically the batch manager will place a job the queue. A particular compute resource could possibly have more than one queue, for example queues could be set up for sequential and parallel jobs or short and long job runs.
Sequential Job - Defined as a job that does not pass data to remote processes. Typically such a job would run on a single workstation - it is possible that for a sequential process to spawn multiple threads on its processor.

HTML version of Basic Foils prepared May 7 1996

Foil 21 Some Terminology - 9

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Single Point of Failure - This is where one part of a system will make the whole system fail. In cluster computing this is typically the batch manager, which if it fails the compute resource are no longer accessible by users.

HTML version of Basic Foils prepared May 7 1996

Foil 22 Cluster Software and Its Interaction With the Operating System

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Outside Kernel - Works completely outside the kernel and on top of a machines existing operating system. Installation does not require modification of the kernel - similar to other software packages.
Within Kernel - Other cluster environments use:
  • - Micro-based kernels with customised services. - Dynamic kernel module.
Installed instead or as a module within the existing kernel to supportthe desired environment. Necessary to support functionality, such asvirtual shared memory.
Difficult to achieve efficiently outside the kernel.

HTML version of Basic Foils prepared May 7 1996

Foil 23 Cluster Software and Its Interaction With the Operating System

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Combinations
Some packages, such as BSP and PVM can exist in two forms:
- The PD versions of these packages are installed on top of an
existing operating system.
- Vendor versions of the software is often integrated into the kernel
to optimise performance.

HTML version of Basic Foils prepared May 7 1996

Foil 24 The Workings of Typical Cluster Management Software - 1

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Step 1 - Job Description File
Produce some type of resource description file.
This file is generally an ASCII text file (produced using a normal text editor or with the aid of a GUI) which contains a set of keywords to be interpreted by the CMS.
The nature and number of keywords available depends on the CMS package, but will at least include the job name, the maximum runtime and the desired platform.

HTML version of Basic Foils prepared May 7 1996

Foil 25 The Workings of Typical Cluster Management Software - 2

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Step 2 - Submit Job
Once completed, the job description file is sent by the client software resident on the user's workstation, to a master scheduler.
The Master Schedular
The master scheduler is the part of the CMS that has an overall view of the cluster resources available

HTML version of Basic Foils prepared May 7 1996

Foil 26 The Workings of Typical Cluster Management Software - 3

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
On each of the resource workstation daemons are present that
communicate their state at regular intervals to the master
scheduler.
One of the tasks of the master scheduler is to evenly balance the
load on the resources that it is managing.
So, when a new job is submitted it not only has to match the
requested resources with those that are available, but also needs to
ensure that the resources being used are load balanced.

HTML version of Basic Foils prepared May 7 1996

Foil 27 The Workings of Typical Cluster Management Software - 4

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Multiple Queues
Typically a batch system will have multiple queues, each being
appropriate for a different type of jobs.
For example
- Homogeneous cluster which is primarily used to serviceparallel jobs, - Powerful server for CPU intensive jobs, - Jobs that need a rapid turnaround.
The number of possible queue configurations is large and will
depend on the typical throughput of jobs on the system being used.

HTML version of Basic Foils prepared May 7 1996

Foil 28 The Workings of Typical Cluster Management Software - 5

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Fault Tolerance
The master scheduler is also tasked with the responsibility of
ensuring that jobs complete successfully.
It does this by monitoring jobs until they successfully finish.
If a job fails, due to problems other than an application runtime
error, it will reschedule the job to run again.

HTML version of Basic Foils prepared May 7 1996

Foil 29 The Workings of Typical Cluster Management Software

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index

HTML version of Basic Foils prepared May 7 1996

Foil 30 Special Note - The Ownership Hurdle.

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Generally a workstation will be "owned" by, for example, an individual, a group, a department, or an organisation.
They are dedicated to the exclusive use by the "owners".
This ownership often brings problems when attempting to form a cluster of workstations.
Typically, there are three types of "owner":
Ones who use their workstations for sending and receiving mail or preparing papers, such as administrative staff, librarian, theoreticians, etc.

HTML version of Basic Foils prepared May 7 1996

Foil 31 Special Note - The Ownership Hurdle.

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Ones involved in software development, where the usage of the workstation revolves around the edit, compile, debug and test cycle.
Ones involved with running large numbers of simulations often requiring powerful computational resources.
It is the latter type of "owner" that needs additional compute
resources and it is possible to fulfil their needs by fully utilising
spare CPU cycles from former two "owners".
However, this may be easier said than done and often requires
delicate negotiation to become reality.

HTML version of Basic Foils prepared May 7 1996

Foil 32 Cluster/Metacomputing Environments:

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A Systems Administrator Perspective...
Good documentation (on-line and manual)
Vendor support
Plug and Play Installation
Easy maintenance and support
Easy to manage, reconfigure and administrate
Truly heterogeneous platforms support (OS & Platforms)
Security
Statistics
No Single point of Failure
Fault tolerant

HTML version of Basic Foils prepared May 7 1996

Foil 33 Cluster/Metacomputing Environments:

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A Users Perspective...
Good on-line Documentation
User support
Easy to use GUI
Ease of submitting, monitoring and controlling jobs
Support for all programming paradigms
Job statistics
Batch and interactive usage
Fault tolerance
Checkpointing

HTML version of Basic Foils prepared May 7 1996

Foil 34 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Computing Environments Supported
Commercial/Research - Is it a commercial or a research product?
Platforms Supported - Heterogeneous platform support
Operating Systems - Heterogeneous operating systems support
Additional Hardware/Software - Is there any need for additional hardware or software to be able to run the CMS package?
  • - For example, additional diskspace or software such as AFS or DCE- It is assumed that software such as NIS and NFS is used.

HTML version of Basic Foils prepared May 7 1996

Foil 35 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Application support
Batch jobs - Are batch submissions of jobs supported?
Interactive Support - Are jobs that would normally be run interactively supported?
  • - For example, a debugging session or a job that requires usercommand-line input .
Parallel Support - Is there support for running parallel programs on the cluster ?

HTML version of Basic Foils prepared May 7 1996

Foil 36 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Application support
Queue Type - Are multiple, configurable, queues supported?
- This feature is necessary for managing large multi-vendorclusters where jobs ranging from short interactive sessions tocompute intensive parallel applications need to run.

HTML version of Basic Foils prepared May 7 1996

Foil 37 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Job Scheduling and Allocation Policy
Dispatching Policy- Is there a configurable dispatching policy,
  • - Does it allow for factors such as system load, resourcesavailable (CPU type, computational load, memory, disk-space),resources required, etc.
Impact on Workstation Owner- What is the impact on the owner of the workstation?
  • - It should be possible to configure the CMS to, for example,suspend jobs when an owner is using his/her workstation or setjobs to have a low priority (nice) value.

HTML version of Basic Foils prepared May 7 1996

Foil 38 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Job Scheduling and Allocation Policy

HTML version of Basic Foils prepared May 7 1996

Foil 39 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Job Scheduling and Allocation Policy
Impact on the Workstation - What is the impact of running the CMS package on a workstation?
- Obvious impact when a job is running, but there also may be anundesirable impact when a job is suspended, checkpointed ormigrated to another workstation.
EX - Process migration requires that a job saves its state andthen is physically moved over the local network to anotherworkstation.

HTML version of Basic Foils prepared May 7 1996

Foil 40 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Job Scheduling and Allocation Policy
This will impact on the workstation (CPU/memory anddiskspace) while the state is saved and then on the networkbandwidth when tens of Mbytes of data is transferred across thenetwork.
Load Balancing - The CMS should load balances the resources that it is managing.
  • - It is useful if the system administrator can customise the default configuration to suit the local conditions in the light of experience ofrunning the CMS.

HTML version of Basic Foils prepared May 7 1996

Foil 41 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Job Scheduling and Allocation Policy
Check Pointing - Saving a job's state at regular intervals during its execution. If the workstation fails then the job can be restarted at its last checkpointed position.
- Generally useful means of saving state, but can be costly interms of resources

HTML version of Basic Foils prepared May 7 1996

Foil 42 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Job Scheduling and Allocation Policy
Checkpointing Needs
Additional diskspace per workstation is needed.
Home filestore may be remotely mounted, this will have an impact on NFS performance and the network bandwidth.
Existing clusters will have not have the physical resources (local diskspace) to support checkpointing.

HTML version of Basic Foils prepared May 7 1996

Foil 43 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Job Scheduling and Allocation Policy
Process Migration - Migrating an executing processes from one workstation to another.
Useful for:- Minimising impact on workstation - process migrated whenowner takes back control of his/her workstation. - Here the job running on the workstation will, be suspendedfirst, and then migrated onto another workstation after a certaintime interval.

HTML version of Basic Foils prepared May 7 1996

Foil 44 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Job Scheduling and Allocation Policy
- Load balancing the cluster - migrating from heavily loadedworkstations and running them on lightly loaded ones.- Potentially of becoming very complicated on anything otherthan sequential jobs.- Like checkpointing - can be a very useful feature.- Impact is similar to checkpointing, additional disadvantage isthat large state files are moved around the network. This canhave a serious impact on users of the network.

HTML version of Basic Foils prepared May 7 1996

Foil 45 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Job Scheduling and Allocation Policy
Job Monitoring and Rescheduling - Monitor that jobs are running and in the event of a job failure should reschedule job
Suspension/Resumption of Jobs
- This feature is particularly useful to minimise the impact of a jobs on the owner of a workstation, but may also be useful inthe event of a system or network wide problem.

HTML version of Basic Foils prepared May 7 1996

Foil 46 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Configurability
Resource Administration - Control over the resources available.
- The administrator should be able to, for example, control whohas access to what resources and also what resources are used(CPU load, diskspace, memory).
Job Runtime Limits - Enforce job runtime limits.
- Otherwise it will be difficult to fairly allocate resourcesamongst users.

HTML version of Basic Foils prepared May 7 1996

Foil 47 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Configurability
Forked Child Management - It is common for a job to fork child processes.

HTML version of Basic Foils prepared May 7 1996

Foil 48 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Configurability
Process Management - Configure the resources to be either shared or be exclusive to a given job.
- Efficient use of resources may require close control over thenumber of processes running on a workstation, it may even bedesirable to allow exclusive access to workstations by aparticular job. - It should also be possible to control the priority of jobs runningon a workstation to help load balancing (nice)and minimise theimpact of jobs on the owner of the workstation.

HTML version of Basic Foils prepared May 7 1996

Foil 49 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Configurability
Job Scheduling Control - User and/or administrator should be able to schedule when a job will be run.
GUI/Command-line - User interface do users/administrators have.
- The interface of a software package will often determine thepopularity of a package. In general, a Motif GUI is standard. - Dramatic increase in usage and popularity of the HTTPprotocol and the WWW, so a GUI based on this technologyseems likely to be a common standard in the future.

HTML version of Basic Foils prepared May 7 1996

Foil 50 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Configurability
Ease of Use - How easy and/or intuitive is it for users and administrators to use the CMS ?
User Allocation of Jobs - Can a user specify the resources that they require ?
  • - For example, the machine type, job length and diskspace.
User Job Status Query
  • - Find out if it is pending/running or perhaps how long before itcompletes.

HTML version of Basic Foils prepared May 7 1996

Foil 51 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Configurability
Job Statistics - Are statistics provided to the user and administrator about the jobs that have run?

HTML version of Basic Foils prepared May 7 1996

Foil 52 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Dynamics of Resources
Runtime Configuration - Reconfigure at runtime.
- Dynamically, at runtime, resources available, queues and otherconfigurable features of the CMS, i.e. it is not necessary restartthe CMS.
Dynamic Resource Pool
- Is it possible to add and withdraw resources (workstations)dynamically during runtime?

HTML version of Basic Foils prepared May 7 1996

Foil 53 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Dynamics of Resources
Single Point of Failure (SPF) - Is there one ?

HTML version of Basic Foils prepared May 7 1996

Foil 54 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Dynamics of Resources
Fault Tolerance - Is there fault tolerance built in ?
  • - For example, does it check that resources are available beforesubmitting jobs to them, and will it try to rerun a job after aworkstation has crashed.- The CMS should be able to guarantee that a job will complete. - If while a job is running the machine that it is running on failsthen the CMS notice that the machine/s are unavailable, butshould also reschedule the job as soon as feasible.

HTML version of Basic Foils prepared May 7 1996

Foil 55 Features and Functionality Desired in a CMS Package

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Dynamics of Resources
- Also, if a machine running a queue or CMS scheduler fails, theCMS should be able to recover and continue to run.- The real need for fault tolerance is determined by the level ofservice that is being provided by the cluster. However, faulttolerance is a useful feature in any system.
Security Issues
  • - The package should provide at least normal Unix security.- it is desirable that it takes advantage of NIS and other industrystandard packages.

HTML version of Basic Foils prepared May 7 1996

Foil 56 Summary of Desirable Cluster/Metacomputing Features

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Truly heterogeneous platforms support, across:
  • - OS and platforms
Good documentation (on-line and manual)
Vendor support
Plug and Play Installation
Batch and interactive usage
Easy to use GUI to package
Easy maintenance and support

HTML version of Basic Foils prepared May 7 1996

Foil 57 Summary of Desirable Cluster/Metacomputing Features

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Easy to manage, reconfigure and administrate
Ease of submitting, monitoring and control jobs
Support for all programming paradigm
Security
Statistics
No Single point of Failure
Fault tolerant
Checkpointing/Process-Migration

HTML version of Basic Foils prepared May 7 1996

Foil 58 Status of CMS Packages - Basic Problems

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
LAN-Based
Limited platform and operating system support - not truly heterogeneous
Do not support all programming paradigms
Load Balancing is generally naive
Single-points-of-failure
Limited-fault tolerance

HTML version of Basic Foils prepared May 7 1996

Foil 59 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
The Information Wide Area Year (I-WAY)
The I-WAY is an experimental high-performance network linking dozens of the country's fastest computers and advanced visualisation environments.
Based on ATM technology.
Supports both TCP/IP over ATM and direct ATM-oriented protocols.
Provides the wide-area high-performance backbone for various experimental networking activities at SC'95.
Built from a combination of existing networks and some additional connectivity and services provided by national service providers.

HTML version of Basic Foils prepared May 7 1996

Foil 60 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
The Information Wide Area Year (I-WAY)
The I-WAY is a testbed to prototype the following:
Teraflop- class wide area computing:
- Nodes consist of the top supercomputing sites, with acombined peak computing power approaching a teraflop. - Work is under way to make this distributed environmentbehave as one facility.

HTML version of Basic Foils prepared May 7 1996

Foil 61 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
The Information Wide Area Year (I-WAY)
Close coupling of immersive virtual environments and supercomputing:
- Applications combine state- of-the-art interactiveenvironments and supercomputing.
An advanced application development resource:
- The I-WAY is envisioned as a resource for advancedapplication development and demonstrations.

HTML version of Basic Foils prepared May 7 1996

Foil 62 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
The I-WAY is a testbed
Testbed to identify future network research issues:
- Goal to uncover the areas requiring further study anddevelopment. - Highlight security mechanisms for wide-area computing,- Advanced end-to-end network management,- Mapping of infrastructure to emerging applicationenvironments,- Mapping of applications to emerging infrastructureenvironments.

HTML version of Basic Foils prepared May 7 1996

Foil 63 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Sites - Level 0
Argonne National Laboratory (ANL)
ARPA HPC Enterprise (ARPA)
California Institute of Technology (CIT)
Cornell Theory Center (CTC)
Lawrence Livermore National Laboratory/NERSC (LLNL)
Los Alamos National Laboratory (LANL)
NASA Goddard Space Flight Center (GSFC)
National Center for Supercomputing Applications (NCSA)
Pittsburgh Supercomputing Center (PSC)
San Diego Supercomputer Center (SDSC)
University of Illinois/EVL (EVL)

HTML version of Basic Foils prepared May 7 1996

Foil 64 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Sites - Level 1
Georgia Institute of Technology (GAT),
Lawrence Berkeley Laboratory (LBL)
Lockheed Martin Missiles & Space Co. (LMSC)
National Center for Atmospheric Research (NCAR)
Naval Command, Control and Ocean Surveillance Center (NCCOSC)
Naval Oceanographic Office
Naval Research Laboratory (Washington DC) (NRLW)
Oak Ridge National Laboratory (ORNL)
Pacific Northwest Laboratory (PNL)
Sandia National Laboratory (SNL)
University of Maryland (UMD)
University of Minnesota (UMN)
University of Virginia
University of Wisconsin (UWI)

HTML version of Basic Foils prepared May 7 1996

Foil 65 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
The Information Wide Area Year (I-WAY)

HTML version of Basic Foils prepared May 7 1996

Foil 66 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
The Information Wide Area Year (I-WAY)
How Long Will I-WAY Last?
I-WAY began as a project to support Supercomputing 1995, but...
Phase I, from now until January 1, 1996
Phase II, Jan 1, 1996 - Jan 1, 1997
Phase III ?

HTML version of Basic Foils prepared May 7 1996

Foil 67 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
The Information Wide Area Year (I-WAY)
Application Software
This software is intended to provide a functional and uniform environment across different I-WAY systems. It comprises:
- Single node software: compilers, shells, editors, etc. - Communication libraries - Parallel languages - Scalable Unix tools - Performance tools - Graphics libraries

HTML version of Basic Foils prepared May 7 1996

Foil 68 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
The Information Wide Area Year (I-WAY)
URL http://www.iway.org/

HTML version of Basic Foils prepared May 7 1996

Foil 69 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WAMM (Wide Area Metacomputer Manager)
WAMM is a graphical tool, built on top of PVM.
- It provides user with a GUI to assist in tasks such as: host add,check, removal, process management, compilation on remotehosts, remote commands execution.

HTML version of Basic Foils prepared May 7 1996

Foil 70 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WAMM (Wide Area Metacomputer Manager)
Sites Involved (Italy)
CINCECA - Interuniversity Consortium of Northeast Italy for Automatic Comp - Bologna
CASPUR - University and Research Consortium for Supercomputing Apps - Rome
CRS4 - Centre for Advanced Studies, Research and Development - Sardinia
CNUCE - institute of the Italian National Research Council - Pisa
ScuolaNormale Superiore - Pisa
Connection - Networked by GARR, the Italian research network - 2 Mbps

HTML version of Basic Foils prepared May 7 1996

Foil 71 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WAMM (Wide Area Metacomputer Manager)

HTML version of Basic Foils prepared May 7 1996

Foil 72 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WAMM (Wide Area Metacomputer Manager) - GUI
All functions are accessible via menus and buttons.
Geographical View of the System
Hosts are grouped following a tree structure.
  • - The root node, corresponding to a WAN, can containMAN and LAN. - MANs can contain only LANs and LANs contain the hostscomposing the Metacomputer. - WAN and MANs are usually represented by geographicalmaps, in order to facilitate user exploration of the resources.

HTML version of Basic Foils prepared May 7 1996

Foil 73 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WAMM (Wide Area Metacomputer Manager) - WAMM Tree

HTML version of Basic Foils prepared May 7 1996

Foil 74 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WAMM (Wide Area Metacomputer Manager)
Remote Command Execution
UNIX commands (e.g. ls, uptime, who, etc.) as well as X11 programs
(e.g. xload, xterm, etc.) can be executed on remote hosts.
WAMM takes care of showing command output (for UNIX ones) and
windows (for X11 ones) on the user's display.

HTML version of Basic Foils prepared May 7 1996

Foil 75 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WAMM (Wide Area Metacomputer Manager)
Remote Compilation
Compilation of modules on remote nodes is greatly simplified.
The user selects a group of hosts to compile onto and a set of
source files to be compiled.
WAMM copies sources on remote nodes, compiles them in parallel
and shows progress in separate windows, one for each host.

HTML version of Basic Foils prepared May 7 1996

Foil 76 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WAMM (Wide Area Metacomputer Manager)
Configuration
The Metacomputer configuration is specified through an external
file, written in a simple declarative language.
Number and grouping of hosts, remote commands for each node,
icons can be specified.
Graphical aspect (colours, fonts, etc.) can be customised via
standard X11 resource files.

HTML version of Basic Foils prepared May 7 1996

Foil 77 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WAMM (Wide Area Metacomputer Manager)

HTML version of Basic Foils prepared May 7 1996

Foil 78 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WAMM (Wide Area Metacomputer Manager)
Software Requirements
- PVM version 3.3 or higher
- X11 Release 5 or higher
- Motif version 1.2 or higher
- XPM version 3.4 or higher

HTML version of Basic Foils prepared May 7 1996

Foil 79 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WAMM (Wide Area Metacomputer Manager)
Supported Platforms - WAMM was developed and tested on:
- HP9000/700 running HP-UX 9.01 - Sun SparcStation 2 running SunOs 4.1.3 - IBM RISC/6000 running AIX 3.2 - IBM SP2 running AIX 3.2
WAMM has also been compiled (but not sufficiently tested) on :
- IBM RISC/6000 running AIX 4.1 - Silicon Graphics Indigo2 running IRIX 5.3 - DEC AlphaStation running OSF/1.3.2

HTML version of Basic Foils prepared May 7 1996

Foil 80 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WAMM (Wide Area Metacomputer Manager)
URL http://miles.cnuce.cnr.it/pp/wamm/

HTML version of Basic Foils prepared May 7 1996

Foil 81 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
NSF Centers Form National MetaCenter for Computational Science
and Engineering
NCSA's View
MetaCenter, n.: a coalescence of intellectual and physical resources
unlimited by geographical constraint; a synthesis of individual
centers that by combining resources creates a new resource
greater than the sum of its parts.

HTML version of Basic Foils prepared May 7 1996

Foil 82 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
NSF Metacenter
Objective
Based on the concept of distributed heterogeneous computing, or
Metacomputing, the MetaCenter provides scientists and engineers
the capability to move portions of their problems directly to
appropriate computer architectures without regard for where the
computers are located
Enlarge the research base and by facilitating collaboration among
researchers no what their physical location is. in the world.

HTML version of Basic Foils prepared May 7 1996

Foil 83 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
NSF Metacenter
Resources
Through heterogeneous networking technology, interactive
communication makes it possible from the desktops of individuals
and groups of scientists and engineer;
Research environment no longer needs to be a single lab., but will
invoke distributed intelligence and machinery, seamlessly
networked together.
Combine expertise and talents of the individual Centers' staffs and
focusing them on collaborative projects.

HTML version of Basic Foils prepared May 7 1996

Foil 84 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
NSF Metacenter
Resources
Co-operation will create an environment where not only academic
users, but industrial scientists and engineers can evaluate a greater
variety of systems.
Of particular importance to industry will be the ability to assess the
advantages and disadvantages of the combined high performance
computing resources offered in the MetaCenter at a far lower risk and
cost than a company would assume acquiring systems and building
expertise on their own.

HTML version of Basic Foils prepared May 7 1996

Foil 85 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
NSF Metacenter
MetaCenter participants
- Cornell Theory Center
- National Center for Atmospheric Research
- National Center for Supercomputing Applications (University of Illinois)
- Pittsburgh Supercomputing Center
- San Diego Supercomputer Center

HTML version of Basic Foils prepared May 7 1996

Foil 86 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
NSF Metacenter
MetaCenter Regional Alliance participants
- California Institute of Technology-Los Angeles Regional Gigabit Environment
- University of Illinois at Chicago
- Rice University- Center for Research on Parallel Computation
- MCNC/North Carolina Supercomputing Center
- Ohio Supercomputer Center
- Arctic Region Supercomputing Center
- PhAROH Metacenter Web Server

HTML version of Basic Foils prepared May 7 1996

Foil 87 Metacomputing - Related Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
NSF Metacenter
URL http://www.sdsc.edu/SDSC/Metacenter/MetaCenterHome.html

HTML version of Basic Foils prepared May 7 1996

Foil 88 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WWW/CGI Computing
World Wide Web is now the most promising candidate for the universal access core component of the NII.
Current Web is ~15,000 servers and expands at the rate of ~1 new server/hour.
Software industry starts adding value (Netscape, Netsite, Mosaic licenses, HotMetal, Netforce, Web support in OS/2 Warp and Windows95)

HTML version of Basic Foils prepared May 7 1996

Foil 89 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WWW/CGI Computing
So far, Web was mainly used for static hypermedia such as local information pages, digital libraries, Internet directories etc. However, the WWW model offers also extension mechanisms (CGI, CCI) towards dynamic services and in fact arbitrary computation
Early interactive Web services appearing. Examples include: WebCalc (NASA Goddard), Easy HTML (NCSA), WebChat (Internet Society), Virtual Doors (Unique, Inc.), Visioneering's Imaging Machine (VRL,Inc.)

HTML version of Basic Foils prepared May 7 1996

Foil 90 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WWW/CGI Computing - Web Technology Overview
Browsers have SAME interface on ALL Computers - Clients (such as Mosaic and Netscape) support browsing of hyperlinked documents but have no internal interactive/compute capability
Servers read HTTP and deliver requested service to client

HTML version of Basic Foils prepared May 7 1996

Foil 91 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WWW/CGI Computing - Web Technology Overview
PERL -- a rapid prototyping language(script) aimed at text and file manipulation - CGI Programs are typically written in PERL but an be essentially ANY UNIX Process and so do simulation, database access, advanced document processing etc.
Web Search engines such as YAHOO, HARVEST, WAIS -- early distributed database access technology supporting search and indexing
net.Thread, WebTools, RealAudio are early Web Interactive services

HTML version of Basic Foils prepared May 7 1996

Foil 92 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WWW/CGI Computing - Key points in Web Technology
Characteristics
Current main components: HTTP; HTML; CGI; Fillout Form
Client-server communication model - (Flat hierarchical UNIX) File system as the major file (data) management system

HTML version of Basic Foils prepared May 7 1996

Foil 93 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WWW/CGI Computing - Key points in Web Technology
Strengths
Established Internet as the major vehicle in networking industry
Universal, hyperlinked information access and dissemination
Transparent networking navigation and GUI with multimedia information access for information - dissemination--- a killer networking application

HTML version of Basic Foils prepared May 7 1996

Foil 94 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WWW/CGI Computing - Key points in Web Technology
Weaknesses
Static, browser-oriented client
Document update done manually, hard to automate
Flat UNIX file system supports only primitive information system functions such as open, read/write and close.

HTML version of Basic Foils prepared May 7 1996

Foil 95 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WWW/CGI Computing - Some Technologies to be Integrated into Web
ATM, ISDN, Wireless, Satellite will be hybrid physical implementation of NII
CORBA, Opendoc, OLE, SGML, Hytime are critical file and document standards
High Performance Multimedia servers to enable digital information delivery on demand
Data transport from MPI/PVM etc.

HTML version of Basic Foils prepared May 7 1996

Foil 96 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
WWW/CGI Computing - Some Technologies to be Integrated into the Web
Windows95/NT -- the last of the the non social (Web) operating
systems - will follow dinosaurs (IBM mainframes) into extinction
except as WebServer/Client platforms with only base operating
system services
Personal Digital Assistants -- WebNewtons done right - Learn from
Telescript (agent based communication) and Magic Cap operating
system

HTML version of Basic Foils prepared May 7 1996

Foil 97 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A WWW Based Computing Project Undertaken at NPAC
In collaboration with Boston University and Cooperating Systems NPAC has been developing concepts and prototypes of "Compute-Webs" over the last year.
This work is partly motivated by the integration of information processing and computation for both a better programming environment and for a natural support of data intensive computing.
Further the Web itself represents the largest available computer with worldwide some 20 million potential nodes which is expected to grow by a factor of 10 as the Information Superhighway is deployed fully.

HTML version of Basic Foils prepared May 7 1996

Foil 98 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A WWW Based Computing Project Undertaken at NPAC
Our first prototype was built on compute-extended Web Servers using the standard CGI mechanism and applied successfully to the factorisation of the RSA 130 decimal digit number using the latest sieving algorithm which was distributed to a net of Web servers and clients in a load balanced fault tolerant fashion.
This work was presented at the Supercomputing 95 and was given the award as the most geographically dispersed and heterogeneous metacomputing solution in the Teraflop Challenge contest.

HTML version of Basic Foils prepared May 7 1996

Foil 99 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A WWW Based Computing Project Undertaken at NPAC
RSA Factoring Challenge - Introduction
Public-key cryptosystem for both encryption and authentication; it was invented in 1977 by Rivest-Shamir-Adleman (RSA).
RSA is a public key cryptosystem, a cryptosystem where each party has two keys: a public key and a corresponding secret key.
The public key is made public, the secret key is kept secret.

HTML version of Basic Foils prepared May 7 1996

Foil 100 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring
Anyone can encrypt a message using the public key that belongs to the intended recipient, but only parties that know the corresponding secret key can decrypt the encrypted message.
The secrecy of the secret key, and therefore the security of the public keycryptosystem, depends on the fact that it is computationally infeasible to derive the secret key from the public key.
If that would be easy, anyone would be able to decrypt intercepted messages; if that is impossible, then the system is secure.

HTML version of Basic Foils prepared May 7 1996

Foil 101 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring
In the RSA public key cryptosystem the secret key can be derived from the public key,
If one is able to find the factorisation of a number that is part of the public key. Thus, the security of RSA depends on the difficulty of factoring.
Since factoring large numbers is believed to be hard - RSA is believed to be secure.
RSA interested in factoring to be able to evaluate the security of RSA implementations: how large should the numbers so that RSA becomes impossible to break?

HTML version of Basic Foils prepared May 7 1996

Foil 102 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring
In the early eighties some people thought that 100-digit numbers would offer enough security; 100-digit numbers can now routinely be factored.
In the August 1977 issue of Scientific American the inventors of posed the129-digit RSA challenge, and predicted that it would take 40 quadrillion years to factor the challenge; it was factored in April 1994 after 8 months on the Internet.
Right now many people are still protecting their data and money using 155- digit (i.e., 512-bit) numbers.

HTML version of Basic Foils prepared May 7 1996

Foil 103 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring
The progress in factoring is due both to better factoring methods and to more and faster hardware.
The www-factoring project is the first large scale project that makes use of a new and faster factoring method: the Number Field Sieve (NFS).
First goal is to factor a 130-digit number (known as RSA-130, part of the RSA- factoring challenge). After that we go for bigger numbers, with the ultimate goal to evaluate how hard it would be to break a 155-digit number.

HTML version of Basic Foils prepared May 7 1996

Foil 104 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring
RSA Factoring Components - FAFNER
FAFNER is a collection of Perl scripts, HTML pages, and associated documentation which together comprise the "server-side" of the Web factoring effort.
The FAFNER software doesn't actually make any progress towards factoring RSA130; rather, it provides interactive registration, task assignment, and solution database services to sieving clients.

HTML version of Basic Foils prepared May 7 1996

Foil 105 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring
GNFS (General Number Field Sieve)
The GNFS client package implements the sieving algorithms that converts a task specification into a set of useful results (called "relations").
GNFS is implemented in C, and has been ported to most Unix environments.
GNFS performs relatively little I/O, does not use the network, and has large (but configurable and constant for an entire run) memory requirements.

HTML version of Basic Foils prepared May 7 1996

Foil 106 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring
This makes it a good code to run on idle workstations, because it is almost entirely CPU-bound. GNFS is the original sieving client, but it executes exactly one task specified on its command line, and is not network-aware.
GNFSD (General Number Field Sieving Daemon)
GNFSD is an augmented sieving client (also written in C) that allows a GNFS process to interact with a "task server" over the net, rather than requiring task specification on the GNFS command line.
Other key features are automatic failure detection and restart via a watchdog timer, persistent configuration state, and a TCP/IP monitor interface at port 5453.

HTML version of Basic Foils prepared May 7 1996

Foil 107 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring
How It All Fits Together
The FAFNER servers are hierarchical; there is a root server, plus several major subservers. Each of these in turn has subservers, and so forth.
A FAFNER subserver depends on its parent for sieving tasks. As they pass from server to server, they are broken up into smaller and smaller bites. By the time they get to clients, there may be as few as 100 Q-values per task.
The sieving clients (GNFS or GNFSD processes) are the leaves of the FAFNER tree; they get a single task from a FAFNER server, and then spend anywhere from 15 minutes to several days computing the problem.

HTML version of Basic Foils prepared May 7 1996

Foil 108 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring
When the answers are ready (in the form of a text file containing a few 100 or few 1000 relations), the clients send them back to their FAFNER server.
There, they are distilled, archived, and ultimately sent back to Bellcore, where they are integrated into the final solution -- the factoring of RSA130.

HTML version of Basic Foils prepared May 7 1996

Foil 109 Near and Future Projects

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A WWW Based Computing Project Undertaken at NPAC - RSA Factoring
The Problem with Fafner
We found that a major problem with our CGI enhanced Web servers that supported RSA130 factoring, was that they did not provide the standard support which one expects from clustered computing packages.
Such as load balancing, fault tolerance, process management, automatic minimisation of job impact on user workstations, security, and accounting support.

HTML version of Basic Foils prepared May 7 1996

Foil 110 Near and Future Projects - MetaWeb

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
A Scalable Metacomputer and Cluster Management Package
The overall goal of this project is to design, develop and implement a WWW-based Metacomputer management package - MetaWeb.
Project will build on existing knowledge and experiences with the management of LAN-based computing clusters to produce a software package capable of managing a potentially globally distributed Metacomputer.

HTML version of Basic Foils prepared May 7 1996

Foil 111 Near and Future Projects - MetaWeb

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
The primary objective of MetaWeb is to increase the through-put of user applications by utilising the wealth of existing networked computing resources efficiently and effectively together. A side product of this objective is the encouragement of individuals, groups and organisations to collaborate in setting up and utilising MetaWeb to build a global Metacomputer.
MetaWeb will be a truly heterogeneous package capable of managing resources ranging from personal computers running Windows 95 or NT through to vector/MPP supercomputers - this capability will be based on the use of pervasive WWW software such as HTML, HTTP and Java.

HTML version of Basic Foils prepared May 7 1996

Foil 112 Near and Future Projects - MetaWeb

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
MetaWeb will be a truly heterogeneous package capable of managing resources ranging from personal computers running Windows 95 or NT through to vector/MPP supercomputers - this capability will be based on the use of pervasive WWW software such as HTML, HTTP and Java.
MetaWeb will be designed to be fully fault tolerant. Not only will it be able to reboot itself and retain its previous status but also be able to resume or restart failed application jobs. This ability is enabled by the fully duplicated design of MetaWeb and also by the use of a persistent database to maintain the Metacomputer's current status

HTML version of Basic Foils prepared May 7 1996

Foil 113 Near and Future Projects - MetaWeb

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
MetaWeb will be prototyped using existing WWW-based technologies such as Perl CGI-scripts, C-modules and HTTPD servers. This prototyping phase of the project will allow the MetaWeb design be proven and adapted, if necessary. Thus ensuring that the implemented version of the package will functional, robust and work as it is intended to.
MetaWeb will replace the need to use existing research and commercial cluster management packages, such as Codine, LoadLeveler, DQS, etc, by exploiting emerging technologies and the ubiquitous nature of WWW. MetaWeb will exhibit all the best features of the existing management packages but will have the advantage of being specifically designed and developed with all the latest and emerging WWW technologies at hand.

HTML version of Basic Foils prepared May 7 1996

Foil 114 Near and Future Projects - MetaWeb

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index

HTML version of Basic Foils prepared May 7 1996

Foil 115 Near and Future Projects - MetaWeb

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index

HTML version of Basic Foils prepared May 7 1996

Foil 116 Metacomputing in the Future

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Long term is hard to predict - See changes over last 5 Years!!
Can see trends, however...

HTML version of Basic Foils prepared May 7 1996

Foil 117 Metacomputing in the Future

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Hardware Trends (5 - 10 Years) - Computers
Millions (100 - 300) of "settop" boxes
One in every US household
More worldwide
Ranging from Supercomputer to Personal Digital Assistants

HTML version of Basic Foils prepared May 7 1996

Foil 118 Metacomputing in the Future

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Hardware Trends (5 - 10 Years) - Networks
Networks (1 - 20 MBytes/s) - fulfil needs of "home" entertainment industry.
Technologies ranging from high-bandwidth fibre to Electro-magnetic types such as Microwave.

HTML version of Basic Foils prepared May 7 1996

Foil 119 Metacomputing in the Future

From MetaComputing -- MRA Meeting Part II:The Practical Issues Tutorial for CRPC MRA Meeting at Cornell -- May 7 1996. *
Full HTML Index
Hardware Trends (5 - 10 Years) - Software
Very Hard to Predict in the relatively short term - JAVA has been
around for about a year !!
Ubiquitous and pervasive (WWW/JAVA-like).
Can forget about underlying hardware and operating system.
Metacomputing "plug-ins"
Micro-kernel-like JAVA based servers with add-on services that can support Metacomputing (load balancing, migration, checkpointing, etc...)

Northeast Parallel Architectures Center, Syracuse University, npac@npac.syr.edu

If you have any comments about this server, send e-mail to webmaster@npac.syr.edu.

Page produced by wwwfoil on Sun Apr 11 1999