Forwarded: Thu, 02 May 1996 20:08:41 -0400
Forwarded: leskiwd@npac.syr.edu
From: als@cs.UMD.EDU
Date: Mon, 29 Apr 1996 13:56:15 -0400 (EDT)
To: gcf@npac.syr.edu
CC: saltz@cs.UMD.EDU
Subject: HPCC and Java report

Here are the Maryland sections in HTML, including the headers from your version (mainly the old section numbers).

Alan
----------
2.4 An Example -- Remote Sensing and Data Fusion [UMD]
[original text:2.5umd) Remote sensing data fusion]
There are many applications that are likely to require the generation of data products derived from multiple sensor databases. Such databases could include both static databases (such as databases that archive LANDSAT or AVHRR images) and databases that are periodically updated (like weather map servers).

Two examples of remote sensing defense applications are battle management and distributed simulation. A battle management application could take advantage of multiple sensing devices, both airborne and on satellites. Data from multiple remote sensing devices would be integrated to look for various untoward events, such as a collection of tanks on the move or aircraft taking off from an enemy runway. A contemporary example would involve real-time tracking of rocket attacks in northern Israel. One distributed simulation application would be to train soldiers in a hybrid environment consisting of a combination of simulated and actual troops and equipment. For instance, a commander could command some number of real troops, tanks and planes, as well as many more simulated troops and pieces of equipment. Developing the technology to allow for such a hybrid training environment is already part of an important, well-funded ARPA program. At the last ARPA meeting, the claim was made that some form of hybrid battle training had been successfully tested in Europe.

The execution model for these applications is for a client to direct (High Performance) Java programs to run on the database servers. The Java programs would make use of the server's computational and data resources to carry out (at least) partial processing of data products. Portions of the data processing that require the integration of data from several databases would be carried out on the client or via a Java program running on another computationally capable machine. The main question for such a model is whether the programs executing on the servers are solely Java programs, or Java programs that utilize the services of programs written in other languages (e.g. High Performance Fortran).
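The execution model described above can be illustrated with a minimal single-process sketch. It is not taken from any existing system: the names `FusionSketch`, `DataServer`, `Partial`, `aboveThreshold` and `fusedMean` are hypothetical, the "servers" are plain in-memory objects, and shipping a program to a server is simulated by passing a function. The point is only that each server returns a small partial product and the client performs the integration step.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: a client sends small programs to data servers and fuses the partial results. */
class FusionSketch {

    /** Stand-in for a database server that holds sensor readings and runs code sent by a client. */
    static class DataServer {
        private final double[] readings;

        DataServer(double... readings) {
            this.readings = readings.clone();
        }

        /** Runs the client's program next to the data and returns only the partial result. */
        <R> R run(Function<double[], R> program) {
            return program.apply(readings.clone());
        }
    }

    /** Partial data product computed on a server: count and sum of readings above a threshold. */
    static class Partial {
        final int count;
        final double sum;

        Partial(int count, double sum) {
            this.count = count;
            this.sum = sum;
        }
    }

    /** The program the client directs each server to run (assumed, for illustration only). */
    static Function<double[], Partial> aboveThreshold(final double threshold) {
        return data -> {
            int count = 0;
            double sum = 0.0;
            for (double v : data) {
                if (v > threshold) {
                    count++;
                    sum += v;
                }
            }
            return new Partial(count, sum);
        };
    }

    /** Integration step carried out on the client: combine the partials from several servers. */
    static double fusedMean(List<DataServer> servers, double threshold) {
        int count = 0;
        double sum = 0.0;
        for (DataServer server : servers) {
            Partial p = server.run(aboveThreshold(threshold));
            count += p.count;
            sum += p.sum;
        }
        return count == 0 ? 0.0 : sum / count;
    }

    public static void main(String[] args) {
        List<DataServer> servers = Arrays.asList(
                new DataServer(1.0, 5.0, 9.0),  // e.g. a static image archive
                new DataServer(2.0, 7.0));      // e.g. a periodically updated server
        System.out.println(fusedMean(servers, 4.0));
    }
}
```

In a real deployment the function passed to `run` would be replaced by code shipped over the network, and the server-side program might call out to routines written in another language, which is exactly the open question raised above.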

2.5 An Example -- Medical Interventional Informatics [UMD]
[original text:2.6umd) Interventional Epidemiology/Medical crisis management]

The first set of applications applies to medical emergencies associated with armed combat, terrorist actions, inadvertent releases of toxic chemicals, or major industrial plant explosions. In such scenarios, the goal is to rapidly acquire and analyze medical data as the crisis unfolds in order to determine how best to allocate scarce medical resources. The applications involve the discovery and integration of multiple sources of data that might be in different geographical locations. A doctor could, for instance, determine the best diagnosis and treatment for patients exposed to combinations of chemicals whose effects have not been well characterized. This class of scenarios is motivated by a recent ARPA-sponsored National Research Council workshop on crisis management.

Another set of applications involves discovering and exploiting clinical patterns in patient populations. These scenarios involve the study of health records to determine risk factors associated with morbidity and mortality for specific subpopulations (e.g. various substrata of armed forces personnel serving in a particular region). In such scenarios, the goal is to provide high-quality, low-cost care by exploring diverse clinical data sources for patterns and correlations and by ensuring that significant data for individual patients is not overlooked. For instance, a system of this type could rapidly track the spread of serious infectious diseases such as hepatitis B or HIV. It could also be used to identify specific soldiers whose demographic profile or past medical history might place them at particularly high risk of contracting such a disease.

The systems software requirements for these medical scenarios should be very similar to the requirements motivated by the sensor data fusion scenario, namely employing a client program to direct Java programs to run on the servers for the relevant databases. The data mining aspects of these applications could also require the client to direct Java programs to run on computational servers, which may not be at the same site as the database servers.

3.4 Data Parallelism [UMD]
[original text:1.6tex) What should NOT be done]
[original text:1.6ind) Is Data Parallelism relevant for Java]

There are a variety of ways in which Java could be used to support data parallel computing. We assume that Java may be extended in one of a variety of ways. In the following, we also regard any closely coupled set of machines as a parallel architecture.

8.1. At Maryland -- Migrating Programs [UMD]
[original text:7.4.1umd) Mobile Programs]
Mobile Programs
The University of Maryland is pursuing a project designed to provide a single unified framework for remote access and remote execution in the form of a type of agent we call an itinerant program. Itinerant programs can execute on and move between the user's machine, the servers providing the datasets to be processed or third-party compute servers available on the network. The motion is not just client-server; it can also be between servers. Programs move either because the current platform does not have adequate resources or because the cost of computation on the current platform or at the current location in the network is too high. Various parts of a single program can be executed at different locations, depending upon cost and availability of resources as well as user direction.
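The migration rule in the paragraph above (move when the current platform lacks resources or when computing there costs too much) can be sketched as a small decision function. This is an illustrative sketch only, not the Maryland prototype: `MigrationSketch`, `Host`, `chooseHost`, the memory-based resource check and the additive cost model are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the migration decision for an itinerant program. */
class MigrationSketch {

    /** A place the program could run: user's machine, data server, or third-party compute server. */
    static class Host {
        final String name;
        final long freeMemoryMb;      // resources currently available (assumed metric)
        final double computeCost;     // assumed cost of computing on this host
        final double dataAccessCost;  // assumed cost of reaching the dataset from this host

        Host(String name, long freeMemoryMb, double computeCost, double dataAccessCost) {
            this.name = name;
            this.freeMemoryMb = freeMemoryMb;
            this.computeCost = computeCost;
            this.dataAccessCost = dataAccessCost;
        }

        double totalCost() {
            return computeCost + dataAccessCost;
        }
    }

    /** Returns the host to run on next; returns the current host when no move is justified. */
    static Host chooseHost(Host current, List<Host> candidates, long neededMemoryMb, double maxCost) {
        boolean adequate = current.freeMemoryMb >= neededMemoryMb;
        boolean affordable = current.totalCost() <= maxCost;
        if (adequate && affordable) {
            return current;  // neither trigger for migration applies
        }
        Host best = null;
        for (Host h : candidates) {
            if (h.freeMemoryMb < neededMemoryMb) {
                continue;  // candidate cannot hold the computation
            }
            if (best == null || h.totalCost() < best.totalCost()) {
                best = h;
            }
        }
        if (best == null) {
            return current;  // nowhere better to go
        }
        if (adequate && best.totalCost() >= current.totalCost()) {
            return current;  // current host is too costly, but no candidate is cheaper
        }
        return best;
    }

    public static void main(String[] args) {
        Host user = new Host("userMachine", 256, 1.0, 8.0);
        Host data = new Host("dataServer", 1024, 2.0, 0.5);
        Host compute = new Host("computeServer", 4096, 0.5, 4.0);
        Host next = chooseHost(user, Arrays.asList(data, compute), 512, 5.0);
        System.out.println(next.name);
    }
}
```

User direction, which the text also lists as a factor, could be modeled by letting the caller restrict the candidate list.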

The architecture also provides support for plaza servers. Plaza servers provide facilities for:

  1. execution of itinerant programs,
  2. storage of intermediate data,
  3. monitoring cost, usage and availability of local and remote resources on behalf of itinerant programs and alerting them if these quantities cross user-specified bounds,
  4. routing messages to and from itinerant programs, and
  5. value-added access to other servers available on the network.

Examples of value-added functionality include conversions to and from standard formats and operations on data in its native format. In addition to allowing cost and performance optimization, itinerant programs and plaza servers also provide a flexible framework for extending the interface of servers, in particular legacy servers with large volumes of data as well as servers that are not under the control of the user.

The University of Maryland has a prototype system that supports plaza servers and itinerant programs. Program migration is currently carried out either in response to an external signal or when a potentially itinerant program invokes the necessary runtime support. We are currently in the process of adapting resource monitoring software so that we can demonstrate the ability to carry out program migration in response to fluctuations in processor and network load.
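The monitoring facility listed for plaza servers (alerting an itinerant program when a quantity crosses user-specified bounds) might look roughly like the following. This is a sketch under stated assumptions, not the prototype's interface: `ResourceMonitorSketch`, `AlertListener`, `watch` and `report` are hypothetical names, each quantity is assumed to start inside its bounds, and an alert is raised only at the moment a value leaves the range, not on every out-of-range sample.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of bound monitoring performed on behalf of itinerant programs. */
class ResourceMonitorSketch {

    /** Callback through which an itinerant program is alerted. */
    interface AlertListener {
        void onBoundCrossed(String resource, double value);
    }

    /** One user-specified bound on a named quantity (cost, usage, availability, load...). */
    private static class Watch {
        final String resource;
        final double low;
        final double high;
        final AlertListener listener;
        boolean inside = true;  // assumption: the quantity starts within bounds

        Watch(String resource, double low, double high, AlertListener listener) {
            this.resource = resource;
            this.low = low;
            this.high = high;
            this.listener = listener;
        }
    }

    private final List<Watch> watches = new ArrayList<>();

    /** Registers interest in a quantity together with its acceptable range. */
    void watch(String resource, double low, double high, AlertListener listener) {
        watches.add(new Watch(resource, low, high, listener));
    }

    /** Called whenever a new measurement arrives; alerts only when a value leaves its range. */
    void report(String resource, double value) {
        for (Watch w : watches) {
            if (!w.resource.equals(resource)) {
                continue;
            }
            boolean nowInside = value >= w.low && value <= w.high;
            if (w.inside && !nowInside) {
                w.listener.onBoundCrossed(resource, value);
            }
            w.inside = nowInside;
        }
    }

    public static void main(String[] args) {
        ResourceMonitorSketch monitor = new ResourceMonitorSketch();
        monitor.watch("cpuLoad", 0.0, 0.8,
                (resource, value) -> System.out.println("alert: " + resource + " = " + value));
        for (double load : new double[] {0.5, 0.9, 0.95, 0.4, 0.85}) {
            monitor.report("cpuLoad", load);
        }
    }
}
```

An alert of this kind is one plausible trigger for migrating a program in response to fluctuations in processor or network load.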

Coupling Data Parallel Programs
Maryland has developed prototype software to demonstrate methods designed to flexibly and efficiently couple multiple data parallel and sequential programs at runtime. The program coupling software is used to allow the coordinated use of multiple separately developed parallel programs in solving complex problems. The program coupling software will also be used to couple itinerant programs.

Maryland has developed a layer of runtime support (called Meta-Chaos) to establish mappings and to carry out communication between data structures in different sequential or data parallel programs. Efficient data movement is achieved by pre-computing an optimized communication schedule. Our approach is to define a set of functions that each runtime library should export and to build Meta-Chaos on top of this minimal API.
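The idea of pre-computing a communication schedule can be illustrated with a small single-process sketch. It is not the Meta-Chaos API and makes several simplifying assumptions: the names `ScheduleSketch`, `Distribution`, `block`, `cyclic`, `buildSchedule` and `execute` are hypothetical, the mapping sends global element g of the source to global element g of the destination, and per-processor memories are simulated by rows of a two-dimensional array. The two methods of `Distribution` stand in for the kind of minimal function set a runtime library would export.

```java
import java.util.Arrays;

/** Hypothetical sketch of a precomputed communication schedule between two distributions. */
class ScheduleSketch {

    /** What each runtime library is assumed to export: where a global element lives. */
    interface Distribution {
        int owner(int globalIndex);   // processor that holds the element
        int offset(int globalIndex);  // position within that processor's local array
    }

    /** Block distribution of n elements over the given number of processors. */
    static Distribution block(final int n, final int procs) {
        final int chunk = (n + procs - 1) / procs;
        return new Distribution() {
            public int owner(int g) { return g / chunk; }
            public int offset(int g) { return g % chunk; }
        };
    }

    /** Cyclic distribution over the given number of processors. */
    static Distribution cyclic(final int procs) {
        return new Distribution() {
            public int owner(int g) { return g % procs; }
            public int offset(int g) { return g / procs; }
        };
    }

    /** Computed once: one entry {srcOwner, srcOffset, dstOwner, dstOffset} per element. */
    static int[][] buildSchedule(int n, Distribution src, Distribution dst) {
        int[][] schedule = new int[n][];
        for (int g = 0; g < n; g++) {
            schedule[g] = new int[] {src.owner(g), src.offset(g), dst.owner(g), dst.offset(g)};
        }
        return schedule;
    }

    /** Replayed on every transfer; no ownership or offset arithmetic is repeated here. */
    static void execute(int[][] schedule, double[][] srcLocal, double[][] dstLocal) {
        for (int[] e : schedule) {
            dstLocal[e[2]][e[3]] = srcLocal[e[0]][e[1]];
        }
    }

    public static void main(String[] args) {
        double[][] src = {{0, 1, 2}, {3, 4, 5}};  // 6 elements, block over 2 processors
        double[][] dst = new double[3][2];         // same 6 elements, cyclic over 3 processors
        int[][] schedule = buildSchedule(6, block(6, 2), cyclic(3));
        execute(schedule, src, dst);
        System.out.println(Arrays.deepToString(dst));
    }
}
```

A production schedule would additionally group entries by source and destination processor so that each pair exchanges a single message; the sketch omits that step.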

At the level above Meta-Chaos, interaction between programs occurs through a user-specified consistency model. Mappings are established at runtime and can be added and deleted while the programs being coupled are in execution. Neither the mappings nor the identity of the processors involved has to be known at compile time or even link time. A priori knowledge of consistency requirements by a runtime library allows buffering of data as well as concurrent execution of the coupled applications.

Periodic Data and Computation Servers
The integration of program coupling software into itinerant programs will make it straightforward to develop long-running itinerant programs that process periodically generated data. We anticipate that a collection of itinerant programs will process sequences of data from multiple sources where each source may have a different periodicity. An itinerant program can either visit each of the data sources in order or it can install a surrogate at each data source, which is activated every time the dataset is updated. A surrogate acts as a filter for the sequence of data generated by the data source and communicates appropriate updates to the parent itinerant program (possibly after preliminary processing). For complex processing on a number of datasets, a hierarchy of surrogates and result-combining programs can be created. The result of such itinerant programs can either be a single accumulated result or a sequence of results.
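The surrogate arrangement can be sketched as follows. Everything here is illustrative rather than drawn from the Maryland software: `SurrogateSketch`, `Parent`, `Surrogate` and `DataSource` are hypothetical names, the preliminary processing is assumed to be "take the maximum of the new dataset", and the filter is assumed to be a simple threshold. The parent keeps both a sequence of results and a single accumulated result, matching the two outcomes mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of surrogates filtering periodic data for a parent itinerant program. */
class SurrogateSketch {

    /** Parent itinerant program: collects the updates forwarded by its surrogates. */
    static class Parent {
        final List<String> updates = new ArrayList<>();  // sequence of results
        double total = 0.0;                              // single accumulated result

        void receive(String source, double value) {
            updates.add(source + "=" + value);
            total += value;
        }
    }

    /** Surrogate installed at a data source; activated on each dataset update. */
    static class Surrogate {
        private final String source;
        private final double threshold;
        private final Parent parent;

        Surrogate(String source, double threshold, Parent parent) {
            this.source = source;
            this.threshold = threshold;
            this.parent = parent;
        }

        void onUpdate(double[] dataset) {
            // Preliminary processing at the data source (assumed: reduce to the maximum).
            double max = Double.NEGATIVE_INFINITY;
            for (double v : dataset) {
                max = Math.max(max, v);
            }
            // Filter: only updates of interest are communicated to the parent.
            if (max > threshold) {
                parent.receive(source, max);
            }
        }
    }

    /** Data source that activates its installed surrogates whenever the dataset is updated. */
    static class DataSource {
        private final List<Surrogate> surrogates = new ArrayList<>();

        void install(Surrogate s) {
            surrogates.add(s);
        }

        void publish(double... dataset) {
            for (Surrogate s : surrogates) {
                s.onUpdate(dataset);
            }
        }
    }

    public static void main(String[] args) {
        Parent parent = new Parent();
        DataSource weather = new DataSource();
        DataSource imagery = new DataSource();
        weather.install(new Surrogate("weather", 10.0, parent));
        imagery.install(new Surrogate("imagery", 5.0, parent));

        weather.publish(3.0, 12.0);  // forwarded
        weather.publish(1.0, 2.0);   // filtered out
        imagery.publish(6.0);        // forwarded; this source updates less often
        System.out.println(parent.updates + " total=" + parent.total);
    }
}
```

A hierarchy of surrogates and result-combining programs could be modeled by letting an intermediate object play the `Parent` role for several surrogates and forward its own combined result upward.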