Meeting Notes: 2nd Datorr Workshop

Sandia National Laboratories, Albuquerque, NM, February 15-16, 1999

http://www-fp.mcs.anl.gov/~gregor/datorr/

Please report corrections to gregor@mcs.anl.gov


Introduction

On February 15th and 16th, 1999, the 2nd Datorr meeting took place at Sandia National Laboratories. Judy Beringer of Sandia National Laboratories and Gregor von Laszewski of Argonne National Laboratory organized the meeting. The technical preparation for the meeting was steered by Geoffrey C. Fox, Dennis Gannon, Piyush Mehrotra, and Gregor von Laszewski. The meeting had over 25 participants, each of whom contributed considerably during the workshop. The presence of Legion, Ninja, UNICORE, NetSolve, HotPage, and other projects was very valuable because it brought potentially different views into the discussion.

The meeting consisted of two parts: presentations and discussions. The short presentations (30 minutes each) served as a basis for some of the discussions in the two working groups. A detailed agenda of the meeting can be found in the Call For Participation: http://www-fp.mcs.anl.gov/~gregor/datorr/sandia/datorr_cfp.html.

The meeting began with a presentation by Geoffrey C. Fox summarizing the previous meetings and activities (compare the working notes of the 1st Datorr meeting). About 47 relevant projects have now been identified, 32 of which are on the Web; at SC98, 18 projects had been identified. During an initial discussion, concerns were voiced about the expense and effort associated with adhering to a standard interface.

After the participants introduced themselves, the meeting continued with the presentations. The following Datorr-related presentations were given:

An additional presentation, outside the main Datorr objectives, was given by John Mitchiner of Sandia, the host institution.

Working Groups

The working group topics were chosen so that they were (a) not too controversial, allowing us to achieve results by the end of the meeting, and (b) solid enough to build a foundation for future Datorr activities.

The following topics for working groups were suggested at the beginning of the meeting:

A short discussion showed that progress on the last two questions requires first defining tasks and remote resources. Two working groups were therefore formed to concentrate on these topics. The following sections summarize the discussions of the two working groups, referred to from now on as the Remote Resources working group and the Task working group.

Working Group: Remote Resources   

Geoffrey Fox led this working group in place of Dennis Gannon.

Taxonomy Overview

The first task of this working group was to identify sources of remote resource taxonomies. The following list identifies projects that use, in some form or another, a "model" of remote computing resources:

This list is certainly incomplete, but it provides a valuable starting point for comparing the taxonomies/models and deriving a common subset. During the workshop we identified the following follow-up tasks:

[Task: analyze the above projects and present their taxonomies]

[Task: identify more projects and add them]

Compute Resources

A quick survey identified the following resources, which the resulting Datorr definition of compute resources should cover. This list has to be revisited in discussions following the Datorr meeting:

It was generally agreed that a service implies a resource and vice versa. Certain resources are bound to a particular user context. For example, we discussed a "bank account balance," which can be viewed as a resource from the user's perspective but which also implies a service through which the resource is accessed.

Printers were included in this list primarily as a reminder to examine the Jini printer definition. Visualization devices and services were judged very difficult to capture, and they were not discussed further during the meeting. We also concluded that it will be impossible to describe all available resources.

To focus the effort, some resources should therefore be set aside immediately, and only those relevant to metacomputing and problem-solving environments should be considered. Where a standard is already defined, it would be advantageous to reuse it and translate its specification into a form usable by Datorr.

Because the meeting did not allow enough time to address each of the items listed above in detail, we decided to postpone these discussions.

[Task: complete the list and analyze details of each]

Choices and Issues

The working group next discussed initial issues and choices that a future architecture design should encompass. As noted earlier, participants emphasized the need for an extensible framework for defining remote resources. Since the framework will be used to define resource management tools, it must account for the temporal and spatial components of resource management, that is, where and when resources are available to execute a particular job. The following keywords may encourage further discussion of selected topics:

Specification and Publication

At the Datorr meeting it was decided to formulate the standard objects in XML. It was pointed out that a given resource will have multiple XML descriptions, depending on the type of query issued (e.g., some queries need only a high-level description of an MPP; others need detailed hierarchical definitions). Queries can be specified with query languages, which are currently being defined by the XML standards committee.

We agreed that it would be useful to try to represent existing architectures as XML specifications. The suggested architectures are Tera, T90, old SP, new SP, PC clusters, and workstations (WSs). We identified NPACI/UCB as possible implementers of these examples. One question left unanswered was whether the representation of a compute resource must correctly represent the memory hierarchy. To avoid getting lost in the complexity of this task, it is important to identify the successes and failures of previous systems and projects.
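To illustrate how one resource could carry several XML descriptions, the following Python sketch builds a high-level view and a detailed hierarchical view of the same machine with the standard `xml.etree.ElementTree` module. The element and attribute names (`computeResource`, `node`, `memory`, `cache`) and the machine figures are hypothetical placeholders, not a Datorr schema; the detailed view nests a memory hierarchy only to show where that open question would surface.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory description of one machine; the field names and
# numbers are illustrative placeholders, not an agreed Datorr schema.
MACHINE = {
    "name": "example-sp",
    "architecture": "IBM SP",
    "nodes": [
        {"id": "n0", "processors": 4, "memory_mb": 2048, "cache_kb": 512},
        {"id": "n1", "processors": 4, "memory_mb": 2048, "cache_kb": 512},
    ],
}


def high_level_view(machine):
    """Summary view: a single element carrying aggregate attributes only."""
    total_procs = sum(n["processors"] for n in machine["nodes"])
    return ET.Element(
        "computeResource",
        name=machine["name"],
        architecture=machine["architecture"],
        processors=str(total_procs),
    )


def detailed_view(machine):
    """Hierarchical view: resource -> nodes -> memory -> cache."""
    root = ET.Element("computeResource", name=machine["name"],
                      architecture=machine["architecture"])
    for n in machine["nodes"]:
        node = ET.SubElement(root, "node", id=n["id"],
                             processors=str(n["processors"]))
        mem = ET.SubElement(node, "memory", sizeMB=str(n["memory_mb"]))
        ET.SubElement(mem, "cache", sizeKB=str(n["cache_kb"]))
    return root


if __name__ == "__main__":
    # A coarse query would be answered with the first view,
    # a detailed query with the second.
    print(ET.tostring(high_level_view(MACHINE), encoding="unicode"))
    print(ET.tostring(detailed_view(MACHINE), encoding="unicode"))
```

Both views are generated from one underlying record, so they cannot drift apart; which view is returned would depend on the query.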

One of the remaining tasks is to determine precisely which subset should be represented. We suggested identifying all objects needed to submit to an abstract compute resource that is a batch queue managing N supercomputers, an interactive system, or both.
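The notion of an abstract compute resource can be sketched as a single submission interface with two hypothetical implementations: a batch queue fronting N machines and an interactive host. The class and method names are assumptions for illustration, and round-robin dispatch stands in for whatever scheduling a real batch system would apply.

```python
from abc import ABC, abstractmethod


class ComputeResource(ABC):
    """Hypothetical abstract target for task submission."""

    @abstractmethod
    def submit(self, command: str) -> str:
        """Accept a command and return a submission identifier."""


class BatchQueue(ComputeResource):
    """A batch queue fronting N machines; dispatch is round-robin here."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.queued = []  # (job_id, machine, command) awaiting the scheduler

    def submit(self, command):
        n = len(self.queued)
        machine = self.machines[n % len(self.machines)]
        job_id = f"batch-{n + 1}@{machine}"
        self.queued.append((job_id, machine, command))
        return job_id


class InteractiveResource(ComputeResource):
    """A single host that would start each submission immediately."""

    def __init__(self, host):
        self.host = host
        self.started = []

    def submit(self, command):
        job_id = f"interactive-{len(self.started) + 1}@{self.host}"
        self.started.append((job_id, command))
        return job_id


def submit_everywhere(resources, command):
    """Client code depends only on the abstract interface."""
    return [r.submit(command) for r in resources]
```

Because the client sees only `submit`, the "and/or" case is simply a list that mixes both kinds of resource.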

Job scheduling should be guided by additional performance data, based on the TOP500 list, that we integrate into www.datorr.org.

The Legion, Condor, and Globus representatives pointed out that it would be necessary to verify whether integration into the current metacomputing systems is possible. We assume the same holds for the other projects. However, this was viewed as a difficult task and was postponed.

In general, object specifications should be published through www.datorr.org so that other projects can access them. The scope of www.datorr.org is "computing" as a commodity market. To make this approach viable, we need to prepare a statement explaining how www.datorr.org differs from mds.globus.org.

Process Guideline Summary

The following items summarize the actions required to develop a prototype implementation.

To make concrete progress in defining such a prototype, we decided to specify a given collection of computers, drawn from the following types: IBM SP, NOW/CPLANT, Tera, Sun E10000, Origin 2000, and T3E. The collection must include multiprocessor nodes (including Digital SMPs), and the linkage between nodes has to be represented. Appropriate compute resources are selected by querying the LINPACK performance database. We also determined that the prototype must build on the XML base infrastructure and support extensibility, multiple views, and a hierarchy of objects. A registration service is needed to add resources to www.datorr.org and to provide information to a lookup service.
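The registration and lookup roles described above can be sketched as an in-memory registry that selects resources by a LINPACK-style performance figure. Everything here is an assumption for illustration: the record fields, the `Registry` class, and its `register`/`lookup` methods are hypothetical, and a real service would be networked and backed by the XML descriptions rather than Python objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResourceRecord:
    """Hypothetical registry entry; the fields are illustrative only."""
    name: str
    machine_type: str          # e.g. "IBM SP", "T3E"
    nodes: int
    processors_per_node: int   # > 1 for multiprocessor (SMP) nodes
    interconnect: str          # how the nodes are linked
    linpack_gflops: float      # figure looked up in a LINPACK-style table


@dataclass
class Registry:
    """Minimal in-memory registration and lookup service."""
    records: Dict[str, ResourceRecord] = field(default_factory=dict)

    def register(self, record: ResourceRecord) -> None:
        # Registering the same name again replaces the earlier entry.
        self.records[record.name] = record

    def lookup(self, min_gflops: float = 0.0,
               machine_type: Optional[str] = None) -> List[ResourceRecord]:
        """Return matching resources, highest LINPACK figure first."""
        hits = [
            r for r in self.records.values()
            if r.linpack_gflops >= min_gflops
            and (machine_type is None or r.machine_type == machine_type)
        ]
        return sorted(hits, key=lambda r: r.linpack_gflops, reverse=True)
```

Sorting by the performance figure is only one plausible selection policy; the notes do not specify how the LINPACK data would be weighed.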

We estimated that defining a prototype would take at least two people, and that as many participants as possible should take part in testing. Two researchers will be needed to clarify the general principles, which include building the XML base infrastructure and supporting extensibility, multiple views, and a hierarchy of objects (UCB was identified as a potential candidate).

An immediate action item is to write a "letter" to Condor, Globus, Legion, UNICORE, Ninf, Ninja, and NetSolve asking what these projects have learned, for example:

ANL can host www.datorr.org. It was suggested that the design principles, including searching and the scaling of services, be worked out by UCB and other parties yet to be determined.

NPAC and ANL are preparing a precise Datorr project description.

 

Deliverables

We proposed the following list of deliverables:

Early success possibilities are as follows:

What Prototypes Leave Out

Naturally, this prototype leaves out many issues that have to be addressed later. These issues are as follows:

Working Group: Tasks

Piyush Mehrotra led this working group.

The second working group was asked to work on a definition for task objects. Task objects are used in various forms by the different projects participating in the Datorr workshops. The following section provides a summary of the working group discussions.

A task is a static representation of "what" is to be done. A job is an instantiation of a task for execution. A task has a core set of attributes, is extensible, and can be defined recursively.
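The task/job distinction can be made concrete with two small data classes: a `Task` that is static, extensible, and recursive, and a `Job` that binds one task to a resource for execution. The field names (`executable`, `arguments`, `extensions`, `subtasks`, `state`) are hypothetical; the core attribute set was still to be agreed on by the community.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    """Static description of *what* is to be done (hypothetical fields)."""
    name: str
    executable: str = ""
    arguments: List[str] = field(default_factory=list)
    # Extensibility: projects may attach attributes beyond the core set.
    extensions: Dict[str, str] = field(default_factory=dict)
    # Recursive definition: a task may be composed of subtasks.
    subtasks: List["Task"] = field(default_factory=list)

    def leaves(self):
        """Yield the basic (non-composite) tasks contained in this task."""
        if not self.subtasks:
            yield self
        for sub in self.subtasks:
            yield from sub.leaves()


_job_ids = itertools.count(1)


@dataclass
class Job:
    """One instantiation of a Task for execution on a particular resource."""
    task: Task
    resource: str
    job_id: int = field(default_factory=lambda: next(_job_ids))
    state: str = "created"
```

The same `Task` object can be instantiated as many jobs, each with its own identifier and state, while the task itself never changes.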

Based on the discussions, a simple task or basic task object contains the following:

A task object contains the following:

These preliminary definitions require verification by the community. To clarify them, existing projects have to analyze their current concept of a task further. The prototype implementation focuses on a simple task that starts a remote process. The community should comment on the resulting prototype, which is formulated in XML, and the cycle of refining it and exposing it to the community should be repeated. For a demonstration, we suggest building a simple task matching service that executes a combination of tasks on a dynamically selected set of resources.
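A simple task matching service could start as a first-fit search: each task states numeric requirements, and the first resource that meets all of them is selected. The requirement keys (for example `processors`, `memory_mb`) and the dictionary layout are hypothetical, and first-fit is a deliberately naive policy chosen for brevity, not a technique proposed at the meeting.

```python
from typing import Dict, List, Optional


def match(requirements: Dict[str, int], resources: List[dict]) -> Optional[str]:
    """First-fit: name of the first resource meeting every numeric requirement."""
    for res in resources:
        # A missing attribute counts as 0, so it fails any positive requirement.
        if all(res.get(key, 0) >= need for key, need in requirements.items()):
            return res["name"]
    return None


def match_all(tasks: Dict[str, Dict[str, int]],
              resources: List[dict]) -> Dict[str, Optional[str]]:
    """Match each task in a combination against the resources known right now.

    Passing the current resource list on every call is what makes the
    selection dynamic: a different list can yield a different assignment.
    """
    return {name: match(req, resources) for name, req in tasks.items()}
```

A task with no requirements matches the first listed resource, and a task that nothing can satisfy maps to `None`, which a caller would report rather than execute.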

Action Items

Many projects have their own definition of a "task." We must focus on the simplest task definition, one that starts a remote process. The collection of attributes for such a remote process should be sent to Piyush Mehrotra (pm@icase.edu). The XML definition will be drafted by T. Tannenbaum, T. Haupt, and P. Mehrotra and is expected to be delivered at the end of March. After its initial exposure, comments from users and system designers must be considered in order to start the iterative process of refining the definitions. The reference implementation should provide a simple way to find a matching resource for a simple task, as well as execute the task on that resource.

Once this has been accomplished, follow-up work will build a simple GUI for formulating tasks and task dependencies. This includes building a task description that is handed to an executive service, finding matching resources, and starting and monitoring the task.
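Task dependencies of the kind such a GUI would capture can be honored by starting tasks in topological order. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9 or later); the `start` callback is a hypothetical stand-in for handing a task to an executive service and monitoring it until it finishes, and tasks are run one at a time for simplicity.

```python
from graphlib import TopologicalSorter


def run_with_dependencies(dependencies, start):
    """Start tasks in an order that respects their dependencies.

    dependencies maps each task name to the set of task names it depends on.
    start(name) is a caller-supplied (hypothetical) hook that hands the task
    to an executive service and returns its final status string.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    log = []
    while sorter.is_active():
        # get_ready() returns every task whose prerequisites are all done;
        # this sketch runs them sequentially rather than in parallel.
        for name in sorter.get_ready():
            status = start(name)  # monitoring reduced to a final status
            log.append((name, status))
            sorter.done(name)
    return log
```

Detecting cycles in `prepare()` means an inconsistent dependency graph is rejected before any task is started.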

The last two points relate to a simple demo planned for SC99: a Datorr interface used with an existing system (Legion was suggested as one candidate).

Status

Current Datorr activities have been postponed because of considerable overlap with other projects. Although this overlap is a good sign of the usefulness of the Datorr activities, we first have to identify how Datorr contributes to these projects or how it differs from them. The projects considered for this discussion are the ASC Gateway Project, the DO2000 schema definition project, and the Gridforum.

Acknowledgment

I would like to thank Peter Lane for proofreading and improving the document.