ATD: Abstract Task Descriptor

Version 1.01, January 2000. Author: T. Haupt, Syracuse University

Introduction

A computational task requested by the user may involve many steps. Some steps can be performed concurrently, but typically there are data dependencies that force execution of the steps in some particular order. In many cases, it might be convenient to divide a particular step into smaller, "atomic" operations.

The task descriptor is abstract in the sense that it may not describe all resources needed to complete the task. The final resolution and the actual resource allocation are left to the discretion of the middle-tier services.

Atomic task

An atomic task is described by its name, its descriptor (AD), its input ports, and its output ports.

The name of the task must be unique within the document. The AD is a separate XML document that provides all the information needed to install and run the application, including its input and output files. The task is submitted by invoking a method of the middle-tier proxy module. Each method that can be used for submitting the task is called an input port, and each task must define at least one input port. Upon completing the task (successfully or not), the middle-tier proxy fires an event. Each event type that signals the end of task processing is called an output port, and each task must define at least one output port.

Example of an atomic task:
<Task>
     <TaskName name="task1" descriptor="task1.xml" />
     <InputPort method="run" />
     <OutputPort event="done" />
</Task>
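
Any tool that consumes such a descriptor first has to read the task name, the descriptor reference, and the declared ports. The following is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree parser. The function name read_atomic_task and the layout of the returned dictionary are illustrative only and are not part of ATD.

```python
import xml.etree.ElementTree as ET

ATOMIC_TASK = """
<Task>
    <TaskName name="task1" descriptor="task1.xml" />
    <InputPort method="run" />
    <OutputPort event="done" />
</Task>
"""

def read_atomic_task(xml_text):
    """Return the name, descriptor, input methods and output events of an atomic task."""
    task = ET.fromstring(xml_text)
    name_el = task.find("TaskName")
    if name_el is None:
        raise ValueError("Task has no TaskName element")
    # Input ports name proxy methods; output ports name event types.
    inputs = [p.get("method") for p in task.findall("InputPort")]
    outputs = [p.get("event") for p in task.findall("OutputPort")]
    if not inputs or not outputs:
        raise ValueError("a task needs at least one input port and one output port")
    return {
        "name": name_el.get("name"),
        "descriptor": name_el.get("descriptor"),
        "input_methods": inputs,
        "output_events": outputs,
    }

info = read_atomic_task(ATOMIC_TASK)
print(info)
```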

Building complex tasks from atomic tasks

Atomic tasks can be grouped together to form a complex task. Optionally, their dependencies can be defined by creating a computational graph. The graph is constructed by connecting output ports of atomic tasks to input ports of other atomic tasks. This means that an event fired by one task (output port) causes invocation of a method of the other task (input port).

As in the case of an atomic task, a complex task must define at least one input port and at least one output port. However, instead of specifying events and methods, the ports of the complex task are defined by referring to the input and output ports of its constituent tasks.

Example of a complex task built from atomic tasks:

<Task>
    <TaskName name="ComplexTask" />
    <Task>
         <TaskName name="atomic_task1" descriptor="task1.xml" />
         <InputPort method="run" />
         <OutputPort event ="done" />
    </Task>

    <Task>
         <TaskName name="atomic_task2" descriptor="task2.xml" />
         <InputPort method="run" />
         <OutputPort event="done" />
    </Task>

    <connection>
        <output task="atomic_task1" />
        <input task="atomic_task2" />
    </connection>
    <InputPort task="atomic_task1" />
    <OutputPort task="atomic_task2" />
</Task>

In this example, event "done" fired by atomic_task1 results in invoking method "run" of atomic_task2. The complex task can be submitted by invoking the input port of atomic_task1 (because of <InputPort task="atomic_task1" />), and event "done" of atomic_task2 signals completion of the complex task.
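
The computational graph implied by the <connection> elements can be recovered mechanically. The sketch below, in Python with the standard-library ElementTree parser, lists the edges of the graph together with the entry and exit tasks of the complex task. It assumes that the task attributes of <output> and <input> refer to constituent tasks by their TaskName. The helper name graph_edges is hypothetical.

```python
import xml.etree.ElementTree as ET

COMPLEX_TASK = """
<Task>
    <TaskName name="ComplexTask" />
    <Task>
        <TaskName name="atomic_task1" descriptor="task1.xml" />
        <InputPort method="run" />
        <OutputPort event="done" />
    </Task>
    <Task>
        <TaskName name="atomic_task2" descriptor="task2.xml" />
        <InputPort method="run" />
        <OutputPort event="done" />
    </Task>
    <connection>
        <output task="atomic_task1" />
        <input task="atomic_task2" />
    </connection>
    <InputPort task="atomic_task1" />
    <OutputPort task="atomic_task2" />
</Task>
"""

def graph_edges(task_el):
    """List (source task, target task) pairs from the direct <connection> children."""
    edges = []
    for conn in task_el.findall("connection"):
        sources = [o.get("task") for o in conn.findall("output")]
        targets = [i.get("task") for i in conn.findall("input")]
        edges.extend((s, t) for s in sources for t in targets)
    return edges

root = ET.fromstring(COMPLEX_TASK)
edges = graph_edges(root)
# findall() with a plain tag name only looks at direct children, so these
# are the ports of the complex task itself, not those of its constituents.
entry_tasks = [p.get("task") for p in root.findall("InputPort")]
exit_tasks = [p.get("task") for p in root.findall("OutputPort")]
print(edges, entry_tasks, exit_tasks)
```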

Hierarchy of tasks

Complex tasks can in turn be grouped and connected to build an arbitrarily deep hierarchy of tasks, for example:

<Task>
    <TaskName name="example_task" />

    <Task>
        <TaskName name="A" />
        <Task>
            <TaskName name="A1" descriptor="A1.xml" />
            <InputPort method="run" />
            <OutputPort event="done" />
        </Task>
        <Task>
            <TaskName name="A2" descriptor="A2.xml" />
            <InputPort method="run" />
            <OutputPort event="done" />
        </Task>
        <connection>
            <output task="A1" />
            <input task="A2" />
        </connection>
        <OutputPort application="A" event="done" />
    </Task>

    <Task>
        <TaskName name="B" descriptor="B.xml" />
        <InputPort method="run" />
        <OutputPort application="B" event="done" />
    </Task>

    <Task>
        <TaskName name="C" descriptor="C.xml" />
        <InputPort application="C" method="run" />
        <OutputPort application="C" event="done" />
    </Task>

    <Task>
        <TaskName name="D" />
        <Task>
            <TaskName name="D1" descriptor="D1.xml" />
            <InputPort method="run" />
            <OutputPort event="done" />
        </Task>
        <Task>
            <TaskName name="D2" descriptor="D2.xml" />
            <InputPort method="run" />
            <OutputPort event="done" />
        </Task>
        <connection>
            <output task="D1" />
            <input task="D2" />
        </connection>
        <InputPort task="D1" />
        <OutputPort task="D2" />
    </Task>

    <connection>
        <output task="A" />
        <output task="B" />
        <input task="C" />
    </connection>

    <connection>
        <output task="C" />
        <input task="D" />
    </connection>

</Task>
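
Because a Task element may contain further Task elements, a consumer of the descriptor naturally processes it recursively. The following Python sketch walks a trimmed fragment of such a hierarchy (ports and connections are omitted for brevity) and reports the path of every task and the depth of the nesting. The function names task_paths and depth are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

HIERARCHY = """
<Task>
    <TaskName name="example_task" />
    <Task>
        <TaskName name="A" />
        <Task><TaskName name="A1" descriptor="A1.xml" /></Task>
        <Task><TaskName name="A2" descriptor="A2.xml" /></Task>
    </Task>
    <Task><TaskName name="B" descriptor="B.xml" /></Task>
</Task>
"""

def task_paths(task_el, prefix=""):
    """Return slash-separated paths of this task and all nested tasks, depth first."""
    name = task_el.find("TaskName").get("name")
    path = f"{prefix}/{name}"
    paths = [path]
    for child in task_el.findall("Task"):   # direct sub-tasks only
        paths.extend(task_paths(child, path))
    return paths

def depth(task_el):
    """Number of nesting levels; an atomic task has depth 1."""
    children = task_el.findall("Task")
    return 1 + max((depth(c) for c in children), default=0)

root = ET.fromstring(HIERARCHY)
paths = task_paths(root)
levels = depth(root)
print(paths, levels)
```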

More on connecting tasks

The examples of complex tasks above follow a simple dataflow paradigm. The model presented here is, however, more general. A proxy module representing an atomic task can define more than one input port and more than one output port. For example, the module can fire two types of events: one signaling successful completion of the task, the other signaling failure. Hence, a different action (a different task, or a different method of the same task) can be defined depending on the outcome of processing the task.

Note that submitting a task is more than submitting a job. It may involve selecting a host, file transfers, database access, mass storage access, compilation, setting environment variables, generating batch scripts, generating Globus RSL strings, and more. Different methods of the proxy module may implement different procedures for preparing a job for submission and/or for postprocessing; none of these requires any modification of the code to be run at the back end.

Connecting modules by matching events and methods also allows loops to be constructed: completion of one task results in submission of another until some stopping criterion is satisfied (say, all input files are processed). If the back-end code is capable of setting flags at runtime, the flags can be used to generate custom events, which in turn can be used for asynchronous communication, or even message passing, between concurrent tasks (latency permitting).
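
The loop construction can be illustrated with a toy event dispatcher. This is only a sketch under stated assumptions, not the middle-tier implementation: proxy methods are modeled as plain Python functions that return the name of the event they fire, and the task names ("process", "report"), the event name "more", and the file names are all hypothetical.

```python
from collections import deque

def run(wiring, handlers, start):
    """Invoke `start`, then follow fired events through `wiring` until none match."""
    queue = deque([start])
    trace = []
    while queue:
        task, method = queue.popleft()
        trace.append((task, method))
        event = handlers[(task, method)]()      # the proxy method fires one event
        target = wiring.get((task, event))      # (task, event) -> (task, method)
        if target is not None:
            queue.append(target)
    return trace

pending = ["in1.dat", "in2.dat", "in3.dat"]
processed = []

def process_run():
    """Handle one input file; fire "more" while files remain, "done" otherwise."""
    processed.append(pending.pop(0))
    return "more" if pending else "done"

def report_run():
    return "done"

handlers = {("process", "run"): process_run, ("report", "run"): report_run}
wiring = {
    ("process", "more"): ("process", "run"),    # loop back to the same task
    ("process", "done"): ("report", "run"),     # stopping criterion reached
}

trace = run(wiring, handlers, ("process", "run"))
print(trace)
```

Here the event "more" is connected back to the "run" method of the same task, so the task is resubmitted until the list of input files is exhausted.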

At this time, we have no mechanism for specifying high-performance connections between tasks that represent tightly coupled codes.

If more than one event is connected to a task, there is a potential ambiguity about when to submit the task: when at least one of the events has fired, or only when all of them have. We follow the convention that all events listed within a single <connection> tag are AND related, while events listed in different <connection> tags are OR related.

Examples:

<connection>
    <output task="a" />
    <output task="b" />
    <input task="c" />
</connection>


In the above example, both tasks a AND b must complete to trigger task c.

<connection>
    <output task="a" />
    <input task="c" />
</connection>
<connection>
    <output task="b" />
    <input task="c" />
</connection>

Here, by contrast, completion of either a OR b results in submitting task c.
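
The AND/OR convention can be stated compactly in code. The following Python sketch evaluates a list of <connection> elements against a set of completed tasks. The function name triggered is hypothetical, and the sketch deliberately ignores the optional event and method attributes.

```python
import xml.etree.ElementTree as ET

AND_EXAMPLE = """
<connection>
    <output task="a" />
    <output task="b" />
    <input task="c" />
</connection>
"""

OR_EXAMPLE = """
<connection>
    <output task="a" />
    <input task="c" />
</connection>
<connection>
    <output task="b" />
    <input task="c" />
</connection>
"""

def triggered(connections_xml, completed):
    """Return the set of tasks to submit, given the set of completed tasks."""
    # Wrap the fragment so that several sibling <connection> elements parse.
    root = ET.fromstring(f"<root>{connections_xml}</root>")
    ready = set()
    for conn in root.findall("connection"):
        sources = {o.get("task") for o in conn.findall("output")}
        if sources <= completed:                 # AND within one connection
            ready.update(i.get("task") for i in conn.findall("input"))
    return ready                                 # OR across connections

print(triggered(AND_EXAMPLE, {"a"}))
print(triggered(AND_EXAMPLE, {"a", "b"}))
print(triggered(OR_EXAMPLE, {"a"}))
```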

Since each task may define more than one input port and more than one output port, the <input> and <output> tags within a <connection> tag have optional attributes, method and event respectively, to resolve possible ambiguities, as shown in the example below:

Example (multiple ports):

<Task>
    <TaskName name="ComplexTask" />
    <Task>
         <TaskName name="task1" descriptor="task1.xml" />
         <InputPort method="run" />
         <OutputPort event="done" />
         <OutputPort event="failure" />
    </Task>

    <Task>
         <TaskName name="task2" descriptor="task2.xml" />
         <InputPort method="run" />
         <OutputPort event="done" />
    </Task>
    <Task>
         <TaskName name="task3" descriptor="task3.xml" />
         <InputPort method="run" />
         <InputPort method="restart" />
         <OutputPort event="done" />
    </Task>

    <connection>
        <output task="task1" event="done" />
        <input task="task2" />
    </connection>
    <connection>
        <output task="task1" event="failure" />
        <input task="task3" method="restart" />
    </connection>
    <InputPort task="task1" />
    <OutputPort task="task2" />
</Task>

In this example, if task1 fires event "done", then method "run" of task2 is invoked. If it fires event "failure" instead, method "restart" of task3 is invoked.
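
One way to picture how the optional event and method attributes are resolved is as a routing table from (task, event) pairs to (task, method) pairs. The Python sketch below builds such a table. It assumes that input ports are declared with a method attribute and output ports with an event attribute, that "restart" is an input port of task3, and that an omitted attribute is acceptable only when the task declares exactly one port of that kind. The last rule is an assumption of this sketch and is not stated by ATD. The function name routing_table is hypothetical.

```python
import xml.etree.ElementTree as ET

MULTI_PORT = """
<Task>
    <TaskName name="ComplexTask" />
    <Task>
        <TaskName name="task1" descriptor="task1.xml" />
        <InputPort method="run" />
        <OutputPort event="done" />
        <OutputPort event="failure" />
    </Task>
    <Task>
        <TaskName name="task2" descriptor="task2.xml" />
        <InputPort method="run" />
        <OutputPort event="done" />
    </Task>
    <Task>
        <TaskName name="task3" descriptor="task3.xml" />
        <InputPort method="run" />
        <InputPort method="restart" />
        <OutputPort event="done" />
    </Task>
    <connection>
        <output task="task1" event="done" />
        <input task="task2" />
    </connection>
    <connection>
        <output task="task1" event="failure" />
        <input task="task3" method="restart" />
    </connection>
    <InputPort task="task1" />
    <OutputPort task="task2" />
</Task>
"""

def routing_table(task_el):
    """Map (task, event) to the list of (task, method) pairs it invokes."""
    methods, events = {}, {}
    for sub in task_el.findall("Task"):
        name = sub.find("TaskName").get("name")
        methods[name] = [p.get("method") for p in sub.findall("InputPort")]
        events[name] = [p.get("event") for p in sub.findall("OutputPort")]

    def resolve(task, given, declared, kind):
        ports = declared.get(task, [])
        if given is None:
            # Assumption: an omitted attribute is fine only if unambiguous.
            if len(ports) != 1:
                raise ValueError(f"{kind} of task {task!r} must be named explicitly")
            return ports[0]
        if given not in ports:
            raise ValueError(f"task {task!r} declares no {kind} {given!r}")
        return given

    table = {}
    for conn in task_el.findall("connection"):
        for out in conn.findall("output"):
            src = out.get("task")
            key = (src, resolve(src, out.get("event"), events, "event"))
            for inp in conn.findall("input"):
                dst = inp.get("task")
                target = (dst, resolve(dst, inp.get("method"), methods, "method"))
                table.setdefault(key, []).append(target)
    return table

table = routing_table(ET.fromstring(MULTI_PORT))
print(table)
```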

ATD.dtd

<!ELEMENT Task (TaskName, (Task|connection)*, InputPort+, OutputPort+)>
<!ELEMENT TaskName EMPTY>
<!ATTLIST TaskName
          name CDATA #REQUIRED
         descriptor CDATA #IMPLIED>
<!ELEMENT connection (output+,input+)>
<!ELEMENT output EMPTY>
<!ATTLIST output
task CDATA #REQUIRED
event CDATA #IMPLIED>
<!ELEMENT input EMPTY>
<!ATTLIST input
task CDATA #REQUIRED
method CDATA #IMPLIED>
<!ELEMENT InputPort EMPTY>
<!ATTLIST InputPort
          task CDATA #IMPLIED
          method CDATA #IMPLIED>
<!ELEMENT OutputPort EMPTY>
<!ATTLIST OutputPort
          task CDATA #IMPLIED
          event CDATA #IMPLIED>
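
Python's standard-library XML parser does not validate documents against a DTD, so a quick structural check has to be written by hand or delegated to a validating parser. The sketch below mirrors only the content models of Task and connection (the order and multiplicity of child elements) and does not check attributes. The one-letter codes and the function name check are illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

# One letter per child element, so a content model becomes a regular expression.
CODES = {
    "TaskName": "N", "Task": "T", "connection": "C",
    "InputPort": "I", "OutputPort": "O", "output": "o", "input": "i",
}
MODELS = {
    "Task": re.compile(r"N[TC]*I+O+"),      # (TaskName, (Task|connection)*, InputPort+, OutputPort+)
    "connection": re.compile(r"o+i+"),      # (output+, input+)
}

def check(el, path=""):
    """Return a list of messages for elements whose children violate the model."""
    here = f"{path}/{el.tag}"
    errors = []
    model = MODELS.get(el.tag)
    if model is not None:
        sequence = "".join(CODES.get(child.tag, "?") for child in el)
        if not model.fullmatch(sequence):
            errors.append(f"{here}: unexpected children {[c.tag for c in el]}")
    for child in el:
        errors.extend(check(child, here))
    return errors

GOOD = ('<Task><TaskName name="t" descriptor="t.xml" />'
        '<InputPort method="run" /><OutputPort event="done" /></Task>')
BAD = '<Task><TaskName name="t" /><InputPort method="run" /></Task>'

good_errors = check(ET.fromstring(GOOD))
bad_errors = check(ET.fromstring(BAD))
print(good_errors)
print(bad_errors)
```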