DOGMA


<Pending:  Include an overall paper describing DOGMA>
<Pending:  This paper describes 1.0 functionality.  Possibly change for 0.9>

Introduction

DOGMA allows parallel applications to be developed and executed in a straightforward manner.  Utilizing the Java platform, DOGMA allows non-technical users to run parallel applications without the need to install any software on their local machine.  In addition, DOGMA allows application developers to publish parallel executable code on a webserver where it can then be executed by users anywhere in the world.
 

Figure 1: DOGMA Architecture

Architecture

DOGMA has a decentralized architecture divided into units known as "configurations".  Each configuration is managed by a "configuration manager" which is capable of linking to other configuration managers in a hierarchical fashion. Configuration managers may optionally allow other configuration managers to join them and use their local resources.  For instance in Figure 1 configuration manager A can utilize the resources of B but the reverse is not true as indicated by the unidirectional arrow.  However, configuration manager B and C can both utilize each other's resources as indicated by the bi-directional arrow.  Configuration managers periodically exchange information such as total number of nodes and total number of nodes.  This information then propagates through the hierarchy.

As an example, consider Figure 1.  An arrow from one configuration manager to another indicates that the source configuration manager may run remote jobs on the destination configuration manager.  Note that configuration managers never see the global view.  So from configuration manager A's point of view there is only one configuration manager attached to it, but this configuration manager has 35 nodes which is the sum of all nodes downstream in the configuration graph.

Within each configuration three types of nodes may contact the configuration manager to join the system:

Also within each configuration are "code servers" which are http servers used to distribute binary code for a given set of applications; they allow DOGMA to function with no shared file system.  Optionally, "data servers" may be added to the configuration to serve application data files.

When nodes join a configuration they sign a "contract" with the configuration manager informing the configuration manager of the duration for which they will stay in the system.  This prevents the configuration manager from scheduling long running jobs on transitory nodes.  Nodes also inform the configuration manager of certain attributes such as processing power.

To improve application performance and enhance usability, applications have an associated definition file which informs DOGMA of the application's system requirements and startup arguments.  Application requirements may consist of the number of nodes, processing power required, networking connection required, whether or not the application allows a varying number of nodes, how long the application expects to run, etc..  Startup arguments free the user from the need to remember which arguments are required by which application, and can be simple text entries, a list of choices, etc..

When an application is run, the system attempts to match the application with a set of nodes which meet the application's requirements.  If it cannot find a set of nodes locally, it may (depending on the application requirements) look remotely for nodes to run the application on.  The information exchanged periodically by configuration managers helps them make intelligent decisions about where to remotely run a job.

DOGMA Programming APIs

DOGMA is programmed using one of two programming APIs:  MPIJ or Dynamic Object Groups.  MPIJ is DOGMA's implementation of the MPI standard (we are working with other members of the Java Grande Forum on a formal proposal for MPI-like bindings).  Dynamic Object Groups allow hierarchical master/slave style programs to be written in a very simple fashion which can scale to large numbers of nodes where the number of nodes is allowed to vary at runtime.