From furm@npac.syr.edu Mon Nov 2 12:54:04 1998
Date: Sun, 1 Nov 1998 16:37:44 -0500
From: Wojtek Furmanski
To: timucin@npac.syr.edu
Cc: furm@npac.syr.edu
Subject: Sandia issues

Here are some recent exchanges related to the Sandia project. It is not clear what's going on, but apparently Sandia still needs help and the project will likely continue after the SC98 meetings.

Could you write a few paragraphs on heartbeat and XML support in JWORB as input for the 'care package'? We will also need a list of all our demos available at SC98 (JWORB, OW-RTI, Jager, CMS, OMBuilder, FMS Training Space, what else?) - please make your list with a one paragraph description per demo, I will make my list, and let's compare/combine tomorrow or Tuesday. Geoffrey wants to discuss issues Thursday.

Tom Haupt's role is unclear and he apparently does not know yet what's going on, but it might suggest more Sandia money on the horizon (and more of the usual uncertainties re project management..).

In any case, I think it is worth putting some effort into preparing the care package and demo handouts for Sandia this week. Please give it some thought and let's talk more tomorrow. I will check the OMG Fault Tolerance RFP tonight - it might be a useful item for the care package/next year's proposal.

thanks
Wojtek

>From furm@npac.syr.edu Sun Nov 1 16:18:06 1998
Date: Wed, 28 Oct 1998 14:17:03 -0500 (EST)
From: Wojtek Furmanski
To: Geoffrey Fox
Cc: furm@nova.npac.syr.edu
Subject: Re: FYI

Here are some more comments on this interesting email. I think we could indeed be of some help in providing JWORB based support for integrating various metacomputing environments, basically by continuing and extending the work we already started on cluster management for C-Plant.

I think the concept of adopting HLA and extending it for more generic federation services applies to metacomputing as well. Perhaps we could call it Metagrid or GridHLA or MetaHLA or VHLA or something like that..? Each domain such as Globus, Legion, Condor etc. would be represented as a federate that conforms to a common FOM (Federation Object Model) and can at any time join an existing Metagrid Federation or start a new one, and interact with other federates via RTI bus services and their extensions. The latter can be naturally experimented with and supported by our Object Web RTI, for example using Globus/Nexus for high performance communication, or using Java for rapid prototyping of CORBA services, or using XML for universally parsable control messaging, metadata or trader formats etc.
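To make the federate idea concrete, here is a minimal Java sketch. The MetagridBus and MetagridFederate interfaces below are purely hypothetical stand-ins - not the DMSO RTI or OW-RTI API - that mirror the HLA pattern (join a federation, publish object classes from a shared FOM, reflect updates from other federates), with an imagined Condor pool wrapped as a federate.

import java.util.HashMap;
import java.util.Map;

// Hypothetical bus interface, NOT the DMSO RTI API; it stands in for the
// RTI services a Metagrid federate would use.
interface MetagridBus {
    void join(String federation, String federate, MetagridFederate callback);
    void publish(String fomClass);                      // e.g. "Metagrid.Resource"
    void updateAttributes(String fomClass, Map<String, String> attrs);
}

// Callbacks a federate implements, analogous to a FederateAmbassador.
interface MetagridFederate {
    void reflectAttributes(String fomClass, Map<String, String> attrs);
}

// Wraps one metacomputing domain (an imagined Condor pool) as a federate.
class CondorFederate implements MetagridFederate {
    private final MetagridBus bus;

    CondorFederate(MetagridBus bus) {
        this.bus = bus;
        bus.join("MetagridFederation", "CondorPool-1", this);
        bus.publish("Metagrid.Resource");
    }

    // Advertise local resource state to the federation via the common FOM.
    void reportIdleNodes(int idle) {
        Map<String, String> attrs = new HashMap<>();
        attrs.put("domain", "Condor");
        attrs.put("idleNodes", Integer.toString(idle));
        bus.updateAttributes("Metagrid.Resource", attrs);
    }

    // React to state published by other federates (Globus, Legion, ...).
    public void reflectAttributes(String fomClass, Map<String, String> attrs) {
        System.out.println("update on " + fomClass + ": " + attrs);
    }
}

A GlobusFederate or LegionFederate would look the same from the bus's point of view - that common shape against one FOM is exactly what the email means by interoperable federation services.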
I include some more specific answers or comments below.

On Tue, 13 Oct 1998, Geoffrey Fox wrote:
>
> Geoffrey Fox gcf@npac.syr.edu, http://www.npac.syr.edu
> Director of NPAC and Professor of Physics and Computer Science
> Phone 3154432163 (NPAC central 3154431723) Fax 3154434741
>
> ------- Forwarded Message
>
> Date: Mon, 12 Oct 1998 14:09:25 -0600
> From: "Pollock, Robert"
> To: "'gcf@npac.syr.edu'"
> Subject: Follow up on Conversation
>
> Jeffrey,
>
> I wanted to follow up on a few items you mentioned to me (and my colleagues)
> last Friday on the way to the Chicago airport from the Java Grande workshop.
>
> As we stated at the workshop, we are attempting to develop a DRM system so
> that consistent access to, and management of, high-end computational
> resources that are distributed throughout the Defense Programs complex can
> be made available to geographically dispersed users (for example, the ASCI
> and C-Plant resources).

It is perhaps worth mentioning that HLA is already making some inroads into the DoE - I noticed some interesting papers by Argonne people in the area of logistics simulations during the last HLA conference, the Fall 98 SIW in Orlando, where we presented our WebHLA work.

> 1st. We are attempting to track down the reason you have not been able to
> receive a copy of the C-Plant software so that you may load it on your
> "C-Seed Plant" hardware for evaluation. Please stand by.
>
> 2nd. As you already know, we (SNL's DisCom2/DRM Group) are trying to
> identify an implementation model that would fit nicely into our overall DRM
> logical architecture model (seven conceptual layers) that was presented last
> week at the workshop. You mentioned the WebFlow and JWorb three tier model
> as a possible implementation for our logical model. The question I have
> has to do with the maturity of the JWorb services. What environments do
> the JWorb servers support today? Does JWorb provide any APIs for accessing
> Globus managed resources? If so, to what extent? If not, are there any
> plans (either from your group or from the Globus community) for incorporating
> JWorb interfaces into Globus? You also mentioned a paper that is currently
> in draft form that addresses in more detail the three-tier model and
> specifically focuses on the JWorb concept. Is it possible for you to
> provide Sandia with a copy of this paper for our review?

Our main focus so far has been on providing JWORB services for DoD Modeling and Simulation and its Web based HPC extensions - we therefore selected DMSO HLA/RTI as the target for our first JWORB service, called Object Web RTI (i.e. DMSO RTI implemented in Java as a CORBA service - note, btw, that DMSO now has an HLA proposal before the OMG for which our OW-RTI is already a prototype implementation). The next step, and early work in progress, is a JWORB based Cluster Management Service for C-Plant - here we currently have heartbeat support operational, and we are starting to build a Clustering FOM and XML data support for resource allocation and management. Regarding Globus, we have some involvement in wrapping Globus applications as coarse grain WebFlow modules. The new WebFlow based on JWORB middleware will offer natural visual Metagrid authoring tools, with Globus as one of the supported metacomputing domains. The RCI paper included only a top level description of JWORB - more info will be available in our Wiley book by the end of this year.
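To make the two ingredients above concrete - heartbeat monitoring and XML descriptions of resource state - here is a minimal Java sketch. All class and element names are illustrative assumptions, not the actual JWORB Cluster Management code.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical heartbeat monitor in the spirit of the JWORB C-Plant
// cluster service described above; names are illustrative only.
class HeartbeatMonitor {
    private final Map<String, Long> lastBeat = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called (e.g. over the ORB) whenever a node daemon reports in.
    void beat(String nodeId) {
        lastBeat.put(nodeId, System.currentTimeMillis());
    }

    // A node is presumed dead once it misses the timeout.
    boolean isAlive(String nodeId) {
        Long t = lastBeat.get(nodeId);
        return t != null && System.currentTimeMillis() - t < timeoutMillis;
    }

    // Node status as an XML fragment, of the kind a Clustering FOM or
    // resource trader could carry as a universally parsable message.
    String toXml(String nodeId) {
        return "<node id=\"" + nodeId + "\" alive=\"" + isAlive(nodeId) + "\"/>";
    }
}

A per-node daemon would call beat() every few seconds; a missed timeout marks the node dead, and the XML fragments are the sort of universally parsable control messages the thread keeps returning to.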
> 3rd. Do you currently have any plans for writing JWorb APIs for
> interfacing with the PBS scheduler? I believe the PBS scheduler (provided
> by NASA) is being ported this year (FY99) to run on the C-Plant hardware.
> See Art Hale for further details on this tasking.

We are looking into PBS and we are planning to provide support for various clustering/scheduling tools via our Clustering FOM - but priorities are not clear yet. So far, we have looked in more detail into Condor and Beowulf while waiting for more hints from the C-Plant team.

> 4th. Do you know of any commercially available DRM systems that can
> support several meta-computing service models (i.e., Globus, Condor, Legion,
> etc...)?

I doubt there are any robust commercial tools in this area, as the market for interoperable metacomputing is yet to be built. Our ansatz is that the DoD M&S community has the most experience in this area (or at least in the subset of irregular distributed computing), and the various large scale simulation communities are now being pushed hard to interoperate due to the DoD budget cuts. Hence the HLA initiative and its early products, which seem quite promising - SISO for IEEE standards, a DoD-wide mandate, OMG presence, a target for initial commercial activities, significant interest by Boeing and other large manufacturers, etc. Adopting the HLA standard and extending it towards Web based HPCC seems to be our unique angle - so perhaps we are closer than others to a robust interoperable metacomputing framework?

> As part of our FY '99 demonstration of a DRM system, we are looking at the
> C-Plant clusters as being a critical supported component in our solution.
> In creating the DRM system, we recognize the need to take full advantage of
> commercially available products and tools as part of the overall solution. I am
> hoping that as we better understand your goals and vision, we might be able
> to leverage some of your activities and vice versa.

I would love to learn more about the DRM status. As I said before, more material on our approach, including JWORB, WebHLA etc., will be available in our new Wiley book by the end of this year.

> Any insight, clarification, or guidance you can provide is much appreciated.
>
> bp
> rdpollo@sandia.gov
> 505-844-4442
>
> ------- End of Forwarded Message

>From gcf@npac.syr.edu Sun Nov 1 16:18:41 1998
Date: Fri, 30 Oct 1998 15:09:03 -0500
From: Geoffrey Fox
To: "Pollock, Robert"
Cc: furm@npac.syr.edu, Art Hale
Subject: Comments on your interesting email!

Who from Sandia will be at SC98 -- maybe we could meet there!

We think we could indeed be of some help in providing JWORB based support for integrating various metacomputing environments, basically by continuing and extending the work we already started on cluster management for C-Plant. We think the concept of adopting HLA and extending it for more generic federation services applies to metacomputing as well. Perhaps we could call it Metagrid or GridHLA or MetaHLA or VHLA or something like that..? Each domain such as Globus, Legion, Condor etc. would be represented as a federate that conforms to a common FOM (Federation Object Model) and can at any time join an existing Metagrid Federation or start a new one, and interact with other federates via RTI bus services and their extensions. The latter can be naturally experimented with and supported by our Object Web RTI, for example using Globus/Nexus for high performance communication, or using Java for rapid prototyping of CORBA services, or using XML for universally parsable control messaging, metadata or trader formats etc.

We include some more specific answers or comments below. Note two references:
1. A long paper we wrote for RCI called "High Performance Commodity Computing on the Pragmatic Object Web"
2. A book we are writing called "Building Distributed Systems on the Pragmatic Object Web"

> ------- Forwarded Message
>
> Date: Mon, 12 Oct 1998 14:09:25 -0600
> From: "Pollock, Robert"
> To: "'gcf@npac.syr.edu'"
> Subject: Follow up on Conversation
>
> Jeffrey,
>
> I wanted to follow up on a few items you mentioned to me (and my colleagues)
> last Friday on the way to the Chicago airport from the Java Grande workshop.
>
> As we stated at the workshop, we are attempting to develop a DRM system so
> that consistent access to, and management of, high-end computational
> resources that are distributed throughout the Defense Programs complex can
> be made available to geographically dispersed users (for example, the ASCI
> and C-Plant resources).

It is perhaps worth mentioning that HLA is already making some inroads into the DoE - we noticed some interesting papers by Argonne people in the area of logistics simulations during the last HLA conference, the Fall 98 SIW in Orlando, where we presented our WebHLA work.

> 1st. We are attempting to track down the reason you have not been able to
> receive a copy of the C-Plant software so that you may load it on your
> "C-Seed Plant" hardware for evaluation. Please stand by.
>
> 2nd. As you already know, we (SNL's DisCom2/DRM Group) are trying to
> identify an implementation model that would fit nicely into our overall DRM
> logical architecture model (seven conceptual layers) that was presented last
> week at the workshop. You mentioned the WebFlow and JWorb three tier model
> as a possible implementation for our logical model. The question I have
> has to do with the maturity of the JWorb services. What environments do
> the JWorb servers support today? Does JWorb provide any APIs for accessing
> Globus managed resources? If so, to what extent? If not, are there any
> plans (either from your group or from the Globus community) for incorporating
> JWorb interfaces into Globus? You also mentioned a paper that is currently
> in draft form that addresses in more detail the three-tier model and
> specifically focuses on the JWorb concept. Is it possible for you to
> provide Sandia with a copy of this paper for our review?

Our main focus so far has been on providing JWORB services for DoD Modeling and Simulation and its Web based HPC extensions - we therefore selected DMSO HLA/RTI as the target for our first JWORB service, called Object Web RTI (i.e. DMSO RTI implemented in Java as a CORBA service - note, btw, that DMSO now has an HLA proposal before the OMG for which our OW-RTI is already a prototype implementation). The next step, and early work in progress, is a JWORB based Cluster Management Service for C-Plant - here we currently have heartbeat support operational, and we are starting to build a Clustering FOM and XML data support for resource allocation and management. Regarding Globus, we have some involvement in wrapping Globus applications as coarse grain WebFlow modules. The new WebFlow based on JWORB middleware will offer natural visual Metagrid authoring tools, with Globus as one of the supported metacomputing domains. The RCI paper included only a top level description of JWORB - more info will be available in our Wiley book by the end of this year.

> 3rd. Do you currently have any plans for writing JWorb APIs for
> interfacing with the PBS scheduler? I believe the PBS scheduler (provided
> by NASA) is being ported this year (FY99) to run on the C-Plant hardware.
> See Art Hale for further details on this tasking.

We are looking into PBS and we are planning to provide support for various clustering/scheduling tools via our Clustering FOM - but priorities are not clear yet. So far, we have looked in more detail into Condor and Beowulf while waiting for more hints from the C-Plant team.

> 4th. Do you know of any commercially available DRM systems that can
> support several meta-computing service models (i.e., Globus, Condor, Legion,
> etc...)?
We doubt there are any robust commercial tools in this area, as the market for interoperable metacomputing is yet to be built. Our ansatz is that the DoD M&S community has the most experience in this area (or at least in the subset of irregular distributed computing), and the various large scale simulation communities are now being pushed hard to interoperate due to the DoD budget cuts. Hence the HLA initiative and its early products, which seem quite promising - SISO for IEEE standards, a DoD-wide mandate, OMG presence, a target for initial commercial activities, significant interest by Boeing and other large manufacturers, etc. Adopting the HLA standard and extending it towards Web based HPCC seems to be our unique angle - so perhaps we are closer than others to a robust interoperable metacomputing framework?

> As part of our FY '99 demonstration of a DRM system, we are looking at the
> C-Plant clusters as being a critical supported component in our solution.
> In creating the DRM system, we recognize the need to take full advantage of
> commercially available products and tools as part of the overall solution. I am
> hoping that as we better understand your goals and vision, we might be able
> to leverage some of your activities and vice versa.

We would love to learn more about the DRM status. As we said before, more material on our approach, including JWORB, WebHLA etc., will be available in our new Wiley book by the end of this year.

> Any insight, clarification, or guidance you can provide is much appreciated.
>
> bp
> rdpollo@sandia.gov
> 505-844-4442
>
> ------- End of Forwarded Message

>From gcf@npac.syr.edu Sun Nov 1 16:19:03 1998
Date: Sat, 31 Oct 1998 01:04:41 -0500
From: Geoffrey Fox
To: furm@nova.npac.syr.edu, haupt@boss.npac.syr.edu
Subject: Sandia

I agreed to meet Art Hale and others Wednesday Nov 11 at SC98. He agrees that the Sandia CPLANT software is not ready for release and so encourages our general approach.

1) We need a "care package" documenting our work, probably also a proposed follow-on for next year.

We should have comments on the following:

a) Security
His concern is that Sandia security people have to check out everything on a case by case basis. He believes this is not viable, e.g. they are now checking "ORBIX on Solaris" -- this approach will not "scale". Can we arrange a general "wrapper" which, once tested by their security folks, will allow flexible implementations (ORBIX/Zeus/JWORB/NT/Solaris/Linux) underneath? What do we use in our implementation for communication? They have some sort of Kerberos environment now.

b) Thin vs. Thick Nodes
Do we need a Java VM? Will we run JWORB on all nodes or just their service nodes, leaving their single user compute nodes untouched (host-node model)?

c) Pollard (who we sent email to) likes Globus but is not so keen on Nexus.

d) He likes Vic Holmes' work at Sandia, which uses an object database to store metadata describing computation - this seems similar to the classic WebFlow linkage of simulations, data and visualization. He sees such metadata as being part of future product databases in engineering design processes.

Geoffrey Fox gcf@npac.syr.edu, http://www.npac.syr.edu
Director of NPAC and Professor of Physics and Computer Science
Phone 3154432163 (NPAC central 3154431723) Fax 3154434741
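Point a) above asks for a generic "wrapper" that could be security-reviewed once, with interchangeable ORBs and platforms underneath. A minimal Java sketch of that shape follows; every name here is a hypothetical illustration, not a real API.

import java.util.Set;

// Hypothetical single surface that security people could vet once,
// independent of which ORB (ORBIX, Zeus, JWORB) or OS sits underneath.
interface SecureChannel {
    byte[] invoke(String service, String operation, byte[] payload);
}

// Policy wrapper around any transport binding (a JworbChannel,
// OrbixChannel etc. would all implement SecureChannel alike).
class VettedChannel implements SecureChannel {
    private final SecureChannel transport;        // the interchangeable part
    private final Set<String> clearedServices;    // the reviewed policy

    VettedChannel(SecureChannel transport, Set<String> clearedServices) {
        this.transport = transport;
        this.clearedServices = clearedServices;
    }

    public byte[] invoke(String service, String operation, byte[] payload) {
        // Authentication, allow-listing and audit live here, so only this
        // class needs case-by-case review - the transports do not.
        if (!clearedServices.contains(service)) {
            throw new SecurityException("service not cleared: " + service);
        }
        System.out.println("audit: " + service + "." + operation);
        return transport.invoke(service, operation, payload);
    }
}

The point of the design is that swapping ORBIX for JWORB, or Solaris for Linux, changes only the transport binding, not the reviewed policy surface - which is what would make the review "scale".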
>From haupt@npac.syr.edu Sun Nov 1 16:19:29 1998
Date: Sun, 1 Nov 1998 11:53:42 -0500
From: Tomasz Haupt
To: "'gcf@npac.syr.edu'", "furm@nova.npac.syr.edu", "haupt@boss.npac.syr.edu"
Subject: RE: Sandia

A few comments to start with....

On Saturday, October 31, 1998 1:05 AM, Geoffrey Fox [SMTP:gcf@npac.syr.edu] wrote:
> I agreed to meet Art Hale and others Wednesday Nov 11 at SC98. He agrees
> that the Sandia CPLANT software is not ready for release and so encourages
> our general approach.
>
> 1) We need a "care package" documenting our work, probably also a proposed
> follow-on for next year.

What does "our" work mean? A collection of QS, LMS, WebHLA, ...? A coordinated overview? A general target, or one specific to Sandia? I guess it would help if we start with some outline and expectations for the total volume.

> We should have comments on the following:
>
> a) Security
> His concern is that Sandia security people have to check out everything on a
> case by case basis. He believes this is not viable, e.g. they are now checking
> "ORBIX on Solaris" -- this approach will not "scale". Can we arrange a general
> "wrapper" which, once tested by their security folks, will allow flexible
> implementations (ORBIX/Zeus/JWORB/NT/Solaris/Linux) underneath?
> What do we use in our implementation for communication?
>
> They have some sort of Kerberos environment now.

This is a very complex issue because the word "secure" is so vaguely defined. I think our strategy should be to point to solutions that are accepted by others. A good candidate is AKENTI. Thus my suggestion is to promise something along the lines of AKENTI, or at least some of its components, and delegate "responsibility" to experts in this field.

> b) Thin vs. Thick Nodes
> Do we need a Java VM? Will we run JWORB on all nodes or just their service
> nodes, leaving their single user compute nodes untouched (host-node model)?

We can adopt several different strategies. WebFlow can work as a job broker and delegate resource allocation to some other system (such as Globus or Condor) - a rough sketch of this model closes this thread. This is the model suggested by the "Seamless Access..." effort of Java Grande. Does it mean a thin node? Well, from the point of view of WebFlow - yes. On the other hand, there must be some resource allocation daemon running on each node; if they have one (Globus, Condor, etc.) there is no need to duplicate it with WebFlow. Another approach is to use a "pure" RMI/CORBA approach, which means changing the WebFlow philosophy (why not?), moving away from commodity components (?), and addressing the security issues ourselves (I do not like this idea). The "pure" WebFlow approach means fat nodes. What is interesting is that we can offer any combination of the above, because in some cases one wants a fat node, while in others (like using NSF/DOE/DoD HPCC resources) one wants to stay away from the administrative/security issues.

> c) Pollard (who we sent email to) likes Globus but is not so keen on Nexus.

This is a weird statement. Globus is layered on top of Nexus. Does he want to retarget Globus onto something else? That seems to me like a major effort (look at the resources - $$$ + manpower - Globus got), so I do not understand. What exactly is wrong with Nexus? As a user you never see Nexus.

> d) He likes Vic Holmes' work at Sandia, which uses an object database to store
> metadata describing computation - this seems similar to the classic WebFlow
> linkage of simulations, data and visualization. He sees such metadata as being
> part of future product databases in engineering design processes.
>
> Geoffrey Fox gcf@npac.syr.edu, http://www.npac.syr.edu
> Director of NPAC and Professor of Physics and Computer Science
> Phone 3154432163 (NPAC central 3154431723) Fax 3154434741
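As referenced in the comment on b) above, here is a rough Java sketch of the "WebFlow as job broker" model: the broker delegates resource allocation to whatever backend a site already runs (Globus, Condor, PBS, ...), keeping the compute nodes thin. None of these names come from WebFlow itself; they are illustrative assumptions only.

import java.util.HashMap;
import java.util.Map;

// Hypothetical backend interface: one per resource allocation system.
interface ResourceManager {
    // Submit a job description; returns a backend-specific handle.
    String submit(String jobSpec);
}

// Example backend binding; a GlobusManager or PbsManager would look alike.
class CondorManager implements ResourceManager {
    public String submit(String jobSpec) {
        // A real binding would hand the spec to the Condor scheduler;
        // stubbed here for illustration.
        return "condor-job-42";
    }
}

// The broker delegates allocation and never touches compute nodes directly
// (the thin-node model discussed above).
class JobBroker {
    private final Map<String, ResourceManager> backends = new HashMap<>();

    void register(String domain, ResourceManager rm) {
        backends.put(domain, rm);
    }

    String dispatch(String domain, String jobSpec) {
        ResourceManager rm = backends.get(domain);
        if (rm == null) {
            throw new IllegalArgumentException("no backend for domain: " + domain);
        }
        return rm.submit(jobSpec);
    }
}

In this shape the existing per-node daemons (Condor, Globus, PBS) keep doing the allocation, while the "pure" fat-node alternative would replace them with WebFlow's own runtime on every node - the combination Tom suggests is simply registering different backends per site.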