I append two referee reports on your excellent paper
C476: Effective Multicast Programming in Large Scale Distributed Systems: The DACE Approach

I would be happy to publish your paper if you addressed the changes
suggested by the referees. This looks quite possible with modest extensions.
Please include a discussion of your changes and your response to the
referees in your resubmission.

I thank you for your interest in Concurrency: Practice and Experience.
Send us other good papers!

Please send all communication -- including the resubmission -- electronically
if possible using the address
fox@csit.fsu.edu

If you should need a "real address", please use:
Geoffrey Fox
Computational Science and Information Technology
Florida State University
400 Dirac Science Library
Tallahassee Florida 32306-4130


Referee One
This is an important topic and I strongly support publication of this
paper. I would like clarification of the discussion of JMS, as I see this as being
of great interest in industry, with several commercial implementations from Sun,
Softwired and others. JMS does not address many of the issues in DACE, but I wonder
if DACE could be usefully implemented on top of JMS. If so, how would subtopics and
multicast look? These are, I think, not addressed in JMS.
How does the selector mechanism in JMS map into DACE?

Referee Two
DACE is a middleware system implementing the publish/subscribe 
model of interaction in a distributed environment.   It is topic-based,
in that users subscribe to topics which presumably consist
of multiple event types.  Topics are organized hierarchically, in that
subtopics can be derived to which a subscriber can subscribe in
addition to, or exclusive of, the topic itself.

The paper has an interesting approach to publish/subscribe
middleware.  The conceptualization of communication as a
hierarchy of classes of 'collections' is novel.  Topic membership in 
the presence of failures is well thought out. 

A strength of DACE is its tolerance to network partitions and crash failures.
Topic knowledge is maintained at each site.  When a network partitions,
participants will renegotiate a topic member set.  Tolerance to crash failures
is achieved by providing each participant
with access to a local failure detector module which outputs hints
about the closed channels with other participants.  The topic member set is
then renegotiated.

A second strength of DACE is the conceptualization of publish/subscribe
communication as a collection.  For instance, the API allows 'pull' 
style communication by registering a callback to 'remove' an event from 
the collection.  The notion of a collection lends itself easily to
supporting collection subtypes that impose order on the events and
QoS features (e.g., reliability) on the communication infrastructure.

I have two major reservations about the paper.  First, practical
experience with the system is not evident from the paper.  Second,
the measurements section, primarily because of the absence of results,
does not convince the reviewer that the system is mature enough
to have been used in any practical setting.

Specifically, the paper contains one measurement comparing the
collection subtype with the least overhead (i.e., DAStrongBag) 
against an unreliable multicast protocol.
The measurement convincingly demonstrates the effectiveness of the 
first-participant algorithm developed by DACE to reduce the number of messages
sent, but details were absent.  I would have liked to see a description of the model
of communication employed in the experiment, an indication of the
loss rate for the unreliable multicast protocol, details as to
the unreliable multicast protocol used, and a breakdown
of the number of events sent in the DACE case.  Regarding the latter, are all events
user-level events, or is 'first-participant' traffic included?

I would also have liked to see results supporting the two
major strengths of the approach:  multiple subtypes and failure recovery. 
What cost is associated with semantics like 'at-least-once FIFO'?
What is the overhead of topic network knowledge propagation?