I append two referee reports on your excellent paper C476: Effective Multicast Programming in Large Scale Distributed Systems: The DACE Approach.

I would be happy to publish your paper if you address the changes suggested by the referees. This looks quite possible with modest extensions. Please include in your resubmission a discussion of your changes and of how they answer the referees.

I thank you for your interest in Concurrency: Practice and Experience. Send us other good papers!

Please send all communication -- including the resubmission -- electronically if possible, using the address fox@csit.fsu.edu. If you should need a "real address", please use:

Geoffrey Fox
Computational Science and Information Technology
Florida State University
400 Dirac Science Library
Tallahassee, Florida 32306-4130

Referee One

This is an important topic and I strongly support publication of this paper. I would like the discussion of JMS to be clarified, as I see JMS as being of great interest in industry, with several commercial implementations from Sun, Softwired and others. JMS does not address many of the issues that DACE does, but I wonder whether DACE could be usefully implemented on top of JMS. If so, how would subtopics and multicast look? These are, I think, not addressed in JMS. How does the selector mechanism in JMS map onto DACE?

Referee Two

DACE is a middleware system implementing the publish/subscribe model of interaction in a distributed environment. It is topic-based, in that users subscribe to topics, which presumably consist of multiple event types. Topics are organized hierarchically, in that subtopics can be derived to which a subscriber can subscribe in addition to, or exclusive of, the topic itself.

The paper has an interesting approach to publish/subscribe middleware. The conceptualization of communication as a hierarchy of classes of 'collections' is novel. Topic membership in the presence of failures is well thought out.

A strength of DACE is its tolerance to network partitions and crash failures.
Topic knowledge is maintained at each site. When a network partitions, participants renegotiate a topic member set. Tolerance to crash failures is achieved by providing each participant with access to a local failure detector module, which outputs hints about the closed channels with other participants. The topic member set is then renegotiated.

A second strength of DACE is the conceptualization of publish/subscribe communication as a collection. For instance, the API allows 'pull' style communication by registering a callback to 'remove' an event from the collection. The notion of a collection lends itself easily to supporting collection subtypes that impose order on the events and QoS features (e.g., reliability) on the communication infrastructure.

I have two major reservations about the paper. First, practical experience with the system is not evident from the paper. Second, the measurements section, primarily because of the absence of results, does not convince the reviewer that the system is mature enough to have been used in any practical setting.

Specifically, the paper contains one measurement, comparing the collection subtype with the least overhead (i.e., DAStrongBag) against an unreliable multicast protocol. The measurement convincingly demonstrates the effectiveness of the first-participant algorithm developed for DACE in reducing the number of messages sent, but details were absent. I would have liked to have seen a description of the model of communication employed in the experiment, an indication of the loss rate for the unreliable multicast protocol, details of the unreliable multicast protocol used, and a breakdown of the number of events sent in the DACE case. Regarding the latter, are all events user-level events, or is 'first-participant' traffic included?

I would also have liked to have seen results supporting the two major strengths of the approach: multiple subtypes and failure recovery. What cost is associated with semantics like 'at-least-once FIFO'?
What is the overhead of topic network knowledge propagation?