I enclose three referee reports on your paper. We would be pleased to accept it; could you please send me a new version before November 5, 1999? Please also send a memo describing any suggestions of the referees that you did not address. Ignore any aggressive remarks you don't think appropriate, but please tell me. I trust you! Thank you for your help in writing and refereeing papers!

Referee 1 **********************************************************

Subject: referee report 428

Referee report: "Wide-area parallel programming using the remote invocation model"

This paper describes a Java-based distributed system intended to support highly parallel computation on a network or cluster. The paper surveys the system and describes its behavior on a few parallel benchmarks. It is well written, and I have only a few comments.

I think the paper would be improved by a more thorough discussion of the design trade-offs underlying the system's customized RMI implementation, which is arguably the most novel aspect of the system. For example, how type-safe is it? The JDK serialization/deserialization does run-time checks to ensure that objects really belong to the correct class; I could not tell from the description what kind of guarantees the Manta/Panda RMI provides. More generally, many people think that the JDK serialization mechanism suffers from feature bloat, incorporating too much machinery (versioning, Externalizable, class information, etc.), and I think the readers of this paper would be very interested in understanding exactly what they would have to give up to achieve the same kind of performance. Can Manta's RMI serialization be used for persistent storage, or is that something sacrificed for performance? These issues would be of substantial interest to the community, and I would urge the authors to share their experience in greater detail.

I was also a little unclear about the thread structure of RMI. When I make an RMI to a remote object, does that create a new thread?
Are there deadlock issues here?

Referee 2 ****************************************************************

Subject: C428 JGSI Review

a) Overall recommendation: Accept

b) Words suitable for authors:

I enjoyed this paper very much. The paper is well written and backed up with a lot of good experimental work. Section 4, on alternative programming models, is also a good dissection of the issues. I imagine that the premise for this work---that high-performance metacomputing applications will typically run on collections of clusters or MPPs---is valid.

In the introduction, code mobility is mentioned as one strength of the Java-centric approach. However, because Manta uses a native compiler, it appears to me that Manta code is not mobile, although it does support heterogeneity. True?

In Subsection 2.3, on Manta on the wide-area DAS system, it is noted that Panda uses one dedicated gateway machine per cluster. Are there concerns about, or attempts to implement, fault tolerance? I realize that fault tolerance is not the focus of this work.

In the discussion of SOR, I appreciated the detail given on effecting asynchronous RMI via multiple threads. Similarly, I appreciated the detail in the discussion of the multiple threads used to implement broadcast for the all-pairs shortest paths problem. Along these lines, for the TSP problem, I would appreciate a bit more detail on how exactly the program does an RMI to all other workers to update their value for the best solution; no code was given for this. Specifically, does the method scale? Later in the paper, the authors note that such updates are infrequent. I imagine that the frequency decreases over time, being quite frequent at first. True? If not, please explain.

For IDA*, what is the victim selection method used for work stealing?

Again, the comparison of alternative programming models is clear and useful. I also think the conclusions about what is needed are correct (including the JavaSpaces shortcomings). I hope Sun sees this!
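To make my TSP question concrete, here is a rough sketch of the kind of best-bound broadcast I have in mind. This is hypothetical code of my own, not taken from the paper: `Worker`, `updateBest`, and the thread-per-target broadcast are all assumptions about how such an update might be structured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker: each one keeps a local copy of the best bound.
class Worker {
    private final AtomicInteger best = new AtomicInteger(Integer.MAX_VALUE);

    // In the real system this would be a remote method invoked via RMI.
    void updateBest(int candidate) {
        // Monotonic update: only strictly better (lower) bounds are kept.
        best.accumulateAndGet(candidate, Math::min);
    }

    int getBest() { return best.get(); }
}

public class TspBroadcast {
    // One thread per remote worker, so one slow wide-area link does not
    // serialize the whole broadcast behind it.
    static void broadcastBest(List<Worker> workers, int candidate)
            throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (Worker w : workers) {
            Thread t = new Thread(() -> w.updateBest(candidate));
            t.start();
            threads.add(t);
        }
        for (Thread t : threads) t.join();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Worker> workers = new ArrayList<>();
        for (int i = 0; i < 4; i++) workers.add(new Worker());
        broadcastBest(workers, 100);
        broadcastBest(workers, 80);
        broadcastBest(workers, 120); // worse bound, silently ignored
        for (Worker w : workers) {
            assert w.getBest() == 80 : "expected 80, got " + w.getBest();
        }
        System.out.println("all workers hold bound " + workers.get(0).getBest());
    }
}
```

Even with one thread per target, the work is linear in the number of workers, which is why I ask whether the method scales.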
I also commend the authors on their related-work section: very professional. Overall, I think the paper makes several valuable contributions and is eminently publishable.

Referee 3 *************************************************************

C428 JGSI Review

Overall recommendation: Accept

Comments:

Good paper. It would be interesting to see how the performance numbers break down into computation and communication phases (especially for applications such as SOR, where the communication phase would be the exchange phase). Don't bother with TSP and IDA*: their irregularity makes it difficult to obtain any meaningful breakdown. However, I would like to hear a little more about IDA* in the discussion section. In particular, my own experience with IDA* (the version downloadable from the web) indicates that its measured execution times have huge variances: on average, the speedup for 8 nodes is poor (less than 2). I am interested in knowing whether the authors observed something similar.

Just my two cents' worth. The paper is fine as is.
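The breakdown suggested above does not require heavy machinery; a per-phase time accumulator along the following lines would suffice for SOR. This is a sketch of my own (the class name and the figures in `main` are invented for illustration), not instrumentation from the paper.

```java
// Hypothetical per-phase timer: call addCompute/addComm with the elapsed
// nanoseconds of each phase of every iteration, then report the split.
public class PhaseBreakdown {
    private long computeNanos;
    private long commNanos;

    void addCompute(long nanos) { computeNanos += nanos; }
    void addComm(long nanos)    { commNanos += nanos; }

    // Fraction of total measured time spent in communication.
    double commFraction() {
        long total = computeNanos + commNanos;
        return total == 0 ? 0.0 : (double) commNanos / total;
    }

    public static void main(String[] args) {
        PhaseBreakdown b = new PhaseBreakdown();
        // Simulated SOR iteration: stencil updates, then a boundary-row
        // exchange phase (numbers are made up for the example).
        b.addCompute(900_000_000L); // 0.9 s of computation
        b.addComm(100_000_000L);    // 0.1 s of exchange
        System.out.println("communication fraction: " + b.commFraction());
    }
}
```

In practice one would bracket each phase with `System.nanoTime()` calls and feed the differences into the accumulator.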