I enclose three referee reports on your paper. We would be pleased to accept it; could you please send me a new version before November 5, 1999? Please also send a memo describing any suggestions of the referees that you did not address. Ignore any aggressive remarks you don't think appropriate, but please tell me. I trust you! Thank you for your help in writing and refereeing papers!

Referee 1 **********************************************************

Subject: referee report 428

Referee report: "Wide-area parallel programming using the remote invocation model"

This paper describes a Java-based distributed system intended to support highly parallel computation on a network or cluster. The paper surveys the system and describes its behavior on a few parallel benchmarks. It is well written, and I have only a few comments.

I think the paper would be improved by a more thorough discussion of the design trade-offs underlying the system's customized RMI implementation, which is arguably the most novel aspect of the system. For example, how type-safe is it? The JDK serialization/deserialization does run-time checks to ensure that objects really belong to the correct class; I could not tell from the description what kind of guarantees the Manta/Panda RMI provides. More generally, many people think that the JDK serialization mechanism suffers from feature bloat, incorporating too much machinery (versioning, Externalizable, class information, etc.), and I think the readers of this paper would be very interested in understanding exactly what they would have to give up to achieve the same kind of performance. Can Manta's RMI serialization be used for persistent storage, or is that something sacrificed for performance? These issues would be of substantial interest to the community, and I would urge the authors to share their experience in greater detail.

I was also a little unclear about the thread structure of RMI. When I make an RMI to a remote object, does that create a new thread?
Are there deadlock issues here?

Referee 2 ****************************************************************

Subject: C428 JGSI Review

a) Overall recommendation: Accept

b) Words suitable for authors:

I enjoyed this paper very much. The paper is well written and backed up with a lot of good experimental work. Section 4, on alternative programming models, is also a good dissection of the issues. I imagine that the premise for this work---that high-performance metacomputing applications will typically run on collections of clusters or MPPs---is valid.

In the introduction, code mobility is mentioned as one strength of the Java-centric approach. However, because Manta uses a native compiler, it appears to me that Manta code is not mobile, although it does support heterogeneity. True?

In Subsection 2.3, on Manta on the wide-area DAS system, it is noted that Panda uses one dedicated gateway machine per cluster. Are there concerns about, or attempts to implement, fault tolerance? I realize that fault tolerance is not the focus of this work.

In the discussion of SOR, I appreciated the detail given on effecting asynchronous RMI via multiple threads. Similarly, I appreciated the detail in the discussion of the multiple threads used to implement broadcast for the all-pairs shortest paths problem. Along these lines, for the TSP problem, I would appreciate a bit more detail on how exactly the program does an RMI to all other workers to update their value for the best solution; no code was given for this. Specifically, does the method scale? Later in the paper, the authors note that such updates are infrequent. I imagine that the frequency decreases over time, being quite frequent at first. True? If not, please explain.

For IDA*, what is the victim selection method used for work stealing?

Again, the comparison of alternative programming models is clear and useful. I also think the conclusions about what is needed are correct (including the JavaSpaces shortcomings). I hope Sun sees this!
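To make my TSP question concrete, here is a rough sketch of the kind of best-bound broadcast I have in mind. This is hypothetical code of my own, not taken from the paper: `Worker`, `updateBest`, and the thread-per-target broadcast are all assumptions about how such an update might be structured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical worker: each one keeps a local copy of the best bound.
class Worker {
    private final AtomicInteger best = new AtomicInteger(Integer.MAX_VALUE);

    // In the real system this would be a remote method invoked via RMI.
    void updateBest(int candidate) {
        // Monotonic update: only strictly better (lower) bounds are kept.
        best.accumulateAndGet(candidate, Math::min);
    }

    int getBest() { return best.get(); }
}

public class TspBroadcast {
    // One thread per remote worker, so one slow wide-area link does not
    // serialize the whole broadcast behind it.
    static void broadcastBest(List<Worker> workers, int candidate)
            throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (Worker w : workers) {
            Thread t = new Thread(() -> w.updateBest(candidate));
            t.start();
            threads.add(t);
        }
        for (Thread t : threads) t.join();
    }

    public static void main(String[] args) throws InterruptedException {
        List<Worker> workers = new ArrayList<>();
        for (int i = 0; i < 4; i++) workers.add(new Worker());
        broadcastBest(workers, 100);
        broadcastBest(workers, 80);
        broadcastBest(workers, 120); // worse bound, silently ignored
        for (Worker w : workers) {
            assert w.getBest() == 80 : "expected 80, got " + w.getBest();
        }
        System.out.println("all workers hold bound " + workers.get(0).getBest());
    }
}
```

Even with one thread per target, the work is linear in the number of workers, which is why I ask whether the method scales.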
I also commend the authors on their related-work section: very professional. Overall, I think the paper makes several valuable contributions and is eminently publishable.

Referee 3 *************************************************************

C428 JGSI Review

Overall recommendation: Accept

Comments:

Good paper. It would be interesting to see how the performance numbers break down into computation and communication phases (especially for applications such as SOR, where the communication phase would be the exchange phase). Don't bother with TSP and IDA*: their irregularity makes it difficult to obtain any meaningful breakdown. However, I would like to hear a little more about IDA* in the discussion section. In particular, my own experience with IDA* (the version downloadable from the web) indicates that its measured execution times have huge variances: on average, the speedup for 8 nodes is poor (less than 2). I am interested in knowing whether the authors observed something similar.

Just my two cents' worth. The paper is fine as is.
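The breakdown suggested above does not require heavy machinery; a per-phase time accumulator along the following lines would suffice for SOR. This is a sketch of my own (the class name and the figures in `main` are invented for illustration), not instrumentation from the paper.

```java
// Hypothetical per-phase timer: call addCompute/addComm with the elapsed
// nanoseconds of each phase of every iteration, then report the split.
public class PhaseBreakdown {
    private long computeNanos;
    private long commNanos;

    void addCompute(long nanos) { computeNanos += nanos; }
    void addComm(long nanos)    { commNanos += nanos; }

    // Fraction of total measured time spent in communication.
    double commFraction() {
        long total = computeNanos + commNanos;
        return total == 0 ? 0.0 : (double) commNanos / total;
    }

    public static void main(String[] args) {
        PhaseBreakdown b = new PhaseBreakdown();
        // Simulated SOR iteration: stencil updates, then a boundary-row
        // exchange phase (numbers are made up for the example).
        b.addCompute(900_000_000L); // 0.9 s of computation
        b.addComm(100_000_000L);    // 0.1 s of exchange
        System.out.println("communication fraction: " + b.commFraction());
    }
}
```

In practice one would bracket each phase with `System.nanoTime()` calls and feed the differences into the accumulator.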