Subject: Re: referee report
Resent-Date: Mon, 11 Oct 1999 08:23:48 -0400
Resent-From: Geoffrey Fox
Resent-To: p_gcf@npac.syr.edu
Date: Sun, 10 Oct 1999 13:31:04 -0400
From: Maurice Herlihy
To: Geoffrey Fox

Referee report for Javelin++

Seems like a reasonable paper, but the presentation could be cleaned up a bit.

This paper describes a distributed work-stealing system, focusing mostly on two work-stealing algorithms.

The introduction is rambling and repetitious, talking about a number of important issues, most of which are only touched upon in the body of the paper. This paper is about load balancing, which is fine, but you have to get pretty far through the paper to realize that the other issues are dealt with only in passing. I would rather see an introduction that says ``there are n fundamental issues, but here we focus on scalable work stealing and load balancing''.

The paper would benefit from a more complete discussion of the kinds of applications Javelin++ is intended to support. The only example considered is rendering, which is the very model of an ``embarrassingly parallel'' application. I have no quarrel with this, but I think a discussion of how other kinds of applications (or even other applications) fit into their splittable interface would be helpful in understanding what this system can and can't do. Are there requirements about determinism? (What happens if you do the same piece twice, perhaps as a result of fault tolerance?) What if pieces need to share data? Can concurrent pieces communicate with one another?

When loading classes dynamically, does Javelin++ ensure that all the classes needed by an application have been loaded before the computation starts? I can imagine that pausing an application in the middle to load a user-defined class could be disruptive.

What was the broker structure for the test application? Are there any performance numbers for broker lookup, reconfiguration, etc.?