Referee 1 **************************************************

This paper presents a system called the Virtual Service Grid (VSG), which manages replication and load-balancing of network services in the wide area. The VSG automatically creates and destroys service replicas in response to client demand, and performs replica selection based on performance prediction for each service node. The paper suggests that this approach can be applied to a wide range of services, including high-end scientific computing applications as well as commodity Web-based services. Performance results on a distributed testbed demonstrate the viability of the VSG approach on a small scale.

I have a number of reservations about this paper that could perhaps be addressed by the authors through a round of (heavy) revisions. I am therefore recommending a marginal rejection, as I think these revisions would warrant re-review once completed.

My first concern is that this paper appears to draw very closely from three other papers by the same authors on similar topics, and I am worried that this might be the "least publishable unit". Only one of the three other papers is cited here, and it is not placed in context. What are the new results presented in this paper, and how does it build upon the previous work you have done?

There are several technical concerns about this work that, while the authors acknowledge them as areas for future work, seriously limit the applicability of the approach. First, the VSG assumes that services are stateless - a term that is not defined in this paper but presumably means that services can be started or stopped (or fail) at any time, have no side effects, are idempotent, and are homogeneous. In the real world these assumptions are very rarely (if ever) true. Even purely computational services generally produce a great deal of local state, and reuse of intermediate results is an important aspect of pipelining and optimization in this regime.
(This fact is even relied upon in the performance results presented in section 4 of the paper.) I have a hard time believing that this issue is really orthogonal to replication management and selection.

The introduction claims that this paper addresses fault transparency, supposedly by automatically failing over to a backup replica in the event of a service failure; however, the paper does not actually address this issue, which certainly complicates most of the techniques described here. If the paper does not address failures, then that should not be claimed in the introduction.

My most important reservation is that the paper does not appear to address scalability. The mechanisms presented here rely upon (a) a centralized replica manager (RM) for each service; (b) active network probing by individual clients or groups of clients; and (c) an apparently static configuration in which clients are bound to groups. The performance results are for a very small testbed (only around 16 clients with 14 replica-hosting machines), and obviously in this environment one can get away with these kinds of assumptions. In general, however, network services must support extremely bursty demand from potentially many tens of thousands of clients with no pre-defined administrative hierarchy or configuration. The authors claim that using a recent history of replica performance is adequate; it is not at all clear that this is true, given that sudden bursts can throw off performance estimates by orders of magnitude. The paper also does not describe the overhead of the monitoring and probing mechanisms or the frequency with which probes are made. For a prototype the authors may be justified in choosing this design, but it is unfortunate that they discuss neither the implications nor how they would overcome these limitations.

I am also concerned about the issues that arise in tuning the VSG system.
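To make the tuning concern concrete, consider a textbook discrete PID controller, which the paper's algorithm resembles. The sketch below is my own notation for illustration (kp, ki, kd, dt are generic gains and a time step, not the parameters actually used in the paper):

```python
# Sketch of a textbook discrete (positional) PID controller.
# All names (kp, ki, kd, dt) are illustrative notation,
# not the parameters actually used by the paper's algorithm.

class DiscretePID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0       # accumulated error over time
        self.prev_error = 0.0     # error from the previous step

    def step(self, setpoint, measurement):
        """One control step: returns the actuation signal
        (e.g. how strongly to add or remove replicas)."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)
```

Even this simplified controller has four interacting constants whose good values depend on the system being controlled (a poor choice of gains oscillates or overshoots), and the same difficulty applies to the parameters in the paper.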
The "P-Q Algorithm" (which looks like a classic discrete PID controller in disguise) has a number of complex parameters, and it is not clear how a system administrator would go about setting them. The choice seems to depend upon the hardware, network, application, client load, and many other factors - not all of which can be determined a priori. The paper presents a particular setting for these parameters, but nothing is said about how it was derived.

The related work section is somewhat weak and could do a better job of describing load-balancing and replication schemes in wide use in other domains. The Oceano project from IBM Research, as well as recent work in SOSP '01 by Jeff Chase, come to mind; although these deal mainly with replication and load management in local-area environments, it is important to draw parallels to this prior work. Akamai has done considerable work on wide-area replication for (static) services and relies upon complex techniques for replica selection based on performance criteria, yet little is said here about their approach.

My specific suggestions would be to reframe the introduction to state more precisely what problems this paper actually addresses, to spend less space on the myriad performance numbers (many of which do not contribute to a better understanding of the system), and to discuss in some depth how this approach could be extended to cover some of the limitations described above. As the paper stands, it is not obvious that the VSG could be extended to address scalability, fault tolerance, or stateful services.

F: Presentation Changes

Overall the paper is well written. A few minor points:

Figure 6 is very difficult to understand. It would be helpful for the various acronyms (GM, RM, etc.) to be defined in the caption; the meaning of the various boxes, circles, and lines is not at all clear. There also seems to be a problem with fonts, especially in the mathematical formulae. The caption on Figure 7 is truncated.
I'm not sure that publishing the hostnames of your testbed machines is a good idea, lest you invite a denial-of-service attack.

Section 4 degenerates into a large number of figures, and I'm not sure what high-level information the reader is supposed to derive from them. This is especially true of the scatterplots in Figures 17, 21, 24, and 26; smoothed averages or histograms/CDFs would have been more helpful. I would also be interested to see the number of replicas created, destroyed, and utilized during the discussion of replica creation and destruction; the figures show only the aggregate response time, which is a second-order effect.

Referee 2 **************************************************

This is an interesting paper addressing an important problem, and it is clearly written. I was not convinced by the particular applications chosen - matrix multiplication and full-matrix Jacobi iteration are hardly central computational science applications. I think some of the motivating examples in the beginning were much more to the point - remote data and visualization are very clear cases. Further, how does this work compare to that used in Akamai and other commercial settings? As the authors say, this basic issue stretches from commercial to technical computing.

So although this paper is not the "answer", it is a useful, well-written contribution. I recommend publication after some discussion of issues such as those raised above.