Referee 1 **********************************************************

This paper is very well written, interesting to read, and provides a large number of experimental results. The authors provide adequate discussion of their results. I am providing here detailed comments on several points that I think could be made clearer.

Section 2: I think that this section should contain some qualitative discussion of why one model is more "user-friendly" than the other. Figure 2 does not make enough of a point on its own, and the text could develop a little more the fact that explicit yields are more complicated; maybe a more complex example would do. The text also misses an explicit statement about which model is less intuitive or attractive to developers. (An illustrative sketch of the two styles follows this report.) This brings me to a general comment about the paper. The abstract foreshadows a trade-off between "complicated code" and "performance", which makes sense. However, throughout the paper the concern is mostly performance; in fact, the only place where programming styles are mentioned is in Section 6 (an 8-line paragraph on page 24). I would have liked to see more mention of that trade-off during the description of experimental results, if only to remind the reader why all this is worth the trouble.

Section 3.1: I would maybe rename this section "Key CP Implementation Issues", as it is entirely about the CP programming paradigm.

Sections 3.2.1 and 3.2.2: I felt that these two sections were a little difficult to read. After spending quite a lot of time analyzing them I believe they are correct. A table or a paragraph describing the use of locks for each strategy on each platform would help, as in: "SR_ic on a single processor: no locking; SR_ts on a single processor: a simple locked flag; etc." That way the reader can simply refer to that table while following the development, rather than constantly referring back to Section 3.1 while reading the paper.

Section 5: I think that this section should explicitly contrast the results with those of the previous section for SR. There are so many results in Section 4 that the reader needs a reminder of how the Java results are similar to or different from the SR results.

Section 6: This section should really start with a summary of results. Throughout the paper the reader finds out facts such as "SR_ic is good when there is a lot of synchronization", "SR_ic is bad when there is little work", etc. I suggest you go through Sections 4 and 5, find all such statements, and summarize the trends at the beginning of Section 6. My experience reading the paper was that I kept going back and forth between sections to remind myself of previous results and the reasons for them. I believe everything is there, but a summary is necessary. Maybe a section right before Section 6 called "Summary of results and trends" would be good, or summaries at the end of Sections 4 and 5. I realize that some trends are not as clear-cut as one would hope, but I still think the authors can provide a coherent summary.

F: Presentation Changes

The presentation of the experiments and results is really good and the paper is extremely well written. Besides my comments above I do not have further comments.
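[Editor's note: to make the trade-off raised under Section 2 concrete, here is a minimal Java-flavoured sketch of the two styles. It is an illustration only, not code from the paper; the class and method names are invented, and real Java threads are preemptively scheduled, so Thread.yield() only approximates an SR-style cooperative yield.]

    // Editor's illustration only; names are hypothetical and not taken from the paper.
    class Counter {
        private int value = 0;

        // Preemptive ("concurrent programming") style: the scheduler may switch
        // threads between any two instructions, so shared state needs a lock.
        synchronized void incrementPreemptive() {
            value++;
        }

        // Cooperative ("explicit yield") style: the thread keeps the processor
        // until it volunteers to give it up, so no lock is needed, but the
        // programmer must remember to insert yield points by hand.
        void incrementCooperative() {
            value++;            // safe only if no preemption can occur here
            Thread.yield();     // explicit scheduling point; in Java this is
                                // merely a hint, unlike a true cooperative yield
        }
    }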
Referee 2 **********************************************************

The paper presents a detailed analysis of various concurrent programming models. In particular, the authors modeled and evaluated the runtime efficiency and code complexity of a cooperative multithreading paradigm and a concurrent programming paradigm, using modified versions of the SR programming language and toy benchmarks running on Linux PC systems. The authors were especially thorough in their analysis by introducing context-switching modeling into their concurrent programming model. This addition (context switching) made the comparison between the two models (concurrent programming versus cooperative multithreading) especially realistic.

Overall, this paper was excellent. The modeling methodology is sound and the results are thoroughly analyzed. Readers who work with grid computing or other large-scale parallel systems can use this paper to help determine which approach (concurrent programming or cooperative multithreading) best suits their parallel programming needs.

My only concern about this paper is that the authors should add a sentence or two somewhere around the third paragraph of Section 1 to better distinguish their definition of multithreading from the multithreading currently coming out in commercial processors (IBM Power 4, Alpha 21464, Intel Xeons). These new processors execute multiple threads concurrently, not one at a time as the authors' definition of multithreading assumes.

F: Presentation Changes

I suggest using numbers, rather than Roman numerals, to refer to tables. Numbers tend to be less confusing, especially to readers who are skimming the paper. For example, Table XI -> Table 11.

Referee 3 **********************************************************

A comparison of two models of quasi-parallel computation, concurrent programming and cooperative multi-tasking, is made on the basis of various timing experiments. The experiments compare the performance of several implementations of both models on four example problems: producer-consumer, reader-writer, dining philosophers, and Jacobi iteration. It is assumed that the number of processes greatly exceeds the number of processors (in the paper the maximum number of processors used for experiments is two); thus, the comparison seeks to analyse the efficiency of scheduling processes onto processors in both computational models. The models are realised in two implementation languages, namely SR and Java. The results are not conclusive but indicate that cooperative multi-tasking is generally the more efficient model (albeit only marginally better for large work loads). The results are not surprising; one would expect a model in which process activity can be controlled explicitly (by commands such as yield) to produce more efficient results - given good implementations - than a model in which process control is managed behind the scenes by an operating system. The conclusions are weak - "cooperative multithreading models deserve further exploration". I am not convinced that a reader would be satisfied with this conclusion after working through a 26-page paper.

The subject matter of the paper is ideally suited to Concurrency and Computation: Practice and Experience. The work is an extension of an earlier Euro-Par paper. The current paper contains the same four examples, and considerable portions of text have been reproduced (are there copyright implications?).
However, the results have been updated (more efficient implementations), the section comparing the implementations of the two models in Java is entirely new, and there are some new experiments which analyse, for example, the effect of changing the granularity of the time slice. The conclusion of the current paper adds little to the Euro-Par conclusion.

The strength of the paper is the detail with which (i) each timing experiment is described and (ii) each result is analysed. There are often practical details which might be of interest to a general audience - for example, the authors describe the (unwanted) effects that a Java garbage-collection thread can have on timing experiments.

My reservation about the paper is that the timing results are often what one would expect. Would it be possible to design new experiments which would shed further light on the sequence of process activation (i.e. give a trace of process activity in an execution)? Or perhaps report on experiments which analyse whether processes carry out the same amount of work in each time slice? Perhaps it would then be possible to make further remarks on the fairness of process activation.

I cannot make a strong case for accepting the paper as it stands:

1. The conclusion section is little more than a summary. The authors should try to say in what circumstances one model is to be preferred over the other.

2. I am unconvinced that timing experiments alone allow one to conclude which model is better. It may be possible to improve the interest level of the paper by conducting a small number of further experiments which analyse both fairness of activation and the amount of work done by a process in a given time slice (a brief illustrative sketch follows this report).

3. The presentation of the paper should be improved (see below).

F: Presentation Changes

1. There are too many tables in the paper (26) - the authors should try to compress much of this data into graphs (for example, some of the results for the four problems could be compressed into a single graph, as in Figure 2 of the Euro-Par paper).

2. Experimental information is often repeated - the authors should try to avoid saying the same thing several times.

3. The descriptions of the Green-threads and native-threads versions of Java are not clear.
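[Editor's note: for point 2 above, the kind of instrumentation suggested might look roughly like the following minimal Java sketch. It is an editor's illustration with invented names, not code from the paper; each worker thread counts the work units it completes, so the spread across workers gives only a rough indication of fairness of activation, not a per-time-slice trace.]

    // Editor's illustration; class and variable names are hypothetical.
    import java.util.concurrent.atomic.AtomicLongArray;

    public class FairnessProbe {
        public static void main(String[] args) throws InterruptedException {
            final int workers = 4;
            final AtomicLongArray unitsDone = new AtomicLongArray(workers);
            Thread[] pool = new Thread[workers];

            for (int i = 0; i < workers; i++) {
                final int id = i;
                pool[i] = new Thread(() -> {
                    long sink = 0;
                    long deadline = System.nanoTime() + 1_000_000_000L;  // run for about one second
                    while (System.nanoTime() < deadline) {
                        for (int k = 0; k < 10_000; k++) sink += k;      // one "work unit" of busy work
                        unitsDone.incrementAndGet(id);
                    }
                    if (sink == 42) System.out.print("");                // keep the busy loop from being optimised away
                });
                pool[i].start();
            }
            for (Thread t : pool) t.join();

            // A large spread between workers would suggest unfair activation.
            for (int i = 0; i < workers; i++) {
                System.out.println("worker " + i + ": " + unitsDone.get(i) + " work units");
            }
        }
    }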