Referee 1 ****************************************************************************

This is a well-conducted study of a parallel slicing algorithm for concurrent programs operating in a Unix-like environment. The paper is well structured and mostly follows a logical progression.

My main concern is the originality and relevance of the study. This problem was studied extensively in the automatic parallelization work (of Fortran, for example) of the 1980s, where the intermediate structures were usually called directed acyclic graphs (DAGs). It was established that parallelism obtained this way yields only very modest performance gains, and that better performance can be achieved only in the presence of loops (not tackled by the authors), for which a large body of literature exists. The experiments reported in your paper show 50% efficiency on 4 processors, which confirms that claim. However, the results of the study do seem to work well in a specific context, and the approach is described in enough detail to be implemented, so the paper has practical value in some cases. I would ask the authors to improve the evaluation section to clarify exactly what they have shown, i.e. how we can expect the results to generalize to other codes and to more processors.

There are also some problems with the way the paper is structured. I had trouble understanding Section 2 because too many of its concepts are only introduced much later (CFG, RCFG, etc.), so it should be removed or placed much later. Also, the work reported in the paper is an extension of previous work reported as reference [4]; it is not very clear what the new contribution of this paper is in relation to that one.

Presentation Changes

General: the graphs need to be named consistently. Sometimes you use abbreviations (CFG, RCFG), sometimes a full name (process graph, concurrency graph). It is best to stick to one convention.
-page 1: "at distributed location" -> "at distributed locations"
-page 6: "can handle only" -> "can only handle"
-page 12: "Wesier" -> "Weiser"
-page 13: it is difficult to understand what the slicing criterion is until page 17, so some rearranging is needed here.
-page 13: "as Weiser slice" -> "as a Weiser slice"
-page 13: "we extend the parallel algorithm proposed in [5]": explain what that original algorithm was proposed for.
-page 15: "Behavior of each" -> "The behavior of each"
-page 17: "process behavior already discussed" -> "already discussed process behavior"
-page 21: "then submit to the Scheduler" -> "then submit them to the Scheduler"
-page 21: "8 sample programs" -> how are they characterized? Besides their length, how many independent threads of computation do they contain?

Referee 2 ****************************************************************************

Summary: This paper is about static slicing: finding the set of program statements that might influence a given variable's value at a specified program point. The paper concerns slicing of concurrent programs, and in particular how to reduce the time taken to compute the slice using parallel processing. It combines two published ideas: (1) the authors' algorithm for static slicing of Unix-style concurrent programs, and (2) Danicic et al.'s parallel process network slicing algorithm. The paper explains how the scheme should work, presents an argument for its correctness, and gives some experimental performance results.

Strengths: This is a short paper presenting a simple idea, and it addresses a fairly interesting problem.

Weaknesses:
1) The experimental evaluation has little or no value:
- Although speedup is reported, execution time is not. Thus, there is no baseline against which to evaluate performance.
- The algorithm was evaluated on a series of artificial programs of increasing size. No further details of these programs are given. Again, this makes interpretation of the performance figures impossible.
2) The contribution of the paper appears small. It looks like a small increment over the authors' prior work and the Danicic et al. paper.
3) The authors fail to persuade me that parallelisation is a sensible approach to improving the performance of a slicing system:
- It is well known that the efficiency of data flow analyses depends crucially on traversal order. Does this parallel algorithm perform the same amount of work as the optimal sequential algorithm?
- Has the sequential implementation taken advantage of careful data structure design (at least bit-vectors)? The paper says nothing about this and reports no sequential performance data.
- What is the asymptotic complexity of the slicing problem? Is the complexity of slicing concurrent programs worse than that of slicing sequential programs? How much parallelism is actually present: are there worst-case programs whose analysis runs sequentially?

Conclusion: Unfortunately, this paper is not suitable for publication in its current form. It needs a better presentation of the experimental results and more analysis of the algorithmic complexity and the available parallelism. On a more positive note, I very much hope to see static slicing tools for real languages with concurrency, and I encourage the authors to develop and apply their ideas.

Referee 3 ****************************************************************************

I am not an expert in this area, but I think I can comment usefully on Section 7. How sensitive are the results to the parallel architecture, e.g. what happens on distributed-memory parallel machines? What happens if you use a more efficient parallel execution environment? You seem to use UNIX processes, which are very heavyweight.

Referee 4 ****************************************************************************

I find the following problems with the work:
1. The language seems too simplified. In the sequential setting it has no aliases, irregular control flow, etc., which makes it unrealistic from a practical point of view. In the parallel setting, it has no interrupt-driven communication (such as hrecv(), etc.).
Many modern languages, including Java, have these features, and recent work on slicing is focused on these issues.
2. Why do you not use the notion of control dependence, etc., to develop sequential slices? Definition 1 seems weak and lacks a clear explanation. See Hiralal Agrawal's work in PLDI '93, which formalized a systematic way of discovering the closure in terms of control and data dependences (by the way, this work needs to be cited).
3. Your performance figures are ad hoc. In particular, which benchmarks did you use? What was the baseline of comparison? Please also provide the raw timing numbers. Especially in a practical journal such as Concurrency: Practice and Experience, these things are essential.
4. Which system did you use to program your parallel algorithm? Some analysis of the complexity of the parallel algorithm is needed in case you submit it to a theoretically oriented journal such as Parallel Computing.

Presentation Changes

Up front, the presentation needs improvement. You should motivate the problem by citing practical examples and explaining why slicing is time consuming. Also, distinguish between static and dynamic slicing. A more thorough survey of current work on slicing is essential.
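To make concrete the notion of static slicing described by Referee 2, and the dependence-closure formulation raised by Referee 4, here is a minimal sketch of the generic textbook approach: a backward slice computed as the transitive closure of data and control dependences. This is not the authors' parallel algorithm; the example program and its dependence table are invented purely for illustration.

```python
# A backward static slice as a closure over data and control
# dependences. Statements are numbered; deps[s] lists the statements
# that s depends on (data or control), for this hypothetical program:
#
#   1: n = read()
#   2: i = 1
#   3: prod = 1
#   4: while i <= n:        # controls statements 5, 6, 7
#   5:     prod = prod * i
#   6:     s = s + i        # irrelevant to prod
#   7:     i = i + 1

deps = {
    1: [],
    2: [],
    3: [],
    4: [1, 2, 7],      # data deps on n and i
    5: [3, 5, 7, 4],   # data deps on prod, i; control dep on 4
    6: [6, 7, 4],
    7: [2, 7, 4],
}

def backward_slice(criterion):
    """Slice w.r.t. a statement of interest: the set of statements
    reachable from it by following dependences backwards."""
    slice_set, worklist = set(), [criterion]
    while worklist:
        s = worklist.pop()
        if s not in slice_set:
            slice_set.add(s)
            worklist.extend(deps[s])
    return sorted(slice_set)

print(backward_slice(5))   # -> [1, 2, 3, 4, 5, 7]; statement 6 is excluded
```

Statement 6 drops out of the slice because nothing on the dependence path from the criterion reads the variable it defines; this is exactly the pruning effect the slicing literature exploits.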