Referee 1 ****************************************************************************

This is a well-conducted study of a parallel slicing algorithm for concurrent programs operating in a Unix-like environment. The paper is well structured and mostly follows a logical progression.

My main concern is the originality and relevance of the study. This problem was studied extensively in the automatic parallelization work (of Fortran, for example) of the 1980s, where the intermediate structures were usually called directed acyclic graphs (DAGs). It was established that parallelism obtained this way yields only very modest performance gains, and that better performance can be achieved only in the presence of loops (not tackled by the authors), for which a large body of literature exists. The experiments reported in your paper show 50% efficiency on 4 processors, which confirms that claim. However, the results of the study do seem to work well in a specific context, and the approach is described in enough detail to be implemented, so the paper has practical value in some cases. I would ask the authors to improve the evaluation section to clarify exactly what they have shown, i.e. how we can expect the results to generalize to other codes and to more processors.

There are also some problems with the way the paper is structured. I had trouble understanding Section 2 because too many of its concepts are only introduced much later (CFG, RCFG, etc.), so it should be removed or placed much later. Also, the work reported in the paper is an extension of previous work reported as reference [4]; it is not very clear what the new contribution of this paper is in relation to that one.

Presentation Changes

General: the graphs need to be named consistently. Sometimes you use abbreviations (CFG, RCFG), sometimes a full name (process graph, concurrency graph). It is best to stick to one convention.
-page 1: "at distributed location" -> "at distributed locations"
-page 6: "can handle only" -> "can only handle"
-page 12: "Wesier" -> "Weiser"
-page 13: it is difficult to understand what the slicing criterion is until page 17, so some rearranging is needed here.
-page 13: "as Weiser slice" -> "as a Weiser slice"
-page 13: "we extend the parallel algorithm proposed in [5]": explain what that original algorithm was proposed for.
-page 15: "Behavior of each" -> "The behavior of each"
-page 17: "process behavior already discussed" -> "already discussed process behavior"
-page 21: "then submit to the Scheduler" -> "then submit them to the Scheduler"
-page 21: "8 sample programs" -> how are they characterized? Besides their length, how many independent threads of computation do they contain?

Referee 2 ****************************************************************************

Summary: This paper is about static slicing: finding the set of program statements that might influence a given variable's value at a specified program point. The paper concerns slicing of concurrent programs, and in particular how to reduce the time taken to compute the slice using parallel processing. It combines two published ideas: (1) the authors' algorithm for static slicing of Unix-style concurrent programs, and (2) Danicic et al.'s parallel process network slicing algorithm. The paper explains how the scheme should work, presents an argument for its correctness, and gives some experimental performance results.

Strengths: This is a short paper presenting a simple idea, and it addresses a fairly interesting problem.

Weaknesses:
1) The experimental evaluation has little or no value:
- Although speedup is reported, execution time is not. Thus, there is no baseline against which to evaluate performance.
- The algorithm was evaluated on a series of artificial programs of increasing size. No further details of these programs are given. Again, this makes interpretation of the performance figures impossible.
2) The contribution of the paper appears small. It looks like a small increment over the authors' prior work and the Danicic et al. paper.
3) The authors fail to persuade me that parallelisation is a sensible approach to improving the performance of a slicing system:
- It is well known that the efficiency of data flow analyses depends crucially on traversal order. Does this parallel algorithm perform the same amount of work as the optimal sequential algorithm?
- Has the sequential implementation taken advantage of careful data structure design (at least bit-vectors)? The paper says nothing about this and reports no sequential performance data.
- What is the asymptotic complexity of the slicing problem? Is the complexity of slicing concurrent programs worse than that of slicing sequential programs? How much parallelism is actually present: are there worst-case programs whose analysis runs sequentially?

Conclusion: Unfortunately, this paper is not suitable for publication in its current form. It needs a better presentation of the experimental results and more analysis of the algorithmic complexity and the available parallelism. On a more positive note, I very much hope to see static slicing tools for real languages with concurrency, and I encourage the authors to develop and apply their ideas.

Referee 3 ****************************************************************************

I am not an expert in this area, but I think I can comment usefully on Section 7. How sensitive are the results to the parallel architecture, e.g. what happens on distributed-memory parallel machines? What happens if you use a more efficient parallel execution environment? You seem to use UNIX processes, which are very heavyweight.

Referee 4 ****************************************************************************

I find the following problems with the work:
1. The language seems too simplified. In the sequential setting it has no aliases, irregular control flow, etc., which makes it unrealistic from a practical point of view. In the parallel setting, it has no interrupt-driven communication (such as hrecv(), etc.).
Many modern languages, including Java, have these features, and recent work on slicing is focused on these issues.
2. Why do you not use the notion of control dependence, etc., to develop sequential slices? Definition 1 seems weak and lacks a clear explanation. See Hiralal Agrawal's work in PLDI '93, which formalized a systematic way of discovering the closure in terms of control and data dependences (by the way, this work needs to be cited).
3. Your performance figures are ad hoc. In particular, which benchmarks did you use? What was the baseline of comparison? Please also provide the raw timing numbers. Especially in a practical journal such as Concurrency: Practice and Experience, these things are essential.
4. Which system did you use to program your parallel algorithm? Some analysis of the complexity of the parallel algorithm is needed in case you submit it to a theoretically oriented journal such as Parallel Computing.

Presentation Changes

Up front, the presentation needs improvement. You should motivate the problem by citing practical examples and explaining why slicing is time consuming. Also, distinguish between static and dynamic slicing. A more thorough survey of current work on slicing is essential.
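To make concrete the notion of static slicing described by Referee 2, and the dependence-closure formulation raised by Referee 4, here is a minimal sketch of the generic textbook approach: a backward slice computed as the transitive closure of data and control dependences. This is not the authors' parallel algorithm; the example program and its dependence table are invented purely for illustration.

```python
# A backward static slice as a closure over data and control
# dependences. Statements are numbered; deps[s] lists the statements
# that s depends on (data or control), for this hypothetical program:
#
#   1: n = read()
#   2: i = 1
#   3: prod = 1
#   4: while i <= n:        # controls statements 5, 6, 7
#   5:     prod = prod * i
#   6:     s = s + i        # irrelevant to prod
#   7:     i = i + 1

deps = {
    1: [],
    2: [],
    3: [],
    4: [1, 2, 7],      # data deps on n and i
    5: [3, 5, 7, 4],   # data deps on prod, i; control dep on 4
    6: [6, 7, 4],
    7: [2, 7, 4],
}

def backward_slice(criterion):
    """Slice w.r.t. a statement of interest: the set of statements
    reachable from it by following dependences backwards."""
    slice_set, worklist = set(), [criterion]
    while worklist:
        s = worklist.pop()
        if s not in slice_set:
            slice_set.add(s)
            worklist.extend(deps[s])
    return sorted(slice_set)

print(backward_slice(5))   # -> [1, 2, 3, 4, 5, 7]; statement 6 is excluded
```

Statement 6 drops out of the slice because nothing on the dependence path from the criterion reads the variable it defines; this is exactly the pruning effect the slicing literature exploits.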