I enclose three referee reports on your paper. We would be pleased to accept it; could you please send me a new version before November 5, 1999? Please also send a memo describing any referee suggestions that you did not address. Ignore any aggressive remarks you do not think appropriate, but please tell me. I trust you! Thank you for your help in writing and refereeing papers!

Referee 1
************************************************************

Subject: C434 JGSI Review
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Overall Recommendation: Accept
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This paper describes the content of, and the design philosophy behind, a benchmark suite for Java Grande applications, and presents the results of benchmarking a half-dozen hardware/JVM combinations. The paper is well written, interesting, and highly relevant to the Java Grande community.

General Comments
~~~~~~~~~~~~~~~~

I have only fairly minor comments on the paper's content.

1. The last paragraph of section 2 touches on "a feature peculiar to Java benchmarking, which is that it is possible to distribute the benchmark without revealing the source code." The importance of this point is lessened by the wide availability of Java decompilers, which produce quite readable Java source code.

2. At the top of the right column on page 2, it is stated that I/O components of the benchmarks have been removed, presumably because they are not considered relevant to Java Grande applications. Yet in the first paragraph of the introduction, large network requirements are considered one of the hallmarks of Grande applications (and perhaps disk usage should be added to that list). Perhaps the paragraph in section 3 meant to refer to "*terminal* I/O"?

3. In the discussion in section 4.1 on the meaning of the temporal and relative performance metrics, I think it would be worth adding an explicit statement that bigger values under both metrics indicate *better* performance.

4. The last paragraph of section 7 (describing how to obtain the benchmarks) does not constitute future work; it should probably be moved to the very beginning of section 5.

Detailed Comments
~~~~~~~~~~~~~~~~~

These comments are just minor nitpicks.

1. The acronyms EPCC and MPI should be spelled out on the first use of each.

2. In section 5, the descriptions of some benchmarks begin with a verb (e.g., "measures", "tests"), while others begin with "This benchmark measures..." or "This benchmark tests...". For uniformity, they should all be changed to follow the same convention. Similarly, some descriptions say "Performance units are..." while others say "Results are reported in...". Finally, some say something like "This kernel/benchmark exercises...", while others simply say something like "Memory and integer intensive."

3. There are a couple of places where a comma appears immediately before a parenthetical remark; these commas should be moved after the closing right parenthesis.

4. Pet peeve: there are many, many instances of the word "which" that should be replaced by "that". For a guide to the correct usage, see the topic on "Which-Hunting" in "A Handbook for Scholars", Mary-Claire van Leunen, Oxford University Press, 1992.

5. Typos and suggested improvements:

Pg 1, col 1, section 1, line 3: "...well outside its original design specifications." -> "...well outside its original design goals."

Pg 1, col 1, line -3 (counted from bottom): "Show that real large scale codes can be..." -> "Show that real, large-scale codes can be..."
Pg 1, col 1, line -3: "...can be written and provide..." -> "...can be written, and provide..."

Pg 1, col 2, line 2: "...execution environments thus allowing..." -> "...execution environments, thus allowing..."

Pg 1, col 2, line 6: "...to Grande Applications and in doing so encourage..." -> "...to Grande Applications, and in doing so encourage..."

Pg 1, col 2, section 2, line 6: "...a number of benchmarks [] ..." -> "...a number of micro-benchmarks [] ..."

Pg 1, col 2, last line: "These are useful in that they can be representative..." -> "These are useful in that they are representative..."

Pg 2, col 1, "Robust" item: "The performance of suite ..." -> "The performance of the suite ..."
  Regarding the Robustness criterion, I am dubious as to whether it is possible to eliminate hardware factors (such as cache size) from a performance measurement.

Pg 2, col 1, "Portable" item: "...a variety of Java environments as possible." -> "...a variety of Java platforms as possible."

Pg 2, col 1, line -12: "..., we provide three types of benchmark, ..." -> "..., we provide three benchmark types, ..."

Pg 2, col 1, line -4: "...of real applications running under the Java environment." -> "...of real Java applications."

Pg 2, col 2, line 12: "We also choose the kernels..." -> "We also chose the kernels..."

Pg 2, col 2, line -24: "..., as well as ensuring adherence to..." -> "..., as well as to ensure adherence to..."

Pg 3, col 1, line 14: "Relative performance is the ration of temporal performance ..." -> "Relative performance is the ratio of temporal performance ..."

Pg 3, col 1, line 15: "... that is a chosen JVM/operating system/hardware combination." -> "... that is, a chosen JVM/operating system/hardware combination."

Pg 3, col 1, line -16: "Accessing benchmark methods as class methods." -> "Accessing benchmark methods as static methods." (This ain't Smalltalk ;-)

Pg 3, col 1, line -8: "We can force compliance to common structure..." -> "We can force compliance to a common structure..."

Pg 3, col 2, line -14: "...to different types of variable." -> "...to different types of variables."

Pg 5, col 1, line 9: "This performs a one-dimensional forward transform..." -> "This performs a one-dimensional Fourier transform..."

Pg 5, col 1, Sparse: The first and third sentences can be merged as follows: "Multiplies an N x N sparse matrix stored in compressed-row format with a prescribed sparsity structure by a dense vector 200 times."
  "This kernel exercises indirection-addressing and..." -> "This kernel exercises indirect-addressing and..."

Pg 5, col 1, Search: "... using a alpha-beta pruned search technique." -> "... using an alpha-beta pruned search technique."

Pg 5, col 2, section 6.1, line 3: "Also of interest is language comparisons, comparing the performance of Java versus other programming languages such as Fortran, C and C++." -> "Also of interest are language comparisons, that is, comparing the performance of Java to other programming languages such as Fortran, C and C++."

Pg 5, col 2, section 6.1, line 6: "Currently, the LUFact and MolDyn benchmarks, allow..." -> "Currently, the LUFact and MolDyn benchmarks allow..."

Pg 5, col 2, section 6.1, lines 7-10: "It is intended, however, that the parallel part of the suite will contain versions of well-known Fortran and C parallel benchmarks, ..." -> "However, we intend the parallel part of the suite to contain versions of well-known Fortran and C parallel benchmarks, ..."
Pg 5, col 2, section 6.1, lines 11-17: "Measurements have been taken for the Linpack Benchmark (on a 1000 x 1000 problem size) and the Molecular Dynamics benchmark (2048 particles), using Java (Sun JDK 1.2.1 02 production version, and Sun JDK 1.2.1 reference version + Hotspot 1.0), Fortran and C on a 250MHz Sun Ultra Enterprise 3000 with 1Gb of RAM and the results are shown in Figure 3." -> "Measurements have been taken for the LUFact benchmark (on a 1000 x 1000 problem size) and the MolDyn benchmark (2048 particles) using Java (Sun JDK 1.2.1 02 production version, and Sun JDK 1.2.1 reference version + Hotspot 1.0), Fortran and C on a 250MHz Sun Ultra Enterprise 3000 with 1Gb of RAM. The results are shown in Figure 3."

Pg 8, col 2, line 1: "Consideration of these issues has lead us to decide ..." -> "Consideration of these issues has led us to decide ..."

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Referee 2
********************************************************************

C434 JGSI Review

Paper: A Benchmark Suite for High Performance Java
Authors: J M Bull, L A Smith, M D Westhead, D S Henty and R A Davey
Number: C434

a) Overall Recommendation

Scale used: 1 (trivial) to 5 (outstanding)

Recommendation: 3, accept
Technical Contribution: 3
Technical Content: 3
Presentation: 3

Accept it.

b) Words suitable for authors

I recommend "weak acceptance". The work presented does not yet make a significant contribution to the area of Java benchmarking, but it summarizes well the effort that this research group and others are putting into defining benchmarks for Java Grande applications. The main problem with this work is that, in this reviewer's opinion, the results in the paper do not test the methodology the authors propose for Java benchmarking.

1) Releasing the Java benchmarks as source code rather than as bytecode programs is an important design decision. I strongly support source code release. Much performance analysis depends on program semantics that, although recoverable from bytecodes, is much easier to recover from source code. Releasing source also removes any dependence of the benchmarks on a particular Java front-end/bytecode compiler, leaving the JVM performance analyst an open choice of compilers to use.

2) Section 5, Current Status, subsection 5.1, Low Level Operations

The cast operation tests should include dynamic type checking tests for casts of object (reference) types. Casts of primitive data types are not the only ones of relevance. JVMs have to perform many implicit dynamic type checks, and it would be interesting to see how different Java engines optimize these checks.

3) Section 6.1, Programming Language Comparison

Are the C and Fortran code versions 100% equivalent to the Java versions? Have they been modified to include Java's implicit run-time checks? If not, the results reflect not only the differences in language implementation but also the overhead of the extra security policies imposed by Java. The authors should point this detail out.

4) Section 6.2, JVM Comparison

The whole purpose of constructing this benchmark suite was to point out where the differences among JVMs lie. However, the results presented only permit one to say whether a certain JVM performs better or worse than the others; they provide no more detailed insight. So these results fall short of the initial goal of this project.
How can more detailed performance comparison information be extracted from the benchmark suite as constructed so far?

One problem pointed out by the authors relates to how to force JIT compilation of methods in different JVM engines. I do not see that as an issue. The performance analyst has to understand that there are different technologies for executing Java, and that these technologies yield different performance improvements and have different system requirements. So, when comparing technologies, one has to be aware that different systems exist and that comparisons across systems may not be fair. Overall, the technologies for executing Java fall into the following groups:

Interpreters
JIT compilers
 - simple, baseline compilers
 - more optimizing compilers, but still quick
 - dynamic optimizing and re-optimizing compilers

c) Words for the committee, if necessary

I recommend "weak acceptance". The work presented does not yet make a significant contribution to the area of Java benchmarking, but it summarizes well the effort that this research group and others are putting into defining benchmarks for Java Grande applications. The main problem with this work is that, in this reviewer's opinion, the results in the paper do not test the methodology the authors propose for Java benchmarking.

Referee 3
********************************************************

Subject: C434 JGSI Review

a) Overall: Good

b) I think this benchmark suite is very valuable. In particular, the "transparency" described in section 3 makes it so. The results from the benchmark tests are very interesting, but if possible, explaining the reasons behind the results (e.g., why is NT + HotSpot good at Search but bad at MolDyn?) would help programmers to write efficient code. I am eagerly looking forward to the parallel version of the JGF benchmark.
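
P.S. On Referee 2's point about forcing JIT compilation before timing: one common approach is simply to run each kernel untimed for a warm-up phase, so that a JIT-based JVM has a chance to compile it, and only then start the clock. The sketch below is purely illustrative; the class name, kernel, and iteration counts are invented here and are not taken from the JGF suite.

public class WarmupTimer {

    // Kernel under test: any repeatable, self-contained computation will do.
    static double kernel(int n) {
        double sum = 0.0;
        for (int i = 1; i <= n; i++) {
            sum += 1.0 / i;
        }
        return sum;
    }

    public static void main(String[] args) {
        final int size = 1000000;         // problem size (illustrative)
        final int warmupIterations = 50;  // untimed runs, to encourage JIT compilation
        final int timedIterations = 50;   // runs that are actually measured

        double sink = 0.0;                // consume results so the work is not optimized away

        // Warm-up phase: a JIT-based JVM should compile kernel() during these runs;
        // an interpreter simply executes it a few extra times.
        for (int i = 0; i < warmupIterations; i++) {
            sink += kernel(size);
        }

        // Timed phase.
        long start = System.currentTimeMillis();
        for (int i = 0; i < timedIterations; i++) {
            sink += kernel(size);
        }
        long elapsed = System.currentTimeMillis() - start;

        // Report work per second for the timed phase only.
        double operations = (double) size * timedIterations;
        System.out.println("elapsed ms : " + elapsed);
        System.out.println("ops/second : " + operations / (elapsed / 1000.0));
        System.out.println("(checksum  : " + sink + ")");
    }
}

On an interpreter the warm-up phase just costs a few extra runs, so the same harness can be used unchanged across the execution technologies Referee 2 lists.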