Paper Number: C424
Title: Performance Measurement of Dynamically Compiled Java Executions
Authors: Tia Newhall and Barton P. Miller

The paper describes Paradyn-J, a Java performance measurement tool, and explains how Paradyn-J copes with dynamically compiled executions: it correlates the performance of the bytecode and native-code versions of the same method and also accounts for the costs of dynamic compilation itself. Constructs that limit the effectiveness of dynamic compilation are identified. Using performance data from Paradyn-J, the authors achieved speedups of about 10% with a simulated JIT compiler (i.e., the native code was produced by hand rather than by a real JIT compiler). Both synthetic benchmarks and a neural net application are used to generate results.

Recommendation: Major revision. The authors do not show how the results they obtained with a simulated JIT compiler relate to any real JIT compiler; the results may be significantly better than could be obtained with a real JIT compiler (see major comment 1 below) or statistically insignificant (see major comment 2 below).

Comments for the authors
------------------------

Major comments:

1. The major flaw I see in this paper is the use of a simulated JIT compiler. This is justified on page 6 by saying that no source code was available for ExactVM or HotSpot. However, this is not sufficient justification. Source code *is* available for Sun's Reference Implementation of the JDK, which includes a JIT compiler. Also, Sun is not the only producer of JIT compilers: the Kaffe JVM, from Transvirtual, includes a JIT and is available under the GNU General Public License. Furthermore, simulating the JIT compiler rather than using a real one introduces factors that need to be controlled for, and the paper offers no evidence that this was done. These factors include:

   a) The quality of the generated code. The paper seems to indicate that the compiled code was written and tuned by hand. JIT compilers are not known for producing high-quality native code, so the results may have been skewed in the authors' favor by running native code that was significantly better than a JIT compiler would produce.

   b) Memory usage. The simulator, as described, will use less memory than a real JIT compiler, since the JIT ends up with both the bytecode and native-code versions AND also needs space for holding information during the compile. This, again, may skew the results in the authors' favor.

   c) Dynamic library costs. The simulator loads the precompiled code from a dynamic library, whereas a JIT would attach the native code to the class file using attributes. The cost of managing the linking of a dynamic library versus looking up the compiled code in the class file is unknown.

   All of this makes me suspect that the results obtained are better than one could expect with a real JIT, but the paper gives insufficient information to tell for certain. It also means that the statement on lines 3-4 of page 11 is false: the claim was demonstrated for a simulated JIT, but we don't know whether Paradyn-J can associate performance data in that way with a real JIT or not.

2. The footnotes on pages 4 and 5 refer to validating experiments. It would be good if Table 1 (or some other table) included an explicit control experiment, expanding on those noted in the footnotes. Show me the normal case, so I can understand how well your efforts have paid off. Also, I don't see any confidence intervals or standard deviations. If the standard deviations are high, the differences between the two columns may not be statistically significant.
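   To be concrete about what I am asking for (my notation, not anything that appears in the paper): with n trials per configuration, sample mean xbar, and sample standard deviation s, the usual 95% confidence interval for each table entry is

       xbar +/- t(n-1, 0.975) * s / sqrt(n),   where s^2 = (1/(n-1)) * sum over i of (x_i - xbar)^2

   and t(n-1, 0.975) is the 0.975 quantile of Student's t distribution with n-1 degrees of freedom. If the intervals for the two columns overlap substantially, the measured difference should not be presented as a speedup.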
3. The word "simulated" on page 1 appears to refer to the programs, so I was expecting performance results for benchmark-type programs rather than real applications. Then on page 3 the word "simulated" is used again; this time it appears to refer to the execution of the programs, so now I was expecting performance results from a simulated Java system. Finally, on page 6 I learn that the word "simulated" refers to the dynamic (JIT) compiler. Make the referent of the word "simulated" clear.

4. I don't understand the last paragraph on page 10. Does this imply that your technique only works on one method at a time?

5. In section 3.3 (page 7), why not use a custom class loader to insert the instrumentation when the class is loaded? That way you could avoid copying the entire method to the heap, saving on memory costs. This approach would also remove the problems mentioned at the bottom of page 8. Also, note that the objections to such preinstrumentation raised on page 16 are not insurmountable: one could certainly leave space for instrumentation in each loaded class file (with NOPs) and do the instrumentation later, for example. Only the last objection on page 16 is really an objection to preinstrumentation.
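   To make this suggestion concrete, here is a minimal sketch of the kind of loader I have in mind (my own illustration, not the authors' code; the rewriteMethods hook is hypothetical and stands in for whatever bytecode editing the tool would perform before the class is defined):

      import java.io.ByteArrayOutputStream;
      import java.io.IOException;
      import java.io.InputStream;

      // Sketch only: a loader that passes each class's bytes through a
      // rewriting hook before the class is defined, so instrumentation
      // (or NOP padding for later instrumentation) is inserted at load time.
      public class InstrumentingClassLoader extends ClassLoader {

          protected Class<?> findClass(String name) throws ClassNotFoundException {
              try {
                  byte[] raw = readClassBytes(name);
                  byte[] patched = rewriteMethods(raw);  // hypothetical hook
                  return defineClass(name, patched, 0, patched.length);
              } catch (IOException e) {
                  throw new ClassNotFoundException(name);
              }
          }

          // Read the raw .class bytes from the class path.
          private byte[] readClassBytes(String name) throws IOException {
              String path = "/" + name.replace('.', '/') + ".class";
              InputStream in = getClass().getResourceAsStream(path);
              if (in == null) throw new IOException("cannot find " + path);
              ByteArrayOutputStream out = new ByteArrayOutputStream();
              byte[] buf = new byte[4096];
              for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                  out.write(buf, 0, n);
              }
              in.close();
              return out.toByteArray();
          }

          // Placeholder for the tool's bytecode editing: insert instrumentation
          // calls here, or leave NOP-padded slots to be filled in later when a
          // method is selected for measurement.
          private byte[] rewriteMethods(byte[] classBytes) {
              return classBytes;  // no-op in this sketch
          }
      }

   With load-time rewriting of this sort, method bodies never need to be copied to the heap, and the NOP-padding idea simply amounts to having the rewriting hook reserve space that is patched when a method is later selected for measurement.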
6. Also on page 16, preinstrumentation by certain tools is criticized in a footnote, which declares that these tools are not thread-aware. But then on page 17, the authors admit that Paradyn-J does not properly handle multithreaded programs either! If you meant that the inability to handle multithreading only matters for preinstrumenting JIT compilers, then say so, but be prepared to justify it.

7. Section 2 seems too long. Is it really necessary to separate cases 1 and 2 once VM interactions have been characterized as a sort of runtime library? For example, code for I/O and code for object creation can both be seen as external to the program and do not appear to need separate cases.

8. In Section 3, it would have been nice to see recommended JVMPI extensions.

Minor comments:

Replace all instances of "byte-code" with "bytecode", which is the way the word is written in Sun's documentation.

The first part of the paper uses the phrase "dynamic compiler" but not "JIT compiler"; the last part uses "JIT compiler" but not "dynamic compiler". Some kind of consistency would be good.

There are several run-on sentences. Most are the result of using a comma where a semicolon is needed. (For example, the third sentence on page 6 has this problem.)

Abstract, 1st sentence: the word "promises" is quite strong, too strong in my opinion.

Abstract, last sentence: "*The* results of our work are *a* guide ... as *to* what type of ..."

Page 2, line -7: "... of *the* JDK [19]."

Page 5, line 5: "... 100,000 iteration case is due *to* two factors."

Page 5, paragraph beginning "Case 3:", last line: "... three small methods *would* not result ..." (tense agreement).

Page 6, section 3.2: What if the instrumented function begins with a loop? Then there will be a branch back to the instrumentation jump later in the function. What about cache effects? [I realize that none of this concerns the main topic of the paper; the authors should find a way to succinctly answer such concerns, or avoid raising them in the first place.]

Page 7, last full paragraph: While method call costs are unavoidable, since CPU time must be measured, those costs CAN be quantified!

Page 8, line 6: This is the first mention of the SPARC architecture. Mention it earlier, probably somewhere in the Introduction. Also, why did you choose the SPARC architecture instead of the x86 architecture? Why not both?

Page 8, last paragraph: The name do_baseTramp is meaningless to me, and it seems to be an odd mixture of C-style underscore-separated naming and Java-style camel case.

Page 8, last paragraph: Under what circumstances is it unsafe to instrument a method?

Page 9, 3rd paragraph: Why is this a problem? The first use of the do_baseTramp method will automatically load the class, won't it? Line 1: "... Paradyn-J has to get the VM [omit "has"] to load the ..."

Page 10, line 1: "... the class'*s* methods."

Page 10, 2nd full paragraph, line 1: omit the first comma.

Page 10, 1st paragraph of section 4, line 4: "... neural network application*, thereby* improving ..."

Page 11 and later: Choose one of "CPUtime", "CPUTime", and "CPU time" instead of using all three. I suggest the last one.

Page 11, 1st paragraph, line -4: choose one of "each" and "all".

Page 13, Table 2: How many trials were performed? What is the standard deviation or confidence interval?

Page 14, Table 3: Why is there a blank space for case 3? Was the cost 0? Was it nonzero but negligible?

Page 15, line 1: "... (4.9 us to 6.7 us per call)*;* however, ..."

Page 15, line 15: omit the apostrophe in "developer's".

Page 16, last full paragraph: Why is there no company name for JProbe, but a company name for OptimizeIt? Be consistent.

References: check capitalization of everything; it is inconsistent.

References: what is the significance of the dates on some of the references that contain only URLs?

Reference 6: There should be an umlaut over the 'o' in Holzle (Hölzle).

Reference 11: There is a missing 's' somewhere in that title.

Reference 15 is useless as a reference. It should probably be a footnote or a parenthetical remark.