Paper Number: C424
Title: Performance Measurement of Dynamically Compiled Java Executions
Authors: Tia Newhall and Barton P. Miller

The paper describes Paradyn-J, a Java performance measurement tool, and explains how Paradyn-J copes with dynamically compiled executions: it correlates the performance of the bytecode and native-code versions of the same method and also accounts for the costs of dynamic compilation itself. Constructs that limit the effectiveness of dynamic compilation are identified. Using performance data from Paradyn-J, the authors achieved speedups of about 10% with a simulated JIT compiler (i.e., the native code was produced by hand rather than by a real JIT compiler). Both synthetic benchmarks and a neural net application are used to generate results.

Recommendation: Major revision. The authors do not show how the results they obtained with a simulated JIT compiler relate to any real JIT compiler; the results may be significantly better than could be obtained with a real JIT compiler (see major comment 1 below) or statistically insignificant (see major comment 2 below).

Comments for the authors
------------------------

Major comments:

1. The major flaw I see in this paper is the use of a simulated JIT compiler. This is justified on page 6 by saying that no source code was available for ExactVM or HotSpot. However, this is not sufficient justification. Source code *is* available for Sun's Reference Implementation of the JDK, which includes a JIT compiler. Also, Sun is not the only producer of JIT compilers: the Kaffe JVM, from Transvirtual, includes a JIT and is available under the GNU General Public License. Furthermore, simulating the JIT compiler rather than using a real one introduces factors that need to be controlled for, and the paper offers no evidence that this was done. These factors include:

   a) The quality of the generated code. The paper seems to indicate that the compiled code was written and tuned by hand. JIT compilers are not known for producing high-quality native code, so the results may have been skewed in the authors' favor by running native code that was significantly better than a JIT compiler would produce.

   b) Memory usage. The simulator, as described, will use less memory than a real JIT compiler, since the JIT ends up with both the bytecode and native-code versions AND also needs space for holding information during the compile. This, again, may skew the results in the authors' favor.

   c) Dynamic library costs. The simulator loads the precompiled code from a dynamic library, whereas a JIT would attach the native code to the class file using attributes. The cost of managing the linking of a dynamic library versus looking up the compiled code in the class file is unknown.

   All of this makes me suspect that the results obtained are better than one could expect with a real JIT, but the paper gives insufficient information to tell for certain. It also means that the statement on lines 3-4 of page 11 is false: the claim was demonstrated for a simulated JIT, but we don't know whether Paradyn-J can associate performance data in that way with a real JIT or not.

2. The footnotes on pages 4 and 5 refer to validating experiments. It would be good if Table 1 (or some other table) included an explicit control experiment, expanding on those noted in the footnotes. Show me the normal case, so I can understand how well your efforts have paid off. Also, I don't see any confidence intervals or standard deviations. If the standard deviations are high, the differences between the two columns may not be statistically significant.
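   To be concrete about what I am asking for (my notation, not anything that appears in the paper): with n trials per configuration, sample mean xbar, and sample standard deviation s, the usual 95% confidence interval for each table entry is

       xbar +/- t(n-1, 0.975) * s / sqrt(n),   where s^2 = (1/(n-1)) * sum over i of (x_i - xbar)^2

   and t(n-1, 0.975) is the 0.975 quantile of Student's t distribution with n-1 degrees of freedom. If the intervals for the two columns overlap substantially, the measured difference should not be presented as a speedup.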
3. The word "simulated" on page 1 appears to refer to the programs, so I was expecting performance results for benchmark-type programs rather than real applications. Then on page 3 the word "simulated" is used again; this time it appears to refer to the execution of the programs, so now I was expecting performance results from a simulated Java system. Finally, on page 6 I learn that the word "simulated" refers to the dynamic (JIT) compiler. Make the referent of the word "simulated" clear.

4. I don't understand the last paragraph on page 10. Does this imply that your technique only works on one method at a time?

5. In section 3.3 (page 7), why not use a custom class loader to insert the instrumentation when the class is loaded? That way you could avoid copying the entire method to the heap, saving on memory costs. This approach would also remove the problems mentioned at the bottom of page 8. Also, note that the objections to such preinstrumentation raised on page 16 are not insurmountable: one could certainly leave space for instrumentation in each loaded class file (with NOPs) and do the instrumentation later, for example. Only the last objection on page 16 is really an objection to preinstrumentation.
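   To make this suggestion concrete, here is a minimal sketch of the kind of loader I have in mind (my own illustration, not the authors' code; the rewriteMethods hook is hypothetical and stands in for whatever bytecode editing the tool would perform before the class is defined):

      import java.io.ByteArrayOutputStream;
      import java.io.IOException;
      import java.io.InputStream;

      // Sketch only: a loader that passes each class's bytes through a
      // rewriting hook before the class is defined, so instrumentation
      // (or NOP padding for later instrumentation) is inserted at load time.
      public class InstrumentingClassLoader extends ClassLoader {

          protected Class<?> findClass(String name) throws ClassNotFoundException {
              try {
                  byte[] raw = readClassBytes(name);
                  byte[] patched = rewriteMethods(raw);  // hypothetical hook
                  return defineClass(name, patched, 0, patched.length);
              } catch (IOException e) {
                  throw new ClassNotFoundException(name);
              }
          }

          // Read the raw .class bytes from the class path.
          private byte[] readClassBytes(String name) throws IOException {
              String path = "/" + name.replace('.', '/') + ".class";
              InputStream in = getClass().getResourceAsStream(path);
              if (in == null) throw new IOException("cannot find " + path);
              ByteArrayOutputStream out = new ByteArrayOutputStream();
              byte[] buf = new byte[4096];
              for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                  out.write(buf, 0, n);
              }
              in.close();
              return out.toByteArray();
          }

          // Placeholder for the tool's bytecode editing: insert instrumentation
          // calls here, or leave NOP-padded slots to be filled in later when a
          // method is selected for measurement.
          private byte[] rewriteMethods(byte[] classBytes) {
              return classBytes;  // no-op in this sketch
          }
      }

   With load-time rewriting of this sort, method bodies never need to be copied to the heap, and the NOP-padding idea simply amounts to having the rewriting hook reserve space that is patched when a method is later selected for measurement.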
6. Also on page 16, preinstrumentation by certain tools is criticized in a footnote, which declares that these tools are not thread-aware. But then on page 17, the authors admit that Paradyn-J does not properly handle multithreaded programs either! If you meant that the inability to handle multithreading only matters for preinstrumenting JIT compilers, then say so, but be prepared to justify it.

7. Section 2 seems too long. Is it really necessary to separate cases 1 and 2 once VM interactions have been characterized as a sort of runtime library? For example, code for I/O and code for object creation can both be seen as external to the program and do not appear to need separate cases.

8. In Section 3, it would have been nice to see recommended JVMPI extensions.

Minor comments:

Replace all instances of "byte-code" with "bytecode", which is the way the word is written in Sun's documentation.

The first part of the paper uses the phrase "dynamic compiler" but not "JIT compiler"; the last part uses "JIT compiler" but not "dynamic compiler". Some kind of consistency would be good.

There are several run-on sentences. Most are the result of using a comma where a semicolon is needed. (For example, the third sentence on page 6 has this problem.)

Abstract, 1st sentence: the word "promises" is quite strong, too strong in my opinion.

Abstract, last sentence: "*The* results of our work are *a* guide ... as *to* what type of ..."

Page 2, line -7: "... of *the* JDK [19]."

Page 5, line 5: "... 100,000 iteration case is due *to* two factors."

Page 5, paragraph beginning "Case 3:", last line: "... three small methods *would* not result ..." (tense agreement).

Page 6, section 3.2: What if the instrumented function begins with a loop? Then there will be a branch back to the instrumentation jump later in the function. What about cache effects? [I realize that none of this concerns the main topic of the paper; the authors should find a way to succinctly answer such concerns, or avoid raising them in the first place.]

Page 7, last full paragraph: While method call costs are unavoidable, since CPU time must be measured, those costs CAN be quantified!

Page 8, line 6: This is the first mention of the SPARC architecture. Mention it earlier, probably somewhere in the Introduction. Also, why did you choose the SPARC architecture instead of the x86 architecture? Why not both?

Page 8, last paragraph: The name do_baseTramp is meaningless to me, and it seems to be an odd mixture of C-style underscore-separated naming and Java-style camel case.

Page 8, last paragraph: Under what circumstances is it unsafe to instrument a method?

Page 9, 3rd paragraph: Why is this a problem? The first use of the do_baseTramp method will automatically load the class, won't it? Line 1: "... Paradyn-J has to get the VM [omit "has"] to load the ..."

Page 10, line 1: "... the class'*s* methods."

Page 10, 2nd full paragraph, line 1: omit the first comma.

Page 10, 1st paragraph of section 4, line 4: "... neural network application*, thereby* improving ..."

Page 11 and later: Choose one of "CPUtime", "CPUTime", and "CPU time" instead of using all three. I suggest the last one.

Page 11, 1st paragraph, line -4: choose one of "each" and "all".

Page 13, Table 2: How many trials were performed? What is the standard deviation or confidence interval?

Page 14, Table 3: Why is there a blank space for case 3? Was the cost 0? Was it nonzero but negligible?

Page 15, line 1: "... (4.9 us to 6.7 us per call)*;* however, ..."

Page 15, line 15: omit the apostrophe in "developer's".

Page 16, last full paragraph: Why is there no company name for JProbe, but a company name for OptimizeIt? Be consistent.

References: check capitalization of everything; it is inconsistent.

References: what is the significance of the dates on some of the references that contain only URLs?

Reference 6: There should be an umlaut over the 'o' in Holzle (Hölzle).

Reference 11: There is a missing 's' somewhere in that title.

Reference 15 is useless as a reference. It should probably be a footnote or a parenthetical remark.