Subject: C435 JGSI Review
Resent-Date: Thu, 30 Sep 1999 23:19:26 -0400
Resent-From: Geoffrey Fox
Resent-To: p_gcf@npac.syr.edu
Date: Mon, 20 Sep 1999 11:45:07 -0400 (EDT)
From: Bill Pugh
To: gcf@npac.syr.edu

Paper: C435
Title: Annotating Java Class files with Virtual Registers for Performance
Authors: Joel Jones and Samuel Kamin

Overall recommendation: The paper raises an interesting issue but doesn't answer any of the important questions about it. Weak recommendation against publication.

The authors describe techniques for annotating class files with additional information to assist in performing register allocation when methods are compiled to native code.

The basic problem is that the paper doesn't address the difficult and important topics associated with this issue. The empirical results in the paper show that, using annotations, it is possible to perform better global register allocation than the very weak local register allocation performed by kaffe. That was never in question, and all commercial JITs that I know of do better register allocation than kaffe.

The interesting issues are:

* Compare two methods for generating code:
  1) Generate native code using a good register allocation algorithm at JIT time.
  2) Generate native code using the same good register allocation algorithm, but assisted by class annotations.
  Which takes longer: generating code directly (method 1), or downloading and verifying the class annotations and then generating code using the annotations (method 2)?

* How much do you lose by trying to do machine-independent register allocation? Different targets have widely varying numbers of registers, and some machines have constraints on which registers can be used for which operations. All of the results in the paper are for SPARC. What about IA-32?

* Many high-performance JITs do massive amounts of inlining. Is information like this useful in such a context?
  The people I talked to on the HotSpot project thought that such information would be useless for them.

* How many of the problems that the authors raise could be addressed by bytecode optimization and transformation? For example, in Figure 11, bytecode optimization could delay the initialization of sum2 and allow sum1 and sum2 to share the same local. The standard bytecode generated by javac is completely unoptimized, but there is no reason why we can't optimize it.
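
To make the last point concrete, here is a minimal source-level sketch of the transformation. Figure 11 of the paper is not reproduced in this review, so the method bodies, the class name SharedLocalSketch, and the method names below are hypothetical; only the names sum1 and sum2 come from the review. The first method initializes sum2 early, so its live range overlaps that of sum1. The second delays that initialization until after the last use of sum1, so a single local can hold both values. A bytecode optimizer would do this on local variable slots rather than on source, which is an assumption about how such a tool might work, not a description of any existing one.

```java
class SharedLocalSketch {
    // Hypothetical unoptimized shape: sum2 is initialized up front,
    // so sum1 and sum2 are live at the same time and need two locals.
    static int separateLocals(int[] a, int[] b) {
        int sum1 = 0;
        int sum2 = 0;              // initialized long before its first real use
        for (int x : a) {
            sum1 += x;
        }
        int result = sum1 * 2;     // last use of sum1
        for (int y : b) {
            sum2 += y;
        }
        return result + sum2;
    }

    // Same computation after delaying the initialization of sum2:
    // the live ranges no longer overlap, so one local serves both roles.
    static int sharedLocal(int[] a, int[] b) {
        int sum = 0;               // plays the role of sum1
        for (int x : a) {
            sum += x;
        }
        int result = sum * 2;      // last use of the sum1 value
        sum = 0;                   // delayed initialization; now plays the role of sum2
        for (int y : b) {
            sum += y;
        }
        return result + sum;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {4, 5};
        // Both variants should print the same value.
        System.out.println(separateLocals(a, b));
        System.out.println(sharedLocal(a, b));
    }
}
```

The two methods return the same result for any inputs; the difference is only in how many locals are simultaneously live, which is the property a register allocator or an annotation scheme would otherwise have to discover.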