Subject: C435 JGSI Review
Resent-Date: Thu, 30 Sep 1999 23:19:26 -0400
Resent-From: Geoffrey Fox <gcf@npac.syr.edu>
Resent-To: p_gcf@npac.syr.edu
Date: Mon, 20 Sep 1999 11:45:07 -0400 (EDT)
From: Bill Pugh <pugh@cs.umd.edu>
To: gcf@npac.syr.edu

Paper: C435
Title: Annotating Java Class files with Virtual Registers for Performance
Authors: Joel Jones and Samuel Kamin

Overall recommendation: The paper raises an interesting issue but doesn't
answer any of the important questions about it. Weak recommendation
against publication.

The authors describe techniques for annotating class files with additional
information to assist in performing register allocation when methods are
compiled to native code.

The basic problem is that the paper doesn't address the difficult
and important topics associated with this issue.

The empirical results in the paper show that using annotations, it is
possible to perform better global register allocation than the very weak
local register allocation performed by kaffe. That was never in question,
and all commercial JITs that I know of do better register allocation
than kaffe. The interesting issues are:

* Compare two methods for generating code:

        1) Generate native code using a good register allocation
        algorithm at JIT time.

        2) Generate native code using the same good register allocation
        algorithm, but assisted by class annotations.

  What takes longer? Generating code directly (method 1), or downloading
  and verifying the class annotations and then generating code using
  the annotations (method 2)?

* How much do you lose by trying to do machine-independent register
  allocation? Different targets have widely varying numbers of registers,
  and some machines have constraints on which registers can be used for
  which operations.  All of the results in the paper are for SPARCs.
  What about IA-32?

* Many high-performance JITs do massive amounts of inlining. Is information
  like this useful in such a context? The people I talked to on the HotSpot
  project thought that such information would be useless for them.

* How many of the problems that the authors raise could be addressed
  by bytecode optimization and transformation? For example, in Figure 11,
  bytecode optimization could delay the initialization of sum2 and
  allow sum1 and sum2 to share the same local. The standard bytecode
  generated by javac is completely unoptimized. But there isn't any reason
  why we can't optimize it.
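
To make the last point concrete, here is a minimal sketch of why delaying
an initialization can let two locals share a slot. Figure 11 is not
reproduced here, so the instruction indices below are invented for
illustration and are not taken from the paper; the class and method names
are likewise hypothetical. Each local is modeled as a live range (first
definition to last use), and a simple greedy interval coloring counts how
many local slots are needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class LocalSlotSharing {
    /**
     * Counts the local slots needed for a set of live ranges.
     * Each range is {start, end}: inclusive instruction indices of the
     * first definition and last use of one local variable.
     */
    static int slotsNeeded(int[][] ranges) {
        int[][] sorted = ranges.clone();
        Arrays.sort(sorted, Comparator.comparingInt((int[] r) -> r[0]));
        // slotEnd.get(i) is the end index of the last range placed in slot i.
        List<Integer> slotEnd = new ArrayList<>();
        for (int[] r : sorted) {
            boolean placed = false;
            for (int i = 0; i < slotEnd.size(); i++) {
                if (slotEnd.get(i) < r[0]) { // slot is dead before r starts
                    slotEnd.set(i, r[1]);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                slotEnd.add(r[1]); // every existing slot is still live
            }
        }
        return slotEnd.size();
    }

    public static void main(String[] args) {
        // Hypothetical indices: sum2 is initialized at 1 but not really
        // used until 6, while sum1 is live from 0 through 5.
        int[][] asWritten = { {0, 5}, {1, 9} };
        // Same code after moving the initialization of sum2 down to 6.
        int[][] delayed = { {0, 5}, {6, 9} };
        System.out.println(slotsNeeded(asWritten)); // prints 2
        System.out.println(slotsNeeded(delayed));   // prints 1
    }
}
```

With the early initialization the two ranges overlap and need two slots;
once the initialization is delayed they no longer overlap and one slot
suffices, with no annotations involved.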