Java Grande Meeting in Menlo Park March 11-12 99
Notes by Ron Boisvert
Geoffrey Fox was present by phone link on first morning
Note the major Sun involvement in this important meeting
See contemporaneous talk
http://www.npac.syr.edu/users/gcf/frontiersfeb99/index.html
at Frontiers 99 Conference Annapolis February 21-25 99
Java Grande Forum (JGF)
Numerics Working Group (NWG)
Executive Committee Meeting
March 11-12, 1999
Sun Microsystems, Menlo Park, CA
Attendees
Thomas Arkwright, Sun, forjava@sun.com
Joshua Bloch, Sun, Joshua.Bloch@eng.sun.com
Ron Boisvert, NIST, Co-chair, boisvert@nist.gov
John Brophy, Visual Numerics, jbrophy@houston.vni.com
Shah Datardina, NAG, shah@nag.co.uk
Jack Dongarra, U. Tenn. Knoxville & ORNL, dongarra@cs.utk.edu
Geoffrey Fox, Syracuse (via phone), gcf@npac.syr.edu
James Gosling, Sun, jag@eng.sun.com
Sia Hassanzadeh, Sun, Siamak.Hassanzadeh@eng.sun.com
Tim Lindholm, Sun, Timothy.Lindholm@eng.sun.com
Cleve Moler, The Mathworks, moler@mathworks.com
Jose Moreira, IBM, jmoreira@us.ibm.com
Roldan Pozo, NIST, Co-chair, pozo@nist.gov
Visitors
David Hough, Sun, validgh@validgh.com
Henry Sowizral, Sun, henry.sowizral@eng.sun.com
Greg Tarsey, Sun, gregory.tarsey@eng.sun.com
AGENDA
Presentations
Overview of the JGF and the NWG, Ron Boisvert
JAMA: A Matrix Class for Java, Cleve Moler
High Performance Java Compilation, Jose Moreira
ATLAS and the BLAS, Jack Dongarra
Discussion Topics
1. Operator Overloading and Lightweight Objects
2. Floating-point Model in Java 1.2
3. Use of Floating Multiply-Accumulate (FMA)
4. Math Library Specification
5. Additional Floating-point Optimizations
6. Relationship to BigDecimal
7. Relationship with Vecmath
-------------------------------------------------------------------------------
SUMMARY
-------------------------------------------------------------------------------
The Working Group expressed its appreciation to Sun for taking the time to
exchange information on progress and plans for improving Java's usability
for numerical computations common to most Grande applications. Tim Lindholm
said that the Working Group was providing valuable input to Sun and that its
work had already had a significant effect on Java. Sun realizes that although
scientific and engineering computation is not its primary market for Java, it
represents an important constituency which they would like to accommodate. Sun
is happy to cooperate with the Java Grande Forum and will seek their advice on
matters relating to Java numerics. They appreciate the Forum's willingness to
find compromise solutions which preserve Java's overall design and philosophy.
(See attached note from Tim Lindholm.) Sun will reassign a staff member to
spend full time working on Java numerics issues within the Java inner circle.
The report of the Java Grande Forum was influential in preventing the adoption
of the Proposal for Extension of Java Floating Point Semantics released by
Sun in May 1998. Instead, for Java 1.2 Sun adopted the key ideas of the Java
Grande proposal: two modes (strictfp and default), with default admitting the
use of an extended exponent for the representation of anonymous variables.
Additional optimizations, including the use of the floating multiply accumulate
(FMA), were not allowed. Sun is willing to consider admitting use of the
FMA under some circumstances, and the Working Group was asked to present a
draft proposal within two weeks. Additional types of optimizations cannot
be considered unless they are carefully characterized and their use understood.
Sun is hoping to improve the specification of the functions in java.lang.Math
soon. Options include specifying a correctly rounded result in strict mode
and allowing deviations of about 1 ulp in default mode, allowing careful use
of hardware sqrt and trig functions. The former may be difficult to achieve;
a group at IBM Haifa claims to have such a library in C with good performance,
although its size may be problematical. VNI announced the availability of
a pure Java implementation of fdlibm; this could be used as an operational
definition of the functions in strict mode. The Working Group agreed to
evaluate each of these libraries and provide a draft proposal to Sun in
four weeks.
The group also discussed the need for operator overloading and lightweight
objects. These were seen as crucial to the numerical community, but somewhat
controversial outside of it. Sun is amenable to considering such proposals
using its Community Process. Since major changes in the language have not
yet been tried using this process, Gosling suggested that Sun itself should
manage the proposals. Joe Darcy of Sun will be in charge of developing
concrete versions of these proposals. The Working Group will help provide
justification and seek out other constituencies that might support it. It
was deemed best to generate two separate proposals.
Finally, the Working Group discussed overlap of interests with Sun staff
working on Java3D and BigDecimal. Sun will soon issue a "call for experts"
to consider the extension of the vecmath package in javax, which provides
linear algebra support for Java3D. It was suggested that the LU, Cholesky
and SVD codes in this package would be better deprecated in favor of a
full-fledged general-purpose linear algebra package for Java, which would
be managed by numerical analysts. The Working Group
agreed to provide representation on this expert group.
ACTION ITEMS
1. Draft proposal for FMA in Java. Deliver to Lindholm in two weeks.
(Boisvert)
2. Draft proposal for Math library. Deliver to Lindholm in four weeks.
(Boisvert)
3. Evaluate VNI's fdlibm. (Moler)
4. Evaluate IBM Haifa implementation of math functions. (Moler)
5. Contact Joe Darcy to begin plan for proposals for operator overloading
and lightweight objects. (Boisvert)
6. Provide people to work with Henry Sowizral when he issues a "call for
experts" for the evolution of javax.vecmath. (All)
-------------------------------------------------------------------------------
DETAILS
-------------------------------------------------------------------------------
1. Operator Overloading and Lightweight Objects
a Sun clearly sees the need for these extensions for the numerics community.
b Sun is worried about the reaction of "purists" to a proposal for operator
overloading. Ideally, they would like to see mechanisms which lead to
only reasonable usage. (Examples: use descriptive names for overloaded
methods; require implementation of an arithmetic interface.) It does not
seem possible to stop the truly perverse, however.
c Lightweight objects are a little more problematical. There are many ways to
provide such a facility, and these would have to be fleshed out.
Lightweight objects can be first introduced in the Java language with
little or no change to the VM. However, extensive changes to the VM
may be necessary to actually deliver performance benefits.
(Major problem is how to return lightweight objects from a method.)
d Sun has little experience in the community process for making changes to the
Java language itself. As a result, Gosling suggested that Sun should take
the lead in developing proposals for operator overloading and lightweight
classes. Hassanzadeh offered to fund the assignment of Joe Darcy within
Sun to work with Gosling, Lindholm and their colleagues on developing
these proposals. Gosling and Lindholm agreed (enthusiastically).
e The JGF would help in the proposal process, providing justification of the
need, providing comments, etc. It was suggested that finding allies in
other user communities might help. The financial community (which uses
decimal arithmetic, for example) or the graphics community were cited
as possibilities. [Later discussions with Henry Sowizral indicated that
there is probably not a compelling need for operator overloading in the
graphics community.]
f The timing for such proposals was discussed. It is unlikely that major
language changes will be on the table for the next few releases, so this
is viewed as a longer term project. It was also agreed that the proposals
for operator overloading and lightweight classes should be unbundled in
order to maximize the chances for success with "the process".
g We learned that a group is drawing up a proposal for generic types in Java
and that it too would be put through the process. (Generic types have
less of an impact on the language itself and hence are viewed as less
controversial.)
h IBM has done research on supporting lightweight objects. Representing
complex numbers as full-fledged objects results in a performance
of 1.1 Mflops for matrix-multiply and 1.7 Mflops for a CFD kernel
(two-dimensional convolution). Using lightweight objects and extensive
compiler optimization raises those numbers to 89.5 Mflops (MATMUL) and
60.5 Mflops (CFD). Fortran numbers are 136.4 and 66.5 Mflops,
respectively. (All experiments performed on an RS/6000 model 590.)
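The cost gap can be illustrated with a sketch. The class and method names below are hypothetical, not from the IBM study; the point is that the object-based version allocates a new Complex per operation, while a lightweight alternative works on parallel arrays of primitives:

```java
// Sketch only: illustrates the complex-as-object overhead behind the
// IBM numbers above. Names here are illustrative, not from the study.
final class Complex {
    final double re, im;
    Complex(double re, double im) { this.re = re; this.im = im; }
    // Every multiply allocates a fresh object -- the overhead that
    // lightweight objects plus compiler optimization would eliminate.
    Complex times(Complex o) {
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }
}

final class ComplexDemo {
    // Lightweight-style alternative: scale a complex vector by (cr + ci*i)
    // in place on parallel double arrays, with no per-element allocation.
    static void timesInPlace(double[] re, double[] im, double cr, double ci) {
        for (int i = 0; i < re.length; i++) {
            double r = re[i] * cr - im[i] * ci;
            im[i] = re[i] * ci + im[i] * cr;
            re[i] = r;
        }
    }
}
```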
-------------------------------------------------------------------------------
2. Floating-point Model in Java 1.2
a Input to Sun from the JGF Numerics Working Group helped kill the "Proposal
for Extension of Java Floating Point Semantics (Revision 1)" (PEJFPS) of
May 1998. Changes to Java floating-point semantics in Java 1.2 are based
on those proposed by the Java Grande Forum. Because of time constraints,
only the first few of the JGF proposed modifications were implemented.
These are described in the following paragraphs.
b There are two floating-point modes: default and strictfp. Strict mode is
indicated by the programmer using the strictfp keyword. (The widefp
keyword of PEJFPS was dropped.)
c strictfp implements the classic Java floating-point model.
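The mode is selected per class or method with the strictfp modifier; a minimal sketch (method names are illustrative):

```java
// Sketch: selecting the two Java 1.2 floating-point modes.
class FpModes {
    // strictfp: classic Java semantics, bit-for-bit reproducible
    // across platforms.
    static strictfp double dotStrict(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) s += x[i] * y[i];
        return s;
    }

    // Default mode: anonymous intermediates may carry a wider exponent
    // (e.g. on x86), so overflow behavior can differ between platforms.
    static double dotDefault(double[] x, double[] y) {
        double s = 0.0;
        for (int i = 0; i < x.length; i++) s += x[i] * y[i];
        return s;
    }
}
```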
d Default mode allows wider (15-bit) exponents to be used to
represent anonymous variables on the stack. This allows Intel processors
to run at full speed, since extended precision can be disabled using a
control word. The differences in observed results between x86 and SPARC
processors would occur only when the SPARC would overflow and the x86
would not, a relatively rare event.
e The JGF requirement that extended exponents be used consistently
was not adopted. (This would require that either _none_ or _all_ anonymous
variables be represented using an extended exponent.) This was not done
because it would require adding new JVM instructions to push, pop and
reverse extended exponent numbers on the stack. [Why?]
f No floating-multiply-accumulate (FMA) instructions are allowed.
g Associative fp was not adopted.
h Those features that were not implemented are not out of the running. They
just need more study to understand their effects more clearly before they
are considered.
-------------------------------------------------------------------------------
3. How Can Java Accommodate the FMA?
a Sun appears willing to consider allowing use of the fused multiply accumulate
instruction under carefully controlled circumstances.
b The following rough proposal was hammered out at the meeting. A method would
be added to java.lang.Math, say fma(y,a,x), which would return a*x + y. Its
behavior would be different depending on the current floating-point mode.
strictfp : Forced FMA. Requires use of extended precision FMA, even if it
requires simulation in software. That is: y, a, x are each converted
to IEEE extended format (double or float as appropriate), the
computation occurs in extended format, with the result rounded
to double or float as appropriate.
default : Opportunistic FMA. The FMA should be used if it can be done fast,
e.g. as a single hardware instruction, otherwise the expression
a*x+y should be computed using the usual Java floating-point
semantics.
c Compilers would be forbidden to use hardware FMAs except as a replacement for
the fma() method invocation in default mode. Moreira pointed out that
compilers are often able to locate opportunities for using FMAs that the
programmer did not realize were present, thus providing additional speedup.
In particular, for the SPEC95 benchmark APPLU, when the compiler was free
to rearrange computations at will, a 30% reduction in dynamic instruction
count was observed.
d fma()+strictfp can be used to force the computation of a*x+y in extended
precision. However, since the result is a double (or float), this does not
provide a means for the extended precision accumulation of inner products.
e fma()+default can be used to speed up expression evaluation by the use
of FMAs on processors that support it, without paying the extra cost of
simulating it on processors that don't. In default mode fma() will return
slightly different results on processors which have native FMA instructions
than on processors that don't.
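The two behaviors in the rough proposal can be sketched as follows. The method name, the (y, a, x) argument order, and the placement in java.lang.Math are the meeting's draft proposal, not a shipped API; for the fused case the sketch borrows the much later Math.fma(a, b, c) (Java 9+), which uses a different argument order:

```java
// Sketch of the proposed fma(y, a, x) = a*x + y semantics.
class FmaSketch {
    // Default-mode fallback on hardware without an FMA: ordinary Java
    // evaluation with two roundings (one for a*x, one for the add).
    static double fmaFallback(double y, double a, double x) {
        return a * x + y;
    }

    // Single-rounding ("forced FMA") behavior that strictfp mode would
    // require, expressed via the later Math.fma for illustration.
    static double fmaFused(double y, double a, double x) {
        return Math.fma(a, x, y);
    }
}
```

For most arguments the two agree; they differ exactly when the intermediate rounding of a*x changes the final result, which is why default-mode fma() may return slightly different results on FMA and non-FMA hardware.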
f Sun would like a concrete proposal from the working group by March 26.
g IBM has analyzed the impact of using the FMA operation in several Java
benchmarks. For matrix-multiply (real numbers), performance can be
improved from 100.9 Mflops to 209.8 Mflops through the use of FMAs.
For BSOM (a neural-network data mining kernel) the improvement is from
94.5 Mflops to 120.5 Mflops. For Cholesky factorization, the improvement
is from 83.4 to 129.9 Mflops. These FMAs were all obtained from
explicit a*b+c operations. The Fortran numbers for MATMUL, BSOM, and
Cholesky are 244.6, 128.0, and 134.0, respectively. (All experiments
performed on an RS/6000 model 590.)
-------------------------------------------------------------------------------
4. Math library specification.
a The current specification for java.lang.Math says, roughly, to use the
algorithms of fdlibm translated straightforwardly to Java. It is not clear
who, if anyone, does this. In practice, Java programs which reference
elementary functions are not producing the same results on all Java
platforms.
b In JDK 1.2 Sun is distributing Java wrappers for fdlibm in C for
java.lang.Math.
c Brophy announced that he has produced a complete translation of Sun's fdlibm
in Java. This is posted at http://www.vni.com/corner/garage/grande/.
Numerics Working Group members were invited to test this library. Moler
agreed to test this implementation.
d Moreira announced that Abraham Ziv and colleagues at the IBM Haifa Labs have
developed a suite of correctly rounded math functions for IEEE arithmetic
in C. These functions are claimed to cost only about 10% more than fdlibm
in computing time. (This would be remarkable.) The Haifa researchers
claim to have a proof of correctness for their algorithms, but it is
proprietary. Several present expressed skepticism about proofs that are
not public.
e Unfortunately, the footprint of the Haifa library seems large. The
compiled binary library occupies about 250 KB (and compresses to 185 KB). One
of the reasons for its size is the inclusion of large tables. (The source
form of the atan table is 500 KB, for example.) It is probably possible
to trade space for time in the algorithm, but it is not clear, say, how
much slower a library of half the current size would run.
f Greg Tarsey's group at Sun (including KC Ng) is working on a similar
library, which is not yet complete. It is based on slightly different
algorithms than those used by the Haifa group. (The Haifa code works
in fixed space, which their proofs show to be sufficient, while the Sun
codes repeat computations in increasing precision as necessary until the
result is satisfactory.) The Sun codes run very roughly three times longer
than fdlibm at present, but probably could be sped up considerably.
g Possible specification for strictfp mode: produce the correctly rounded
result.
Pros: This is the ideal definition.
Cons: The Java code for this does not yet exist. Existing algorithms would be
much too large for inclusion in the Java core. The resulting code
might run too slowly (a slowdown by a factor of 2 is unacceptable). The code would
be difficult to test.
h Possible specification for strictfp mode: mandate use of VNI's native
Java fdlibm.
Pros: This code is available now. It is reasonably fast and has a small
footprint. It produces results that are usually within one unit in
the last place (ulp) of the correctly rounded result.
Cons: An operational definition like this would mandate incorrect results.
Incorporating future improvements in the algorithms yielding better
results would change the behavior of existing Java class files.
i Possible specification for default mode: specify largest acceptable relative
error in result for each function (e.g. 1 ulp), being sure that the bounds
were satisfied by fdlibm.
Pros: This allows flexibility of implementation. It allows (guarded) use of
specialized hardware instructions for sqrt, sine, cosine, etc. It
allows improvements in algorithms producing more accurate results
to be trivially accommodated.
Cons: Careful testing of fdlibm would have to be done to ensure it satisfied
the criteria; this might be difficult to do rigorously, but could be
done satisfactorily by relying on comparisons with a multiprecision
system like Maple or Mathematica. Exhaustive testing in pure Java
would be difficult.
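An accuracy criterion of this kind can be checked mechanically. The sketch below measures the deviation of a platform Math function from the fdlibm reference in units in the last place; java.lang.StrictMath (which arrived after this meeting, in Java 1.3) is essentially the operational fdlibm definition discussed above, so it serves as the reference here:

```java
// Sketch: deviation of the platform Math.sin from the fdlibm-based
// reference (StrictMath.sin), measured in ulps of the reference result.
class UlpCheck {
    static double ulpError(double x) {
        double ref = StrictMath.sin(x);
        return Math.abs(Math.sin(x) - ref) / Math.ulp(ref);
    }
}
```

Since Math.sin is specified to lie within 1 ulp of the exact result, and fdlibm meets the same bound, the mutual deviation should stay within about 2 ulps on any conforming platform.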
j Tim Lindholm asked the Working Group to draft a proposal before April 9.
-------------------------------------------------------------------------------
5. Additional Floating-point Optimizations
a There was a brief discussion of whether, and under what circumstances,
additional compiler optimizations should be allowed. Examples include use
of the associative law, transformation of 0*x and x-x to zero, etc.
b The attendees agreed that such optimizations can sometimes improve
performance; however, no one wants optimizations that cause the program
to behave in ways other than the programmer intended. This requirement is
difficult to quantify.
c Attendees were dubious about transformations like 0*x to 0, since these
are usually incorrect when one of the variables is not finite.
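The objection in (c) is easy to demonstrate: IEEE 754 arithmetic gives NaN, not zero, when the other operand is infinite or NaN (and even for finite negative x, 0*x is -0.0, not +0.0):

```java
// Sketch: why rewriting 0*x to 0 is unsound under IEEE 754.
class ZeroTimesX {
    static double zeroTimes(double x) { return 0.0 * x; }
}
```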
d It was agreed that inability to apply optimizations like these was not
the main reason for the poor performance of current Java systems, and
hence that it was not crucial to address such issues in the short term.
-------------------------------------------------------------------------------
6. Relationship to the BigDecimal Package.
a Joshua Bloch of Sun briefed the Working Group on the status of BigDecimal.
This class provides basic support for multiple precision real arithmetic.
The target audience is the business community. Internal representation of
numbers is by an integer and a scale (another integer). Add, subtract and
multiply are done exactly. Division is not exact; the user can select from
eight different rounding modes for the result.
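The behavior described above can be sketched briefly. (The RoundingMode enum shown here postdates the meeting; JDK 1.2 exposed the same eight modes as int constants on BigDecimal itself.)

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch: BigDecimal as described above -- exact add/subtract/multiply,
// division rounded to a caller-chosen scale and mode.
class BigDecimalDemo {
    static String divideRounded(String a, String b, int scale) {
        // HALF_UP is one of the eight selectable rounding modes.
        return new BigDecimal(a)
                .divide(new BigDecimal(b), scale, RoundingMode.HALF_UP)
                .toString();
    }
}
```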
b Bloch informed the group of a proposal from IBM (authored by Mike
Cowlishaw) to extend BigDecimal to include pow(), as well as to add support
for multiple precision decimal floating-point. He asked the group whether
this would be of use in scientific computation. The group responded that
multiple precision floating-point was often quite useful in scientific
computing, but that the package would not satisfy most needs unless all
of the functions in java.lang.Math (sin(), exp(), sqrt(), etc.) were
included. Additional discussion ensued about the base for representation
of numbers in such a package. Hough explained that base two was known to
be optimal. In particular, for the same precision, binary is faster, and for
the same speed, binary delivers better precision.
-------------------------------------------------------------------------------
7. Relationship to the VecMath package
a Henry Sowizral of Sun briefed the group on the current status of VecMath.
VecMath (javax.vecmath) is a package which provides a variety of low-level
mathematical utilities of use in computer graphics. Examples include
multiplication of 2x2, 3x3 and 4x4 matrices. Also included is a class
GMatrix which implements operations on general nxn matrices. Methods for
computing LU, Cholesky and singular value decompositions are included.
b Moler posed several problems for vecmath. The first was the SVD of a 3x3
matrix of zeros. The results returned were left and right singular vectors
of all NaNs. The matrix of singular values was unchanged from its value
on input. This is incorrect. The second problem was the SVD of the rank 2
matrix with rows (1 2 3), (4 5 6), (7 8 9). The computed singular vectors
were correct, but the matrix of singular values was unchanged from its
input state, and the rank was reported incorrectly as 3.
c It was agreed that the numerical analysis community should participate more
actively in the development of APIs for linear algebra. One option would
be for GMatrix to be deprecated in vecmath and such matrix operations be
provided instead in a separate class for numerical linear algebra which
could be shepherded by the Numerics Working Group.
d There will be a "call for experts" soon to consider extensions to vecmath.
The Working Group agreed to provide representatives to this team to help
work out details for future development of this package.
-------------------------------------------------------------------------------
FEEDBACK FROM SUN FOR NUMERICS WORKING GROUP
-------------------------------------------------------------------------------
Subject: Feedback for the Numerics Working Group
Date: Fri, 12 Mar 1999 15:59:40 -0800 (PST)
From: Tim Lindholm <Timothy.Lindholm@Eng.Sun.COM>
To: boisvert@cam.nist.gov
Hi Ron,
The following gives my perspective on the Java Grande Numerics Working
Group and its relationship and relevance to Sun in Sun's role as the
steward of Java development.
As you know, I'm a Distinguished Engineer at Sun, one of the members
of the original Java project, the author of The Java Virtual Machine
Specification, and currently one of the architects of the Java 2
platform. I work closely with the other architects of the Java
technologies, such as James Gosling, and while I can't speak for them
in detail I can say that the opinions I express below are not out
of line with theirs.
During the development of the Java 2 platform version 1.2 (formerly
known as JDK 1.2), I was handed responsibility for creating licensee and
industry consensus around changes to Java's primitive floating-point
arithmetic targeted at improving performance on Intel CPUs. This was
an extremely difficult task because it required careful attention to
the balance between many factors, and was being done in an extremely
charged political environment. We who were responsible did not have a
broad background in numerics, and had not been successful finding help
within Sun. We understood that there was high risk: Java had taken a
rather different approach to floating point than many other languages,
and the wrong decision in 1.2 could throw away many of the advantages
of that approach.
Our best attempts led to a public proposal that we considered a bad
compromise and were not happy with, but were resigned to. At this
point the Numerics Working Group wrote a counterproposal to Sun's
public proposal that gave new technical insight on how to balance the
demands of performance on Intel without throwing away the important
part of reproducibility. The counterproposal was both very sensitive
to the spirit of Java and satisfactory as a solution for the
performance problem. When we saw the new proposal we revived efforts
to reach a better answer.
We were subsequently aided by email and phone calls with a number of
members on the Numerics Working Group (you, Roldan, and Cleve). Joe
Darcy, one of the authors of the counterproposal, helped us to under-
stand the proposal generally, and specifically to evaluate the effects
of modifications we required of the counterproposal. All of you helped
us understand how the Numerics Working Group represented a number of
rather different fields with different perspectives on numerics. This
helped us gain confidence that the new solution reflected a consensus
that would satisfy a broad range of Java users.
We are sure that we ended up with a better answer for 1.2, and arrived
at it through more complete consideration of real issues, because of
the efforts of the Numerics Working Group. We have internally resolved
to consult with the Group on future numeric issues, and to do so
earlier in the process. Our attendance at the 3/11 Numerics Working
Group meeting at Sun and all the work we accomplished there is evidence
of this resolution. We think that the Group continues to show great
sensitivity to the needs of the Java platform and the difficulty of
introducing change while preserving compatibility and robustness. We
look forward to continuing this relationship into the future.
-- Tim