Java Grande Meeting in Menlo Park, March 11-12, 1999

Notes by Ron Boisvert

Geoffrey Fox was present by phone link on the first morning

Note the major Sun involvement in this important meeting

See contemporaneous talk

http://www.npac.syr.edu/users/gcf/frontiersfeb99/index.html

at the Frontiers 99 Conference, Annapolis, February 21-25, 1999

Java Grande Forum (JGF)

Numerics Working Group (NWG)

Executive Committee Meeting

March 11-12, 1999

Sun Microsystems, Menlo Park, CA

Attendees

Thomas Arkwright, Sun, forjava@sun.com

Joshua Bloch, Sun, Joshua.Bloch@eng.sun.com

Ron Boisvert, NIST, Co-chair, boisvert@nist.gov

John Brophy, Visual Numerics, jbrophy@houston.vni.com

Shah Datardina, NAG, shah@nag.co.uk

Jack Dongarra, U. Tenn. Knoxville & ORNL, dongarra@cs.utk.edu

Geoffrey Fox, Syracuse (via phone), gcf@npac.syr.edu

James Gosling, Sun, jag@eng.sun.com

Sia Hassanzadeh, Sun, Siamak.Hassanzadeh@eng.sun.com

Tim Lindholm, Sun, Timothy.Lindholm@eng.sun.com

Cleve Moler, The Mathworks, moler@mathworks.com

Jose Moreira, IBM, jmoreira@us.ibm.com

Roldan Pozo, NIST, Co-chair, pozo@nist.gov

Visitors

David Hough, Sun, validgh@validgh.com

Henry Sowizral, Sun, henry.sowizral@eng.sun.com

Greg Tarsey, Sun, gregory.tarsey@eng.sun.com

AGENDA

Presentations

Overview of the JGF and the NWG, Ron Boisvert

JAMA: A Matrix Class for Java, Cleve Moler

High Performance Java Compilation, Jose Moreira

ATLAS and the BLAS, Jack Dongarra

Discussion Topics

1. Operator Overloading and Lightweight Objects

2. Floating-point Model in Java 1.2

3. Use of Floating Multiply-Accumulate (FMA)

4. Math Library Specification

5. Additional Floating-point Optimizations

6. Relationship to BigDecimal

7. Relationship with Vecmath

-------------------------------------------------------------------------------

SUMMARY

-------------------------------------------------------------------------------

The Working Group expressed its appreciation to Sun for taking the time to

exchange information on progress and plans for improving Java's usability

for numerical computations common to most Grande applications. Tim Lindholm

said that the Working Group was providing valuable input to Sun and that its

work had already had a significant effect on Java. Sun realizes that although

scientific and engineering computation is not its primary market for Java, it

represents an important constituency that it would like to accommodate. Sun

is happy to cooperate with the Java Grande Forum and will seek their advice on

matters relating to Java numerics. Sun appreciates the Forum's willingness to

find compromise solutions which preserve Java's overall design and philosophy.

(See attached note from Tim Lindholm.) Sun will reassign a staff member to

spend full time working on Java numerics issues within the Java inner circle.

The report of the Java Grande Forum was influential in preventing the adoption

of the Proposal for Extension of Java Floating Point Semantics released by

Sun in May 1998. Instead, for Java 1.2 Sun adopted the key ideas of the Java

Grande proposal: two modes (strictfp and default), with default admitting the

use of an extended exponent for the representation of anonymous variables.

Additional optimizations, including the use of the floating multiply accumulate

(FMA), were not allowed. Sun is willing to consider admitting use of the

FMA under some circumstances, and the Working Group was asked to present a

draft proposal within two weeks. Additional types of optimizations cannot

be considered unless they are carefully characterized and their use understood.

Sun is hoping to improve the specification of the functions in java.lang.Math

soon. Options include specifying a correctly rounded result in strict mode

and allowing deviations of about 1 ulp in default mode, permitting careful use

of hardware sqrt and trig functions. The former may be difficult to achieve;

a group at IBM Haifa claims to have such a library in C with good performance,

although its size may be problematical. VNI announced the availability of

a pure Java implementation of fdlibm; this could be used as an operational

definition of the functions in strict mode. The Working Group agreed to

evaluate each of these libraries and provide a draft proposal to Sun in

four weeks.

The group also discussed the need for operator overloading and lightweight

objects. These were seen as crucial to the numerical community, but somewhat

controversial outside of it. Sun is amenable to considering such proposals

using its Community Process. Since major changes in the language have not

yet been tried using this process, Gosling suggested that Sun itself should

manage the proposals. Joe Darcy of Sun will be in charge of developing

concrete versions of these proposals. The Working Group will help provide

justification and seek out other constituencies that might support them. It

was deemed best to generate two separate proposals.

Finally, the Working Group discussed overlap of interests with Sun staff

working on Java3D and BigDecimal. Sun will soon issue a "call for experts"

to consider the extension of the vecmath package in javax, which provides

linear algebra support for Java3D. It was suggested that the LU, Cholesky

and SVD codes in this package be deprecated in favor of the development of

a full-fledged general-purpose linear algebra package for

Java. The latter would be managed by numerical analysts. The Working Group

agreed to provide representation on this expert group.

ACTION ITEMS

1. Draft proposal for FMA in Java. Deliver to Lindholm in two weeks.

(Boisvert)

2. Draft proposal for Math library. Deliver to Lindholm in four weeks.

(Boisvert)

3. Evaluate VNI's fdlibm. (Moler)

4. Evaluate IBM Haifa implementation of math functions. (Moler)

5. Contact Joe Darcy to begin plan for proposals for operator overloading

and lightweight objects. (Boisvert)

6. Provide people to work with Henry Sowizral when he issues a "call for

experts" for the evolution of javax.vecmath. (All)

-------------------------------------------------------------------------------

DETAILS

-------------------------------------------------------------------------------

1. Operator Overloading and Lightweight Objects

a Sun clearly sees the need for these extensions for the numerics community.

b Sun is worried about the reaction of "purists" to a proposal for operator

overloading. Ideally, they would like to see mechanisms that lead to

only reasonable usage. (Examples: use descriptive names for overloaded

methods; require implementation of an arithmetic interface.) It does not

seem possible to stop the truly perverse, however.
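
A minimal sketch of what such an arithmetic interface might look like; the

interface and method names below are hypothetical and are not part of any

Sun or JGF proposal:

    // Hypothetical: a compiler could translate a + b into a.plus(b) only for
    // classes that implement an arithmetic interface with descriptive names.
    public interface Arithmetic {
        Arithmetic plus(Arithmetic other);    // candidate target for an overloaded '+'
        Arithmetic minus(Arithmetic other);   // candidate target for an overloaded '-'
        Arithmetic times(Arithmetic other);   // candidate target for an overloaded '*'
    }
    // A complex, interval, or multiple-precision class would implement this
    // interface; classes that do not implement it could not overload operators.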

c Lightweight objects are a little more problematical. There are many ways to

provide such a facility, and these would have to be fleshed out.

Lightweight objects can be first introduced in the Java language with

little or no change to the VM. However, extensive changes to the VM

may be necessary to actually deliver performance benefits.

(Major problem is how to return lightweight objects from a method.)

d Sun has little experience in the community process for making changes to the

Java language itself. As a result, Gosling suggested that Sun should take

the lead in developing proposals for operator overloading and lightweight

classes. Hassanzadeh offered to fund the assignment of Joe Darcy within

Sun to work with Gosling, Lindholm and their colleagues on developing

these proposals. Gosling and Lindholm agreed (enthusiastically).

e The JGF would help in the proposal process, providing justification of the

need, providing comments, etc. It was suggested that finding allies in

other user communities might help. The financial community (which uses

decimal arithmetic, for example) or the graphics community were cited

as possibilities. [Later discussions with Henry Sowizral indicated that

there is probably not a compelling need for operator overloading in the

graphics community.]

f The timing for such proposals was discussed. It is unlikely that major

language changes will be on the table for the next few releases, so this

is viewed as a longer term project. It was also agreed that the proposals

for operator overloading and lightweight classes should be unbundled in

order to maximize the chances for success with "the process".

g We learned that a group is drawing up a proposal for generic types in Java

and that it too would be put through the process. (Generic types have

less of an impact on the language itself and hence are viewed as less

controversial.)

h IBM has done research on supporting lightweight objects. Representing

complex numbers as full-fledged objects results in a performance

of 1.1 Mflops for matrix-multiply and 1.7 Mflops for a CFD kernel

(two-dimensional convolution). Using lightweight objects and extensive

compiler optimization raises those numbers to 89.5 Mflops (MATMUL) and

60.5 Mflops (CFD). Fortran numbers are 136.4 and 66.5 Mflops,

respectively. (All experiments performed on an RS/6000 model 590.)
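
A minimal sketch (not IBM's benchmark code) of why full-fledged complex

objects are costly: every operation in the inner loop below allocates a

temporary object on the heap, which is exactly the overhead a lightweight

complex type would avoid.

    // Complex numbers as ordinary objects: each plus() and times() call
    // constructs a new heap object, so the k-loop allocates 2*n temporaries
    // per element of the result.
    final class Complex {
        final double re, im;
        Complex(double re, double im) { this.re = re; this.im = im; }
        Complex plus(Complex o)  { return new Complex(re + o.re, im + o.im); }
        Complex times(Complex o) {
            return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
        }
    }

    final class ComplexMatMul {
        static Complex[][] multiply(Complex[][] a, Complex[][] b) {
            int n = a.length;
            Complex[][] c = new Complex[n][n];
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    Complex sum = new Complex(0.0, 0.0);
                    for (int k = 0; k < n; k++) {
                        sum = sum.plus(a[i][k].times(b[k][j]));  // two allocations per term
                    }
                    c[i][j] = sum;
                }
            }
            return c;
        }
    }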

-------------------------------------------------------------------------------

2. Floating-point Model in Java 1.2

a Input to Sun from the JGF Numerics Working Group helped kill the "Proposal

for Extension of Java Floating Point Semantics (Revision 1)" (PEJFPS) of

May 1998. Changes to Java floating-point semantics in Java 1.2 are based

on those proposed by the Java Grande Forum. Because of time constraints,

only the first few of the JGF proposed modifications were implemented.

These are described in the following paragraphs.

b There are two floating-point modes: default and strictfp. Strict mode is

indicated by the programmer using the strictfp keyword. (The widefp

keyword of PEJFPS was dropped.)

c strictfp implements the classic Java floating-point model.

d Default mode allows wider (15-bit) exponents to be used to represent

anonymous variables on the stack. This allows Intel processors to run at

full speed: the extra significand precision of the x87 registers can be

disabled using a control word, but the extended exponent range cannot, so

this allowance spares a store and reload of every intermediate result.

The differences in observed results between x86 and SPARC

processors would occur only when the SPARC would overflow and the x86

would not, a relatively rare event.
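
A small sketch of the observable difference between the two modes; the

finite result in default mode assumes an x86 JVM that keeps the anonymous

intermediate in a register with the extended exponent, which Java 1.2

permits but does not require.

    public class ModeDemo {
        // strict mode: classic Java semantics; the intermediate x * 1.0e10
        // overflows to infinity for x near 1e300 on every platform.
        static strictfp double scaledStrict(double x) {
            return (x * 1.0e10) / 1.0e10;
        }

        // default mode: the anonymous intermediate may carry a 15-bit exponent,
        // so on x86 the same expression can come out finite.
        static double scaledDefault(double x) {
            return (x * 1.0e10) / 1.0e10;
        }

        public static void main(String[] args) {
            double x = 1.0e300;
            System.out.println(scaledStrict(x));   // Infinity everywhere
            System.out.println(scaledDefault(x));  // may be 1.0e300 on x86, Infinity on SPARC
        }
    }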

e The JGF requirement that extended exponents be used consistently

was not adopted. (This would require that either _none_ or _all_ anonymous

variables be represented using an extended exponent.) This was not done

because it would require adding new JVM instructions to push, pop and

reverse extended exponent numbers on the stack. [Why?]

f No floating-multiply-accumulate (FMA) instructions are allowed.

g Associative fp was not adopted.

h Those features that were not implemented are not out of the running. They

just need more study to understand their effects more clearly before they

are considered.

-------------------------------------------------------------------------------

3. How Can Java Accommodate the FMA?

a Sun appears willing to consider allowing use of the fused multiply accumulate

instruction under carefully controlled circumstances.

b The following rough proposal was hammered out at the meeting. A method would

be added to java.lang.Math, say fma(y,a,x), which would return a*x + y. Its

behavior would be different depending on the current floating-point mode.

strictfp : Forced FMA. Requires use of extended precision FMA, even if it

requires simulation in software. That is: y, a, x are each converted

to IEEE extended format (double or float as appropriate), the

computation occurs in extended format, with the result rounded

to double or float as appropriate.

default : Opportunistic FMA. The FMA should be used if it can be done fast,

e.g. as a single hardware instruction, otherwise the expression

a*x+y should be computed using the usual Java floating-point

semantics.
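
A sketch of how the proposed method might be used in an inner product; the

Math.fma call below follows the rough proposal above (fma(y, a, x) returns

a*x + y) and is not an existing method of java.lang.Math in Java 1.2.

    public class DotProduct {
        static double dot(double[] a, double[] x) {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                // strict mode: forced extended-precision FMA (software if necessary);
                // default mode: one hardware FMA where available, else a[i]*x[i] + sum.
                sum = Math.fma(sum, a[i], x[i]);   // hypothetical; equals a[i]*x[i] + sum
            }
            return sum;
        }
    }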

c Compilers would be forbidden to use hardware FMAs except as a replacement for

the fma() method invocation in default mode. Moreira pointed out that

compilers are often able to locate opportunities for using FMAs that the

programmer did not realize were present, thus providing additional speedup.

In particular, for the SPEC95 benchmark APPLU, when the compiler was free

to rearrange computations at will, a 30% reduction in dynamic instruction

count was observed.

d fma()+strictfp can be used to force the computation of a*x+y in extended

precision. However, since the result is a double (or float), this does not

provide a means for the extended precision accumulation of inner products.

e fma()+default can be used to speed up expression evaluation by the use

of FMAs on processors that support them, without paying the extra cost of

simulating them on processors that don't. In default mode fma() will return

slightly different results on processors which have native FMA instructions

than on processors that don't.

f Sun would like a concrete proposal from the working group by March 26.

g IBM has analyzed the impact of using the FMA operation in several Java

benchmarks. For matrix-multiply (real numbers), performance can be

improved from 100.9 Mflops to 209.8 Mflops through the use of FMAs.

For BSOM (a neural-network data mining kernel) the improvement is from

94.5 Mflops to 120.5 Mflops. For Cholesky factorization, the improvement

is from 83.4 to 129.9 Mflops. These FMAs were all obtained from

explicit a*b+c operations. The Fortran numbers for MATMUL, BSOM, and

Cholesky are 244.6, 128.0, and 134.0, respectively. (All experiments

performed on an RS/6000 model 590.)

-------------------------------------------------------------------------------

4. Math Library Specification

a The current specification for java.lang.Math says, roughly, to use the

algorithms of fdlibm translated straightforwardly to Java. It is not clear

who, if anyone, does this. In practice, Java programs which reference

elementary functions are not producing the same results on all Java

platforms.

b In JDK 1.2 Sun is distributing Java wrappers around the C fdlibm for

java.lang.Math.

c Brophy announced that he has produced a complete translation of Sun's fdlibm

in Java. This is posted at http://www.vni.com/corner/garage/grande/.

Numerics Working Group members were invited to test this library. Moler

agreed to test this implementation.

d Moreira announced that Abraham Ziv and colleagues at the IBM Haifa Labs have

developed a suite of correctly rounded math functions for IEEE arithmetic

in C. These functions are claimed to cost only about 10% more than fdlibm

in computing time. (This would be remarkable.) The Haifa researchers

claim to have a proof of correctness for their algorithms, but it is

proprietary. Several present expressed skepticism about proofs that are

not public.

e Unfortunately, the footprint of the Haifa library seems large. The

compiled binary library occupies about 250 KB (and compresses to 185 KB). One

of the reasons for its size is the inclusion of large tables. (The source

form of the atan table is 500 KB, for example.) It is probably possible

to trade space for time in the algorithm, but it is not clear, say, how

much slower a library of half the current size would run.

f Greg Tarsey's group at Sun (including KC Ng) is working on a similar

library, which is not yet complete. It is based on slightly different

algorithms than those used by the Haifa group. (The Haifa code works

in fixed space, which their proofs show to be sufficient, while the Sun

codes repeat computations in increasing precision as necessary until the

result is satisfied.) The Sun codes run very roughly three times longer

than fdlibm at present, but probably could be sped up considerably.

g Possible specification for strictfp mode: produce the correctly rounded

result.

Pros: This is the ideal definition.

Cons: The Java code for this does not yet exist. Existing algorithms would be

much too large for inclusion in the Java core. The resulting code

might run too slowly (a factor of 2 is unacceptable). The code would

be difficult to test.

h Possible specification for strictfp mode: mandate use of VNI's pure

Java fdlibm.

Pros: This code is available now. It is reasonably fast and has a small

footprint. It produces results that are usually within one unit in

the last place (ulp) of the correctly rounded result.

Cons: An operational definition like this would mandate results that are not

always correctly rounded. Incorporating future improvements in the

algorithms yielding better

results would change the behavior of existing Java class files.

i Possible specification for default mode: specify largest acceptable relative

error in result for each function (e.g. 1 ulp), being sure that the bounds

were satisfied by fdlibm.

Pros: This allows flexibility of implementation. It allows (guarded) use of

specialized hardware instructions for sqrt, sine, cosine, etc. It

allows improvements in algorithms producing more accurate results

to be trivially accommodated.

Cons: Careful testing of fdlibm would have to be done to ensure it satisfied

the criteria; this might be difficult to do rigorously, but could be

done satisfactorily by relying on comparisons with a multiprecision

system like Maple or Mathematica. Exhaustive testing in pure Java

would be difficult.
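
A minimal sketch of the kind of acceptance test a 1 ulp bound implies;

candidateSin below is a stand-in for whatever implementation is under test,

and the reference values are assumed to be correctly rounded results taken

from a multiprecision system such as Maple or Mathematica.

    public class UlpCheck {
        // ulp distance between two finite doubles of the same sign: the number
        // of representable doubles separating them (0 means bit-for-bit equal).
        static long ulpDistance(double computed, double reference) {
            long a = Double.doubleToLongBits(computed);
            long b = Double.doubleToLongBits(reference);
            return Math.abs(a - b);
        }

        static boolean withinOneUlp(double x, double correctlyRounded) {
            return ulpDistance(candidateSin(x), correctlyRounded) <= 1;
        }

        static double candidateSin(double x) {
            return Math.sin(x);   // placeholder for the implementation under test
        }
    }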

j Tim Lindholm asked the Working Group to draft a proposal before April 9.

-------------------------------------------------------------------------------

5. Additional Floating-point Optimizations

a There was a brief discussion of whether, and under what circumstances,

additional compiler optimizations should be allowed. Examples include use

of the associative law, transformation of 0*x and x-x to zero, etc.

b The attendees agreed that such optimizations can sometimes improve

performance; however, no one wants optimizations that cause the program

to behave in ways other than the programmer intended. This requirement is

difficult to quantify.

c Attendees were dubious about transformations like 0*x to 0, since these

are usually incorrect when one of the variables is not finite.
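
A short illustration of the concern:

    public class ZeroTimesX {
        public static void main(String[] args) {
            System.out.println(0.0 * Double.POSITIVE_INFINITY);  // NaN, not 0.0
            System.out.println(0.0 * Double.NaN);                // NaN
            System.out.println(0.0 * -1.0);                      // -0.0; a literal 0.0 loses the sign
        }
    }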

d It was agreed that inability to apply optimizations like these was not

the main reason for the poor performance of current Java systems, and

hence that it was not crucial to address such issues in the short term.

-------------------------------------------------------------------------------

6. Relationship to the BigDecimal Package

a Joshua Bloch of Sun briefed the Working Group on the status of BigDecimal.

This class provides basic support for arbitrary-precision decimal arithmetic.

The target audience is the business community. Internal representation of

numbers is by an integer and a scale (another integer). Add, subtract and

multiply are done exactly. Division is not exact in general; the user can select from

eight different rounding modes for the result.
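
A small example of the behavior described above; ROUND_HALF_UP is one of the

eight rounding modes provided by the class.

    import java.math.BigDecimal;

    public class BigDecimalDemo {
        public static void main(String[] args) {
            BigDecimal a = new BigDecimal("1.10");   // unscaled value 110, scale 2
            BigDecimal b = new BigDecimal("0.30");   // unscaled value 30, scale 2

            System.out.println(a.multiply(b));       // 0.3300  (exact, scale 4)

            // Division is not exact in general; the caller chooses a result
            // scale and one of the eight rounding modes.
            System.out.println(a.divide(b, 10, BigDecimal.ROUND_HALF_UP));
            // prints 3.6666666667
        }
    }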

b Bloch informed the group of a proposal from IBM (authored by Mike

Cowlishaw) to extend BigDecimal to include pow(), as well as to add support

for multiple precision decimal floating-point. He asked the group whether

this would be of use in scientific computation. The group responded that

multiple precision floating-point was often quite useful in scientific

computing, but that the package would not satisfy most needs unless all

of the functions in java.lang.Math (sin(), exp(), sqrt(), etc.) were

included. Additional discussion ensued about the base for representation

of numbers in such a package. Hough explained that base two was known to

be optimal. In particular, for the same precision, binary is faster, and for

the same speed, binary delivers better precision.

-------------------------------------------------------------------------------

7. Relationship to the VecMath Package

a Henry Sowizral of Sun briefed the group on the current status of VecMath.

javax.vecmath (VecMath) is a package which provides a variety of low-level

mathematical utilities of use in computer graphics. Examples include

multiplication of 2x2, 3x3 and 4x4 matrices. Also included is a class

GMatrix which implements operations on general nxn matrices. Methods for

computing LU, Cholesky and singular value decompositions are included.

b Moler posed several problems for vecmath. The first was the SVD of a 3x3

matrix of zeros. The results returned were left and right singular vectors

of all NaNs. The matrix of singular values was unchanged from its value

on input. This is incorrect. The second problem was the SVD of the rank 2

matrix with rows (1 2 3), (4 5 6), (7 8 9). The computed singular vectors

were correct, but the matrix of singular values was unchanged from its

input state, and the rank was reported incorrectly as 3.
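
A sketch of the first test as it might be written against GMatrix; the SVD

call below reflects one reading of the javax.vecmath GMatrix API (rank

returned, U, W, V filled in) and should be checked against the vecmath

specification.

    import javax.vecmath.GMatrix;

    public class SvdTest {
        public static void main(String[] args) {
            GMatrix a = new GMatrix(3, 3);
            a.setZero();                     // the 3x3 zero matrix
            GMatrix u = new GMatrix(3, 3);
            GMatrix w = new GMatrix(3, 3);
            GMatrix v = new GMatrix(3, 3);
            int rank = a.SVD(u, w, v);
            // Expected: rank 0, W all zeros, U and V orthogonal.
            // Reported at the meeting: U and V came back full of NaNs and W
            // was left unchanged from its input values.
            System.out.println("rank = " + rank);
            System.out.println(w);
        }
    }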

c It was agreed that the numerical analysis community should participate more

actively in the development of APIs for linear algebra. One option would

be for GMatrix to be deprecated in vecmath and such matrix operations be

provided instead in a separate class for numerical linear algebra which

could be shepherded by the Numerics Working Group.

d There will be a "call for experts" soon to consider extensions to vecmath.

The Working Group agreed to provide representatives to this team to help

work out details for future development of this package.

-------------------------------------------------------------------------------

FEEDBACK FROM SUN FOR NUMERICS WORKING GROUP

-------------------------------------------------------------------------------

Subject: Feedback for the Numerics Working Group

Date: Fri, 12 Mar 1999 15:59:40 -0800 (PST)

From: Tim Lindholm <Timothy.Lindholm@Eng.Sun.COM>

To: boisvert@cam.nist.gov

Hi Ron,

The following gives my perspective on the Java Grande Numerics Working

Group and its relationship and relevance to Sun in Sun's role as the

steward of Java development.

As you know, I'm a Distinguished Engineer at Sun, one of the members

of the original Java project, the author of The Java Virtual Machine

Specification, and currently one of the architects of the Java 2

platform. I work closely with the other architects of the Java

technologies, such as James Gosling, and while I can't speak for them

in detail I can say that the opinions I express below are not out

of line with theirs.

During the development of the Java 2 platform version 1.2 (formerly

known as JDK 1.2), I was handed responsibility for creating licensee and

industry consensus around changes to Java's primitive floating-point

arithmetic targeted at improving performance on Intel CPUs. This was

an extremely difficult task because it required careful attention to

the balance between many factors, and was being done in an extremely

charged political environment. We who were responsible did not have a

broad background in numerics, and had not been successful finding help

within Sun. We understood that there was high risk: Java had taken a

rather different approach to floating point than many other languages,

and the wrong decision in 1.2 could throw away many of the advantages

of that approach.

Our best attempts led to a public proposal that we considered a bad

compromise and were not happy with, but were resigned to. At this

point the Numerics Working Group wrote a counterproposal to Sun's

public proposal that gave new technical insight on how to balance the

demands of performance on Intel without throwing away the important

part of reproducibility. The counterproposal was both very sensitive

to the spirit of Java and satisfactory as a solution for the

performance problem. When we saw the new proposal we revived efforts

to reach a better answer.

We were subsequently aided by email and phone calls with a number of

members on the Numerics Working Group (you, Roldan, and Cleve). Joe

Darcy, one of the authors of the counterproposal, helped us to understand

the proposal generally, and specifically to evaluate the effects

of modifications we required of the counterproposal. All of you helped

us understand how the Numerics Working Group represented a number of

rather different fields with different perspectives on numerics. This

helped us gain confidence that the new solution reflected a consensus

that would satisfy a broad range of Java users.

We are sure that we ended up with a better answer for 1.2, and arrived

at it through more complete consideration of real issues, because of

the efforts of the Numerics Working Group. We have internally resolved

to consult with the Group on future numeric issues, and to do so

earlier in the process. Our attendance at the 3/11 Numerics Working

Group meeting at Sun and all the work we accomplished there is evidence

of this resolution. We think that the Group continues to show great

sensitivity to the needs of the Java platform and the difficulty of

introducing change while preserving compatibility and robustness. We

look forward to continuing this relationship into the future.

-- Tim