Draft detailed report of Numerics Working Group
(Roldan Pozo and Ron Boisvert from NIST reporting)
Floating-point Issues
A number of issues related to the behavior of floating-point arithmetic
are of concern to the numeric community. Among these are the following.
-
Exact Reproducibility
-
Requiring that every JVM compute bitwise identical results for a given
Java program seems not only unrealizable in practice; it also inhibits
efficient floating-point processing on some platforms. For example, it
precludes the efficient use of floating-point hardware on Intel processors,
which utilize extended precision in registers. It also prevents compiler
writers from using certain kinds of optimizations even when users would
tolerate them.
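As one illustration of why such optimizations interact with reproducibility, floating-point addition is not associative, so a compiler that reorders a sum can change the computed result. The following is a minimal sketch using ordinary IEEE 754 double arithmetic; the class and method names are ours, chosen only for illustration.

```java
// Illustrative sketch: reassociating a floating-point sum changes the result.
final class ReassociationDemo {
    // Sum evaluated left to right: (a + b) + c
    static double leftToRight(double a, double b, double c) {
        return (a + b) + c;
    }

    // The same three terms, reassociated: a + (b + c)
    static double reassociated(double a, double b, double c) {
        return a + (b + c);
    }

    public static void main(String[] args) {
        double a = 1.0e16, b = -1.0e16, c = 1.0;
        System.out.println(leftToRight(a, b, c));   // prints 1.0
        System.out.println(reassociated(a, b, c));  // prints 0.0
    }
}
```

In the second form the term 1.0 is lost when it is added to -1.0e16, because adjacent doubles of that magnitude are 2 apart.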
-
Rounding
-
Round-to-nearest is the only IEEE 754 rounding mode recognized by Java.
While this is adequate in most cases, some specialized processing, such
as interval arithmetic, would benefit greatly from more explicit control
of the rounding mode in specific instances.
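Without rounding-mode control, interval bounds have to be widened in software. The sketch below is one possible workaround, not a proposed standard class: the Interval class is hypothetical, and it emulates outward rounding by stepping each bound one unit in the last place with the standard Math.nextDown and Math.nextUp methods. Directed rounding in hardware would give tighter bounds at lower cost.

```java
// Sketch of interval addition with outward rounding emulated in software.
// The class name and its methods are assumptions made for illustration.
final class Interval {
    final double lo;
    final double hi;

    Interval(double lo, double hi) {
        if (lo > hi) throw new IllegalArgumentException("lo > hi");
        this.lo = lo;
        this.hi = hi;
    }

    // A round-to-nearest sum lies within half an ulp of the exact sum, so
    // moving each bound one ulp outward encloses the exact result.
    Interval plus(Interval other) {
        return new Interval(Math.nextDown(lo + other.lo),
                            Math.nextUp(hi + other.hi));
    }

    boolean contains(double x) {
        return lo <= x && x <= hi;
    }

    public static void main(String[] args) {
        Interval x = new Interval(0.1, 0.1);
        Interval y = new Interval(0.2, 0.2);
        Interval z = x.plus(y);
        System.out.println("[" + z.lo + ", " + z.hi + "]");
    }
}
```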
-
IEEE Floating-point Exceptions
-
Currently Java admits only default actions when IEEE floating-point exceptions
occur. Thus, for example, the floating-point division 1.0/0.0 produces Inf
and 0.0/0.0 produces NaN. Since arithmetic on these special quantities is
completely defined, no trapping occurs. This, again, is sufficient for most
applications. However, it is sometimes crucial to know whether one of these
special events has occurred. There are two ways to do this.
-
enable trapping on selected IEEE floating-point exceptions
-
provide a standard mechanism to determine whether a given IEEE floating-point
exception has been raised
-
The former is difficult to implement and to use effectively (except when
debugging a program, when it can be quite useful).
-
The latter is easy to implement and exploit by working programs, and hence
should certainly be added.
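In the absence of exception flags, a program can only inspect results after the fact. The following sketch shows that approach with the standard Double.isNaN and Double.isInfinite tests; the class and the classify method are hypothetical names used for illustration.

```java
// Sketch: with default (non-trapping) IEEE behavior, a program can only
// notice an exceptional event by inspecting the values it produced.
final class SpecialValueCheck {
    // Returns a label describing the value; the method name is hypothetical.
    static String classify(double x) {
        if (Double.isNaN(x)) return "NaN";
        if (Double.isInfinite(x)) return "Infinity";
        return "finite";
    }

    public static void main(String[] args) {
        double one = 1.0, zero = 0.0;
        System.out.println(classify(one / zero));          // Infinity
        System.out.println(classify(zero / zero));         // NaN
        System.out.println(classify(one / (one / zero)));  // finite
    }
}
```

The last line prints "finite" because 1.0 divided by Infinity is 0.0: the evidence of the earlier division by zero has disappeared from the result, which is the kind of case a queryable exception flag would still report.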
-
NaNs
-
Java defines a single specific bit pattern to represent NaN. This is contrary
to IEEE 754, which specifies a range of values, all of which represent
NaN. This Java restriction is unnecessary and quite difficult to implement
and hence should be removed.
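The distinction between one NaN pattern and a range of NaN patterns can be observed through the bit-conversion methods of java.lang.Double. The sketch below is illustrative only; the class name and the particular payload value are our assumptions, and the raw bit pattern printed last is platform dependent.

```java
// Sketch: IEEE 754 allows many NaN bit patterns; Double.doubleToLongBits
// reports all of them as one canonical pattern.
final class NaNBits {
    static final long CANONICAL_NAN = 0x7ff8000000000000L;

    public static void main(String[] args) {
        // A quiet NaN carrying a nonzero payload in its low-order bits.
        double payloadNaN = Double.longBitsToDouble(0x7ff8000000000123L);

        System.out.println(Double.isNaN(payloadNaN));
        // doubleToLongBits collapses every NaN to the canonical pattern.
        System.out.println(Long.toHexString(Double.doubleToLongBits(payloadNaN)));
        // doubleToRawLongBits attempts to preserve the payload; what it
        // returns can depend on the platform.
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(payloadNaN)));
    }
}
```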
-
Other Arithmetic Data Types
-
Other arithmetic data types are of considerable interest to the numerical
community. The two most often cited are interval arithmetic, and (arbitrary)
multiple precision arithmetic. Ideally, such data types would be added
to the Java language spec itself. However, this is unrealistic. It would
be quite useful if standard classes for such data types were defined. If
operator overloading and efficient processing of final methods were possible,
then programs using such classes would be quite natural and fairly efficient.
Arrays
-
Multidimensional Arrays
-
Numerical software designers typically take information about the physical
layout of data in memory into account when designing algorithms to achieve
high performance. For example, LAPACK uses block-oriented algorithms and
the Level 2 and 3 BLAS to localize references for efficiency in modern
cache-based systems. The fact that Fortran requires two-dimensional arrays
be stored contiguously by columns is explicitly used in the design of these
algorithms.
-
-
Unfortunately, there is no similar requirement for multidimensional arrays
in Java. Here, a two-dimensional array is an array of one-dimensional arrays.
Although we might expect that elements of rows are stored contiguously,
one cannot depend upon the rows themselves being stored contiguously. In
fact, there is no way to check whether rows have been stored contiguously
after they have been allocated. The possible non-contiguity of rows implies
that the effectiveness of block-oriented algorithms may be highly dependent
on the particular implementation of the JVM as well as the current state
of the memory manager. In one experiment with a blocked matrix multiply,
the computation rate dropped from 82 Mflops to about 55 Mflops when rows
were randomly permuted. On the other hand, for a naive non-blocked matrix
multiply, the original 11 Mflop rate was unchanged when the rows were permuted.
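The array-of-arrays structure can be seen directly in code. In the sketch below (class and method names are ours), rows are independent objects: they can be permuted by swapping references, and they need not even have the same length. This is not the experiment described above, only an illustration of why row placement in memory is outside the programmer's control.

```java
// Sketch: a Java double[][] is an array of references to separate row arrays.
final class RowsAreObjects {
    // Swap two rows by exchanging references; no element data moves.
    static void swapRows(double[][] a, int i, int j) {
        double[] tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        double[][] a = new double[3][4];       // three separate row objects
        double[] firstRow = a[0];
        swapRows(a, 0, 2);
        System.out.println(a[2] == firstRow);  // true: same object, new position
        a[1] = new double[7];                  // rows may differ in length
        System.out.println(a[1].length);       // 7
    }
}
```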
-
-
An alternative would be to define a new multidimensional array data type
for Java whose memory layout would be defined (contiguous storage, rowwise,
for example). Such arrays might use multidimensional array indexing syntax
(i.e. A[i,j] rather than A[i][j]) to distinguish them from Java arrays
of arrays. Three-dimensional arrays are also quite common in applications
and could be similarly included.
-
-
If true multidimensional arrays are not added to the Java language, then
standard classes for contiguously stored arrays should be defined. It is
doubtful that such classes could be made as efficient as true arrays.
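A class of this kind might look like the following minimal sketch. DenseMatrix2D is a hypothetical name, and row-major order is only one of the possible layout choices. The method call and explicit index check on every access hint at why such a class may not match the speed of a built-in array type.

```java
// Sketch of a contiguously stored two-dimensional array (hypothetical class).
final class DenseMatrix2D {
    private final int rows;
    private final int cols;
    private final double[] data; // one contiguous block, stored row by row

    DenseMatrix2D(int rows, int cols) {
        if (rows < 0 || cols < 0) {
            throw new IllegalArgumentException("negative dimension");
        }
        this.rows = rows;
        this.cols = cols;
        this.data = new double[Math.multiplyExact(rows, cols)];
    }

    double get(int i, int j) {
        return data[index(i, j)];
    }

    void set(int i, int j, double value) {
        data[index(i, j)] = value;
    }

    // Row-major mapping: element (i, j) lives at position i * cols + j.
    int index(int i, int j) {
        if (i < 0 || i >= rows || j < 0 || j >= cols) {
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ")");
        }
        return i * cols + j;
    }

    public static void main(String[] args) {
        DenseMatrix2D m = new DenseMatrix2D(3, 4);
        m.set(2, 1, 9.0);
        System.out.println(m.get(2, 1) + " stored at offset " + m.index(2, 1));
    }
}
```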
Access to Subarrays Without Copying
-
In Fortran subarrays of multidimensional arrays can be logically passed
into subprograms by passing in the address of the first element along with
subarray dimensions. Internally to the subprogram the subarray acts just
like a multidimensional array. This greatly eases the implementation of
many algorithms in numerical linear algebra, for example.
-
Subarrays cannot be delivered to methods in this way in Java. Two alternatives
are possible.
-
The subarray is first copied to a new appropriately sized array, and the
new array is passed to the method, or
-
The entire array is passed to the method along with explicit index ranges
which define the subarray.
-
In the first case much inefficient data copying is performed. In the second
case, additional methods that explicitly operate on subarrays must be developed;
the coding and efficiency of these methods will be complicated by complex
indexing computations.
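The second alternative can be packaged so that the index arithmetic is written once. The sketch below assumes a flat, row-major backing array and a hypothetical MatrixView class that records an offset and a row stride, much as a Fortran subprogram receives a starting address and a leading dimension. It shares storage with the original array, so no copying occurs, but every access pays for the extra index computation.

```java
// Sketch: a subarray "view" over shared storage (hypothetical class).
final class MatrixView {
    private final double[] data;  // shared storage, row-major
    private final int offset;     // position of this view's element (0, 0)
    private final int rowStride;  // distance in data between consecutive rows
    final int rows;
    final int cols;

    private MatrixView(double[] data, int offset, int rowStride, int rows, int cols) {
        this.data = data;
        this.offset = offset;
        this.rowStride = rowStride;
        this.rows = rows;
        this.cols = cols;
    }

    // View of a whole rows-by-cols matrix stored in data.
    static MatrixView of(double[] data, int rows, int cols) {
        if (rows < 0 || cols < 0 || data.length < (long) rows * cols) {
            throw new IllegalArgumentException("array too small for dimensions");
        }
        return new MatrixView(data, 0, cols, rows, cols);
    }

    // View of the nRows-by-nCols block whose top-left corner is (row0, col0).
    MatrixView sub(int row0, int col0, int nRows, int nCols) {
        if (row0 < 0 || col0 < 0 || nRows < 0 || nCols < 0
                || row0 + nRows > rows || col0 + nCols > cols) {
            throw new IndexOutOfBoundsException("block outside parent view");
        }
        return new MatrixView(data, offset + row0 * rowStride + col0,
                              rowStride, nRows, nCols);
    }

    double get(int i, int j) {
        return data[position(i, j)];
    }

    void set(int i, int j, double value) {
        data[position(i, j)] = value;
    }

    private int position(int i, int j) {
        if (i < 0 || i >= rows || j < 0 || j >= cols) {
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ")");
        }
        return offset + i * rowStride + j;
    }

    public static void main(String[] args) {
        double[] storage = new double[12];
        for (int k = 0; k < storage.length; k++) storage[k] = k;
        MatrixView block = MatrixView.of(storage, 3, 4).sub(1, 1, 2, 2);
        block.set(1, 1, -1.0);  // writes through to the shared storage
        System.out.println(block.get(0, 0) + " " + storage[10]);
    }
}
```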
Long Integer Indices
Some high-performance applications require extremely long arrays, and hence
support for arrays indexed by long integers is needed in these cases.
Array Expressions
Fortran 90 and MATLAB each allow array expressions for computation on entire
arrays. They also use a very convenient notation to denote computation
on subarrays, such as A(1:5) = B(2:6) + C(10:14). Such features greatly
ease scientific computations, which are typically heavily array-oriented.
They also provide an easier way for compilers to discover useful optimizations.
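For comparison, the statement A(1:5) = B(2:6) + C(10:14) has to be written in Java as an explicit loop with zero-based offsets. The helper below is a hypothetical sketch, and it assumes the destination does not overlap the operands (Fortran evaluates the whole right-hand side before assigning).

```java
// Sketch: the Java equivalent of a Fortran 90 array-section assignment.
final class ArraySectionAdd {
    // dst[dstPos + k] = x[xPos + k] + y[yPos + k] for k = 0 .. length - 1.
    // Assumes dst does not overlap x or y.
    static void addSections(double[] dst, int dstPos,
                            double[] x, int xPos,
                            double[] y, int yPos, int length) {
        for (int k = 0; k < length; k++) {
            dst[dstPos + k] = x[xPos + k] + y[yPos + k];
        }
    }

    public static void main(String[] args) {
        double[] a = new double[5];
        double[] b = new double[6];
        double[] c = new double[14];
        for (int i = 0; i < b.length; i++) b[i] = i + 1;          // b(i) = i
        for (int i = 0; i < c.length; i++) c[i] = 100 * (i + 1);  // c(i) = 100 i
        // Fortran A(1:5) = B(2:6) + C(10:14), shifted to zero-based indices.
        addSections(a, 0, b, 1, c, 9, 5);
        System.out.println(a[0] + " ... " + a[4]);
    }
}
```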
-
Resizing and Composing of Arrays
-
To be added
Complex numbers
Issue: complex numbers are important for numerical computing and
should be available in Java. Their operations should be as efficient
as those on fundamental data types (e.g. double).
Here are four possible solutions:
-
1) Create the obvious Complex class
-
2) Create the obvious Complex class, and add operator overloading for more
natural syntax
-
3) Create special "lightweight" (or value-based) classes which do not use
"new" or the usual Object overhead (REQUIRES LANGUAGE MODIFICATION)
-
4) Modify the Java language to include complex numbers as a basic type.
PROS and CONS:
-
(1) is the easiest (can be done now), but does not provide an efficient
solution and, without operator overloading, leads to awkward coding.
-
(4) is best for the user, but is unlikely to take place, since it
would require a major modification to the Java language to satisfy a relatively
small segment of the Java population.
-
(3) seems to be a good compromise, because the same efficiency issues
that come up with complex classes occur for many other small classes
in other disciplines (e.g. graphics). A proposal to allow
value-based classes would require a major change to the language, but would
benefit a more general audience.
Note that in any solution, existing APIs must be extended, e.g. Math.abs(),
Math.cos(), etc. to work with complex numbers.
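Solution (1) might look like the minimal sketch below. The method names plus, times and abs are our assumptions, not a proposed API. Every arithmetic operation allocates a new object, and the expression in main shows the method-call syntax that results without operator overloading.

```java
// Sketch of the "obvious" Complex class from solution (1); illustrative only.
final class Complex {
    final double re;
    final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    Complex plus(Complex z) {
        return new Complex(re + z.re, im + z.im);
    }

    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    Complex times(Complex z) {
        return new Complex(re * z.re - im * z.im, re * z.im + im * z.re);
    }

    double abs() {
        return Math.hypot(re, im);
    }

    @Override
    public String toString() {
        return "(" + re + ", " + im + ")";
    }

    public static void main(String[] args) {
        Complex a = new Complex(1.0, 2.0);
        Complex b = new Complex(3.0, 4.0);
        // With operator overloading this could read: a * b + a
        Complex c = a.times(b).plus(a);
        System.out.println(c);
    }
}
```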
Operator overloading
-
Issue: operator overloading (e.g. +,*, etc.) for arrays and numerical
objects simplifies codes.
-
Solution: allow operator overloading for a limited subset of binary
and unary operators.
-
PROS: operator overloading is not only useful for array operations
(+, *, etc.) but also for elements of any algebraic field, including extended
precision numbers, string-based representations, rational numbers, matrices,
vectors, tensors, grids, etc. (java.math already includes BigDecimal and
BigInteger classes that could benefit from operator overloading.)
-
CONS: operator overloading can be abused to generate hard-to-read
code by redefining familiar symbols into unintuitive operations.
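The existing java.math.BigInteger class illustrates the syntax in question. In the sketch below, the mulAdd helper is a hypothetical name; multiply, add, valueOf and shiftLeft are standard BigInteger methods.

```java
import java.math.BigInteger;

// Sketch: arithmetic on BigInteger values requires method-call syntax.
final class BigIntegerSyntax {
    // Computes a * b + c; with operator overloading the body could be
    // written as the expression "a * b + c".
    static BigInteger mulAdd(BigInteger a, BigInteger b, BigInteger c) {
        return a.multiply(b).add(c);
    }

    public static void main(String[] args) {
        BigInteger twoTo64 = BigInteger.ONE.shiftLeft(64);
        System.out.println(mulAdd(twoTo64, BigInteger.valueOf(3), BigInteger.ONE));
    }
}
```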
Templates
-
Issues:
-
Templates have been successfully used in C++ to provide an effective mechanism
for code management and reuse in numerical programs.
-
-
Solution:
-
Add a mechanism similar to C++ templates to Java that provides these benefits.
It need not be identical to the C++ scheme, and could be simplified so
as to not over-complicate the language yet still provide many of its advantages.
-
-
PROS:
-
One obvious use of templates is in developing float and double versions
of the same numerical routine. But more important is the ability
to describe a numerical algorithm, say an iterative linear equation
solver, in a "generic" way, without explicit mention of the matrix structure
or storage details.
-
-
Finally, advanced techniques like expression templates can be used to deal
efficiently with array/vector operations, unroll loops, and fuse large
array expressions into a single loop.
-
CONS:
-
Template mechanisms introduce tremendous complexity into the language.
This diminishes one of the key attractive features of Java -- its simplicity.
-
-
Templates also have a series of practical problems, mostly related to their
implementation. Template codes can often lead to code bloat (since
a separate routine is generated for each instantiation). They
can be difficult to debug, depending on how well the debugger understands
instantiations. Nested templates also present implementation problems.
A quick read of the messages related to template problems in C++
news groups should give one a good idea of what to expect.
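To make the float/double use case concrete, the sketch below writes one dot-product routine against a type parameter. It uses the generics that later versions of Java provide, which postdate this report and are not C++ templates; the Ring interface and all names here are hypothetical. Because Java generics work only on object types, the values are boxed (Float, Double), so this form does not deliver the efficiency that C++ obtains by generating a specialized routine per type.

```java
// Hypothetical interface naming the arithmetic a generic routine needs.
interface Ring<T> {
    T zero();
    T add(T a, T b);
    T mul(T a, T b);
}

// Sketch: a single routine shared by single- and double-precision data.
final class GenericDot {
    static final Ring<Float> FLOAT = new Ring<Float>() {
        public Float zero() { return 0.0f; }
        public Float add(Float a, Float b) { return a + b; }
        public Float mul(Float a, Float b) { return a * b; }
    };

    static final Ring<Double> DOUBLE = new Ring<Double>() {
        public Double zero() { return 0.0; }
        public Double add(Double a, Double b) { return a + b; }
        public Double mul(Double a, Double b) { return a * b; }
    };

    // One implementation of the algorithm for every element type.
    static <T> T dot(Ring<T> ring, T[] x, T[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        T sum = ring.zero();
        for (int i = 0; i < x.length; i++) {
            sum = ring.add(sum, ring.mul(x[i], y[i]));
        }
        return sum;
    }

    public static void main(String[] args) {
        Double[] x = {1.0, 2.0, 3.0};
        Double[] y = {4.0, 5.0, 6.0};
        System.out.println(dot(DOUBLE, x, y));  // prints 32.0
    }
}
```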
Standard interfaces for numeric objects (general issues)
-
Issue:
-
Basic data structures such as arrays, vectors, matrices, etc. are so fundamental
to numerical computing that there should be some consistent interface
to these objects. This would avoid having each person define their
own.
-
Solution:
-
Provide a limited set of APIs for basic numeric objects.
-
PROS:
-
This is perhaps one of the most practical contributions we can make to
the numerical community. Looking at C++, there are too many numerics
codes that define their own Vector objects, etc. These yield redundant,
yet often incompatible, class definitions.
-
CONS:
-
Not every one of these classes has a canonical representation. In particular,
things like "Matrix" can have a multitude of meanings: a 2D array of doubles,
an abstract linear operator, a sparse structure, etc. Some
care in design has to be taken to provide a usable framework.
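One of the meanings listed above, the matrix as an abstract linear operator, can be sketched as follows. The LinearOperator interface is hypothetical and is not a proposed standard; the conjugate gradient routine is the textbook method for symmetric positive definite systems and is included only to show an algorithm that never refers to how the matrix is stored.

```java
// Hypothetical interface: a matrix seen only through the action y = A x.
interface LinearOperator {
    int dimension();
    void apply(double[] x, double[] y);  // overwrites y with A x
}

final class ConjugateGradient {
    // Solves A x = b for symmetric positive definite A. On entry x holds the
    // initial guess; on exit, the approximate solution. Returns the number
    // of iterations performed.
    static int solve(LinearOperator a, double[] b, double[] x,
                     double tol, int maxIter) {
        int n = a.dimension();
        double[] r = new double[n];
        double[] p = new double[n];
        double[] ap = new double[n];
        a.apply(x, ap);
        for (int i = 0; i < n; i++) {
            r[i] = b[i] - ap[i];  // initial residual
            p[i] = r[i];          // initial search direction
        }
        double rr = dot(r, r);
        for (int k = 0; k < maxIter; k++) {
            if (Math.sqrt(rr) <= tol) return k;
            a.apply(p, ap);
            double alpha = rr / dot(p, ap);
            for (int i = 0; i < n; i++) {
                x[i] += alpha * p[i];
                r[i] -= alpha * ap[i];
            }
            double rrNew = dot(r, r);
            double beta = rrNew / rr;
            for (int i = 0; i < n; i++) {
                p[i] = r[i] + beta * p[i];
            }
            rr = rrNew;
        }
        return maxIter;
    }

    static double dot(double[] u, double[] v) {
        double s = 0.0;
        for (int i = 0; i < u.length; i++) s += u[i] * v[i];
        return s;
    }

    public static void main(String[] args) {
        final int n = 5;
        // Tridiagonal operator (2 on the diagonal, -1 beside it), applied
        // without ever storing a matrix.
        LinearOperator laplacian = new LinearOperator() {
            public int dimension() { return n; }
            public void apply(double[] x, double[] y) {
                for (int i = 0; i < n; i++) {
                    double left = (i > 0) ? x[i - 1] : 0.0;
                    double right = (i < n - 1) ? x[i + 1] : 0.0;
                    y[i] = 2.0 * x[i] - left - right;
                }
            }
        };
        double[] b = {1.0, 1.0, 1.0, 1.0, 1.0};
        double[] x = new double[n];
        int iterations = solve(laplacian, b, x, 1e-12, 100);
        System.out.println(iterations + " iterations, x[2] = " + x[2]);
    }
}
```

The same solve method would work unchanged with a dense, sparse or packed implementation of the interface, which is the kind of reuse a standard set of numeric interfaces is meant to enable.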
Integrating with Fortran codes (general issues)
Issue: the large body of Fortran scientific apps and libraries
will not be immediately available in Java. How can we best access it?
Solutions:
-
Provide a generic Fortran->Java translator
-
PROS: This would provide a 100% Java solution and retain the portability
advantages of language and runtime system.
-
CONS: Developing such a translator is very challenging, particularly
for separate Fortran libraries. (One complication, among many,
is that Java routines must reference the original array when passing
subarrays through functions.)
-
Use JNI to integrate native methods.
-
PROS: This is even easier with the JNI component of JDK 1.1. One could also
provide a specific API with objects like FortranArray2D, FortranArray3D,
etc., and corresponding set(i,j)/get(i,j) methods to simplify Java codes
that interface with Fortran.
-
CONS: This has the same portability problems as Fortran/C codes: (a) it makes
specific requirements on the user's environment; (b) one still needs to deal
with makefiles, configure, etc.; (c) it cannot be used reliably in Web applets;
(d) it raises security issues.
-
Port these libraries to Java
-
PROS: 100% pure Java. Performance with latest JITs about 50% of optimized
Fortran/C. (We are currently doing this with a tiny subset of the
BLAS and LAPACK -- see http://math.nist.gov/javanumerics/blas.html.)
-
CONS: A lot of work.
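The FortranArray2D idea mentioned under the JNI option could take roughly the following form. This is a hypothetical sketch: the text names only the class and its set(i,j)/get(i,j) methods, so the 1-based indices, the explicit value argument to set, and the storage() accessor are our assumptions. The backing array is laid out by columns, as Fortran requires, so that it could be handed to a native routine without rearrangement.

```java
// Hypothetical sketch of FortranArray2D: column-major storage with
// Fortran-style 1-based (i, j) indexing.
final class FortranArray2D {
    private final int rows;
    private final int cols;
    private final double[] data;  // element (i, j) at (j - 1) * rows + (i - 1)

    FortranArray2D(int rows, int cols) {
        if (rows < 0 || cols < 0) {
            throw new IllegalArgumentException("negative dimension");
        }
        this.rows = rows;
        this.cols = cols;
        this.data = new double[Math.multiplyExact(rows, cols)];
    }

    double get(int i, int j) {
        return data[position(i, j)];
    }

    void set(int i, int j, double value) {
        data[position(i, j)] = value;
    }

    // The column-major backing array, with leading dimension equal to rows.
    double[] storage() {
        return data;
    }

    private int position(int i, int j) {
        if (i < 1 || i > rows || j < 1 || j > cols) {
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ")");
        }
        return (j - 1) * rows + (i - 1);
    }

    public static void main(String[] args) {
        FortranArray2D a = new FortranArray2D(2, 3);
        a.set(2, 1, 5.0);
        a.set(1, 2, 7.0);
        System.out.println(java.util.Arrays.toString(a.storage()));
    }
}
```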
Original Summary of Issues
-
floating point issues
-
exact bits
-
rounding
-
looseNumerics as discussed by Gosling
-
exception flags
-
exception traps
-
NaNs, Infinities
-
interval arithmetic
-
extended precision
-
complex numbers
-
operator overloading
-
templates (inlining, parameterized class types)
-
arrays
-
multidimensional (contiguous) storage
-
long integer indices
-
standard classes (e.g. FortranArray3D)
-
subarrays without copying
-
resizing and composing of arrays
-
array expressions (: notation of f90, matlab)
-
aliasing
The current detailed description roughly covers the above with an initial
discussion of the first issues listed below.
-
Java numerical library interfaces (APIs)
-
standard vector, matrix classes (e.g. SymmetricPackedDouble )(?)
-
numeric (programming) exceptions (e.g. nonconforming arrays)
-
Java libraries
-
numerics
-
MPI
-
application-specific
-
interfaces to legacy and native methods
-
Lapack, BLAS, etc.
-
standardized schemes for f77 bindings
-
standard APIs for numeric classes (e.g. FortranArray3D)
-
Tool interfaces
-
monitors
-
debuggers
-
visualizations
-
JVM and runtime issues
-
Garbage Collection, threads
-
JVM opcode extensions
-
Java compiler optimizations
-
JIT optimizations
-
Inhibitors of compiler optimizations include:
-
Memory Hierarchy (fpu/register/cache/main/disk)
-
Interaction with Garbage Collection
-
Thread memory model
-
Weak consistency of shared variables
-
Implicit shared variables
-
Exception handling
-
FP no intermediate results rules
-
Scalability issues:
-
The VM itself
-
Thread creation, destruction and synchronization; Non-queued notifies