Draft detailed report of Numerics Working Group
(Roldan Pozo and Ron Boisvert from NIST reporting)

Floating-point Issues

A number of issues related to the behavior of floating-point arithmetic are of concern to the numeric community. Among these are the following.
Exact Reproducibility
Requiring that every JVM compute bitwise identical results for a given Java program seems not only unrealizable in practice but also an obstacle to efficient floating-point processing on some platforms. For example, it precludes the efficient use of floating-point hardware on Intel processors, which use extended precision in registers. It also prevents compiler writers from applying certain kinds of optimizations even when users would tolerate them.
Rounding
Round-to-nearest is the only IEEE 754 rounding mode recognized by Java. While this is adequate in most cases, some specialized processing, such as interval arithmetic, would benefit greatly from more explicit control of the rounding mode in specific instances.
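To illustrate why interval arithmetic wants rounding control, here is a minimal sketch of how an interval sum has to be written when only round-to-nearest is available. The class name `Interval` and its methods are hypothetical, and the sketch relies on `Math.nextUp` and `Math.nextDown`, which are later additions to the standard library. Each bound is computed in round-to-nearest and then widened by one ulp, which is safe but looser than what directed rounding would give.

```java
// Hypothetical sketch: interval addition without access to directed rounding.
final class Interval {
    final double lo;
    final double hi;

    Interval(double lo, double hi) {
        this.lo = lo;
        this.hi = hi;
    }

    // Each bound is computed in round-to-nearest and then pushed outward by
    // one ulp, so the result encloses the exact sum of the two intervals.
    // With round-down / round-up modes the widening step would be unnecessary.
    Interval add(Interval other) {
        return new Interval(Math.nextDown(lo + other.lo),
                            Math.nextUp(hi + other.hi));
    }

    boolean contains(double x) {
        return lo <= x && x <= hi;
    }

    public static void main(String[] args) {
        Interval tenth = new Interval(0.1, 0.1);
        Interval sum = tenth.add(tenth).add(tenth);
        System.out.println("[" + sum.lo + ", " + sum.hi + "]");
    }
}
```

The cost of this workaround is that every operation widens the interval by an extra ulp on each side, whether or not the computed bound was already safe.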
IEEE Floating-point Exceptions
Currently Java admits only default actions when IEEE floating-point exceptions occur. Thus, for example, 1.0/0.0 produces Inf, 0.0/0.0 produces NaN, and so on; since arithmetic on these special quantities is completely defined, no trapping occurs. This, again, is sufficient for most applications. However, it is sometimes crucial to know whether one of these special events has occurred. There are two ways to do this.
The former is difficult to implement and to use effectively (except when debugging a program, when it can be quite useful).
The latter is easy to implement and exploit by working programs, and hence should certainly be added.
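The default actions described above can be observed directly. The following is a small sketch (the class `DefaultFpActions` and the helper `isExceptional` are hypothetical names) showing that a program can only inspect results for Inf or NaN after the fact, and that the evidence of an exceptional event can disappear before the final result is examined.

```java
// Hypothetical sketch: observing IEEE 754 default (non-trapping) results in Java.
final class DefaultFpActions {
    // True if x is an infinity or a NaN, the special values that the
    // default actions produce.
    static boolean isExceptional(double x) {
        return Double.isNaN(x) || Double.isInfinite(x);
    }

    public static void main(String[] args) {
        double zero = 0.0;
        double overflowed = Double.MAX_VALUE * 2.0; // overflow yields Infinity
        double divByZero = 1.0 / zero;              // division by zero yields Infinity
        double invalid = zero / zero;               // invalid operation yields NaN
        System.out.println(overflowed + " " + divByZero + " " + invalid);

        // The special value can vanish from later results: 1.0 / Infinity is
        // an ordinary zero, so checking only the final answer misses the event.
        System.out.println(isExceptional(1.0 / divByZero));
    }
}
```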
NaNs
Java defines a single specific bit pattern to represent NaN. This is contrary to IEEE 754, which specifies a range of values, all of which represent NaN. This Java restriction is unnecessary and quite difficult to implement and hence should be removed.
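As a small illustration of the single-pattern view of NaN, the sketch below builds a NaN with a nonstandard payload and shows that `Double.doubleToLongBits` reports the one canonical pattern for it. The class name `NanBits` is hypothetical; `Double.doubleToRawLongBits`, used for comparison, is a later addition to the library whose result for NaNs may depend on the platform.

```java
// Hypothetical sketch: many NaN bit patterns, one canonical representation.
final class NanBits {
    // doubleToLongBits collapses every NaN to the canonical pattern
    // 0x7ff8000000000000L.
    static long canonicalBits(double x) {
        return Double.doubleToLongBits(x);
    }

    public static void main(String[] args) {
        // A NaN carrying extra payload bits in the low part of the significand.
        double payloadNan = Double.longBitsToDouble(0x7ff8000000000abcL);
        System.out.println(Double.isNaN(payloadNan));
        System.out.println(Long.toHexString(canonicalBits(payloadNan)));
        // The raw variant tries to preserve the payload, but what it returns
        // for a NaN is platform dependent.
        System.out.println(Long.toHexString(Double.doubleToRawLongBits(payloadNan)));
    }
}
```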
Other Arithmetic Data Types
Other arithmetic data types are of considerable interest to the numerical community. The two most often cited are interval arithmetic and (arbitrary) multiple precision arithmetic. Ideally, such data types would be added to the Java language spec itself, but this is unrealistic. It would nevertheless be quite useful if standard classes for such data types were defined. If operator overloading and efficient processing of final methods were possible, then programs using such classes would be quite natural and fairly efficient.
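To show what class-based arithmetic looks like without operator overloading, here is a brief sketch using the existing `java.math.BigDecimal` class. The class `MultiPrecisionDemo` and the method `axpb` are hypothetical names chosen for the example.

```java
import java.math.BigDecimal;

// Hypothetical sketch: arithmetic on a class type must be spelled as method calls.
final class MultiPrecisionDemo {
    // Computes a*x + b exactly. With operator overloading the body could
    // read simply as: a * x + b.
    static BigDecimal axpb(BigDecimal a, BigDecimal x, BigDecimal b) {
        return a.multiply(x).add(b);
    }

    public static void main(String[] args) {
        BigDecimal r = axpb(new BigDecimal("0.1"),
                            new BigDecimal("3"),
                            new BigDecimal("0.2"));
        System.out.println(r);
    }
}
```

Longer formulas written this way quickly become hard to read, which is the motivation for the operator overloading request.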


 

Arrays

Multidimensional Arrays
Numerical software designers typically take information about the physical layout of data in memory into account when designing algorithms to achieve high performance. For example, LAPACK uses block-oriented algorithms and the Level 2 and 3 BLAS to localize references for efficiency in modern cache-based systems. The fact that Fortran requires two-dimensional arrays be stored contiguously by columns is explicitly used in the design of these algorithms.
Unfortunately, there is no similar requirement for multidimensional arrays in Java. Here, a two-dimensional array is an array of one-dimensional arrays. Although we might expect that elements of rows are stored contiguously, one cannot depend upon the rows themselves being stored contiguously. In fact, there is no way to check whether rows have been stored contiguously after they have been allocated. The possible non-contiguity of rows implies that the effectiveness of block-oriented algorithms may be highly dependent on the particular implementation of the JVM as well as the current state of the memory manager. In one experiment with a blocked matrix multiply, the computation rate dropped from 82 Mflops to about 55 Mflops when rows were randomly permuted. On the other hand, for a naive non-blocked matrix multiply, the original 11 Mflop rate was unchanged when the rows were permuted.
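The array-of-arrays structure can be seen in a few lines. In this sketch (the class `ArrayOfArrays` and method `swapRows` are hypothetical names), permuting rows only exchanges references, so the logical row order and the physical placement of the rows in memory are unrelated.

```java
// Hypothetical sketch: a Java 2D array is an array of references to row arrays.
final class ArrayOfArrays {
    // Swaps two rows by exchanging references; no element data moves.
    static void swapRows(double[][] a, int i, int j) {
        double[] tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        double[][] a = new double[3][4]; // three separately allocated rows
        a[1][0] = 7.0;
        swapRows(a, 0, 1);               // row order changes, memory placement does not
        System.out.println(a[0][0]);
        a[2] = new double[10];           // rows need not even have the same length
        System.out.println(a[2].length);
    }
}
```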
An alternative would be to define a new multidimensional array data type for Java whose memory layout would be defined (contiguous storage, rowwise, for example). Such arrays might use multidimensional array indexing syntax (i.e. A[i,j] rather than A[i][j]) to distinguish them from Java arrays of arrays. Three-dimensional arrays are also quite common in applications and could be similarly included.
If true multidimensional arrays are not added to the Java language, then standard classes for contiguously stored arrays should be defined. It is doubtful that such classes could be made as efficient as true arrays.
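A class of the kind mentioned above might look like the following minimal sketch. The name `DenseMatrix` and its methods are hypothetical, and row-major order is assumed purely for illustration. All elements live in one `double[]`, so the layout is known, at the price of index arithmetic and method calls on every access.

```java
// Hypothetical sketch: a matrix class with a defined, contiguous row-major layout.
final class DenseMatrix {
    final int rows;
    final int cols;
    final double[] data; // element (i, j) is stored at data[i * cols + j]

    DenseMatrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    double get(int i, int j) {
        return data[i * cols + j];
    }

    void set(int i, int j, double value) {
        data[i * cols + j] = value;
    }

    // Returns y = A x, walking each row through contiguous memory.
    double[] multiply(double[] x) {
        double[] y = new double[rows];
        for (int i = 0; i < rows; i++) {
            int base = i * cols;
            double sum = 0.0;
            for (int j = 0; j < cols; j++) {
                sum += data[base + j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }
}
```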
Access to Subarrays Without Copying
In Fortran, subarrays of multidimensional arrays can be logically passed into subprograms by passing in the address of the first element along with the subarray dimensions. Inside the subprogram, the subarray acts just like a multidimensional array. This greatly eases the implementation of many algorithms in numerical linear algebra, for example.
Subarrays cannot be delivered to methods in this way in Java. Two alternatives are possible.
  • The subarray is first copied to a new appropriately sized array, and the new array is passed to the method, or
  • The entire array is passed to the method along with explicit index ranges which define the subarray.
In the first case much inefficient data copying is performed. In the second case, additional methods that explicitly operate on subarrays must be developed; the coding and efficiency of these methods will be complicated by complex indexing computations.
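The indexing computations involved in the second alternative can be sketched as follows. This is only an illustration under stated assumptions: the class `MatrixView` is hypothetical, the data is assumed to be stored row-major in a single `double[]`, and the offset and stride play the role of the Fortran first-element address and leading dimension.

```java
// Hypothetical sketch: a subarray described by offset and stride, with no copying.
final class MatrixView {
    private final double[] data; // shared backing store, row-major
    private final int offset;    // index in data of this view's element (0, 0)
    private final int stride;    // row length of the enclosing full matrix
    final int rows;
    final int cols;

    MatrixView(double[] data, int offset, int stride, int rows, int cols) {
        this.data = data;
        this.offset = offset;
        this.stride = stride;
        this.rows = rows;
        this.cols = cols;
    }

    double get(int i, int j) {
        return data[offset + i * stride + j];
    }

    void set(int i, int j, double value) {
        data[offset + i * stride + j] = value;
    }

    // Returns a view of a sub-block that shares storage with this view.
    MatrixView sub(int row0, int col0, int nRows, int nCols) {
        return new MatrixView(data, offset + row0 * stride + col0,
                              stride, nRows, nCols);
    }
}
```

Every element access pays for the offset and stride arithmetic, which is the efficiency concern raised above.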
    Long Integer Indices
    Some high-performance applications require extremely long arrays, and hence support for arrays indexed by long integers is needed in these cases.
    Array Expressions
    Fortran 90 and MATLAB each allow array expressions for computation on entire arrays. They also use a very convenient notation to denote computation on subarrays, such as A(1:5) = B(2:6) + C(10:14). Such features greatly ease scientific computations, which are typically heavily array oriented. They also provide an easier way for compilers to discover useful optimizations.
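For comparison, the section assignment A(1:5) = B(2:6) + C(10:14) has to be written in Java as an explicit loop with hand-translated, zero-based indices. The sketch below uses hypothetical names (`ArraySections`, `addSections`) and assumes the destination does not overlap the operands.

```java
// Hypothetical sketch: an array-section assignment written as an explicit loop.
final class ArraySections {
    // Sets dst[dstFrom + k] = x[xFrom + k] + y[yFrom + k] for k = 0 .. n-1.
    // Indices are zero-based; dst is assumed not to overlap x or y.
    static void addSections(double[] dst, int dstFrom,
                            double[] x, int xFrom,
                            double[] y, int yFrom, int n) {
        for (int k = 0; k < n; k++) {
            dst[dstFrom + k] = x[xFrom + k] + y[yFrom + k];
        }
    }

    public static void main(String[] args) {
        double[] a = new double[5];
        double[] b = new double[6];
        double[] c = new double[14];
        // Fortran A(1:5) = B(2:6) + C(10:14), shifted to zero-based indices.
        addSections(a, 0, b, 1, c, 9, 5);
        System.out.println(a.length);
    }
}
```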
    Resizing and Composing of Arrays
    To be added

    Complex numbers

    Issue:  complex numbers are important for numerical computing and should be available in Java.  Their operations should be as efficient as those on fundamental data types (e.g. double).
    Here are 4 Solutions:
     PROS and CONS:

    Note that in any solution, existing APIs must be extended, e.g. Math.abs(), Math.cos(), etc. to work with complex numbers.
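To make the efficiency concern concrete, here is a minimal sketch of complex numbers written as an ordinary class. It is an illustration only, not one of the solutions referred to above; the class `Complex` and its method names are hypothetical. Every arithmetic operation allocates a new object, which is what separates such a class from a fundamental type like double.

```java
// Hypothetical sketch: complex numbers as an immutable class.
final class Complex {
    final double re;
    final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    Complex plus(Complex other) {
        return new Complex(re + other.re, im + other.im);
    }

    // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
    Complex times(Complex other) {
        return new Complex(re * other.re - im * other.im,
                           re * other.im + im * other.re);
    }

    // Modulus; Math.hypot avoids intermediate overflow in re*re + im*im.
    double abs() {
        return Math.hypot(re, im);
    }

    public static void main(String[] args) {
        Complex z = new Complex(1.0, 2.0).times(new Complex(3.0, 4.0));
        System.out.println(z.re + " + " + z.im + "i");
    }
}
```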



     

    Operator overloading

     

     

    Templates

    Issues:
    Templates have been successfully used in C++ to provide an effective mechanism for code management and reuse in numerical programs.
    Solution:
    Add a mechanism similar to C++ templates to Java that provides these benefits.  It need not be identical to the C++ scheme, and could be simplified so as not to over-complicate the language while still providing many of its advantages.
    PROS:
    One obvious use of templates is in developing float and double versions of the same numerical routine. But more important is the ability to describe a numerical algorithm, say an iterative linear equation solver, in a "generic" way, without explicit mention of the matrix structure or storage details.
    Finally, advanced techniques like expression templates can be used to deal efficiently with array/vector operations, unroll loops, and fuse large array expressions into a single loop.
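The float/double duplication mentioned above looks like this in practice. In the sketch below (the class `Norms` and method `norm1` are hypothetical names), the two routines differ only in the element type, yet both must be written and maintained by hand.

```java
// Hypothetical sketch: the same routine written twice, once per element type.
final class Norms {
    // Sum of absolute values, single precision.
    static float norm1(float[] x) {
        float sum = 0.0f;
        for (int i = 0; i < x.length; i++) {
            sum += Math.abs(x[i]);
        }
        return sum;
    }

    // Sum of absolute values, double precision. Identical apart from the type.
    static double norm1(double[] x) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += Math.abs(x[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(norm1(new float[] {1.0f, -2.0f}));
        System.out.println(norm1(new double[] {1.0, -2.5}));
    }
}
```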
    CONS:
    Template mechanisms introduce tremendous complexity into the language. This diminishes one of the key attractive features of Java -- its simplicity.
    Templates also have a series of practical problems, mostly related to their implementation. Template codes can often lead to code bloat (since a separate routine is generated for each instantiation).  They can be difficult to debug, depending on how well the debugger understands instantiations. Nested templates also present implementation problems. A quick read of the messages related to template problems in C++ newsgroups should give one a good idea of what to expect.



     

    Standard interfaces for numeric objects (general issues)

     
    Issue:
    Basic data structures such as arrays, vectors, matrices, etc. are so fundamental to numerical computing that there should be some consistent interface to these objects.  This would avoid each person having to define their own.
    Solution:
    Provide a limited set of APIs for basic numeric objects.
    PROS: 
    This is perhaps one of the most practical contributions we can make to the numerical community.  Looking at C++, there are too many numerics codes that define their own Vector objects, etc.  These yield redundant, yet often incompatible, class definitions.
    CONS:
    Not every one of these classes has a canonical representation. In particular, things like "Matrix" can have a multitude of meanings: a 2D array of doubles, an abstract linear operator,  a sparse structure, etc.  Some care in design has to be taken to provide a usable framework.
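One way to accommodate the different meanings of "Matrix" is to program against an abstract operator interface. The following is a sketch under stated assumptions, not a proposed standard: the names `LinearOperator`, `DenseOperator`, `DiagonalOperator` and `PowerMethod` are all hypothetical, and the power method is used only because it needs nothing from the operator except a matrix-vector product.

```java
// Hypothetical sketch: an algorithm written against an abstract linear operator.
interface LinearOperator {
    int size();
    double[] apply(double[] x); // returns A x
}

final class DenseOperator implements LinearOperator {
    private final double[][] a;
    DenseOperator(double[][] a) { this.a = a; }
    public int size() { return a.length; }
    public double[] apply(double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += a[i][j] * x[j];
            }
        }
        return y;
    }
}

final class DiagonalOperator implements LinearOperator {
    private final double[] d;
    DiagonalOperator(double[] d) { this.d = d; }
    public int size() { return d.length; }
    public double[] apply(double[] x) {
        double[] y = new double[d.length];
        for (int i = 0; i < d.length; i++) {
            y[i] = d[i] * x[i];
        }
        return y;
    }
}

final class PowerMethod {
    // Estimates the dominant eigenvalue of a symmetric operator by power
    // iteration. Assumes the operator is nonzero and that the uniform start
    // vector has a component along the dominant eigenvector.
    static double dominantEigenvalue(LinearOperator op, int iterations) {
        int n = op.size();
        double[] x = new double[n];
        java.util.Arrays.fill(x, 1.0 / Math.sqrt(n));
        double lambda = 0.0;
        for (int k = 0; k < iterations; k++) {
            double[] y = op.apply(x);
            double normSq = 0.0;
            lambda = 0.0;
            for (int i = 0; i < n; i++) {
                lambda += x[i] * y[i]; // Rayleigh quotient, since x has unit length
                normSq += y[i] * y[i];
            }
            double norm = Math.sqrt(normSq);
            for (int i = 0; i < n; i++) {
                x[i] = y[i] / norm;
            }
        }
        return lambda;
    }
}
```

The same `dominantEigenvalue` routine works for the dense and the diagonal representation without knowing how either is stored.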



     

    Integrating with Fortran codes  (general issues)

    Issue:  the large body of Fortran scientific apps and libraries will not be immediately available in Java. How can we best access it?

    Solutions:
     



     

    Original Summary of Issues

    The current detailed description roughly covers the above with an initial discussion of the first issues listed below.