JSR HTML Template

JSR HTML Template

Identification | Request | Contributions | Additional Information

General Instructions

This template has been designed to be easily filled out using an HTML editor. Please complete all sections. Don't forget to give the proposed specification a name.

E-mail the completed JSR to: jsr-submit@sun.com. Don't forget to include the name of the JSR in the subject line.

As per Section 1 of the Java Community Process, JSRs will only be accepted from Participants (and each Participant can only have 3 JSRs active at the same time).

JSR - Java Array package

Section 1. Identification

Submitting Participant: International Business Machines Corporation

Name of Contact Person: Jose E. Moreira

E-Mail Address: jmoreira@us.ibm.com

Telephone Number: 1-914-945-3987

Fax Number: 1-914-945-4425

(Optional) List of other Participants who endorse this JSR:

Java Grande Forum

Section 2: Request

2.1 Please describe the proposed Specification:

Multidimensional arrays are n-dimensional rectangular collections of elements. An array is characterized by its rank (number of dimensions or axes), its elemental data type (all elements of an array are of the same type), and its shape (the extents along its axes).

Elements of an array are identified by their indices along each axis. Let a d-dimensional array A of elemental type T have extent n_j along its j-th axis, j = 0,...,d-1. Then, a valid index i_j along the j-th axis must be greater than or equal to zero and less than n_j. An attempt to reference an element A[i₀,i₁,...,i_d-1] with any invalid index i_j causes an ArrayIndexOutOfBoundsException to be thrown.

Elements of an array are logically ordered with respect to each other according to the following definition. An element A[i₀,i₁,...,i_d-1] of a d-dimensional array A follows an element A[j₀,j₁,...,j_d-1] of the same array if and only if there exists a k greater than or equal to zero and less than d such that i_l=j_l for all l<k and i_k>j_k. In usual nomenclature, this corresponds to row-major (C-style) ordering of the elements. We call this the intrinsic ordering of the array elements. (In a column-major, i.e., Fortran-style, ordering of the elements, an element A[i₀,i₁,...,i_d-1] of a d-dimensional array A follows an element A[j₀,j₁,...,j_d-1] of the same array if and only if there exists a k greater than or equal to zero and less than d such that i_l=j_l for all l>k and i_k>j_k. This concept will be useful when we discuss native interfaces below.)

We propose the development of standard Java classes which implement multidimensional rectangular arrays. The rank and type of an array are defined by its class. That is, for each rank and type there is a different class. (This is necessary for traditional compiler optimizations, since Java does not support templates.) Supported types must include all of Java primitive types (boolean, byte, char, short, int, long, float, and double), one or more complex types (at least the Complex class in the VNI JSR - Add Complex Class to Java), and Object. Supported ranks must include 0- (scalar), 1-, 2-, 3-, 4-, 5-, 6- and 7-dimensional arrays. (Rank 7 is the current standard limit for Fortran.) The class for a d-dimensional array of type would be typeArraydD. As an example, the class for a two-dimensional array of doubles would be named doubleArray2D. Array classes are final classes.

The array classes must fully support the concept of regular array sections. A regular array section corresponds to a subarray of another array, which we call the master array. Each element of an array section corresponds to exactly one element of its master array. The correspondence is one-to-one in the section to master direction. Referencing one element of an array section (for reading or writing) has the effect of referencing the corresponding element of the master array. Regular array sections have the same type as, and rank less than or equal to, their master arrays. Regular array sections behave exactly like regular arrays for all operations, including sectioning. (In fact, there are no separate classes for array sections.)

The array classes provide methods that implement Fortran-like functionality for arrays. In particular, the following operations must be provided:

Get and set the values of an array element, array regular section, or array irregular section.
Operations to query the rank and shape of an array.
Operations to reshape and transpose an array.
Elemental conversion functions (e.g., the equivalent of Fortran REAL and AIMAG, that convert complex arrays into double arrays).
Elemental transcendental functions.
Elemental relational functions (<, >, <=, >=, ==, !=).
Array reduction functions (sum, minval, etc.).
Array construction functions (merge, pack, spread, unpack).
Array manipulation functions (shift, spread).
Array location functions (maxloc, minloc).
Array scatter and gather operations.
Operations that correspond to array expressions (addition, multiplication, etc.)

Finally, it must be possible to cast Java arrays into multidimensional array objects of the same rank and vice-versa. As an example, it must be possible to convert back and forth between doubleArray2D and double[][]. The casting operators create new copies of the data to prevent exposing the internal structure of the array classes.

Quite often, Java code using the Array package will have to interface with native code. The native code can access the elements of a multidimensional array using the following mechanism, explained through an example. Let there be a native static void foo(doubleArray2D). The C code for foo needs to do the following:

    foo(JNIEnv *env, jobject arr)
    {

        /*
         * Find the number of elements in the array.
         */
        jsize len = GetMDArrayLength(env, arr);
        ...

        /*
         * Obtain a pointer to the elements of the array.
	 * The elements are in an order defined by parameter ORDER.
         */
        jdouble *data = GetDoubleMDArrayElements(env, arr, ORDER);

	/*
         * Operate on data as desired.
         */
        ...

        /*
         * Release the array elements when done.
         */
        ReleaseDoubleMDArrayElements(env, arr, data);
        return;
    }

Note that the elements extracted by GetDoubleMDArrayElements are ordered according to some enumeration order identified by the last parameter. We propose at least two supported orderings: ROW_MAJOR returns the elements in row major order, while COL_MAJOR returns the elements in column major order. Note that ROW_MAJOR orders the element according to the intrinsic ordering previously specified. COL_MAJOR returns the elements in the same order that a Fortran array would be organized. It is convenient for the development of Java/Fortran interfaces.

The actions performed by GetDoubleMDArrayElements and ReleaseDoubleMDArrayElements are implementation dependent. In some cases it may be possible to just return a pointer to the (pinned) storage of the array elements, if that storage is already in the appropriate order. In other situations it may be necessary to copy the elements to a temporary storage. If the array arr passed as a parameter is a section of another array, then copy is in general the only alternative.

The array classes can be implemented with no changes in Java or JVM. However, it is essential that the get and set methods be implemented as efficiently as array indexing operations are in Fortran or in C. Multidimensional arrays are extremely common in numerical computing, and hence we expect that efficient multidimensional arrays classes will be heavily used.

Access to individual elements of multidimensional arrays is performed only through get and set accessor methods. As an example, consider the following simple implementation of a matrix multiply code, where each matrix is represented by a doubleArray2D class:

void matmul(doubleArray2D a, doubleArray2D b, doubleArray2D c) {

    /*
     * For simplicity, let us consider only the simple case
     * in which the matrices are conforming, that is,
     * a is m x n, b is n x p, and c is m x p.
     */
    int m = a.size(0);
    int n = a.size(1);
    int p = b.size(1);

    for (int i=0; i<m; i++) {
        for (int j=0; j<p; j++) {
            c.set(i,j,0);
            for (int k=0; k<n; k++) 
                c.set(i,j,c.get(i,j)+a.get(i,k)*b.get(k,j));
        }
    }
}

Although feasible, the development of code that manipulates individual array elements through get and set accessor methods is cumbersome and error prone. To make the Java Array package really appealing to numerical programmers, it is necessary to extend the current Java language specification to include syntax for efficient support of true multidimensional arrays.

The basic syntax proposed uses A[i,j,k] rather than A[i][j][k] to reference elements of multidimensional arrays. This differentiates multidimensional arrays from Java arrays of arrays. References of this form are syntactically mapped to the appropriate get or set method, depending on how they are used. We illustrate these ideas with simple examples showing the respective mappings:

double[*] X = new intArray1D(M);
    --> intArray1D X = new intArray1D(M};

X[i] = Y[i];
    --> X.set(i,Y.get(i));

double[*,*,*] A = new doubleArray3D(M,N,K);
    -->  doubleArray3D A = new doubleArray3D(M,N,K);

A[i,j,k] = B[i,j,k]+s*C[j,k,l];
    --> A.set(i,j,k,B.get(i,j,k)+s*C.get(j,k,l));

Note that the *s in the declaration are necessary to differentiate multidimensional arrays of rank 1 from native Java arrays. It is also tempting to use new double[M,N,K] to denote creation of a new doubleArray3D object, but note that using this notation new double[M] would be ambiguous: it could mean either a new doubleArray1D object or a new Java array of doubles.

2.2 What is the target Java platform? (i.e., desktop, server, personal, embedded, card, etc.)

The Array package is targeted at both the desktop and server platforms.

2.3 What need of the Java community will be addressed by the proposed specification?

Multidimensional arrays are the most common data structures in scientific and engineering computing. The Java Array package provides the ability to represent multidimensional arrays in Java programs. It supports a set of array operations that lead to concise representation and efficient optimization of scientific and engineering codes.

Even though the functionality of array element access can be accomplished using the conventional method syntax of Java, this approach leads to code that is unnecessarily difficult to read and maintain. This is the motivation for the array syntax part of this JSR.

2.4 Why isn't this need met by existing specifications?

Native Java arrays are strictly one-dimensional. Multidimensional arrays are simulated as "arrays of arrays". That means, for example, that each element of a double[][] array is a double[] array. Arrays of arrays are very general and therefore more difficult to optimize. For instance, for a double[][] a, rows a[i] and a[j] may have different lengths. Bounds checking optimization is an example of an optimization that can be better performed when arrays are known to be rectangular (as in true multidimensional array). Furthermore, the "array of arrays" approach opens up more possibility for aliasing. For instance, for a double[][] a and double[][] b, it is possible that a[i] and b[j] refer to the same row. In fact, it is possible that a[i] and a[j] (i != j) refer to the same row. Many advanced compiler optimizations rely on accurate aliasing disambiguation.

The java.vecmath package includes two classes that are relevant to this discussion: GVector and GMatrix. They implement one- and two-dimensional arrays of doubles. The purpose of these classes is similar in spirit to the Array package but they offer much more restricted functionality. GVector and GMatrix only support one- and two-dimensional arrays of doubles and they only offer very limited aggregate operations.

2.5 Please give a short description of the underlying technology or technologies:

Implementing multidimensional arrays as classes has been done before in the context of A++/P++ and POOMA. The mechanisms for mapping a multidimensional cartesian space into a single-dimensional address space are well understood. The main challenge is performance.

A good implementation of the Array package will deliver high performance in the execution of aggregate methods. That is, methods that operate on many elements, such as adding two array sections or multiplying two matrices. The techniques for performing efficient aggregate operations on matrices, in particular for linear algebra operations, are well described in the literature.

To deliver good performance on user codes that use elementary operations on array elements, two compiler technologies are of utmost importance:

Bounds checking optimization through versioning: This technique creates safe regions of code where all array accesses are guaranteed to be valid. Aggressive optimizations, that reorder code, can be applied in these regions.
Semantic expansion: With the array package, each elemental data access (set or get) operation is a method call. The method verifies that all the indices are valid (along each of the array axes) before proceeding to the actual data access. Exposing the semantics of these methods to the compiler is a necessary step for optimization. This is accomplished through the technique of semantic expansion.

The extended array syntax could be accomplished in several ways

The official Java language could be extended outright to support this notation. This extension is similar in concept to the "+" notation supported for Strings.
A specific pre-processor could be built that translates multidimensional array expressions into their equivalent semantics using the conventional method syntax for the Array package.
The Java language could be extended to support a limited form of operator overloading for [i,j,k] and [,,].

2.6 Is there a proposed package name for the API Specification? (i.e., javapi.something, org.something, com.something, etc.)

javax.math.array

2.7 Does the proposed specification have any dependencies on specific operating systems, CPUs, or I/O devices that you know of?

No. A strawman implementation of the Array package was implemented entirely in Java.

2.8 Are there any security issues that cannot be addressed by the current security model?

No.

2.9 Are there any internationalization or localization issues?

No.

2.10 Are there any existing specifications that might be rendered obsolete, deprecated, or in need of revision as a result of this work?

No other existing specifications are affected by this specification. The functionality of classes GVector and GMatrix of java.vecmath will be largely superseded by the functionality of the equivalent classes (doubleArray1D and doubleArray2D, respectively) in the Array package. However, the interfaces are not compatible and for the benefit of programs using the Java 3D API both GVector and GMatrix should continue to be maintained.

There is a relationship between the specification of the Array package and of the Complex class (as being submitted by VNI). Complex arrays in the Array package must support the same elemental type as defined by the Complex class.

The recent proposal to extend generic types to Java could provide a more general solution for building multidimensional arrays of user-defined types.

The extended array syntax being proposed is backwards compatible with current Java syntax and will not render any existing codes obsolete or deprecated.

Section 3: Contributions

3.1 Please list any existing documents, specifications, or implementations that describe the technology. Please include links to the documents if they are publicly available.

A strawman for the Array package is available as package

com.ibm.math.array. It
  		can be downloaded from 
		
		www.alphaWorks.ibm.com/tech/ninja.


		 A prototype research compiler that performs the optimizations of
            semantic expansion and bounds checking optimization has been implemented 
            at IBM Research.

		

		Two technical reports describing the design of the strawman Array package are
        	available:
		

		 IBM Research Division RC21369. A Standard Java Array Package for 
		Technical Computing. Jose Moreira, Sam Midkiff, Manish Gupta.
		
 IBM Research Division RC21481. Java Programming for High Performance
		Numerical Computing. Jose Moreira, Sam Midkiff, Manish Gupta, Pedro Artigas,
		Marc Snir, Rick Lawrence.
		
		Both these reports are available at 
		
		www.research.ibm.com/ninja.

3.2 Explanation of how these items might be used as a starting point for the work.

The strawman implementation for the Array package includes most of the functionality desired in the final product. The main features missing are reduction operations (sum and product of all elements of an array) and elementary math functions (sin, cos, exp, etc applied to all elements of an array.)

Section 4: Additional Information (Optional)

4.1This section contains any additional information that the submitting Participant wishes to include in the JSR.

[PLEASE_FILL_IN]

See Also
Details of the Community Process
New Specification Proposals
Open Calls for Experts

[ This page was updated: 06-May-99 ]

Feedback | Map | A-Z Index

For information, call:
(800) 786-7638
Outside the U.S. and Canada, dial your country's AT&T Direct Access Number first.