General Instructions
This template has been designed to be easily filled out using an HTML editor. Please complete all sections. Don't forget to give the
proposed specification a name.
E-mail the completed JSR to: jsr-submit@sun.com. Don't forget to include the name of the JSR in
the subject line.
As per Section 1 of the Java Community Process, JSRs will only be accepted from
Participants (and each Participant can only have 3 JSRs active at the same time).
JSR - Java Array package
Section 1. Identification
Submitting Participant: |
International Business Machines Corporation |
Name of Contact Person: |
Jose E. Moreira |
E-Mail Address: |
jmoreira@us.ibm.com |
Telephone Number: |
1-914-945-3987 |
Fax Number: |
1-914-945-4425 |
(Optional) List of other Participants who endorse this JSR:
Java
Grande Forum
Section 2: Request
2.1 Please describe the proposed Specification: |
Multidimensional arrays are n-dimensional rectangular collections
of elements.
An array is characterized by its rank (number of dimensions
or axes), its elemental data type (all elements of an array
are of the same type), and its shape (the extents along
its axes).
Elements of an array are identified by their indices along each axis.
Let a d-dimensional array A of elemental type
T have extent nj along its j-th axis,
j = 0,...,d-1.
Then, a valid index ij along the j-th axis must
be greater than or equal to zero and less than nj.
An attempt to reference an element
A[i0,i1,...,id-1]
with any invalid index ij causes an
ArrayIndexOutOfBoundsException to be thrown.
Elements of an array are logically ordered with respect to each
other according to the following definition.
An element
A[i0,i1,...,id-1]
of a d-dimensional array A follows an element
A[j0,j1,...,jd-1]
of the same array if and only if there exists a k greater
than or equal to zero and less than d such that
il=jl for all l<k
and ik>jk.
In usual nomenclature, this corresponds to row-major
(C-style) ordering of the elements.
We call this the intrinsic ordering of the array elements.
(In a column-major, i.e., Fortran-style, ordering of the elements,
an element
A[i0,i1,...,id-1]
of a d-dimensional array A follows an element
A[j0,j1,...,jd-1]
of the same array if and only if there exists a k greater
than or equal to zero and less than d such that
il=jl for all l>k
and ik>jk.
This concept will be useful when we discuss native interfaces below.)
We propose the development of standard Java classes which
implement multidimensional rectangular arrays.
The rank and type of an array are defined by its class.
That is, for each rank and type there is a different class.
(This is necessary for traditional compiler optimizations,
since Java does not support templates.)
Supported types must include all of Java primitive types
(boolean , byte , char , short ,
int , long , float ,
and double ), one or more complex types (at least the
Complex class in the VNI JSR - Add Complex Class to Java),
and Object .
Supported ranks must include 0- (scalar), 1-, 2-, 3-, 4-, 5-, 6-
and 7-dimensional arrays. (Rank 7 is the current standard limit for Fortran.)
The class for a d-dimensional array of type would
be typeArray dD .
As an example, the class for a two-dimensional array of doubles would
be named doubleArray2D .
Array classes are final classes.
The array classes must fully support the concept of
regular array sections.
A regular array section corresponds to a subarray of another
array, which we call the master array.
Each element of an array section corresponds to exactly one
element of its master array.
The correspondence is one-to-one in the section to master direction.
Referencing one element of an array section (for reading or writing)
has the effect of referencing the corresponding element of
the master array.
Regular array sections have the same type as, and rank less than
or equal to, their master arrays.
Regular array sections behave exactly like regular arrays for all operations,
including sectioning.
(In fact, there are no separate classes for array sections.)
The array classes provide methods that implement Fortran-like
functionality for arrays.
In particular, the following operations must be provided:
- Get and set the values of an array element,
array regular section, or array irregular section.
- Operations to query the rank and shape of an array.
- Operations to reshape and transpose an array.
- Elemental conversion functions (e.g., the
equivalent of Fortran
REAL and AIMAG , that
convert complex arrays into double arrays).
- Elemental transcendental functions.
- Elemental relational functions (<, >, <=, >=, ==, !=).
- Array reduction functions (sum, minval, etc.).
- Array construction functions (merge, pack, spread, unpack).
- Array manipulation functions (shift, spread).
- Array location functions (maxloc, minloc).
- Array scatter and gather operations.
- Operations that correspond to array expressions
(addition, multiplication, etc.)
Finally, it must be possible to cast Java arrays into multidimensional
array objects of the same rank and vice-versa.
As an example, it must be possible to convert back and forth
between doubleArray2D and double[][] .
The casting operators create new copies of the data to prevent
exposing the internal structure of the array classes.
Quite often, Java code using the Array package will have to interface
with native code.
The native code can access the elements of a multidimensional array
using the following mechanism, explained through an example.
Let there be a native static void foo(doubleArray2D) .
The C code for foo needs to do the following:
foo(JNIEnv *env, jobject arr)
{
/*
* Find the number of elements in the array.
*/
jsize len = GetMDArrayLength(env, arr);
...
/*
* Obtain a pointer to the elements of the array.
* The elements are in an order defined by parameter ORDER.
*/
jdouble *data = GetDoubleMDArrayElements(env, arr, ORDER);
/*
* Operate on data as desired.
*/
...
/*
* Release the array elements when done.
*/
ReleaseDoubleMDArrayElements(env, arr, data);
return;
}
Note that the elements extracted by
GetDoubleMDArrayElements are ordered according to some
enumeration order identified by the last parameter. We propose
at least two supported orderings: ROW_MAJOR returns the
elements in row major order, while COL_MAJOR returns the
elements in column major order. Note that ROW_MAJOR orders
the element according to the intrinsic ordering previously specified.
COL_MAJOR returns the elements in the same order that a Fortran array
would be organized. It is convenient for the development of Java/Fortran
interfaces.
The actions performed by GetDoubleMDArrayElements and
ReleaseDoubleMDArrayElements are implementation dependent.
In some cases it may be possible to just return a pointer to the (pinned)
storage of the array elements, if that storage is already in the appropriate
order. In other situations it may be necessary to
copy the elements to a temporary storage.
If the array arr passed as a parameter is a section of another
array, then copy is in general the only alternative.
The array classes can be implemented with no changes in Java or JVM. However,
it is essential that the get and set methods be implemented as efficiently as
array indexing operations are in Fortran or in C.
Multidimensional arrays are extremely common in numerical
computing, and hence we expect that efficient multidimensional arrays classes
will be heavily used.
Access to individual elements of multidimensional arrays is performed
only through get and set accessor methods.
As an example, consider the following simple implementation of a matrix
multiply code, where each matrix is represented by a
doubleArray2D class:
void matmul(doubleArray2D a, doubleArray2D b, doubleArray2D c) {
/*
* For simplicity, let us consider only the simple case
* in which the matrices are conforming, that is,
* a is m x n, b is n x p, and c is m x p.
*/
int m = a.size(0);
int n = a.size(1);
int p = b.size(1);
for (int i=0; i<m; i++) {
for (int j=0; j<p; j++) {
c.set(i,j,0);
for (int k=0; k<n; k++)
c.set(i,j,c.get(i,j)+a.get(i,k)*b.get(k,j));
}
}
}
Although feasible, the development of code that manipulates individual
array elements through get and set accessor
methods is cumbersome and error prone.
To make the Java Array package really appealing to numerical programmers,
it is necessary to extend the current Java language specification to
include syntax for efficient support of true multidimensional arrays.
The basic syntax proposed uses A[i,j,k]
rather than A[i][j][k] to reference elements of multidimensional
arrays. This differentiates multidimensional arrays from Java arrays
of arrays. References of this form are syntactically mapped to the
appropriate get or set method, depending on how
they are used. We illustrate these ideas with simple examples showing the
respective mappings:
double[*] X = new intArray1D(M);
--> intArray1D X = new intArray1D(M};
X[i] = Y[i];
--> X.set(i,Y.get(i));
double[*,*,*] A = new doubleArray3D(M,N,K);
--> doubleArray3D A = new doubleArray3D(M,N,K);
A[i,j,k] = B[i,j,k]+s*C[j,k,l];
--> A.set(i,j,k,B.get(i,j,k)+s*C.get(j,k,l));
Note that the * s in the declaration are necessary to differentiate
multidimensional arrays of rank 1 from native Java arrays. It is also tempting
to use new double[M,N,K] to denote creation of a new
doubleArray3D object, but note that using this notation
new double[M] would be ambiguous: it could mean either a
new doubleArray1D object or a new Java array of double s.
|
2.2 What is the target Java platform? (i.e., desktop, server, personal,
embedded, card, etc.) |
The Array package is targeted at both the desktop and server platforms.
|
2.3 What need of the Java community will be addressed by the proposed
specification? |
Multidimensional arrays are the most common data
structures in scientific and engineering computing.
The Java Array package provides the ability to represent multidimensional arrays in
Java programs.
It supports a set of array operations that lead to concise representation and
efficient optimization of scientific and engineering codes.
Even though the functionality of array element access can be accomplished
using the conventional method syntax of Java, this approach leads to
code that is unnecessarily difficult to read and maintain.
This is the motivation for the array syntax part of this JSR.
|
2.4 Why isn't this need met by existing specifications? |
Native Java arrays are strictly one-dimensional. Multidimensional arrays
are simulated as "arrays of arrays". That means, for example,
that each element of a double[][] array is a
double[] array.
Arrays of arrays are very general and therefore more difficult to optimize.
For instance, for a double[][] a , rows
a[i] and a[j] may have different lengths.
Bounds checking optimization is an example of an optimization that can be better
performed when arrays are known to be rectangular (as in true multidimensional
array).
Furthermore, the "array of arrays" approach opens up more possibility for aliasing.
For instance, for a double[][] a and
double[][] b , it is possible that a[i] and
b[j] refer to the same row.
In fact, it is possible that a[i] and a[j] (i != j)
refer to the same row.
Many advanced compiler optimizations rely on accurate aliasing disambiguation.
The java.vecmath package includes two classes that are relevant to this
discussion: GVector and GMatrix. They implement one- and two-dimensional arrays of
doubles. The purpose of these classes is similar in spirit to the Array package
but they offer much more restricted functionality.
GVector and GMatrix only support one- and two-dimensional arrays of doubles and
they only offer very limited aggregate operations.
|
2.5 Please give a short description of the underlying technology or
technologies: |
Implementing multidimensional arrays as classes has been done before in the
context of A++/P++ and POOMA. The mechanisms for mapping a multidimensional
cartesian space into a single-dimensional address space are well understood.
The main challenge is performance.
A good implementation of the Array package will deliver high performance in
the execution of aggregate methods. That is, methods that operate on many
elements, such as adding two array sections or multiplying two matrices.
The techniques for performing efficient aggregate operations on matrices,
in particular for linear algebra operations, are well described in the
literature.
To deliver good performance on user codes that use elementary operations
on array elements, two compiler technologies are of utmost importance:
- Bounds checking optimization through versioning: This technique creates
safe regions of code where all array accesses are guaranteed to be valid.
Aggressive optimizations, that reorder code, can be applied in these regions.
- Semantic expansion: With the array package, each elemental data access
(set or get) operation is a method call. The method verifies that all the
indices are valid (along each of the array axes) before proceeding to the
actual data access. Exposing the semantics of these methods to the compiler
is a necessary step for optimization. This is accomplished through
the technique of semantic expansion.
The extended array syntax could be accomplished in several ways
-
The official Java language could be extended outright
to support this notation. This extension is similar in concept to the
"+" notation supported for
String s.
-
A specific pre-processor could be built that translates
multidimensional array expressions into their equivalent semantics using
the conventional method syntax for the Array package.
-
The Java language could be extended to support a
limited form of operator overloading
for [i,j,k] and [,,].
|
2.6 Is there a proposed package name for the API Specification? (i.e.,
javapi.something, org.something, com.something, etc.) |
javax.math.array |
2.7 Does the proposed specification have any dependencies on specific
operating systems, CPUs, or I/O devices that you know of? |
No.
A strawman implementation of the Array package was
implemented entirely in Java.
|
2.8 Are there any security issues that cannot be addressed by the current
security model? |
No.
|
2.9 Are there any internationalization or localization issues? |
No.
|
2.10 Are there any existing specifications that might be rendered obsolete,
deprecated, or in need of revision as a result of this work? |
No other existing specifications are affected by this specification.
The functionality of classes GVector and GMatrix of java.vecmath will be largely
superseded by the functionality of the equivalent classes (doubleArray1D and
doubleArray2D, respectively) in the Array package.
However, the interfaces are not compatible and for the benefit of programs
using the Java 3D API both GVector and GMatrix should continue to be maintained.
There is a relationship between the specification of the Array package and
of the Complex class (as being submitted by VNI). Complex arrays in the Array
package must support the same elemental type as defined by the Complex class.
The recent proposal to extend generic
types to Java could provide a more general solution for building multidimensional
arrays of user-defined types.
The extended array syntax being proposed is backwards
compatible with current Java syntax
and will not render any existing codes obsolete or deprecated.
|
Section 3: Contributions
3.1 Please list any existing documents, specifications, or implementations
that describe the technology. Please include links to the documents if they are publicly available. |
A strawman for the Array package is available as package
com.ibm.math.array. It
can be downloaded from
www.alphaWorks.ibm.com/tech/ninja.
A prototype research compiler that performs the optimizations of
semantic expansion and bounds checking optimization has been implemented
at IBM Research.
Two technical reports describing the design of the strawman Array package are
available:
- IBM Research Division RC21369. A Standard Java Array Package for
Technical Computing. Jose Moreira, Sam Midkiff, Manish Gupta.
- IBM Research Division RC21481. Java Programming for High Performance
Numerical Computing. Jose Moreira, Sam Midkiff, Manish Gupta, Pedro Artigas,
Marc Snir, Rick Lawrence.
Both these reports are available at
www.research.ibm.com/ninja.
|
3.2 Explanation of how these items might be used as a starting point for the
work. |
The strawman implementation for the Array package includes most of the
functionality desired in the final product.
The main features missing are reduction operations (sum and product of all elements
of an array) and elementary math functions (sin, cos, exp, etc applied to all
elements of an array.)
|
Section 4: Additional Information (Optional)
4.1This section contains any additional information that the submitting
Participant wishes to include in the JSR. |
[PLEASE_FILL_IN] |
|