In this section we will discuss the other option for representing complex data buffers in the Java API of [4]--introduction of an MPJ.OBJECT datatype.
It is natural to assume that the elements of buffers passed to send and other output operations are objects whose classes implement the Serializable interface. There are at least two ways one may consider communicating object types in the MPI interface.
Evidently the first implementation scheme is more straightforward, and this approach will be considered in the remainder of this section. We discuss an implementation based on the mpiJava wrappers, combining standard JDK object serialization methods with a JNI interface to native MPI. Benchmark results presented in the next section suggest that something like the second approach (or some suitable combination of the two) deserves serious consideration, hence section 5 describes one realization of this scheme.
The original version of mpiJava was a direct Java wrapper for standard MPI. Apart from adopting an object-oriented framework, it added only a modest amount of code to the underlying C implementation of MPI. Derived datatype constructors, for example, simply called the datatype constructors of the underlying implementation and returned a Java object containing a representation of the C handle. A send operation or a wait operation, say, dispatched a single C MPI call. Even exploiting standard JDK object serialization and a native MPI package, uniform support for the MPJ.OBJECT basic type complicates the wrapper code significantly.
In the new version of the wrapper, every send, receive, or collective communication operation tests if the base type of the datatype argument describing a buffer is OBJECT. If not--if the buffer element type is a primitive type--the native MPI operation is called directly, as in the old version. If the buffer is an array of objects, special actions must be taken in the wrapper. If the buffer is a send buffer, the objects must be serialized. To support MPI derived datatypes as described in the previous section, we must also take account of the possibility that the message is actually some subset of the array of objects passed in the buffer argument, selected according to the displacement sequence of the derived datatype. Making the Java wrapper responsible for handling derived datatypes when the base type is OBJECT requires additional state in the Java-side Datatype class. In particular the Java object may have to explicitly maintain the displacement sequence as an array of integers.
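The selection of buffer elements by a displacement sequence can be illustrated with a minimal sketch. It is not the mpiJava wrapper code: the class `ObjectBufferPacker` and its methods `pack` and `unpack` are hypothetical names introduced here, and the sketch assumes the displacement sequence has already been expanded into a plain array of element offsets. Only standard JDK serialization classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the mpiJava API.
final class ObjectBufferPacker {

    // Serialize only the elements buf[offset + d] selected by the
    // displacement sequence, in sequence order, into one byte vector.
    static byte[] pack(Object[] buf, int offset, int[] displacements) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int d : displacements) {
                out.writeObject(buf[offset + d]);
            }
        }
        return bytes.toByteArray();
    }

    // Deserialize the objects in the same order and store each one at the
    // position given by the receiver's displacement sequence.
    static void unpack(byte[] data, Object[] buf, int offset, int[] displacements)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int d : displacements) {
                buf[offset + d] = in.readObject();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Object[] send = {"a", "b", "c", "d", "e", "f"};
        int[] disp = {0, 2, 4};              // a stride-2 selection, as a vector type might give
        byte[] wire = pack(send, 0, disp);   // bytes that would be handed to native MPI

        Object[] recv = new Object[6];
        unpack(wire, recv, 0, disp);
        System.out.println(Arrays.toString(recv)); // prints [a, null, c, null, e, null]
    }
}
```

Elements not named in the displacement sequence are neither serialized nor overwritten at the receiver, which is why the sequence has to be available on the Java side rather than only inside the native datatype handle.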
A further set of changes to the implementation arises because the size of the serialized data is not known in advance, and cannot be computed at the receiving end from type information available there. Before the serialized data is sent, the size of the data must be communicated to the receiver, so that a byte receive buffer can be allocated. We send two physical messages--a header containing size information, followed by the data. This, in turn, complicates the implementation of the various wait and test methods on communication request objects, and the start methods on persistent communication requests, and ends up requiring extra state in the Java Request class. Comparable changes are needed in the collective communication wrappers. A gather operation involving object types, for example, is implemented as an MPI_GATHER operation to collect all message lengths, followed by an MPI_GATHERV to collect possibly different-sized data vectors.
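The two-message scheme for point-to-point communication can be sketched as follows. This is an illustration under stated assumptions, not the wrapper implementation: the nested `Channel` class is a hypothetical in-memory stand-in for the native MPI send and receive calls, the class and method names are invented for the example, and the 4-byte integer header format is an assumption, since the text does not specify the header layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Hypothetical sketch of the header-then-data protocol; not mpiJava code.
final class SizedObjectMessage {

    // In-memory stand-in for native MPI point-to-point calls (assumption).
    static final class Channel {
        private final Queue<byte[]> messages = new ArrayDeque<>();

        void send(byte[] buf, int count) {
            messages.add(Arrays.copyOf(buf, count));
        }

        // As with a native receive, the caller must supply a buffer that is
        // already large enough for the incoming message.
        void recv(byte[] buf) {
            byte[] msg = messages.remove();
            if (msg.length > buf.length) {
                throw new IllegalStateException("message truncated");
            }
            System.arraycopy(msg, 0, buf, 0, msg.length);
        }
    }

    static void sendObjects(Channel ch, Object[] objs) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (Object o : objs) {
                out.writeObject(o);
            }
        }
        byte[] data = bytes.toByteArray();
        // First physical message: header carrying the serialized size.
        byte[] header = ByteBuffer.allocate(4).putInt(data.length).array();
        ch.send(header, header.length);
        // Second physical message: the serialized data itself.
        ch.send(data, data.length);
    }

    static Object[] recvObjects(Channel ch, int count)
            throws IOException, ClassNotFoundException {
        byte[] header = new byte[4];
        ch.recv(header);
        int size = ByteBuffer.wrap(header).getInt();
        byte[] data = new byte[size];   // can only be allocated once the size is known
        ch.recv(data);
        Object[] result = new Object[count];
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                result[i] = in.readObject();
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Channel ch = new Channel();
        sendObjects(ch, new Object[] {"hello", 42});
        Object[] received = recvObjects(ch, 2);
        System.out.println(Arrays.toString(received)); // prints [hello, 42]
    }
}
```

In this blocking sketch the two receives simply follow one another. In the non-blocking case the second receive cannot be posted until the header has arrived, which is one way to see why the wait and test methods need additional state in the request object.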
These changes were made throughout the mpiJava API, and will be included in the next release of the software.