Given by Bryan Carpenter, Geoffrey C. Fox, Sung-Hoon Ko, and Sang Lim at the ACM Java Grande Meeting, June 12-13, 1999. Foils prepared July 6, 1999.
Bryan Carpenter, Geoffrey Fox, Sung-Hoon Ko, and Sang Lim |
www.npac.syr.edu/projects/pcrc/HPJava/mpiJava.html |
Class hierarchy. MPI is already object-based. "Standard" class hierarchy exists for C++. |
Detailed argument lists for methods. Properties of Java language imply various superficial changes from C/C++. |
Mechanisms for representing message buffers. |
Two natural options: |
Follow the MPI standard route: derived datatypes describe buffers consisting of mixed primitive fields scattered in local memory. |
Follow the Java standard route: automatic marshalling of complex structures through object serialization. |
Review the mpiJava wrapper interface. |
Discuss incorporation of derived datatypes in the Java API, and limitations. |
Adding object serialization at the API level. |
Describe implementation using JDK serialization. |
Benchmarks for naïve implementation. |
Ongoing work: optimizing serialization. |
mpiJava (Syracuse) |
JavaMPI (Getov et al, Westminster) |
JMPI (MPI Software Technology) |
MPIJ (Judd et al, Brigham Young) |
jmpi (Dincer et al) |
Implements a Java API for MPI suggested in late '97. |
Builds on work on Java wrappers for MPI started at NPAC about a year earlier. |
People: Bryan Carpenter, Yuh-Jye Chang, Xinying Li, Sung Hoon Ko, Guansong Zhang, Mark Baker, Sang Lim. |
Fully featured Java interface to MPI 1.1 |
Object-oriented API based on MPI 2 standard C++ interface |
Initial implementation through JNI to native MPI |
Comprehensive test suite translated from IBM MPI suite |
Available for Solaris, Windows NT and other platforms |
import mpi.* ;

class Hello {
  static public void main(String[] args) {
    MPI.Init(args) ;
    int myrank = MPI.COMM_WORLD.Rank() ;
    if(myrank == 0) {
      char[] message = "Hello, there".toCharArray() ;
      MPI.COMM_WORLD.Send(message, 0, message.length, MPI.CHAR, 1, 99) ;
    }
    else {
      char[] message = new char [20] ;
      MPI.COMM_WORLD.Recv(message, 0, 20, MPI.CHAR, 0, 99) ;
      System.out.println("received:" + new String(message) + ":") ;
    }
    MPI.Finalize() ;
  }
}
Interfacing Java to MPI is not always trivial; e.g., there are low-level conflicts between the Java runtime and interrupts in MPI. |
Situation improving as JDK matures. |
Now reliable on Solaris MPI (SunHPC, MPICH), shared memory, NT (WMPI). |
Other ports in progress. |
Send and receive members of Comm: |

  void Send(Object buf, int offset, int count,
            Datatype type, int dst, int tag) ;

  Status Recv(Object buf, int offset, int count,
              Datatype type, int src, int tag) ;
buf must be an array. offset is the element where the message starts. The Datatype class describes the type of the elements. |
MPI derived datatypes have two roles: |
Non-contiguous data can be transmitted in one message. |
MPI_TYPE_STRUCT allows mixed primitive types in one message. |
Java binding doesn't support second role. All data come from a homogeneous array of elements (no MPI_Address). |
A derived datatype consists of |
A base type. One of the 9 basic types. |
A displacement sequence. A relocatable pattern of integer displacements in the buffer array: |
  {disp_0, disp_1, ..., disp_(n-1)}
Can't mix primitive types or fields from different objects. |
Displacements only operate within 1d arrays. Can't use MPI_TYPE_VECTOR to describe sections of Java multidimensional arrays. |
If type argument is MPI.OBJECT, buf should be an array of objects. |
Allows sending fields of mixed primitive types, and fields from different objects, in one message. |
Allows sending multidimensional arrays, because they are arrays of arrays (and arrays are effectively objects). |
Send buf should be an array of objects implementing Serializable. |
Receive buf should be an array of compatible reference types (may be null). |
Java serialization paradigm applied. Output objects (and objects referenced through them) converted to a byte stream. Object graph reconstructed at the receiving end. |
Initial implementation in mpiJava used ObjectOutputStream and ObjectInputStream classes from JDK. |
Data serialized and sent as a byte vector, using MPI. |
Length of byte data not known in advance. Encoded in a separate header so space can be allocated dynamically in receiver. |
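This scheme can be sketched with plain java.io classes, no MPI required (the class name SerializeSketch and the example data are invented for illustration). The byte vector's length is only known after serialization, which is why it would be sent as a separate header:

```java
import java.io.*;

public class SerializeSketch {
    public static void main(String[] args) throws Exception {
        float[][] message = new float[4][4];   // an arbitrary object graph
        message[2][3] = 1.5f;

        // Sender side: serialize the graph into a byte vector. Its length
        // is only known afterwards, so it is transmitted first as a header,
        // letting the receiver allocate space dynamically.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(message);
        out.flush();
        byte[] buf = bytes.toByteArray();
        System.out.println("header (length) = " + buf.length);

        // Receiver side: allocate buf.length bytes from the header, receive
        // the byte vector, then reconstruct the object graph.
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(buf));
        float[][] received = (float[][]) in.readObject();
        System.out.println("received element = " + received[2][3]);
    }
}
```

In mpiJava the byte vector itself travels as an MPI message of type MPI.BYTE; here the ByteArrayInputStream simply stands in for that transfer.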
All mpiJava communications, including non-blocking modes and collective operations, now allow objects as base types. |
Header + data decomposition complicates, e.g., the wait and test family. |
Derived datatypes complicated. |
Collective comms involve two phases if base type is OBJECT. |
Assume that in "Grande" applications the critical case is arrays of primitive elements. |
Consider N x N arrays: |

  float [] [] buf = new float [N] [N] ;
  MPI.COMM_WORLD.Send(buf, 0, N, MPI.OBJECT, dst, tag) ;

  float [] [] buf = new float [N] [] ;
  MPI.COMM_WORLD.Recv(buf, 0, N, MPI.OBJECT, src, tag) ;
For comparison, time float [NxN] (no serialization), |
float [1] [NxN] (1-d serialization), |
and byte and int versions. |
Cluster of 2-processor, 200 MHz UltraSPARC nodes |
SunATM-155/MMF network |
Sun MPI 3.0 |
"non-shared memory" = inter-node comms |
"shared memory" = intra-node comms |
              byte       float
  t_ser       0.043      2.1
  t_unser     0.027      1.4
  t_com       0.062      0.25    (non-shared memory)
  t_com       0.008      0.038   (shared memory)
Cost of serializing and unserializing an individual float one to two orders of magnitude greater than communication! |
Serializing subarrays also expensive: |
  t_ser^vec = 100      t_unser^vec = 53
Sources of ObjectOutputStream, ObjectInputStream are available, and format of serialized stream is documented. |
By overriding performance-critical methods in classes, and modifying critical aspects of the stream format, can hope to solve immediate problems. |
Customized ObjectOutputStream replaces primitive arrays with short ArrayProxy objects. A separate Vector holding the actual Java arrays is produced. |
"Data-less" byte stream sent as a header. |
New ObjectInputStream yields a Vector of freshly allocated arrays, without reading their elements. |
Elements then sent in one communication, using an MPI_TYPE_STRUCT built from the vector information. |
class ArrayOutputStream extends ObjectOutputStream {
    Vector dataVector ;
    // constructor must call enableReplaceObject(true)

    protected Object replaceObject(Object obj) {
        if (obj instanceof int []) {
            int len = ((int []) obj).length ;
            dataVector.addElement(new ArrayInfo(INT, len, obj)) ;
            return new ArrayProxy(INT, len) ;
        }
        // ... similarly for the other primitive array types ...
        return obj ;   // not a primitive array
    }
}
class ArrayInputStream extends ObjectInputStream {
    Vector dataVector ;
    // constructor must call enableResolveObject(true)

    protected Object resolveObject(Object obj) {
        if (obj instanceof ArrayProxy) {
            ArrayProxy proxy = (ArrayProxy) obj ;
            switch (proxy.type) {
                case INT :
                    int [] dat = new int [proxy.length] ;
                    dataVector.addElement(new ArrayInfo(INT, dat.length, dat)) ;
                    return dat ;
                // ... cases for the other primitive types ...
            }
        }
        return obj ;
    }
}
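The mechanism above can be demonstrated end-to-end with nothing but JDK serialization (the names ProxyDemo, ProxyOutputStream, and ProxyInputStream are invented; this sketch handles only int [] and omits the type code, and where the real mpiJava implementation fills the arrays with a single native MPI transfer, here the elements simply stay unset):

```java
import java.io.*;
import java.util.Vector;

// Stands in for a primitive array in the "data-less" stream.
class ArrayProxy implements Serializable {
    int length;
    ArrayProxy(int length) { this.length = length; }
}

class ProxyOutputStream extends ObjectOutputStream {
    Vector dataVector = new Vector();      // arrays extracted from the graph

    ProxyOutputStream(OutputStream out) throws IOException {
        super(out);
        enableReplaceObject(true);         // required before replaceObject is called
    }

    protected Object replaceObject(Object obj) {
        if (obj instanceof int[]) {        // only int [] handled in this sketch
            dataVector.addElement(obj);
            return new ArrayProxy(((int[]) obj).length);
        }
        return obj;
    }
}

class ProxyInputStream extends ObjectInputStream {
    Vector dataVector = new Vector();      // freshly allocated, empty arrays

    ProxyInputStream(InputStream in) throws IOException {
        super(in);
        enableResolveObject(true);
    }

    protected Object resolveObject(Object obj) {
        if (obj instanceof ArrayProxy) {
            int[] dat = new int[((ArrayProxy) obj).length];
            dataVector.addElement(dat);
            return dat;
        }
        return obj;
    }
}

public class ProxyDemo {
    public static void main(String[] args) throws Exception {
        int[] data = { 1, 2, 3, 4 };
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ProxyOutputStream out = new ProxyOutputStream(bytes);
        out.writeObject(data);             // the stream carries only the proxy
        out.flush();

        ProxyInputStream in =
            new ProxyInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        int[] received = (int[]) in.readObject();
        // Allocated with the right length, but elements are still zero;
        // mpiJava would now fill them via one MPI_TYPE_STRUCT transfer
        // described by the two dataVectors.
        System.out.println("length = " + received.length);
    }
}
```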
Relatively easy to get dramatic improvements. |
Have only truly optimized one dimensional arrays embedded in stream. |
Next, look at direct optimizations for rectangular multi-dimensional arrays: replace them wholesale in the stream? |
Derived datatypes workable for Java, but slightly limited. |
Object basic types attractive on grounds of simplicity and generality. |
Naïve implementation too slow for bulk data transfer. |
Optimizations should bring asymptotic performance in line with C/Fortran MPI. |
www.npac.syr.edu/projects/pcrc/HPJava/mpiJava.html |