Subject: C427 JGSI Review
Resent-Date: Thu, 30 Sep 1999 22:39:16 -0400
Resent-From: Geoffrey Fox
Resent-To: p_gcf@npac.syr.edu
Date: Fri, 10 Sep 1999 07:45:29 +0200
From: Michael Philippsen
To: Geoffrey Fox
CC: phlipp@ira.uka.de

Title: Object-Serialization for Marshalling Data in a Java Interface to MPI
Authors: Bryan Carpenter, Geoffrey Fox, Sung Hoon Ko, and Sang Lim

a) Overall Recommendation
-------------------------
ACCEPT

b) Words suitable for authors
-----------------------------
- On p6 you describe that you need two messages: the first sends the size,
  the second sends the data. Why is this idea better than using a bigger
  first message that sends the size plus the first segment of the data?
  The second message can then be optimized away whenever the total array
  fits into the first message. (A sketch of what I mean is at the very end
  of this mail.)
- From the text I cannot see which JIT was used on the Solaris machines.
  Was it HotSpot? The HotSpot people have mentioned that they built in
  better support for arrays of primitive types. I'm curious.
- Since I've seen nice performance numbers for Symantec's JIT on Wintel
  machines, I would like to see the results of your benchmarks between PCs.
- Figures: The lines in Fig 3 are not explained. Which line corresponds to
  which equation? I don't see any equations numbered 1 to 3 in Table 1.
  Figure 6/byte should have the same axis (0, 50, 100, 150, 200, ...) as
  Fig 3. Same for Fig 7; moreover, shared/byte should use 300 as the upper
  end of the axis (instead of 500). Figures 6 and 7 are hard to digest:
  the reader must remember what the lines and the open icons are, and only
  with this information in mind can the reader see the improvement. What
  is the reason for showing the lines again? An important problem is that
  the reader no longer sees the goal: as I understand it, the goal is to
  approach the triangles.
- Please try to give an average improvement factor for both versions of
  your streams in the text.
- Please explain why the byte[][] got *slower* in Figure 6/byte.
- A main cost factor when sending arrays of primitive types is accessing
  them through the JNI. Most likely the JVM will copy the data from a
  Java-internal area into a C array. Please discuss this problem, and
  please avoid the term "zero-copy", since there *is* copying.
- What do you do in the case of heterogeneous clusters, where you have
  both big- and little-endian machines? You will need some form of
  processing of float arrays; it is insufficient to just pass the C
  float[] around. (See the second sketch at the end of this mail.)
- KaRMI tries to reduce the copying as well. KaRMI can use specific
  communication hardware (e.g. VIA) and could use native MPI routines for
  communication; one can plug in the mechanism of choice.

Some minor bugs:
----------------
- 3rd line of 2nd paragraph of Section 1.1: "serialiation"
- End of item 2 in Section 3: "the this buffer descriptor"
- Two paragraphs below: "presented in the next suggest that ..."
- 2nd line on p6: "subset of the of array"
- End of 1st paragraph of Section 4: "elements.2."
- Footnote 2: "that there some debate" ... "various proposals for for
  optimized"
- Remove "All timings are in microseconds" from the 1st paragraph on p8;
  instead, put it behind all numbers in Table 1. I would prefer to see
  all t's lined up in Table 1.
- Find a consistent spelling: "ping-pong" vs. "Pingpong".
- Caption of Figure 4: to "... for handling arrays" please add "of
  primitive type".
- 3rd paragraph on p11: "Figure 3 shows the effect" should be Figure 6.

References:
-----------
- [5] An updated version will appear in the same issue of CPE; same for
  [17] and [18].
- [6] "jmpi"?
  I'm not sure; it might be "JMPI".
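
PS: To make my first comment concrete, here is a minimal sketch of the
coalesced protocol I have in mind. It assumes an mpiJava-style API
(MPI.COMM_WORLD.Send/Recv with MPI.BYTE); the SEGMENT size, the tag, and
the class name are invented for the illustration, so please do not read
this as a claim about your implementation.

    import mpi.*;

    public class CoalescedSend {
        static final int SEGMENT = 4096;  // data bytes carried in the first message
        static final int TAG = 99;        // arbitrary tag for this sketch

        // Sender: pack the total length and the first segment into one buffer.
        static void send(byte[] data, int dest) throws MPIException {
            int first = Math.min(data.length, SEGMENT);
            byte[] head = new byte[4 + first];
            head[0] = (byte)(data.length >>> 24);   // length, big-endian
            head[1] = (byte)(data.length >>> 16);
            head[2] = (byte)(data.length >>> 8);
            head[3] = (byte) data.length;
            System.arraycopy(data, 0, head, 4, first);
            MPI.COMM_WORLD.Send(head, 0, head.length, MPI.BYTE, dest, TAG);
            if (data.length > first)   // second message only for the remainder
                MPI.COMM_WORLD.Send(data, first, data.length - first,
                                    MPI.BYTE, dest, TAG);
        }

        // Receiver: the first message never exceeds 4 + SEGMENT bytes.
        static byte[] recv(int src) throws MPIException {
            byte[] head = new byte[4 + SEGMENT];
            MPI.COMM_WORLD.Recv(head, 0, head.length, MPI.BYTE, src, TAG);
            int total = ((head[0] & 0xff) << 24) | ((head[1] & 0xff) << 16)
                      | ((head[2] & 0xff) << 8)  |  (head[3] & 0xff);
            byte[] data = new byte[total];
            int first = Math.min(total, SEGMENT);
            System.arraycopy(head, 4, data, 0, first);
            if (total > first)
                MPI.COMM_WORLD.Recv(data, first, total - first,
                                    MPI.BYTE, src, TAG);
            return data;
        }
    }

The point is simply that the size/data round trip collapses into a single
message whenever the array fits into SEGMENT bytes.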
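
PPS: Regarding the endianness comment, one way to do the "processing of
float arrays" is to normalize the IEEE 754 bit patterns to one fixed byte
order before the bytes leave the machine. Again just a sketch of the idea,
not code from the paper; the class and method names are invented.

    final class FloatEndian {
        // Pack floats big-endian. Float.floatToIntBits gives the IEEE 754
        // bit pattern independent of the host byte order, so both ends
        // agree on the wire format.
        static byte[] toBigEndian(float[] a) {
            byte[] out = new byte[4 * a.length];
            for (int i = 0; i < a.length; i++) {
                int bits = Float.floatToIntBits(a[i]);
                out[4 * i]     = (byte)(bits >>> 24);
                out[4 * i + 1] = (byte)(bits >>> 16);
                out[4 * i + 2] = (byte)(bits >>> 8);
                out[4 * i + 3] = (byte) bits;
            }
            return out;
        }

        // Inverse: rebuild the bit pattern and reinterpret it as a float.
        static float[] fromBigEndian(byte[] b) {
            float[] out = new float[b.length / 4];
            for (int i = 0; i < out.length; i++) {
                int bits = ((b[4 * i]     & 0xff) << 24)
                         | ((b[4 * i + 1] & 0xff) << 16)
                         | ((b[4 * i + 2] & 0xff) << 8)
                         |  (b[4 * i + 3] & 0xff);
                out[i] = Float.intBitsToFloat(bits);
            }
            return out;
        }
    }

On a homogeneous cluster this pass can of course be skipped; my question
is what the current C float[] path does when the cluster is mixed.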