Subject: I send my paper
From: minami
Date: Mon, 05 Mar 2001 16:56:38 +0900
To: fox@csit.fsu.edu

Dear Professor Fox,

This is Kazuo Minami of RIST, Japan. I apologize for the delay in submitting my paper for the special issue of "Concurrency and Computation". I am sending the paper in PDF format.

Best regards,
Kazuo Minami
--------------------------------------------------
E-mail: minami@tokyo.rist.or.jp
--------------------------------------------------

Research by the GeoFEM team gives us the prospect that GeoFEM will achieve good parallel performance on various computer architectures. In this work we focus on single-processor performance, and investigate a common data structure and coding style that let GeoFEM run at high performance on various computer architectures.

Test coding for the solver part of the structure and fluid analysis codes

The data structure and coding style were evaluated on scalar, vector, and pseudo-vector architectures. A new data structure and a new direct access method were introduced in the fluid analysis solver. As a result, 21% of peak performance was obtained on the pseudo-vector architecture, and scalar execution performed as well as pseudo-vector execution. 25% of peak performance was obtained on the vector architecture. A new direct access method was introduced in the structure code solver. As a result, 27% of peak performance was obtained on the pseudo-vector architecture, and 34% of peak performance was obtained on the vector architecture.

Test code for the matrix assembly part of the structure analysis code

Coding to remove dependencies was completed, and performance was evaluated on vector and scalar machines. The matrix assembly process reached 736.8 MFlops on the SX-4. The whole test code reached 900.7 MFlops on the SX-4 and 2.06 GFlops on the VPP5000 (peak: 9.6 GFlops). The matrix assembly process reached 124 MFlops on an Alpha system (Alpha 21164, 533 MHz).
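For comparison with the percent-of-peak figures quoted for the solver tests, the one matrix assembly result that is given together with a peak value (2.06 GFlops on the VPP5000, peak 9.6 GFlops) can be converted in the same way. The following is only a minimal sketch of that arithmetic. The function name `percent_of_peak` is hypothetical and does not come from the paper, and no peak values are assumed for the SX-4 or the Alpha system, since none are stated.

```python
def percent_of_peak(sustained_gflops: float, peak_gflops: float) -> float:
    """Return sustained performance as a percentage of peak performance."""
    if peak_gflops <= 0:
        raise ValueError("peak performance must be positive")
    return 100.0 * sustained_gflops / peak_gflops


# Values taken from the text: whole test code on the VPP5000.
vpp5000 = percent_of_peak(2.06, 9.6)
print(f"VPP5000, whole test code: {vpp5000:.1f}% of peak")
```

Under these figures the whole test code runs at roughly a fifth of peak on the VPP5000, which is in the same range as the 21% to 34% reported for the solver tests.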
should first explain what the original data structure was, and devote more space to how and why the transformation is beneficial.

The beginning of section 3.8 is a good example of the author's writing style. "Both (1) and (2) of performance factor [...] was implemented to the model coding of Fig.4." Apart from this being execrable English, it would be so much easier to understand this if the author would simply substitute what performance factors 1 and 2 and figure 4 are about. Now the reader has to leaf back and forth through the paper to understand this sort of comment.

According to its caption, figure 12 is the code of "Model coding of direct access". The author should mention that the algorithm is the substitution part of the solver.

One more general comment: performance tests are done on two vector pipeline machines and one superscalar chip. It looks to me as though all the transformations are inspired solely by the pipelines. The author in no place remarks on architectural differences, and whether for the Alpha chip different optimisations would have been preferable.

In sum, the code transformations presented here are moderately interesting, and could be of use to readers, but the presentation precludes any such usefulness.