Subject: RE: C492: PoLAPACK: Parallel Factorization Routines with Algorithmic Blocking
From: "Jaeyoung Choi"
Date: Thu, 8 Feb 2001 22:25:45 +0900
To: fox@csit.fsu.edu

Hi, Prof. Fox,

Thank you for sending the referee's report on my paper, "C492: PoLAPACK: Parallel Factorization Routines with Algorithmic Blocking." Enclosed are my answer to the referee's report and the original paper. If I need to modify or update the paper, please let me know.

Sincerely,
Jaeyoung Choi
choi@comp.ssu.ac.kr

=========================================================
Referee Report
C492: PoLAPACK: Parallel Factorization Routines with Algorithmic Blocking

This is an important paper which should be published. Please discuss the impact of recent work by Dongarra on the ATLAS scheme. Further discuss the level of effort (compared to that done so far) to convert all the LAPACK routines. Do you expect your good results to be generally applicable?
=========================================================

I believe I understand the ATLAS scheme well. It was developed by R. Clint Whaley and J. J. Dongarra at the University of Tennessee at Knoxville for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units. ATLAS can determine the optimal computational block size, and near-maximum performance can be obtained if the data matrix is distributed with that optimal block size. However, PoLAPACK assumes that the data matrix has already been distributed with an arbitrary block size before one of the factorizations begins. With the algorithmic blocking of PoLAPACK, it is possible to compute with the optimal block size and to obtain near-optimal performance irrespective of the physical block size.

I am currently exploring how to apply the algorithmic blocking scheme to other applications. The best applications so far are the factorizations, which are shown in the paper.
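
To make the distinction concrete, the following is a minimal sketch in C (not PoLAPACK source) of the idea: the matrix is stored with a physical block size fixed by the block-cyclic distribution, while the factorization loop advances by an independently chosen algorithmic block size. The constants (NB_PHYS, NB_ALG, the process count) and the mapping helper are hypothetical, for illustration only.

/*
 * Sketch of algorithmic blocking: the physical block size NB_PHYS is
 * fixed by the data distribution, but the factorization loop steps by
 * the algorithmic block size NB_ALG, tuned for the machine (e.g., the
 * block size ATLAS would select).  The two sizes are independent.
 */
#include <stdio.h>

#define N        1024   /* global matrix order (hypothetical)            */
#define NB_PHYS  7      /* physical distribution block size (arbitrary)  */
#define NB_ALG   64     /* algorithmic block size, tuned for the machine */

/* Map a global index to (owning process, local offset) for a 1-D
 * block-cyclic distribution over nprocs processes. */
static void global_to_local(int g, int nprocs, int *proc, int *off)
{
    int block = g / NB_PHYS;
    *proc = block % nprocs;
    *off  = (block / nprocs) * NB_PHYS + g % NB_PHYS;
}

int main(void)
{
    int nprocs = 4;  /* hypothetical process count */

    /* The loop steps by NB_ALG regardless of NB_PHYS: each panel of
     * NB_ALG columns is gathered from wherever the physical
     * distribution placed it, factored, and the trailing matrix is
     * updated. */
    for (int j = 0; j < N; j += NB_ALG) {
        int jb = (N - j < NB_ALG) ? N - j : NB_ALG;
        int proc, off;
        global_to_local(j, nprocs, &proc, &off);
        printf("panel cols %4d..%4d: first col owned by proc %d "
               "(local offset %d)\n", j, j + jb - 1, proc, off);
        /* ... factor the jb-column panel, broadcast, update trailing ... */
    }
    return 0;
}

Because the loop bound depends only on NB_ALG, the same tuned block size is used no matter how awkward the physical distribution is, which is why performance stays near optimal irrespective of NB_PHYS.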