1 | Fox, Johnson, Lyzenga, Otto, Salmon and Walker, "Solving Problems on Concurrent Processors", Vol. 1, 1988, p. 167, for the broadcast, multiply, roll algorithm. |
2 | Cannon, L., Ph.D. thesis, Montana State University, Bozeman, MT, 1969. |
3 | Demmel, Heath, and van der Vorst, "Parallel Numerical Linear Algebra", 1993, for a detailed survey of algorithms. |
4 | Agarwal, Gustavson, and Zubair, "A high-performance matrix-multiplication algorithm on a distributed-memory parallel computer, using overlapped communication", IBM Journal of Research and Development, for a discussion of overlapping computation and communication in block matrix multiplication. |