Matrix Multiplication Example

IBM (Marc Snir) considers C = A*B where all are 64 by 64 matrices with triple DO loop to execute C[i][j] = C[i][j] + A[i][k] * B[k][j] running on RS6000 model 590

Initial disappointment:
- IBM ESSL Library runs at 253 megaflops
- JDK1.1 plus JIT runs at 3.8 megaflops -- off by a factor of 65

Runtime checks (array indices in bounds, null pointer checks) could essentially be removed by better compiler
- removing checks gives you 33.3 megaflops (only a factor of 8)
- C and Java with same rules give SAME performance

Using rectangular arrays (not vector of vectors) gives 44 megaflops (off by a factor of 6)

Using Hardware fused multiply add gives 64 megaflops (factor of 4)

Remaining factor requires use of associativity to block computation

Previous slide Next slide Back to first slide View graphic version