Matrix Multiplication Example
IBM (Marc Snir) considers C = A*B where all are 64 by 64 matrices with triple DO loop to execute C[i][j] = C[i][j] + A[i][k] * B[k][j] running on RS6000 model 590
Initial disappointment:
- IBM ESSL Library runs at 253 megaflops
- JDK1.1 plus JIT runs at 3.8 megaflops -- off by a factor of 65
Runtime checks (array indices in bounds, null pointer checks) could essentially be removed by better compiler
- removing checks gives you 33.3 megaflops (only a factor of 8)
- C and Java with same rules give SAME performance
Using rectangular arrays (not vector of vectors) gives 44 megaflops (off by a factor of 6)
Using Hardware fused multiply add gives 64 megaflops (factor of 4)
Remaining factor requires use of associativity to block computation