Base code -- JIT 3.8 and IBM Compiler 2.1 Mflops
Remove runtime checks -- 33.3 Mflops
C: Use rectangular array (not array of pointers) -- 44 Mflops
C: Use hardware fused multiply-add -- 64 Mflops
C: Use standard compiler optimizations (associativity) -- 138 Mflops
Fortran -- 205 Mflops
ESSL -- 253 Mflops
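
The difference between the "array of pointers" and "rectangular array" layouts in the C rows can be illustrated with a minimal sketch. It assumes the benchmark kernel is a dense square matrix multiply, which the figures above do not state; the function names matmul_ptrs and matmul_rect are hypothetical and are not taken from the measured code.

```c
#include <assert.h>
#include <stddef.h>

/* Array-of-pointers layout (assumed): a[i] is a pointer to row i, so every
   element access first loads the row pointer and then the element. */
void matmul_ptrs(size_t n, double **a, double **b, double **c) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++) {
                sum += a[i][k] * b[k][j];
            }
            c[i][j] = sum;
        }
    }
}

/* Rectangular layout (assumed): one contiguous row-major block of n*n
   doubles, so element (i, j) sits at index i*n + j with no extra
   pointer load. */
void matmul_rect(size_t n, const double *a, const double *b, double *c) {
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

Both functions perform the same arithmetic in the same order, so they return identical results; only the memory layout, and therefore the addressing work per element, differs.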