1 | Base Code: JIT 3.8 mflops, IBM Compiler 2.1 mflops |
2 | Remove runtime checks 33.3 mflops |
3 | C: Use rectangular array -- not array of pointers -- 44 mflops |
4 | C: Use Hardware fused multiply-add -- 64 mflops |
5 | C: Use standard compiler optimizations (associativity) 138 mflops |
6 | Fortran: 205 mflops |
7 | ESSL: 253 mflops |
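
The C-level steps in the table (rectangular array, fused multiply-add, reassociation) can be illustrated with a minimal sketch. This is not the code that was benchmarked: the function names `matmul_ptrs`, `matmul_rect`, and `dot_reassoc` are hypothetical, the matrices are assumed square and row-major, and the sketch makes no attempt to reproduce the mflops figures above. Whether the multiply-add in the inner loop is actually fused depends on the compiler and target hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Array-of-pointers layout: each row is reached through a separate
 * pointer, so every a[i][k] access needs an extra load and the rows
 * need not be contiguous in memory. */
void matmul_ptrs(size_t n, double **a, double **b, double **c)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] = sum;
        }
}

/* Rectangular layout (step 3): one contiguous block, element (i, j)
 * stored at index i * n + j. */
void matmul_rect(size_t n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++) {
                /* Step 4: a multiply followed by an add. A compiler may
                 * contract this into one fused multiply-add instruction
                 * when the target has one and contraction is permitted. */
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
}

/* Step 5 (associativity): the sum is split across two independent
 * accumulators. This changes the order of the additions, which is only
 * legal for a compiler to do on its own if it may treat floating-point
 * addition as associative. Results can differ in the last bits. */
double dot_reassoc(size_t n, const double *x, const double *y)
{
    double s0 = 0.0, s1 = 0.0;
    size_t k = 0;
    for (; k + 1 < n; k += 2) {
        s0 += x[k] * y[k];
        s1 += x[k + 1] * y[k + 1];
    }
    if (k < n)              /* leftover element when n is odd */
        s0 += x[k] * y[k];
    return s0 + s1;
}
```

The two matrix routines compute the same product; only the memory layout differs. The split-accumulator loop shows by hand the kind of reordering that step 5 lets the compiler perform.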