1 | Program for random surfaces is very irregular, with many linked list and pointer trees to traversed. Original code performed very poorly. Completely rewrote and simplified code and gained a substantial speed-up. Still performed poorly on some RISC processors such as the Intel i860. |
2 | Optimal use of cache memory is vital for processors such as the i860, which have a small cache memory, and the time to access external memory is much greater than the time to perform a floating point operation |
3 | Due to the dynamic nature of the code, the neighbors of a site which are used for the update of that site are constantly changing. This means we cannot make good use of the cache unless we keep a new array which constantly readjusts pointers to neighbors, so that data for neighboring sites is kept in neighboring regions of memory. |
4 | Other optimizations on i860 and use of non-IEEE arithmetic gave us a substantial improvement, but still nowhere near 80 Mflops peak performance! |
5 | Program performs much better on more recent processors (IBM RS/6000, Hewlett-Packard). |