1 |
Today's processors can achieve high-performance, but this requires extensive machine-specific hand tuning.
|
2 |
Routines have a large design space w/many parameters
-
blocking sizes, loop nesting permutations, loop unrolling depths, software pipelining strategies, register allocations, and instruction schedules.
-
Complicated interactions with the increasingly sophisticated microarchitectures of new microprocessors.
|
3 |
A few months ago no tuned BLAS for Pentium for Linux.
|
4 |
Need for quick deployment of optimized routines.
|
5 |
ATLAS - Automatic Tuned Linear Algebra Software
|