1 | To understand the performance of general algorithms in parallel or in caches, one must understand how the arrays are stored. |
2 | Note first that labeling is not necessarily the same as storage: |
3 | if p(l) is any permutation of l = 0, ..., d-1, then we can store label digit l as storage digit p(l). |
4 | This can be used to improve sequential performance by ensuring that, when a given digit l is being processed, its storage digit p(l) is in the range 0 ≤ p(l) < L, where the cache can hold 2^L entries. |
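
The digit permutation can be illustrated with a small sketch. It assumes binary digits, so that label digit l is bit l of the array index, and it assumes the cache holds one aligned block of 2^L consecutive storage entries. The names `storage_index`, `same_block`, `identity`, and `swapped` are hypothetical and chosen only for this illustration.

```python
def storage_index(label, perm):
    """Return the storage address in which bit l of `label` sits at bit perm[l]."""
    addr = 0
    for l, pl in enumerate(perm):
        addr |= ((label >> l) & 1) << pl
    return addr


def same_block(label, digit, perm, L):
    """True if `label` and its partner (differing only in `digit`) are stored
    in the same aligned block of 2**L entries (assumed cache model)."""
    partner = label ^ (1 << digit)
    return storage_index(label, perm) >> L == storage_index(partner, perm) >> L


d, L = 4, 2                 # 2**d = 16 entries, assumed cache block of 2**L = 4
identity = [0, 1, 2, 3]     # storage digit equals label digit
swapped = [2, 3, 0, 1]      # label digits 2 and 3 are stored in the low positions

# Processing label digit 3: with the identity map the two partners are far
# apart in storage; with the swapped map p(3) = 1 < L, so they share a block.
print([same_block(i, 3, identity, L) for i in range(2 ** d)])
print([same_block(i, 3, swapped, L) for i in range(2 ** d)])
```

Under these assumptions, every pair combined while processing digit 3 shares a block with the `swapped` map and none does with `identity`, which is the locality the permutation is meant to provide.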