1 |
Data Parallel approach is really only useful for the simple O(N2) case and even here it is quite tricky to express algorithm so that it is
-
both space efficient and
-
captures factor of 2 savings from Newton's law of action and reaction
-
Fij = - Fji
-
We have discussed these issues in previous foils
|
2 |
The shared memory approach is effective for a modest number of processors in both algorithms.
-
It is only likely to scale properly for O(N2) case as the compiler will find it hard to capture clever ideas needed to make fast multipole efficient
|
3 |
Message Parallel approach gives you very efficient algorithms in both cases
-
O(N2) case has very low communication
-
O(NlogN) has quite low communication
|