1 |
Data Parallel approach is really only useful for the simple O(N2) case and even here it is quite tricky to express algorithm so that it is
-
both Space Efficient and
-
Captures factor of 2 savings from Newton's law of action and reaction Fij = - Fji
-
We have discussed these issues in a different foilset
|
2 |
The shared memory approach is effective for a modest number of processors in both algorithms.
-
It is only likely to scale properly for O(N2) case as the compiler will find it hard to capture clever ideas needed to make fast multipole efficient
|
3 |
Message Parallel approach gives you very efficient algorithms in both O(N2) and O(NlogN)
-
O(N2) case has very low communication
-
O(NlogN) has quite low communication
|