Status of Parallelism in Various N Body Cases
Data Parallel approach is really only useful for the simple O(N2) case and even here it is quite tricky to express algorithm so that it is
- both Space Efficient and
- Captures factor of 2 savings from Newton's law of action and reaction Fij = - Fji
- We have discussed these issues in a different foilset
The shared memory approach is effective for a modest number of processors in both algorithms.
- It is only likely to scale properly for O(N2) case as the compiler will find it hard to capture clever ideas needed to make fast multipole efficient
Message Parallel approach gives you very efficient algorithms in both O(N2) and O(NlogN)
- O(N2) case has very low communication
- O(NlogN) has quite low communication