Parallel Algorithm in Fast Multipole II
Having fetched all needed data, can update all particles in processors with high efficiency seen in O(N2) problem.
- Namely communicated data is re-used many times as needed by many particles in processor
- In practice have some 10,000 or more particles per processor
In astrophysics, also need to construct tree and rebalance load at essentially each time step
There is a straightforward parallel tree building algorithm
These methods outperform O(N2) when more than a few thousand particles (unless very inhomogeneous)