At the most abstract level, conjugate gradient has minimal parallelism
-
u, r, a can be updated independently
|
The real potential parallelism is in the matrix and vector operations, however.
-
r · r (a dot product) is a reduction of size N
-
u + a p is a vector update of size N
-
A * p is a (sparse) matrix-vector multiply, in this case of size O(N)
-
It looks a lot like the operator in Jacobi
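
The three kernels listed above can be sketched in plain Python to show where the data parallelism sits. This is a minimal illustration, not a tuned implementation. The helper names (`dot`, `axpy`, `apply_A`, `conjugate_gradient`) and the choice of a 1D Laplacian stencil for A are assumptions made for this example. The symbols u, r, p and a follow the slide.

```python
def dot(x, y):
    # Reduction of size N: N independent products, then a sum.
    return sum(xi * yi for xi, yi in zip(x, y))


def axpy(a, x, y):
    # Vector update of size N (a*x + y): every element is independent.
    return [a * xi + yi for xi, yi in zip(x, y)]


def apply_A(p):
    # Sparse matrix-vector multiply with O(N) work.
    # Assumed operator: 1D Laplacian stencil (2 on the diagonal, -1 on the
    # off-diagonals) with zero boundary values, the same kind of
    # nearest-neighbour stencil a Jacobi sweep uses.
    n = len(p)
    out = []
    for i in range(n):
        left = p[i - 1] if i > 0 else 0.0
        right = p[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * p[i] - left - right)
    return out


def conjugate_gradient(b, tol=1e-10, max_iter=1000):
    # Solves A u = b for the operator defined in apply_A, starting from u = 0.
    u = [0.0] * len(b)
    r = list(b)                 # r = b - A u, with u = 0
    p = list(r)
    rr = dot(r, r)              # reduction
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        Ap = apply_A(p)         # sparse matrix-vector multiply
        a = rr / dot(p, Ap)     # reduction
        u = axpy(a, p, u)       # u + a p
        r = axpy(-a, Ap, r)     # r - a Ap, independent of the u update
        rr_new = dot(r, r)      # reduction
        p = axpy(rr_new / rr, p, r)   # new direction: r + beta p
        rr = rr_new
    return u
```

Each iteration is a fixed sequence of these kernels, so the loop body offers little to run concurrently beyond the u and r updates. The work that scales with N is inside `dot`, `axpy` and `apply_A`.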
|
Conclusions:
-
Stick to the matrix/vector operators
-
Task parallelism (and pipelining) may improve performance somewhat
|