Suppose that the matrix A is banded with bandwidth b and half-width w, where b=2w-1. |
The sequential algorithm is similar to the full matrix case, except at each stage only those elements within a computational "window" of w rows and w columns are updated. |
Partial pivoting can cause the number of columns in the computational window to be greater than m. This necessitates some extra bookkeeping in both the sequential and parallel algorithms. |
The parallel banded and full algorithms are similar, but use a different decomposition. To get better load balancing, a scattered decomposition over both rows and columns is used in the banded algorithm. In the full case, a scattered decomposition just over rows was used. |