Consider the decomposition of the N x (N+1) augmented matrix c = [ a | b ] into roughly equal blocks, illustrated here by a 9 x 10 example in which most processors hold 9 elements while those in the last column of processors hold 12 elements.
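The element counts in the 9 x 10 example can be reproduced with a small sketch. It assumes a 3 x 3 grid of processors, which is inferred from the counts of 9 and 12 rather than stated in the text. It also assumes that the last block in each dimension absorbs any leftover rows or columns. The names block_sizes and elements_per_processor are hypothetical helpers introduced only for this illustration.

```python
def block_sizes(total, parts):
    """Split `total` indices into `parts` contiguous blocks.

    Assumption: every block gets total // parts indices and the
    last block absorbs whatever remains.
    """
    base = total // parts
    sizes = [base] * parts
    sizes[-1] += total - base * parts
    return sizes


def elements_per_processor(n_rows, n_cols, grid_rows, grid_cols):
    """Return a grid_rows x grid_cols table of element counts per processor."""
    row_sizes = block_sizes(n_rows, grid_rows)
    col_sizes = block_sizes(n_cols, grid_cols)
    return [[r * c for c in col_sizes] for r in row_sizes]


# 9 x 10 augmented matrix on an assumed 3 x 3 processor grid.
counts = elements_per_processor(9, 10, 3, 3)
for row in counts:
    print(row)
```

Under these assumptions each row of processors holds 9, 9 and 12 elements: the first two columns of processors own 3 x 3 blocks, and the last column owns 3 x 4 blocks because it also carries the extra column b of the augmented matrix.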
For the first iteration of the algorithm, assume that each processor holds roughly n x n elements, and find the running time of each part of the algorithm: