CPS 615- Introduction to Computational Science
Final Project

German VEGA
e-mail: gvega@top.cis.syr.edu


Solution to the Poisson's equation with Dirichlet boundary condictions


1. The poisson equation with Dirichlet boundary conditions

We consider solving Poisson's equation

d2U/dx2 + d2U/dy2 = F(x,y)

on a rectangular region R= { (x,y) : a<= x <= b, c <= y <= d } with Dirichlet boundary conditions:

U(x,c) = south(x) , a<=x<=b

U(x,d) = north(x), a<=x<=b

U(a,y) = west(y), c<=y<=d

U(b,y) = east(y), c<=y<=d

We want to compute an approximation uij to U(xi,yj) at each of the interior points of a grid defined by

a= x0 < x1< ... < x n1=b, xi = x0+i*hx where hx = (b-a)/n1

c= y0 < y1< ... < y n2=d, yj = y0+j*hy where hy = (d-c)/n2

Replacing the derivatives by appropiate second-order central-difference approximations, we can approximate Poisson's equation at the interior points by

(ui,j-1-2ui,j+ui,j+1)/hx2 + (ui-1,j-2ui,j+ui+1,j)/hy2 = F(xi,yj)

where some of the ui,j are the known values at the boundary of the region R.

This defines a set of linear equations that can be expressed in matrix terms as:

(1/hx2)Tn1-1u(n1-1)x(n2-1) + (1/hy2)u(n1-1)x(n2-1) Tn2-1 = g

where u is comprised of the unknown points at the interior of the grid, g is made up of F-values from the interior of R and samplings of south/(hy2), north/(hy2), west/(hx2), east/(hx2) around the boundary, and Tn is the tridiagonal matrix


Tn =
           -2   1   0  0 ...             0
            1  -2   1  0 ...             0
            0   1  -2  1 ...             0
            .   .   .  . ...             .       n-by-n
            .   .   .  . ...             .
            0            ...      1  -2  1
            0            ...      0   1 -2

Other boundary conditions (Dirichlet-Neumann,Neumann-Neumann,Periodic) give rise to very similar systems of equations. For a general discussion on the systems arising from other kinds of Partial Differential Equations see [1].

We can find a solution of this set of linear equations based on the eigensystem of the matrix Tn-1 with the following procedure:

If Vn-1 and Dn-1 are, respectively, the matrix of eigenvectors and the diagonal matrix of the eigenvalues of Tn-1

Tn-1=Vn-1Dn-1Vn-1-1

and the system of linear equations becomes :

(1/hx2)Vn1-1Dn1-1Vn1-1-1 u(n1-1)x(n2-1) + (1/hy2)u(n1-1)x(n2-1) Vn2-1Dn2-1Vn2-1-1 = g

Dn1-1 Vn1-1-1 u(n1-1)x(n2-1) Vn2-1 + (h2) Vn1-1-1 u(n1-1)x(n2-1) Vn2-1 Dn2-1 = Vn1-1-1 hx2 g Vn2-1

Dn1-1 u~(n1-1)x(n2-1) + (h2) u~(n1-1)x(n2-1) Dn2-1 = g~

Where h = hx/hy, u(n1-1)x(n2-1) = Vn1-1 u~(n1-1)x(n2-1) Vn2-1-1 and g~ = Vn1-1-1 hx2 g Vn2-1

The algorithm to solve the linear system is:

  1. Calculate ( hx2)* g from F, east, west, north, south
  2. Apply Vn2-1 to the rows of g
  3. Apply Vn1-1-1 to the columns of g
  4. Calculate u~ using the relationship
    u~i,j = g~i,j / ( lambdain1-1 +(h2lambdajn2-1))
  5. Apply Vn2-1-1 to the rows of u
  6. Apply Vn1-1 to the columns of u

Of course this procedure is worthy only if there is an eficient way of calculating Vn-1 and Vn-1-1, and applying them to the rows and columns of a matrix.

2. The eigensystem of Tn-1

We can verify that the eigenvectors and eigenvalues of the matrix Tn-1 are:

Di,i = lambdai = 2 *(cos (i*pi/n) -1) 1<= i <= n-1

Vi,j = sin (i*j*pi/n) 1 <= i,j <= n-1

and applying Vn-1 to an array is equivalent to calculate its Discrete Sine Transform (DSTn-1).

Several algorithms, based on the Fourier Transform, for efficiently calculate the DST of a vector are formally presented in [2] and I will only outline here the basic idea.

From the definition of the Discrete Fourier Transform Fm

Fi,j = cos (i*j*pi*2/m) - i sin (i*j*2*pi/m) 0<=i,j<=m-1

we can see that we can "extract" the Sine Transform DSTn-1 from the imaginary parts of the Fourier Transform F2n of an extended vector.

The simples approach, presented in [3], is to construct, from the input vector, an extended vector

xe2n = (0,x1,..,xn-1,0,0...0)

and the relationship between the DST and the Fourier transform is

DSTn-1 xn-1 = i(F2n xe2n) (1:n-1)

An alternative method is to construct another extended vector

xe2n = (0,x1,..,xn-1,0,-xn-1...-x1)

in this case, the relationship between the DST and the Fourier transform is

DSTn-1 xn-1 = (i/2)(F2n xe2n) (1:n-1)

Many of the algorithms presented in [2] exploit the symmetries of the Fourier Transform, and of this particular extended vector, to reduce the number of operations.

Another important result, that can be verified from the relationship to the Fourier Transform, is that

DSTn-1-1 = (2/n)DSTn-1

Then, the problem of applying the eigenvectors matrix of Tn-1, and its inverse, is reduced to apply the Sine Transfrom DSTn-1, wich in turns, is reduced to a Fourier Transform F2n. So, to efficiently implement the Poisson solver, we need an efficient way to calculate the Fourier Transform.

3. Fast Fourier Transform

Most of the algorithms to perform a "fast" Fourier Transform are based on a divide-and-conquer idea, wich consist in expressing Fmxm in terms of smaller Fourier Transforms Fm1 xm1and Fm2xm2.

One the most used algorithm, is the binary version in wich Fm is expressed in terms of Fm/2. This is the 2-radix algotithm, wich is based on the following result, wich relates the Fourier Transform of a sequence to the Fourier Transforms of its even and odd subsequences:

If x e Cm, then Fmx(0:m-1) can be calculated as

(Fmx )(0:m/2-1) = Fm/2x(0:2:m-1) + omegam/2Fm/2x(1:2:m-1)

(Fmx )(m/2:m-1) = Fm/2x(0:2:m-1) - omegam/2Fm/2x(1:2:m-1)

Where omegap is a diagonal matrix wich components are

omegap(i,i) = (wp)i 1<=i<=p-1

and wp = cos(2*pi/p) - i sin(2*pi/p)

If m is a power of 2, we can apply this result down to a 1-point Fourier Transform, and we obtain the following recursive algorithm, wich is 0(m log2 m):

function y = FFT(x,m)
   if (m=1) 
      y = x
   else
      p = m/2
      w = cos(2*pi/p) - i sin(2*pi/p)
      omega = diag(1,w,...,wm-1)
      u = FFT(x(0:2:m-1),p)
      z = omega * FFT(x(1:2:m-1),p)
      y(0:p-1) = u+z
      y(p:m-1) = u-z

Different methods to implement iterative versions of this algorithm are extensivily discussed in [2], as well as the general discussion of other so-called general-radix algorithms wich uses the transform of smaller subsequences of the input vector to calculate its transform.

4. Description of the implementation

I implemented a very "direct translation" of the algorithms, so that the solution presented is not certainly the most efficient version. Many improvements can be done, especially by avoiding duplicated calculations, but I think that this version clearly illustrates the method itself. The complete program can be found here.

The poisson solver

The first part of the program implements the algorithm described in section 1.

This two arrays will hold all the intermediate results

      DOUBLE PRECISION, DIMENSION(1:N1-1,1:N2-1) :: PHI
      DOUBLE PRECISION, DIMENSION(1:N2-1,1:N1-1) :: PHI_T

!HPF$ DISTRIBUTE PHI   (*,BLOCK) ONTO PROCS_ARRAY
!HPF$ DISTRIBUTE PHI_T (*,BLOCK) ONTO PROCS_ARRAY

The first step is to store the value of h2g in array PHI

C     ****************************************************************
C       THE POISSON'S EQUATION FUNCTION F 
C     ****************************************************************
      FORALL (I=1:N1-1,J=1:N2-1) 
         PHI(I,J) = F(I,J,HX,HY) * (HX*HX)
      END FORALL

C     ****************************************************************
C       THE BOUNDARY TERMS FROM THE FINITE-DIFFERENCE EQUATION
C     ****************************************************************

      FORALL (I = 1:N1-1) 
         PHI(I,1) = PHI(I,1)-UVAL(I,0)
      END FORALL
      FORALL (I = 1:N1-1) 
         PHI(I,N2-1) = PHI(I,N2-1)-UVAL(I,N2)
      END FORALL
      FORALL (J = 1:N2-1)
         PHI(1,J) = PHI(1,J)-(UVAL(0,J)*(H*H))
      END FORALL
      FORALL (J = 1:N2-1)
         PHI(N1-1,J) = PHI(N1-1,J)-(UVAL(N1,J)*(H*H))
      END FORALL

Then we have to apply an inverse DSTn1-1 to the columns of PHI, and a DSTn2-1 to the rows of PHI. Because the matrix is block-distributed by columns, operations performed on the columns don't require any communication and can be performed in parallel.

On the other hand, operations performed on the rows involve a great deal of communication. By transposing the matrix, one can cluster all the communications and then perform the operation without communication.

C     ****************************************************************
C       APPLY INVERSE EIGEN-VECTORS TO THE COLUMNS OF PHI
C     ****************************************************************
!HPF$ INDEPENDENT
      DO J = 1,N2-1
         CALL FAST_SINE(PHI(:,J),N1,.TRUE.)
      END DO

C     ****************************************************************
C       APPLY EIGEN-VECTORS TO THE ROWS OF PHI
C     ****************************************************************
      PHI_T = TRANSPOSE(PHI)
!HPF$ INDEPENDENT      
      DO I = 1,N1-1
         CALL FAST_SINE(PHI_T(:,I),N2,.FALSE.)
      END DO

At this point, we have the value of g~ stored in PHI_T, and we can calculate the value of u~. This is an embarrasing parallel computation.

C     *****************************************************************
C       SCALE PHI USING THE EIGEN-VALUES
C     *****************************************************************
      FORALL (I=1:N1-1,J=1:N2-1) 
         PHI_T(J,I) = PHI_T(J,I)/ (LAMBDAX(I)+LAMBDAY(J))
      END FORALL

Now, PHI_T has the value of u~, if we apply a DSTn1-1 to the columns of PHI, and an inverse DSTn2-1 to the rows of PHI, we get the sought values of u

C     ****************************************************************
C       APPLY INVERSE EIGEN-VECTORS TO THE ROWS OF PHI
C     ****************************************************************
!HPF$ INDEPENDENT
      DO I = 1,N1-1
         CALL FAST_SINE(PHI_T(:,I),N2,.TRUE.)
      END DO

C     ****************************************************************
C       APPLY EIGEN-VECTORS TO THE COLUMNS OF PHI
C     ****************************************************************
      PHI = TRANSPOSE(PHI_T)
!HPF$ INDEPENDENT      
      DO J = 1,N2-1
         CALL FAST_SINE(PHI(:,J),N1,.FALSE.)
      END DO

The Fast Sine Transform

This is a direct implementation of the relationship between DST and FT described in section 2.. As mentioned before, this can be improved by exploiting the symmetries of the extended vector.

C     ****************************************************************
C       IF WE EXTEND THE INPUT VECTOR TO THE 2*N VECTOR
C           XE = [0,X(1),...,X(N-1),0,-X(N-1),-X(N-2),...,-X(1)]
C       THEN
C           SINE TRANSFORM = [ i/2 FFT(XE) ](1:N-1)
C     ****************************************************************

         KK = 0
         Z(KK)  = CMPLX(0D0,0D0)
         KK = KK+1
         DO K = 1,N-1
           Z(KK) = CMPLX(VECTOR(K),0D0)
           KK    = KK+1
         END DO
         Z(KK) = CMPLX(0D0,0D0)
         KK = KK+1
         DO K = N-1,1,-1
           Z(KK) = CMPLX((-1)*VECTOR(K),0D0)
           KK    = KK+1
         END DO

C     ****************************************************************
C        APPLY FAST FOURIER TRANSFORM AND EXTRACT SINE TRANSFORM
C     ****************************************************************
         
         CALL FFT(Z,2*N)
         Z  = Z * (0d0,5d-1)
         VECTOR(1:N-1) = DBLE(Z(1:N-1))

C     ****************************************************************       
C       INVERSE SINE TRANSFORM = (2/N)SINE TRANSFORM
C     ****************************************************************
        IF (INVERSE) THEN
           VECTOR = (2 * VECTOR) / N
        END IF

The Fast Fourier Transform

This is an iterative implementation of the algorithm described in section 3.

The algorithm advances by "levels", at each level L, two FFTL/2 are combined to produce an FFTL.

In every level, m/L of this updates (usually called "butterflies") are performed, this is a source of parallelism that can be exploited. Parallel algorithms to perform a single FFT in a distributed memory machine are presented in [2] and [3]. This improve the speedup of the parallel solution, but to better arrange the communication patterns it's necessary to merge the Poisson solver and the FFT procedure and this somewhat "obscures" the described method.

I choose to implement the following serial algorithm of a 2-radix FFT

C     ****************************************************************
C       BIT REVERSE PERMUTATION OF THE VECTOR
C     ****************************************************************

        DO INDEX = 0,M-1

C          BIT REVERSE THE INDEX

           LOGCOUNTER = M
           RINDEX     = 0
           Q          = INDEX
           DO           
              IF (LOGCOUNTER .EQ. 1) EXIT

              S = FLOOR(REAL(Q/2))
              RINDEX = (2*RINDEX)+ (Q - (2*S))
              Q = S
              LOGCOUNTER = LOGCOUNTER/2
           END DO

C          SWAP ENTRIES

           IF (RINDEX > INDEX) THEN
              TEMP        = X(INDEX)
              X(INDEX)    = X(RINDEX)
              X(RINDEX)   = TEMP
           END IF
        END DO

C     ****************************************************************
C       PERFORM BUTTERFLY UPDATE
C     ****************************************************************
        HALFLEVEL  = 1
        LEVEL      = 2
        DO
          IF (LEVEL .GT. M) EXIT

          COLS    = M/LEVEL

C     ****************************************************************
C         APPLY LEVEL-BUTTERFLY UPDATE TO KTH ROW OF X(LEVEL:COLS)
C     ****************************************************************
          DO K = 0,COLS-1
            DO J = 0,HALFLEVEL-1
              OMEGA = CMPLX(COS(2*PI*J/LEVEL),(-1)*SIN(2*PI*J/LEVEL))
              CTEMP = OMEGA *  X((K*LEVEL)+ J +HALFLEVEL)
              X((K*LEVEL)+ J +HALFLEVEL) = X((K*LEVEL)+ J) - CTEMP
              X((K*LEVEL)+ J )           = X((K*LEVEL)+ J) + CTEMP
            END DO
          END DO

          HALFLEVEL  = LEVEL
          LEVEL      = LEVEL * 2
        END DO

5. Test and Results

The program was tested with the following Poisson equation

d2U/dx2 + d2U/dy2 = (x*x+y*y) exp(x*y)

on a rectangular region R= { (x,y) : 0<= x <= 1, 0 <= y <= 1 } with Dirichlet boundary conditions:

U(x,0) = south(x) = 1 , 0<=x<=1

U(x,1) = north(x) = exp(x), 0<=x<=1

U(0,y) = west(y) = 1 , 0<=y<=1

U(1,y) = east(y) = exp(y), 0<=y<=1

One can verify that the analytic solution is

U(x,y) = exp(x*y)

The results of different runnings are available here.

The following graph summarizes the results for the sequential runnings (one processor). We can observe the logarithmic behaviour of the Poisson solver based on the 2-radix Fourier transform.

ONE PROCESSOR

In the following graph, I compare the parallel runnings to the sequential runnings, we can observe an almost linear speedup. The figures for n=32 are somehow bogus, but we see a common trend in all cases.

RELATIVE SPEEDUP

References.

[1] An introduction to Fast Fourier Transform methods for Partial Differential Equations with applications. Morgan Pickering. Research Studies Press, Letchworth, Hertfordshire, England. Applied and Engineering Mathematics Series, Vol. 4, 1986

[2]Computational Frameworks to the Fast Fourier Transform. Charles Van Loan, Cornell University. Society for Industrial and Applied Mathematics, Philadelphia. Frontiers in Applied Mathematics Series, Vol. 10, 1992

[3]Lecture Notes CS267, Applications of Parallel Computers. Prof. Jim Demmel. U.C. Berkeley. Spring 1996. Lectures 15 and 16: Solving the Discrete Poisson Equation using Jacobi, SOR, Conjugate Gradients and the FFT. Available at http://HTTP.CS.Berkeley.EDU/~demmel/cs267/


Created: Monday, December 23, 1996, 9:18:45 AM by German VEGA ( gvega@top.cis.syr.edu) Last Updated: Monday, December 23, 1996, 9:18:45 AM