From kamala@spica.npac.syr.edu Mon Jun 20 00:01:50 1994
Date: Mon, 2 May 94 02:23:44 EDT
From: Kamala Anupindi <kamala@spica.npac.syr.edu>
To: paulc@spica.npac.syr.edu
Subject: report

Hi Paul,

Here's the report.
----------------------------------

In ETMSP, the routine IMPINT is the main driver for implicit integration,
which uses the trapezoidal method. The major computations involved are:

 1. Obtain the DC solution for the next time step by calling subroutines
    DCNET1 or DCNETS, if DC links are present.
 2. Predict the system states and voltages by calling subroutine PREDIC. 
 3. Calculate the system state derivatives for all dynamic devices by calling 
    subroutines GXDOT, MXDOT, TCXDOT, SXDOT, RXDOT, TPXDOT, and DBXDOT.
 4. Calculate the residues of the system states based on the trapezoidal rule.
 5. Calculate the current injection vector, including all dynamic devices,
    and nonlinear loads.
 6. Solve the network equations with the calculated current injection
    vector, by calling subroutines AUPDSC and ASOLSC.
 7. Update the system state variables and the associated algebraic variables
    (Y-variables, internal currents, etc.), by calling subroutine UPSTAT.
 8. After the calculations converge at the current time step, call
    subroutine ASOLSC again to compute all network voltages and other
    quantities needed for output.

 Steps 3 to 7 above constitute the VDHN iteration loop. The convergence 
 test is performed at the end of step 5 for all current injections. 
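
 To make steps 3, 4 and 7 concrete, here is a rough sketch on a single
 scalar test equation, dx/dt = -x**3. It is not ETMSP code: the program
 name, the variables, the test equation and the forward Euler predictor
 are all assumptions made for illustration. There is no network in this
 toy case, so the convergence test is applied to the residue itself
 rather than to current injections. The slope is formed once and then
 reused, which is the general idea behind an iteration of the VDHN kind.

```fortran
C     Sketch only: trapezoidal rule with a fixed-slope Newton
C     iteration on the scalar test equation dx/dt = -x**3.
C     All names here are illustrative, not ETMSP identifiers.
      PROGRAM TRAPZ
      REAL X0, X1, H, F0, F1, RES, SLOPE, TOL
      INTEGER ITER, MAXIT
      PARAMETER (H=0.01, TOL=1.0E-6, MAXIT=20)
      X0 = 1.0
      F0 = -X0**3
C     Predictor (stand-in for step 2): forward Euler.
      X1 = X0 + H*F0
C     Slope of the residue w.r.t. X1, formed once and reused.
      SLOPE = 1.0 + 0.5*H*3.0*X1**2
      DO 10 ITER = 1, MAXIT
C        Step 3: derivative at the current iterate.
         F1 = -X1**3
C        Step 4: residue of the trapezoidal rule.
         RES = X1 - X0 - 0.5*H*(F0 + F1)
C        Toy convergence test on the residue (assumption).
         IF (ABS(RES) .LT. TOL) GO TO 20
C        Step 7: update the state using the stale slope.
         X1 = X1 - RES/SLOPE
   10 CONTINUE
   20 CONTINUE
      WRITE (*,*) 'X1 =', X1, ' ITERATIONS =', ITER
      END
```

 The residue is X1 - X0 - (H/2)*(F0 + F1), which is zero exactly when X1
 satisfies the trapezoidal rule for this step.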

 Steps 3, 4 and 7 form and solve the differential equations. A profile
 of ETMSP indicates that about 55% of the total run time is spent setting
 up and solving these equations, split almost equally between the setup
 and the solution.

 GXDOT, MXDOT, TCXDOT, SXDOT, RXDOT, TPXDOT, and DBXDOT are the routines
 that calculate the derivatives of the dynamic devices for the
 trapezoidal method. DGEN, DEMOTL, DETSCL, DESVC, DERAN, DESTP and DESDB
 are the corresponding routines that calculate the derivatives for the
 Runge-Kutta method.

 Of all these routines, GXDOT, which calculates the derivatives of detailed
 and classical machines for the trapezoidal method, takes roughly 15-20%
 of the total time, depending on the number of generators present. It is
 also the most time-consuming routine in ETMSP as a whole. Part of the
 GXDOT code reads as follows:

(   Line # 85    )
      DO 850 K=1,IPNODE
C
      KN=3*K
      NLF=NOLD(K)
      VOLTK=VOLT(K)
      GTV=CABS(VOLTK)
      AT=ATAN2(VIM,VR)
C
      L=0
 100  L=L+1
      N=N+1
      JG=JCONTA(1,N)
      JYG=MYVAR(1,N)    

      more program deleted

(    Line # 1401  )
C
      IF (L.LT.KTHGE(K)) GO TO 100
 850  CONTINUE

 Here, IPNODE is the last detailed synchronous machine bus, and N is the
 sequential number of the detailed generator. KTHGE is the cross-reference
 array between generator internal sequential number and bus number, i.e.,

 KTHGE(K) = N, where K is the program bus number and N is the internal
 sequential number.

 As written, this code is not in a form that lends itself to parallel
 programming. With the minor changes given below, it can be made
 completely parallel.

      DO 850  K=1,IPNODE  
C
      KN=3*K
      NLF=NOLD(K)
      VOLTK=VOLT(K)
      GTV=CABS(VOLTK)
      AT=ATAN2(VIM,VR)
C
      DO 101 L = 1 , KTHGE(K)
      N=N+1
      JG=JCONTA(1,N)
      JYG=MYVAR(1,N)

     (part of program deleted)
C
 101  CONTINUE
 850  CONTINUE

 Since all processors have all the data to begin with, each of them
 holds the array KTHGE. Hence, the code above is now embarrassingly
 parallel at two levels: the IPNODE level and the N level, i.e., the
 synchronous machine bus level and the detailed generator level. Similar
 changes were made in the other routines that set up the generator
 equations.
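
 One point worth checking in the rewritten loop is the counter N, which
 is still incremented from one pass to the next. For the K iterations to
 run independently, each processor needs to know where N starts for its
 own buses. The sketch below shows one way this might be done, assuming
 that KTHGE(K) holds the number of generators on bus K (as the loop bound
 suggests) and that N is zero before the K loop. The array NSTART, the
 program name and the sample data are hypothetical and do not come from
 ETMSP.

```fortran
C     Sketch only. Assumes KTHGE(K) is the number of generators
C     on bus K and that N starts from zero. NSTART and the
C     sample data are hypothetical, not ETMSP names.
      PROGRAM OFFSET
      INTEGER IPNODE
      PARAMETER (IPNODE=4)
      INTEGER KTHGE(IPNODE), NSTART(IPNODE), K, L, N
      DATA KTHGE /2, 1, 3, 1/
C     Serial prefix sum: generators that precede bus K.
      NSTART(1) = 0
      DO 10 K = 2, IPNODE
         NSTART(K) = NSTART(K-1) + KTHGE(K-1)
   10 CONTINUE
C     N now depends only on K and L, so no value is carried
C     from one K iteration to the next.
      DO 30 K = 1, IPNODE
         DO 20 L = 1, KTHGE(K)
            N = NSTART(K) + L
            WRITE (*,*) 'BUS', K, ' GENERATOR', N
   20    CONTINUE
   30 CONTINUE
      END
```

 The prefix sum is cheap and, since every processor already holds KTHGE,
 each one could compute NSTART locally without any communication. Note
 also that a Fortran 77 DO loop runs zero times when KTHGE(K) is less
 than 1, whereas the original GO TO form always runs at least once.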

 In parallelizing the routine described above, the following problems
 were encountered:

 * ETMSP is an interactive program. Hence, when it runs on the SP1, one
   window pops up for each processor node allotted, each with its own
   interactive ETMSP session. For preliminary testing, this was changed
   so that the input values are set inside ETMSP itself.

 * Once each processor node completes its job, the arrays involved in the
   computation need to be concatenated. The number of such arrays and
   variables for this routine (and for other routines too) runs into the
   hundreds. As of now, attempts are being made at 'crude concatenation',
   i.e., an explicit concatenation statement is added for each variable.
   This will lead to enormous communication overhead, which is bound to
   slow down the program. (Paul, do we need to make a suggestion here for
   improvement or not?)...
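
 To make the concatenation step concrete, here is a rough serial sketch
 of what it amounts to for a single array. Everything in it is assumed
 for illustration: NPROC, ME, KLO, KHI, CHUNK, XLOC and XALL are not
 ETMSP names, the block split of the buses is only one possible split,
 and the loop over ME stands in for the processor nodes. No message
 passing calls are shown.

```fortran
C     Sketch only. NPROC, ME, KLO, KHI and the data are
C     hypothetical; the loop over ME stands in for the nodes.
      PROGRAM CONCAT
      INTEGER IPNODE, NPROC
      PARAMETER (IPNODE=10, NPROC=4)
      REAL XLOC(IPNODE), XALL(IPNODE)
      INTEGER ME, K, KLO, KHI, CHUNK
C     Block size, rounded up so that every bus is covered.
      CHUNK = (IPNODE + NPROC - 1)/NPROC
      DO 30 ME = 0, NPROC-1
C        Range of buses owned by node ME.
         KLO = ME*CHUNK + 1
         KHI = MIN(IPNODE, KLO + CHUNK - 1)
C        Node ME fills only its own slice of the array.
         DO 10 K = KLO, KHI
            XLOC(K) = REAL(K)
   10    CONTINUE
C        Concatenation: copy that slice into the full array.
C        On the real machine this copy would be a message,
C        repeated for every array the routine modifies.
         DO 20 K = KLO, KHI
            XALL(K) = XLOC(K)
   20    CONTINUE
   30 CONTINUE
      WRITE (*,*) (XALL(K), K = 1, IPNODE)
      END
```

 With hundreds of arrays, the copy in loop 20 would be repeated once per
 array per node, which is where the communication overhead comes from.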

  Is this detailed enough? Would you like anything specific to be added?

  Kamala