subroutine ga_initialize()
Allocate and initialize internal data structures for Global Arrays.
This is a collective operation.
subroutine ga_initialize_ltd(limit) integer limit - amount of memory in bytes per process [input]
Allocate and initialize internal data structures for global arrays and set limit for memory used in global arrays.
limit < 0 means "allow unlimited memory usage"
This is a collective operation.
logical function ga_create(type, dim1, dim2, array_name, chunk1, chunk2, g_a) integer type, dim1, dim2, chunk1, chunk2, g_a character*(*) array_name array_name - a unique character string [input] type - MA type [input] dim1/2 - array(dim1,dim2) as in FORTRAN [input] chunk1/2 - minimum size that dimensions should be chunked up into [input] g_a - integer handle for future references [output]
Creates an array.
Setting chunk1=dim1 gives distribution by vertical strips (chunk2*columns); setting chunk2=dim2 gives distribution by horizontal strips (chunk1*rows). Actual chunks will be modified so that they are at least the size of the minimum and each process has either zero or one chunk. Specifying chunk1/2 as <= 1 will cause that dimension to be distributed evenly.
Note: current distribution algorithm is primitive.
This is a collective operation.
logical function ga_create_irreg(type, dim1, dim2, array_name, map1, nblock1, map2, nblock2, g_a) integer type, map1(*), map2(*), nblock1, nblock2, g_a character*(*) array_name array_name - a unique character string [input] type - MA type [input] dim1/2 - array(dim1,dim2) as in FORTRAN [input] nblock1 - no. of blocks first dimension is divided into [input] nblock2 - no. of blocks second dimension is divided into [input] map1 - ilo for each block [input] map2 - jlo for each block [input] g_a - integer handle for future references [output]
Creates an array by following the user-specified distribution.
This is a collective operation.
logical function ga_duplicate(g_a, g_b, array_name) integer g_a, g_b; character*(*) array_name; array_name - a character string [input] g_a - Integer handle for reference array [input] g_b - Integer handle for new array [output]
Creates a new array by applying all the properties of another existing array.
This is a collective operation.
logical function ga_destroy(g_a) integer g_a [input]
Deallocates the array and frees any associated resources.
This is a collective operation.
subroutine ga_terminate()
Delete all active arrays and destroy internal data structures.
This is a collective operation.
subroutine ga_sync()
Synchronize processes (a barrier) and ensure that all global array operations are complete (or at least appear complete).
This is a collective operation.
subroutine ga_zero(g_a) integer g_a [input]
Sets value of all elements in the array to zero.
This is a collective operation.
double precision function ga_ddot(g_a, g_b) integer g_a, g_b [input]
Computes the element-wise dot product of the two arrays which must be double precision, the same shape and identically aligned:
ga_ddot = SUM_ij a(i,j)*b(i,j)
This is a collective operation.
double complex function ga_zdot(g_a, g_b) integer g_a, g_b [input]
Computes the element-wise dot product of the two arrays which must be double complex, the same shape and identically aligned:
ga_zdot = SUM_ij a(i,j)*b(i,j)
This is a collective operation.
double precision function ga_ddot_patch(g_a, ta, ailo, aihi, ajlo, ajhi, g_b, tb, bilo, bihi, bjlo, bjhi) integer g_a, g_b [input] integer ailo, aihi, ajlo, ajhi [input] g_a patch coordinates integer bilo, bihi, bjlo, bjhi [input] g_b patch coordinates character*1 ta, tb [input] transpose flags
Computes the element-wise dot product of the two (possibly transposed) patches which must be double precision and have the same number of elements.
This is a collective operation.
double complex function ga_zdot_patch(g_a, ta, ailo, aihi, ajlo, ajhi, g_b, tb, bilo, bihi, bjlo, bjhi) integer g_a, g_b [input] integer ailo, aihi, ajlo, ajhi [input] g_a patch coordinates integer bilo, bihi, bjlo, bjhi [input] g_b patch coordinates character*1 ta, tb [input] transpose flags
Computes the element-wise dot product of the two (possibly transposed) patches which must be double complex and have the same number of elements.
This is a collective operation.
subroutine ga_matmul_patch(transa, transb, alpha, beta, g_a, ailo, aihi, ajlo, ajhi, g_b, bilo, bihi, bjlo, bjhi, g_c, cilo, cihi, cjlo, cjhi) integer g_a, ailo, aihi, ajlo, ajhi [input] patch of g_a integer g_b, bilo, bihi, bjlo, bjhi [input] patch of g_b integer g_c, cilo, cihi, cjlo, cjhi [input] patch of g_c double precision/complex alpha, beta [input] character*1 transa, transb [input]
ga_matmul_patch is a patch version of ga_dgemm:
C[cilo:cihi,cjlo:cjhi] := alpha* AA[ailo:aihi,ajlo:ajhi] * BB[bilo:bihi,bjlo:bjhi] ) + beta*C[cilo:cihi,cjlo:cjhi],
where AA = op(A), BB = op(B), and op( X ) is one of
op( X ) = X or op( X ) = X',
Valid values for transpose arguments: 'n', 'N', 't', 'T'. It works for both double precision and double complex data tape.
This is a collective operation.
subroutine ga_scale(g_a, s) integer g_a [input] double precision/complex/integer s [input]
Scales an array by the constant s.
This is a collective operation.
subroutine ga_scale_patch(g_a, ailo, aihi, ajlo, ajhi, s) integer g_a [input] double precision/complex/integer s [input] integer ailo, aihi, ajlo, ajhi [input] g_a patch coordinates
Scales a patch of an array by the constant s.
This is a collective operation.
subroutine ga_add(alpha, g_a, beta, g_b, g_c) integer g_a, g_b, g_c [input] double precision/complex/integer alpha, beta [input]
The arrays (which must be the same shape and identically aligned) are added together element-wise
c = alpha * a + beta * b.
The result (c) may replace one of the input arrays (a/b).
This is a collective operation.
subroutine ga_add_patch (alpha, g_a, ailo, aihi, ajlo, ajhi, beta, g_b, bilo, bihi, bjlo, bjhi, g_c, cilo, cihi, cjlo, cjhi) integer g_a, g_b, g_c [input] double precision/complex/integer alpha, beta [input] integer ailo, aihi, ajlo, ajhi [input] g_a patch coordinates integer bilo, bihi, bjlo, bjhi [input] g_b patch coordinates integer cilo, cihi, cjlo, cjhi [input] g_c patch coordinates
Patches of arrays (which must have the same number of elements) are added together element-wise.
This is a collective operation.
subroutine ga_fill_patch(g_a, ailo, aihi, ajlo, ajhi, s) integer g_a [input] double precision/complex/integer s [input] integer ailo, aihi, ajlo, ajhi [input] g_a patch coordinates
Fills the patch of an array with value s. Type of s and array must match.
This is a collective operation.
subroutine ga_summarize(verbose) logical verbose [input]! If true print distribution info
Prints info about allocated arrays.
subroutine ga_symmetrize(g_a) integer g_a [input/output]
Symmetrizes matrix A represented with handle g_a: A:= .5 * (A+A').
This is a collective operation.
subroutine ga_transpose(g_a, g_b) integer g_a [input] ! remains unchanged integer g_b [output]
Transposes a matrix: B = A', where A and B are represented by handles g_a and g_b.
This is a collective operation.
subroutine ga_diag(g_a, g_s, g_v, eval) integer g_a [input] ! Matrix to diagonalize integer g_s [input] ! Metric integer g_v [output] ! Global matrix to return evecs double precision eval(*) [output] ! Local array to return evals
Solve the generalized eigen-value problem returning all eigen-vectors and values in ascending order. The input matrices are not overwritten or destroyed.
This is a collective operation.
subroutine ga_diag_reuse(control, g_a, g_s, g_v, eval) integer control [input] ! Control flag integer g_a [input] ! Matrix to diagonalize integer g_s [input] ! Metric integer g_v [output] ! Global matrix to return evecs double precision eval(*) [output] ! Local array to return evals
Solve the generalized eigen-value problem returning all eigen-vectors and values in ascending order. Recommended for REPEATED calls if g_s is unchanged. Values of the control flag:
value action/purpose 0 indicates first call to the eigensolver >0 consecutive calls (reuses factored g_s) <0 only erases factorized g_s; g_v and eval unchanged (should be called after previous use if another eigenproblem, i.e., different g_a and g_s, is to be solved)
The input matrices are not destroyed.
This is a collective operation.
subroutine ga_diag_std(g_a, g_v, eval) integer g_a [input] ! Matrix to diagonalize integer g_v [output] ! Global matrix to return evecs double precision eval(*) [output] ! Local array to return evals
Solve the standard (non-generalized) eigenvalue problem returning all eigenvectors and values in the ascending order. The input matrix is neither overwritten nor destroyed.
This is a collective operation.
subroutine ga_lu_solve(trans, g_a, g_b) character trans [input] ! transpose or not transpose integer g_a [input] ! coefficient matrix integer g_b [output/output] ! rhs/solution matrix
Solve the system of linear equations op(A)X = B based on the LU factorization.
op(A) = A or A' depending on the parameter trans:
Matrix A is a general real matrix. Matrix B contains possibly multiple rhs vectors. The array associated with the handle g_b is overwritten by the solution matrix X.
This is a collective operation.
integer function ga_cholesky(uplo, g_a) character uplo [input] ! lower/upper triangle integer g_a [input/output] ! matrix
Computes Cholesky factorization for an SPD matrix. The result overwrites the original matrix. The input flag uplo ('L' or 'U') determines if lower or upper triangular matrix is created.
It returns
= 0 : successful exit > 0 : the leading minor of this order is not positive definite and the factorization could not be completed
This is a collective operation.
integer function ga_spd_invert(g_a) integer g_a [input/output] ! matrix
It computes the inverse of a double precision using the Cholesky factorization of a NxN double precision symmetric positive definite matrix A stored in the global array represented by g_a. On successful exit, A will contain the inverse.
It returns
= 0 : successful exit > 0 : the leading minor of this order is not positive definite and the factorization could not be completed < 0 : it returns the index i of the (i,i) element of the factor L/U that is zero and the inverse could not be computed
This is a collective operation.
integer function ga_llt_solve(g_a, g_b) integer g_a [input] ! coefficient matrix integer g_b [output/output] ! rhs/solution matrix
Solves a system of linear equations
A * X = B
using the Cholesky factorization of an NxN double precision symmetric positive definite matrix A (epresented by handle g_a). On successful exit B will contain the solution X.
It returns:
= 0 : successful exit > 0 : the leading minor of this order is not positive definite and the factorization could not be completed
This is a collective operation.
integer function ga_solve(g_a, g_b) integer g_a [input] ! coefficient matrix integer g_b [output/output] ! rhs/solution matrix
Solves a system of linear equations
A * X = B
It first will call the Cholesky factorization routine and, if sucessfully, will solve the system with the Cholesky solver. If Cholesky will be not be able to factorize A, then it will call the LU factorization routine and will solve the system with forward/backward substitution. On exit B will contain the solution X.
It returns
= 0 : Cholesky factoriztion was succesful > 0 : the leading minor of this order is not positive definite, Cholesky factorization could not be completed and LU factoriztion was used
This is a collective operation.
subroutine ga_dgemm(transa, transb, m, n, k, alpha, g_a, g_b, beta, g_c ) Character*1 transa, transb [input] Integer m, n, k [input] Double Precision alpha, beta [input] Integer g_a, g_b, [input] Integer g_c [output]
Performs one of the matrix-matrix operations:
C := alpha*op( A )*op( B ) + beta*C,
where op( X ) is one of
op( X ) = X or op( X ) = X',
alpha and beta are scalars, and A, B and C are matrices, with op( A ) an m by k matrix, op( B ) a k by n matrix and C an m by n matrix.
On entry, transa specifies the form of op( A ) to be used in the matrix multiplication as follows:
transa = 'N' or 'n', op( A ) = A. transa = 'T' or 't', op( A ) = A'. m - On entry, m specifies the number of rows of the matrix op( A ) and of the matrix C. m must be at least zero. n - On entry, n specifies the number of columns of the matrix op( B ) and the number of columns of the matrix C. n must be at least zero. k - On entry, k specifies the number of columns of the matrix op( A ) and the number of rows of the matrix op( B ). K must be at least zero.
This is a collective operation.
subroutine ga_copy(g_a, g_b) integer g_a, g_b [input]
Copies elements in array represented by g_a into the array represented by g_b. The arrays must be the same type, shape, and identically aligned.
This is a collective operation.
subroutine ga_copy_patch(trans, g_a, ailo, aihi, ajlo, ajhi, g_b, bilo, bihi, bjlo, bjhi) character trans [input] transpose operator integer g_a, g_b [input] integer ailo, aihi, ajlo, ajhi [input] g_a patch coordinates integer bilo, bihi, bjlo, bjhi [input] g_b patch coordinates
Copies elements in a patch of one array into another one. The patches of arrays must have the same number of elements.
This is a collective operation.
subroutine ga_get(g_a, ilo, ihi, jlo, jhi, buf, ld) integer g_a [input] integer ilo, ihi, jlo, jhi [input] double precision/complex/integer buf [output] integer ld [input]
Performs the equivalent of the following Fortran-90.
dimension A(1:dim1,1:dim2) dimension buf(1:ld, 1:*) buf(1:ihi-ilo+1, 1:jhi-jlo+1) = A(ilo:ihi, jlo:jhi)
where g_a represents a handle to the array A.
Array bounds are always checked.
subroutine ga_put(g_a, ilo, ihi, jlo, jhi, buf, ld) integer g_a [input] integer ilo, ihi, jlo, jhi [input] double precision/complex/integer buf [output] integer ld [input]
Performs the equivalent of the following Fortran-90:
dimension g_a(1:dim1,1:dim2) dimension buf(1:ld, 1:*) g_a(ilo:ihi, jlo:jhi) = buf(1:ihi-ilo+1, 1:jhi-jlo+1)
where g_a represents a handle to the array A.
Array bounds are always checked.
subroutine ga_acc(g_a, ilo, ihi, jlo, jhi, buf, ld, alpha) integer g_a [input] integer ilo, ihi, jlo, jhi [input] double precision/complex buf [input] integer ld [input] double precision/complex alpha [input]
Performs the equivalent of the following Fortran-90. Array bounds are always checked. Double precision/complex types are supported. The operation is atomic.
dimension g_a(1:dim1,1:dim2) dimension buf(1:ld, 1:*) A(ilo:ihi, jlo:jhi) = A(ilo:ihi, jlo:jhi) + alpha*buf(1:ihi-ilo+1, 1:jhi-jlo+1) where g_a represents a handle to the array A.
subroutine ga_distribution(g_a, iproc, ilo, ihi, jlo, jhi) integer g_a, iproc [input] integer ilo, ihi, jlo, jhi [output] g_a - array handle [input] iproc - process number [input] ilo/ihi/jlo/jhi - range held by process iproc [output]
If no array elements are owned by process iproc, the range is returned as [0, -1] for the i and [0,-1] for the j dimensions.
logical function ga_compare_distr(g_a, g_b) integer g_a, g_b [input]
Compares distributions of two global arrays. Returns .TRUE. if distributions are identical.
This is a collective operation.
subroutine ga_access(g_a, ilo, ihi, jlo, jhi, index, ld) integer g_a [input] integer ilo, ihi, jlo, jhi [input] integer index [output] integer ld [output]
Provides access to the specified patch of array. Returns leading dimension ld and and MA-like index for the data. This routine is intended for writing new GA operations. Call to ga_access should normally follow a call to ga_distribution that returns coordinates of the patch associated with a processor. You need to make sure that the coordinates of the patch are valid (test values returned from ga_distribution).
Your code should include a MA include file, mafdecls.h. The addressing convention:
dbl_mb(index) - for double precision data int_mb(index) - for integer data
dcpl_mb(index) - for double complex data
provides access to the first element (ilo,jlo) of the patch.
For a given subroutine:
subroutine foo(A, nrows, ncols lda) double precision A(lda,*) integer nrows, ncols .... end
you can reference A(ilo:ihi,jlo:jhi) in the following way:
call foo(dbl_mb(index), ihi-ilo+1, jhi-jlo+1, lda)
NOTE: You have to worry about mutual exclusion in simultaneous overlapping access to the data by multiple processors (critical sections are not protected).
Each call to ga_access has to be followed by a call to either ga_release or ga_release_update. You can access only local data.
subroutine ga_release(g_a, ilo, ihi, jlo, jhi) integer g_a [input] integer ilo, ihi, jlo, jhi [input]
Releases access to a global array when the data was read only.
Your code should look like:
call ga_distribution(g_a, myproc, ilo,ihi,jlo,jhi) call ga_access(g_a, ilo, ihi, jlo, jhi, index, ld) < operate on the data > call ga_release(g_a, ilo, ihi, jlo, jhi)
NOTE: see restrictions specified for ga_access
subroutine ga_release_update(g_a, ilo, ihi, jlo, jhi) integer g_a [input] integer ilo, ihi, jlo, jhi [input]
Releases access to the data. It must be used if the data was accessed for writing. NOTE: see restrictions specified for ga_access.
integer function ga_read_inc(g_a, i, j, inc) integer g_a [input] integer i, j, inc [input]
Atomically read and increment an element in an integer array.
*BEGIN CRITICAL SECTION* return value = a(i,j) a(i,j) += inc *END CRITICAL SECTION*
subroutine ga_scatter(g_a, v, i, j, n) integer g_a [input] double precision v(n) [input] integer i(n), j(n), n [input]
Scatters the array elements into the array. The contents of the input arrays (v, i, j) are preserved, but their contents might be (consistently) shuffled on return.
do k = 1, nv a(i(k),j(k)) = v(k) enddo
subroutine ga_gather(g_a, v, i, j, n) integer g_a [input] double precision v(n) [output] integer i(n), j(n), n [input]
Gathers specified elements (i,j) from the global array into a local single-dimesional array. The contents of the arrays (v, i, j) might be (consistently) shuffled on return.
do k = 1, nv v(k) = a(i(k),j(k)) enddo
subroutine ga_error(message, code) character*1 message(*) [input] integer code [input]
To be called in case of an error. Print an error message and an integer value that is intended to be the error code. Releases some system resources. This is the recomended way of aborting the program execution.
logical function ga_locate(g_a, i, j, owner) integer g_a, i, j [input] integer owner [output]
Return in owner the GA compute process id that 'owns' the data. If i/j are out of bounds .FALSE. is returned, otherwise .TRUE..
logical function ga_locate_region(g_a, ilo, ihi, jlo, jhi, map, np ) integer g_a, ilo, ihi, jlo, jhi [input] integer map(5,*), np [output]
Parts of the specified patch might be actually 'owned' by several (precisely np) processes. Return the list of the GA processes id that 'own' the data. If i/j are out of bounds .FALSE. is returned, otherwise .TRUE..
map(1,*) - ilo map(2,*) - ihi map(3,*) - jlo map(4,*) - jhi
specify coordinates of a subpatch 'held' by the process which number is stored in map(5,*)
subroutine ga_inquire(g_a, type, dim1, dim2) integer g_a [input] integer type [output] integer dim1 [output] integer dim2 [output]
Returns type and dimensions of array.
integer function ga_inquire_memory()
Returns amount of memory (in bytes) used in the allocated global arrays on the calling processor.
subroutine ga_inquire_name(g_a, array_name) integer g_a [input] character*(*) array_name [output]
Returns the name of an array represented by the handle g_a.
integer function ga_memory_avail()
Returns amount of memory (in bytes) left for allocation of new global arrays on the calling processor.
Note: If ga_uses_ma returns true, then ga_memory_avail returns the lesser of the amount available under the GA limit and the amount available from MA (according to ma_inquire_avail operation). If no GA limit has been set, it returns what MA says is available.
If ( ! ga_uses_ma() && ! ga_memory_limited() ) returns < 0, indicating that the bound on currently available memory cannot be determined.
logical function ga_uses_ma()
Returns .true. if memory in global arrays comes from the Memory Allocator (MA). .false. means that memory comes from another source, for example System V shared memory is used.
logical function ga_memory_limited()
Indicates if limit is set on memory usage in Global Arrays on the calling processor.
subroutine ga_proc_topology(g_a, proc, prow, pcol) integer g_a [input] integer proc [input] integer prow, pcol [output]
Based on the distribution of the array associated with handle g_a, determines block row and column coordinates for the array section 'owned' by specified processor. The numbering starts from 0. The value of -1 means that the processor doesn't 'own' any section of array represented by g_a.
subroutine ga_print_patch(g_a,ilo,ihi,jlo,jhi,pretty) integer g_a [input] integer ilo,ihi,jlo,jhi [input] coordinates of the patch integer pretty [input]
Prints a patch of g_a array to the standard output. If pretty has the value 0 then output is printed in a dense fashion. If pretty has the value 1 then output is formatted and rows/columns labeled.
This is a collective operation.
subroutine ga_print(g_a) integer g_a [input]
Prints an entire array to the standard output.
This is a collective operation.
subroutine ga_print_stats()
This non-collective (MIMD) operation prints information about:
subroutine ga_check_handle(g_a, string) integer g_a [input] character *(*) string [input]
Check that the global array handle g_a is valid ... if not call ga_error with the string provided and some more info.
subroutine ga_init_fence()
Initializes tracing of completion status of data movement operations.
subroutine ga_fence()
Blocks the calling process until all the data transfers corresponding to GA operations called after ga_init_fence complete. For example, since ga_put might return before the data reaches the final destination ga_init_fence and ga_fence allow process to wait until the data is actually moved:
call ga_init_fence() call ga_put(g_a, ...) call ga_fence()
ga_fence must be called after ga_init_fence. A barrier, ga_sync, assures completion of all data transfers and implicitly cancels outstanding ga_init_fence. ga_init_fence and ga_fence must be used in pairs, multiple calls to ga_fence require the same number of corresponding ga_init_fence calls. ga_init_fence/ga_fence pairs can be nested.
ga_fence works for multiple GA operations. For example:
call ga_init_fence() call ga_put(g_a, ...) call ga_scatter(g_a, ...) call ga_put(g_b, ...) call ga_fence()
The calling process will be blocked until data movements initiated by two calls to ga_put and one ga_scatter complete.
logical function ga_create_mutexes(number)
integer number [input]
Creates a set containing the number of mutexes. Returns .true. if the opereation succeeded or .false. when failed. Mutex is simple synchronization object used to protect Critical Section. Only one set of mutexes can exist at a time. Mutexes can be created and destroyed as many times as needed.
Mutexes are numbered: 0, ..., number -1.
This is a collective operation.
logical function ga_destroy_mutexes()
Destroys the set of mutexes created with ga_create_mutexes. Returns .true. if the operation succeeded or .false. when failed.
This is a collective operation.
subroutine ga_lock(mutex)
integer mutex [input] ! mutex id
Locks a mutex object identified by the mutex number. It is a fatal error for a process to attempt to lock a mutex which was already locked by this process.
subroutine ga_unlock(mutex)
integer mutex [input] ! mutex id
Unlocks a mutex object identified by the mutex number. It is a fatal error for a process to attempt to unlock a mutex which has not been locked by this process.
integer function ga_nodeid()
Returns the GA process id (0, ..., ga_nnodes()-1) of the requesting compute process.
NOTE: the GA process id might not be the same as message-passing node id.
integer function ga_nnodes()
Returns the number of the GA compute (user) processes.
NOTE: the number of the GA processes might not be the same as the number of the message-passing nodes.
subroutine ga_brdcst(type, buf, lenbuf, root) integer type [input] ! message type for broadcast byte buf(lenbuf) [input/output] integer lenbuf [input] integer root [input]
Broadcast from process root to all other processes a message of length lenbuf.
This is a convenience operation available regardless of the message-passing library that GA is running with.
This is a collective operation.
subroutine ga_dgop(type, x, n, op) integer type [input] double precision x(n) [input/output] character*(*) op [input]
Double Global OPeration.
X(1:N) is a vector present on each process. DGOP 'sums' elements of X accross all nodes using the commutative operator OP. The result is broadcast to all nodes. Supported operations include '+', '*', 'max', 'min', 'absmax', 'absmin'. The use of lowerecase for operators is necessary.
This is a convenience operation, available regardless of the message-passing library that GA is running with.
This is a collective operation.
subroutine ga_igop(type, x, n, op) integer type [input] integer x(n) [input/output] character*(*) OP [input]
Integer Global OPeration. The integer version of ga_dgop described above, also include the bitwise OR operation.
This is a convenience operation available regardless of the message-passing library that GA is running with.
This is a collective operation.
subroutine ga_list_nodeid(list, n) integer n [input] integer list(n) [output]
Returns message-passing process ID (rank in MPI) for the GA processes in the range 0 .. n-1. For MPI, the ranks are in MPI_COMM_WORLD.
void ga_mpi_communicator(GA_COMM) MPI_Comm *GA_COMM; [output]
Available in C only if MPI is used. Returns communicator handle for GA processes (group).