In the precomputation stage, six arrays must be computed: the position
vectors, the basis functions, and the divergences of the basis functions at
each integration point, each on the source patches and on the field patches.
These are three- or four-dimensional arrays, and one dimension of each has
the size of the maximum number of patches. For the problems we are
interested in, the maximum number of patches ranges from one thousand to
several thousand. Every node needs each array for the filling algorithm. We
could simply let each node compute all the arrays independently, but that
repeats the same work on every node and is computationally expensive, so
the arrays are instead computed in parallel.
Let and
denote the position
arrays of the source patch and field patch, respectively. Also
and
respectively denote the basis
function arrays
on the source patches and the field patches, and
and
denote the divergences of the basis function on the source
patches and the field patches, respectively where
,
or 7,
, and
, the number of the patches in the model.
To compute these arrays,
we first divide the computational work almost evenly among the nodes. Each
node works on its portion, and when it has completed its
work it broadcasts the result to the rest of the nodes.
The arrays are partitioned along the largest dimension.
Let , and
, then the nodes whose
index values are less than q compute
components of the arrays, and the rest of the
nodes compute
components. That is, node 0 computes
, the node 1 computes
, node
computes
, node
computes
, and node p
computes
.
The details of the algorithm are shown in Figure 4.5.
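The partition described above can be sketched as follows. This is only a minimal illustration under stated assumptions: node indices start at zero, q is taken to be the remainder of the number of patches divided by the number of nodes, nodes with index less than q receive one extra component, and each node is given a contiguous index range. The names `block_partition`, `num_patches`, and `num_nodes` are hypothetical and do not come from the original code.

```python
def block_partition(num_patches, num_nodes):
    """Split num_patches indices into num_nodes nearly equal contiguous blocks.

    Returns a list of (start, count) pairs, one per node.
    Assumption: q = num_patches mod num_nodes, and nodes with index < q
    take one extra component.
    """
    base, q = divmod(num_patches, num_nodes)
    ranges = []
    start = 0
    for node in range(num_nodes):
        # The first q nodes take base + 1 components, the rest take base.
        count = base + 1 if node < q else base
        ranges.append((start, count))
        start += count
    return ranges


if __name__ == "__main__":
    # Example: 10 patches distributed over 4 nodes.
    for node, (start, count) in enumerate(block_partition(10, 4)):
        print(f"node {node}: start index {start}, {count} components")
```

With this split, the block sizes of any two nodes differ by at most one component, which is what dividing the work almost evenly amounts to.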
The CMMD library provides a global broadcast function. In a broadcast, a message is sent from a single source node to all nodes, and all nodes must take part: one node signals its intention to send the broadcast message, and all other nodes signal their readiness to receive it. At the hardware level, once a broadcast signal is raised, the system begins checking the responses. When all the receiving nodes have received the message, the system considers the broadcast complete and the broadcast call returns. Keep in mind that all nodes receive the data simultaneously and all receive the same amount of data in every broadcast, so every node must have sufficient buffer space to hold the broadcast message.
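The compute-then-broadcast pattern can be imitated in a single process to show why the receive buffer must be sized for the largest block. The sketch below does not use the CMMD API. It simulates the nodes with ordinary Python lists, and the names `assemble_by_broadcast` and `compute`, the zero-based node indices, and the remainder-style split are all assumptions made for illustration.

```python
def block_partition(num_patches, num_nodes):
    """Return (start, count) per node; the first q nodes get one extra item."""
    base, q = divmod(num_patches, num_nodes)
    ranges, start = [], 0
    for node in range(num_nodes):
        count = base + 1 if node < q else base
        ranges.append((start, count))
        start += count
    return ranges


def assemble_by_broadcast(num_patches, num_nodes, compute):
    """Simulate each node computing its block, then broadcasting it to all."""
    ranges = block_partition(num_patches, num_nodes)
    # Each simulated node holds its own full-size copy of the array.
    local = [[None] * num_patches for _ in range(num_nodes)]

    # Step 1: every node fills only its own block.
    for node, (start, count) in enumerate(ranges):
        for i in range(start, start + count):
            local[node][i] = compute(i)

    # Every broadcast moves the same amount of data, so the buffer is
    # sized for the largest block and padded for the smaller ones.
    buffer_size = max(count for _, count in ranges)

    # Step 2: each node in turn acts as the broadcast source.
    for source, (start, count) in enumerate(ranges):
        buffer = [None] * buffer_size
        buffer[:count] = local[source][start:start + count]
        for node in range(num_nodes):
            if node != source:
                # Receivers copy only the valid part of the buffer.
                local[node][start:start + count] = buffer[:count]
    return local


if __name__ == "__main__":
    copies = assemble_by_broadcast(10, 4, lambda i: i * i)
    print(copies[0])
```

After the last broadcast, every simulated node holds the complete array, although each one computed only its own block.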