% ACL presentation for PUMA final review---outline.

Task 5.1.1 Array Communication Library
--------------------------------------

Puma architecture provides efficient point-to-point comms.  In
data-parallel applications many more elaborate comm patterns recur
frequently.

Can be implemented in terms of message-passing, but better regarded as
more global operations: rearrangements of distributed data (eg,
permutation) or simple arithmetic transformations (eg, prefix) on
distributed data.

``Array'' in ACL refers to *process* array: library operates within
arbitrary-dimensional arrays of processes.

Level is somewhere between low-level communication libraries and
libraries providing specialised support for particular applications, or
specific distribution strategies for data arrays.

-----*-----

Process arrays
--------------

A process array is a logically multi-dimensional process configuration.

Each direction has a structure called an ``index'' associated with it,
eg, $x$, $y$ and $z$.

% figure `array' goes here.

-----*-----

Interface
---------

Index structures are passed to communication routines in much the same
way as channels in occam.  For example

  PERM.SHIFT (x, result, data, 3)

shifts the values in `data' 3 sites in the `x' direction, and returns
the shifted values in `result'.

The index structure also contains the coordinate value for the local
process associated with the index direction:

  my_x = INDEX.coord (x)

-----*-----

Implementations
---------------

In the occam/VCR implementation of ACL any two sites connected by an
index have a virtual channel between them.  The data structure
embodying an index contains vectors of these channels.

The channels are hidden from the user of the communication library.
Thus the same library interface can be provided in
non-channel-oriented communication environments.

-----*-----

Selected entry points
---------------------

PERM.DS -- arbitrary permutation.
PERM.SHIFT -- a particular permutation, see above
RELATE.DS -- a generalised permutation
BROADCAST -- and variants
PREFIX.REAL32.SUM -- and several other prefixes
GLOB.REAL32.SUM -- and several other reductions.

-----*-----

IO
--

As well as operations for communication *within* a particular process
array, the library has a sophisticated set of routines for sequential
IO.

These allow a process array to talk to an external process (eg, a
master or host process, or other process arrays, perhaps an array of
graphics processors, etc).

For these purposes a second type of communication structure is
introduced: a logical ``distributed channel''.

-----*-----

Configuration
-------------

The communication library was documented and put into a releasable form
earlier this year.  Since then, work has concentrated on dynamic
configuration procedures for process arrays.

With static process configuration the channel connectivity of the
networks is exposed to the user in the configuration file.  This is a
failure of abstraction: the underlying virtual channels are supposed to
be hidden (transparent).

By configuring ``procedurally'', using the dynamic features of VCR,
there is the possibility of hiding channels *completely*.  Also, this
run-time configuration is more flexible.

-----*-----

Dynamic configuration
---------------------

The implemented ACL primitives for dynamic configuration use the remote
procedure call, dynamic channel creation, and channel-moving operations
provided in VCR.

One of the most important operations for dynamic configuration is
``channel-copying''.  Given a virtual channel connecting two processes,
the channel-copying operation creates a new, independent, virtual
channel connecting the same processes.

-----*-----

Recursive spawning of an ``all-to-all'' network
-----------------------------------------------

% figure `life' goes here.
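
-----*-----

Sketch: shift, prefix and reduction semantics
---------------------------------------------

A possible backup illustration of what three of the selected entry
points compute along one index direction.  This is a sequential Python
simulation, not the occam/VCR library: a list stands in for a 1-D
process array, with one value per site.  The function names
`perm_shift', `prefix_sum' and `glob_sum' are hypothetical.  The sketch
assumes the shift is cyclic (PERM.SHIFT is described as a permutation,
so values presumably wrap around), that the prefix is inclusive, and
that every site receives the reduction result; none of these details
is stated in the outline.

```python
from itertools import accumulate


def perm_shift(values, distance):
    """Simulate a shift along one index direction.

    Assumption: the shift is cyclic, so the value held at site i
    ends up at site (i + distance) mod n.
    """
    n = len(values)
    result = [None] * n
    for site, value in enumerate(values):
        result[(site + distance) % n] = value
    return result


def prefix_sum(values):
    """Simulate a sum prefix: site i receives values[0] + ... + values[i].

    Assumption: the prefix is inclusive of the local value.
    """
    return list(accumulate(values))


def glob_sum(values):
    """Simulate a sum reduction.

    Assumption: every site receives the same global total.
    """
    total = sum(values)
    return [total] * len(values)


if __name__ == "__main__":
    data = [10, 20, 30, 40, 50]   # one value per site along `x'
    print(perm_shift(data, 3))    # cf. PERM.SHIFT (x, result, data, 3)
    print(prefix_sum(data))       # cf. PREFIX.REAL32.SUM
    print(glob_sum(data))         # cf. GLOB.REAL32.SUM
```

In the real library each site would hold only its own element and the
index structure would carry the communication; the lists here only make
the global effect of each call visible in one place.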