When discussing this topic, there are two issues that need to be addressed:
In a nutshell, what we end up with is, more or less, a two-level parallelism. The first is a highly concurrent level spanning the entire domain, or all available nodes; the second is a distributed/parallel level of computation running within each locality to handle an independent part of the overall problem.
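To make that layout concrete, here is a minimal sketch of the two-level structure, assuming only Python's standard library as a stand-in for a real cluster: the process pool plays the role of the nodes, and each "node" hands its independent slice of the overall problem to its own local workers.

from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def local_work(x):
    # Innermost unit of computation within one locality.
    return x * x

def node_task(chunk):
    # Second level: work inside a single node/locality; the threads here
    # merely stand in for whatever per-node parallelism is available.
    with ThreadPoolExecutor(max_workers=4) as workers:
        return sum(workers.map(local_work, chunk))

if __name__ == "__main__":
    problem = list(range(1000))
    # One independent part of the overall problem per node.
    chunks = [problem[i::4] for i in range(4)]
    # First level: distribute the independent parts across the nodes.
    with ProcessPoolExecutor(max_workers=4) as nodes:
        partials = list(nodes.map(node_task, chunks))
    print(sum(partials))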
As far as I know, this has not been done yet.
Paradigms such as neural networks have proven to be good tools for tackling learning and recognition problems. Certainly, this can be done in parallel; but how?
Well, there is an inherent parallelism in those networks, making their mapping onto either a cluster of processors or a multiprocessor system quite natural. For example, a feedforward net can be trained with backprop to recognize a class of objects or images. Object recognition and function approximation are perhaps the two most important tasks that this class of networks is good at. Training can easily be done in parallel, and there has been a great deal of work of this type in the past, especially on "special-purpose" SIMD-type architectures. Later, the same process was carried out on MIMD hardware, such as the late CM-5, and on networks of workstations.
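As a rough illustration of what data-parallel backprop looks like, here is a minimal NumPy sketch, assuming a tiny one-hidden-layer net and four data shards; the shards are processed sequentially below, but each gradient computation is independent and is exactly the work one would hand to a separate processor or workstation.

import numpy as np

rng = np.random.default_rng(0)

def init_net(n_in, n_hidden, n_out):
    # Small one-hidden-layer feedforward net.
    return {
        "W1": rng.standard_normal((n_in, n_hidden)) * 0.1,
        "W2": rng.standard_normal((n_hidden, n_out)) * 0.1,
    }

def grads(net, X, Y):
    # Forward pass: tanh hidden layer, linear output.
    H = np.tanh(X @ net["W1"])
    err = H @ net["W2"] - Y
    # Backward pass for a mean-squared-error loss.
    dW2 = H.T @ err / len(X)
    dW1 = X.T @ ((err @ net["W2"].T) * (1 - H ** 2)) / len(X)
    return {"W1": dW1, "W2": dW2}

def parallel_step(net, shards, lr=0.1):
    # Each shard's gradient could be computed on its own node; averaging
    # the per-shard gradients gives the same update as a full-batch step.
    g_list = [grads(net, X, Y) for X, Y in shards]
    for key in net:
        net[key] -= lr * np.mean([g[key] for g in g_list], axis=0)

# Toy function-approximation problem: learn y = sin(x) from samples
# split across four "nodes".
X = rng.uniform(-np.pi, np.pi, (400, 1))
Y = np.sin(X)
shards = list(zip(np.array_split(X, 4), np.array_split(Y, 4)))

net = init_net(1, 16, 1)
for _ in range(2000):
    parallel_step(net, shards)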
Not only feedforward nets have been used, but also others, such as feedback or Hopfield-style nets. Parallelism can play quite an important role here. Well, for one thing, we can vary how the networks are mapped onto the hardware. So, instead of using one large network and mapping it onto one large machine, we can have multiple nets distributed over a cluster of SMPs and then train them on whatever training set we have. Throughout the training process, the cluster of nets can communicate with each other about training patterns, or perhaps about how to optimize their training parameters to speed up the learning cycle. Another option is that one need not train all of them on the same set of patterns; instead, the patterns can be clustered into groups based on some common characteristics, and each member of the herd can then be trained concurrently on one or more clusters of samples. Hints and other useful aspects can be communicated between the networks during training. I believe this is the most useful and sensible way of training a colony of nets to handle difficult jobs such as function approximation or pattern recognition.
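Here is a minimal sketch of that last idea, assuming NumPy, a toy grouping of the patterns by sign of the input, and one very simple kind of "hint" (each member reports its training error so the colony can settle on the best learning rate); the particular grouping and hint are my own choices for illustration, only the overall scheme follows the description above.

import numpy as np

rng = np.random.default_rng(1)

def train_one(X, Y, lr, steps=500, n_hidden=16):
    # One member of the colony: a single-hidden-layer net trained
    # with plain gradient descent on its own cluster of patterns.
    W1 = rng.standard_normal((X.shape[1], n_hidden)) * 0.1
    W2 = rng.standard_normal((n_hidden, Y.shape[1])) * 0.1
    for _ in range(steps):
        H = np.tanh(X @ W1)
        err = H @ W2 - Y
        gW2 = H.T @ err / len(X)
        gW1 = X.T @ ((err @ W2.T) * (1 - H ** 2)) / len(X)
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2, float(np.mean(err ** 2))

# Toy data, clustered into two groups by a common characteristic
# (here simply the sign of the input).
X = rng.uniform(-np.pi, np.pi, (400, 1))
Y = np.sin(X)
groups = [X[:, 0] < 0, X[:, 0] >= 0]

# Each member trains on its own cluster with its own learning rate;
# these runs are independent and could execute on separate SMPs.
lrs = [0.05, 0.2]
results = [train_one(X[g], Y[g], lr) for g, lr in zip(groups, lrs)]

# A simple "hint" exchanged between the members: which learning rate
# produced the lowest training error so far.
best_lr = lrs[int(np.argmin([mse for _, _, mse in results]))]
print("best learning rate so far:", best_lr)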