5: Java and Fortran Issues for a Parallel CFD
- CFD in Fortran: The CFD simulation was written in HPF. It uses a
4-stage Runge-Kutta time-stepping algorithm and a finite-volume
central-difference technique to compute the solution. In addition, it
uses a numerical dissipation model to damp spurious oscillations and
prevent the solution from blowing up in the presence of shock waves.
(A sketch of the time-stepping scheme follows this item.)
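
As a rough illustration only, one common 4-stage Runge-Kutta scheme of
this type can be sketched in Java as below. The names are hypothetical
and the text does not give the exact coefficients; the real solver
works on the full multi-dimensional conserved-variable arrays.

    // Minimal sketch of a 4-stage Runge-Kutta time step (hypothetical
    // names and coefficients). residual() stands in for the
    // finite-volume central-difference right-hand side plus the
    // numerical dissipation terms.
    class Rk4Sketch {
        static final double[] ALPHA = {0.25, 1.0 / 3.0, 0.5, 1.0};

        // Stand-in residual: a simple central difference on a 1-D field.
        static double[] residual(double[] u) {
            double[] r = new double[u.length];
            for (int i = 1; i < u.length - 1; i++)
                r[i] = 0.5 * (u[i + 1] - u[i - 1]);
            return r;
        }

        // u^(k) = u^n - alpha_k * dt * R(u^(k-1)),  k = 1..4
        static void rk4Step(double[] u, double dt) {
            double[] u0 = u.clone();
            for (double alpha : ALPHA) {
                double[] r = residual(u);
                for (int i = 0; i < u.length; i++)
                    u[i] = u0[i] - alpha * dt * r[i];
            }
        }
    }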
- CFD in Fortran: For parallelism, the code uses many FORALL
statements. This construct asserts that the iterations carry no
dependencies, so the compiler can parallelize them safely. Also,
arrays were distributed along their longest dimension, which
corresponds to the lengthwise direction of the simulated geometry
(i.e. the nozzle). This kind of distribution allows any number of
processors to be used. Directives such as DISTRIBUTE and ALIGN can be
used for this sort of mapping. (A rough Java analogue of such a
dependency-free loop follows this item.)
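
HPF's FORALL has no direct Java counterpart, but the property it
asserts, namely that iterations are independent, can be sketched with
Java parallel streams (hypothetical names):

    import java.util.stream.IntStream;

    class ForallSketch {
        // Each iteration writes a distinct element of uNew and reads
        // only uOld, so there are no loop-carried dependencies and the
        // runtime is free to execute the iterations in parallel.
        static void update(double[] uOld, double[] uNew) {
            IntStream.range(1, uOld.length - 1).parallel()
                .forEach(i -> uNew[i] = 0.5 * (uOld[i - 1] + uOld[i + 1]));
        }
    }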
- CFD in Fortran: Communication is required especially in the
artificial dissipation routine, where second- and fourth-order
derivatives are computed. This communication and any load imbalance in
the computation are the two main obstacles to perfect scaling. (The
stencils behind this requirement are sketched after this item.)
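
The reason communication concentrates here: a fourth-order difference
at point i reads values two points to each side, so a block-distributed
array needs a two-cell halo from each neighboring processor before the
loop can run. A sketch with hypothetical names, in 1-D for brevity:

    // Second- and fourth-order difference stencils used by the
    // artificial dissipation (hypothetical 1-D version). The 4th-order
    // stencil reaches two cells to each side, hence the halo exchange.
    class DissipationSketch {
        static void differences(double[] u, double[] d2, double[] d4) {
            for (int i = 2; i < u.length - 2; i++) {
                d2[i] = u[i - 1] - 2 * u[i] + u[i + 1];
                d4[i] = u[i - 2] - 4 * u[i - 1] + 6 * u[i]
                        - 4 * u[i + 1] + u[i + 2];
            }
        }
    }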
- CFD in Fortran: Coding was simple, but optimizing based on
intermediate output was not easy. Debugging HPF code is not easy
either.
- CFD in Java: Uses the same numerical algorithm as the HPF
implementation. Coding was not a problem at all, provided the user is
familiar with the relevant aspects of the language. The
object-oriented paradigm fits well with this kind of engineering
problem, and the design lends itself to a degree of self-organization
(see the sketch after this item). Visualization can help not only in
understanding various properties of the algorithm, but also in
modifying and tuning classes, methods, and so on. Therefore, with some
help from the AWT library, the code can be optimized to do even
better.
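
Purely as an illustration of the object-oriented fit (the text does
not give the actual class layout), one natural decomposition might
look like this; all names here are hypothetical:

    // Hypothetical decomposition; names are illustrative only.
    class Grid { double[][] x, y; }                     // nozzle geometry
    class Flow { double[][] rho, rhoU, rhoV, energy; }  // conserved variables
    class Solver {
        Grid grid;
        Flow flow;
        void step(double dt) { /* one Runge-Kutta time step */ }
        void dissipation()   { /* 2nd/4th-order difference terms */ }
    }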
- CFD in Java: Parallelism was achieved by adding message passing
(MPI) to the sequential version. The big advantage of this approach is
that very few modifications to the original sequential version were
needed. In all fairness, we could probably say the same of HPF. (A
minimal skeleton follows this item.)
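
A minimal skeleton, assuming an mpiJava-style binding (class and
method names follow that binding and may differ in other Java MPI
libraries):

    import mpi.*;

    public class CfdMain {
        public static void main(String[] args) throws Exception {
            MPI.Init(args);                    // start message passing
            int rank = MPI.COMM_WORLD.Rank();  // this process's id
            int size = MPI.COMM_WORLD.Size();  // number of processes

            // ... each rank runs the (mostly unchanged) sequential
            // solver on its block of the grid; only the dissipation
            // methods exchange messages ...

            MPI.Finalize();
        }
    }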
- CFD in Java: Message passing was added to the methods computing
artificial dissipation, where the second- and fourth-order derivatives
are computed. Those methods are used continuously throughout the
simulation; because of that, I judged them to be the only methods in
the code that could benefit from parallelism. We confined our message
passing to simple send/receive and broadcast operations (sketched
below).
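
For instance, the halo exchange before each dissipation evaluation can
be written with plain send/receive calls; a global quantity such as
the time step would use MPI.COMM_WORLD.Bcast in the same style. A
sketch, again assuming an mpiJava-style API and a hypothetical local
array with two ghost cells at each end:

    import mpi.*;

    // Two-cell halo exchange along the distributed dimension.
    class HaloSketch {
        static void exchange(double[] u, int rank, int size)
                throws Exception {
            int n = u.length, tag = 0;
            if (rank > 0) {                    // talk to left neighbor
                MPI.COMM_WORLD.Send(u, 2, 2, MPI.DOUBLE, rank - 1, tag);
                MPI.COMM_WORLD.Recv(u, 0, 2, MPI.DOUBLE, rank - 1, tag);
            }
            if (rank < size - 1) {             // talk to right neighbor
                MPI.COMM_WORLD.Recv(u, n - 2, 2, MPI.DOUBLE, rank + 1, tag);
                MPI.COMM_WORLD.Send(u, n - 4, 2, MPI.DOUBLE, rank + 1, tag);
            }
        }
    }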
- CFD in Java: The parallel version benefited greatly from the
efficient and well-designed sequential version, in which inlining and
other tune-ups had already been done. The result was a robust parallel
version.
- CFD in Java: The inlining helped a great deal in speeding up
execution without any degradation in the quality of the solution.
Depending on the network and the compute nodes, we noticed that the
JavaMPI version ran close to, and sometimes even faster than, the HPF
version using the same number of processors. I noticed that it
definitely ran much faster on a dedicated network of nodes, but the
architecture of those nodes is different from that of the nodes used
for the HPF tests.
- CFD in Java: There is still room for improvement. In addition to
inlining, packaging the classes in a JAR file can greatly improve
download speed. Another option is thread pooling: creating a ready
supply of sleeping threads at the beginning of execution. Because the
thread startup process is expensive in terms of system resources,
thread pooling makes startup a little slower but improves runtime
performance, since sleeping (or suspended) threads are awakened only
when they are needed to perform new tasks. (A small sketch follows
this item.)
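
A minimal sketch of the idea using the java.util.concurrent API, which
post-dates this work; at the time the pool would have been managed by
hand, but the principle is the same:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    class PoolSketch {
        public static void main(String[] args)
                throws InterruptedException {
            // Pay the thread-creation cost once, up front.
            ExecutorService pool = Executors.newFixedThreadPool(4);

            // Idle workers wake only when a task arrives.
            for (int step = 0; step < 8; step++) {
                final int s = step;
                pool.submit(() -> System.out.println("task for step " + s));
            }

            pool.shutdown();                   // accept no new tasks
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }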
- CFD in Java: Method inlining, which I have already done, increases
performance by reducing the number of method calls the program makes
(see the small before/after example below). Other things that could be
done are to try streamlined synchronization, JIT compilation, and
other performance-enhancing approaches. These are all quite useful
tools to try, especially for large-scale applications.
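
A tiny before/after illustration of manual inlining (hypothetical
accessor; modern JIT compilers often perform this automatically):

    class InlineSketch {
        static class Cell {
            double density;
            double getDensity() { return density; }
        }

        // Before: one method call per loop iteration.
        static double sumBefore(Cell[] cells) {
            double s = 0.0;
            for (Cell c : cells) s += c.getDensity();
            return s;
        }

        // After manual inlining: the field is read directly, removing
        // the per-iteration call overhead.
        static double sumAfter(Cell[] cells) {
            double s = 0.0;
            for (Cell c : cells) s += c.density;
            return s;
        }
    }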
Saleh Elmohamed