Applications of Parallel High Performance Computing in Computational Structural Biology
---------------------------------------------------------------------------------------

A fundamental research issue in computational structural biology is the prediction or determination of the three-dimensional structures of macromolecules such as proteins. Many approaches to the problem, theoretical and experimental, have been taken, and all of them require intensive computation. Parallel high-performance computing has been key to implementing these approaches.

For example, in the potential energy minimization approach, a semi-empirical function is minimized to find a conformation of the molecule with minimal potential energy. The function usually has thousands of variables, and the search for a global minimum is almost impossible without parallel high-performance computing. The most notable work in the area has been done by Scheraga's group at Cornell, who obtained the best structures for several large proteins on various parallel platforms. Reports on this group's work can be found at http://www.tc.cornell.edu. The computer science and applied mathematics communities have also worked on efficient parallel search algorithms for the potential energy minimization problem. Byrd and Schnabel at the University of Colorado have developed parallel stochastic global optimization algorithms and applied them to protein polymers. Moré and Wu of Argonne National Laboratory developed parallel global continuation software on the IBM SP2 for determining protein structures from distance data from various sources.

Another example is the molecular dynamics simulation approach, which studies the structural changes of a molecule over a given time period. The structural changes are governed by Newton's second law of motion and are described by a system of ordinary differential equations, each corresponding to the movement of one of the atoms in the molecule.
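
To make the equations of motion concrete, the following is a minimal sketch, not taken from any of the packages mentioned in this article. It assumes a toy "molecule" of two atoms joined by a harmonic bond in one dimension and integrates m_i * x_i'' = F_i(x) with the velocity Verlet scheme, a standard choice for this kind of system. The function names, the force field, and all parameter values are illustrative assumptions in arbitrary units.

```python
# Toy molecular dynamics: two atoms joined by a harmonic "bond" in 1D.
# The force field and every parameter below are illustrative assumptions.

def forces(x, k=1.0, r0=1.0):
    """Force on each atom from V = 0.5 * k * (|x1 - x0| - r0)**2."""
    d = x[1] - x[0]
    stretch = abs(d) - r0
    sign = 1.0 if d >= 0 else -1.0
    f1 = -k * stretch * sign      # -dV/dx1
    return [-f1, f1]              # equal and opposite force on atom 0


def potential(x, k=1.0, r0=1.0):
    """Potential energy of the bond."""
    return 0.5 * k * (abs(x[1] - x[0]) - r0) ** 2


def kinetic(v, m):
    """Total kinetic energy."""
    return sum(0.5 * mi * vi * vi for mi, vi in zip(m, v))


def velocity_verlet(x, v, m, dt, n_steps):
    """Follow the trajectory of m_i * x_i'' = F_i(x) for n_steps small steps."""
    x, v = list(x), list(v)
    f = forces(x)
    for _ in range(n_steps):
        # half-step velocity update with the old forces
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
        # full-step position update
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # recompute forces, then finish the velocity update
        f = forces(x)
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
    return x, v


masses = [1.0, 1.0]
x0 = [0.0, 1.5]        # bond stretched 0.5 beyond its rest length
v0 = [0.0, 0.0]
e_start = potential(x0) + kinetic(v0, masses)
x_end, v_end = velocity_verlet(x0, v0, masses, dt=0.01, n_steps=10_000)
e_end = potential(x_end) + kinetic(v_end, masses)
print(f"energy drift over 10,000 steps: {abs(e_end - e_start):.2e}")
```

In a real simulation the force evaluation involves every interacting pair of atoms and dominates the cost, which is why that step is the usual target for parallelization.
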
Given any initial condition, the system of equations can be solved by numerically following the solution trajectory with very small time steps. Millions of time steps may be required even for a nanosecond of simulated time. With current technology, simulations can reach only several hundred nanoseconds, so much of the long-time dynamics remains out of reach. For example, protein folding may take several hundred milliseconds to several seconds. The protein-folding problem would be solved if simulation on that time scale were feasible.

Parallel high-performance computing has been used to speed up molecular dynamics simulation. For example, Duan and Kollman at UCSF were able to simulate the folding of a small protein with 36 amino acids on the microsecond scale on a Cray T3E, which has been considered a major computational breakthrough in structural molecular biology. The result was published in Science. In a CRPC-related activity in this area, McCammon and Scott at the University of Houston parallelized molecular dynamics simulation using particle and space distribution. Two parallel packages, UHGromos and EulerGromos, were developed as a result of the effort. Details about the work can be found in the CRPC report CRPC-TR93356.

A final example in computational structural biology is X-ray crystallography computing for solving protein crystal structures. X-ray crystallography is so far the most successful approach to structure determination. It is responsible for more than 80% of the protein structures solved to date. Structure determination by X-ray crystallography relies on obtaining protein crystals, producing their X-ray diffraction images, and then deriving their structures from the images. The last step requires a great deal of computation, from processing the images, to determining the phases for the diffraction patterns, to computing the electron density map for the crystal.
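
The last of these steps, computing an electron density map, is a Fourier synthesis that needs both an amplitude and a phase for each diffraction spot. The sketch below is a one-dimensional toy under stated assumptions, not a crystallographic code: the 16-point "unit cell", the two atom weights, and the function names are all hypothetical. It computes Fourier coefficients of a known density, then rebuilds the density once with the true phases and once with every phase set to zero.

```python
import cmath
import math


def structure_factors(rho):
    """Fourier coefficients F_h = sum_j rho_j * exp(2*pi*i*h*j/N) of a sampled density."""
    n = len(rho)
    return [
        sum(r * cmath.exp(2j * math.pi * h * j / n) for j, r in enumerate(rho))
        for h in range(n)
    ]


def density_from(amplitudes, phases):
    """Fourier synthesis: rho_j = (1/N) * sum_h |F_h| * exp(i*phi_h) * exp(-2*pi*i*h*j/N)."""
    n = len(amplitudes)
    return [
        sum(
            a * cmath.exp(1j * p) * cmath.exp(-2j * math.pi * h * j / n)
            for h, (a, p) in enumerate(zip(amplitudes, phases))
        ).real / n
        for j in range(n)
    ]


# Hypothetical 1D "unit cell": two point-like atoms of different weight.
rho = [0.0] * 16
rho[3] = 6.0
rho[10] = 8.0

F = structure_factors(rho)
amplitudes = [abs(f) for f in F]         # the part a diffraction experiment yields
phases = [cmath.phase(f) for f in F]     # known here only because rho is known

with_phases = density_from(amplitudes, phases)
without_phases = density_from(amplitudes, [0.0] * len(F))

err_with = max(abs(a - b) for a, b in zip(with_phases, rho))
err_without = max(abs(a - b) for a, b in zip(without_phases, rho))
print(f"max error with true phases: {err_with:.2e}")
print(f"max error with zero phases: {err_without:.2e}")
```

With the true phases the density is recovered to rounding error, while the zero-phase map places density where the toy cell has none. A real map involves a three-dimensional sum over many thousands of reflections, typically evaluated with fast Fourier transforms.
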
One of the most difficult and computationally intensive parts is the solution of the phase problem. The electron density distribution of a crystal can be expanded as a Fourier series with a set of complex coefficients called structure factors. The amplitudes of the structure factors can be obtained from the X-ray diffraction data, but the phases cannot. The phase problem is to determine the phases of the structure factors given their amplitudes from X-ray diffraction experiments. The problem is difficult to solve, requires intensive computation, and has been recognized as one of the grand challenges in the DOE's recent research initiatives in computational sciences. Research efforts on using supercomputers to provide faster and more reliable solutions to the phase problem have been undertaken at various labs and institutions, such as Argonne National Laboratory and the Hauptman-Woodward Institute.

References
----------

"Global Minimization of Nonconvex Energy Functions: Molecular Conformation and Protein Folding", Panos Pardalos, David Shalloway, and Guoliang Xue, DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 23, 1996.

"Pathways to a Protein Folding Intermediate Observed in a 1-Microsecond Simulation in Aqueous Solution", Yong Duan and Peter A. Kollman, Department of Pharmaceutical Chemistry, University of California, San Francisco