MPI Java Performance Evaluation
Project Information
- Discipline
- Computer Science (401)
- Orientation
- Research
In the last few years, Java has gained popularity in processing “big data”, mostly through the Apache big data stack, a collection of open source frameworks for handling large volumes of data that includes popular systems such as Hadoop, the Hadoop Distributed File System (HDFS), and Spark. Efforts have also been made in the past to introduce Java to High Performance Computing (HPC), but the community did not embrace them due to performance concerns. However, with continuous improvements in Java performance, interest in Java message passing support for HPC has been increasing. We support this idea and show its feasibility in solving real-world data analytics problems. This includes a performance evaluation of two MPI Java frameworks, OpenMPI and FastMPJ, on real-life machine learning problems.
Intellectual Merit
Our analysis will serve as proof that large-scale data analytics problems can be solved efficiently using the Java ecosystem while benefiting from the parallel capabilities of MPI.
Broader Impacts
If this study yields the positive results we expect, we will be able to offer our algorithms to the wider scientific community, giving researchers the opportunity to analyse their data efficiently. The algorithms comprise a suite of data clustering and multi-dimensional scaling implementations.
Project Contact
- Project Lead
- Saliya Ekanayake (sekanaya)
- Project Manager
- Saliya Ekanayake (sekanaya)
- Project Members
- Nigel Pugh, Tori Wilbon
Resource Requirements
- Hardware Systems
- india (IBM iDataPlex at IU)
- bravo (large memory machine at IU)
- delta (GPU Cloud)
FutureGrid will serve as the test bed where we can evaluate the performance of our applications.
Scale of Use
Around 20 nodes for a few weeks
Project Timeline
- Submitted
- 06/17/2014 - 10:48