MPI Java Performance Evaluation

Project Information

Discipline
Computer Science (401) 
Orientation
Research 
Abstract

In the last few years, Java has gained popularity for processing “big data”, mostly through the Apache big data stack, a collection of open source frameworks for handling large volumes of data that includes popular systems such as Hadoop, Hadoop Distributed File System (HDFS), and Spark. Earlier efforts to introduce Java to High Performance Computing (HPC) were not embraced by the community because of performance concerns. However, with continuous improvements in Java performance, interest in Java message passing support for HPC has been growing. We support this idea and aim to show its feasibility for solving real-world data analytics problems. This includes a performance evaluation of two MPI Java frameworks, OpenMPI and FastMPJ, on real-life machine learning problems.

Intellectual Merit

Our analysis will serve as proof that large-scale data analytics problems can be solved efficiently in the Java ecosystem while benefiting from the parallel capabilities of MPI.

Broader Impacts

If this study produces the positive results we expect, we will be able to offer our algorithms to the wider scientific community, giving researchers the opportunity to analyse their data efficiently. The algorithms comprise a suite of data clustering and multi-dimensional scaling implementations.

Project Contact

Project Lead
Saliya Ekanayake (sekanaya) 
Project Manager
Saliya Ekanayake (sekanaya) 
Project Members
Nigel Pugh, Tori Wilbon  

Resource Requirements

Hardware Systems
  • india (IBM iDataPlex at IU)
  • bravo (large memory machine at IU)
  • delta (GPU Cloud)
 
Use of FutureGrid

FutureGrid will serve as the test bed where we can evaluate the performance of our applications.

Scale of Use

Around 20 nodes for a few weeks

Project Timeline

Submitted
06/17/2014 - 10:48