MPI Java Performance Evaluation

Abstract

In the last few years, Java has gained popularity for processing “big data”, mostly through the Apache big data stack, a collection of open source frameworks for handling large volumes of data that includes popular systems such as Hadoop, the Hadoop Distributed File System (HDFS), and Spark. Earlier efforts to introduce Java to High Performance Computing (HPC) were not embraced by the community because of performance concerns. However, with continuous improvements in Java performance, interest in Java message passing support for HPC has been growing. We support this idea and show its feasibility for solving real-world data analytics problems. This work includes a performance evaluation of two MPI Java frameworks, OpenMPI and FastMPJ, on real-life machine learning problems.

Intellectual Merit

Our analysis will serve as proof that large-scale data analytics problems can be solved efficiently using the Java ecosystem while benefiting from the parallel capabilities of MPI.

Broader Impact

If this study yields the positive results we expect, we will be able to offer our algorithms, a suite of data clustering and multi-dimensional scaling implementations, to the wider scientific community, giving researchers an efficient way to analyse their data.

Use of FutureGrid

FutureGrid will serve as the test bed where we can evaluate the performance of our applications.

Scale of Use

Around 20 nodes for a few weeks.

FG-442
Saliya Ekanayake
Indiana University
Active

Project Members

Nigel Pugh
Tori Wilbon
