Investigating the Apache Big Data Stack

Abstract

This project aims to investigate Apache's Big Data technologies, which can collectively be called the Apache Big Data Stack. Some of these projects are alternatives to, and competitors of, one another, yet they can also be combined to work together.

The steps for this project will be:

1. Deploying a cloud: Several virtual machines will form a small cloud, preferably OpenStack, deployed on FutureGrid. FutureGrid's platform will be explored through these virtual machines by accessing, managing and snapshotting them, among other operations.

2. Installation of the Apache components:
Once a small cloud of machines is available and the prerequisites, such as the JDK and SSH, are installed:
--Apache Mesos will be explored by running the K-Means algorithm on sample data.
--Mahout will also be run on top of Hadoop for comparison.

3. Results and Future Work
The tests and small applications are expected to provide knowledge of Apache's Big Data solutions and hands-on experience with the FutureGrid platform. This knowledge will be used to find the place of machine-learning-based applications in the Big Data and Cloud Computing ecosystem.
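The K-Means experiment in step 2 can be sketched as below. This is a minimal single-process illustration, not the Mesos or Mahout API: it assumes K-Means is split into a map phase (assign each point to its nearest centroid) and a reduce phase (average the points in each cluster), which is a common way to express the algorithm on Hadoop-style frameworks. The function names `kmeans_map`, `kmeans_reduce` and `kmeans` are hypothetical and chosen for this example only.

```python
import math
from collections import defaultdict


def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans_map(points, centroids):
    """Map phase (hypothetical): emit (centroid_index, point) for every point."""
    for p in points:
        yield nearest(p, centroids), p


def kmeans_reduce(pairs, old_centroids):
    """Reduce phase (hypothetical): average the points assigned to each centroid.

    A centroid that received no points keeps its previous position.
    """
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centroids = []
    for i, c in enumerate(old_centroids):
        members = groups.get(i)
        if members:
            # Coordinate-wise mean of all member points.
            new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            new_centroids.append(c)
    return new_centroids


def kmeans(points, initial_centroids, max_iters=20, tol=1e-9):
    """Repeat map + reduce until centroids stop moving or max_iters is reached."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iters):
        new_centroids = kmeans_reduce(kmeans_map(points, centroids), centroids)
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    sample = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    print(kmeans(sample, initial_centroids=[(0.0, 0.0), (10.0, 10.0)]))
```

On the small sample above, the two centroids converge after two iterations to roughly (1.17, 1.5) and (8.5, 8.5). In a distributed run, each iteration would be one job: mappers process partitions of the data in parallel, and reducers compute the new centroids that are broadcast to the next iteration.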

Intellectual Merit

Apache's Big Data stack keeps expanding, and we will explore these technologies to learn which of them to use, and where and when, especially for machine learning applications. This project will help us see the big picture of Apache's Big Data projects.

Broader Impact

This project can serve as a guide for researchers who need a platform to develop parallel machine learning programs. Apache offers many solutions, and machine learning experiments using them will help scientists who work on problems involving big data.

Use of FutureGrid

The FutureGrid platform is needed to deploy a cloud environment. Using a ready-to-use platform for this purpose will let us focus on the Apache Big Data technologies instead of spending time and effort on hardware, installation and management.

Scale of Use

We will use several VMs for an experiment. The VMs will be managed by an IaaS tool.

Publications


FG-389
Ibrahim Hallac
Firat University
Active

Project Members

Galip Aydin
