Investigating the Apache Big Data Stack
Abstract
This project aims to investigate Apache's Big Data technologies, which can collectively be called the Apache Big Data Stack. These projects are interchangeable and compete with one another, yet they can also work together.
The steps for this project will be:
1. Deploying a cloud: Several virtual machines will be used in a small cloud, preferably OpenStack, deployed on FutureGrid. FutureGrid's platform will be explored on these virtual machines by accessing them, managing them, taking snapshots of them, etc.
2. Installation of the Apache components:
After a small cloud of machines is available and the required prerequisites such as JDK and SSH are installed:
--Apache Mesos will be explored by running the K-Means algorithm on some sample data.
--Mahout will also be used on top of Hadoop for comparison.
3. Results and Future Work
After the tests and small applications, we will have gained knowledge of Apache's Big Data solutions and experience with the FutureGrid platform. This knowledge will be used to find the place of machine learning based applications in the Big Data and Cloud Computing ecosystem.
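For orientation, the K-Means computation named in step 2 can be sketched on a single machine as follows. This is only a minimal illustration in plain Python, assuming a small in-memory set of 2-D sample points; it is not the Mesos or Mahout implementation the project will run. The names kmeans, nearest and sample_points are hypothetical and chosen for this sketch.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k input points chosen at random (assumption: len(points) >= k).
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(c) / len(m) for c in zip(*m)) if m else centroids[i]
            for i, m in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable, so stop early
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical sample data: two well-separated groups of points.
sample_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                 (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = kmeans(sample_points, k=2)
print(centroids, labels)
```

A distributed run would split the assignment step across the VMs and combine the per-cluster sums in the update step, which is the part the Mesos and Hadoop based tools are expected to handle.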
Intellectual Merit
Apache's Big Data stack keeps expanding, and we will explore these technologies to learn which ones to use, and where and when to use them, especially for machine learning applications. This project will help us see the big picture of Apache's Big Data projects.
Broader Impact
This project can serve as a guide for researchers who need a platform on which to develop their parallel machine learning programs. Apache offers many solutions, and machine learning experiments using these solutions will benefit scientists who solve problems involving big data.
Use of FutureGrid
The FutureGrid platform is needed for deploying a cloud environment. Using a ready-to-use platform for this purpose will let us focus on the Apache Big Data technologies instead of spending time and effort on hardware, installation, and management.
Scale Of Use
We will use several VMs for an experiment. The VMs will be managed by an IaaS tool.