Using SalsaHadoop on FutureGrid
PLEASE NOTE: THIS MANUAL PAGE IS A DRAFT, PLEASE PROVIDE FEEDBACK IN THE COMMENT SECTION.
SalsaHadoop Introduction
Apache Hadoop is widely used by domain scientists to run their scientific applications in parallel. For our own research convenience, the SalsaHPC research group is developing SalsaHadoop, an automated way to start Hadoop without having to configure Hadoop by hand. SalsaHadoop can run on any general-purpose cluster and across multiple machines. It has been used by the SalsaHPC research group and in the graduate-level course CSCI B649 Cloud Computing for Data Intensive Sciences.
Running SalsaHadoop on FutureGrid
SalsaHadoop can be run in several modes within FutureGrid (FG), in either the FutureGrid HPC or the FutureGrid Cloud/IaaS environment. The following tutorials provide step-by-step instructions for using SalsaHadoop in these modes and show examples of running Hadoop applications once Hadoop has started. In general, the HPC environment is easier if you have no experience with the Eucalyptus IaaS platform.
- SalsaHadoop on FutureGrid
- SalsaHadoop with FutureGrid HPC [recommended]
- SalsaHadoop with FutureGrid Cloud Eucalyptus
- Get VM compute nodes
- Hadoop Configuration (same as above, with different masters and slaves hostnames)
- Verify Hadoop HDFS and MapReduce Daemon status
- Run SalsaHadoop Applications
- Run Hadoop with static FutureGrid-Bravo HDFS*
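The Hadoop Configuration step above mentions masters and slaves hostnames. As a rough illustration only, the sketch below writes `masters` and `slaves` files with one hostname per line, assuming a Hadoop 1.x-style conf directory. The function name `write_hadoop_host_files` and the hostnames are hypothetical, and this is not SalsaHadoop's own startup script; which hosts belong in each file depends on the Hadoop version and on the linked tutorials.

```python
from pathlib import Path
import tempfile


def write_hadoop_host_files(conf_dir, master, workers):
    """Write `masters` and `slaves` files, one hostname per line.

    Assumption: a Hadoop 1.x-style conf directory layout. This helper is
    hypothetical and is not part of SalsaHadoop or Hadoop.
    """
    conf = Path(conf_dir)
    conf.mkdir(parents=True, exist_ok=True)
    masters_path = conf / "masters"
    slaves_path = conf / "slaves"
    masters_path.write_text(master + "\n")
    slaves_path.write_text("".join(host + "\n" for host in workers))
    return masters_path, slaves_path


if __name__ == "__main__":
    # Placeholder hostnames; replace them with the nodes you were allocated.
    with tempfile.TemporaryDirectory() as tmp:
        masters, slaves = write_hadoop_host_files(
            tmp, "master-node", ["worker-node-1", "worker-node-2"]
        )
        print(masters.read_text(), end="")
        print(slaves.read_text(), end="")
```

When moving between the HPC and Eucalyptus modes, only the hostnames passed in would change; the file format stays the same.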