One-click Hadoop WordCount on Eucalyptus FutureGrid

I. Introduction

This tutorial shows how to run a one-click Hadoop WordCount job on the Eucalyptus platform of FutureGrid.

II. Prerequisites

1. FutureGrid HPC account: apply via the FutureGrid portal and request an HPC account.
2. FutureGrid Eucalyptus account: see the FutureGrid Eucalyptus Tutorial for detailed instructions.
3. FutureGrid Eucalyptus credentials zip file (euca2-[username]-x509.zip), stored under the user's home directory.
4. A key pair created and added for use with Eucalyptus virtual machines.

The following sections assume that the user has created both an HPC account and a Eucalyptus account under the username gaoxm.
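
Prerequisite 3 can be checked before starting. The following is a minimal sketch, not part of the hadoopOneClick package: the function name check_euca_zip is hypothetical, and it assumes the zip file name follows the euca2-[username]-x509.zip pattern given above. The directory is a parameter because the file may sit in a subdirectory of the home directory (the session below keeps it in ~/eucalyptus).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with hadoopOneClick).
# Reports whether the Eucalyptus credentials zip for a user exists in a directory.
# Usage: check_euca_zip [username] [directory]
check_euca_zip() {
    local user="${1:-${USER:-}}"      # defaults to the current login name
    local dir="${2:-$HOME}"           # defaults to the home directory
    local zip="$dir/euca2-$user-x509.zip"
    if [ -f "$zip" ]; then
        echo "found: $zip"
    else
        echo "missing: $zip" >&2
        return 1
    fi
}
```

For the account used in this tutorial, the call would be: check_euca_zip gaoxm ~/eucalyptus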

III. Log in to india.futuregrid.org

[gaoxm@129-79-49-98 ~]$ ssh -i .ssh/id_rsa_fg india.futuregrid.org
Enter passphrase for key '.ssh/id_rsa_fg':
Last login: Sat May  5 02:17:33 2012 from c-71-194-153-252.hsd1.in.comcast.net
...
torque/2.5.5 version 2.5.5 loaded
moab version 5.4.0 loaded
euca2ools version 1.2 loaded
[gaoxm@i136 ~]$ cd eucalyptus/
[gaoxm@i136 eucalyptus]$ ls
cloud-cert.pem                 euca2-gaoxm-d108375b-pk.pem  eucarc         hosts        nodes
euca2-gaoxm-d108375b-cert.pem  euca2-gaoxm-x509.zip         gaoxm.private  jssecacerts  tmp.out


IV. Download and unzip the “hadoopOneClick.zip” package

[gaoxm@i136 test]$ wget http://mypage.iu.edu/~gao4/data/hadoopOneClick.zip
...
[gaoxm@i136 test]$ ls
hadoopOneClick.zip
[gaoxm@i136 test]$ unzip hadoopOneClick.zip


V. Run hadoop-one-click.sh

[gaoxm@i136 test]$ cd hadoopOneClick
[gaoxm@i136 hadoopOneClick]$ ls
deploy-hadoop.sh     instanceIds.txt  publicIps.txt            stop-hadoop.sh
hadoop-one-click.sh  ipHosts.txt      run-hadoop-wordcount.sh  terminate-instances.sh
hosts                nodes.txt        start-instances.sh
[gaoxm@i136 hadoopOneClick]$ chmod +x *.sh
[gaoxm@i136 hadoopOneClick]$ ./hadoop-one-click.sh -n 2 -t m1.small -i emi-D778156D -k gaoxm -p ~/eucalyptus/gaoxm.private -l http://mypage.iu.edu/~gao4/data/grexp10.txt -s http://salsahpc.indiana.edu/tutorial/apps/hadoop-0.20.203.0-for-EucaVm.tar.gz


This runs a MapReduce word-count job on a dynamically created virtual Hadoop cluster on FutureGrid. The user needs to replace the -k and -p parameter values with their own key-pair name and private key path. For detailed usage information, run:

[gaoxm@i136 hadoopOneClick]$ ./hadoop-one-click.sh -h
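
Because -k and -p are the only values that must change per user, the long command can be wrapped so that those two values are passed as arguments. This is a sketch under assumptions: the function name run_one_click is hypothetical, every other flag value is copied unchanged from the example command above, and the readability check on the private key is an added convenience rather than something the package is known to require.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not shipped with hadoopOneClick).
# Must be run from inside the hadoopOneClick directory.
# Usage: run_one_click <key-pair-name> <private-key-path>
run_one_click() {
    local key_name="$1"
    local key_path="$2"
    # Fail early if the private key cannot be read.
    if [ ! -r "$key_path" ]; then
        echo "private key not readable: $key_path" >&2
        return 1
    fi
    # All values except -k and -p are taken verbatim from the tutorial's example.
    ./hadoop-one-click.sh -n 2 -t m1.small -i emi-D778156D \
        -k "$key_name" -p "$key_path" \
        -l "http://mypage.iu.edu/~gao4/data/grexp10.txt" \
        -s "http://salsahpc.indiana.edu/tutorial/apps/hadoop-0.20.203.0-for-EucaVm.tar.gz"
}
```

For the account used in this tutorial, the call would be: run_one_click gaoxm ~/eucalyptus/gaoxm.private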

VI. Verify output

[gaoxm@i136 hadoopOneClick]$ ls outputs/
_logs  part-r-00000  _SUCCESS
[gaoxm@i136 hadoopOneClick]$ vim outputs/part-r-00000
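
Instead of paging through the whole file, the most frequent words can be listed directly. This sketch assumes the output uses the "word<TAB>count" line format that the stock Hadoop WordCount example writes; if run-hadoop-wordcount.sh runs a job with a different output format, the sort key needs adjusting. The function name top_words is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with hadoopOneClick).
# Prints the N highest-count lines of a WordCount output file.
# Assumes each line is "word<TAB>count".
# Usage: top_words <file> [N]
top_words() {
    local file="$1"
    local n="${2:-10}"                # default: top 10
    local tab
    tab="$(printf '\t')"              # literal tab as the field separator
    # Sort numerically on the second field, largest first, then keep N lines.
    sort -t "$tab" -k2,2nr "$file" | head -n "$n"
}
```

For the output above, the call would be: top_words outputs/part-r-00000 20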


VII. Extensions

To run other MapReduce jobs, replace run-hadoop-wordcount.sh with a new script, and change hadoop-one-click.sh to call the new script instead.
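
The contents of hadoop-one-click.sh are not reproduced in this tutorial, so the lines to change have to be located first. The following sketch only finds them; it assumes the driver refers to the job script by its file name, and the function name find_job_calls is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with hadoopOneClick).
# Lists, with line numbers, every line of the driver script that mentions the job script.
# Usage: find_job_calls [driver-script] [job-script-name]
find_job_calls() {
    local driver="${1:-hadoop-one-click.sh}"
    local job="${2:-run-hadoop-wordcount.sh}"
    # -F treats the name as a fixed string, so the dots are not regex wildcards.
    grep -n -F -- "$job" "$driver"
}
```

Run inside the hadoopOneClick directory, find_job_calls with no arguments prints the lines of hadoop-one-click.sh that need to point at the new script.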