One-click Twister K-means on Eucalyptus FutureGrid

I. Introduction

This tutorial shows how to run a one-click Twister K-means job on the Eucalyptus platform of FutureGrid.

II. Prerequisites

1. A FutureGrid HPC account. Apply through the FutureGrid portal and request an HPC account.
2. A FutureGrid Eucalyptus account. See the FutureGrid Eucalyptus Tutorial for detailed instructions.
3. The FutureGrid Eucalyptus credentials zip file (euca2-[username]-x509.zip), stored under the user's home directory.
4. A key pair created and added for use with Eucalyptus virtual machines.

The following sections assume that the user has created both the HPC account and the Eucalyptus account under the username “gaoxm”.
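
Prerequisite 3 can be checked before starting with a small shell function. This is only a sketch: the function name check_credentials is hypothetical, and the directory argument is an assumption added so that the check also works when the zip file is kept somewhere other than the home directory.

```shell
# Hypothetical pre-flight check (not part of the twisterOneClick package).
# Usage: check_credentials USERNAME [DIRECTORY]
# Looks for euca2-USERNAME-x509.zip in DIRECTORY (default: the home directory).
check_credentials() {
    user=$1
    dir=${2:-$HOME}
    zip="$dir/euca2-$user-x509.zip"
    if [ -f "$zip" ]; then
        echo "found $zip"
    else
        echo "missing $zip" >&2
        return 1
    fi
}
```

For the account used in this tutorial, the call would be "check_credentials gaoxm"; a non-zero return status means the credentials file still has to be downloaded.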

III. Log in to india.futuregrid.org

[gaoxm@129-79-49-98 ~]$ ssh -i .ssh/id_rsa_fg india.futuregrid.org
Enter passphrase for key '.ssh/id_rsa_fg':
Last login: Sat May  5 02:17:33 2012 from c-71-194-153-252.hsd1.in.comcast.net
...
torque/2.5.5 version 2.5.5 loaded
moab version 5.4.0 loaded
euca2ools version 1.2 loaded
[gaoxm@i136 ~]$ cd eucalyptus/
[gaoxm@i136 eucalyptus]$ ls
cloud-cert.pem                 euca2-gaoxm-d108375b-pk.pem  eucarc         hosts        nodes
euca2-gaoxm-d108375b-cert.pem  euca2-gaoxm-x509.zip         gaoxm.private  jssecacerts  tmp.out


IV. Download and unzip the “twisterOneClick.zip” package

[gaoxm@i136 test]$ wget http://mypage.iu.edu/~gao4/data/twisterOneClick.zip
...
[gaoxm@i136 test]$ ls
hadoopOneClick  hadoopOneClick.zip  twisterOneClick.zip

[gaoxm@i136 test]$ unzip twisterOneClick.zip

V. Run twister-one-click.sh

[gaoxm@i136 test]$ cd twisterOneClick
[gaoxm@i136 twisterOneClick]$ ls
deploy-twister.sh  instanceIds.txt  publicIps.txt          stop-twister.sh
hostnames.txt      ipHosts.txt      run-twister-kmeans.sh  terminate-instances.sh
hosts              nodes.txt        start-instances.sh     twister-one-click.sh
[gaoxm@i136 twisterOneClick]$ chmod +x *.sh
[gaoxm@i136 twisterOneClick]$ ./twister-one-click.sh -n 2 -t m1.small -i emi-D778156D -k gaoxm -p ~/eucalyptus/gaoxm.private -l http://salsahpc.indiana.edu/tutorial/apps/Twister-0.9.tar.gz -a http://www.iterativemapreduce.org/apache-activemq-5.4.2-bin.tar.gz


This runs a MapReduce K-means job on a dynamically created virtual Twister cluster on FutureGrid. The user needs to replace the “-k” and “-p” parameter values with their own key-pair name and private key path. For detailed usage information, run

[gaoxm@i136 twisterOneClick]$ ./twister-one-click.sh -h
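
The substitution of the “-k” and “-p” values described above can be sketched as a shell function that prints the full command for a given key pair. The function name one_click_command is hypothetical; every other option value is copied from the example invocation in this section, and the space after “-l” assumes that the script accepts the option and its value as separate arguments.

```shell
# Hypothetical helper (not part of the twisterOneClick package).
# Usage: one_click_command KEY_NAME PRIVATE_KEY_PATH
# Prints the one-click command line with the caller's key-pair values filled in.
one_click_command() {
    key_name=$1
    key_path=$2
    # printf repeats the '%s ' format once per argument, so the
    # arguments are printed on one line separated by spaces.
    printf '%s ' ./twister-one-click.sh -n 2 -t m1.small -i emi-D778156D \
        -k "$key_name" -p "$key_path" \
        -l http://salsahpc.indiana.edu/tutorial/apps/Twister-0.9.tar.gz \
        -a http://www.iterativemapreduce.org/apache-activemq-5.4.2-bin.tar.gz
    printf '\n'
}
```

The function only prints the command, so it can be reviewed before anything is launched. Check the option list shown by “-h” against the printed line before running it.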

VI. Verify results in the standard output of the scripts

Calling run_kmeans.sh on 149.165.159.140...
JobID: kmeans-map-reduce9ec9eaa2-9731-11e1-80d7-156f25bd362a
May 6, 2012 4:11:57 AM org.apache.activemq.transport.failover.FailoverTransport doReconnect
INFO: Successfully connected to tcp://master:61616
0    [main] INFO  cgl.imr.client.TwisterDriver  - Configure Mappers through the partition file, please wait....
1975 [main] INFO  cgl.imr.client.TwisterDriver  - Configuring Mappers through the partition file is completed.
250.77056136584878 , 125.15021341387315 , 249.21561041359857 ,
246.74715176402833 , 375.350251646343 , 249.17570173022511 ,
Total Time for kemeans : 6.808
Total loop count : 15
6260 [main] INFO  cgl.imr.client.TwisterDriver  - MapReduce computation termintated gracefully.
------------------------------------------------------
Kmeans clustering took 6.841 seconds.
------------------------------------------------------
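
If the output above is saved to a file, the two figures worth checking (the loop count and the elapsed time) can be pulled out with sed. This is a sketch that assumes the output was captured in a log file, for example by piping the script through tee; the function name kmeans_summary and the log file are hypothetical, and the patterns match only the exact wording of the output lines shown above.

```shell
# Hypothetical helper (not part of the twisterOneClick package).
# Usage: kmeans_summary LOGFILE
# Prints the loop count and elapsed seconds found in a saved copy of the output.
kmeans_summary() {
    log=$1
    loops=$(sed -n 's/^Total loop count : \([0-9][0-9]*\).*/\1/p' "$log")
    secs=$(sed -n 's/^Kmeans clustering took \([0-9][0-9.]*\) seconds.*/\1/p' "$log")
    if [ -z "$loops" ] || [ -z "$secs" ]; then
        echo "no completed K-means run found in $log" >&2
        return 1
    fi
    echo "loops=$loops seconds=$secs"
}
```

A non-zero return status means that one of the two summary lines is missing, which suggests the job did not run to completion.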


VII. Extensions

To run other iterative MapReduce jobs, replace run-twister-kmeans.sh with a new script and change twister-one-click.sh to call that script instead.
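
A first step toward such a change is to locate the places where twister-one-click.sh refers to the K-means script. The sketch below assumes that the driver refers to the job script by its file name, which this tutorial does not confirm; the function name find_job_calls is hypothetical.

```shell
# Hypothetical helper (not part of the twisterOneClick package).
# Usage: find_job_calls [DRIVER_SCRIPT] [JOB_SCRIPT_NAME]
# Lists, with line numbers, every line of the driver that mentions the job script.
find_job_calls() {
    driver=${1:-twister-one-click.sh}
    job=${2:-run-twister-kmeans.sh}
    # -F treats the name as a fixed string, -n prefixes each match with its line number.
    grep -n -F -- "$job" "$driver"
}
```

Run inside the twisterOneClick directory with no arguments, it reports the lines to edit. If it reports nothing, the driver invokes the job script in some other way and has to be read by hand.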