Virtual Globus
Globus Provision on FutureGrid
Globus provides a provisioning framework, Globus Provision, that lets you run Globus on clouds.
If you have any questions, feel free to contact FutureGrid at http://archive.futuregrid.org/help.
This documentation was prepared by Andrew Younge and includes much information from the Globus Provision project conducted by Borja Sotomayor and the Globus team.
What is Globus Provision?
Globus Provision is a tool for deploying fully configured Globus systems on IaaS clouds. For example, you could use Globus Provision to deploy a cluster on FutureGrid with various Globus services and a Condor pool installed on it.
Globus Provision would also take care of setting up a shared NFS filesystem and an NIS authentication domain, and creating user accounts on the cluster. X.509 certificates for the hosts and users are also created automatically. Globus Provision is designed to be easy to use, so deploying the above cluster would basically involve writing the following file:
[domain-mycluster]
users: alice bob carol dave
nfs-nis: yes
gridftp: yes
gram: yes
condor: yes
condor-nodes: 3
Next, you would just have to run a few simple commands to instruct Globus Provision to deploy this cluster on a FutureGrid cloud. Once the cluster is running, Globus Provision allows you to dynamically add and remove hosts (e.g., to increase the size of the Condor pool) and to add/remove user accounts. You can also shut down your cluster while not in use (to avoid paying Amazon EC2 for resources that are just idling), and resume it at a later time.
Terminology
Topology
Earlier, we said that Globus Provision can deploy fully-configured “Globus systems”. Throughout the documentation, we’ll refer to the specification of such a system as a topology. For example, the “My Cluster” topology includes a GridFTP server, a Condor cluster, four users, etc.
Domains
A topology can be composed of one or more domains. The “My Cluster” topology has a single domain, whereas the “Mini-clusters” topology has multiple domains (one for each mini-cluster). The main distinguishing characteristic of a domain is that it has its own set of users, and that only the users in a given domain will be able to access the resources in that domain.
Host (or Node)
A domain contains one or more hosts (which we will occasionally refer to as nodes). Note that, when using the simple configuration file format shown earlier, you don’t specify the domain’s hosts directly (for example, we specified the options gram: yes and condor: yes, and Globus Provision allocated a single host for both). Globus Provision’s lower-level interface does allow you to specify each individual host, and what software and services must run on each of them.
Instance
When a topology is actually deployed, we refer to it as a Globus Provision instance. You can think of the topology as the specification of what you want to run, and an instance as the actual running system. Globus Provision’s commands perform operations on your instance: starting it, adding a new host to it, etc.
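The relationship between these four terms can be sketched as a small data model. This is only an illustration of the terminology: the class names, field names, and the host name example-host are hypothetical and are not taken from Globus Provision's code or its JSON format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    """One machine in a domain, with the services that run on it."""
    name: str
    services: List[str] = field(default_factory=list)


@dataclass
class Domain:
    """A group of hosts with its own, separate set of users."""
    name: str
    users: List[str] = field(default_factory=list)
    hosts: List[Host] = field(default_factory=list)


@dataclass
class Topology:
    """The specification of what you want to run."""
    domains: List[Domain] = field(default_factory=list)


@dataclass
class Instance:
    """One deployment of a topology, identified by an instance ID."""
    instance_id: str
    topology: Topology
    state: str


# A single-domain topology with four users. The host name is made up;
# the text only says that GRAM and Condor ended up sharing one host.
my_cluster = Topology(domains=[
    Domain(
        name="mycluster",
        users=["alice", "bob", "carol", "dave"],
        hosts=[Host(name="example-host", services=["gram", "condor"])],
    )
])

instance = Instance(instance_id="gpi-52d4c9ec", topology=my_cluster, state="Running")
print(instance.instance_id, instance.state)
```

The point of the sketch is the containment: a topology holds domains, a domain holds users and hosts, and an instance pairs one topology with an identifier and a current state.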
Defining a simple topology
Globus Provision allows you to deploy fully configured Globus systems “in the cloud” (more specifically, we will be deploying them through the EC2 interface, which on FutureGrid points at the Eucalyptus cloud). The system you deploy could be something as simple as a single GridFTP server, or something as complicated as 20 Condor clusters, each with a GRAM and MyProxy server (e.g., if you were teaching a workshop, and wanted to give each student access to their own cluster to play with).
In Globus Provision lingo, the specification of such a system is called a topology. As you’ll see in the following chapters, Globus Provision allows you to define many aspects of a topology: whether it should have a shared filesystem, what users should be created, whether those users must have X.509 certificates, what software should be installed on each machine, etc.
However, we’ll start with the simplest possible topology: a single GridFTP server with two users (user1 and user2) that are authorized to access that server. We can specify this topology using the following file:
[general]
deploy: ec2
domains: simple

[domain-simple]
users: user1 user2
gridftp: yes
This is an example of Globus Provision’s simple topology file format. It provides a simple and high-level format for specifying topologies. In this particular case, we are doing the following:
We are specifying a single domain called simple. A topology can be divided into multiple domains, each with its own set of users, Globus services, etc. For example, if we wanted to deploy 20 separate Condor clusters (each with its own GRAM and MyProxy server), we would need to define 20 separate domains.
The simple domain is configured to have two users (user1 and user2) and a GridFTP server.
The simple topology format is a good way of getting started, but it can be too constrained for more complex topologies. As you’ll see in the following chapters, Globus Provision also provides a much richer and more versatile JSON format for specifying topologies (in fact, the simple topology format gets translated internally into the JSON format).
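Because the simple topology format is an INI-style file, its structure can be inspected with Python's standard configparser module. The following is a minimal sketch, not Globus Provision's own parser: the function name summarize_topology is hypothetical, and it assumes that multiple domains in the domains option would be separated by whitespace, as the user names are.

```python
import configparser

TOPOLOGY = """\
[general]
deploy: ec2
domains: simple

[domain-simple]
users: user1 user2
gridftp: yes
"""


def summarize_topology(conf_text):
    """Map each domain named in [general] to its users and enabled services."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    summary = {}
    for domain in parser.get("general", "domains").split():
        section = "domain-" + domain
        users = parser.get(section, "users", fallback="").split()
        # Treat every other yes/no option as a service switch. Options with
        # other values (such as condor-nodes: 3) are skipped.
        services = []
        for option, value in parser.items(section):
            if option != "users" and value.lower() in ("yes", "no"):
                if parser.getboolean(section, option):
                    services.append(option)
        summary[domain] = {"users": users, "services": sorted(services)}
    return summary


print(summarize_topology(TOPOLOGY))
```

Each name listed under domains is looked up as a [domain-NAME] section, which is why defining 20 clusters would mean listing 20 domain names and writing 20 such sections.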
Running Globus on FutureGrid
The topology file shown above specifies that the topology must be deployed using FutureGrid's Eucalyptus cloud (deploy: ec2), so we need to provide some Eucalyptus parameters that will allow Globus Provision to use your FutureGrid Eucalyptus account to deploy this topology. If you do not already have a Eucalyptus account, please go to [[1]]. More specifically, you will need an Access Key ID, a Secret Access Key, and an SSH keypair. We suggest that you create a keypair called gp-key, and save the keypair file as ~/.euca/gp-key.pem, since many of the sample files assume that naming.
You will need to add the following to the topology file:
[ec2]
keypair: gp-key
keyfile: ~/.euca/gp-key.pem
username: ubuntu
ami: emi-87A41434
instance-type: c1.medium
availability_zone = india
server-hostname = 149.165.146.135
server-port = 8773
server-path = /services/Eucalyptus
Notice how we are telling Globus Provision to use your gp-key keypair, to use c1.medium instances, and to use Globus Provision’s latest 32-bit EMI, along with the necessary pointers to the Eucalyptus cloud on india. The EMI is an Ubuntu machine image that we provide with many software packages preinstalled, which considerably speeds up the deployment of topologies.
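Before launching anything, it can help to check that the [ec2] section is complete and that the keypair file is where the configuration says it is. The sketch below is an optional pre-flight check written for this guide, not part of Globus Provision; the function name check_ec2_section is hypothetical, and treating the listed options as mandatory is an assumption based only on the section shown above.

```python
import configparser
import os

# Options taken from the [ec2] section shown above. Whether Globus Provision
# itself requires all of them is an assumption of this sketch.
EXPECTED_EC2_KEYS = ["keypair", "keyfile", "username", "ami", "instance-type"]


def check_ec2_section(conf_text):
    """Return a list of human-readable problems found in the [ec2] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("ec2"):
        return ["missing [ec2] section"]
    problems = []
    for key in EXPECTED_EC2_KEYS:
        if not parser.has_option("ec2", key):
            problems.append("missing option: " + key)
    if parser.has_option("ec2", "keyfile"):
        # Expand ~ so that ~/.euca/gp-key.pem resolves to a real path.
        keyfile = os.path.expanduser(parser.get("ec2", "keyfile"))
        if not os.path.isfile(keyfile):
            problems.append("keyfile not found: " + keyfile)
    return problems
```

Calling check_ec2_section with the text of your topology file returns an empty list when nothing looks wrong, and otherwise a list of messages you can print before trying again.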
Finally, save the entire topology file (with the [general] and [domain-simple] sections shown earlier and the [ec2] section shown above) as single-gridftp-ec2.conf. You will also need to export your Access Key ID and Secret Key as environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, respectively. For example:
export AWS_ACCESS_KEY_ID=FOOBAR123FOOBAR123
export AWS_SECRET_ACCESS_KEY=FoOBaZ123/FoOBaZ456FoOBaZ789FoOBaZ012FoOBaZ345
If your Eucalyptus environment is already set up properly, you may just need to source your ~/.euca/eucarc file to make sure these settings are in place.
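A quick way to confirm that both variables are visible to the programs you launch is to check the environment from Python. This is a small sketch for convenience; the helper name missing_credentials is hypothetical and nothing here is provided by Globus Provision.

```python
import os

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Please export: " + ", ".join(missing))
    else:
        print("Both credential variables are set.")
```

The check only looks at whether the variables are set, not whether the keys are valid; an invalid key will still only surface when the cloud rejects a request.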
Creating and launching a Globus Provision instance
Ok, we’re ready to actually launch this topology. The first step is to create a Globus Provision instance with that topology:
gp-instance-create -c single-gridftp-ec2.conf
This should immediately return the following:
Created new instance: gpi-52d4c9ec
The gp-instance-create command doesn’t actually deploy the topology, but simply validates that the topology is correct, and creates an entry for it in a database. This entry is called an instance. You can think of the topology as a specification of what you want to deploy and the instance as one particular deployment of that topology.
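If you script your deployments, you will want to capture the instance identifier rather than copy it by hand. The sketch below wraps gp-instance-create with Python's subprocess module and extracts the identifier from the "Created new instance" message shown above. The function names are hypothetical, and the pattern assumes the message always has the form seen in this example.

```python
import re
import subprocess

# Matches the message shown above, e.g. "Created new instance: gpi-52d4c9ec".
# That the message always looks like this is an assumption of this sketch.
INSTANCE_ID_PATTERN = re.compile(r"Created new instance:\s*(gpi-\S+)")


def parse_instance_id(output):
    """Extract the instance identifier from gp-instance-create output."""
    match = INSTANCE_ID_PATTERN.search(output)
    if match is None:
        raise ValueError("no instance identifier found in output")
    return match.group(1)


def create_instance(conf_path):
    """Run gp-instance-create on a topology file and return the new ID."""
    result = subprocess.run(
        ["gp-instance-create", "-c", conf_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return parse_instance_id(result.stdout)
```

For example, create_instance("single-gridftp-ec2.conf") would return a string such as gpi-52d4c9ec that you can pass to the other gp-instance commands.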
To actually launch this instance, we use the gp-instance-start command (make sure you use the identifier returned by gp-instance-create, not the one used in these examples):
gp-instance-start gpi-52d4c9ec
This command will take a few minutes to do its job and, for a while, all you will see is the following:
Starting instance gpi-52d4c9ec...
In a separate console, you can track the progress of the deployment using this command:
gp-instance-describe gpi-52d4c9ec
You should first see something like this:
gpi-52d4c9ec: Starting
Domain 'simple'
    simple-gridftp    Starting
This command is telling us not just the status of the entire instance (Starting) but also of each individual host in the topology’s domains. In this case, Globus Provision “translated” our topology into a single host called simple-gridftp.
After a while, the output of gp-instance-describe will look like this:
gpi-52d4c9ec: Configuring
Domain 'simple'
    simple-gridftp    Configuring    149.165.XX.XX    10.128.XX.XX
At this point, the simple-gridftp host has started, and Globus Provision is in the process of configuring it. Since the host has started, we now know what its actual hostname is. We will use this later to connect to that host.
When gp-instance-start finishes deploying the instance, it will show the following:
Starting instance gpi-52d4c9ec... done!
Started instance in 1 minutes and 22 seconds
And gp-instance-describe will look like this:
gpi-52d4c9ec: Running
Domain 'simple'
    simple-gridftp    Running    149.165.XX.XX    10.128.XX.XX
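Instead of rerunning gp-instance-describe by hand, you can poll it until the instance reports Running. The following sketch does this with Python's subprocess module. The function names, the polling interval, and the timeout are hypothetical choices, and the parsing assumes the output keeps the "instance-id: Status" form shown in the examples above.

```python
import subprocess
import time


def parse_instance_status(describe_output, instance_id):
    """Return the word after 'instance-id:' in the output, or None."""
    prefix = instance_id + ":"
    for line in describe_output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            rest = line[len(prefix):].split()
            return rest[0] if rest else None
    return None


def wait_until_running(instance_id, poll_seconds=10, timeout_seconds=600):
    """Poll gp-instance-describe until the instance status is Running."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["gp-instance-describe", instance_id],
            capture_output=True,
            text=True,
            check=True,
        )
        status = parse_instance_status(result.stdout, instance_id)
        print("Current status:", status)
        if status == "Running":
            return
        time.sleep(poll_seconds)
    raise TimeoutError("instance did not reach Running in time")
```

The loop only watches the instance-level status on the first line; per-host status lines are ignored, so a script that needs a host's address would have to parse those lines separately.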
Now that the instance is running, we are going to connect to the GridFTP server host as one of the users we defined in the topology. When using the simple topology file, your public SSH key will be authorized by default for all users (in fact, their passwords will be disabled, and using an SSH key will be the only way of logging into the hosts).
So, you should be able to log into the GridFTP host like this (make sure you substitute the hostname with the one returned by gp-instance-describe):
ssh user1@149.165.XX.XX
Once you’ve logged in, you will actually be able to play around with some Globus tools. By default, Globus Provision will create user certificates for all users, which means you should be able to create a proxy certificate by running the following:
grid-proxy-init
You should see the following output:
Your identity: /O=Grid/OU=Globus Provision (generated)/CN=user1
Creating proxy ..................................................................... Done
Your proxy is valid until: Wed Aug 17 04:30:07 2011
Next, you can try doing a simple GridFTP transfer:
globus-url-copy gsiftp://`hostname --fqdn`/etc/hostname ./
Once you’re done, just log out of the host, and terminate your instance like this:
gp-instance-terminate gpi-52d4c9ec
You will see the following:
Terminating instance gpi-52d4c9ec... done!