Sierra

UCSD iDataPlex User Manual

  Hostname

• sierra.futuregrid.org

  Login

ssh sierra.futuregrid.org
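
If your local username differs from your FutureGrid account name, give it explicitly (the <username> placeholder below stands for your FutureGrid username):

ssh <username>@sierra.futuregrid.org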

  Storage

Hardware:  Two Sun Fire X4540 servers at 48 TB each.  Specifications for each Sun Fire X4540 server: 

• 2 x 6-Core AMD Opteron Model 2435, 2.6 GHz Processors 
• 32 GB (16 x 2 GB DIMMs) Memory 
• 48 x 1 TB 7200 rpm 3.5-Inch SATA Disks

Capacity:  76.8 TB RAID-Z2 and 5.4 TB of RAID 0 (for scratch)

Storage Interconnect:  Currently mounted to the cluster over Gigabit Ethernet.  The long-term plan is to mount over InfiniBand. 

Filesystem Type:  ZFS 

Filesystem Layout:

• Home directories mounted to Sierra at /N/u/<username>, snapshots taken nightly, quota set at 50 GB 
• Scratch directories mounted to Sierra at /N/scratch/<username>, no backup, quota set at 100 GB 
• Project and software directories mounted to Sierra at /N/soft, snapshots taken nightly, quota set at 50 GB 
• Image directory (internal) mounted to Sierra at /images, snapshots taken nightly, quota set at 6 TB
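
To check how much of a quota is in use, the standard du command works on these directories; a minimal sketch (assuming the <username> paths above, with $USER standing in for your account name):

 du -sh /N/u/$USER
 du -sh /N/scratch/$USER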

Overview of ZFS Data Snapshots

A ZFS snapshot is a read-only copy of a Solaris ZFS file system or volume. Snapshots can be created almost instantly and initially consume no additional disk space within the pool. All users on Sierra can access their snapshots through the hidden ZFS directory at

$HOME/.zfs/

Lost files can be restored from a snapshot with the standard UNIX copy command (cp). See the example below.

Users are expected to make their own permanent backups of valuable data on the home file system. ZFS Snapshots are NOT permanent backups. Users are currently limited to a quota of 50 GB of snapshots. 
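
Since snapshots are not permanent backups, one simple way to keep a permanent copy is to pull important data to another machine with rsync over ssh; a minimal sketch run from a local workstation (the directory name project is only an illustration):

 rsync -av <username>@sierra.futuregrid.org:/N/u/<username>/project ./sierra-backup/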

Example of ZFS Data Snapshot Restore Session

$ ls
1G  4G
$ ls .zfs
snapshot/
$ ls .zfs/snapshot
SNAPSHOT2009-06-22-1245668520/  SNAPSHOT2009-06-24-1245841320/
SNAPSHOT2009-06-22-1245668674/  SNAPSHOT2009-06-25-1245927720/
SNAPSHOT2009-06-23-1245754920/  SNAPSHOT2009-06-26-1246014120/
$ rm 1G
$ ls
4G
$ ls .zfs/snapshot/SNAPSHOT2009-06-26-1246014120/
1G  4G
$ cp .zfs/snapshot/SNAPSHOT2009-06-26-1246014120/1G .
$ ls
1G  4G
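
To restore a file from the most recent snapshot without picking the directory by hand, a small script like the sketch below can be used (the snapshot names embed their creation date and time, so a lexical sort puts the newest one last; the script name and usage are only an illustration):

 #!/bin/bash
 # restore-latest.sh: copy a file from the newest home-directory snapshot
 # Usage: ./restore-latest.sh lostfile
 latest=$(ls -d $HOME/.zfs/snapshot/SNAPSHOT* | sort | tail -1)
 cp "$latest/$1" .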

Dynamic Provisioning Using Moab

THIS FEATURE IS NOT YET SUPPORTED OFFICIALLY

FutureGrid now supports dynamic provisioning through Moab; instructions for using it on Sierra are listed below:

• The executable tools are installed in /usr/local/bin and are already on the default $PATH, so users can run them directly from their home directories. 
• The qnodes command lists all nodes and their status. At present, nodes s36 through s39 are up and running and available for tests/experiments. 
• checknode s36 lists information on node s36. The OS information appears in a line such as:

 Opsys:      statelessrhels5.5  Arch:      x86_64

Two OS options are currently available: statelessrhels5.5 and statefulrhels5.
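
To see at a glance which OS each of the test nodes is currently running, checknode can be combined with grep; a small sketch (assuming the node names s36 through s39 listed above):

 for n in s36 s37 s38 s39; do
     echo -n "$n: "; checknode $n | grep Opsys
 done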

• If all four nodes are running the stateless OS, submitting a command like this:

 msub -l os=statelessrhels5.5 testcmd.sh

will schedule the job on one of the nodes.
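
msub prints the ID of the newly created job on standard output, so it is convenient to capture it for the later checkjob calls; a brief sketch:

 JOBID=$(msub -l os=statelessrhels5.5 testcmd.sh)
 checkjob $JOBID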

• showq lists the current queue information; the submitted job appears in the active jobs section. 
• Check the job's status with checkjob <jobid>. The resource allocation information appears in lines like these:

 Allocated Nodes:
 [s36:1]

In this case the job is scheduled on node s36.

• If all four nodes are running the stateless OS, submitting a job like this:

 msub -l os=statefulrhels5 testcmd.sh

will first dynamically provision the requested OS on one of the nodes and then schedule the job.

• Running showq again lists the jobs. 
• checkjob <jobid> will initially report that the OS-provisioning dependency job has not yet completed:

  NOTE:  job cannot run  (dependency provision-73 jobsuccessfulcomplete not met)

• checkjob provision-68 (the provisioning job ID) lists the provisioning status. 
• Once provisioning is done, checkjob <jobid> shows that the job is scheduled and on which node it is running, for example s37. 
• Running checknode s37 again shows that the OS has changed from statelessrhels5.5 to statefulrhels5. A combined end-to-end sketch is given after the example script below.

• The testcmd.sh script used in the examples above:

 $ cat testcmd.sh 
 #!/bin/bash
 /bin/date
 sleep 300
 /bin/date
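
Putting the steps together, a rough end-to-end sketch of the stateful-provisioning workflow might look like the following (the 30-second polling interval and the grep patterns are illustrative assumptions; the exact checkjob output format may differ):

 #!/bin/bash
 # Submit a job that requests the stateful OS, then wait for it to start.
 JOBID=$(msub -l os=statefulrhels5 testcmd.sh)
 echo "Submitted job $JOBID"
 # Poll until checkjob reports the job as running; provisioning may take a while.
 while ! checkjob $JOBID | grep -q "State: *Running"; do
     sleep 30
 done
 # Show which node the job was allocated to.
 checkjob $JOBID | grep -A1 "Allocated Nodes"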