Xray
IU Cray User Manual
Hostname
• xray.futuregrid.org
Login
ssh xray.futuregrid.org
Filesystem
Compiler
For MPI jobs, use the Cray compiler wrapper cc (which invokes pgcc, the PGI C compiler, by default).
For best performance, add the xtpe-barcelona module:
% module add xtpe-barcelona
Cray Programming Environment Manuals
- http://docs.cray.com/cgi-bin/craydoc.cgi?q=&mode=Search&hw=%22Cray+XT5%22
- http://docs.cray.com/cgi-bin/craydoc.cgi?mode=View;id=S-2396-21
Queue
Currently there is only one queue (batch) available to users on the Cray, and all jobs are automatically routed to that queue.
Listing Queues on Xray
qstat -Q
The primary queue for running jobs on Xray is batch. To obtain details of running jobs and available processors, use the showq command.
/opt/moab/default/bin/showq
Submitting a job
MPI run command: aprun
Example job script (16 processors / 2 nodes):
% cat job.sub
#!/bin/sh
#PBS -l mppwidth=16
#PBS -l mppnppn=8
#PBS -N hpcc-16
#PBS -j oe
#PBS -l walltime=7:00:00
# cd to directory where job was submitted from
cd $PBS_O_WORKDIR
export MPICH_FAST_MEMCPY=1
export MPICH_PTL_MATCH_OFF=1
aprun -n 16 -N 8 -ss -cc cpu hpcc
% qsub job.sub
Looking at the Queue
% qstat
How Do I Submit a Job to the Cray XT5m on FutureGrid?
http://kb.iu.edu/data/azse.html
The XT5m is a 2D mesh of nodes. Each node has two sockets, and each socket has four cores.
The batch scheduler interfaces with a Cray resource scheduler called ALPS (Application Level Placement Scheduler). When you submit a job, the batch scheduler asks ALPS what resources are available, and ALPS then makes the reservation.
Currently ALPS is a "gang scheduler" and only allows one "job" per node. If a user submits a job as aprun -n 1 a.out, ALPS will place that job on one core of one node and leave the other seven cores empty. When the next job comes in, whether from the same user or a different one, it will be scheduled on the next node.
If the user submits a job with aprun -n 10 a.out, the scheduler will put the first eight tasks on the first node and the remaining two tasks on the second node, again leaving six cores empty on the second node. The user can modify the placement with the -N, -S, and -cc options.
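As a rough sketch of this placement arithmetic (assuming eight cores per node, as on the XT5m), the number of nodes a job occupies under aprun -n tasks -N tasks-per-node is the ceiling of tasks divided by tasks-per-node. The function name below is illustrative, not an actual system command:

```shell
# Sketch of ALPS node-count arithmetic for "aprun -n tasks -N per_node".
# Assumes 8 cores per node; nodes_needed is a hypothetical helper.
nodes_needed() {
    tasks=$1
    per_node=$2
    [ "$per_node" -gt 8 ] && per_node=8      # cannot exceed cores per node
    echo $(( (tasks + per_node - 1) / per_node ))  # ceiling division
}

nodes_needed 10 8   # the example above: 8 tasks on node 1, 2 on node 2 -> 2
nodes_needed 10 4   # -N 4 spreads the same ten tasks over 3 nodes
```

This makes visible why under-filled nodes are wasted: the reserved node count is fixed by -n and -N, regardless of how many cores the last node actually uses.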
A user might also run a single job with multiple threads, as with OpenMP. If a user runs the job aprun -n 1 -d 8 a.out, the job will be scheduled to one node and run eight threads, one on each core.
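Inside a submission script, such an OpenMP launch might look like the following sketch; the binary name omp-binary is a placeholder, and note that OMP_NUM_THREADS should match the -d depth so one thread lands on each reserved core:

```shell
# Hypothetical OpenMP launch fragment for a job script (not runnable as-is;
# aprun exists only on the Cray compute environment).
cd $PBS_O_WORKDIR
export OMP_NUM_THREADS=8      # match the aprun -d value
aprun -n 1 -d 8 ./omp-binary  # one task, eight threads, one node
```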
You can run multiple different binaries at the same time on the same node, but only from a single submission. Submitting a script like this will not work as intended:
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 0 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 1 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 2 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 3 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 4 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 5 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 6 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 7 ./my-binary
This will run a job on each core, but not at the same time. To run all the jobs at the same time, you need to wrap all the binaries under a single aprun command:
$ more run.sh
./my-binary1
./my-binary2
./my-binary3
./my-binary4
./my-binary5
./my-binary6
./my-binary7
./my-binary8
$ aprun -n 1 run.sh
Alternatively, use the command aprun -n 1 -d 8 run.sh. To run multiple serial jobs, you must build a batch script that divides the jobs into groups of eight, and the
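The grouping step described above can be sketched in plain shell. The job names and the group_jobs helper below are placeholders; on the real system each printed group would be written into a run script and launched with one aprun command per node:

```shell
# Sketch: split a list of serial jobs into groups of 8 (one node's worth).
# group_jobs is a hypothetical helper, not a system command.
group_jobs() {
    i=0
    group=""
    for job in "$@"; do
        group="$group $job"
        i=$((i + 1))
        if [ "$i" -eq 8 ]; then
            echo "group:$group"   # a full node's worth of serial jobs
            i=0
            group=""
        fi
    done
    # emit the final, possibly partial, group
    [ -n "$group" ] && echo "group:$group"
}

group_jobs job1 job2 job3 job4 job5 job6 job7 job8 job9 job10
```

With ten jobs this prints two groups: the first eight jobs, then the remaining two, matching how ALPS would fill the first node and spill onto a second.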