Xray

IU Cray User Manual

Hostname

• xray.futuregrid.org

Login

ssh xray.futuregrid.org

Filesystem

Compiler

For MPI jobs, use cc (pgcc).

For best performance, add the xtpe-barcelona module:

% module add xtpe-barcelona

Cray Programming Environment Manuals

Queue

Currently there is only one queue (batch) available to users on the Cray, and all jobs are automatically routed to that queue.

Listing Queues on Xray

 qstat -Q

The primary queue for running jobs on Xray is batch. To obtain details of running jobs and available processors, use the showq command.

/opt/moab/default/bin/showq

Submitting a job

MPI run command: aprun

Example job script (16 processors / 2 nodes):

% cat job.sub

#!/bin/sh
#PBS -l mppwidth=16 
#PBS -l mppnppn=8 
#PBS -N hpcc-16 
#PBS -j oe 
#PBS -l walltime=7:00:00 
#cd to directory where job was submitted from 
cd $PBS_O_WORKDIR 
export MPICH_FAST_MEMCPY=1 
export MPICH_PTL_MATCH_OFF=1 
aprun -n 16 -N 8 -ss -cc cpu hpcc
% qsub job.sub
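
In the example above, the mppwidth value matches the aprun -n value (total tasks) and the mppnppn value matches the aprun -N value (tasks per node). The following is a minimal sketch of a helper that keeps those pairs in sync when generating a job script. The function name make_job and its argument order are hypothetical and are not part of any FutureGrid or Cray tool; the walltime and aprun options are simply copied from the example.

```shell
#!/bin/sh
# make_job WIDTH PPN NAME BINARY
# Hypothetical helper: prints a PBS job script to stdout in which the
# PBS resource requests and the aprun arguments use the same values.
make_job() {
    width=$1    # total number of tasks (mppwidth, aprun -n)
    ppn=$2      # tasks per node (mppnppn, aprun -N)
    name=$3     # job name (#PBS -N)
    binary=$4   # program to launch with aprun
    cat <<EOF
#!/bin/sh
#PBS -l mppwidth=$width
#PBS -l mppnppn=$ppn
#PBS -N $name
#PBS -j oe
#PBS -l walltime=7:00:00
# cd to the directory the job was submitted from
cd \$PBS_O_WORKDIR
aprun -n $width -N $ppn -ss -cc cpu $binary
EOF
}

# Print a script equivalent to the 16-task example above.
make_job 16 8 hpcc-16 hpcc
```

Redirecting the output to a file (for example, make_job 16 8 hpcc-16 hpcc > job.sub) gives a script that can then be passed to qsub.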

Looking at the Queue

% qstat

How Do I Submit a Job to the Cray XT5m on FutureGrid?

http://kb.iu.edu/data/azse.html

The XT5m is a 2D mesh of nodes. Each node has two sockets, and each socket has four cores.

The batch scheduler interfaces with a Cray resource scheduler called ALPS. When you submit a job, the batch scheduler talks to ALPS to find out what resources are available, and ALPS then makes the reservation.

Currently ALPS is a "gang scheduler" and only allows one "job" per node. If a user submits a job in the format aprun -n 1 a.out , ALPS will put that job on one core of one node and leave the other seven cores empty. When the next job comes in, either from the same user or a different one, it will schedule that job to the next node.

If the user submits a job with aprun -n 10 a.out , then the scheduler will put the first eight tasks on the first node and the next two tasks on the second node, again leaving six empty cores on the second node. The user can modify the placement with -N , -S , and -cc .
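
The placement arithmetic described above can be sketched in a few lines of shell. This is only an illustration of the default fill-one-node-then-the-next behavior, assuming eight cores per node (two sockets with four cores each); the function name placement is hypothetical, and the optional second argument mimics a tasks-per-node limit such as the one set with -N.

```shell
#!/bin/sh
# Sketch: estimate node usage for `aprun -n TASKS` under default placement.
# Assumes 8 cores per node (2 sockets x 4 cores), as described in the text.
CORES_PER_NODE=8

# placement TASKS [TASKS_PER_NODE]
# Prints "<nodes> <idle_cores>" for the given task count.
placement() {
    tasks=$1
    per_node=${2:-$CORES_PER_NODE}
    # Ceiling division: number of nodes needed to hold all tasks.
    nodes=$(( (tasks + per_node - 1) / per_node ))
    # Cores on the reserved nodes that no task occupies.
    idle=$(( nodes * CORES_PER_NODE - tasks ))
    echo "$nodes $idle"
}

placement 10      # prints "2 6": two nodes, six idle cores
placement 1       # prints "1 7": one node, seven idle cores
placement 16 8    # prints "2 0": two fully used nodes
```

Because ALPS allows only one job per node, the idle cores counted here stay unavailable to other jobs until the job finishes.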

A user might also run a single job with multiple threads, as with OpenMP. If a user runs such a job with aprun -n 1 -d 8 a.out , the job will be scheduled to one node and have eight threads running, one on each core.

You can run multiple, different binaries at the same time on the same node, but only from one submission. Submitting a script like this will not work:

OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 0 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 1 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 2 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 3 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 4 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 5 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 6 ./my-binary
OMP_NUM_THREADS=1 aprun -n 1 -d 1 -cc 7 ./my-binary

This will run a job on each core, but not at the same time, because each aprun command must finish before the next one starts. To run all jobs at the same time, first wrap all the binaries in a script launched by a single aprun command:

$ more run.sh
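#!/bin/sh
# Start each binary in the background so that all eight run concurrently
./my-binary1 &
./my-binary2 &
./my-binary3 &
./my-binary4 &
./my-binary5 &
./my-binary6 &
./my-binary7 &
./my-binary8 &
# Do not exit until every background binary has finished
wait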
$ aprun -n 1 run.sh

Alternatively, use the command aprun -n 1 -d 8 run.sh . To run multiple serial jobs, you must build a batch script to divide the number of jobs into groups of eight, and the