Accessing the NICIS CHPC server
This page provides specific information on accessing the CHPC. Please also read the CHPC Documentation Wiki for more information.
Command Line Access
Before you can gain access to the command line you need an account. To obtain an account, you and your PI should both follow the instructions to apply for resources.
Once your registration has been approved, Linux and OSX users can simply open a terminal and connect to the server via ssh using a command of the form:
localuser@my_linux:~ $ ssh username@lengau.chpc.ac.za
Last login: Mon Feb 29 14:05:35 2016 from 10.128.23.235
username@login1:~ $
where username is the user name you are assigned upon registration.
Once connected users can: use the modules system to get access to bioinformatics programs; create job scripts using editors such as vim or nano; and finally submit and monitor their jobs.
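As a preview, the commands below sketch what such a session typically looks like; the module, script and user names are placeholders taken from the examples later on this page.

module add chpc/BIOMODULES        # make the bioinformatics modules visible
module avail                      # see which applications are installed
nano my_job.qsub                  # write a job script (vim works too)
qsub my_job.qsub                  # submit it to the scheduler
qstat -u username                 # monitor your jobs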
Using Modules
The Modules package is a tool that simplifies shell initialization and lets users easily modify their environment during a session using modulefiles. Each modulefile contains the information needed to configure the shell for an application. Once the Modules package is initialized, the environment can be modified on a per-module basis using the module command which interprets modulefiles.
For now, a quick and simple way of getting access to the bioinformatics software is to use the module command. Running:
username@login2:~ $ module avail
will present you with the various modules available on the system and you should see something like:
------------------------------------------------ /cm/local/modulefiles ------------------------------------------------
cluster-tools/7.1        freeipmi/1.4.8      mvapich2/mlnx/gcc/64/2.1    use.own
cluster-tools-dell/7.1   gcc/5.1.0           null                        version
cmd                      ipmitool/1.8.15     openldap
cmsh                     module-git          openmpi/mlnx/gcc/64/1.8.8
dot                      module-info         shared
----------------------------------------------- /cm/shared/modulefiles ------------------------------------------------
acml/gcc/64/5.3.1                         chpc/python/anaconda/2
acml/gcc/fma4/5.3.1                       chpc/python/anaconda/3
acml/gcc/mp/64/5.3.1                      chpc/qespresso/5.3.0/openmpi-1.8.8/gcc-5.1.0
acml/gcc/mp/fma4/5.3.1                    chpc/R/3.2.3-gcc5.1.0
acml/gcc-int64/64/5.3.1                   chpc/vasp/5.3/openmpi-1.8.8/gcc-5.1.0
acml/gcc-int64/fma4/5.3.1                 chpc/zlib/1.2.8/intel/16.0.1
acml/gcc-int64/mp/64/5.3.1                cmgui/7.1
acml/gcc-int64/mp/fma4/5.3.1              default-environment
acml/open64/64/5.3.1                      gdb/7.9
acml/open64/fma4/5.3.1                    hdf5/1.6.10
acml/open64/mp/64/5.3.1                   hdf5_18/1.8.14
acml/open64/mp/fma4/5.3.1                 hpl/2.1
acml/open64-int64/64/5.3.1                hwloc/1.9.1
acml/open64-int64/fma4/5.3.1              intel/compiler/64/15.0/2015.5.223
acml/open64-int64/mp/64/5.3.1             intel-cluster-checker/2.2.2
acml/open64-int64/mp/fma4/5.3.1           intel-cluster-runtime/ia32/3.7
blas/gcc/64/3.5.0                         intel-cluster-runtime/intel64/3.7
blas/open64/64/3.5.0                      intel-cluster-runtime/mic/3.7
bonnie++/1.97.1                           intel-tbb-oss/ia32/43_20150424oss
chpc/amber/12/openmpi-1.8.8/gcc-5.1.0     intel-tbb-oss/intel64/43_20150424oss
chpc/amber/14/openmpi-1.8.8/gcc-5.1.0     iozone/3_430
chpc/BIOMODULES                           iperf/3.0.11
chpc/cp2k/2.6.2/openmpi-1.8.8/gcc-5.1.0   lapack/gcc/64/3.5.0
...
The bioinformatics modules would add a great deal to that list, so they are kept in a list of their own. Running
username@login2:~ $ module add chpc/BIOMODULES
followed by
username@login2:~ $ module avail
will result in the following being added to the list above (this list will expand considerably as further applications are added to the system):
----------------------------------------- /apps/chpc/scripts/modules/bio/app ------------------------------------------
anaconda/2     doxygen/1.8.11   java/1.8.0_73          ncbi-blast/2.3.0/intel   R/3.2.3-gcc5.1.0
anaconda/3     git/2.8.1        mpiblast/1.6.0         python/2.7.11            texlive/2015
cmake/3.5.1    htop/2.0.1       ncbi-blast/2.3.0/gcc   python/3.5.1
Now, to make use of BLAST say, one can type:
username@login2:~ $ module add ncbi-blast/2.3.0/gcc
The appropriate environment variables are then set (usually this is as simple as adding a directory to the search path). Running:
username@login2:~ $ module list
will show which modules have been loaded. Whereas:
username@login2:~ $ module del modulename
will unload a module. And finally:
username@login2:~ $ module show modulename
will show what module modulename actually does.
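Putting these module commands together, an illustrative BLAST set-up session might look like the sketch below; the module versions are those listed above and may have changed on the live system.

module add chpc/BIOMODULES            # expose the bioinformatics module list
module add ncbi-blast/2.3.0/gcc       # load BLAST
module list                           # confirm which modules are loaded
module show ncbi-blast/2.3.0/gcc      # see exactly what the module changes (e.g. PATH)
which blastn                          # the BLAST executables should now be on the PATH
module del ncbi-blast/2.3.0/gcc       # unload BLAST again when finished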
Job scheduling on the CHPC server using PBS
The CHPC cluster uses PBSPro as its job scheduler. With the exception of interactive jobs, all jobs are submitted to a batch queuing system and only execute when the requested resources become available. All batch jobs are queued according to priority. A user's priority is not static: the CHPC uses the “Fairshare” facility of PBSPro to modify priority based on activity. This is done to ensure the finite resources of the CHPC cluster are shared fairly amongst all users.
Some of the available queues, with their nominal parameters, are given in the following table. Please note that these limits may be adjusted dynamically to manage the load on the system.
| Queue Name | Max. cores per job | Min. cores per job | Max. jobs in queue | Max. jobs running | Max. time (hrs) | Notes | Access |
|---|---|---|---|---|---|---|---|
| serial | 23 | 1 | 24 | 10 | 48 | For single-node non-parallel jobs. | |
| seriallong | 12 | 1 | 24 | 10 | 144 | For very long sub 1-node jobs. | |
| normal | 240 | 25 | 20 | 10 | 48 | The standard queue for parallel jobs | |
| large | 2400 | 264 | 10 | 5 | 96 | For large parallel runs | Restricted |
| xlarge | 6000 | 2424 | 2 | 1 | 96 | For extra-large parallel runs | Restricted |
| express | 2400 | 25 | N/A | 100 total nodes | 96 | For paid commercial use only | Restricted |
| bigmem | 280 | 28 | 4 | 1 | 48 | For the large memory (1TiB RAM) nodes. | Restricted |
| gpu_1 | 10 | 1 | 2 | | 12 | Up to 10 cpus, 1 GPU | |
PBS Pro commands
| Command | Description |
|---|---|
| qstat | View queued jobs. |
| qsub | Submit a job to the scheduler. |
| qdel | Delete one of your jobs from the queue. |
Job script parameters
Parameters for any job submission are specified as #PBS comments in the job script file or as options to the qsub command. The essential options for the CHPC cluster include:
-l select=10:ncpus=24:mpiprocs=24:mem=120gb
sets the size of the job in number of processors:
| Parameter | Meaning |
|---|---|
| select=N | number of nodes needed |
| ncpus=N | number of cores per node |
| mpiprocs=N | number of MPI ranks (processes) per node |
| mem=Ngb | amount of RAM per node |
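For example, the (purely illustrative) request below asks for 2 nodes, each with 24 cores, 24 MPI ranks and 120 GB of RAM, i.e. 48 cores in total:

#PBS -l select=2:ncpus=24:mpiprocs=24:mem=120gb
# 2 nodes x 24 cores = 48 cores and 48 MPI ranks in total; mem is requested per node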
-l walltime=4:00:00
sets the total expected wall clock time in hours:minutes:seconds. Note the wall clock limits for each queue.
The job size and wall clock time must be within the limits imposed on the queue used:
-q normal
to specify the queue.
Each job will draw from the allocation of cpu-hours granted to your Research Programme:
-P PRJT1234
specifies the project identifier short name, which is needed to identify the Research Programme allocation you will draw from for this job. Ask your PI for the project short name and replace PRJT1234 with it.
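These options can equally be given on the qsub command line rather than as #PBS comments in the script. A sketch, using the placeholder project name and a hypothetical script my_job.qsub:

qsub -P PRJT1234 -q normal \
     -l select=10:ncpus=24:mpiprocs=24:mem=120gb \
     -l walltime=4:00:00 \
     my_job.qsub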
Restricted queues
The large and bigmem queues are restricted to users who have a need for them. If you are granted access to these queues then you should also specify that you are a member of the largeq or bigmemq group, respectively. For example:
#PBS -q large
#PBS -W group_list=largeq
or
#PBS -q bigmem
#PBS -W group_list=bigmemq
- my_job.qsub
#!/bin/bash
#PBS -l select=1:ncpus=2
#PBS -l walltime=10:00:00
#PBS -q serial
#PBS -P SHORTNAME
#PBS -o /mnt/lustre/users/username/my_data/stdout.txt
#PBS -e /mnt/lustre/users/username/my_data/stderr.txt
#PBS -N TophatEcoli
#PBS -M myemailaddress@someplace.com
#PBS -m b

module add chpc/BIOMODULES
module add tophat/2.1.1

NP=`cat ${PBS_NODEFILE} | wc -l`
EXE="tophat"
ARGS="--num-threads ${NP} someindex reads1 reads2 -o output_dir"

cd /mnt/lustre/users/username/my_data
${EXE} ${ARGS}
Note that username should be your username and SHORTNAME should be your research programme's code. More details on the job script file can be found in our PBS quickstart guide.
Submit Job Script
Finally submit your job using:
qsub
username@login2:~ $ qsub my_job.qsub
192757.sched01
username@login2:~ $
where 192757.sched01 is the jobID that is returned.
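The returned jobID is also useful if one job should only start after another has completed: PBSPro job dependencies accept it via -W depend. A sketch with hypothetical script names:

JOBID=$(qsub align_job.qsub)                      # e.g. 192757.sched01
qsub -W depend=afterok:${JOBID} postprocess.qsub  # starts only if the first job finishes successfully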
Monitor jobs
Jobs can then be monitored/controlled in several ways:
qstat
check status of pending and running jobs
username@login2:~ $ qstat -u username

sched01:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
192759.sched01  username serial   TophatEcol    --    1  24    --  00:02 Q   --
username@login2:~ $
check status of a particular job
username@login2:~ $ qstat -f 192759.sched01
Job Id: 192759.sched01
    Job_Name = TophatEcoli
    Job_Owner = username@login2.cm.cluster
    resources_used.cpupercent = 0
    resources_used.cput = 00:00:00
    resources_used.mem = 0kb
    resources_used.ncpus = 96
    resources_used.vmem = 0kb
    resources_used.walltime = 00:00:00
    job_state = R
    queue = serial
    server = sched01
    Checkpoint = u
    ctime = Mon Oct 10 06:57:13 2016
    Error_Path = login2.cm.cluster:/mnt/lustre/users/username/my_data/stderr.txt
    exec_host = cnode0962/0*24
    exec_vnode = (cnode0962:ncpus=24)
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Mon Oct 10 06:57:15 2016
    Output_Path = login2.cm.cluster:/mnt/lustre/users/username/my_data/stdout.txt
    Priority = 0
    qtime = Mon Oct 10 06:57:13 2016
    Rerunable = True
    Resource_List.ncpus = 24
    Resource_List.nodect = 1
    Resource_List.place = free
    Resource_List.select = 1:ncpus=24
    Resource_List.walltime = 00:02:00
    stime = Mon Oct 10 06:57:15 2016
    session_id = 36609
    jobdir = /mnt/lustre/users/username
    substate = 42
    Variable_List = PBS_O_SYSTEM=Linux,PBS_O_SHELL=/bin/bash,
        PBS_O_HOME=/home/dane,PBS_O_LOGNAME=username,
        PBS_O_WORKDIR=/mnt/lustre/users/username/my_data,
        PBS_O_LANG=en_ZA.UTF-8,
        PBS_O_PATH=/apps/chpc/bio/anaconda3/bin:/apps/chpc/bio/R/3.3.1/gcc-6.2.0/bin:/apps/chpc/bio/bzip2/1.0.6/bin:/apps/chpc/bio/curl/7.50.0/bin:/apps/chpc/bio/lib/png/1.6.21/bin:/apps/chpc/bio/openmpi/2.0.0/gcc-6.2.0_java-1.8.0_73/bin:...
    comment = Job run at Mon Oct 10 at 06:57 on (cnode0962:ncpus=24)+(cnode0966:ncpus=24)+(cnode0971:ncpus=24)+(cnode0983:ncpus=24)
    etime = Mon Oct 10 06:57:13 2016
    umask = 22
    run_count = 1
    eligible_time = 00:00:00
    Submit_arguments = my_job.qsub
    pset = rack=cx14
    project = SHORTNAME
username@login01:~ $
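Two further monitoring tricks that may be handy (the interval and job ID here are arbitrary):

watch -n 60 qstat -u username                    # refresh the listing every 60 seconds (Ctrl-C to stop)
qstat -f 192759.sched01 | grep resources_used    # quick summary of what a running job is consuming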
Cancel jobs
qdel
username@login01:~ $ qdel 192759.sched01
username@login01:~ $
Example interactive job request
To request an interactive session on a single core, the full command for qsub is:
qsub -I -P PROJ0101 -q serial -l select=1:ncpus=1:mpiprocs=1:nodetype=haswell_reg
To request an interactive session on a full node, the full command for qsub is:
qsub -I -P PROJ0101 -q smp -l select=1:ncpus=24:mpiprocs=24:nodetype=haswell_reg
Note:
- Please think carefully about whether you really need a full node, or if 1, 2 or 3 cores might be sufficient
- -I selects an interactive job
- You can add -X to get X-forwarding
- You still must specify your project
- The queue must be smp, serial or test
- Interactive jobs only get one node: select=1
- For the smp queue you can request several cores: ncpus=24
- You can run MPI code: indicate how many ranks you want with mpiprocs=
If you find your interactive session timing out too soon then add -l walltime=4:0:0 to the above command line to request the maximum 4 hours.
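Putting those pieces together, a complete interactive request could look like the sketch below (the project name, core count and modules are placeholders); once the prompt returns you are on a compute node and can load modules and work interactively:

qsub -I -X -P PROJ0101 -q serial -l select=1:ncpus=4:mpiprocs=4:nodetype=haswell_reg -l walltime=4:0:0
# ...wait for the session to start, then, on the compute node:
module add chpc/BIOMODULES
module add ncbi-blast/2.3.0/gcc
blastn -version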
Basic examples
bowtie
Things to note about this script: bowtie does not run across multiple nodes, so using anything other than select=1 will result in compute resources being wasted.
Job script
Your job script, called bowtie_script.qsub, will look something like this:
- bowtie_script.qsub
#! /bin/bash
#PBS -l select=1:ncpus=24
#PBS -l place=excl
#PBS -l walltime=06:00:00
#PBS -q workq
#PBS -o /home/username/lustre/some_reads/stdout.txt
#PBS -e /home/username/lustre/some_reads/stderr.txt
#PBS -M youremail@address.com
#PBS -m be
#PBS -N bowtiejob
##################
MODULEPATH=/opt/gridware/bioinformatics/modules:$MODULEPATH
source /etc/profile.d/modules.sh
####### module add
module add bowtie2/2.2.2

NP=`cat ${PBS_NODEFILE} | wc -l`
EXE="bowtie2"
forward_reads="A_reads_1.fq,B_reads_1.fq"
reverse_reads="A_reads_2.fq,B_reads_2.fq"
output_file="piggy_hits.sam"
# bowtie2 takes the index via -x and writes SAM output via -S
ARGS="-x sscrofa --threads ${NP} -q -1 ${forward_reads} -2 ${reverse_reads} -S ${output_file}"

cd ${PBS_O_WORKDIR}   # run from the directory the job was submitted from
${EXE} ${ARGS}
Note: username should be replaced with your actual user name!
Submit your job
Finally submit your job using:
user@login01:~ $ qsub bowtie_script.qsub
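After submission you can follow the job as described earlier; an illustrative follow-up, assuming the paths used in the script above:

user@login01:~ $ qstat -u user                                      # wait for the job to move from Q to R and finish
user@login01:~ $ tail /home/username/lustre/some_reads/stderr.txt   # bowtie2 writes its alignment summary to stderr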