Accessing the NICIS CHPC server
This page provides specific information on accessing the CHPC. Please also read the CHPC Documentation Wiki for more information.
Command Line Access
Before you can gain access to the command line you need an account. To obtain an account, you and your PI should both follow the instructions to apply for resources.
Once your registration has been approved, Linux and OSX users can simply open a terminal and connect to the server via ssh using a command of the form:
localuser@my_linux:~ $ ssh username@lengau.chpc.ac.za
Last login: Mon Feb 29 14:05:35 2016 from 10.128.23.235
username@login1:~ $
where username is the user name you are assigned upon registration.
Once connected users can: use the modules system to get access to bioinformatics programs; create job scripts using editors such as vim or nano; and finally submit and monitor their jobs.
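As a preview, the commands below sketch what such a session typically looks like; the module, script and user names are placeholders taken from the examples later on this page.

module add chpc/BIOMODULES        # make the bioinformatics modules visible
module avail                      # see which applications are installed
nano my_job.qsub                  # write a job script (vim works too)
qsub my_job.qsub                  # submit it to the scheduler
qstat -u username                 # monitor your jobs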
Using Modules
The Modules package is a tool that simplifies shell initialization and lets users easily modify their environment during a session using modulefiles. Each modulefile contains the information needed to configure the shell for an application. Once the Modules package is initialized, the environment can be modified on a per-module basis using the module command which interprets modulefiles.
For now, a quick and simple way of getting access to the bioinformatics software is to use the module command. Running:
username@login2:~ $ module avail
will present you with the various modules available on the system and you should see something like:
------------------------------------------------ /cm/local/modulefiles ------------------------------------------------
cluster-tools/7.1        freeipmi/1.4.8      mvapich2/mlnx/gcc/64/2.1    use.own
cluster-tools-dell/7.1   gcc/5.1.0           null                        version
cmd                      ipmitool/1.8.15     openldap
cmsh                     module-git          openmpi/mlnx/gcc/64/1.8.8
dot                      module-info         shared
----------------------------------------------- /cm/shared/modulefiles ------------------------------------------------
acml/gcc/64/5.3.1                         chpc/python/anaconda/2
acml/gcc/fma4/5.3.1                       chpc/python/anaconda/3
acml/gcc/mp/64/5.3.1                      chpc/qespresso/5.3.0/openmpi-1.8.8/gcc-5.1.0
acml/gcc/mp/fma4/5.3.1                    chpc/R/3.2.3-gcc5.1.0
acml/gcc-int64/64/5.3.1                   chpc/vasp/5.3/openmpi-1.8.8/gcc-5.1.0
acml/gcc-int64/fma4/5.3.1                 chpc/zlib/1.2.8/intel/16.0.1
acml/gcc-int64/mp/64/5.3.1                cmgui/7.1
acml/gcc-int64/mp/fma4/5.3.1              default-environment
acml/open64/64/5.3.1                      gdb/7.9
acml/open64/fma4/5.3.1                    hdf5/1.6.10
acml/open64/mp/64/5.3.1                   hdf5_18/1.8.14
acml/open64/mp/fma4/5.3.1                 hpl/2.1
acml/open64-int64/64/5.3.1                hwloc/1.9.1
acml/open64-int64/fma4/5.3.1              intel/compiler/64/15.0/2015.5.223
acml/open64-int64/mp/64/5.3.1             intel-cluster-checker/2.2.2
acml/open64-int64/mp/fma4/5.3.1           intel-cluster-runtime/ia32/3.7
blas/gcc/64/3.5.0                         intel-cluster-runtime/intel64/3.7
blas/open64/64/3.5.0                      intel-cluster-runtime/mic/3.7
bonnie++/1.97.1                           intel-tbb-oss/ia32/43_20150424oss
chpc/amber/12/openmpi-1.8.8/gcc-5.1.0     intel-tbb-oss/intel64/43_20150424oss
chpc/amber/14/openmpi-1.8.8/gcc-5.1.0     iozone/3_430
chpc/BIOMODULES                           iperf/3.0.11
chpc/cp2k/2.6.2/openmpi-1.8.8/gcc-5.1.0   lapack/gcc/64/3.5.0
...
The bioinformatics modules would add a great deal to that list, so they are kept in a list of their own. Running
username@login2:~ $ module add chpc/BIOMODULES
followed by
username@login2:~ $ module avail
will result in the following being added to the list above (this list will expand considerably as further applications are added to the system):
----------------------------------------- /apps/chpc/scripts/modules/bio/app ------------------------------------------
anaconda/2     doxygen/1.8.11   java/1.8.0_73          ncbi-blast/2.3.0/intel   R/3.2.3-gcc5.1.0
anaconda/3     git/2.8.1        mpiblast/1.6.0         python/2.7.11            texlive/2015
cmake/3.5.1    htop/2.0.1       ncbi-blast/2.3.0/gcc   python/3.5.1
Now, to make use of BLAST say, one can type:
username@login2:~ $ module add ncbi-blast/2.3.0/gcc
The appropriate environment variables are then set (usually this is as simple as adding a directory to the search path). Running:
username@login2:~ $ module list
will show which modules have been loaded. Whereas:
username@login2:~ $ module del modulename
will unload a module. And finally:
username@login2:~ $ module show modulename
will show what module modulename actually does.
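Putting these module commands together, an illustrative BLAST set-up session might look like the sketch below; the module versions are those listed above and may have changed on the live system.

module add chpc/BIOMODULES            # expose the bioinformatics module list
module add ncbi-blast/2.3.0/gcc       # load BLAST
module list                           # confirm which modules are loaded
module show ncbi-blast/2.3.0/gcc      # see exactly what the module changes (e.g. PATH)
which blastn                          # the BLAST executables should now be on the PATH
module del ncbi-blast/2.3.0/gcc       # unload BLAST again when finished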
Job scheduling on the CHPC server using PBS
The CHPC cluster uses PBSPro as its job scheduler. With the exception of interactive jobs, all jobs are submitted to a batch queuing system and only execute when the requested resources become available. All batch jobs are queued according to priority. A user's priority is not static: the CHPC uses the “Fairshare” facility of PBSPro to modify priority based on activity. This is done to ensure the finite resources of the CHPC cluster are shared fairly amongst all users.
Some of the available queues, with their nominal parameters, are given in the following table. Please note that these limits may be adjusted dynamically to manage the load on the system.
| Queue Name | Max. cores per job | Min. cores per job | Max. jobs in queue | Max. jobs running | Max. time (hrs) | Notes | Access |
|---|---|---|---|---|---|---|---|
| serial | 23 | 1 | 24 | 10 | 48 | For single-node non-parallel jobs. | |
| seriallong | 12 | 1 | 24 | 10 | 144 | For very long sub 1-node jobs. | |
| normal | 240 | 25 | 20 | 10 | 48 | The standard queue for parallel jobs | |
| large | 2400 | 264 | 10 | 5 | 96 | For large parallel runs | Restricted |
| xlarge | 6000 | 2424 | 2 | 1 | 96 | For extra-large parallel runs | Restricted |
| express | 2400 | 25 | N/A | 100 total nodes | 96 | For paid commercial use only | Restricted |
| bigmem | 280 | 28 | 4 | 1 | 48 | For the large memory (1TiB RAM) nodes. | Restricted |
| gpu_1 | 10 | 1 | 2 | | 12 | Up to 10 cpus, 1 GPU | |
PBS Pro commands
| Command | Description |
|---|---|
| qstat | View queued jobs. |
| qsub | Submit a job to the scheduler. |
| qdel | Delete one of your jobs from the queue. |
Job script parameters
Parameters for any job submission are specified as #PBS comments in the job script file or as options to the qsub command. The essential options for the CHPC cluster include:
-l select=10:ncpus=24:mpiprocs=24:mem=120gb
sets the size of the job in number of processors:
| Parameter | Meaning |
|---|---|
| select=N | number of nodes needed |
| ncpus=N | number of cores per node |
| mpiprocs=N | number of MPI ranks (processes) per node |
| mem=Ngb | amount of RAM per node |
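For example, the (purely illustrative) request below asks for 2 nodes, each with 24 cores, 24 MPI ranks and 120 GB of RAM, i.e. 48 cores in total:

#PBS -l select=2:ncpus=24:mpiprocs=24:mem=120gb
# 2 nodes x 24 cores = 48 cores and 48 MPI ranks in total; mem is requested per node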
-l walltime=4:00:00
sets the total expected wall clock time in hours:minutes:seconds. Note the wall clock limits for each queue.
The job size and wall clock time must be within the limits imposed on the queue used:
-q normal
to specify the queue.
Each job will draw from the allocation of cpu-hours granted to your Research Programme:
-P PRJT1234
specifies the project identifier short name, which is needed to identify the Research Programme allocation you will draw from for this job. Ask your PI for the project short name and replace PRJT1234 with it.
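These options can equally be given on the qsub command line rather than as #PBS comments in the script. A sketch, using the placeholder project name and a hypothetical script my_job.qsub:

qsub -P PRJT1234 -q normal \
     -l select=10:ncpus=24:mpiprocs=24:mem=120gb \
     -l walltime=4:00:00 \
     my_job.qsub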
Restricted queues
The large and bigmem queues are restricted to users who have a need for them. If you are granted access to these queues then you should also specify that you are a member of the largeq or bigmemq group, respectively. For example:
#PBS -q large
#PBS -W group_list=largeq
or
#PBS -q bigmem
#PBS -W group_list=bigmemq
- my_job.qsub
#!/bin/bash
#PBS -l select=1:ncpus=2
#PBS -l walltime=10:00:00
#PBS -q serial
#PBS -P SHORTNAME
#PBS -o /mnt/lustre/users/username/my_data/stdout.txt
#PBS -e /mnt/lustre/users/username/my_data/stderr.txt
#PBS -N TophatEcoli
#PBS -M myemailaddress@someplace.com
#PBS -m b

module add chpc/BIOMODULES
module add tophat/2.1.1

NP=`cat ${PBS_NODEFILE} | wc -l`
EXE="tophat"
ARGS="--num-threads ${NP} someindex reads1 reads2 -o output_dir"

cd /mnt/lustre/users/username/my_data
${EXE} ${ARGS}
Note that username should be your username and SHORTNAME should be your research programme's code. More details on the job script file can be found in our PBS quickstart guide.
Submit Job Script
Finally submit your job using:
qsub
username@login2:~ $ qsub my_job.qsub
192757.sched01
username@login2:~ $
where 192757.sched01 is the jobID that is returned.
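The returned jobID is also useful if one job should only start after another has completed: PBSPro job dependencies accept it via -W depend. A sketch with hypothetical script names:

JOBID=$(qsub align_job.qsub)                      # e.g. 192757.sched01
qsub -W depend=afterok:${JOBID} postprocess.qsub  # starts only if the first job finishes successfully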
Monitor jobs
Jobs can then be monitored/controlled in several ways:
qstat
check status of pending and running jobs
username@login2:~ $ qstat -u username

sched01:
                                                            Req'd  Req'd   Elap
Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
192759.sched01  username serial   TophatEcol    --    1  24    --  00:02 Q   --
username@login2:~ $
check status of a particular job
username@login2:~ $ qstat -f 192759.sched01
Job Id: 192759.sched01
    Job_Name = TophatEcoli
    Job_Owner = username@login2.cm.cluster
    resources_used.cpupercent = 0
    resources_used.cput = 00:00:00
    resources_used.mem = 0kb
    resources_used.ncpus = 96
    resources_used.vmem = 0kb
    resources_used.walltime = 00:00:00
    job_state = R
    queue = serial
    server = sched01
    Checkpoint = u
    ctime = Mon Oct 10 06:57:13 2016
    Error_Path = login2.cm.cluster:/mnt/lustre/users/username/my_data/stderr.txt
    exec_host = cnode0962/0*24
    exec_vnode = (cnode0962:ncpus=24)
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Mon Oct 10 06:57:15 2016
    Output_Path = login2.cm.cluster:/mnt/lustre/users/username/my_data/stdout.txt
    Priority = 0
    qtime = Mon Oct 10 06:57:13 2016
    Rerunable = True
    Resource_List.ncpus = 24
    Resource_List.nodect = 1
    Resource_List.place = free
    Resource_List.select = 1:ncpus=24
    Resource_List.walltime = 00:02:00
    stime = Mon Oct 10 06:57:15 2016
    session_id = 36609
    jobdir = /mnt/lustre/users/username
    substate = 42
    Variable_List = PBS_O_SYSTEM=Linux,PBS_O_SHELL=/bin/bash,
        PBS_O_HOME=/home/dane,PBS_O_LOGNAME=username,
        PBS_O_WORKDIR=/mnt/lustre/users/username/my_data,
        PBS_O_LANG=en_ZA.UTF-8,
        PBS_O_PATH=/apps/chpc/bio/anaconda3/bin:/apps/chpc/bio/R/3.3.1/gcc-6.2.0/bin:/apps/chpc/bio/bzip2/1.0.6/bin:/apps/chpc/bio/curl/7.50.0/bin:/apps/chpc/bio/lib/png/1.6.21/bin:/apps/chpc/bio/openmpi/2.0.0/gcc-6.2.0_java-1.8.0_73/bin:...
    comment = Job run at Mon Oct 10 at 06:57 on (cnode0962:ncpus=24)+(cnode0966:ncpus=24)+(cnode0971:ncpus=24)+(cnode0983:ncpus=24)
    etime = Mon Oct 10 06:57:13 2016
    umask = 22
    run_count = 1
    eligible_time = 00:00:00
    Submit_arguments = my_job.qsub
    pset = rack=cx14
    project = SHORTNAME
username@login01:~ $
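Two further monitoring tricks that may be handy (the interval and job ID here are arbitrary):

watch -n 60 qstat -u username                    # refresh the listing every 60 seconds (Ctrl-C to stop)
qstat -f 192759.sched01 | grep resources_used    # quick summary of what a running job is consuming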
Cancel jobs
qdel
username@login01:~ $ qdel 192759.sched01
username@login01:~ $
Example interactive job request
To request an interactive session on a single core, the full command for qsub is:
qsub -I -P PROJ0101 -q serial -l select=1:ncpus=1:mpiprocs=1:nodetype=haswell_reg
To request an interactive session on a full node, the full command for qsub is:
qsub -I -P PROJ0101 -q smp -l select=1:ncpus=24:mpiprocs=24:nodetype=haswell_reg
Note:
- Please think carefully about whether you really need a full node, or if 1, 2 or 3 cores might be sufficient
- -I selects an interactive job
- You can add -X to get X-forwarding
- You still must specify your project
- The queue must be smp, serial or test
- Interactive jobs only get one node: select=1
- For the smp queue you can request several cores: ncpus=24
- You can run MPI code: indicate how many ranks you want with mpiprocs=
If you find your interactive session timing out too soon then add -l walltime=4:0:0 to the above command line to request the maximum 4 hours.
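Putting those pieces together, a complete interactive request could look like the sketch below (the project name, core count and modules are placeholders); once the prompt returns you are on a compute node and can load modules and work interactively:

qsub -I -X -P PROJ0101 -q serial -l select=1:ncpus=4:mpiprocs=4:nodetype=haswell_reg -l walltime=4:0:0
# ...wait for the session to start, then, on the compute node:
module add chpc/BIOMODULES
module add ncbi-blast/2.3.0/gcc
blastn -version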
Basic examples
bowtie
Things to note about this script: bowtie does not run across multiple nodes, so using anything other than select=1 will result in compute resources being wasted.
Job script
Your job script, called bowtie_script.qsub, will look something like this:
- bowtie_script.qsub
#! /bin/bash
#PBS -l select=1:ncpus=24
#PBS -l place=excl
#PBS -l walltime=06:00:00
#PBS -q workq
#PBS -o /home/username/lustre/some_reads/stdout.txt
#PBS -e /home/username/lustre/some_reads/stderr.txt
#PBS -M youremail@address.com
#PBS -m be
#PBS -N bowtiejob
##################
MODULEPATH=/opt/gridware/bioinformatics/modules:$MODULEPATH
source /etc/profile.d/modules.sh
####### module add
module add bowtie2/2.2.2

NP=`cat ${PBS_NODEFILE} | wc -l`
EXE="bowtie2"
forward_reads="A_reads_1.fq,B_reads_1.fq"
reverse_reads="A_reads_2.fq,B_reads_2.fq"
output_file="piggy_hits.sam"
# bowtie2 takes the index via -x and writes SAM output via -S
ARGS="-x sscrofa --threads ${NP} -q -1 ${forward_reads} -2 ${reverse_reads} -S ${output_file}"

cd ${PBS_O_WORKDIR}   # run from the directory the job was submitted from
${EXE} ${ARGS}
Note: username should be replaced with your actual user name!
Submit your job
Finally submit your job using:
user@login01:~ $ qsub bowtie_script.qsub
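After submission you can follow the job as described earlier; an illustrative follow-up, assuming the paths used in the script above:

user@login01:~ $ qstat -u user                                      # wait for the job to move from Q to R and finish
user@login01:~ $ tail /home/username/lustre/some_reads/stderr.txt   # bowtie2 writes its alignment summary to stderr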