Shell for Bioinformatics
Learning Objectives
- Understand the necessity for, and use of, the command line interface (bash/shell).
Installations
Access to a high-performance computing (HPC) server is by logging in via ssh from a utility application called a "terminal", which is used to perform tasks on the command line (shell). The terminal application differs between macOS and Windows.
Mac users:
No installation requirements.
Windows users will use MobaXterm, a comprehensive remote computing tool designed for Windows.
Instructions for SAIAB students and researchers with access to the SAIAB lab417 cluster
To run through the code in the lessons below, you will need to be logged into lab417 and working on a compute node (i.e. your command prompt should have the word SLURM in it).
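As a complement to looking for the word SLURM in the prompt, a shell can also test for a compute-node session directly. The sketch below is one possible check, not part of the original lesson. It assumes the cluster's Slurm installation exports the SLURM_JOB_ID environment variable inside jobs (including interactive srun sessions), which is standard Slurm behaviour but has not been verified on lab417. The function name on_compute_node is hypothetical.

```shell
# Hypothetical helper: succeeds (exit status 0) when the current shell
# is running inside a Slurm job, and fails otherwise.
# Assumption: Slurm sets SLURM_JOB_ID inside jobs and interactive srun sessions.
on_compute_node() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

# Report where we are before running any lesson code.
if on_compute_node; then
    echo "On a compute node (Slurm job ${SLURM_JOB_ID})"
else
    echo "Not inside a Slurm job: request a compute node with srun first"
fi
```

Pasting this into a terminal on the login node should report that you are not inside a Slurm job. After a successful srun it should report the job number instead.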
1. Log in to the server and enter your password:
    $ ssh username@lab417.saiab.ac.za
   Alternatively, use the IP address and enter your password:
    $ ssh username@172.20.142.126
2. Get on a compute node with the command below, or as specified in the lesson. It requests 1 CPU, 100 MB of memory and a time limit of 2 hours 30 minutes:
    $ srun --cpus-per-task=1 -t 0-2:30 --mem 100M --pty /bin/bash
3. Your command prompt should now have the word SLURM in it. For example:
    [SLURM] (base) evilliers@lab417:~$
4. If you log out (by using the exit command twice), please follow points 1. and 2. above to log back in and get on a compute node when you restart with the self-learning.
Lessons
Day I
- Introduction to Shell (30 min)
- Wildcards and shortcuts in Shell (30 min)
- Examining and creating files (30 min)
- Searching and redirection (60 min)
- Shell scripts and variables in Shell (60 min)
Day II
- Loops and automation (60 min)
- Permissions and Environment Variables (40 min)
- Introduction to High-performance computing (30 min)
- Job scheduling on high-performance compute servers using SLURM on the SAIAB server (60 min). Users wishing to use the CHPC server will work with the PBS job scheduling system and modules; details are in the CHPC-specific lesson.
Resources
Cheat sheets:
- A Critical Guide to Unix
- Slurm_Cheat_Sheet.pdf
- PBS Cheat Sheet.pdf
- Unix/Linux Command Cheat Sheet
- Unix shell cheat sheet
Online tutorials:
- Explain Shell
- Introduction to the Command Line for Genomics
- BASH Programming - Introduction HOW-TO
- Bioinformatics from the Command Line
This lesson has been modified from a course developed by members of the teaching team at the Harvard Chan Bioinformatics Core (HBC). These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.