Wrap-up and Q&A
Learning objectives
- Review the skills covered across the two days
- Understand the core reproducibility habits to carry into real analyses
- Know where to go next — including the ONT genome assembly course
Approximate time: 20 minutes
What you have learned
Over the two days you have built a practical foundation for working on the lab417 cluster:
Day 1 — Unix fundamentals & shell scripting
- Navigating the file system and working with files (cd, ls, cp, mv, mkdir, rm)
- Wildcards and shortcuts for selecting files quickly
- Viewing and creating files (cat, less, head, tail, nano)
- File permissions and environment variables (chmod, $PATH, .bashrc)
- Searching and redirection (grep, |, >, cut, sort, awk)
- Writing shell scripts with variables and for loops for automation
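As a reminder of how several Day 1 pieces fit together (a wildcard, a variable, a for loop and redirection), here is a minimal sketch. The function name count_reads and the data/ directory are hypothetical, and the read count assumes uncompressed FASTQ files with exactly four lines per read.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of reads in every .fastq file
# in a directory. Assumes four lines per read (no wrapped sequences).
count_reads() {
    local dir="$1"
    local fq lines
    # The wildcard *.fastq selects every FASTQ file in the directory.
    for fq in "$dir"/*.fastq; do
        # Skip the unexpanded pattern if no file matched.
        [ -e "$fq" ] || continue
        # Redirect the file into wc so only the line count is printed.
        lines=$(wc -l < "$fq")
        printf '%s\t%d\n' "$(basename "$fq")" "$(( lines / 4 ))"
    done
}

# Example call (data/ is a placeholder path):
#   count_reads data/ > read_counts.tsv
```

Writing the loop as a function in a script, rather than typing it at the prompt, means the same count can be repeated on new data without retyping anything.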
Day 2 — HPC, data handling & Conda
- HPC architecture and submitting jobs with SLURM (srun, sbatch, squeue, #SBATCH directives)
- Moving data on and off the server with scp and rsync
- Working with compressed files (gzip, zcat, tar) and keeping jobs alive with screen
- Managing software with Conda: environments, installing tools from Bioconda, exporting to YAML
- A real ONT QC practical with nanoq and seqkit
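For reference, a batch script with #SBATCH directives can be as small as the sketch below. The job name, resource values and log file name are placeholders rather than lab417 defaults, so adjust them to the limits you were given on the cluster.

```shell
#!/usr/bin/env bash
# Minimal SLURM batch script (sketch). All resource values are placeholders.
#SBATCH --job-name=hello
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:05:00
#SBATCH --output=hello_%j.log

# SLURM sets SLURM_JOB_ID inside a job; fall back to a label when run by hand.
job_id="${SLURM_JOB_ID:-interactive}"
node="$(uname -n)"

echo "Job ${job_id} running on ${node}"
```

Saved under a name of your choice (hello.sh here is only an example), the script would be submitted with sbatch hello.sh and monitored with squeue -u "$USER". Because the #SBATCH lines are ordinary comments to bash, the same file also runs directly with bash hello.sh for a quick check before submitting.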
Reproducibility — the habits that matter
The single most important thing to take away is reproducibility. A few simple habits make your work trustworthy and repeatable:
- Pin tool versions during an active analysis — different versions can give different results.
- Export your environment (conda env export > environment.yml) and keep it with your project. Include it as supplementary material when you publish.
- Script your steps instead of typing commands ad hoc, so the analysis can be re-run exactly.
- Document data sources and commands in a short README alongside your results.
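One way to turn these habits into a routine is a small helper that exports the active Conda environment into the project directory and logs the date in the README. This is only a sketch: the function name snapshot_project and the project path are hypothetical, and the README file name is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: snapshot the active Conda environment into a
# project directory and note what happened in the project's README.
snapshot_project() {
    local project_dir="$1"
    local today
    today="$(date +%Y-%m-%d)"
    mkdir -p "$project_dir"
    if command -v conda >/dev/null 2>&1; then
        # Exports the currently active environment with exact versions.
        conda env export > "$project_dir/environment.yml"
        echo "${today}: exported Conda environment to environment.yml" >> "$project_dir/README"
    else
        # Record the gap rather than failing silently.
        echo "${today}: conda not found, environment not exported" >> "$project_dir/README"
    fi
}

# Example call (the path is a placeholder):
#   snapshot_project ~/projects/ont_qc
```

Run it with the analysis environment activated, so that the exported file describes the tools you actually used.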
Where to go next
This week continues with the ONT genome assembly course, which builds directly on what you practiced here. You will reuse the same Conda workflow to install the assembly tools (flye, medaka) alongside the QC tools you already met (nanoq, seqkit), and you will submit those jobs to SLURM on lab417 exactly as covered on Day 2.
If you work on the CHPC Lengau cluster instead, which uses the PBS scheduler rather than SLURM, see the CHPC-specific lesson for the equivalent job-submission commands.
Open Q&A
Use the remaining time to ask anything from the two days. Some questions worth thinking about:
- Which of these skills will you use first in your own project?
- What does your data look like, and how will you get it onto lab417?
- What software will you need, and how will you set up an environment for it?
- How will you make sure someone else could reproduce your analysis?
SAIAB AGRP Bioinformatics Training. Open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0).