Wrap-up and Q&A

Learning objectives

Review the skills covered across the two days
Understand the core reproducibility habits to carry into real analyses
Know where to go next — including the ONT genome assembly course

Approximate time: 20 minutes

What you have learned

Over the two days you have built a complete foundation for working on the lab417 cluster:

Day 1 — Unix fundamentals & shell scripting

Navigating the file system and working with files (cd, ls, cp, mv, mkdir, rm)
Wildcards and shortcuts for selecting files quickly
Viewing and creating files (cat, less, head, tail, nano)
File permissions and environment variables (chmod, $PATH, .bashrc)
Searching, redirection, and text processing (grep, |, >, cut, sort, awk)
Writing shell scripts with variables and for loops for automation
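
As a quick reminder of how the Day 1 pieces fit together, here is a minimal sketch that combines a variable, a wildcard, a for loop, grep, a pipe, and redirection. The directory name, file names, and sequences are made up for illustration and are not course data.

```shell
#!/usr/bin/env bash
# Recap sketch: count sequences per FASTA file and save a sorted table.
# All names below (recap_demo, sample_A, sample_B) are hypothetical.
set -euo pipefail

workdir="recap_demo"
mkdir -p "$workdir"

# Create two tiny demo FASTA files so the script is self-contained.
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$workdir/sample_A.fasta"
printf '>seq1\nTTAA\n' > "$workdir/sample_B.fasta"

# Loop over every FASTA file selected by the wildcard.
for fasta in "$workdir"/*.fasta; do
    name=$(basename "$fasta" .fasta)      # strip directory and extension
    n_seqs=$(grep -c '^>' "$fasta")       # header lines start with ">"
    printf '%s\t%s\n' "$name" "$n_seqs"
done | sort -k2,2nr > "$workdir/counts.tsv"   # sort by count, largest first

cat "$workdir/counts.tsv"
```

The loop's whole output is piped into sort once, rather than sorting inside the loop, so the table is ordered across all files.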

Day 2 — HPC, data handling & Conda

HPC architecture and submitting jobs with SLURM (srun, sbatch, squeue, #SBATCH directives)
Moving data on and off the server with scp and rsync
Working with compressed files (gzip, zcat, tar) and keeping jobs alive with screen
Managing software with Conda — environments, installing tools from Bioconda, exporting to YAML
A real ONT QC practical with nanoq and seqkit
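
The compressed-file handling from Day 2 can be sketched in a few lines. The FASTQ records and directory names here are invented for the example; `gzip -dc` is used because it behaves the same as `zcat` on Linux and also works on systems where `zcat` differs.

```shell
#!/usr/bin/env bash
# Recap sketch: compress a FASTQ file, read it without unpacking it,
# and bundle a directory into a tar archive. All names are hypothetical.
set -euo pipefail

workdir="compress_demo"
mkdir -p "$workdir/reads"

# Two tiny made-up FASTQ records (4 lines per read).
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCCAA\n+\nIIIIII\n' \
    > "$workdir/reads/demo.fastq"

# Compress in place; -f overwrites an existing .gz if the script is re-run.
gzip -f "$workdir/reads/demo.fastq"

# Stream the compressed file and count reads (lines divided by 4).
n_lines=$(gzip -dc "$workdir/reads/demo.fastq.gz" | wc -l)
echo "reads: $(( n_lines / 4 ))"

# Bundle the reads directory into one archive, then list its contents.
tar -czf "$workdir/reads.tar.gz" -C "$workdir" reads
tar -tzf "$workdir/reads.tar.gz"
```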

Reproducibility — the habits that matter

The single most important thing to take away is reproducibility. A few simple habits make your work trustworthy and repeatable:

Pin tool versions for the duration of an analysis, because different versions of the same tool can give different results.
Export your environment (conda env export > environment.yml) and keep it with your project. Include it as supplementary material when you publish.
Script your steps instead of typing commands ad hoc, so the analysis can be re-run exactly.
Document data sources and commands in a short README alongside your results.
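
One way to put these habits into practice is a small setup script that you run at the start of a project. This is only a sketch: the project name, the README wording, and the `run_analysis.sh` file it mentions are hypothetical placeholders, and the export step is skipped when no Conda environment is active.

```shell
#!/usr/bin/env bash
# Sketch of the reproducibility habits above. The project name, README
# text, and run_analysis.sh are hypothetical; adapt them to your project.
set -euo pipefail

project="my_analysis"
mkdir -p "$project"

# Pin versions when installing, for example (placeholders, not real values):
#   conda install -c conda-forge -c bioconda nanoq=<version> seqkit=<version>

# Export the active Conda environment next to the project files.
# Skipped if conda is not on PATH or no environment is activated.
if command -v conda >/dev/null 2>&1 && [ -n "${CONDA_DEFAULT_ENV:-}" ]; then
    conda env export > "$project/environment.yml"
fi

# Record data sources and how to re-run the analysis in a short README.
cat > "$project/README.md" <<EOF
# $project
Created: $(date +%F)
Data source: (fill in where the reads came from)
Environment: environment.yml (from conda env export)
Re-run with: bash run_analysis.sh
EOF

echo "Wrote $project/README.md"
```

Keeping the export and the README in the same directory as the results means the three travel together when the project is copied or archived.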

Where to go next

This week continues with the ONT genome assembly course, which builds directly on what you practiced here. You will reuse the same Conda workflow to install the assembly tools (flye, medaka) alongside the QC tools you already met (nanoq, seqkit), and you will submit those jobs to SLURM on lab417 exactly as covered on Day 2.
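
As a rough preview of how the two skills combine, the sketch below writes a small SLURM job file that activates a Conda environment and prints the version of each tool. The environment name `assembly`, the resource requests, and the file names are assumptions for illustration; the assembly course may organise its environments differently, for example with separate environments per tool.

```shell
#!/usr/bin/env bash
# Sketch: generate a SLURM job file that checks the tools are installed.
# The environment name, resources, and file names are hypothetical.
set -euo pipefail

# The environment would be created once beforehand, for example:
#   conda create -n assembly -c conda-forge -c bioconda flye medaka nanoq seqkit

# Quoted 'EOF' keeps $(...) unexpanded so it runs inside the job instead.
cat > check_tools.sbatch <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=check_tools
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=00:05:00
#SBATCH --output=check_tools_%j.log

# Make `conda activate` available in a non-interactive batch shell.
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate assembly   # hypothetical environment name

# Record tool versions in the job log.
flye --version
medaka --version
nanoq --version
seqkit version
EOF

echo "Submit on the cluster with: sbatch check_tools.sbatch"
```

Running a short version-check job like this before a long assembly job confirms that the environment activates correctly on a compute node and leaves a record of the versions in the log.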

If you work on the CHPC Lengau cluster instead, which uses the PBS scheduler rather than SLURM, see the CHPC-specific lesson for the equivalent job-submission commands.

Open Q&A

Use the remaining time to ask anything from the two days. Some questions worth thinking about:

Which of these skills will you use first in your own project?
What does your data look like, and how will you get it onto lab417?
What software will you need, and how will you set up an environment for it?
How will you make sure someone else could reproduce your analysis?

SAIAB AGRP Bioinformatics Training. Open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0).