Conda Practical — Installing ONT Bioinformatics Tools

Learning objectives

Approximate time: 40 minutes

This is a hands-on practical that brings together everything from the Conda sessions. You will build a real working environment for quality-checking Oxford Nanopore sequencing data — using the same tools you will need in the ONT genome assembly course later this week.

Before you start: make sure Conda is installed and your Bioconda channels are configured (covered in the earlier Conda sessions). You should be on a compute node — your prompt should show [SLURM].

1. Create and activate the environment

Keeping each project in its own environment avoids dependency clashes. Create one named ont-qc with a specific Python version, then activate it:

conda create -n ont-qc python=3.10
conda activate ont-qc

Your prompt should now start with (ont-qc), showing the environment is active.
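To confirm activation from a script rather than by eye, one option is to check the CONDA_DEFAULT_ENV variable, which conda activate sets to the name of the active environment. This is a minimal sketch; the helper name env_is_active is illustrative and not part of Conda.

```shell
# Hypothetical helper: succeeds if the named Conda environment is active.
# Relies on CONDA_DEFAULT_ENV, which `conda activate` sets.
env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active ont-qc; then
    echo "ont-qc is active"
else
    echo "ont-qc is NOT active: run 'conda activate ont-qc'"
fi
```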

2. Install real ONT tools from Bioconda

Install two widely-used QC tools in a single command, pulling from both the bioconda and conda-forge channels:

conda install -c bioconda -c conda-forge nanoq seqkit
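Before moving on, it can help to check that both programs are actually on your PATH. The sketch below uses the shell built-in command -v; check_tool is a made-up helper name for illustration.

```shell
# Hypothetical helper: report whether a program can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

for tool in nanoq seqkit; do
    check_tool "$tool"
done
```

If either tool is reported missing, check that the ont-qc environment is active and rerun the install. Once both are found, nanoq --version and seqkit version print the installed versions.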

3. Run the tools on an ONT FASTQ file

A small subsampled ONT read set (~500 reads) is provided for this exercise in the course folder unix_lesson:

unix_lesson/ont_demo_reads.fastq.gz

Note that it is a .gz file: both tools read gzip-compressed input directly, so there is no need to decompress it first.
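A gzipped FASTQ can also be inspected by streaming it, without writing an uncompressed copy to disk. As a rough sketch, the helper below (count_reads is an illustrative name) counts reads by assuming the usual four lines per record; the tiny two-read file is made up so that the example runs anywhere.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ by streaming it.
# Assumes four lines per record (no wrapped sequence lines).
count_reads() {
    gzip -cd < "$1" | awk 'END { print NR / 4 }'
}

# Illustrative two-read file (not the course data).
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n' | gzip -c > "$demo"
count_reads "$demo"    # prints 2
rm -f "$demo"
```

Pointed at unix_lesson/ont_demo_reads.fastq.gz, the same function should report the read count that the QC tools print.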

Generate read statistics with nanoq (the -s flag prints a summary report):

nanoq -i unix_lesson/ont_demo_reads.fastq.gz -s

Now run seqkit stats on the same file:

seqkit stats unix_lesson/ont_demo_reads.fastq.gz

4. Interpret the output

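Both tools report the same kinds of summary numbers. The key ones for Nanopore QC are the read count, the total number of bases, the read-length distribution (minimum, mean, maximum, and N50), and the read quality.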

Worked example — the demo dataset

Running nanoq on the provided demo file:

nanoq -i unix_lesson/ont_demo_reads.fastq.gz -s

produces this single line of summary statistics:

500 4807721 12335 47072 1025 9615 8548 30.0 30.0

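Reading the columns left to right: number of reads (500), total bases (4,807,721), N50 (12,335), longest read (47,072), shortest read (1,025), mean read length (9,615), median read length (8,548), mean quality (30.0), and median quality (30.0).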

Running seqkit stats on the same file gives the overlapping numbers (read count, total bases, and minimum, mean, and maximum length) in a labelled table:

seqkit stats unix_lesson/ont_demo_reads.fastq.gz

file                                 format  type  num_seqs    sum_len  min_len  avg_len  max_len
unix_lesson/ont_demo_reads.fastq.gz  FASTQ   DNA        500  4,807,721    1,025  9,615.4   47,072

Note how the two tools agree: num_seqs = 500 reads, sum_len = 4,807,721 bases, min_len = 1,025, avg_len = 9,615, max_len = 47,072. seqkit gives a quick labelled overview; nanoq adds long-read-specific metrics like N50 and quality.
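The numbers are also internally consistent: the mean length is the total bases divided by the read count. A quick check with awk, using the values reported for the demo file:

```shell
# avg_len should equal sum_len / num_seqs (values copied from the demo output).
mean=$(awk 'BEGIN { printf "%.1f", 4807721 / 500 }')
echo "$mean"    # prints 9615.4
```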

So this small demo set has 500 long reads with a healthy N50 of ~12 kb and a mean quality around Q30 — exactly the kind of read length and quality you want going into an assembly.
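For reference, N50 is found by sorting reads from longest to shortest and adding up their lengths until the running total reaches half of all bases; the length of the read at that point is the N50. The sketch below shows that calculation on a few toy lengths. The function name n50 is illustrative, and this is not meant to reproduce nanoq's own implementation.

```shell
# Hypothetical helper: compute N50 from one read length per line on stdin.
n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                running += len[i]
                if (running >= total / 2) { print len[i]; exit }
            }
        }'
}

# Toy lengths (illustrative, not the demo reads): total 20, half 10.
printf '%s\n' 2 3 4 5 6 | n50    # prints 5
```

To try it on real reads, one way to extract lengths from a four-line-per-record FASTQ is gzip -cd < unix_lesson/ont_demo_reads.fastq.gz | awk 'NR % 4 == 2 { print length($0) }' | n50.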

Why this matters: these are exactly the numbers you check before attempting an assembly. Too few bases, short N50, or low quality all predict a poor assembly — so QC first, assemble second.

5. Export the environment for reproducibility

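An analysis is only reproducible if someone else can rebuild the exact software environment. Export it to YAML two ways: a full export of everything installed, and a minimal export (--from-history) of only the packages you explicitly requested: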

conda env export > ont-qc-environment.yml
conda env export --from-history > ont-qc-minimal.yml

Open the YAML files and look at their structure — the environment name, the channels, and the dependency list:

cat ont-qc-minimal.yml

Anyone can later recreate the environment from either file, for example from the full export:

conda env create -f ont-qc-environment.yml
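One quick way to compare two exported files is to count the entries under their dependencies key. The sketch below assumes the usual layout of a Conda environment YAML (top-level keys at column 0, list entries starting with "- "). The helper name count_deps and the mock file are illustrative; the mock is written inline only so that the example runs without Conda.

```shell
# Hypothetical helper: count entries under the top-level "dependencies:" key.
count_deps() {
    awk '
        /^dependencies:/   { in_deps = 1; next }
        /^[^ -]/           { in_deps = 0 }   # any other top-level key ends the list
        in_deps && /^ *- / { n++ }
        END { print n + 0 }' "$1"
}

# Mock minimal export (illustrative content, not real conda output).
mock=$(mktemp)
cat > "$mock" <<'EOF'
name: ont-qc
channels:
  - bioconda
  - conda-forge
dependencies:
  - python=3.10
  - nanoq
  - seqkit
EOF
count_deps "$mock"    # prints 3
rm -f "$mock"
```

Run on the two real exports, count_deps ont-qc-environment.yml should report many more entries than count_deps ont-qc-minimal.yml.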

6. Useful extras

Bridge to the assembly course

You have now built a real environment, installed real ONT tools, run QC, and exported a reproducible recipe. This week, in the ONT genome assembly course, you will use these same tools plus flye (assembler) and medaka (polisher) — installed into a new environment using the exact workflow you just practiced here.

Exercises

  1. Create the ont-qc environment and confirm it is active (check your prompt).
  2. Install nanoq and seqkit, then run both on the demo FASTQ file.
  3. Compare the read count and N50 reported by nanoq and by seqkit stats -a (plain seqkit stats omits N50; the -a flag adds it). Do they agree?
  4. Export both a full and a minimal YAML, then open each and describe how they differ.

SAIAB AGRP Bioinformatics Training. Open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0).