Conda for Bioinformatics

Managing Bioinformatics Software in Your Home Directory

Page 1: Introduction to Conda

What is Conda?

Conda is a powerful package and environment management system that allows you to install, update, and manage software packages and their dependencies entirely within your home directory. Unlike system-wide package managers that require administrator privileges, conda gives you complete control over your software environment without needing root access.

Why Use Conda?

  • No Admin Rights Required: Install complex software stacks in your home directory without bothering system administrators.
  • Dependency Management: Conda automatically resolves and installs all required dependencies, preventing the "dependency hell" that often plagues manual installations.
  • Environment Isolation: Create separate environments for different projects, preventing conflicts between different versions of the same software.
  • Cross-Platform: Conda itself runs on Linux, macOS, and Windows, although Bioconda packages are built only for Linux and macOS.
  • Scientific Computing Focus: Excellent support for Python, R, scientific libraries, and data science tools.

What You Can Install with Conda

  • Python and R interpreters with different versions
  • Scientific libraries (NumPy, SciPy, Pandas, Matplotlib)
  • Machine learning frameworks (TensorFlow, PyTorch, scikit-learn)
  • Bioinformatics tools (BWA, SAMtools, BLAST, GATK, STAR)
  • Development tools (Git, editors, compilers)
  • System utilities and command-line tools

Course Prerequisites

  • Basic familiarity with command-line interface
  • Access to a terminal (Linux, macOS, or Windows with WSL)
  • At least 2GB of free space in your home directory
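
The free-space prerequisite can be checked from the terminal. The following is a minimal sketch, assuming a hypothetical helper named free_kb. It reports free space on the filesystem only, so a per-user quota on a shared server may be lower than the figure it shows.

```shell
# Hypothetical helper: print the free space (in KB) on the filesystem holding a directory.
free_kb() {
    # -P forces one line per filesystem, -k reports 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

required_kb=$((2 * 1024 * 1024))   # 2 GB expressed in KB
available_kb=$(free_kb "$HOME")

if [ "$available_kb" -ge "$required_kb" ]; then
    echo "OK: enough free space in $HOME"
else
    echo "Warning: less than 2 GB free in $HOME"
fi
```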

Page 2: Conda on lab417

Conda is already installed

On the lab417 server you do not need to download or install conda. A shared Miniconda is installed system-wide at /opt/miniconda3 and is added to your PATH automatically when you log in. Your course account is also pre-configured to keep your environments and packages in your home directory.

Verifying Conda is Available

Log in to lab417 and run:

# Check conda version
conda --version

# Check conda info
conda info

If both commands print output (a version string and a configuration summary), conda is ready to use and no installation is required.

The Shared Base Environment is Read-Only

The shared base environment at /opt/miniconda3 belongs to the system and is not writable by your account. Installing software directly into it fails with:

EnvironmentNotWritableError: The current user does not have write permissions to the target environment.
  environment location: /opt/miniconda3

This is expected. You never install into the shared base. Instead you create your own environment, which is stored in your home directory and is fully under your control.
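
If you want to see the permission difference for yourself, a quick check along the following lines may help. This is only a sketch: check_env_writable is a hypothetical helper name, and the test is a plain directory permission check, not the exact test conda performs internally.

```shell
# Hypothetical helper: report whether the current user can write to a directory
# that holds (or will hold) a conda environment.
check_env_writable() {
    local prefix="$1"
    if [ -d "$prefix" ] && [ -w "$prefix" ]; then
        echo "writable: $prefix"
    else
        echo "not writable: $prefix"
    fi
}

check_env_writable /opt/miniconda3      # expected on lab417: not writable
check_env_writable "$HOME/.conda/envs"  # writable once the directory exists
```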

Your Account is Pre-Configured

Your home directory contains a ~/.condarc file (created for you) that routes all environments and downloaded packages into your home space:

# ~/.condarc
envs_dirs:
  - ~/.conda/envs
pkgs_dirs:
  - ~/.conda/pkgs

You can confirm this with:

conda config --show envs_dirs pkgs_dirs

Installing Software: samtools Example

Create a named environment and install samtools into it from the bioconda channel, then activate it:

# Create an environment named "bio" with samtools
conda create -n bio -c bioconda -c conda-forge samtools

# Activate it
conda activate bio

# Verify
samtools --version

The bio environment lives in ~/.conda/envs/bio: writable, isolated, and yours. To add more tools, list them at the end of the conda create command, or install into the active environment with conda install -c bioconda <tool>.

Using Conda on Your Own Machine

If you later want conda on your own laptop (not lab417), download Miniconda from https://conda.io/miniconda.html and run the installer. The environment workflow above is identical once conda is installed.

Page 3: Getting Started with Conda

Understanding the Base Environment

Conda has a "base" environment containing conda itself and a Python installation. On lab417 the base lives in the shared, read-only /opt/miniconda3 — you cannot install into it. For all bioinformatics work you create your own named environments in your home directory and install into those.

# Check which environment you're in (and list all environments)
conda info --envs

# Check conda version
conda --version

# See what's installed in the current environment
conda list

Golden Rule on lab417

Never conda install into base — it will fail with a permissions error. Always conda create -n <name> ... then conda activate <name>, and install inside that environment.
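
When you script this rule, it can be handy to check whether an environment already exists before creating it. The sketch below assumes a hypothetical helper, env_in_listing, that reads the output of conda env list from standard input. The sample listing and its paths are illustrative only.

```shell
# Hypothetical helper: succeed if the environment name given as $1 appears in
# "conda env list" output read from standard input.
env_in_listing() {
    # Skip the "#" header lines, keep the first column (the name), match it exactly.
    awk '!/^#/ { print $1 }' | grep -Fqx -- "$1"
}

# Self-contained demonstration on sample output (paths are illustrative)
sample_listing='# conda environments:
#
base                  *  /opt/miniconda3
bio                      /home/student/.conda/envs/bio'

if printf '%s\n' "$sample_listing" | env_in_listing bio; then
    echo "bio already exists: just run conda activate bio"
else
    echo "bio is missing: run conda create -n bio ..."
fi
```

In real use you would pipe the live listing into the helper: conda env list | env_in_listing bio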

Essential Conda Commands for Bioinformatics

Keeping conda updated

On lab417 conda itself is in the shared base and is kept updated by the system administrators — you do not (and cannot) run conda update conda. You keep the tools inside your own environment up to date instead (shown below).

Setting up bioinformatics channels

The bioconda channel is essential for bioinformatics software. Set it up first:

# Add essential channels for bioinformatics
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

# Set channel priority (recommended)
conda config --set channel_priority strict

Basic bioinformatics package operations

First create and activate your own environment, then run install/update/remove commands inside it:

# Create an environment and activate it (do this once)
conda create -n bio -c bioconda -c conda-forge samtools
conda activate bio

# Search for bioinformatics tools (works without an environment too)
conda search samtools
conda search bwa
conda search blast

# Install more software into the active environment
conda install -c bioconda bcftools htslib

# Install a specific version (important for reproducibility)
conda install -c bioconda samtools=1.15

# If you get "Could not solve for environment specs", create a fresh environment
# with conda-forge listed first so that its libraries take priority
conda create -n samtools -c conda-forge -c bioconda samtools=1.15

# Update tools in the active environment
conda update samtools

# Remove tools from the active environment
conda remove samtools

Your First Bioinformatics Package Installation

Let's create an environment and install some commonly used bioinformatics tools into it:

# Create a dedicated environment with sequence analysis tools
conda create -n seqtools -c bioconda -c conda-forge samtools bcftools bwa bowtie2 hisat2

# Activate it
conda activate seqtools

# Add quality control tools into the active environment
conda install -c conda-forge -c bioconda fastqc multiqc trimmomatic

# Verify installation
samtools --version
bwa   # bwa has no --version flag; run bare, it prints its usage text including the version
fastqc --version

Understanding Bioinformatics Channels

Key channels for bioinformatics:

  • bioconda: Primary source for bioinformatics software (6000+ packages)
  • conda-forge: Community-maintained packages, including Python libraries
  • defaults: Anaconda's main channel
  • r: Anaconda's channel for R packages (many R packages are also available from conda-forge)

With an environment activated (e.g. conda activate bio), install from specific channels:

# Install from specific channels (into the active environment)
conda install -c bioconda blast
conda install -c conda-forge biopython
conda install -c r r-ggplot2

# Search in bioconda specifically (no environment needed)
conda search -c bioconda "gatk*"

Bioinformatics Tip: Version Control

Always specify exact versions for critical analysis tools to ensure reproducibility. Many bioinformatics tools have version-specific behaviors that can affect results.
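
One way to catch forgotten version pins is to check each package spec before passing it to conda. The following is a rough sketch, assuming a hypothetical helper named is_pinned. It looks only at the text of the spec and does not ask conda whether that version exists.

```shell
# Hypothetical helper: succeed only if a package spec names a version,
# e.g. "samtools=1.15". Bare names and ranges such as "bwa>=0.7" fail.
is_pinned() {
    case "$1" in
        *"<"*|*">"*) return 1 ;;   # version ranges are not pins
        *=[0-9]*)    return 0 ;;   # name=version
        *)           return 1 ;;   # bare package name
    esac
}

for spec in samtools=1.15 bcftools "bwa>=0.7"; do
    if is_pinned "$spec"; then
        echo "pinned:   $spec"
    else
        echo "unpinned: $spec"
    fi
done
```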

Practice Exercises

  1. Search for available versions of BLAST
  2. Install FastQC and check its version
  3. Install the latest version of BWA-MEM2
  4. List all currently installed bioinformatics packages
  5. Search for packages related to "assembly" in bioconda

Page 4: Bioinformatics Environments

Why Environments Are Critical in Bioinformatics

Bioinformatics workflows often require specific tool versions, and different analyses may need conflicting dependencies. Environments solve this by creating isolated spaces for each project or analysis type.

Common Bioinformatics Environment Patterns

  • Project-specific: One environment per research project
  • Analysis-specific: Separate environments for RNA-seq, ChIP-seq, variant calling, etc.
  • Tool-specific: Environments for complex tools with many dependencies (e.g., GATK, Nextflow)
  • Pipeline-specific: Environments matching published workflow requirements

Creating Bioinformatics Environments

Genome assembly environment

# Create assembly environment
conda create --name assembly python=3.9
conda activate assembly
conda install -c conda-forge -c bioconda spades flye canu quast busco
conda install -c conda-forge matplotlib seaborn

Managing Bioinformatics Environments

Working with environments

# List all environments
conda env list

# Activate specific environment
conda activate assembly

# Check what's installed in current environment
conda list

# Show details about the conda installation and the active environment
conda info

# Deactivate environment
conda deactivate

Environment documentation for reproducibility


# Activate specific environment
conda activate assembly

# Export the full environment (exact versions and build strings)
conda env export > assembly_environment.yml

# Export versions without build strings (more portable across platforms)
conda env export --no-builds > assembly_environment_no_builds.yml

# Create environment from published requirements
conda env create -f published_workflow.yml

Bioinformatics Environment Files

Genome assembly environment.yml example

name: assembly
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - python=3.9
  - spades=3.15.5
  - flye=2.9.1
  - canu=2.2
  - quast=5.2.0
  - busco=5.4.3
  - samtools=1.15
  - pandas=1.5.0
  - numpy=1.23.0
  - matplotlib=3.6.0
  - seaborn=0.11.2

Environment Management Best Practices

Bioinformatics Environment Guidelines

  1. Descriptive naming: Use names like cancer-rnaseq-2024 or assembly-pacbio
  2. Version pinning: Always specify versions for critical tools
  3. Documentation: Export environment.yml files with your analysis
  4. Minimal environments: Don't install everything in one environment
  5. Testing environments: Create test environments for new tool versions

Sharing Environments for Reproducible Research

# Create environment from collaborator's file
conda env create -f collaborator_environment.yml

# Update existing environment from new requirements
conda env update -f updated_requirements.yml --prune

# Export minimal requirements (only explicitly installed)
conda env export --from-history > minimal_requirements.yml

Reproducibility Tip

Always include your environment.yml file with your analysis code and data. This allows others to recreate your exact computational environment, ensuring reproducible results.
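
A small habit that helps here is to put the date in the exported file name so that older exports are never overwritten. The sketch below assumes a hypothetical helper named env_export_name and an environment called assembly. The export itself runs only where conda is installed.

```shell
# Hypothetical helper: build a dated file name for an exported environment,
# for example assembly_2024-05-01.yml.
env_export_name() {
    local env_name="$1"
    echo "${env_name}_$(date +%Y-%m-%d).yml"
}

outfile=$(env_export_name assembly)
echo "Export target: $outfile"

# Run the export only where conda is available (assumes the assembly environment exists)
if command -v conda >/dev/null 2>&1; then
    conda env export -n assembly > "$outfile"
fi
```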

Practice Exercises

  1. Create a metagenomics environment with Kraken2, MetaPhlAn, and QIIME2
  2. Set up a phylogenetics environment with RAxML, IQ-TREE, and FigTree
  3. Export your genome assembly environment to a YAML file
  4. Create an environment from a provided environment.yml file
  5. Set up separate environments for Python and R-based analyses

Page 5: Practical & Wrap-up

You now have everything you need for the ONT assembly course: creating environments, installing tools, and exporting them for reproducibility. This final page covers three practical things that will save you time and headaches.

1. mamba — a faster alternative to conda

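conda install can be slow because working out which versions of which packages are compatible (the "solve") is computationally hard. mamba is a reimplementation with a faster solver. For the everyday commands used here (install, create, remove, update) it is a drop-in replacement: the arguments are the same and you just swap the command name. Recent conda releases (23.10 and later) use the same libmamba solver by default, so the speed gap is smaller than it used to be.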

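# Check whether mamba is already available on lab417
mamba --version

# On your own machine (where you own the base environment) install it once.
# On lab417 this fails because the shared base is read-only.
conda install -n base -c conda-forge mamba

# Then use it anywhere you would have used conda install
mamba install -c bioconda nanoq seqkit
mamba create -n assembly flye medaka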

Tip

If a conda install command seems stuck on "Solving environment" for a long time, cancel it with Ctrl+C and run the same command with mamba instead.
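
In scripts you can make the choice automatic. This is a minimal sketch that uses mamba when it is on your PATH and falls back to conda otherwise. The variable name PKG_CMD is an arbitrary choice for illustration.

```shell
# Use mamba when it is on PATH, otherwise fall back to conda.
if command -v mamba >/dev/null 2>&1; then
    PKG_CMD=mamba
else
    PKG_CMD=conda
fi

echo "Using: $PKG_CMD"
# Example use inside an activated environment:
#   "$PKG_CMD" install -c conda-forge -c bioconda seqkit
```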

2. conda clean — free up disk space

Every package you install is also kept in a download cache. Over time this cache can grow to many gigabytes. When your home directory is running low on space, clear it safely with:

# Remove cached package downloads and unused files
conda clean --all

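This deletes cached downloads and unused package files. It does not remove your environments or installed tools, so it is safe to run whenever you need space. The only cost is that packages must be downloaded again the next time you install them.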
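
To see how much space the cache is using before you clean it, something like the following may help. It is a sketch only: dir_size_kb is a hypothetical helper, and the cache path is the pkgs_dirs location from the lab417 ~/.condarc shown earlier.

```shell
# Hypothetical helper: print the size of a directory in KB (0 if it does not exist).
dir_size_kb() {
    if [ -d "$1" ]; then
        du -sk "$1" | cut -f1
    else
        echo 0
    fi
}

cache_dir="$HOME/.conda/pkgs"   # pkgs_dirs location from ~/.condarc on lab417
echo "Package cache size: $(dir_size_kb "$cache_dir") KB"

# Preview what would be removed without deleting anything
if command -v conda >/dev/null 2>&1; then
    conda clean --all --dry-run
fi
```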

3. Troubleshooting the two most common problems

Problem 1: Channel conflicts / package not found

You try to install a tool and conda either cannot find it or complains about conflicts:

PackagesNotFoundError: The following packages are not available from current channels

Cause: the package name is misspelled, or the bioinformatics channels are not set up or are in the wrong order.

Fix: check the name with conda search -c bioconda <name>, then add the channels in the correct priority and try the install again:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

Order matters: conda config --add puts each channel at the top of the list, so the channel added last (conda-forge) ends up with the highest priority. You can also pass channels explicitly on each install, highest priority first: conda install -c conda-forge -c bioconda toolname.
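
To confirm the resulting order, you can read the channels list straight out of your configuration file. The sketch below assumes a hypothetical helper named list_channels and a simple .condarc layout with one channel per line. The demonstration file mirrors what the four conda config commands above would produce.

```shell
# Hypothetical helper: print the channels listed in a .condarc file, in priority
# order (the first line printed is the highest-priority channel).
list_channels() {
    awk '
        /^channels:/        { in_block = 1; next }   # start of the channels list
        in_block && /^ *- / { sub(/^ *- */, ""); print; next }
        in_block            { in_block = 0 }         # any other line ends the list
    ' "$1"
}

# Self-contained demonstration on a temporary file
demo=$(mktemp)
cat > "$demo" <<'EOF'
channels:
  - conda-forge
  - bioconda
  - defaults
channel_priority: strict
EOF

list_channels "$demo"
rm -f "$demo"
```

On your own account you would run list_channels ~/.condarc and expect conda-forge on the first line.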

Problem 2: conda activate does not work after installing

In a new shell (for example right after installing conda on your own machine) you run conda activate myenv and see:

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'

Cause: conda has not been initialised in your current shell session.

Fix: initialise conda once, then reload your shell:

conda init bash
# then close and reopen your terminal, or run:
source ~/.bashrc

After this your prompt should show (base), and conda activate will work.
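
If you are unsure whether conda init has already been run, you can look for the block it writes into your shell startup file. This is a minimal sketch, assuming a hypothetical helper named has_conda_init. A system-wide setup may initialise conda somewhere else, so a missing block is a hint and not proof of a problem.

```shell
# Hypothetical helper: succeed if a shell startup file contains the block
# that "conda init" writes (it starts with a ">>> conda initialize >>>" marker).
has_conda_init() {
    [ -f "$1" ] && grep -q '>>> conda initialize >>>' "$1"
}

if has_conda_init "$HOME/.bashrc"; then
    echo "conda is initialised in ~/.bashrc"
else
    echo "no conda init block found: run conda init bash"
fi
```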

Quick Reference: the commands you will actually use

# Environments
conda create -n myenv python=3.10
conda activate myenv
conda deactivate
conda env list

# Installing tools
conda install -c bioconda -c conda-forge nanoq seqkit
mamba install -c bioconda flye medaka   # faster

# Reproducibility
conda env export > environment.yml
conda env export --from-history > minimal.yml
conda env create -f environment.yml

# Maintenance
conda clean --all

Wrap-up

That is everything you need. You can create isolated environments, install bioinformatics tools from Bioconda, speed installs up with mamba, recover from the two most common errors, and export environments so your work is reproducible — ready for the ONT genome assembly course later this week.