Introduction to Pangenomics


  • A pangenome encompasses the complete collection of genes found in all genomes within a specific group, typically a species.
  • A pangenome is constructed by comparing the complete genome sequences of all members of a clade.
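  • The pangenome consists of two main components: the core genome (gene families present in all genomes) and the accessory genome (gene families present in only some genomes).
  • The accessory genome can be further divided into the shell genome (families shared by several genomes) and the cloud genome (families found in only one or very few genomes).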
  • In an open pangenome, the size of the pangenome keeps increasing substantially with each new genome added.
  • In a closed pangenome, only a few gene families are added to the pangenome when a new genome is introduced.
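
The core/shell/cloud partition and the open/closed distinction can both be read from a gene-family presence/absence table. The sketch below assumes simplified, hypothetical thresholds (core = present in every genome, cloud = present in exactly one genome, shell = everything else) and uses made-up genomes and family names; real pangenome tools apply their own cutoffs and statistical models.

```python
from typing import Dict, List, Set

# Hypothetical thresholds for illustration only:
#   core  = family present in every genome
#   cloud = family present in exactly one genome
#   shell = everything in between


def presence_from(families_by_genome: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert genome -> families into family -> genomes that carry it."""
    presence: Dict[str, Set[str]] = {}
    for genome, families in families_by_genome.items():
        for family in families:
            presence.setdefault(family, set()).add(genome)
    return presence


def partition(presence: Dict[str, Set[str]], n_genomes: int) -> Dict[str, str]:
    """Label each gene family as 'core', 'shell' or 'cloud'."""
    labels = {}
    for family, genomes in presence.items():
        if len(genomes) == n_genomes:
            labels[family] = "core"
        elif len(genomes) == 1:
            labels[family] = "cloud"
        else:
            labels[family] = "shell"
    return labels


def pangenome_growth(order: List[str], families_by_genome: Dict[str, Set[str]]) -> List[int]:
    """Number of distinct families after adding each genome in the given order."""
    seen: Set[str] = set()
    sizes = []
    for genome in order:
        seen |= families_by_genome[genome]
        sizes.append(len(seen))
    return sizes


if __name__ == "__main__":
    families_by_genome = {  # hypothetical data
        "A": {"dnaA", "gyrB", "recA", "tetM"},
        "B": {"dnaA", "gyrB", "recA", "tetM", "cps1"},
        "C": {"dnaA", "gyrB", "recA", "phage1"},
    }
    labels = partition(presence_from(families_by_genome), len(families_by_genome))
    print(sorted(labels.items()))
    # [('cps1', 'cloud'), ('dnaA', 'core'), ('gyrB', 'core'), ('phage1', 'cloud'), ('recA', 'core'), ('tetM', 'shell')]
    print(pangenome_growth(["A", "B", "C"], families_by_genome))  # [4, 5, 6]
```

Plotting the growth curve over many random genome orders gives the usual accumulation curve: it keeps rising for an open pangenome and flattens out for a closed one.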

Downloading Genomic Data


  • The ncbi-genome-download package is a set of scripts designed to download genomes from NCBI's RefSeq and GenBank collections.
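
A typical call combines a file format, a taxonomic filter and the taxonomic group as a positional argument. The sketch below only assembles the argument list; it assumes the options `--formats`, `--assembly-levels`, `--genera`, `--output-folder` and `--dry-run` as listed in the tool's help (check `ncbi-genome-download --help` for your version), and the species and folder name are illustrative.

```python
import shlex
from typing import List


def build_download_command(taxon: str, outdir: str, dry_run: bool = True) -> List[str]:
    """Return the argument list for an ncbi-genome-download call."""
    cmd = [
        "ncbi-genome-download",
        "--formats", "fasta",             # genome sequences in FASTA format
        "--assembly-levels", "complete",  # only complete assemblies
        "--genera", taxon,                # a genus, or "Genus species"
        "--output-folder", outdir,
    ]
    if dry_run:
        cmd.append("--dry-run")  # only list what would be downloaded
    cmd.append("bacteria")       # positional argument: the NCBI taxonomic group
    return cmd


if __name__ == "__main__":
    cmd = build_download_command("Streptococcus agalactiae", "genomes")
    print(shlex.join(cmd))
    # ncbi-genome-download --formats fasta --assembly-levels complete --genera 'Streptococcus agalactiae' --output-folder genomes --dry-run bacteria
```

Passing the list to `subprocess.run(cmd, check=True)` executes it; keeping `--dry-run` first is a cheap way to check how many assemblies match before downloading anything.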

Annotating Genomic Data


  • Prokka is a command line utility that provides rapid prokaryotic genome annotation.
  • The output files produced by the software sometimes need manual curation.
  • Specialized software exists to annotate specific genomic elements.
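
A common starting point for manual curation is the list of genes without a functional assignment. This sketch assumes a Prokka-style GFF3 file in which CDS features carry a `product=` attribute and the genome sequence follows a `##FASTA` line; it is an illustrative helper, not part of Prokka.

```python
import sys
from typing import Iterable, List
from urllib.parse import unquote


def hypothetical_cds(gff_lines: Iterable[str]) -> List[str]:
    """Return IDs of CDS features whose product is 'hypothetical protein'."""
    ids = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this marker is sequence, not features
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "CDS":
            continue
        # Column 9 holds key=value pairs separated by ';' (values may be percent-encoded)
        attrs = dict(item.split("=", 1) for item in fields[8].split(";") if "=" in item)
        if unquote(attrs.get("product", "")) == "hypothetical protein":
            ids.append(attrs.get("ID", "?"))
    return ids


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        flagged = hypothetical_cds(handle)
    print(f"{len(flagged)} CDS to review manually")
```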

Measuring Sequence Similarity


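  • To build a pangenome, you need to compare the genes of all genomes and group them into gene families.
  • BLAST aligns pairs of sequences and reports similarity statistics for each alignment, such as percent identity, bit score, and E-value.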
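
BLAST's tabular output (`-outfmt 6`, for example from `blastp -query a.faa -subject b.faa -outfmt 6`) has 12 default columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore. The sketch below parses those columns and keeps hits under an E-value cutoff; the default of 1e-5 is an illustrative assumption, not a recommended value.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    query: str
    subject: str
    pident: float   # percent identity
    length: int     # alignment length
    evalue: float
    bitscore: float


def parse_outfmt6(lines: Iterable[str], max_evalue: float = 1e-5) -> List[Hit]:
    """Parse default 12-column -outfmt 6 lines, keeping hits with E-value <= max_evalue."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        hit = Hit(cols[0], cols[1], float(cols[2]), int(cols[3]),
                  float(cols[10]), float(cols[11]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits
```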

Clustering with BLAST Results


  • The Bidirectional Best-Hit (BBH) algorithm groups two genes from different genomes into the same family when each one is the other's best hit, that is, the hit with the lowest E-value.
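
The reciprocal rule can be sketched in a few lines. This is a simplified illustration, not the implementation used by any particular tool: it assumes hits are `(query, subject, evalue)` tuples from an all-against-all search, that gene IDs follow a hypothetical `genome|gene` naming scheme, and that a family is a connected component of reciprocal best-hit pairs. Real implementations usually add identity and coverage filters.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Hit = Tuple[str, str, float]  # (query, subject, evalue)


def genome_of(gene: str) -> str:
    """Hypothetical naming scheme: 'genome|gene'."""
    return gene.split("|", 1)[0]


def best_hits(hits: List[Hit]) -> Dict[Tuple[str, str], str]:
    """Map (query, target genome) to the subject with the lowest E-value."""
    best: Dict[Tuple[str, str], Tuple[float, str]] = {}
    for query, subject, evalue in hits:
        target = genome_of(subject)
        if target == genome_of(query):
            continue  # only compare genes between different genomes
        key = (query, target)
        if key not in best or evalue < best[key][0]:
            best[key] = (evalue, subject)
    return {key: subject for key, (_, subject) in best.items()}


def bbh_families(hits: List[Hit]) -> List[Set[str]]:
    """Group genes into families by linking reciprocal best hits (union-find)."""
    best = best_hits(hits)
    parent: Dict[str, str] = {}

    def find(gene: str) -> str:
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for query, subject, _ in hits:  # every gene starts as its own family
        find(query)
        find(subject)
    for (query, _target), subject in best.items():
        # link the pair only if the best hit is reciprocal
        if best.get((subject, genome_of(query))) == query:
            parent[find(query)] = find(subject)

    families: Dict[str, Set[str]] = defaultdict(set)
    for gene in list(parent):
        families[find(gene)].add(gene)
    return list(families.values())
```

Genes without a reciprocal partner remain singleton families, which in a pangenome would typically end up in the cloud.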

Clustering Protein Sequences


  • Clustering protein sequences means grouping similar sequences into distinct clusters or families.
  • GET_HOMOLOGUES is a software package for microbial pangenome analysis.
  • GET_HOMOLOGUES supports three sequence clustering algorithms: BDBH, COGtriangles, and OrthoMCL.
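
The algorithm is selected with command-line flags. The sketch below assumes the flags as described in the GET_HOMOLOGUES manual (`-d` input folder, `-G` COGtriangles, `-M` OrthoMCL, BDBH when neither is given, `-t` minimum number of taxa per cluster); verify them with `get_homologues.pl -h` for your installed version. The folder name `genomes` is hypothetical.

```python
import shlex
from typing import List, Optional

# Assumed flags (check `get_homologues.pl -h`):
#   -d folder with input genomes   -G COGtriangles   -M OrthoMCL
#   -t minimum number of taxa per cluster (0 reports all clusters; documented for -G/-M)
# Without -G or -M, the default BDBH algorithm is used.
ALGORITHM_FLAGS = {"BDBH": [], "COGtriangles": ["-G"], "OrthoMCL": ["-M"]}


def get_homologues_command(genome_dir: str, algorithm: str = "BDBH",
                           min_taxa: Optional[int] = None) -> List[str]:
    """Build a get_homologues.pl argument list for the chosen clustering algorithm."""
    if algorithm not in ALGORITHM_FLAGS:
        raise ValueError(f"unknown algorithm {algorithm!r}; choose one of {sorted(ALGORITHM_FLAGS)}")
    cmd = ["get_homologues.pl", "-d", genome_dir, *ALGORITHM_FLAGS[algorithm]]
    if min_taxa is not None:
        cmd += ["-t", str(min_taxa)]
    return cmd


if __name__ == "__main__":
    print(shlex.join(get_homologues_command("genomes")))                           # get_homologues.pl -d genomes
    print(shlex.join(get_homologues_command("genomes", "OrthoMCL", min_taxa=0)))  # get_homologues.pl -d genomes -M -t 0
```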

Exploring Pangenome Graphs


  • PPanGGOLiN is a software tool for creating and manipulating prokaryotic pangenomes.
  • PPanGGOLiN integrates gene families and their genomic neighborhood to build a pangenome graph and to partition gene families into persistent, shell, and cloud.
  • PPanGGOLiN is designed to scale up to tens of thousands of genomes.
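
The graph idea can be illustrated with a toy version: nodes are gene families, and an edge links two families whenever their genes are adjacent on a contig, weighted by the number of genomes that support the adjacency. This is a simplified sketch in the spirit of PPanGGOLiN, not its implementation, and the input format (genome to contigs to ordered family IDs) is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_graph(genomes: Dict[str, List[List[str]]]) -> Counter:
    """Return edge -> number of genomes in which the two families are neighbours."""
    edges: Counter = Counter()
    for contigs in genomes.values():
        in_this_genome = set()  # count each adjacency once per genome
        for contig in contigs:
            for left, right in zip(contig, contig[1:]):
                if left != right:
                    in_this_genome.add(tuple(sorted((left, right))))
        edges.update(in_this_genome)
    return edges


if __name__ == "__main__":
    genomes = {  # hypothetical gene orders
        "A": [["dnaA", "dnaN", "recF", "gyrB"]],
        "B": [["dnaA", "dnaN", "recF", "gyrB"]],
        "C": [["dnaA", "dnaN", "tetM", "recF", "gyrB"]],
    }
    for edge, weight in sorted(build_graph(genomes).items()):
        print(edge, weight)
```

In this example the insertion of `tetM` in genome C shows up as a low-weight detour next to the well-supported `dnaN`-`recF` edge; neighborhood patterns like this, combined with presence/absence, are the kind of information PPanGGOLiN uses when partitioning.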

Interactive Pangenome Plots


  • Anvi’o can build a pangenome from genomes, metagenomes, or a combination of both.
  • Anvi’o allows you to visualize your pangenomes interactively.
  • The anvi’o platform includes additional scripts to explore the geometric and biochemical homogeneity of gene clusters, compute and visualize average nucleotide identity (ANI) values between genomes, and run functional enrichment analyses across groups of genomes, among other tasks.
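
ANI summarizes how similar two genomes are at the nucleotide level. The sketch below follows the classic ANIb recipe in simplified form: the query genome is cut into 1020-nt fragments, each fragment's best BLASTN hit against the other genome is given as `(percent identity, aligned length)`, and ANI is the mean identity of fragments with more than 30% identity over more than 70% of the fragment length. It assumes those hits were computed elsewhere and is not anvi’o's own code.

```python
from typing import Iterable, List, Optional, Tuple

FRAGMENT_LENGTH = 1020  # fragment size used by the classic ANIb method


def split_into_fragments(sequence: str, size: int = FRAGMENT_LENGTH) -> List[str]:
    """Cut a genome into consecutive fragments (the last one may be shorter)."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def ani(fragment_hits: Iterable[Tuple[float, int]],
        fragment_length: int = FRAGMENT_LENGTH) -> Optional[float]:
    """Mean identity of fragments whose best hit has >30% identity over >70% of the fragment."""
    kept = [pident for pident, aligned in fragment_hits
            if pident > 30.0 and aligned > 0.7 * fragment_length]
    return sum(kept) / len(kept) if kept else None


if __name__ == "__main__":
    hits = [(99.0, 1000), (97.0, 900), (25.0, 1020), (99.5, 300)]  # hypothetical hits
    print(ani(hits))  # 98.0
```

ANI is direction-dependent, so it is often computed both ways and averaged.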

Other Resources


  • Downstream analysis of pangenomes can focus on describing either the core or the accessory genome of the organism studied.
  • Examples using the information obtained in the CORE GENOME:
    1. Selection of a conserved gene as the target of a molecular test for diagnostics, or as a vaccine candidate.
    1. Reconstruction of a species phylogenetic tree using all the core genes.
  • Examples using the information obtained in the ACCESSORY GENOME:
    1. Description of niche-specific genes among the compared strains.
    1. Analysis of horizontal gene transfer or genetic recombination.
    1. Evolutionary studies of genes (gene duplication, gene gain and loss, etc.).
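
Both kinds of downstream question often start from simple set operations on the presence/absence table. This sketch assumes a deliberately strict, illustrative definition: core families are present in every strain, and a family is specific to a group of strains when it is present in all of them and absent from all other strains. Strain names and family IDs are hypothetical.

```python
from typing import Dict, Set


def core_families(families_by_strain: Dict[str, Set[str]]) -> Set[str]:
    """Families present in every strain (e.g. candidates for diagnostics or a core-genome tree)."""
    return set.intersection(*families_by_strain.values())


def group_specific(families_by_strain: Dict[str, Set[str]], group: Set[str]) -> Set[str]:
    """Families present in all strains of `group` and absent from all other strains."""
    if not group:
        raise ValueError("group must contain at least one strain")
    inside = set.intersection(*(families_by_strain[s] for s in group))
    outside = set().union(*(fams for s, fams in families_by_strain.items() if s not in group))
    return inside - outside


if __name__ == "__main__":
    strains = {  # hypothetical data
        "S1": {"dnaA", "recA", "ureC"},
        "S2": {"dnaA", "recA", "ureC"},
        "S3": {"dnaA", "recA", "cps1"},
    }
    print(sorted(core_families(strains)))                # ['dnaA', 'recA']
    print(sorted(group_specific(strains, {"S1", "S2"})))  # ['ureC']
```

In practice, a relaxed cutoff (for example, present in most strains of one niche and rare elsewhere) is usually more informative than this all-or-nothing rule.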