Introduction to Pangenomics
- A pangenome encompasses the complete collection of genes found in all genomes within a specific group, typically a species.
- Comparing the complete genome sequences of all members within a clade allows for the construction of a pangenome.
- The pangenome consists of two main components: the core genome and the accessory genome.
- The accessory genome can be further divided into the shell genome and the cloud genome.
- In an open pangenome, the size of the pangenome significantly increases with the addition of each new genome.
- In a closed pangenome, only a few gene families are added to the pangenome when a new genome is introduced.
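
The partitions and the open/closed distinction above can be illustrated with a small sketch. It assumes the common convention that core families occur in every genome, cloud families occur in only a small fraction of genomes, and shell families fall in between. The function names, the `cloud_max_fraction` cut-off, and the toy data are hypothetical and chosen only for illustration; real tools use their own criteria.

```python
from collections import Counter
from typing import Dict, List, Set


def partition_families(genome_families: Dict[str, Set[str]],
                       cloud_max_fraction: float = 0.15) -> Dict[str, Set[str]]:
    """Assign each gene family to core, shell or cloud by its frequency."""
    n_genomes = len(genome_families)
    # Count in how many genomes each family occurs.
    counts = Counter(fam for fams in genome_families.values() for fam in fams)
    parts: Dict[str, Set[str]] = {"core": set(), "shell": set(), "cloud": set()}
    for family, count in counts.items():
        if count == n_genomes:
            parts["core"].add(family)          # present in every genome
        elif count / n_genomes <= cloud_max_fraction:
            parts["cloud"].add(family)         # rare (assumed cut-off)
        else:
            parts["shell"].add(family)         # intermediate frequency
    return parts


def new_families_per_genome(genome_families: Dict[str, Set[str]]) -> List[int]:
    """Count how many previously unseen families each added genome brings."""
    seen: Set[str] = set()
    added = []
    for families in genome_families.values():
        added.append(len(families - seen))
        seen |= families
    return added


# Toy data: four genomes and the gene families found in each.
toy = {
    "g1": {"A", "B"},
    "g2": {"A", "B", "C"},
    "g3": {"A", "B"},
    "g4": {"A", "D"},
}
partitions = partition_families(toy, cloud_max_fraction=0.25)
growth = new_families_per_genome(toy)
print(partitions)
print(growth)
```

If the counts returned by `new_families_per_genome` stay large as genomes are added, the pangenome behaves as open; if they quickly drop toward zero, it behaves as closed. The result depends on the order in which genomes are added, so in practice the order is usually permuted several times.
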
Downloading Genomic Data
- The ncbi-genome-download package is a set of scripts designed to download genomes from the NCBI database.
Annotating Genomic Data
- Prokka is a command line utility that provides rapid prokaryotic genome annotation.
- The output files of annotation software sometimes need manual curation.
- Specialized software exists to annotate specific genomic elements.
Measuring Sequence Similarity
- To build a pangenome, you need to compare the genes and group them into gene families.
- BLAST gives a score of similarity between two sequences.
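
As an illustration of how BLAST similarity scores can be read for later clustering, here is a minimal sketch that parses BLAST tabular output. It assumes the default 12-column tabular format (`-outfmt 6`); the function name and the example hits are invented for illustration and are not results from any real run.

```python
from typing import Dict, List

# Default column order of BLAST tabular output (-outfmt 6).
BLAST_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                 "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")


def parse_blast_tabular(text: str) -> List[Dict[str, object]]:
    """Turn tab-separated BLAST hits into a list of typed dictionaries."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit: Dict[str, object] = dict(zip(BLAST_COLUMNS, line.split("\t")))
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        hits.append(hit)
    return hits


# Invented example hits, for illustration only.
example = (
    "geneA\tgeneB\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-100\t370\n"
    "geneA\tgeneC\t45.00\t180\t99\t2\t5\t184\t3\t180\t2e-20\t85.5\n"
)
hits = parse_blast_tabular(example)
# Keep only hits below a hypothetical E-value cut-off.
significant = [h for h in hits if h["evalue"] <= 1e-50]
print(significant)
```

A lower E-value and a higher bit score both indicate a more significant match, so either column can be used to rank the hits of a query.
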
Clustering with BLAST Results
- The Bidirectional Best-Hit algorithm groups sequences together into families according to the E-value.
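
The idea behind bidirectional best hits can be sketched for the simplest case of two genomes. This is a simplified illustration, not the implementation used by any particular tool: it assumes each hit is a `(query, subject, E-value)` tuple, takes the hit with the lowest E-value as the best hit, and ignores ties, coverage filters, and the extension to more than two genomes. All names and values are hypothetical.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float]  # (query gene, subject gene, E-value)


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """For each query gene, keep the subject with the lowest E-value."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Return gene pairs that are each other's best hit in both directions."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Invented hits between genome A (a1..a3) and genome B (b1, b2).
a_vs_b = [("a1", "b1", 1e-50), ("a1", "b2", 1e-10),
          ("a2", "b2", 1e-30), ("a3", "b1", 1e-5)]
b_vs_a = [("b1", "a1", 1e-48), ("b2", "a2", 1e-29), ("b2", "a1", 1e-9)]

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here `a3` finds `b1` as its best hit, but `b1` prefers `a1`, so `a3` is not placed in a family with `b1`.
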
Clustering Protein Sequences
- Clustering protein sequences refers to the process of grouping similar sequences into distinct clusters or families.
- GET_HOMOLOGUES is a software package for microbial pangenome analysis.
- GET_HOMOLOGUES supports three sequence clustering algorithms: BDBH, COGtriangles, and OrthoMCL.
Exploring Pangenome Graphs
- PPanGGOLiN is a software suite to create and manipulate prokaryotic pangenomes.
- PPanGGOLiN integrates gene families and their genomic neighborhood to build a graph and define the partitions.
- PPanGGOLiN is designed to scale up to tens of thousands of genomes.
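
To make the idea of combining gene families with their genomic neighborhood more concrete, here is a minimal sketch of such a graph. It is not PPanGGOLiN's implementation or API: it assumes each genome is given as an ordered list of gene family names on a single linear contig, and it ignores strand, contig boundaries, circular replicons, and the partitioning step. All names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def build_neighborhood_graph(genomes: Dict[str, List[str]]):
    """Build a graph whose nodes are gene families and whose edges link
    families that are adjacent in at least one genome."""
    node_genomes: Dict[str, Set[str]] = defaultdict(set)
    edge_genomes: Dict[Edge, Set[str]] = defaultdict(set)
    for genome, families in genomes.items():
        for family in families:
            node_genomes[family].add(genome)
        # Consecutive genes define neighborhood edges.
        for left, right in zip(families, families[1:]):
            if left != right:
                edge = tuple(sorted((left, right)))  # undirected edge
                edge_genomes[edge].add(genome)
    # Weight nodes and edges by the number of genomes supporting them.
    nodes = {family: len(gs) for family, gs in node_genomes.items()}
    edges = {edge: len(gs) for edge, gs in edge_genomes.items()}
    return nodes, edges


# Invented gene orders for three genomes.
toy_genomes = {
    "g1": ["A", "B", "C"],
    "g2": ["A", "B", "D"],
    "g3": ["A", "C"],
}
nodes, edges = build_neighborhood_graph(toy_genomes)
print(nodes)
print(edges)
```

Node and edge weights record how many genomes share a family or an adjacency, which is the kind of information a graph-based method can use when defining partitions.
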
Interactive Pangenome Plots
- Anvi’o can build a pangenome starting from genomes or metagenomes, or a combination of both.
- Anvi’o allows you to interactively visualize your pangenomes.
- The Anvi’o platform includes additional scripts to, among other things, explore the geometric and biochemical homogeneity of the gene clusters, compute and visualize the ANI values of the genomes, and conduct a functional enrichment analysis in a group of genomes.
Other Resources
- Downstream analysis of pangenomes could be focused on describing the core or the accessory genome of the organism studied.
- Examples using the information obtained in the CORE GENOME:
  - Selection of a conserved gene to design a molecular test for a diagnostic tool or a vaccine.
  - Reconstruction of a species phylogenetic tree by using all the core genes.
- Examples using the information obtained in the ACCESSORY GENOME:
  - Description of niche-specific genes among the strains compared.
  - Analysis of horizontal gene transfer or genetic recombination.
  - Evolutionary studies of genes (duplication, gene gain and loss, etc.).