Introduction to Genome Mining
- Natural products are encoded in Biosynthetic Gene Clusters (BGCs)
- Genome mining describes the exploitation of genomic information with specialized algorithms intended to discover and study BGCs
Secondary metabolite biosynthetic gene cluster identification
- antiSMASH is a bioinformatic tool capable of identifying, annotating and analysing secondary metabolite BGCs
- antiSMASH can be used as a web-based tool or as a stand-alone command-line tool
- The file formats accepted by antiSMASH are GenBank, FASTA and EMBL
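As a minimal sketch of checking inputs before submitting them, the snippet below guesses a file's format from its extension. The extension mapping and the function name `guess_input_format` are illustrative assumptions, not part of antiSMASH, which may accept other extensions or inspect file contents instead.

```python
from pathlib import Path

# Hypothetical extension-to-format mapping; adjust it to your own file naming.
EXTENSION_TO_FORMAT = {
    ".gbk": "GenBank",
    ".gb": "GenBank",
    ".gbff": "GenBank",
    ".fasta": "FASTA",
    ".fa": "FASTA",
    ".fna": "FASTA",
    ".embl": "EMBL",
}


def guess_input_format(filename):
    """Return the guessed format name, or None if the extension is unknown."""
    suffix = Path(filename).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix)


for name in ["genome_A.gbk", "genome_B.fasta", "notes.txt"]:
    print(name, "->", guess_input_format(name))
```

Files for which the function returns `None` would need to be converted or renamed before use.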
Genome Mining Databases
- MIBiG provides BGCs that have been experimentally tested
- The antiSMASH database comprises the predicted BGCs of each organism
BGC Similarity Networks
- BGC similarity is measured by BiG-SCAPE according to protein domain content, adjacency and sequence identity.
- The gbks of the regions identified by antiSMASH are the input for BiG-SCAPE.
- BiG-SCAPE delivers BGC similarity networks with which it delimits Gene Cluster Families (GCFs) and creates a phylogeny of the BGCs in each GCF.
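To make the ideas of domain content and adjacency concrete, here is a simplified sketch that compares two BGCs represented as ordered lists of domain names. It is not BiG-SCAPE's actual scoring: it leaves out sequence identity and any weighting, and the domain lists and function names are hypothetical.

```python
def jaccard_index(domains_a, domains_b):
    """Fraction of distinct domain types shared by two BGCs."""
    set_a, set_b = set(domains_a), set(domains_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def adjacency_index(domains_a, domains_b):
    """Fraction of neighbouring domain pairs shared by two BGCs.

    Pairs are stored as frozensets, so their orientation is ignored.
    """
    pairs_a = {frozenset(p) for p in zip(domains_a, domains_a[1:])}
    pairs_b = {frozenset(p) for p in zip(domains_b, domains_b[1:])}
    union = pairs_a | pairs_b
    if not union:
        return 0.0
    return len(pairs_a & pairs_b) / len(union)


# Hypothetical ordered domain lists for two BGCs.
bgc_1 = ["KS", "AT", "ACP", "TE"]
bgc_2 = ["KS", "AT", "KR", "ACP"]

print("domain content:", jaccard_index(bgc_1, bgc_2))
print("adjacency:", adjacency_index(bgc_1, bgc_2))
```

In this example the two clusters share three of five domain types but only one of five neighbouring pairs, which shows why content and adjacency are measured separately.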
Homologous BGC Clusterization
- You can use the grep --help command to get information about the available options for the grep command
- BiG-SLiCE and BiG-FAM are software tools that are useful to compare the metabolic diversity of bacterial lineages with each other and against a large database
- An input folder containing the BGCs from antiSMASH and the taxonomic information of each genome is needed to run BiG-SLiCE
- The results from the antiSMASH web tool are needed to run BiG-FAM
- Gene Cluster Families can help us to compare the metabolic capabilities of a set of bacterial lineages
- We can use BiG-FAM to compare a BGC against the whole database and predict its Gene Cluster Family
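One simple way to picture predicting a Gene Cluster Family for a query BGC is nearest-centroid assignment with a distance cutoff. The sketch below assumes each BGC is already summarized as a numeric feature vector. The vectors, the family names, the threshold and the function `assign_family` are all hypothetical, and this is not a description of BiG-FAM's internals.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def assign_family(query, centroids, threshold):
    """Return (family, distance) for the nearest centroid.

    The family is None when even the nearest centroid is farther
    away than the threshold, i.e. the query fits no known family.
    """
    best_name, best_dist = None, math.inf
    for name, centroid in centroids.items():
        dist = euclidean(query, centroid)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist
    return best_name, best_dist


# Hypothetical family centroids and query vector.
centroids = {"GCF_1": [1.0, 0.0, 1.0], "GCF_2": [0.0, 1.0, 0.0]}
query = [1.0, 0.0, 0.0]

print(assign_family(query, centroids, threshold=1.5))
```

A query that falls outside the threshold for every family is a candidate for a family not yet represented in the database.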
Finding Variation on Genomic Vicinities
- CORASON is a command-line tool that finds BGC-families
- Genomic vicinity variation is organized phylogenetically according to the conserved genes in the BGC-family
Evolutionary Genome Mining
- EvoMining is a command-line tool that performs evolutionary genome mining over gene families
- EvoMining hits can belong to new BGCs
- MicroReact is an interactive genomic visualizer compatible with EvoMining output
GATOR-GC: Genomic Assessment Tool for Orthologous Regions and Gene Clusters
- GATOR-GC is an innovative tool that uses an enzyme-aware scoring system and evolutionary principles to explore BGC diversity.
- Unlike traditional methods, GATOR-GC offers flexibility in defining the taxonomic scope and prioritizes the identification of novel biosynthetic pathways.
- GATOR-GC can be customized to search for essential and optional enzymes, making it a powerful tool for targeted exploration.
- Dynamic gene cluster diagrams and GATOR neighborhood visualizations provide clear insights into gene conservation and genomic relationships.
Metabolomics workshop
- Data is generated using Liquid Chromatography coupled to a tandem mass spectrometer (LC-MS/MS or MS2).
- Dereplication is the process of identifying previously known compounds.
- Molecular networking is a computational method that organizes MS2 data based on spectral similarity, allowing us to infer relationships between chemical structures.
- Feature-Based Molecular Networking (FBMN) enhances classical molecular networking by integrating relative quantitative data, enabling more robust metabolomics statistical analysis.
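Spectral similarity can be illustrated with a plain cosine score between two binned MS2 spectra. This is a minimal sketch under stated assumptions: the peak lists are made up, the 1.0 m/z bin width is arbitrary, and molecular networking workflows generally use more refined scores (for example, variants that account for the precursor mass difference).

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0.0 to 1.0)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical fragment spectra as (m/z, intensity) pairs.
spectrum_1 = [(100.1, 10.0), (150.2, 5.0)]
spectrum_2 = [(100.3, 10.0), (200.0, 5.0)]

print(cosine_similarity(spectrum_1, spectrum_2))
```

In a network, each spectrum becomes a node, and an edge is drawn between two nodes when their score exceeds a chosen cutoff.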
Other Resources