Introduction to Genome Mining


  • Natural products are encoded in Biosynthetic Gene Clusters (BGCs)
  • Genome mining describes the exploitation of genomic information with specialized algorithms intended to discover and study BGCs

Secondary metabolite biosynthetic gene cluster identification


  • antiSMASH is a bioinformatic tool capable of identifying, annotating and analysing secondary metabolite BGCs
  • antiSMASH can be used as a web-based tool or as a stand-alone command-line tool
  • The file formats accepted by antiSMASH are GenBank, FASTA and EMBL
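As a small illustration of the accepted input formats, the sketch below guesses the format of a file from its extension before submitting it. The extension map and the function name `guess_input_format` are assumptions made for this example; they are common naming conventions, not a list taken from antiSMASH itself.

```python
from pathlib import Path

# Hypothetical extension-to-format map: these are common conventions,
# not an official list published by antiSMASH.
FORMAT_BY_EXTENSION = {
    ".gbk": "GenBank",
    ".gb": "GenBank",
    ".gbff": "GenBank",
    ".fasta": "FASTA",
    ".fa": "FASTA",
    ".fna": "FASTA",
    ".embl": "EMBL",
}


def guess_input_format(filename):
    """Return the likely format name, or None if the extension is unknown."""
    suffix = Path(filename).suffix.lower()
    return FORMAT_BY_EXTENSION.get(suffix)


print(guess_input_format("streptomyces.gbk"))
print(guess_input_format("notes.txt"))
```

A file whose extension is not in the map returns `None`, which is a cue to check the file by hand rather than proof that it is unusable.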

Genome Mining Databases


  • MIBiG provides BGCs that have been experimentally tested
  • The antiSMASH database comprises the BGCs predicted for each organism

BGC Similarity Networks


  • BGC similarity is measured by BiG-SCAPE according to protein domain content, adjacency and sequence identity.
  • The GenBank (.gbk) files of the regions identified by antiSMASH are the input for BiG-SCAPE.
  • BiG-SCAPE delivers BGC similarity networks, which it uses to delimit Gene Cluster Families (GCFs) and to create a phylogeny of the BGCs in each GCF.
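To make "domain content" and "adjacency" concrete, here is a minimal sketch of two set-based indices computed on toy domain lists: a Jaccard index over shared domains and an adjacency index over shared neighbouring domain pairs. It is a simplified illustration, not BiG-SCAPE's implementation: the domain labels are made up, the sequence identity component is left out, and the weighting that combines the indices into one distance is not shown.

```python
def jaccard_index(domains_a, domains_b):
    """Fraction of distinct domains shared by the two clusters."""
    set_a, set_b = set(domains_a), set(domains_b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def adjacent_pairs(domains):
    """Unordered pairs of domains that sit next to each other."""
    return {frozenset(pair) for pair in zip(domains, domains[1:])}


def adjacency_index(domains_a, domains_b):
    """Fraction of neighbouring domain pairs shared by the two clusters."""
    pairs_a, pairs_b = adjacent_pairs(domains_a), adjacent_pairs(domains_b)
    union = pairs_a | pairs_b
    if not union:
        return 0.0
    return len(pairs_a & pairs_b) / len(union)


# Toy, hypothetical domain strings for two BGCs (labels are illustrative only).
bgc_1 = ["PKS_KS", "PKS_AT", "ACP", "Thioesterase"]
bgc_2 = ["PKS_KS", "PKS_AT", "ACP", "Methyltransf"]

print(jaccard_index(bgc_1, bgc_2))    # 3 shared domains out of 5 distinct
print(adjacency_index(bgc_1, bgc_2))  # 2 shared adjacent pairs out of 4
```

Two clusters with the same domains in a different order would score high on the first index and lower on the second, which is why both are useful.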

Homologous BGC Clusterization


  • You can use the grep --help command to get information about the available options for the grep command
  • BiG-SLiCE and BiG-FAM are software tools for comparing the metabolic diversity of bacterial lineages with each other and against a large database
  • An input-folder containing the BGCs from antiSMASH and the taxonomic information of each genome is needed to run BiG-SLiCE
  • The results from the antiSMASH web-tool are needed to run BiG-FAM
  • Gene Cluster Families can help us to compare the metabolic capabilities of a set of bacterial lineages
  • We can use BiG-FAM to compare a BGC against the whole database and predict its Gene Cluster Family
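Once each BGC has been assigned to a Gene Cluster Family, comparing lineages reduces to comparing sets of GCF labels. The sketch below assumes a hypothetical mapping from genome names to GCF identifiers (the names `genome_A` and `GCF_1` are invented, not real BiG-SLiCE or BiG-FAM output) and reports which families are shared by all genomes and which are found in only one.

```python
from collections import Counter

# Hypothetical GCF assignments per genome; real values would come from
# the clustering results, whose output format is not reproduced here.
gcfs_by_genome = {
    "genome_A": {"GCF_1", "GCF_2", "GCF_3"},
    "genome_B": {"GCF_2", "GCF_3"},
    "genome_C": {"GCF_3", "GCF_4"},
}


def summarize_gcfs(gcfs_by_genome):
    """Return (families present in every genome, families unique to each genome)."""
    counts = Counter(gcf for gcfs in gcfs_by_genome.values() for gcf in gcfs)
    n_genomes = len(gcfs_by_genome)
    core = sorted(gcf for gcf, count in counts.items() if count == n_genomes)
    unique = {
        genome: sorted(gcf for gcf in gcfs if counts[gcf] == 1)
        for genome, gcfs in gcfs_by_genome.items()
    }
    return core, unique


core, unique = summarize_gcfs(gcfs_by_genome)
print("Shared by all genomes:", core)
print("Unique per genome:", unique)
```

Families unique to a single lineage are often the first candidates to inspect when looking for lineage-specific metabolic capabilities.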

Finding Variation on Genomic Vicinities


  • CORASON is a command-line tool that finds BGC-families
  • Genomic vicinity variation is organized phylogenetically according to the conserved genes in the BGC-family

Evolutionary Genome Mining


  • EvoMining is a command-line tool that performs evolutionary genome mining over gene families
  • EvoMining hits can belong to new BGCs
  • MicroReact is an interactive genomic visualizer compatible with EvoMining output

GATOR-GC: Genomic Assessment Tool for Orthologous Regions and Gene Clusters


  • GATOR-GC is an innovative tool that uses an enzyme-aware scoring system and evolutionary principles to explore BGC diversity.
  • Unlike traditional methods, GATOR-GC offers flexibility in defining the taxonomic scope and prioritizes the identification of novel biosynthetic pathways.
  • GATOR-GC can be customized to search for essential and optional enzymes, making it a powerful tool for targeted exploration.
  • Dynamic gene cluster diagrams and GATOR neighborhood visualizations provide clear insights into gene conservation and genomic relationships.

Metabolomics workshop


  • Data are generated using liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS); the resulting fragmentation spectra are called MS2 (MS/MS) spectra.
  • Dereplication is the process of identifying previously known compounds.
  • Molecular networking is a computational method that organizes MS2 data based on spectral similarity, allowing us to infer relationships between chemical structures
  • Feature-Based Molecular Networking (FBMN) enhances classical molecular networking by integrating relative quantitative data, enabling more robust metabolomics statistical analysis.
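Spectral similarity can be illustrated with a plain cosine score between two MS2 spectra. The sketch below bins peaks by m/z and compares the binned intensity vectors. It is a simplification under stated assumptions: the bin width of 0.5 is arbitrary, the toy peak lists are invented, and molecular networking tools typically use more elaborate scoring (for example, accounting for precursor mass differences) that is not reproduced here.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.5):
    """Sum intensities of (m/z, intensity) peaks falling in the same m/z bin."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=0.5):
    """Cosine of the angle between two binned spectra (0 = no shared bins)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, hypothetical spectra as lists of (m/z, intensity) pairs.
spectrum_1 = [(100.1, 50.0), (150.2, 100.0)]
spectrum_2 = [(100.1, 40.0), (150.2, 90.0), (210.3, 5.0)]
spectrum_3 = [(300.0, 10.0)]

print(cosine_similarity(spectrum_1, spectrum_2))
print(cosine_similarity(spectrum_1, spectrum_3))
```

In a network, each spectrum becomes a node and an edge is drawn between two nodes when their score exceeds a chosen threshold, so structurally related compounds tend to group together.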

Other Resources

