All Images

Secondary metabolite biosynthetic gene cluster identification

BiG-SCAPE output as visualized in the web page, showing the overview page.
On the left is the Run Information,
indicating the date and time at which the analysis was started and completed, as well as the parameters of the run.
Next is the Input Data, specifying the total number of genomes and the total number of BGCs,
in this example 8 and 23, respectively. There are two pie charts, one representing the BGCs per genome
and the other the BGCs per class. On the right is the Network Overview,
which allows selection between mix and the different BGC classes.
For the mix overview, it displays the Number of Families, the Average number of BGCs per family,
the Max number of BGCs in a family, and the Families with MIBiG Reference BGCs.

Figure 2

BiG-SCAPE output as visualized in the web page.
The overview page displays a clustered heatmap of the presence/absence of the GCFs in each genome.
The GCFs are organized as columns, labeled at the base of the table,
and the genomes as rows, labeled at the right side of the table. Presence is depicted in red with 1 and absence in beige with 0.
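
As an illustration of the kind of table the heatmap draws, the following minimal sketch builds a 0/1 presence/absence matrix with genomes as rows and GCFs as columns. The function name, genome names, and GCF labels are hypothetical and are not taken from BiG-SCAPE output.

```python
def presence_absence(genome_to_gcfs):
    """Build a 0/1 matrix with genomes as rows and GCFs as columns."""
    # Collect every GCF seen in any genome, in a stable (sorted) order.
    gcfs = sorted({gcf for fams in genome_to_gcfs.values() for gcf in fams})
    genomes = sorted(genome_to_gcfs)
    # 1 means the GCF is present in that genome, 0 means it is absent.
    matrix = [
        [1 if gcf in genome_to_gcfs[genome] else 0 for gcf in gcfs]
        for genome in genomes
    ]
    return genomes, gcfs, matrix


# Hypothetical input: which GCFs were found in which genome.
example = {
    "genome_A": {"GCF_1", "GCF_2"},
    "genome_B": {"GCF_2"},
    "genome_C": {"GCF_3"},
}
genomes, gcfs, matrix = presence_absence(example)
for genome, row in zip(genomes, matrix):
    print(genome, row)
```

Clustering the rows and columns, as the heatmap does, would be a separate step applied to this matrix.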

Figure 3

BiG-SCAPE similarity network of the complete mix of BGCs
obtained from the run. A network is represented for each GCF, and each dot represents a BGC.
In this example there are a total of 23 BGCs, of which 6 are singletons, with 28 links and 11 families.
Clicking on any of the dots shows its GCF at the right side and allows you to explore it further.
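
The counts reported in this caption (BGCs, links, singletons) can be derived from a plain edge list. The sketch below is a simplification under stated assumptions: it groups BGCs into connected components only, whereas BiG-SCAPE's own family assignment may differ from plain connected components. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def summarize_network(bgcs, links):
    """Count BGCs, links, connected components, and singletons."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for bgc in bgcs:
        if bgc in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack = [bgc]
        seen.add(bgc)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for other in neighbors[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(component)

    # A singleton is a BGC with no links to any other BGC.
    singletons = sum(1 for c in components if len(c) == 1)
    return {
        "bgcs": len(bgcs),
        "links": len(links),
        "components": len(components),
        "singletons": singletons,
    }


# Hypothetical toy network: three linked BGCs and two singletons.
summary = summarize_network(
    ["bgc_a", "bgc_b", "bgc_c", "bgc_d", "bgc_e"],
    [("bgc_a", "bgc_b"), ("bgc_b", "bgc_c")],
)
print(summary)
```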

Figure 4

BiG-SCAPE output displaying a tree of phylogenetic distances
among the BGCs that make up a GCF. The example shows GCF 10, which comprises six members.
Each BGC is represented in the tree by an arrow diagram of the genes and the protein domains in the genes
corresponding to that cluster.

Homologous BGC Clusterization

Figure 1

Three biomolecules are depicted in three different colors: blue, red, and purple. These molecules are present in BGCs of diverse bacterial lineages, which in turn are grouped into Gene Cluster Families (GCFs). GCF1 contains three domains related to the blue molecule and three from the red molecule. GCF2 contains four domains of each of the molecules, blue, red, and purple. GCF3 and GCF4 each contain three domains associated with the purple biomolecule, but these differ between the two families. Every GCF produces a different metabolite, here represented as weapons or tools.

Figure 2

Example of a TSV table composed of four columns and two rows.
The first row contains the title of each column: # Dataset name, Path to folder, Path to taxonomy, and Description.
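
A table with this layout can be written with Python's standard `csv` module. The sketch below is only illustrative: the header titles are the ones listed in the caption, while the function name and the values in the data row (dataset name, paths, and description) are hypothetical placeholders.

```python
import csv
import io

# Column titles as listed in the caption.
HEADER = ["# Dataset name", "Path to folder", "Path to taxonomy", "Description"]


def build_datasets_tsv(rows):
    """Return tab-separated text with the header row followed by the data rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical single dataset row; replace the values with your own.
tsv_text = build_datasets_tsv(
    [["dataset_1", "dataset_1/", "taxonomy/dataset_1_taxonomy.tsv", "Example dataset"]]
)
print(tsv_text)
```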

Figure 3

Example of the structure of the input-folder.
The content of the directory input-folder is displayed in a tree-like format, listing the files and directories inside it.

Figure 4

BiG-SLiCE web page output displaying the results obtained from the example run.
A left panel presents the information generated, composed of seven tabs:
Summary, Datasets, Runs, Reports, Help, Feedback, and About.
The rest is a Summary of all datasets and runs included in the output.
First come the Datasets provided as input, organized as a table with five columns:
name, #genomes, total, with taxonomy, and description.
Next comes the information about the Runs, also organized as a table, with the following columns:
name, started, finished, status, threshold, #BGCs, and #GCFs.

Figure 5

BiG-SLiCE web page output displaying the information obtained from Run-0001.
First, an Overview of the run is shown, indicating the Status, when it was Started and Finished,
as well as the Parameters, the HMM models, and the Input.
Next, the Statistics about the run are shown as two bar plots.
The left one plots the average distance to GCF models, whilst the right one shows the number of BGCs in GCFs.

Figure 6

BiG-SLiCE web page output displaying detailed information regarding the BGCs from GCF_7.
The Arrowers show a gene arrow visualization of the domains found in each gene of the BGCs belonging to GCF_7.

Figure 7

BiG-FAM main page showing an introduction as well as a graphical representation of the database.
A left panel displays the available options: Home, GCFs, BGCs, Query, Statistics, Help, Feedback, and About.

Figure 8

BiG-FAM query page with the option for inserting an antiSMASH job and submitting it.
Below it is described how much of the job is Pending, Processing, and Processed.

Figure 9

BiG-FAM result page showing first an overview of the job: the query that was submitted,
its status, and the times at which it was started and finished.
Next comes a table listing the BGCs from the database that are related to the query BGCs.
It has seven columns: query BGC, class, length (kb), completeness, best hit, distance, and details.

Finding Variation on Genomic Vicinities

Figure 1

CORASON's workflow for phylogenetically sorting BGCs. Given a query gene in a reference BGC and an annotated genomic database,
CORASON first searches for homologues of the query gene and then filters out all genomic vicinities
not related to the reference BGC. Then, CORASON infers a phylogenetic tree and
calculates the frequency of occurrence of each gene family from the reference BGC.
Using the same reference BGC, if a new query gene is selected, CORASON visualizes
a new phylogeny with families containing the same molecular modifications.
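
The filtering and frequency steps of this workflow can be illustrated with a toy calculation. This is a sketch under stated assumptions, not CORASON's implementation: the rule for deciding that a vicinity is related to the reference BGC (sharing at least `min_shared` gene families) is hypothetical, as are the function name and the family labels.

```python
def family_frequencies(reference, vicinities, min_shared=2):
    """Fraction of retained vicinities that contain each reference gene family."""
    ref = set(reference)
    # Hypothetical filter: keep vicinities sharing enough families with the reference.
    kept = [set(v) for v in vicinities if len(ref & set(v)) >= min_shared]
    if not kept:
        return {family: 0.0 for family in reference}
    return {
        family: sum(family in vicinity for vicinity in kept) / len(kept)
        for family in reference
    }


# Hypothetical gene families: the reference BGC and four candidate vicinities.
freq = family_frequencies(
    ["A", "B", "C"],
    [["A", "B"], ["A", "B", "C"], ["C", "X"], ["A", "C"]],
)
print(freq)
```

In this toy input the vicinity `["C", "X"]` shares only one family with the reference and is dropped before the frequencies are computed.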

Figure 2

CORASON phylogenetic reconstruction in SVG format using cpsG as the query gene and _S. agalactiae_ 1000006 as the query cluster.
At the bottom, the frequency of occurrence of each gene family from the reference BGC is displayed,
each with a different color.

Figure 3

Evolutionary Genome Mining

Figure 1

a) EvoMining expansion-and-recruitment pipeline. A group of gray stacked cylinders represents the genomes in a database (DB).
Homologues and expansions of seed enzymes from the enzyme DB, represented as an orange arrow,
are searched by blastp in the genome DB.
The outcome is integrated as the expanded enzyme families (EFs) within the genome DB.
Bidirectional best hits (BBH) of seed enzymes, shown as red arrows, are marked as conserved metabolism.
The EFs are amplified after being compared against a DB of natural product (NP) biosynthetic enzymes,
represented by a blue cylinder, to find recruitments, defined as enzymes of the family that are part of a MIBiG BGC.
b) The genome DB, represented by the gray stacked cylinders, is searched as previously described.
Additionally, antiSMASH predictions, shown as cyan arrows, can be added by the user.
antiSMASH enzyme predictions that are also marked in red are defined as transition enzymes, shown as purple arrows.
c) EvoMining phylogenetic reconstruction and visualization. On the left side, a phylogenetic reconstruction of an EF is shown.
On the right side, the EvoMining tree is shown, displaying the EvoMining predictions (green),
which are those extra copies closer to enzymes recruited into BGCs (blue) than to conserved metabolic enzymes (red).
antiSMASH-predicted enzymes are represented in cyan, transition enzymes in black, and
extra copies that are neither antiSMASH nor EvoMining predictions are left in gray.
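
The decision rule described in panel c) can be illustrated with a small sketch. It assumes that, for every extra copy, a distance to the nearest recruited enzyme and to the nearest conserved enzyme is already available. How EvoMining derives closeness from the tree is not specified here, so the distances, the function name, and the labels are hypothetical.

```python
def label_extra_copies(distances):
    """Label each extra copy from its (to_recruited, to_conserved) distances."""
    labels = {}
    for copy_id, (to_recruited, to_conserved) in distances.items():
        if to_recruited < to_conserved:
            # Closer to an enzyme recruited into a BGC: shown in green.
            labels[copy_id] = "evomining_prediction"
        else:
            # Otherwise the copy keeps an unknown fate: shown in gray.
            labels[copy_id] = "extra_copy"
    return labels


# Hypothetical distances for three extra copies of one enzyme family.
labels = label_extra_copies(
    {
        "copy_1": (0.10, 0.45),
        "copy_2": (0.60, 0.20),
        "copy_3": (0.30, 0.30),
    }
)
print(labels)
```

Ties are left as plain extra copies in this sketch, which is a design choice made for the example only.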

Figure 2

EvoMining phylogenetic reconstruction providing evolutionary insights into the metabolic origin
and the fate of members of diverse EFs from the Streptococcus example.
Seed enzymes are labeled in orange. The most conserved copies, or central metabolism copies, are marked in red.
Enzyme copies recruited into specialized metabolism, contained in MIBiG, are labeled in blue.
Enzyme copies that are closer to blue enzyme recruitments than to red conserved enzymes are labeled in green
and represent EvoMining Hits. Extra copies with an unknown metabolic fate are shown in gray.

Figure 3

Figure 4

Figure 5

MicroReact visualization of the EvoMining run for the Streptococcus example. On the left is a bar chart with the EFs on the X axis and the number of entries on the Y axis. On the right is the EvoMining phylogenetic tree, using the same color code as the chart. To the right of the tree, the legend indicates the colors by metabolism: central metabolism enzymes in red, expansion enzymes in gray, recruited enzymes contained in MIBiG in blue, secondary metabolism enzymes (EvoMining hits) in green, and seed enzymes in orange. Below is the metadata from the run, organized in a five-column table including Id, metabolism, genome, function, and copies.