Data Processing and Visualization for Metagenomics: All Images

Starting a Metagenomics Project

Figure 1

Flow chart that shows the steps of a metagenomics project: Experimental design, Sampling, DNA extraction, Sequencing, Read quality, Assembly, Binning, Bin quality and Data analysis

Figure 2

Photograph of a pond in Cuatro Ciénegas

Three diagrams depicting the three assembly algorithms. Greedy extension starts with any read and extends it with the reads that match it to make a contig; when the contig cannot be extended any further, it continues with a different read. Overlap-layout-consensus finds every pairwise overlap, builds a layout graph with all the overlaps, and chooses consensus sequences to make the contigs. De Bruijn graph assembly divides the reads into k-mers, builds a k-mer graph that shows all the overlapping k-mers, and chooses paths through the graph to make the contigs.
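The De Bruijn graph approach described in this caption can be illustrated with a minimal Python sketch. It is not the code of any particular assembler: the function names, the two toy reads, and k = 3 are assumptions chosen for illustration. It uses the common formulation in which each k-mer becomes an edge between its (k-1)-mer prefix and suffix, and it only follows a single unbranched path.

```python
from collections import defaultdict


def kmers(read, k):
    """Return every k-length substring of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(reads, k):
    """Build a graph whose nodes are (k-1)-mers and whose edges are k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unbranched(graph):
    """Spell a contig by following a path while each node has one successor."""
    targets = {t for successors in graph.values() for t in successors}
    starts = sorted(node for node in graph if node not in targets)
    if not starts:
        return ""
    node = starts[0]
    contig = node
    seen = {node}
    while len(graph.get(node, ())) == 1:
        nxt = next(iter(graph[node]))
        if nxt in seen:  # stop instead of looping around a cycle
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


# Toy reads (hypothetical) that overlap on "GGC"
reads = ["ATGGC", "GGCGT"]
graph = de_bruijn(reads, 3)
contig = walk_unbranched(graph)
```

Real assemblers also have to handle sequencing errors, repeats, and branching paths, none of which this sketch attempts.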

Metagenome Binning

Figure 1

Diagram depicting the DNA sequences in the original sample as circular chromosomes of three different taxa. After sequencing, the DNA sequences of the three taxa are mixed together as short linear reads. After assembly, we have contigs, each corresponding to a single taxon, except for misassembled contigs that contain sequences from different taxa. After binning, the contigs are separated by taxon.

Taxonomic Assignment

Figure 1

Diagram of a taxonomic tree with four levels of nodes. Some nodes have a number from 1 to 3, and some do not. Among the most recent nodes, one has a three, and its parent nodes do not have numbers. This node with a three is selected.

Figure 2

Flow diagram showing how an initial composition of 33% for each of the three taxa in the sample ends up as 4%, 72%, and 24% after the biases introduced by the extraction, PCR, sequencing, and bioinformatics steps.

Figure 3

Krona displays a circular bacterial taxonomy plot with the abundance percentage of each taxon

Figure 4

Pavian website showing the upload of two reports

Figure 5

Results Overview tab of the Pavian website, showing the number of reads classified into several categories for the two samples

Figure 6

Sankey-type visualization that shows the abundance of each taxonomic label in a tree-like manner

Figure 7

A bar chart of the read abundance in the two samples, showing one segment for the reads identified at the specified taxon and another segment for the reads identified at children of the specified taxon

Figure 8

A table of the same format as the Kraken report but for both samples at once.

Exploring Taxonomy with R

Figure 1

A table where the taxonomic identification information of all OTUs is displayed. Each row represents one OTU, and the columns represent its identification at the different ranks of the taxonomic classification, beginning with Kingdom and ending with Species in the seventh column.

Figure 2

Figure 3

A table where the abundance of each OTU in each sample is shown. Each row represents one OTU and the columns represent the samples. In the intersection, a number indicates how many sequenced reads of that OTU are present in that sample.

Figure 4

The OTU number is in the leftmost space of the table as a row name for the searched species.

Diversity Tackled With R

Figure 1

Alpha diversity diagram: In lake A, we have three fish, each of a different species. In lake B, we have two fish, each of a different species. In lake C, we have four fish, each of a different species.

Figure 2

Alpha and beta diversity diagram: Each lake has a different number of species, and each species has a different number of individual fish. Both metrics are taken into account to measure alpha and beta diversity.

Figure 3

In lake A, we have four different species. Two of these species have three specimens each. This lake also has two specimens of a third species and only one specimen of a fourth species, for nine fish in total. Lake B has only three different species. The most populous species is also present in lake A and has five specimens, and there is only one specimen of each of the other two species, for seven fish in total in lake B.
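The two lakes in this caption make a convenient worked example of alpha diversity. The sketch below is a minimal illustration in Python, not the lesson's own code. It takes the per-species fish counts from the description (3, 3, 2, 1 for lake A and 5, 1, 1 for lake B) and computes richness and the Shannon index with the natural logarithm; the function names are assumptions.

```python
import math


def richness(counts):
    """Number of species with at least one individual."""
    return sum(1 for n in counts if n > 0)


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over the observed species."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


# Fish counts per species, taken from the figure description
lake_a = [3, 3, 2, 1]
lake_b = [5, 1, 1]

h_a = shannon(lake_a)  # about 1.31
h_b = shannon(lake_b)  # about 0.80
```

Lake A scores higher on both measures: it has more species, and its fish are spread more evenly among them.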

Figure 4

A figure divided into three panels. Each panel represents a different alpha diversity index. Within each panel, each point represents the value that index assigns to one of the three samples. The different indexes give different values to the same sample.

Figure 5

Figure 6

Figure 7

The same panels as before, but now the samples are arranged horizontally according to the values in the Shannon index panel.

Figure 8

Plot with NMDS1 on the x-axis, ranging from -0.4 to 0.2, and NMDS2 on the y-axis, ranging from -0.2 to 0.1. The plot has three dots that are not clustered in any way.

Figure 9

The distances between the three samples, JC1A, JP4D, and JP41, are shown in a plane. Each sample has a legend entry and a color. The color corresponds to the treatment recorded in the metadata. There are three possible treatments in the legend: Control mesocosm, Fertilized pond, and Unenriched pond.

Taxonomic Analysis with R

Figure 1

Table containing the taxonomic information of each of the OTUs inside the three samples. Here, we can see only the Phylum column has information, leaving the other taxonomic levels blank.

Figure 2

A two-part plot contrasting the absolute versus the relative abundance of the three samples. On the left side, we can see how each of the bars has its own height, making it difficult to compare the information between samples. The right side shows three bars with the same height after the abundance was transformed to a percentage within each sample.
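The transformation from absolute to relative abundance described in this caption amounts to dividing each count by its sample total. The Python sketch below is only an illustration of that arithmetic, not the lesson's R code. The sample names come from this page, but the read counts and the function name are made-up assumptions.

```python
def to_percent(counts):
    """Convert absolute read counts for one sample into percentages."""
    total = sum(counts.values())
    return {taxon: 100 * n / total for taxon, n in counts.items()}


# Toy read counts per phylum (hypothetical numbers, not real data)
samples = {
    "JC1A": {"Proteobacteria": 600, "Cyanobacteria": 150, "Actinobacteria": 250},
    "JP4D": {"Proteobacteria": 9000, "Cyanobacteria": 4500, "Actinobacteria": 1500},
    "JP41": {"Proteobacteria": 3000, "Cyanobacteria": 500, "Actinobacteria": 1500},
}

relative = {name: to_percent(counts) for name, counts in samples.items()}
```

After the transformation, every sample sums to 100, which is why the bars in the relative abundance panel all have the same height.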

Figure 3

A new two-part plot with a reassignment of the low-abundance taxa on the right side. Compared to the left legend, the one on the right has fewer groups because the taxa with an abundance lower than 0.5% were reassigned to a single group and color.
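The reassignment of low-abundance taxa described in this caption can be sketched as a simple threshold rule. The Python below is a hedged illustration, not the lesson's implementation. The 0.5% threshold comes from the caption, while the function name, the "Other" label, and the percentages are assumptions.

```python
def collapse_rare(percentages, threshold=0.5, label="Other"):
    """Merge every taxon below `threshold` percent into one group."""
    kept = {}
    rare_total = 0.0
    for taxon, pct in percentages.items():
        if pct < threshold:
            rare_total += pct
        else:
            kept[taxon] = pct
    if rare_total > 0:
        kept[label] = rare_total
    return kept


# Toy relative abundances in percent (hypothetical numbers)
sample = {
    "Proteobacteria": 60.0,
    "Cyanobacteria": 39.3,
    "Firmicutes": 0.4,
    "Chloroflexi": 0.3,
}

collapsed = collapse_rare(sample)
```

Here the two rare phyla end up in a single "Other" group, so the legend shrinks from four entries to three while the percentages still add up to 100.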

Figure 4

A new plot with three bars representing the relative abundance of Cyanobacteria in each of the samples. Each of the colors represents a genus. Because we see relative abundances, all the bars have the same height.

Figure 5

New reassignment of the low-abundance taxa on the left part of the plot. A new class has been created that contains the taxa with fewer than 300 reads.

Figure 6

A new plot with three bars representing the relative abundance of Proteobacteria in each of the samples. Each of the colors represents a genus. Because we see relative abundances, all the bars have the same height.

Other Resources

Figure 1

Pie chart showing the relative abundance at the phylum level, and the legend with the phylum names, read count, and percentages.

Figure 2

Pie chart showing the relative abundance of general functional categories, and the legend with the category names, read count, and percentages.

Figure 3

Pie chart showing the relative abundance of specific functional categories, and the legend with the category names, read count, and percentages.