Summary and Schedule
A lot of metagenomics analysis is done using command-line tools for three reasons:
- You will often be working with a large number of files, and working through the command line rather than through a graphical user interface (GUI) allows you to automate repetitive tasks.
- You will often need more computing power than is available on your personal computer, and connecting to and interacting with remote computers requires a command-line interface.
- You will often need to customize your analyses, and command-line tools often enable more customization than the corresponding GUI tools (if a GUI tool even exists).
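As a small illustration of automating a repetitive task, here is a minimal sketch of a bash loop that counts the reads in every FASTQ file in a directory. The sample names, the scratch directory, and the `read_counts.txt` output file are made up for this example and are not part of the workshop data. The sketch also assumes the common four-lines-per-record FASTQ layout.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create a scratch directory with three tiny, made-up FASTQ files
# (hypothetical sample names; each file holds a single 4-line record).
workdir="$(mktemp -d)"
for sample in sampleA sampleB sampleC; do
    printf '@read1\nACGT\n+\nIIII\n' > "${workdir}/${sample}.fastq"
done

# One loop handles every file, however many there are.
# A FASTQ record spans four lines, so reads = lines / 4.
for fq in "${workdir}"/*.fastq; do
    lines=$(wc -l < "${fq}")
    echo "$(basename "${fq}"): $(( lines / 4 )) read(s)"
done > "${workdir}/read_counts.txt"

# Show the per-file summary.
cat "${workdir}/read_counts.txt"
```

The same loop works unchanged whether the directory holds three files or three thousand, which is the kind of repetition that is tedious to carry out by hand in a GUI.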
In a previous lesson, you learned how to use the bash shell to interact with your computer through a command-line interface. In this lesson, you will apply this new knowledge to carry out a common metagenomics workflow: identifying Operational Taxonomic Units (OTUs) among samples taken from two metagenomes within a location. We will start with a set of sequenced reads (.fastq files), perform some quality control steps, assemble those reads into contigs, and finish by identifying and visualizing the OTUs among these samples.
As you progress through this lesson, keep in mind that even if you aren’t going to be doing this same workflow in your research, you will be learning some very important lessons about using command-line bioinformatics tools. What you are going to learn here will enable you to use a variety of bioinformatics tools with confidence and greatly enhance your research efficiency and productivity.
Prerequisites
This lesson assumes a working understanding of the bash shell. If you haven’t already completed the Introduction to the Command Line for Metagenomics lesson and you aren’t familiar with the bash shell, please review those materials before starting this lesson.
This lesson also assumes some familiarity with biological concepts, including the structure of DNA, nucleotide abbreviations, and the concepts of microbiome and taxonomy.
This lesson uses data hosted on an Amazon Machine Image (AMI). Workshop participants will be given information on how to log in to the AMI during the workshop. Learners using these materials for self-directed study must set up their own AMI. Information on setting up an AMI and accessing the required data is provided on the Metagenomics Workshop setup page.
Things You Need To Know
- Stay calm, and don’t panic.
- Everything is going to be fine.
- We are learning together.
This is the fourth and final lesson of the Metagenomics Workshop, which comprises four lessons in total.
Citation
Claudia Zirión Martínez; Diego Garfias Gallegos; Tania Vanessa Arellano Fernández; Aarón Espinosa Jaime; Edder D Bustos Díaz; José Abel Lovaco Flores; Luis Gerardo Tejero Gómez; J Abraham Avelar Rivas; Nelly Sélem (March 2023). A Data Carpentry-Style Metagenomics Workshop.
Lesson Reference
Episodes 2. Assessing Read Quality and 3. Trimming and Filtering are adapted from the corresponding episodes in the Data Wrangling and Processing for Genomics lesson, Copyright (c) The Carpentries. Materials are licensed under CC-BY 4.0 by the authors: Josh Herr, Ming Tang, Lex Nederbragt, Fotis Psomopoulos (eds): Version 2017.11.0, November 2017. THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Setup Instructions | Download files required for the lesson
Duration: 00h 00m | 1. Starting a Metagenomics Project | How do you plan a metagenomics experiment? What does a metagenomics project look like?
Duration: 00h 30m | 2. Assessing Read Quality | How can I describe the quality of my data?
Duration: 01h 20m | 3. Trimming and Filtering | How can we get rid of sequence data that does not meet our quality standards?
Duration: 02h 15m | 4. Metagenome Assembly | Why should genomic data be assembled? What is the difference between reads and contigs? How can we assemble a metagenome?
Duration: 02h 55m | 5. Metagenome Binning | How can we obtain the original genomes from a metagenome?
Duration: 03h 55m | 6. Taxonomic Assignment | How can I know to which taxa my sequences belong?
Duration: 04h 40m | 7. Exploring Taxonomy with R | How can I use my taxonomic assignment results in an analysis?
Duration: 05h 05m | 8. Diversity Tackled With R | How can we measure diversity? How can I use R to analyze diversity?
Duration: 05h 55m | 9. Taxonomic Analysis with R | How can we know which taxa are in our samples? How can we compare depth-contrasting samples? How can we manipulate our data to deliver a message?
Duration: 06h 55m | 10. Other Resources | Where are other metagenomic resources?
Duration: 07h 10m | Finish
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Link to workshop’s setup
This workshop is designed to be run on pre-imaged Amazon Web Services (AWS) instances. For information about how to use the workshop materials, see the setup instructions on the main workshop page.