Summary and Schedule
Data Carpentry’s aim is to teach researchers basic concepts, skills, and tools for working with data so that they can get more done in less time, and with less pain. This workshop uses Data Carpentry’s approach to teach data management and analysis for genome mining research, including: best practices for organizing bioinformatics projects and data, use of command-line utilities, use of command-line tools to analyze sequence quality, use of RStudio and R libraries to compare diversity between samples, and connecting to and using cloud computing.
Prerequisites
FIX ME
Data
This workshop uses data from the experiment “Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial ‘pan genome’”, by Hervé Tettelin, Vega Masignani, Michael J. Cieslewicz, Claire M et al.
All of the data used in this workshop can be downloaded from the Data page, where more information about the data is also available.
| Setup Instructions | Download files required for the lesson | |
| Duration: 00h 00m | 1. Introduction to Genome Mining | What is Genome Mining? |
| Duration: 00h 10m | 2. Secondary metabolite biosynthetic gene cluster identification | How can I annotate known BGCs? What kinds of analysis can antiSMASH perform? Which file formats does antiSMASH accept? |
| Duration: 00h 40m | 3. Genome Mining Databases | Where can I find experimentally validated BGCs? Where can I find information about all predicted BGCs? |
| Duration: 01h 05m | 4. BGC Similarity Networks | How can I measure similarity between BGCs? |
| Duration: 01h 50m | 5. Homologous BGC Clusterization | How can I identify Gene Cluster Families? How can I predict the production of similar metabolites? How can I cluster BGCs into groups that produce similar metabolites? How can I compare the metabolic capability of different bacterial lineages? |
| Duration: 02h 40m | 6. Finding Variation on Genomic Vicinities | How can I follow variation in genomic vicinities given a reference BGC? Which gene families form the conserved part of a BGC family? |
| Duration: 03h 15m | 7. Evolutionary Genome Mining | What is Evolutionary Genome Mining? Which kinds of BGCs can EvoMining find? What do I need in order to run an evolutionary genome mining analysis? |
| Duration: 03h 55m | 8. GATOR-GC: Genomic Assessment Tool for Orthologous Regions and Gene Clusters | What is GATOR-GC, and how does it differ from other BGC exploration tools? How does GATOR-GC establish BGC boundaries using evolutionary principles? What types of biosynthetic diversity can GATOR-GC identify? What do I need to perform a targeted exploration using GATOR-GC? |
| Duration: 04h 55m | 9. Metabolomics workshop | How can I evaluate the similarity between MS spectra? |
| Duration: 05h 05m | 10. Other Resources | What else can I do? |
| Duration: 05h 25m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Overview
This workshop is designed to be run on pre-imaged Amazon Web Services (AWS) instances. With the exception of a spreadsheet program, all of the command line software and data used in the workshop are hosted on an Amazon Machine Image (AMI). Please follow the instructions below to prepare your computer for the workshop:
Required software
- Bash shell: the default shell is usually Bash, and there is usually no need to install anything. To see if your default shell is Bash, type `echo $SHELL` in a terminal and press the Enter key. If the printed message does not end with ‘/bash’, then your default shell is something else, and you can run Bash by typing `bash`.
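The check described above can be scripted; this is a minimal sketch that prints a message instead of requiring you to inspect the output of `echo $SHELL` yourself:

```shell
# Report whether the default (login) shell is Bash, per the check above
case "$SHELL" in
  */bash) echo "Your default shell is Bash." ;;
  *)      echo "Your default shell is $SHELL; type 'bash' to start Bash." ;;
esac
```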
Option A: Using the lessons with Amazon Web Services (AWS)
If you are signed up to take a Genome Mining Data Carpentry Workshop, you do not need to worry about setting up an AMI instance. The Carpentries staff will create an instance for you and this will be provided to you at no cost. This is true for both self-organized and centrally-organized workshops. Your Instructor will provide instructions for connecting to the AMI instance at the workshop.
If you would like to work through these lessons independently,
outside of a workshop, you will need to start your own AMI instance.
Follow these instructions
on creating an Amazon instance. Use the AMI
ami-0e7fb76a881ab5e09 (Metagenomics - 18 March (The
Carpentries Incubator)) listed on the Community AMIs page. Please note
that you must set your location as N. Virginia in order to
access this community AMI. You can change your location in the upper
right corner of the main AWS menu bar. The cost of using this AMI for a
few days, with the t2.medium instance type is very low (about USD $1.50
per user, per day). Data Carpentry has no control over AWS
pricing structure and provides this cost estimate with no guarantees.
Please read AWS documentation on pricing for up-to-date information.
If you’re an Instructor or Maintainer, or want to contribute to these lessons, please get in touch with us at team@carpentries.org and we will start instances for you.
After the basic software of the genomics instance is set up, you need to add the metagenomics environment. Here is a link to a specifications file with the exact version of each tool in this environment. You can use the spec file as follows:
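A minimal sketch of installing from a conda spec file; the filename `spec-file.txt` is an assumption (use whatever name the linked specifications file has):

```shell
# Create the environment from the spec file (exact tool versions);
# spec-file.txt is a placeholder for the downloaded specifications file
conda create --name GenomeMining --file spec-file.txt

# Activate it to use the tools
conda activate GenomeMining
```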
This environment can be modified by adding or deleting tools in a file named `metagenomics.yml`. The original `metagenomics.yml` file had the following content:
```
name: GenomeMining
dependencies:
- antismash=6.0.0
- deepbgc=0.1.29
- BiG-SLiCE=1.1.0
```
Then you can create your own metagenomics conda environment using the metagenomics.yml file.
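A minimal sketch of that step, assuming conda is installed and `metagenomics.yml` is in the current directory (the environment name `GenomeMining` comes from the `name:` field of the YAML file):

```shell
# Build the environment from the YAML file shown above
conda env create -f metagenomics.yml

# Activate it; the name matches the "name:" field in metagenomics.yml
conda activate GenomeMining
```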
More information about how to use environments and spec files is available in the conda documentation.
corason conda

```
git clone https://github.com/miguel-mx/corason-conda.git
cd corason-conda
conda env create -f corason.yml --prefix="/opt/anaconda3/envs/corason"
```