Published April 27, 2023 | Version v1
Journal article | Open access

Training Data for "Taxonomic Profiling of Metagenomic Data" tutorial


Metagenomics involves the extraction, sequencing and analysis of combined genomic DNA from entire microbiome samples. A metagenomic sample therefore contains DNA from many different organisms with diverse taxonomic backgrounds.

Investigating which microorganisms are present at a specific site, and in what relative abundance, is also called "microbial community profiling". The first step is to determine which microorganisms are present in the sample. This is possible for known microbes, i.e. those whose species-specific DNA sequences are known. To do this, we try to identify the taxon to which each individual read belongs. Several approaches exist to profile a community.
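The idea of assigning each read to a taxon and then computing relative abundances can be illustrated with a deliberately simplified k-mer matching sketch. The reference sequences, the taxon names `TaxonA`/`TaxonB` and the k-mer length `k = 4` are hypothetical toy values. Real profiling tools use far larger reference databases and more elaborate assignment rules, so this is not how any particular tool works.

```python
from collections import Counter


def kmers(seq, k):
    """Yield all overlapping k-mers of a DNA sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Map each k-mer to the set of taxa whose reference sequence contains it."""
    index = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(taxon)
    return index


def classify_read(read, index, k):
    """Return the taxon sharing the most k-mers with the read.

    Returns None (unclassified) if no k-mer matches or if two taxa tie.
    """
    hits = Counter()
    for km in kmers(read, k):
        for taxon in index.get(km, ()):
            hits[taxon] += 1
    if not hits:
        return None
    ranked = hits.most_common(2)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: the read matches two taxa equally well
    return ranked[0][0]


def relative_abundance(assignments):
    """Fraction of classified reads assigned to each taxon (unclassified reads ignored)."""
    counts = Counter(t for t in assignments if t is not None)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()} if total else {}


# Hypothetical toy reference "database" and reads.
references = {"TaxonA": "ACGTACGTGGCC", "TaxonB": "TTGACCATGCAA"}
k = 4
index = build_index(references, k)
reads = ["ACGTACGT", "GACCATGC", "AAAAAAAA", "CGTGGCCA"]
assignments = [classify_read(r, index, k) for r in reads]
print(assignments)                      # ['TaxonA', 'TaxonB', None, 'TaxonA']
print(relative_abundance(assignments))  # TaxonA ≈ 0.667, TaxonB ≈ 0.333
```

The third read shares no k-mer with either reference and stays unclassified, which mirrors the point above that only microbes with known sequences can be identified.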

In this tutorial, we will learn some of the theory behind taxonomic profiling, how to run taxonomic profiling tools, and how to visualize their outputs. The dataset we will use comes from an oasis in the Mexican desert called Cuatro Ciénegas (Okie et al. 2020). The researchers were interested in genomic traits that affect the rates and costs of biochemical information processing within cells. To study this, they performed a whole-ecosystem experiment in which the pond was fertilized to create nutrient-enriched conditions.

Here we use 2 datasets:

  • JP4D: a microbiome sample collected from the Lagunita Fertilized Pond
  • JC1A: a control sample from a control mesocosm.

The datasets are named after the first four characters of their filenames. Each dataset consists of paired-end data, with R1 being the forward reads and R2 the reverse reads. Additionally, the reads have been trimmed using cutadapt.
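The naming convention above (sample name = first four characters, R1 = forward, R2 = reverse) can be used to group the files into read pairs before analysis. The sketch below assumes filenames of the form `JC1A_R1.fastq.gz`; these names and the `_R1`/`_R2` markers are assumptions for illustration, since the record does not list the actual filenames.

```python
from collections import defaultdict


def pair_reads(filenames):
    """Group paired-end files by sample.

    Assumes (hypothetically) that the first four characters are the sample
    name and that '_R1' / '_R2' in the filename mark forward / reverse reads.
    """
    pairs = defaultdict(dict)
    for name in filenames:
        sample = name[:4]
        if "_R1" in name:
            pairs[sample]["forward"] = name
        elif "_R2" in name:
            pairs[sample]["reverse"] = name
    # Every sample needs both mates for paired-end analysis.
    incomplete = [s for s, p in pairs.items() if set(p) != {"forward", "reverse"}]
    if incomplete:
        raise ValueError(f"Missing mate file for sample(s): {', '.join(sorted(incomplete))}")
    return dict(pairs)


# Hypothetical filenames following the convention described above.
files = ["JC1A_R1.fastq.gz", "JC1A_R2.fastq.gz", "JP4D_R1.fastq.gz", "JP4D_R2.fastq.gz"]
for sample, mates in sorted(pair_reads(files).items()):
    print(sample, mates["forward"], mates["reverse"])
```

Raising an error for a missing mate is a design choice: a sample with only R1 or only R2 cannot be processed as paired-end data, so it is better to fail early than to silently drop it.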


Files (424.0 MB)

  • 21.5 MB
  • 20.5 MB
  • 182.7 MB
  • 199.3 MB
  • 620 bytes