Supplementary information for Salazar et al. (2021): mTAGs: taxonomic profiling using degenerate consensus reference sequences of ribosomal RNA genes

Salazar, Guillem; Ruscheweyh, Hans-Joachim; Sunagawa, Shinichi

doi:10.5281/zenodo.4751841

Published December 18, 2020 | Version v2

Dataset | Open access


Affiliation: ETH Zurich

Supplementary data and code for Salazar et al. (2021), mTAGs: taxonomic profiling using degenerate consensus reference sequences of ribosomal RNA genes.

The code used to reproduce the analyses can be found in the code folder:

- All R scripts reproducing the external benchmarking, based on data released by Almeida et al. (2018), are in ./code/almeida_stats. The scripts are numbered sequentially in the order in which they should be run.
- Scripts computing the metrics used to evaluate classification and profiling performance in the internal benchmarking are in ./code/internal_benchmarking. The scripts are numbered sequentially in the order in which they should be run.
- All R scripts reproducing the metagenome-based benchmarking are in ./code/cami. The scripts are numbered sequentially in the order in which they should be run.
- A single script, ./code/plots_pub.R, produces the figures used in the publication.
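Because the scripts in each folder are meant to be run in numbered order, a small driver can list them in that order before running them. The sketch below is a hypothetical helper, not part of the archive. It assumes that each file name starts with its sequence number (e.g. 1_load.R, 10_plot.R), which the description does not confirm, and that R's Rscript is on the PATH. The names ordered_scripts and run_in_order are illustrative.

```python
import re
import subprocess
from pathlib import Path

# Hypothetical naming assumption: each script name starts with its sequence
# number (e.g. "1_load.R", "10_plot.R"); adapt the regex if the archive differs.
_PREFIX = re.compile(r"^(\d+)")


def ordered_scripts(folder, pattern="*.R"):
    """Return the scripts in `folder` matching `pattern`, in numbered order."""
    def sort_key(path):
        match = _PREFIX.match(path.name)
        if match:
            # Compare numerically so "10_x.R" sorts after "2_x.R".
            return (0, int(match.group(1)), path.name)
        return (1, 0, path.name)  # unnumbered files go last, alphabetically
    return sorted(Path(folder).glob(pattern), key=sort_key)


def run_in_order(folder, pattern="*.R", dry_run=True):
    """Print each script in order; with dry_run=False, also run it via Rscript."""
    for script in ordered_scripts(folder, pattern):
        print(script)
        if not dry_run:
            # Requires R's Rscript on PATH; check=True stops at the first failure.
            subprocess.run(["Rscript", str(script)], check=True)
```

For example, run_in_order("./code/almeida_stats") only prints the order, and run_in_order("./code/almeida_stats", dry_run=False) also executes each script. The language of the scripts in ./code/internal_benchmarking is not stated, so the pattern argument may need adjusting there.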

Supplementary data can be found in the data folder:

- Metrics used to evaluate classification and profiling performance in both the internal and external benchmarking: ./data/processed.
  - Internal benchmarking metrics: ./data/processed/internal_benchmarking/all_stats_long.tsv (tab-delimited).
  - External benchmarking metrics (tab-delimited files):
    - Metrics based on Almeida et al. (2018) data: ./data/processed/almeida_stats/stats_almeida.tsv.
    - Metrics based on mTAGs (computed from profiles): ./data/processed/almeida_stats/stats_mtags.tsv.
    - Metrics based on mTAGs (computed from bins): ./data/processed/almeida_stats/stats_mtags_from bins.tsv.
  - Metagenome-based benchmarking metrics: tab-delimited files in ./data/processed/cami.
- Reference databases derived from the SILVA SSU database (versions 128 and 138): ./data/raw/silva. For each version, a sequence file (*.fasta), a file listing the cluster members (*.clstr) and a file containing the taxonomic annotation (*taxmap) are provided.
- Taxonomic profiles generated with mTAGs for the external benchmarking dataset: ./data/raw/mtags_almeida. The files use the output format of the mTAGs tool (https://github.com/SushiLab/mTAGs). Profiles are provided for both databases (*cons: using the degenerate consensus sequence; *repr: using the longest member).
- Simulated dataset used for the internal benchmarking: ./data/raw/silva_138_simulate_analysis.
  - Simulated reads of 100, 150 and 250 bp from SILVA SSU database 138, provided as pairs of FASTA files.
  - The true annotation of the simulated reads and the annotation predicted by mTAGs with both databases (cons: using the degenerate consensus sequence; repr: using the longest member): ./data/raw/silva_138_simulate_analysis/annotation.
- Simulated dataset used for the metagenome-based benchmarking: ./data/raw/cami.

Files

mTAGs_SuppInfo.zip (18.2 GB; md5:c0e8f2cca3c9002ae2ac28a39be7a3db)
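Given the size of the archive, it is worth checking a completed download against the listed MD5 checksum. A minimal sketch using Python's standard hashlib module follows. The function names md5sum and verify are illustrative, and the file is read in chunks so the 18.2 GB archive never has to fit in memory.

```python
import hashlib

EXPECTED_MD5 = "c0e8f2cca3c9002ae2ac28a39be7a3db"  # checksum listed for mTAGs_SuppInfo.zip


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches `expected` (case-insensitive)."""
    return md5sum(path) == expected.lower()
```

For example, verify("mTAGs_SuppInfo.zip") returns True only if the local copy matches the published checksum. On Linux, md5sum mTAGs_SuppInfo.zip from GNU coreutils gives the same digest.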
