Published December 18, 2020
| Version v2
Dataset
Open
Supplementary information for Salazar et al. (2021): mTAGs: taxonomic profiling using degenerate consensus reference sequences of ribosomal RNA genes
Description
Supplementary data and code in Salazar et al. (2021). mTAGs: taxonomic profiling using degenerate consensus reference sequences of ribosomal RNA genes.
The code used to reproduce the analyses can be found in the code folder:
- All R scripts reproducing the external benchmarking based on data released by Almeida et al. (2018) are in ./code/almeida_stats. The scripts are numbered sequentially according to their order of use.
- Scripts computing the metrics used to evaluate the classification and profiling performance based on the internal benchmarking are found in ./code/internal_benchmarking. The scripts are numbered sequentially according to their order of use.
- All R scripts reproducing the metagenomes-based benchmarking are in ./code/cami. The scripts are numbered sequentially according to their order of use.
- A single script producing the figures used in the publication is found in ./code/plots_pub.R.
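Because the scripts in each folder are meant to be run in their numbered order, it can help to list them in that order before running anything. The following is a minimal Python sketch, not part of the archive. It assumes that the script names start with a numeric prefix (for example a hypothetical `01_load.R`), which the description implies but does not spell out. The helper name `numbered_scripts` is made up for illustration.

```python
import re
from pathlib import Path


def numbered_scripts(folder, suffix=".R"):
    """Return the scripts in `folder` sorted by their leading number.

    Assumption: file names start with digits (e.g. a hypothetical `01_load.R`).
    Files without a numeric prefix are listed last, in alphabetical order.
    """
    def sort_key(path):
        match = re.match(r"(\d+)", path.name)
        if match:
            # Compare prefixes as integers so that 10_* sorts after 2_*.
            return (0, int(match.group(1)), path.name)
        return (1, 0, path.name)

    scripts = [
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == suffix.lower()
    ]
    return sorted(scripts, key=sort_key)


if __name__ == "__main__":
    folder = Path("./code/almeida_stats")
    if folder.is_dir():
        for script in numbered_scripts(folder):
            print(script.name)
```

Sorting on the integer value of the prefix, rather than on the file name as text, keeps the order correct when prefixes are not zero-padded.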
Supplementary data can be found in the data folder:
- The metrics used to evaluate the classification and profiling performance based on both the internal and external benchmarking: ./data/processed.
  - The metrics used in the internal benchmarking are found in ./data/processed/internal_benchmarking/all_stats_long.tsv as a tab-delimited file.
  - The metrics used in the external benchmarking are found in the following tab-delimited files:
    - Metrics based on Almeida et al. (2018) data: ./data/processed/almeida_stats/stats_almeida.tsv.
    - Metrics based on mTAGs (computed from profiles): ./data/processed/almeida_stats/stats_mtags.tsv.
    - Metrics based on mTAGs (computed from bins): ./data/processed/almeida_stats/stats_mtags_from_bins.tsv.
  - The metrics used in the metagenomes-based benchmarking are found in ./data/processed/cami as tab-delimited files.
- The reference databases derived from the SILVA SSU database (versions 128 and 138): ./data/raw/silva. For both of them a sequence file (*.fasta), a file containing the cluster members (*.clstr) and a file containing the taxonomic annotation (*taxmap) are provided.
- The taxonomic profiles based on mTAGs for the external benchmarking dataset are found in ./data/raw/mtags_almeida. The format is the output format of the mTAGs tool (https://github.com/SushiLab/mTAGs). Data is provided for both databases (*cons: using the degenerate consensus sequence; *repr: using the longest member).
- The simulated dataset used for the internal benchmarking is found in ./data/raw/silva_138_simulate_analysis.
  - Simulated reads of 100, 150 and 250 bp from the SILVA SSU database 138 are found as pairs of FASTA files.
  - The true annotations of the simulated reads and the annotations predicted by mTAGs are found in ./data/raw/silva_138_simulate_analysis/annotation for both databases (cons: using the degenerate consensus sequence; repr: using the longest member).
- The simulated dataset used for the metagenomes-based benchmarking is found in ./data/raw/cami.
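The processed metrics are distributed as tab-delimited files, so they can be inspected without the R scripts. The sketch below uses only the Python standard library. It assumes that each file starts with a header row, which is not stated in this description, and it makes no assumption about the column names. The helper name `read_tsv` is made up for illustration.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Read a tab-delimited file into a list of dicts, one per data row.

    Assumption: the first line of the file is a header row.
    All values are returned as strings.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


if __name__ == "__main__":
    path = Path("./data/processed/internal_benchmarking/all_stats_long.tsv")
    if path.is_file():
        rows = read_tsv(path)
        print(len(rows), "rows")
        if rows:
            # Print the column names found in the header.
            print("columns:", list(rows[0].keys()))
```

Printing the header first is a cheap way to check which metrics and grouping columns a file actually contains before doing any analysis.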
Files
mTAGs_SuppInfo.zip (18.2 GB)
md5:c0e8f2cca3c9002ae2ac28a39be7a3db
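Given the size of the archive (18.2 GB), it may be worth checking the download against the MD5 checksum listed above before unpacking it. The following Python sketch reads the file in chunks so that it does not need to fit in memory. It assumes the archive was saved under its original name in the current directory. The helper name `md5_of_file` is made up for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed for mTAGs_SuppInfo.zip in this record.
EXPECTED_MD5 = "c0e8f2cca3c9002ae2ac28a39be7a3db"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumption: the archive sits in the current working directory.
    archive = Path("mTAGs_SuppInfo.zip")
    if archive.is_file():
        ok = md5_of_file(archive) == EXPECTED_MD5
        print("checksum OK" if ok else "checksum mismatch")
```

A mismatch usually points to an incomplete or corrupted download, in which case the archive should be downloaded again.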