Published July 10, 2024 | Version 220.0
Dataset
Open
RDP Classifier training files for 16S rRNA sequences from GTDB
Description
16S rRNA gene sequences from the Genome Taxonomy Database (GTDB release 220) were used to retrain the RDP Classifier (version 2.13). Two sets of training files are provided:
- genus.zip: Genus level
- species.zip: Species level
The code in prepare_files.R was used to prepare the GTDB sequence and taxonomy files for retraining the RDP Classifier. Notes:
- Steps to retrain the RDP Classifier are adapted from https://john-quensen.com/tutorials/training-the-rdp-classifier/
- Python scripts (lineage2taxTrain.py and addFullLineage.py) are available at https://github.com/rdpstaff/classifier/issues/18
- The first 1000 training sequences (train_nodups_1000.fasta) are used for benchmarking the classification accuracy (see results at the end of prepare_files.R).
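A benchmarking subset like the one described in the last note can be produced by copying the first N records of a FASTA file. The following is a minimal Python sketch, not the code from prepare_files.R. The function names `read_fasta` and `write_first_n` are hypothetical, and the sketch assumes a plain-text FASTA input in which each record starts with a `>` header line.

```python
from itertools import islice
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an open FASTA file.

    Sequences wrapped over several lines are joined into one string.
    """
    header = None
    seq_lines = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(seq_lines)
            header = line[1:]
            seq_lines = []
        elif header is not None and line:
            seq_lines.append(line)
    # Emit the final record once the input is exhausted.
    if header is not None:
        yield header, "".join(seq_lines)


def write_first_n(src: TextIO, dst: TextIO, n: int = 1000) -> int:
    """Copy the first n records from src to dst; return the number written."""
    count = 0
    for header, seq in islice(read_fasta(src), n):
        dst.write(f">{header}\n{seq}\n")
        count += 1
    return count
```

To use it, open an input and an output file and call `write_first_n(src, dst, n=1000)`. The output name train_nodups_1000.fasta comes from the note above; the input file name is not stated in this description and would need to be taken from prepare_files.R.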