Published November 15, 2023 | Version 1.0.0
Dataset Open

18S V9 metabarcoding reference databases and naive-bayes classifier

  • 1. Northern Gulf Institute
  • 2. National Oceanic and Atmospheric Administration
  • 3. Mississippi State University

Description

18S metabarcoding reference databases and a naive-bayes classifier specific to the V9 region, built from the PR2 database using QIIME 2 (version 2023.2). The classifier is intended for use with QIIME 2. Sequences were dereplicated with RESCRIPt using --p-mode 'uniq', which retains identical sequence records that have differing taxonomies.
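To illustrate what 'uniq' dereplication means in practice, here is a minimal sketch in plain Python. It is not RESCRIPt's implementation; it only assumes that a record is dropped when both its sequence and its taxonomy duplicate an earlier record. The record IDs, sequences, and taxonomy strings are made up for illustration.

```python
from typing import List, Tuple

Record = Tuple[str, str, str]  # (record_id, sequence, taxonomy)


def dereplicate_uniq(records: List[Record]) -> List[Record]:
    """Keep the first record for each distinct (sequence, taxonomy) pair."""
    seen = set()
    kept = []
    for record_id, sequence, taxonomy in records:
        key = (sequence, taxonomy)
        if key not in seen:
            seen.add(key)
            kept.append((record_id, sequence, taxonomy))
    return kept


# Hypothetical toy records, not taken from PR2.
records = [
    ("r1", "ACGT", "Eukaryota;TaxonA"),
    ("r2", "ACGT", "Eukaryota;TaxonA"),  # same sequence and taxonomy as r1: dropped
    ("r3", "ACGT", "Eukaryota;TaxonB"),  # same sequence, different taxonomy: kept
    ("r4", "TTGA", "Eukaryota;TaxonA"),  # different sequence: kept
]
dereplicated = dereplicate_uniq(records)
```

In this toy input, r3 survives even though its sequence matches r1, which is the behaviour the description attributes to 'uniq' mode.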

Primers used:

EMP 18S 1391f: GTACACACCGCCCGTC

EMP 18S EukBr: TGATCCTTCTGCAGGTTCACCTAC
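The idea behind in-silico extraction with these two primers can be sketched as follows. This is a simplified illustration, assuming exact primer matches and assuming the primers themselves are trimmed from the result; the commands actually used for this dataset are in the Markdown file listed under File Descriptions. The function names are hypothetical.

```python
from typing import Optional

FWD_PRIMER = "GTACACACCGCCCGTC"          # EMP 18S 1391f
REV_PRIMER = "TGATCCTTCTGCAGGTTCACCTAC"  # EMP 18S EukBr

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_region(seq: str,
                   fwd: str = FWD_PRIMER,
                   rev: str = REV_PRIMER) -> Optional[str]:
    """Return the bases between the two primer sites, or None if either is absent.

    Assumption for this sketch: exact matching only, no mismatches allowed.
    """
    seq = seq.upper()
    fwd_start = seq.find(fwd)
    if fwd_start == -1:
        return None
    region_start = fwd_start + len(fwd)
    # The reverse primer binds the opposite strand, so search for its
    # reverse complement downstream of the forward primer.
    region_end = seq.find(reverse_complement(rev), region_start)
    if region_end == -1:
        return None
    return seq[region_start:region_end]
```

Reference sequences that lack either primer site return None here, which is the kind of record that drops out during extraction.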

Stats

19,470 unique sequences

39,170 total sequences

11,748 unique taxa 

Note: the original PR2 database contained 221,085 sequences. Most were filtered out during in-silico extraction with our V9 primers.
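The figures above can be put in proportion with a few lines of arithmetic. This sketch assumes that "total sequences" is the count remaining after primer extraction and that "unique sequences" is the number of distinct sequences among them; the variable names are illustrative.

```python
ORIGINAL_PR2 = 221_085      # sequences in the original PR2 database
TOTAL_EXTRACTED = 39_170    # total sequences reported above
UNIQUE_SEQUENCES = 19_470   # unique sequences reported above

removed = ORIGINAL_PR2 - TOTAL_EXTRACTED
retained_fraction = TOTAL_EXTRACTED / ORIGINAL_PR2
unique_fraction = UNIQUE_SEQUENCES / TOTAL_EXTRACTED

print(f"Removed during extraction: {removed:,}")        # 181,915
print(f"Retained after extraction: {retained_fraction:.1%}")  # 17.7%
print(f"Distinct among retained:   {unique_fraction:.1%}")    # 49.7%
```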

File Descriptions

Files in bold are recommended for taxonomic classification.

Create naive-bayes classifier for 18S PR2 database.md: Markdown with code used to generate databases

pr2_v5.0.0_SSU_18S-V9_uniq-classifier.qza: Unweighted naive-bayes classifier for 18S V9 (primers 1391f, EukBr), extracted from PR2 v5.0.1, dereplicated, generated by qiime2-2023.2

pr2_version_5.0.0_SSU_18S-V9_uniq_seqs.qza: Sequences for 18S V9 (primers 1391f, EukBr), extracted from PR2 v5.0.1, dereplicated, generated by qiime2-2023.2

pr2_version_5.0.0_SSU_18S-V9_uniq_tax.qza: Taxa for pr2_version_5.0.0_SSU_18S-V9_uniq_seqs.qza (dereplicated)

pr2_version_5.0.0_SSU_18S-V9_seqs.qza: Sequences for 18S V9 (primers 1391f, EukBr), extracted from PR2 v5.0.1, NOT dereplicated, generated by qiime2-2023.2

pr2_version_5.0.0_SSU_18S-V9_tax.qza: Taxa for pr2_version_5.0.0_SSU_18S-V9_seqs.qza (NOT dereplicated)

pr2_version_5.0.0_SSU_mothur.fasta: SSU sequences downloaded from PR2 v5.0.1

pr2_version_5.0.0_SSU_mothur.tax: SSU taxa downloaded from PR2 v5.0.1

pr2_version_5.0.0_taxonomy.xlsx: Detailed taxonomy downloaded from PR2 v5.0.1
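Before loading one of the .qza files, it can help to confirm what kind of artifact it is. QIIME 2 artifacts are zip archives whose top-level folder is named after the artifact UUID and contains a metadata.yaml file with uuid, type, and format entries. The sketch below reads that file with the standard library only; the function name is hypothetical, and the line-based parsing is a simplification that assumes flat "key: value" lines rather than full YAML.

```python
import zipfile
from typing import Dict


def read_qza_metadata(path: str) -> Dict[str, str]:
    """Return the key/value pairs from an artifact's top-level metadata.yaml."""
    with zipfile.ZipFile(path) as archive:
        # The top-level metadata file sits at <uuid>/metadata.yaml; deeper
        # copies under provenance/ are skipped by the slash count.
        candidates = [
            name for name in archive.namelist()
            if name.count("/") == 1 and name.endswith("/metadata.yaml")
        ]
        if not candidates:
            raise ValueError(f"{path} has no top-level metadata.yaml")
        text = archive.read(candidates[0]).decode("utf-8")

    metadata: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata
```

For example, `read_qza_metadata("pr2_version_5.0.0_SSU_18S-V9_uniq_seqs.qza")["type"]` would report the semantic type recorded in the archive, assuming the file has been downloaded to the working directory.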

Notes

"This work was supported by award NA21OAR4320190 to the Northern Gulf Institute from NOAA's Office of Oceanic and Atmospheric Research, U.S. Department of Commerce."

Files (367.1 MB)

Checksum Size
md5:b6fad0062456a6e3d5897ae57aa27f1b 2.7 kB
md5:1d19e73080bc44715adffd523fe0a769 11.7 MB
md5:b86d3c535772b464cbb2511b2c1b7c20 915.5 kB
md5:391ca170e77cc485fd39dadcf79eed17 1.7 MB
md5:636299fc7c77dc397738d65525fa65af 1.5 MB
md5:7777ecf13f88f51c2576f435b894d540 1.5 MB
md5:458948f0f6697ef7cefe770b0dd1b27d 316.5 MB
md5:ed5b9ad47ba770e0eeca87e28b605ea7 30.1 MB
md5:45e76eb38065400699fb5eb2cba564db 3.3 MB
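A downloaded file can be checked against its listed MD5 checksum with Python's standard library. This is a minimal sketch; the function names are illustrative, and the file is read in chunks so that the larger downloads do not need to fit in memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file's MD5 with a listed value such as 'md5:<hex digest>'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

Usage: call `matches_checksum(path_to_download, "md5:<value from the list>")` with the checksum that corresponds to the downloaded file; a False result suggests an incomplete or corrupted download.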

Additional details

References

  • Guillou, L., Bachar, D., Audic, S., Bass, D., Berney, C., Bittner, L., Boutte, C. et al. 2013. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote Small Sub-Unit rRNA sequences with curated taxonomy. Nucleic Acids Res. 41:D597–604.
  • Bokulich NA, Kaehler BD, Rideout JR, Dillon M, Bolyen E, Knight R, Huttley GA, Caporaso JG. 2018. Optimizing taxonomic classification of marker gene sequences. Microbiome 6(1): 90. doi: https://doi.org/10.1186/s40168-018-0470-z.
  • Robeson MS 2nd, O'Rourke DR, Kaehler BD, Ziemski M, Dillon MR, Foster JT, Bokulich NA. 2021. RESCRIPt: Reproducible sequence taxonomy reference database management. PLoS Comput Biol 17(11): e1009581. doi: https://doi.org/10.1371/journal.pcbi.1009581.