pr2database/pr2database: PR2 version 4.12.0
Description
Date : 2019-08-08
Contributors- Javier del Campo - U of Miami - Apicomplexa
- Ramon Massana - ICM, Barcelona - Stramenopiles
- Daniel Vaulot - CNRS Roscoff - Metadata geolocalisation - based on a idea of Margaret Brisbin
- Johan Decelle - CNRS Grenoble - Plastid sequences (PhytoRef database)
- Chetan Gaonkar - Naples - Chaetoceros
- Table pr2_main - add fields
- gene - 18S_RNA, 16S_RNA
- organelle - nucleus, plastid, mitochondria, nucleomorph, apicoplast (left empty for cyanobacteria)
- Table pr2_metadata - add or modify fields
- gb_organelle - import the corresponding gb field
- pr2_sequence_origin - add other possibilities such as genome and metagenome
- pr2_continent, pr2_country, pr2_country_lat, pr2_country_lon - geographical origin extracted from gb_country field
- pr2_location, pr2_location_lat, pr2_location_lon - geographical origin extracted from gb_country field.
- pr2_ocean, pr2_sea, pr2_sea_lat, pr2_sea_lon - extracted from gb_country field and gb_isolation_source
- Table pr2_sequences - add fields
- sequence_hash - hash value of the sequence (using R function
digest::sha1
, see https://cran.r-project.org/web/packages/openssl/vignettes/crypto_hashing.html)
- sequence_hash - hash value of the sequence (using R function
- Table pr2_taxonomy - add fields
- taxon_trophic_mode - detailed trophic mode (e.g. "C-fixation constitutive; Mixotroph")
- 1692 sequences that had more than 2 consecutive "NN" have been removed
- We are now providing separate files for 18S nuclear and 16S plastid sequences for UTAX, dada2, fasta and mothur/Qiime formats.
- The merged file contains both 18S and 16S sequences.
- The metadata file is not provided any more since metadata can be found in the merged file.
- The whole pr2 database is also provided as an SQLite file. It contains the different tables making up pr2.
- Apicomplexa
- Taxonomy completely revised following del Campo et al. (2019)
- New sequences: 2619
- Updated sequences: 5889+239
- Removed sequences: 89
- Stramenopiles - Higher ranks changed according to Massana et al. 2014, Derelle et al 2016, and Adl et al. 2019 compiled by R. Massana.
- Diatoms - Chaetoceros - 196 new sequences have been added from Gaonkar et al. (2019) with help from B. Edvardsen
- Chlorophyta
- Mamiellophyceae - Micromonas clades have been updated according to Tragin and Vaulot 2019.
- Prasinophytes clade IX - Separation between clades IXA and IXB removed waiting for analysis.
- Cryptophyceae - Cryptomonadales moved from family to order.
- Cercozoa - Class Chlorarachniophyceae replaces Filosa-Chlorarachnea
Data originate from the PhytoRef database (Decelle et al. 2015). Taxonomy has been harmonized with the PR2 taxonomy framework. In particular going from 12 levels to 8 taxonomy levels. This integration of plastid sequences should be helpful to researchers that get metabarcodes for both 16S and 18S rRNA.
- 16S plastid sequences added: 6049
- 16S cyanobacteria sequence added: 42
Sequence geo-localisation : Following up on the very good post of Margaret Brisbin, the geoname server (http://www.geonames.org/export/geonames-search.html) and the fuzzywuzzy Python library has been used to provide information about sequence location origin. Country and/or ocean are now provided for 90,788 GenBank entries with countries/ocean coordinates.
References- Adl, S.M., Bass, D., Lane, C.E., Lukeš, J., Schoch, C.L., Smirnov, A., Agatha, S. et al. 2019. Revisions to the Classification, Nomenclature, and Diversity of Eukaryotes. J. Eukaryot. Microbiol. 66:4–119.
- Decelle, J., Romac, S., Stern, R.F., Bendif, E.M., Zingone, A., Audic, S., Guiry, M.D. et al. 2015. PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy. Mol. Ecol. Resour. 15:1435–1445.
- Derelle, R., López-García, P., Timpano, H. & Moreira, D. 2016. A Phylogenomic Framework to Study the Diversity and Evolution of Stramenopiles (=Heterokonts). Mol. Biol. Evol. 33:2890–8.
- Gaonkar, C.C., Piredda, R., Minucci, C., Mann, D.G., Montresor, M., Sarno, D. & Kooistra, W.H.C.F. 2018. Annotated 18S and 28S rDNA reference sequences of taxa in the planktonic diatom family Chaetocerotaceae. PLoS One. 13:e0208929.
- Massana, R., del Campo, J., Sieracki, M.E., Audic, S. & Logares, R. 2014. Exploring the uncultured microeukaryote majority in the oceans: reevaluation of ribogroups within stramenopiles. ISME J. 8:854–66.
- del Campo J., Heger T., Rodríguez-Martínez R., Worden AZ., Richards TA., Massana R., Keeling PJ. 2018. A new framework for the study of apicomplexan diversity across environments. bioRxiv 1. DOI: 10.1101/494880.
Files
pr2database/pr2database-v4.12.0.zip
Files
(37.1 MB)
Name | Size | Download all |
---|---|---|
md5:f687905525a931f7d7be7a4267544cf9
|
37.1 MB | Preview Download |
Additional details
Related works
- Is supplement to
- https://github.com/pr2database/pr2database/tree/v4.12.0 (URL)