Software Open Access

pr2database/pr2database: PR2 version 4.14.0

Daniel Vaulot

Main changes

A single SSU database

From version 4.14.0, a single SSU database is provided which contains sequences for:

  • 18S rRNA from nuclear and nucleomorph
  • 16S rRNA from plastid, apicoplast, chromatophore, mitochondrion
  • 16S rRNA from a small selection of bacteria

The rationale is that the database can now be used to detect bacterial sequences that are amplified with either 18S rRNA or "universal" primers. These sequences can be further assigned with Silva or GTDB.

To allow correct assignment of organelle sequences with software such as DECIPHER (IDTAXA), a short suffix identifying the organelle is appended to the taxonomy.
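As an illustration of the suffix convention, the following minimal Python sketch separates a taxon name from its organelle suffix. It assumes the suffix is attached to the end of the taxon name after a colon (for example `Chlorophyta:plas`) and that a name without a suffix is nuclear; the function name `split_organelle_suffix` and the example taxon are hypothetical and not part of PR2.

```python
from typing import Tuple

# Suffixes listed in this release; a name without a suffix is assumed nuclear.
ORGANELLE_SUFFIXES = {
    "nucl": "nucleomorph",
    "plas": "plastid",
    "apic": "apicoplast",
    "chrom": "chromatophore",
    "mito": "mitochondrion",
}


def split_organelle_suffix(taxon: str) -> Tuple[str, str]:
    """Return (taxon name without suffix, organelle) for one taxon name."""
    name, sep, suffix = taxon.rpartition(":")
    if sep and suffix in ORGANELLE_SUFFIXES:
        return name, ORGANELLE_SUFFIXES[suffix]
    # No recognised suffix: treat the sequence as nuclear (assumption).
    return taxon, "nucleus"
```

For example, `split_organelle_suffix("Chlorophyta:plas")` separates the name `Chlorophyta` from the organelle `plastid`, which can be useful when comparing organelle and nuclear assignments of the same group.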

Organelle        Taxonomy suffix
nucleus          (none)
nucleomorph      :nucl
plastid          :plas
apicoplast       :apic
chromatophore    :chrom
mitochondrion    :mito

Major groups for which taxonomy has been updated

  • Apicomplexa
  • Labyrinthulids
  • Radiolaria
  • Foraminifera

Quarantined sequences (makes sense in these COVID times...)

We are introducing quarantined sequences. These sequences have been reassigned with DECIPHER IDTAXA, but either their bootstrap values were low or they were flagged as problematic by DECIPHER during the LearnTaxa phase. They are not provided with the current version but will be added in the future after verification of their taxonomic assignment.
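The selection logic described above can be sketched in Python as follows. This is only an illustration: the release notes do not state the bootstrap cutoff, so the `threshold` value of 60 is an assumption, and the function name and input structures (per-rank confidence values keyed by sequence identifier, plus a set of identifiers flagged during training) are hypothetical.

```python
from typing import Dict, List, Set


def select_quarantined(
    confidences: Dict[str, List[float]],
    problematic: Set[str],
    threshold: float = 60.0,  # hypothetical cutoff, not stated in the release
) -> Set[str]:
    """Return the identifiers of sequences to quarantine.

    A sequence is quarantined when its lowest per-rank bootstrap
    confidence is below the threshold, or when it was flagged as
    problematic during classifier training.
    """
    low_confidence = {
        seq_id
        for seq_id, values in confidences.items()
        if values and min(values) < threshold
    }
    return low_confidence | problematic
```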

List of sequences added or updated

  • Added: 9,710
  • Updated: 25,298
  • Quarantined: 614
  • Removed: 462

Contributors

Taxonomic groups updated

  • Alveolata - Javier del Campo

    • Apicomplexa
      • 9955 sequences updated or added.
      • 303 sequences quarantined needing phylogeny assignment.
      • 583 taxonomy entries revised
  • Chlorophyta

    • Ostreobium: 2 sequences added
  • Stramenopiles

    • Labyrinthulids - Javier del Campo
      • Sequences updated or added: 1280
      • Sequences quarantined: 133
      • Taxonomy fully revised: 69 species
    • Cafeteria - Alex Schoenle following Schoenle et al. (2020)
      • sequences updated: 30
      • sequences added: 31
      • script
    • Cafileria marina: 8 sequences added
  • Haptophyta

    • Rappephyceae - Kawachi et al. (2021)
      • Rappemonads moved into Rappephyceae
      • 4 sequences added
  • Radiolaria - Miguel Sandin

  • Foraminifera - Raphaël Morard

    • Total number of validated sequences: 3839
    • Taxonomy updated or added: 315 entries
    • Sequences added: 1149
    • Sequences updated (including new sequences): 2164
    • script to upload to PR2
  • Excavata - Javier del Campo and EukRef team

    • EukRef team: Martin Kolisko, Olga Flegontova, Anna Karnkowska, Gordon Lax, Julia M. Maritz, Tomáš Pánek, Petr Táborský, Jane M. Carlton, Ivan Čepička, Aleš Horák, Julius Lukeš, Alastair G.B. Simpson, and Vera Tai
    • Total number of validated sequences: 6265
    • Taxa updated or added: 735
    • Sequences added from GenBank: 75
    • Sequences updated (existing + new): 1347 + 2875
    • Sequences quarantined: 104
    • Metadata updated with eukref fields: 6091
  • 16S plastid sequences (Ostreobium and Apicomplexa) - Javier del Campo

    • 87 sequences reassigned
    • 482 sequences added
  • Bacteria, Archaea - Daniel Vaulot

    • Sequences added: 7945
    • Taxa added: 1571
    • These sequences originate from Silva seed alignment v. 132 as found on the mothur site
    • They are used as "control" sequences when assigning metabarcodes, especially for primers that are either "universal" (i.e. amplify both 18S and 16S) or "imperfect", in the sense that they also amplify a small fraction of 16S sequences.
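One way to use these control sequences after assignment is to set aside the metabarcodes assigned to Bacteria or Archaea so that they can be re-assigned with Silva or GTDB. The Python sketch below assumes each metabarcode comes with a lineage (a list of taxon names from the highest rank down) and that the highest rank is named "Bacteria" or "Archaea" for prokaryotes; the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed names of the highest rank for the prokaryotic control sequences.
PROKARYOTE_DOMAINS = {"Bacteria", "Archaea"}

Lineages = Dict[str, List[str]]


def split_by_domain(assignments: Lineages) -> Tuple[Lineages, Lineages]:
    """Split assignments into (kept, prokaryotes) using the first rank."""
    kept: Lineages = {}
    prokaryotes: Lineages = {}
    for asv_id, lineage in assignments.items():
        domain = lineage[0] if lineage else ""
        target = prokaryotes if domain in PROKARYOTE_DOMAINS else kept
        target[asv_id] = lineage
    return kept, prokaryotes
```

The `prokaryotes` part can then be passed to a dedicated 16S reference database, while `kept` holds everything else, including unassigned metabarcodes.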

Sequences uploaded but not yet annotated

  • 8763 18S rRNA sequences added from GenBank - 2020-05-27 to 2021-03-23 - Script

Sequences removed

  • Potential chimera in Radiolaria: 343 (M. Sandin)
  • Bad sequences: 6 (F. Mahé)
  • Chimeras: 95 (A M Fiore-Donno)
  • ITS: 20 (A M Fiore-Donno)
  • Badly assigned: 6 (A M Fiore-Donno)

Sequences modified (F. Mahé)

  • complemented: 26
  • reverse complemented: 114 + 189 Script
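For readers unfamiliar with the two operations, here is a minimal Python sketch of complementing and reverse complementing a nucleotide sequence. It is not the script used for PR2; the function names are hypothetical, and the translation table covers the standard IUPAC nucleotide codes in upper case.

```python
# Translation table for IUPAC nucleotide codes (A<->T, C<->G, R<->Y, K<->M,
# B<->V, D<->H; S, W and N are their own complements).
_COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def complement(sequence: str) -> str:
    """Return the complement of a DNA sequence, read in the same direction."""
    return sequence.upper().translate(_COMPLEMENT)


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return complement(sequence)[::-1]
```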

Metadata added

  • A large number of metadata have been downloaded from GenBank, such as the GenBank taxonomy and the references associated with sequences.

References

Stramenopiles

  • Schoenle, A., Hohlfeld, M., Rosse, M., Filz, P., Wylezich, C., Nitsche, F., & Arndt, H. (2020). Global comparison of bicosoecid Cafeteria-like flagellates from the deep ocean and surface waters, with reorganization of the family Cafeteriaceae. European Journal of Protistology, 73, 125665. https://doi.org/10.1016/j.ejop.2019.125665.
  • Jirsová, D., Füssy, Z., Richtová, J., Gruber, A., & Oborník, M. (2019). Morphology, ultrastructure, and mitochondrial genome of the marine non-photosynthetic bicosoecid Cafileria marina gen. et sp. nov. Microorganisms, 7(8), 240. https://doi.org/10.3390/microorganisms7080240
  • Pan, J., del Campo, J., & Keeling, P. J. (2017). Reference Tree and Environmental Sequence Diversity of Labyrinthulomycetes. Journal of Eukaryotic Microbiology, 64(1), 88–96. https://doi.org/10.1111/jeu.12342

Haptophyta

  • Kawachi, M., Nakayama, T., Kayama, M., Nomura, M., Miyashita, H., Bojo, O., Rhodes, L., Sym, S., Pienaar, R. N., Probert, I., Inouye, I., & Kamikawa, R. (2021). Rappemonads are haptophyte phytoplankton. Current Biology. https://doi.org/10.1016/j.cub.2021.03.012

Radiolaria

  • Adl, S. M., Bass, D., Lane, C. E., Lukeš, J., Schoch, C. L., Smirnov, A., et al. 2019. Revisions to the classification, nomenclature, and diversity of eukaryotes. J. Eukaryot. Microbiol. 66, 4–119. doi:10.1111/jeu.12691
  • Biard, T., Bigeard, E., Audic, S., Poulain, J., Stemmann, L., Not, F., 2017. Biogeography and diversity of Collodaria (Radiolaria) in the global ocean. Nat. Publ. Gr. 1–42. doi:10.1038/ismej.2017.12
  • Cavalier-Smith, T., Chao, E.E., Lewis, R., 2018. Multigene phylogeny and cell evolution of chromist infrakingdom Rhizaria: contrasting cell organisation of sister phyla Cercozoa and Retaria. Protoplasma 255, 1517–1574. doi:10.1007/s00709-018-1241-1
  • Capella-Gutiérrez, S., Silla-Martínez, J.M., Gabaldón, T., 2009. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973. doi:10.1093/bioinformatics/btp348
  • Decelle, J., Suzuki, N., Mahé, F., Vargas, C. De, Not, F., 2012b. Molecular Phylogeny and Morphological Evolution of the Acantharea (Radiolaria). Protist 163, 435–450. doi:10.1016/j.protis.2011.10.002
  • Gouy, M., Guindon, S., Gascuel, O., 2010. SeaView Version 4: A Multiplatform Graphical User Interface for Sequence Alignment and Phylogenetic Tree Building. Mol. Biol. Evol. 27, 221–224. doi:10.1093/molbev/msp259
  • Katoh, K., Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 30, 772–780. doi:10.1093/molbev/mst010
  • Larsson, A. (2014). AliView: a fast and lightweight alignment viewer and editor for large data sets. Bioinformatics 30(22): 3276-3278. http://dx.doi.org/10.1093/bioinformatics/btu531
  • Nakamura, Y., Sandin, M.M., Suzuki N., Somiya R., Tuji A., Not, F, 2020. Phylogenetic revision of the order Entactinaria - Paleozoic relict Radiolaria (Rhizaria, SAR). Protist 171. doi:10.1016/j.protis.2019.125712
  • Rambaut A (2016) FigTree version 1.4.3. http://tree.bio.ed.ac.uk/software/figtree/
  • Rognes, T., Flouri, T., Nichols, B., Quince, C., Mahé, F., 2016. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584. doi:10.7717/peerj.2584
  • Sandin M.M., Biard T., Romac S., O'Dogherty L., Suzuki N., Not F., 2020. Morpho-molecular diversity and evolutionary analyses suggest hidden life styles in Spumellaria (Radiolaria) bioRxiv 2020.06.29.176917; doi: https://doi.org/10.1101/2020.06.29.176917
  • Sandin, M.M., Pillet, L., Biard, T., Poirier, C., Bigeard, E., Romac, S., Suzuki, N., Not, F., 2019. Time Calibrated Morpho-molecular Classification of Nassellaria (Radiolaria). Protist 170, 187–208. doi:10.1016/j.protis.2019.02.002
  • Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B., Lesniewski, R.A., Oakley, B.B., Parks, D.H., Robinson, C.J., Sahl, J.W., Stres, B., Thallinger, G.G., Van Horn, D.J., Weber, C.F., 2009. Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541. doi:10.1128/AEM.01541-09
  • Stamatakis, A., 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313. doi:10.1093/bioinformatics/btu033

Database structure

  • pr2_main
    • quarantined_version: sequences flagged as quarantined will need to be re-assigned later.
  • pr2_metadata
    • gb_references: removed (empty)
    • gb_locus: removed (empty)
    • gb_division: added - Three-letter code for GenBank division (e.g. PLN, ENV...)

Metadata added

The following fields were populated from GenBank when the data were missing (413,230 records updated)

  • gb_taxonomy
  • gb_project
  • gb_authors, gb_publication, gb_journal
  • gb_sequence
  • gb_division
  • gb_date
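The "populate only when missing" rule can be sketched as below. This is an illustration in Python rather than the MySQL procedure actually used; records are represented as plain dictionaries, a field is treated as missing when it is absent, `None`, or an empty string (an assumption), and the function name `fill_missing` is hypothetical.

```python
from typing import Dict, Optional

# Fields listed above as populated from GenBank.
GB_FIELDS = (
    "gb_taxonomy", "gb_project", "gb_authors", "gb_publication",
    "gb_journal", "gb_sequence", "gb_division", "gb_date",
)

Record = Dict[str, Optional[str]]


def fill_missing(record: Record, genbank: Record) -> Record:
    """Return a copy of `record` with missing GenBank fields filled in.

    Existing values are never overwritten; only fields that are absent,
    None, or empty are taken from the GenBank record.
    """
    updated = dict(record)
    for field in GB_FIELDS:
        current = updated.get(field)
        incoming = genbank.get(field)
        if current in (None, "") and incoming not in (None, ""):
            updated[field] = incoming
    return updated
```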

Scripts

Scripts are provided only to show some of the procedures used to update the PR2 database. Do not try to run them: they will not work, as they require access to the MySQL PR2 database.

Files (97.7 MB)

  • pr2database/pr2database-v14.4.0.zip (97.7 MB)
    md5:166c0516f970fb61d1b9bef068ee2fc8
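The md5 value listed with the archive can be used to check a download. The following Python sketch uses only the standard library; the function names are hypothetical, and the path passed to `verify_download` is whatever location the archive was saved to.

```python
import hashlib

# Checksum published with the archive in the file listing.
EXPECTED_MD5 = "166c0516f970fb61d1b9bef068ee2fc8"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True when the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

A `False` result usually indicates an incomplete or corrupted download, in which case the archive should be fetched again.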