There is a newer version of this record available.

Dataset Open Access

DADA2 formatted 16S rRNA gene sequences for both bacteria & archaea

Ali Alishum


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/6d9ca67c-56b2-446a-bc0d-28da6b6b18d7/GTDB_bac-arc_ssu_r86.fa.gz"
      }, 
      "checksum": "md5:307c9d79fb7e167b696fad16f698eb57", 
      "bucket": "6d9ca67c-56b2-446a-bc0d-28da6b6b18d7", 
      "key": "GTDB_bac-arc_ssu_r86.fa.gz", 
      "type": "gz", 
      "size": 7114483
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/6d9ca67c-56b2-446a-bc0d-28da6b6b18d7/RefSeq-RDP16S_v3_May2018.fa.gz"
      }, 
      "checksum": "md5:3a1e9c128c937e5f0c67a86a4d64868f", 
      "bucket": "6d9ca67c-56b2-446a-bc0d-28da6b6b18d7", 
      "key": "RefSeq-RDP16S_v3_May2018.fa.gz", 
      "type": "gz", 
      "size": 4010165
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/6d9ca67c-56b2-446a-bc0d-28da6b6b18d7/Version2AffectedSeqs.txt"
      }, 
      "checksum": "md5:13cf96c338c6f56fbeab06f8cdf7e423", 
      "bucket": "6d9ca67c-56b2-446a-bc0d-28da6b6b18d7", 
      "key": "Version2AffectedSeqs.txt", 
      "type": "txt", 
      "size": 13051
    }
  ], 
  "owners": [
    57672
  ], 
  "doi": "10.5281/zenodo.3188334", 
  "stats": {
    "version_unique_downloads": 1181.0, 
    "unique_views": 1032.0, 
    "views": 1236.0, 
    "downloads": 1833.0, 
    "unique_downloads": 854.0, 
    "version_unique_views": 3966.0, 
    "volume": 6828608971.0, 
    "version_downloads": 2419.0, 
    "version_views": 4894.0, 
    "version_volume": 10188024782.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.3188334", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.2541238", 
    "bucket": "https://zenodo.org/api/files/6d9ca67c-56b2-446a-bc0d-28da6b6b18d7", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.2541238.svg", 
    "html": "https://zenodo.org/record/3188334", 
    "latest_html": "https://zenodo.org/record/3266798", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.3188334.svg", 
    "latest": "https://zenodo.org/api/records/3266798"
  }, 
  "conceptdoi": "10.5281/zenodo.2541238", 
  "created": "2019-05-23T07:51:23.619878+00:00", 
  "updated": "2019-07-03T08:32:01.170929+00:00", 
  "conceptrecid": "2541238", 
  "revision": 4, 
  "id": 3188334, 
  "metadata": {
    "access_right_category": "success", 
    "language": "aig", 
    "doi": "10.5281/zenodo.3188334", 
    "description": "<p>These two combined bacterial and archaeal 16S rRNA gene sequence databases were collated from various sources and formatted for the purpose of using the &quot;assignTaxonomy&quot; command within the DADA2&nbsp;pipeline.</p>\n\n<ol>\n\t<li>RefSeq+RDP: This database contains 14676 bacterial &amp; 660 archaea full 16S rRNA gene sequences.&nbsp; It was compiled in 14/05/2018 from predominantly the NCBI RefSeq 16S rrna database (https://www.ncbi.nlm.nih.gov/refseq/targetedloci/16S_process/)&nbsp;and was supplemented with extra&nbsp;sequences from the&nbsp;RDP database (https://rdp.cme.msu.edu/misc/resources.jsp).</li>\n\t<li>Genome Taxonomy Database (GTDB): our dada2 formatted GTDB reference sequence set contains 20486 bacteria and 1073 archaea full 16S rRNA gene sequences. The database was downloaded from (<a href=\"https://t.co/bIjprJsYUh\">http://gtdb.ecogenomic.org/downloads</a>)&nbsp;on 20/11/2018.</li>\n</ol>\n\n<p>The formatting to DADA2 format of the databases was done using a locally written python 2.7 script. The script&nbsp;takes&nbsp;as input a taxonomy .txt file and a fasta&nbsp;file as provided by the core databases creators and then these two files are matched according to a unique sequence identifier available in both files. Then it&nbsp;outputs a fasta file with all 7 taxonomy ranks separated by &quot;;&quot; as required for DADA2 compatibility. Additionally,&nbsp;we have concatenated&nbsp;the unique&nbsp;sequence ID be it NCBI/RDP or GTDB&nbsp;ID to the species entry. We see this as an important QC step to highlight the issues/confidence associated with short read taxonomy assignment at the more finer rank levels.</p>", 
    "contributors": [
      {
        "orcid": "0000-0003-4498-2870", 
        "affiliation": "Trend Laboratory, Curtin University of Technology", 
        "type": "ContactPerson", 
        "name": "Ali Alishum"
      }, 
      {
        "orcid": "0000-0003-2217-3247", 
        "affiliation": "Trend Laboratory, Curtin University of Technology", 
        "type": "Other", 
        "name": "Seersholm Frederik"
      }, 
      {
        "orcid": "0000-0003-4028-9243", 
        "affiliation": "Commonwealth Scientific and Industrial Research Organisation (CSIRO)", 
        "type": "DataCurator", 
        "name": "Greenfield Paul"
      }, 
      {
        "orcid": "0000-0003-1591-5871", 
        "affiliation": "WA Human Microbiome Collaboration Centre (WAHMCC), Trend Laboratory, Curtin University", 
        "type": "Researcher", 
        "name": "Christophersen Claus"
      }
    ], 
    "title": "DADA2 formatted 16S rRNA gene sequences for both bacteria & archaea", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "notes": "The RefSeq+RDP database was updated due to a quotation mark bug that was wrongly placed in front of some of the species names. A file with all the affected species names has been uploaded to review. This shouldn't affect any assignments but might have caused some issues reading into R.  \n\nPython script can be provided on request.", 
    "relations": {
      "version": [
        {
          "count": 3, 
          "index": 1, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "2541238"
          }, 
          "is_last": false, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "3266798"
          }
        }
      ]
    }, 
    "communities": [
      {
        "id": "zenodo"
      }
    ], 
    "version": "Version 2", 
    "references": [
      "Parks, D. H., et al. (2018). \"A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life.\" Nature Biotechnology.", 
      "Cole, J. R., Q. Wang, J. A. Fish, B. Chai, D. M. McGarrell, Y. Sun, C. T. Brown, A. Porras-Alfaro, C. R. Kuske, and J. M. Tiedje. 2014. Ribosomal Database Project: data and tools for high throughput rRNA analysis Nucl. Acids Res. 42(Database issue):D633-D642; doi: 10.1093/nar/gkt1244 [PMID: 24288368]", 
      "NCBI 16S RefSeq Nucleotide sequence records: https://www.ncbi.nlm.nih.gov/nuccore?term=33175%5BBioProject%5D+OR+33317%5BBioProject%5D"
    ], 
    "keywords": [
      "DADA2 format", 
      "16S rRNA", 
      "Bacterial", 
      "Archaeal"
    ], 
    "publication_date": "2019-01-16", 
    "creators": [
      {
        "orcid": "0000-0003-4498-2870", 
        "affiliation": "Trend Laboratory, Curtin University of Technology", 
        "name": "Ali Alishum"
      }
    ], 
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "relation": "isVersionOf", 
        "identifier": "10.5281/zenodo.2541238"
      }
    ]
  }
}
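
The description field above outlines the conversion to DADA2 format: a taxonomy file and a FASTA file are matched on a shared sequence identifier, and a FASTA file is written whose headers carry the 7 ranks separated by ";". The author's Python 2.7 script is not part of this record, so the following is only a minimal Python 3 sketch of that idea, not the original implementation. It assumes a tab-separated taxonomy file (`id<TAB>rank1;...;rank7`), FASTA headers whose first token is the sequence ID, and a `Species(ID)` style for attaching the ID to the species entry; the function names `read_taxonomy`, `read_fasta`, and `to_dada2` are hypothetical.

```python
import io

RANKS = 7  # Kingdom, Phylum, Class, Order, Family, Genus, Species


def read_taxonomy(handle):
    """Map sequence ID -> list of 7 rank names.

    Assumed (hypothetical) layout per line: 'id<TAB>r1;r2;...;r7'.
    Lines without exactly 7 non-empty ranks are skipped.
    """
    taxonomy = {}
    for line in handle:
        line = line.strip()
        if not line or "\t" not in line:
            continue
        seq_id, lineage = line.split("\t", 1)
        ranks = [r.strip() for r in lineage.split(";") if r.strip()]
        if len(ranks) == RANKS:
            taxonomy[seq_id] = ranks
    return taxonomy


def read_fasta(handle):
    """Yield (seq_id, sequence); the ID is the first token of each header."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            tokens = line[1:].split()
            seq_id, chunks = (tokens[0] if tokens else ""), []
        elif line and seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def to_dada2(taxonomy_handle, fasta_handle, out_handle):
    """Write records whose ID appears in both inputs; return how many."""
    taxonomy = read_taxonomy(taxonomy_handle)
    written = 0
    for seq_id, seq in read_fasta(fasta_handle):
        ranks = taxonomy.get(seq_id)
        if ranks is None:
            continue  # no taxonomy entry for this sequence
        # Assumption: the sequence ID is attached as 'Species(ID)'.
        ranks = ranks[:-1] + [f"{ranks[-1]}({seq_id})"]
        out_handle.write(">" + ";".join(ranks) + ";\n" + seq + "\n")
        written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory demo with made-up identifiers and sequences.
    tax = io.StringIO("seqA\tBacteria;P;C;O;F;Genus;Genus species\n")
    fa = io.StringIO(">seqA example\nACGT\n")
    out = io.StringIO()
    to_dada2(tax, fa, out)
    print(out.getvalue(), end="")
```

Sequences without a matching taxonomy entry are dropped silently here; a real conversion would probably log them so that mismatched identifiers can be reviewed.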
                  All versions   This version
Views             4,894          1,236
Downloads         2,419          1,833
Data volume       10.2 GB        6.8 GB
Unique views      3,966          1,032
Unique downloads  1,181          854
