There is a newer version of this record available.

Dataset Open Access

DADA2 formatted 16S rRNA gene sequences for both bacteria & archaea

Ali Alishum


JSON-LD (schema.org) Export

{
  "inLanguage": {
    "alternateName": "aig", 
    "@type": "Language", 
    "name": "Antigua and Barbuda Creole English"
  }, 
  "description": "<p>These two combined bacterial and archaeal 16S rRNA gene sequence databases were collated from various sources and formatted for use with the &quot;assignTaxonomy&quot; command in the DADA2&nbsp;pipeline.</p>\n\n<ol>\n\t<li>RefSeq+RDP: This database contains 14676 bacterial &amp; 660 archaeal full 16S rRNA gene sequences.&nbsp; It was compiled on 14/05/2018, predominantly from the NCBI RefSeq 16S rRNA database (https://www.ncbi.nlm.nih.gov/refseq/targetedloci/16S_process/),&nbsp;and was supplemented with extra&nbsp;sequences from the&nbsp;RDP database (https://rdp.cme.msu.edu/misc/resources.jsp).</li>\n\t<li>Genome Taxonomy Database (GTDB): Our DADA2-formatted GTDB reference sequence set contains 20486 bacterial and 1073 archaeal full 16S rRNA gene sequences. The database was downloaded from <a href=\"https://t.co/bIjprJsYUh\">http://gtdb.ecogenomic.org/downloads</a>&nbsp;on 20/11/2018.</li>\n</ol>\n\n<p>The databases were converted to DADA2 format using a locally written Python 2.7 script. The script&nbsp;takes&nbsp;as input a taxonomy .txt file and a FASTA&nbsp;file, as provided by the creators of the core databases, and matches the two files on a unique sequence identifier present in both. It&nbsp;then outputs a FASTA file with all 7 taxonomy ranks separated by &quot;;&quot;, as required for DADA2 compatibility. Additionally,&nbsp;we have concatenated&nbsp;the unique&nbsp;sequence ID (NCBI/RDP or GTDB)&nbsp;to the species entry. We see this as an important QC step that highlights the issues and confidence associated with short-read taxonomy assignment at the finer rank levels.</p>", 
  "license": "http://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "Trend Laboratory, Curtin University of Technology", 
      "@id": "https://orcid.org/0000-0003-4498-2870", 
      "@type": "Person", 
      "name": "Ali Alishum"
    }
  ], 
  "url": "https://zenodo.org/record/3188334", 
  "datePublished": "2019-01-16", 
  "contributor": [
    {
      "affiliation": "Trend Laboratory, Curtin University of Technology", 
      "@id": "https://orcid.org/0000-0003-4498-2870", 
      "@type": "Person", 
      "name": "Ali Alishum"
    }, 
    {
      "affiliation": "Trend Laboratory, Curtin University of Technology", 
      "@id": "https://orcid.org/0000-0003-2217-3247", 
      "@type": "Person", 
      "name": "Seersholm Frederik"
    }, 
    {
      "affiliation": "Commonwealth Scientific and Industrial Research Organisation (CSIRO)", 
      "@id": "https://orcid.org/0000-0003-4028-9243", 
      "@type": "Person", 
      "name": "Greenfield Paul"
    }, 
    {
      "affiliation": "WA Human Microbiome Collaboration Centre (WAHMCC), Trend Laboratory, Curtin University", 
      "@id": "https://orcid.org/0000-0003-1591-5871", 
      "@type": "Person", 
      "name": "Christophersen Claus"
    }
  ], 
  "version": "Version 2", 
  "keywords": [
    "DADA2 format", 
    "16S rRNA", 
    "Bacterial", 
    "Archaeal"
  ], 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/6d9ca67c-56b2-446a-bc0d-28da6b6b18d7/GTDB_bac-arc_ssu_r86.fa.gz", 
      "@type": "DataDownload", 
      "fileFormat": "gz"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/6d9ca67c-56b2-446a-bc0d-28da6b6b18d7/RefSeq-RDP16S_v3_May2018.fa.gz", 
      "@type": "DataDownload", 
      "fileFormat": "gz"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/6d9ca67c-56b2-446a-bc0d-28da6b6b18d7/Version2AffectedSeqs.txt", 
      "@type": "DataDownload", 
      "fileFormat": "txt"
    }
  ], 
  "identifier": "https://doi.org/10.5281/zenodo.3188334", 
  "@id": "https://doi.org/10.5281/zenodo.3188334", 
  "@type": "Dataset", 
  "name": "DADA2 formatted 16S rRNA gene sequences for both bacteria & archaea"
}
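
The formatting step described in the record (match a taxonomy file to a FASTA file on a shared sequence ID, then write headers with 7 ";"-separated ranks and the ID appended to the species entry) might look roughly like the sketch below. This is not the authors' Python 2.7 script, which is not included in the record. It is a Python 3 illustration under stated assumptions: the function names are hypothetical, the taxonomy file is assumed to be tab-separated as "ID<TAB>rank1;...;rank7", and the "species(ID)" layout is a guess at how the ID is concatenated.

```python
import io

N_RANKS = 7  # kingdom, phylum, class, order, family, genus, species


def parse_fasta(handle):
    """Yield (seq_id, sequence); seq_id is the first token of each header."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def load_taxonomy(handle):
    """Map seq_id -> list of 7 ranks (assumed layout: ID<TAB>r1;r2;...;r7)."""
    taxonomy = {}
    for line in handle:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        seq_id, lineage = line.split("\t", 1)
        ranks = [r.strip() for r in lineage.strip().rstrip(";").split(";")]
        if len(ranks) == N_RANKS:
            taxonomy[seq_id] = ranks
    return taxonomy


def to_dada2(fasta_handle, taxonomy_handle):
    """Return FASTA text whose headers hold the 7 ranks joined by ';'.

    The sequence ID is appended to the species entry in parentheses
    (an assumed layout). Sequences without a taxonomy match are dropped.
    """
    taxonomy = load_taxonomy(taxonomy_handle)
    records = []
    for seq_id, seq in parse_fasta(fasta_handle):
        ranks = taxonomy.get(seq_id)
        if ranks is None:
            continue
        ranks = ranks[:-1] + [f"{ranks[-1]}({seq_id})"]
        records.append(">" + ";".join(ranks) + "\n" + seq)
    return "\n".join(records) + ("\n" if records else "")
```

To run it on real files, open the FASTA and taxonomy files with open() and pass the two handles to to_dada2; io.StringIO objects work the same way for quick checks. Keeping the ID inside the species field means a species-level assignment can be traced back to the exact reference sequence it came from.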
                   All versions   This version
Views              4,894          1,236
Downloads          2,419          1,833
Data volume        10.2 GB        6.8 GB
Unique views       3,966          1,032
Unique downloads   1,181          854
