Published July 20, 2024 | Version v3
Dataset | Open access

Protistan metabolism across the western North Atlantic Ocean revealed through autonomous underwater profiling

  • 1. University of Georgia Skidaway Institute of Oceanography
  • 2. Woods Hole Oceanographic Institution

Description

Metatranscriptomic assembly, predicted open reading frames (ORFs), read counts, and annotation files from seawater samples collected in the western North Atlantic Ocean. Analysis notebooks are available on GitHub: https://github.com/cnatalie/BATS. The assembly was generated with the eukrhythmic pipeline: https://github.com/AlexanderLabWHOI/eukrhythmic

  • merged_merged.fasta.gz = Final assembly, merged across 44 metatranscriptomes, each assembled with four different assemblers
  • merged.fasta.transdecoder.pep.zip = Open reading frames of the final assembly, predicted by TransDecoder
  • merged.fasta.transdecoder-estimated-taxonomy.out.zip = EUKulele-derived taxonomic annotations of ORFs using a combined EukProt, PhyloDB, and RefSeq reference database
  • newtaxa.eukprot.merged.fasta.transdecoder-estimated-taxonomy.out.zip = as above, but with manually curated mid-level taxonomy for supergroups of interest
  • eggnog.emapper.annotations.zip = eggnog-mapper annotations of ORFs
  • table.tab.zip = counts associated with ORFs (merged.fasta.transdecoder.pep) generated with Salmon
  • TPM_table.tab.zip = community-wide TPM (normalized) counts associated with ORFs (merged.fasta.transdecoder.pep) generated with Salmon
  • copiesperL_ORFs_FactorIncluded.csv.zip = raw counts associated with ORFs (merged.fasta.transdecoder.pep) converted to copies per liter, accounting for the spiked-in RNA standard concentration (copies added), the number of standard reads mapped, the volume of seawater filtered, and the dilution factor used in library preparation
  • assembly.table.tab.zip = counts associated with final assembly (merged_merged.fasta) generated with Salmon
  • SamplesViewReportCLIO_AE1913merged_trans210506_updated220606exclusive.zip = Exclusive spectral counts associated with ORFs (merged.fasta.transdecoder.pep). Peptide-spectrum matching was performed with the Sequest algorithm within IseNode Proteome Discoverer 2.2.0.388 (Thermo Fisher Scientific), and Scaffold 5.1.2 (Proteome Software) was used for protein grouping and exclusive spectral counting. Note: the "(+x)" suffix, which indicates whether (and how many) proteins sharing peptides were assigned to the same protein group, has been removed from protein names.
  • cds.length2.tab.zip = Length of each protein-coding ORF, in nucleotide base pairs
  • CTD.zip = CTD files from cruise AE1913
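
TPM_table.tab.zip holds Salmon's transcripts-per-million values. As a reminder of what TPM means, the sketch below recomputes it from per-ORF counts and lengths: divide each count by its length, then rescale so a sample's values sum to one million. This is an illustration, not the authors' code; the function name is hypothetical, and because Salmon normalizes by effective length rather than the raw nucleotide lengths in cds.length2.tab, values computed this way will differ somewhat from TPM_table.tab.

```python
def tpm(counts: dict[str, float], lengths: dict[str, float]) -> dict[str, float]:
    """Transcripts per million from per-ORF read counts and ORF lengths (bp).

    Hypothetical helper. Salmon uses *effective* lengths, so results from raw
    lengths only approximate the values in TPM_table.tab.
    """
    # Length-normalized abundance (reads per base) for each ORF in one sample.
    rates = {}
    for orf, count in counts.items():
        length = lengths[orf]
        if length <= 0:
            raise ValueError(f"non-positive length for {orf!r}")
        rates[orf] = count / length
    total = sum(rates.values())
    if total == 0:
        raise ValueError("all counts are zero")
    # Rescale so the values for this sample sum to one million.
    return {orf: rate / total * 1e6 for orf, rate in rates.items()}
```

For example, `tpm({"a": 100, "b": 200}, {"a": 1000, "b": 500})` gives about 200,000 for "a" and 800,000 for "b": "b" has twice the reads in half the length, so four times the per-base rate.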

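The copies-per-liter conversion described for copiesperL_ORFs_FactorIncluded.csv.zip can be sketched as follows, assuming the usual internal-standard approach: scale each ORF's reads by (standard copies added / standard reads mapped), then divide by the volume filtered. The helper and parameter names are hypothetical, and applying the dilution factor as a multiplier is an assumption; the notebooks at https://github.com/cnatalie/BATS are the authoritative source for the exact formula.

```python
def copies_per_liter(orf_reads: float, standard_copies_added: float,
                     standard_reads_mapped: float, volume_filtered_l: float,
                     dilution_factor: float = 1.0) -> float:
    """Convert raw ORF read counts to transcript copies per liter of seawater.

    Hypothetical helper. ASSUMPTION: the dilution factor multiplies the
    result (e.g. only a 1/dilution_factor fraction of the extract was
    sequenced); check the study's notebooks before relying on this.
    """
    if standard_reads_mapped <= 0:
        raise ValueError("standard_reads_mapped must be positive")
    if volume_filtered_l <= 0:
        raise ValueError("volume_filtered_l must be positive")
    # Transcript copies represented by each sequenced read, from the spike-in.
    copies_per_read = standard_copies_added / standard_reads_mapped
    # Scale ORF reads to copies, correct for dilution, normalize by volume.
    return orf_reads * copies_per_read * dilution_factor / volume_filtered_l
```

With illustrative (not measured) inputs, 500 ORF reads, 1e9 standard copies added, 2e5 standard reads mapped, 4 L filtered, and a dilution factor of 2 give 1.25e6 copies per liter.
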
Files


Files (10.3 GB)

  • md5:c27a00ca0b4cb3dc379d174645d4b181 (780.6 MB)
  • md5:3ffb6607b060ebcc3ddb8b6d22618509 (283.5 MB)
  • md5:6b0dfe31436521ea8e01fb0ec657e4ce (42.3 MB)
  • md5:2aacd270be6d96dcfa3fa98d5543dcbd (1.0 GB)
  • md5:d1a10877946aaa555195fc3dd7f234c6 (110.4 MB)
  • md5:89da0964cde2df436d81b943650f233a (1.7 GB)
  • md5:4734775e54d4c9a39c4054e342375f0d (5.9 GB)
  • md5:5b142d883a9aedb0a5d77224aeb994db (111.0 MB)
  • md5:99583b3a12eaf14fa9c95483369add97 (1.7 MB)
  • md5:bcb22c2c5c480443831f374392fece7d (199.6 MB)
  • md5:76d1f21fdf7a1142f3355455cdb9cc5b (145.6 MB)