Supplemental data for the publication: "Resolving the microalgal gene landscape at the strain level: A novel hybrid transcriptome of Emiliania huxleyi CCMP3266"
- 1. Department of Plant and Environmental Sciences, Weizmann Institute of Science, Rehovot, Israel
- 2. Nancy and Stephen Grand Israel National Center for Personalized Medicine, Weizmann Institute of Science, Rehovot, Israel
Description
The dataset is part of a peer-reviewed publication that can be accessed here: https://doi.org/10.1128/aem.01418-21. The dataset includes the following files:
Data S1: Hybrid transcriptome of E. huxleyi CCMP3266 (FASTA format). The FASTA header matches the “TransID.SPAdes”, “GeneID” and “TransID.TSA” columns of Data S2.
Data S2: E. huxleyi CCMP3266 hybrid transcriptome annotation table (TSV format). Columns 1-4: CCMP3266 gene and transcript IDs; columns 5-6: gene and transcript length; column 7: longest transcript per gene; column 8: transcripts with a protein-coding ORF; columns 9-10: Illumina short-read counts; columns 11-13: results of the differential gene expression analysis; column 14: PacBio CCS long-read counts; columns 15-27: blastx/blast2GO functional annotations.
Data S3: E. huxleyi CCMP1516 reference genes used for transcriptome completeness estimates (FASTA format). The FASTA file contains nucleotide sequences of E. huxleyi CCMP1516 core genes supported by expressed sequence tags (ESTs). The gene set was compiled from data reported by Read et al. (2013; PubMed ID: 23760476).
Data S4: E. huxleyi CCMP3266 sGenome (FASTA format).
Data S5: E. huxleyi CCMP3266 sGenome gene annotation file (GFF3 format).
Data S6: E. huxleyi CCMP3266 novel genes that are absent from the CCMP1516 reference genome (TSV format). Column 1 (GeneID) and column 2 (TransID.SPAdes) contain CCMP3266 gene and transcript identifiers, which can be used to retrieve nucleotide sequences from the hybrid transcriptome (Data S1). Columns 3-4: gene and transcript length; column 5: gene expression levels determined by mapping Illumina QC reads to the sGenome (RPK normalized); column 6: gene expression levels determined by mapping PacBio CCS reads to the sGenome (read counts); column 7: number of publicly available E. huxleyi transcriptomes (n = 17; available in the TSA database) that produced a significant BLAT hit; columns 8-15: blastx/blast2GO functional annotations.
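The retrieval described for Data S6 (using its transcript identifiers to pull sequences from Data S1) could be sketched roughly as follows. This is only an illustration under several assumptions that are not confirmed by the record: that the table has a header row with a column literally named `TransID.SPAdes`, and that the identifiers appear in each FASTA header as tokens separated by whitespace, `|` or `;`. The toy identifiers and sequences below are hypothetical and exist only to make the sketch runnable; the real header layout should be checked before use.

```python
import csv
import io
import re


def read_ids(tsv_handle, column="TransID.SPAdes"):
    """Collect the non-empty values of one column from a TSV table with a header row."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return {row[column] for row in reader if row.get(column)}


def iter_fasta(fasta_handle):
    """Yield (header, sequence) pairs from a FASTA stream, joining wrapped lines."""
    header, chunks = None, []
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_records(fasta_handle, wanted_ids):
    """Yield FASTA records whose header contains at least one wanted identifier.

    Assumption: identifiers are separated by whitespace, '|' or ';' in the header.
    """
    for header, seq in iter_fasta(fasta_handle):
        tokens = set(re.split(r"[\s|;]+", header))
        if tokens & wanted_ids:
            yield header, seq


# Hypothetical stand-ins for Data S6 (table) and Data S1 (FASTA).
toy_table = "GeneID\tTransID.SPAdes\ng1\tt1\ng2\tt3\n"
toy_fasta = ">t1 g1 tsa1\nACGT\nAC\n>t2 g1 tsa2\nGGGG\n>t3 g2 tsa3\nTTTT\n"

wanted = read_ids(io.StringIO(toy_table))
selected = list(select_records(io.StringIO(toy_fasta), wanted))
for header, seq in selected:
    print(f">{header}\n{seq}")
```

With the real files, the two `io.StringIO` objects would be replaced by handles opened on the downloaded table and FASTA file. Matching on whole tokens rather than substrings avoids one identifier accidentally matching a longer identifier that merely starts with the same characters.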
Files
Total size: 341.9 MB

MD5 checksum | Size |
---|---|
dade50ee524b6bbed3397dc45101d9af | 64.0 MB |
84ac4af29ce45bde9968af6bcbb2abde | 30.3 MB |
5b6d797e4549283378603f4e1750239f | 26.8 MB |
8cd6b8c38cda114cd24c47ffc711b206 | 187.3 MB |
8fafc6ed790758560eea6cc865d65b45 | 33.2 MB |
26e854b4283a4b171cf3d0c8b87cc362 | 276.2 kB |
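
The MD5 checksums listed for each file can be used to confirm that a download completed without corruption. A minimal sketch using only the Python standard library is given below; the function names are illustrative, and the local file path in the usage note is a hypothetical placeholder, since the file names are not part of this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with an expected value.

    The expected value may be given with or without a leading 'md5:' prefix.
    """
    expected_hex = expected.strip().split(":", 1)[-1].lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("downloaded_file.fasta", "md5:dade50ee524b6bbed3397dc45101d9af")` would return `True` only if the (hypothetically named) local file is byte-for-byte identical to the 64.0 MB file in the listing.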