Published November 29, 2024 | Version 1.0.0
Dataset Open

Platynereis dumerilii full-length transcriptome of developmental stages

  • 1. Max Planck Institute of Molecular Cell Biology and Genetics

Description

To generate a high-quality full-length transcriptome for the annelid Platynereis dumerilii, we collected samples from representative developmental stages, from unfertilized eggs to 5 days post-fertilization. Each sample consisted of a bulk mix from 1 to 5 batches of embryos fertilized by different parents. We incubated all batches at 18 degrees Celsius until the desired time point, then collected the embryos into a clean tube and snap-froze them in liquid nitrogen with as little seawater as possible. The samples were stored at -80 degrees Celsius until RNA extraction.

We extracted total RNA from the samples using a Trizol protocol. After measuring the RNA concentration with NanoDrop, we created a bulk RNA mix by combining 1 µL from each sample into a new tube. We submitted this bulk mix to the Sequencing and Genotyping facility of the Max Planck Institute of Molecular Cell Biology and Genetics, which ran aliquots through a Bioanalyzer and gel electrophoresis and found no evidence of RNA degradation.

From this bulk mix, the facility prepared PacBio Iso-Seq libraries using the Express Template Prep Kit 2.0 and sequenced full-length transcripts on a SMRT 8M Cell for 30 hours using a PacBio Sequel II System. They processed the raw movie subreads with SMRT Analysis software, following the Iso-Seq v3 workflow to generate representative circular consensus sequences, demultiplex and remove primers, trim poly(A) tails, and remove concatemers. After transcript clustering and merging, the resulting dataset contained 176,122 polished high-quality isoforms.

Using Cogent, we removed redundant isoforms and obtained a dataset with 117,524 transcripts. From this, we generated a dataset containing only the longest isoform for each gene, with 70,003 sequences in total. We calculated descriptive metrics using Transrate. To estimate completeness, we ran BUSCO for metazoa and obtained a score of 85%. Finally, we annotated the longest-isoform dataset using EnTAP. About 85% of the transcripts have a coding sequence. We obtained annotations for 67% of the sequences (46,635 of 70,003), while 33% remain unannotated.
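The longest-isoform selection described above can be illustrated with a minimal sketch. It is not the pipeline used for this dataset (that is archived in 0-Pdum_workflow.zip); it only shows the general idea of keeping one sequence per gene. The rule for deriving a gene ID from a FASTA header (`gene_from_header`, which strips a trailing `.N` isoform suffix) is a hypothetical assumption and would need to be adapted to the actual header format.

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gene_from_header(header: str) -> str:
    """Hypothetical rule: the record ID is 'gene.isoform'; drop the suffix."""
    record_id = header.split()[0]
    return record_id.rsplit(".", 1)[0]


def longest_per_gene(
    records: Iterable[Tuple[str, str]],
    gene_of: Callable[[str], str] = gene_from_header,
) -> Dict[str, Tuple[str, str]]:
    """Keep the longest (header, sequence) pair for each gene ID."""
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in records:
        gene = gene_of(header)
        # Replace the stored isoform only if this one is strictly longer.
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (header, seq)
    return best
```

With a real file, the records would come from `parse_fasta(open(path))`; ties between isoforms of equal length are resolved in favor of the first one encountered.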

Datasets

File name                                Size (zipped)  Sequences                  Description
0-Pdum_workflow.zip (folder)             3.40 GB        -                          entire pipeline with notebook entries and analyses
1-Pdum_hq_isoforms.zip (fasta)           180.30 MB      176,122                    polished high-quality isoforms from CCS
2-Pdum_co_isoforms.zip (fasta)           70.68 MB       117,524                    non-redundant polished high-quality isoforms
3-Pdum_co_longest.zip (fasta)            54.85 MB       70,003                     longest of non-redundant polished high-quality isoforms
4-Pdum_co_longest_annotations.zip (tsv)  34.37 MB       70,003 (46,635 annotated)  annotations for longest-isoform dataset
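As a quick consistency check, the sequence counts in the table can be related to the percentages quoted in the description. The short sketch below uses only the counts listed above; the dictionary keys and function names are illustrative labels, not part of the dataset.

```python
# Sequence counts as reported in the Datasets table.
COUNTS = {
    "hq_isoforms": 176_122,   # polished high-quality isoforms
    "co_isoforms": 117_524,   # after redundancy removal
    "co_longest": 70_003,     # longest isoform per gene
    "annotated": 46_635,      # longest isoforms with an annotation
}


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def summarize(counts=COUNTS) -> dict:
    """Compute the proportions implied by the reported counts."""
    annotated = percent(counts["annotated"], counts["co_longest"])
    return {
        "retained_after_redundancy_removal": percent(
            counts["co_isoforms"], counts["hq_isoforms"]
        ),
        "longest_of_non_redundant": percent(
            counts["co_longest"], counts["co_isoforms"]
        ),
        "annotated_of_longest": annotated,
        "unannotated_of_longest": 100.0 - annotated,
    }


if __name__ == "__main__":
    for label, value in summarize().items():
        print(f"{label}: {value:.1f}%")
```

The annotated fraction rounds to 67% and the unannotated fraction to 33%, in agreement with the description.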

Files

Files (3.7 GB)

Name                               Size      Checksum
0-Pdum_workflow.zip                3.4 GB    md5:7dfb5e47cfc6138da1cb55f2af19b849
1-Pdum_hq_isoforms.zip             180.3 MB  md5:65fe1017bbd2c88c379fa1da2994c122
2-Pdum_co_isoforms.zip             70.7 MB   md5:b504b8330d7435ca7d8f184eaf939077
3-Pdum_co_longest.zip              54.9 MB   md5:587096b474eeb0c578c864a6c4ad309f
4-Pdum_co_longest_annotations.zip  34.4 MB   md5:7b30796391d6388737513e46bb35a0d5
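The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`. The pairing of each checksum with a file name is an assumption based on the listing following the same order and sizes as the Datasets table; the function names and the download folder are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, Iterator

# Checksums copied from the file listing. The file-name pairing is assumed
# from matching order and sizes in the Datasets table.
EXPECTED_MD5 = {
    "0-Pdum_workflow.zip": "7dfb5e47cfc6138da1cb55f2af19b849",
    "1-Pdum_hq_isoforms.zip": "65fe1017bbd2c88c379fa1da2994c122",
    "2-Pdum_co_isoforms.zip": "b504b8330d7435ca7d8f184eaf939077",
    "3-Pdum_co_longest.zip": "587096b474eeb0c578c864a6c4ad309f",
    "4-Pdum_co_longest_annotations.zip": "7b30796391d6388737513e46bb35a0d5",
}


def read_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Read a file in fixed-size chunks so large archives fit in memory."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hexadecimal MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(folder: str, expected: Dict[str, str] = EXPECTED_MD5) -> Dict[str, bool]:
    """Map each expected file name to True if present with a matching MD5."""
    results = {}
    for name, checksum in expected.items():
        path = Path(folder) / name
        results[name] = path.is_file() and md5_of_chunks(read_chunks(path)) == checksum
    return results
```

Calling `verify_downloads("downloads")` on a folder containing the five archives would return a dictionary of booleans; a missing or corrupted file is reported as `False`.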

Additional details

Related works

Is part of
Journal article: 10.1186/s12864-025-11727-2 (DOI)

Funding

Deutsche Forschungsgemeinschaft
Active torque generation for spiralian chiral cleavage GZ: TO 563/7-1