Published September 7, 2022 | Version v1
Dataset Open

Simulated data from: Reference-free assembly of long-read transcriptome sequencing data with RNA-Bloom2

Authors/Creators

  • 1. University of British Columbia

Description

Long-read sequencing technologies have improved significantly since their emergence. Their read lengths, which can span entire transcripts, are advantageous for reconstructing transcriptomes. Existing long-read transcriptome assembly methods are primarily reference-based, and to date there has been little focus on reference-free transcriptome assembly. We introduce RNA-Bloom2, a reference-free assembly method for long-read transcriptome sequencing data. RNA-Bloom2 is available on GitHub at: https://github.com/bcgsc/RNA-Bloom.

We benchmarked the assembly quality and the computational performance of RNA-Bloom2 on simulated data. We prepared two simulated mouse datasets with Trans-NanoSim, one for the cDNA and one for the dRNA sequencing protocol, modeled on experimental ONT data. The datasets were simulated based on the mouse ENSEMBL annotation for GRCm39. To investigate the effect of sequencing depth, we subsampled each dataset to 2, 10, and 18 million reads, resulting in a total of six sets of reads for our benchmarking experiments. Using the simulated data, we showed that the transcriptome assembly quality of RNA-Bloom2 is competitive with that of reference-based methods.

Notes

Decompress and extract the tarballs:

tar -zxf mouse_cdna.tar.gz
tar -zxf mouse_drna.tar.gz

Extract the sample read files (run from the directory that contains the extracted mouse_cdna and mouse_drna folders):

(cd mouse_cdna && bash extract_reads.sh)

(cd mouse_drna && bash extract_reads.sh)

Funding provided by: Genome Canada
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100008762
Award Number: 243FOR

Funding provided by: Genome British Columbia
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000233
Award Number: 243FOR

Funding provided by: National Human Genome Research Institute
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000051
Award Number: 2R01HG007182-04A1

Funding provided by: Natural Sciences and Engineering Research Council of Canada
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000038
Award Number:

Funding provided by: Canadian Institutes of Health Research
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000024
Award Number:

Files

README.txt

Files (43.1 GB)

Size      MD5
14.8 GB   64c72dfa2acad50bf0207a12b8d2ca50
28.3 GB   fb6016974713e4df8bc7d3af33da2c1f
1.2 kB    5c6055b08b34854d5609ec61a0c02e5c
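
The MD5 checksums listed above can be used to confirm that a download completed intact before the tarballs are extracted. The following is a minimal sketch, not part of the original record: the helper name verify_md5 is hypothetical, and it assumes GNU md5sum is available (macOS ships md5 instead). The file listing here does not show which checksum belongs to which file name, so the file argument is left as a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Usage: verify_md5 <file> <expected_md5>
# Assumes GNU coreutils md5sum is on the PATH.
verify_md5() {
    local file="$1" expected="$2" actual
    # md5sum prints "<digest>  <filename>"; keep only the digest field.
    actual="$(md5sum "$file" | awk '{print $1}')"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (expected $expected, got $actual)" >&2
        return 1
    fi
}

# Example call (replace <downloaded_file> with the matching file name):
# verify_md5 <downloaded_file> 64c72dfa2acad50bf0207a12b8d2ca50
```

The function returns a non-zero status on a mismatch, so it can be chained with && ahead of the tar command to avoid extracting a corrupted archive.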

Additional details

Related works