Published February 28, 2022 | Version v1
Dataset Open

Single cell Iso-Sequencing enables rapid genome annotation for scRNAseq analysis

  • 1. University of Oregon

Description

Single cell RNA sequencing (scRNAseq) is a powerful technique whose use continues to expand across biological applications. However, incomplete 3' UTR annotations can impede single cell analysis, leaving genes partially or completely uncounted. Performing scRNAseq with incomplete 3' UTR annotations can therefore hinder the identification of cell identities and gene expression patterns and lead to erroneous biological inferences. We demonstrate that performing single cell isoform sequencing (ScISOr-Seq) in tandem with scRNAseq can rapidly improve 3' UTR annotations. Using threespine stickleback fish (Gasterosteus aculeatus), we show that gene models generated from a minimal embryonic ScISOr-Seq dataset retained 26.1% more scRNAseq reads than gene models from Ensembl alone. Pooling our ScISOr-Seq isoforms with a previously published adult bulk Iso-Seq dataset from stickleback, and merging the resulting annotation with the Ensembl gene models, yielded only a marginal improvement (+0.8%) over the ScISOr-Seq-only dataset. In addition, the isoforms identified by ScISOr-Seq included thousands of new splice variants. The improved gene models obtained with ScISOr-Seq enabled successful identification of cell types and increased the number of reads assigned to many genes in our stickleback scRNAseq dataset. Our work highlights ScISOr-Seq as a cost-effective and efficient approach for rapidly annotating genomes for scRNAseq.

Notes

We used raw sequencing data from the paper by Naftaly, Pau, and White (2021) to create the Supplementalfile3_Annotation2_Bulk_Iso_Seq, Supplementalfile4_Annotation3_Pooled_Iso_Seqs_merged_with_ensembl.gtf, and Supplementalfile5_Annotation4_Pooled_Iso_Seqs_merged_with_ensembl_for_cellranger.gtf annotations. Information on accessing the sequencing data from our own experiment can be found in our manuscript, "Single cell Iso-Sequencing enables rapid genome annotation for scRNAseq analysis," published in Genetics.
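The annotations in this dataset are GTF files, and a quick way to see how many isoforms each annotation assigns per gene (for example, to gauge the new splice variants contributed by ScISOr-Seq) is to tally distinct transcript IDs per gene. The following is a minimal sketch, not the authors' pipeline: the function name transcripts_per_gene is hypothetical, and it assumes standard 9-column GTF records whose exon lines carry gene_id and transcript_id attributes.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def transcripts_per_gene(lines):
    """Count distinct transcript_ids per gene_id from GTF exon records.

    Assumption: every exon line has gene_id and transcript_id attributes.
    Exon lines are used because they are present in any GTF, whereas
    explicit 'transcript' lines are optional.
    """
    genes = defaultdict(set)
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only well-formed 9-column exon records.
        if len(fields) != 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        genes[attrs["gene_id"]].add(attrs["transcript_id"])
    return {gene: len(tx) for gene, tx in genes.items()}
```

Usage: pass an open file handle, e.g. `with open(path) as fh: counts = transcripts_per_gene(fh)`, where `path` is a locally downloaded annotation; for a gzip-compressed copy, `gzip.open(path, "rt")` from the standard library works the same way. Comparing the resulting counts between two annotations gives a rough per-gene view of added isoforms.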

Funding provided by: National Institutes of Health
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000002
Award Number: T32GM007413

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: OPP-2015301

Files

README.txt

Files (652.7 MB)

Size        Checksum
2.1 kB      md5:5b811c5b6b517af8c3dc00fb37bd9085
16.7 kB     md5:46403d2415ef5ccc6404bfefa376e088
37.2 MB     md5:54a9864ddf556911c009a87821da70af
40.3 MB     md5:e95b1c429aee509e2f3c353b44ac8801
162.2 MB    md5:66289190aab7fb191445228881b150c7
162.2 MB    md5:5210a73d721bd4ee810105a131abb513
21.7 kB     md5:018dc6f58abc9b1355499e7959bf1afb
18.2 kB     md5:5a093c7d649f59e85891980bc86cba23
250.8 MB    md5:ab27158c21db391a25f21783d0b6340b
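Each file above is listed with an MD5 checksum, which can be used to confirm that a download completed without corruption. A minimal sketch using only Python's standard hashlib module; the helper names md5sum and verify are hypothetical, and since the file names were not captured in this listing, pairing each checksum with its file is left to the reader.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks.

    Chunked reading keeps memory use flat even for the ~250 MB files.
    """
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected):
    """Check a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected
```

Usage: `verify("downloaded_file.gtf", "md5:ab27158c21db391a25f21783d0b6340b")` returns True only if the local file (the name here is a placeholder) matches that listed checksum.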

Additional details

Related works

Is cited by
10.1093/genetics/iyac017 (DOI)