Published September 30, 2015 | Version v1
Dataset (open access)

Data from: Optimisation of next generation sequencing transcriptome annotation for species lacking sequenced genomes

  • 1. University of Bath
  • 2. Harvard University
  • 3. The University of Texas at Austin
  • 4. Syracuse University

Description

Next generation sequencing methods, such as RNA-seq, have permitted the exploration of gene expression in a range of organisms that have been studied in ecological contexts but lack a sequenced genome. However, the efficacy and accuracy of RNA-seq annotation methods that use reference genomes from related species have yet to be robustly characterised. Here we conduct a comprehensive power analysis employing RNA-seq data from Drosophila melanogaster in conjunction with 11 additional genomes from related Drosophila species to compare annotation methods and quantify the impact of evolutionary divergence between the transcriptome and the reference genome. Our analyses demonstrate that, regardless of the level of sequence divergence, direct genome mapping, where transcript short reads are aligned directly to the reference genome, significantly outperforms the widely used de novo and guided assembly-based methods in both the quantity and accuracy of gene detection. Our analysis also reveals that direct genome mapping recovers a more representative profile of Gene Ontology functional categories, which are often used to interpret emergent patterns in genome-wide expression analyses. Lastly, analysis of available primate RNA-seq data demonstrates that our observations apply across diverse taxa. Our quantification of annotation accuracy, and of the reduction in gene detection associated with sequence divergence, thus provides empirically derived guidelines for the design of future gene expression studies in species without sequenced genomes.

Files (6.6 GB)

Figure_data_files.zip

MD5 checksum                          Size
md5:a364ac0234f9765cba7485e5f5af06c8  1.2 MB
md5:0f5a11437ecbc86e31edfad07e0ef103  348.6 MB
md5:d9a4c4a8714b9be3522a9cb526783f09  388.4 MB
md5:550ac1fdd2c157a99469474e32b3591a  310.5 MB
md5:04ba696f348e86d04ddb8e60ccd1f5fe  384.1 MB
md5:cd7e86ecf5d1a68ff0ecbbbd45de154d  329.0 MB
md5:2f010666389ab6ff640d3aa24915db9b  359.0 MB
md5:fb8611bfb98030c348e14ec594af0941  328.6 MB
md5:ab2fe3cb465cf11a781bfdf03d26d780  392.0 MB
md5:ed9cb571d86907233f6a766a5aedbc0e  384.7 MB
md5:1a88fc206f522c2fdabc6e7cb5bb2648  354.3 MB
md5:29968b02ee30ed31b8cff2056fc8b7e0  350.5 MB
md5:84e2c31bdfa6c483aedef095e9cbf353  389.4 MB
md5:638d7c1108605654a2e82d1cbe337bf6  719.3 kB
md5:f1664de629eccbcf9d60e2ecce22f670  27.0 kB
md5:46a3b3f8f0fc0ea32625c50cbcbdc1b3  14.7 MB
md5:53ff10b8669665430290befc8c5c03ce  263.5 MB
md5:ea61980b42dc6caecaeb35e3a4868c82  269.4 MB
md5:22fc4696745d8e86f9e32e42517380a6  282.0 MB
md5:ed10b9a382b06a50f32fba79092a5cde  285.5 MB
md5:4cfaed3f869bea8d199b961e5ecfecb5  1.2 GB
md5:afb8b773df4c15bd7a1f1747920c0602  2.4 kB
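Because most files in this record are several hundred megabytes, it can be worth confirming that a download completed intact by comparing its MD5 digest with the listed checksum. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_md5` are hypothetical and not part of the dataset. It reads files in chunks so memory use stays flat, and accepts checksums with or without the `md5:` prefix shown in the listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading 1 MiB chunks at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return True if the file's MD5 matches `expected`.

    `expected` may be bare hex or carry an "md5:" prefix; case is ignored.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `verify_md5("downloaded_file.zip", "md5:a364ac0234f9765cba7485e5f5af06c8")` would return `True` only if the local file (here a hypothetical path, since the listing does not preserve file names) matches the first checksum above.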

Additional details

Related works

Is cited by
10.1111/1755-0998.12465 (DOI)