There is a newer version of the record available.

Published November 14, 2020 | Version v2
Preprint Open

Semi-artificial datasets as a resource for validation of bioinformatics pipelines for plant virus detection

  • 1. Université de Liège, Terra-Gembloux Agro-Bio Tech, Plant Pathology Laboratory, Passage des Déportés, 2, 5030 Gembloux, Belgium
  • 2. Plant Sciences Unit, Flanders Research Institute for Agriculture, Fisheries and Food (ILVO), Burg. Van Gansberghelaan 96, 9820 Merelbeke, Belgium
  • 3. Department of Plant Pathology, University of California, Davis, 95616
  • 4. Department of Plant Protection, Faculty of Agriculture, University of Sütçü Imam, Kahramanmaras 46060, Turkey
  • 5. Univ. Bordeaux, INRAE, UMR BFP, CS20032, 33882 Villenave d'Ornon cedex, France
  • 6. Institute for Sustainable Plant Protection, CNR, Via Amendola 122/D, Bari 70126, Italy
  • 7. Leibniz Institute - DSMZ, German Collection of Microorganisms and Cell Cultures GmbH, 38124 Braunschweig, Germany
  • 8. Virology, Agroscope, Nyon, Switzerland
  • 9. Department of Plant Pathology, University of California, Davis, 95616; Department of Evolution and Ecology, University of California, Davis, California 95616, USA
  • 10. Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia

Description

The widespread use of High-Throughput Sequencing (HTS) for detecting plant viruses and sequencing their genomes has generated large amounts of data, along with bioinformatics challenges in processing them. Many bioinformatics pipelines for virus detection are available, which makes choosing a suitable one difficult. Robust benchmarking is needed for an unbiased comparison of these pipelines, but reference datasets that could be used for this purpose are currently lacking. We present 7 semi-artificial datasets composed of real RNA-seq datasets from virus-infected plants spiked with artificial virus reads. Each dataset addresses challenges that could prevent virus detection. We also present 3 real datasets with a challenging virus composition, as well as 8 completely artificial datasets for testing haplotype reconstruction software.
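The idea of spiking a real dataset with artificial virus reads can be illustrated with a minimal Python sketch. It is not the procedure used to build the datasets described here, which this record does not detail. The toy genome, the read names, the uniform error-free sampling, the constant quality string, and the helper functions `parse_fastq`, `simulate_reads`, `spike`, and `to_fastq` are all assumptions made for illustration.

```python
import random


def parse_fastq(text):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ text."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        yield lines[i], lines[i + 1], lines[i + 3]


def simulate_reads(genome, read_len, n_reads, rng, prefix="spike"):
    """Draw error-free reads uniformly along a genome.

    Assumption: real read simulators also model sequencing errors and
    coverage bias; both are ignored here for brevity.
    """
    reads = []
    for k in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        seq = genome[start:start + read_len]
        # Constant placeholder quality string, one character per base.
        reads.append((f"@{prefix}_{k}_pos{start}", seq, "I" * read_len))
    return reads


def spike(real_reads, artificial_reads, rng):
    """Mix artificial reads into the real reads in random order."""
    mixed = list(real_reads) + list(artificial_reads)
    rng.shuffle(mixed)
    return mixed


def to_fastq(reads):
    """Serialize (header, sequence, quality) tuples back to FASTQ text."""
    return "".join(f"{h}\n{s}\n+\n{q}\n" for h, s, q in reads)


rng = random.Random(42)  # fixed seed so the spiked dataset is reproducible

# Hypothetical stand-ins for a real RNA-seq dataset and a virus genome.
real_fastq = (
    "@plant_1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@plant_2\nTTGACCGATA\n+\nIIIIIIIIII\n"
)
toy_virus_genome = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"

real = list(parse_fastq(real_fastq))
artificial = simulate_reads(toy_virus_genome, read_len=10, n_reads=3, rng=rng)
mixed = spike(real, artificial, rng)
print(to_fastq(mixed), end="")
```

Because the spiked reads carry a recognizable header prefix, the number of artificial reads a pipeline recovers can be compared against the number that was added, which is what makes such a dataset usable as a benchmark.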

Files

Article PHBN VIROMOCK Challenge V2.pdf

Files (317.4 kB)

Name: Article PHBN VIROMOCK Challenge V2.pdf
Size: 317.4 kB
md5:de02d0437ebfcb022cb622b6550433fd