Published March 2, 2018 | Version v1
Dataset Open

Data from: Batch effects in a multi-year sequencing study: false biological trends due to changes in read lengths

  • 1. University of Zurich

Description

High-throughput sequencing is a powerful tool, but it suffers from biases and errors that must be accounted for to prevent false biological conclusions. Such errors include batch effects: technical errors present only in subsets of the data because of procedural changes within a study. If batch effects are overlooked and multiple batches of data are combined, spurious biological signals can arise, particularly if batches are correlated with biological variables. Batch effects can be minimized through randomisation of sample groups across batches. However, in long-term or multi-year studies where data are added incrementally, full randomisation is impossible and batch effects may be a common feature. Here we present a case study in which false signals of selection were detected due to a batch effect in a multi-year study of Alpine ibex (Capra ibex). The batch effect arose because sequencing read length changed over the course of the project and populations were added to the study incrementally, resulting in a non-random distribution of populations across read lengths. The differences in read length caused small misalignments in a subset of the data, leading to false variant alleles and thus false SNPs. Pronounced allele frequency differences between populations arose at these SNPs because of the correlation between read length and population. This created highly statistically significant, but biologically spurious, signals of selection and false associations between allele frequencies and the environment. We highlight the risk of batch effects and discuss strategies to reduce their impact in multi-year high-throughput sequencing studies.
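The non-random distribution of populations across read lengths described above can be checked before any downstream analysis. The following is a minimal sketch, not the authors' method: the sample list, the population labels and the read lengths (100 and 125) are hypothetical placeholders, and the function names are made up for illustration. It tabulates samples per population and read length and flags populations sequenced at a single read length, where population and batch cannot be separated.

```python
from collections import Counter


def batch_crosstab(samples):
    """Count samples for each (population, read_length) pair."""
    return Counter((population, read_length) for population, read_length in samples)


def confounded_populations(samples):
    """Return populations that were sequenced at only one read length.

    For these populations, any allele frequency difference could reflect
    the read-length batch rather than biology.
    """
    lengths_by_population = {}
    for population, read_length in samples:
        lengths_by_population.setdefault(population, set()).add(read_length)
    return sorted(
        population
        for population, lengths in lengths_by_population.items()
        if len(lengths) == 1
    )


# Hypothetical design (NOT the study's data): (population, read length in bp).
samples = [
    ("popA", 100), ("popA", 100),
    ("popB", 100), ("popB", 125),
    ("popC", 125), ("popC", 125),
]

if __name__ == "__main__":
    for (population, read_length), count in sorted(batch_crosstab(samples).items()):
        print(population, read_length, count)
    print("Confounded with read length:", confounded_populations(samples))
```

In this toy design, only popB spans both read lengths, so it is the only population in which a read-length effect could be separated from a population effect.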

Files (143.3 kB)

Checksum, Size
md5:4fcefdf68d60505c1ab104599bcff85f, 81.9 kB
md5:b33e6528b10ac89054248e469652d7f9, 50.3 kB
md5:39cb17a66eb4edade37c086e05a4719a, 11.1 kB
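The md5 values listed for each file can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The file names are not shown on this page, so any path passed in is the reader's own, and the helper names `md5_of_file` and `matches_listed_md5` are made up for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5(local_path, "md5:4fcefdf68d60505c1ab104599bcff85f")` should return `True` if `local_path` points to an unmodified copy of the first file in the list.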

Additional details

Related works

Is cited by
10.1111/1755-0998.12779 (DOI)