Deep Learning Identifies Erroneous Microarray-based, Gene-level Conclusions in Literature
- 1. Department of Computational Medicine and Bioinformatics, the University of Michigan Medical School, Ann Arbor, MI
- 2. Department of Computational Medicine and Bioinformatics, the University of Michigan Medical School, Ann Arbor, MI; Department of Internal Medicine, the University of Michigan Medical School, Ann Arbor, MI
Description
More than 110,000 publications have used microarrays to decipher phenotype-associated genes, clinical biomarkers and gene functions. Microarray analysis relies on digitally assaying the fluorescence signals of the arrays. In this study, we retrospectively constructed raw images for 37,724 published microarrays and developed deep learning algorithms to automatically detect systematic defects. We report that an alarming 26.73% of the microarray-based studies are affected by serious imaging defects. Through literature mining, we found that publications associated with these affected microarrays have reported disproportionately more biological discoveries on genes in the contaminated areas than on other genes: 28.82% of the gene-level conclusions reported in these publications were based on measurements falling into the contaminated areas, indicating severe, systematic problems caused by such contamination. We provide the identified problematic published datasets, the affected genes and the imputed arrays, as well as software tools for scanning for such contamination, which will become essential for future studies to scrutinize and critically analyze microarray data.
Here we provide, as a public resource, the corrected microarray data, the hand-labelled contaminated arrays with their associated genes, and the predicted contaminations with their associated genes.
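The gene-level bookkeeping described above (which genes fall into a contaminated area, and what share of reported conclusions rest on them) can be illustrated with a minimal sketch. It assumes a contamination mask on the array grid (True where a defect was detected), per-probe grid coordinates and a probe-to-gene mapping. The function names, input layouts and toy data are hypothetical; they are not the file formats or software tools distributed in this record.

```python
def genes_in_contaminated_area(mask, probe_xy, probe_gene):
    """Return the set of genes with at least one probe inside the contaminated region.

    Hypothetical inputs (not the formats used in this record):
      mask       -- 2-D list of booleans; mask[row][col] is True where the array is contaminated
      probe_xy   -- list of (row, col) grid positions, one per probe
      probe_gene -- list of gene symbols aligned with probe_xy
    """
    return {gene for (row, col), gene in zip(probe_xy, probe_gene) if mask[row][col]}


def fraction_of_conclusions_affected(reported_genes, affected_genes):
    """Fraction of gene-level conclusions (one list entry per conclusion) whose gene is affected."""
    if not reported_genes:
        return 0.0
    hits = sum(1 for gene in reported_genes if gene in affected_genes)
    return hits / len(reported_genes)


# Toy 4 x 4 array whose top-left 2 x 2 block is marked as contaminated.
mask = [[r < 2 and c < 2 for c in range(4)] for r in range(4)]
affected = genes_in_contaminated_area(
    mask,
    probe_xy=[(0, 0), (1, 1), (3, 3), (2, 0)],
    probe_gene=["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
)
print(sorted(affected))  # ['GENE_A', 'GENE_B']
print(fraction_of_conclusions_affected(["GENE_A", "GENE_C", "GENE_D"], affected))
```

In a real analysis the mask would come from the contamination detector and the reported genes from literature mining; here both are toy values chosen only to show the computation (one of three conclusions falls in the contaminated block).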
Files (12.3 GB)
contam_genes.zip
MD5 checksum | Size |
---|---|
d7115d08505d39f70d274b8344db7224 | 35.4 MB |
6d82bcaee3971bfb6dfbac52b6edb36a | 99.0 MB |
1b6358ca9b5611a872d6dd5caa561d35 | 316.5 kB |
6448a672b59e84289f81a07282e0347d | 719.2 MB |
8442a48850bd14a01fed493506a1c794 | 1.2 GB |
227c87eba5f28dfc8d5ca3cc68da53d5 | 1.6 GB |
cf8187faa2bb303d0cf7260792962aa5 | 1.8 GB |
c9c04b94ad5622f6f5f8e5021ee00a7f | 1.7 GB |
5dd3d4ca82295f3a063850e9c2ac9fc8 | 2.6 GB |
5e10ba9c1b68eac8ccb06c494fd13c3a | 1.2 GB |
2820d528e339dead81d1fc830abeb836 | 1.2 GB |
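Because several of these archives exceed 1 GB, it is worth confirming each download against its listed MD5 checksum. A minimal sketch using Python's standard `hashlib` module, reading the file in chunks so that multi-gigabyte archives are never loaded into memory at once; the function names and the 1 MiB chunk size are illustrative choices, not part of this record:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, streaming it in chunk_size-byte pieces."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify_download("contam_genes.zip", "<checksum from the table>")` returns `True` only if the local copy is byte-for-byte identical to the archived file; a `False` result means the download should be repeated.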