Dataset Open Access

Data for: Tang et al., Interpretable classification of Alzheimer's disease pathologies with a convolutional neural network pipeline. bioRxiv 2018.

Tang, Ziqi; Chuang, Kangway; DeCarli, Charles; Jin, Lee-Way; Beckett, Laurel; Keiser, Michael; Dugger, Brittany

Datasets containing 63 whole slide images (WSIs) and their segmented 256x256 pixel tiles with approximately 80,000 tile-level amyloid-β pathology expert annotations.

Paper: "Interpretable classification of Alzheimer's disease pathologies with a convolutional neural network pipeline", bioRxiv 454793; DOI: https://doi.org/10.1101/454793.

Details: The collection comprises 63 WSIs from 63 unique decedent cases, spanning Alzheimer’s disease (AD) to non-AD and covering a variety of CERAD scores. The WSIs are divided into three datasets as follows:

  1. Development (Phases I-II). 33 WSIs used for convolutional neural network (CNN) model development (29 training, 4 validation).
  2. Hold-out (Phase III). 10 WSIs selected by an expert neuropathologist as a held-out test set to assess the generalizability of the CNN model.
  3. CERAD-like hold-out. 20 blinded WSIs collected solely for use in a CERAD-like scoring comparison study.

Datasets 1 and 2 were color-normalized and segmented into 256x256 pixel image tiles, yielding the model training set (61,370 images), the validation set (8,630 images), and the hold-out test set (10,873 images). Dataset 3 was color-normalized but not segmented.
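As a quick consistency check on the tile counts stated above, the short Python sketch below sums the three sets and reports each set's share of the total. The variable and function names are illustrative only and are not taken from the plaquebox-paper code.

```python
# Tile counts as stated in the dataset description.
TILE_COUNTS = {
    "training": 61_370,
    "validation": 8_630,
    "hold-out test": 10_873,
}


def summarize(counts):
    """Return the total tile count and each set's fraction of that total."""
    total = sum(counts.values())
    fractions = {name: n / total for name, n in counts.items()}
    return total, fractions


if __name__ == "__main__":
    total, fractions = summarize(TILE_COUNTS)
    print(f"total tiles: {total:,}")
    for name, fraction in fractions.items():
        print(f"{name:14s} {TILE_COUNTS[name]:>7,} ({fraction:.1%})")
```

The total comes to 80,873 tiles, which matches the "approximately 80,000" annotations quoted in the summary.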

Expert plaque labels for the Dataset 1 and Dataset 2 tiles are included in the corresponding CSV files.

Slide source and preparation: All samples were retrieved from the archives of the University of California, Davis Alzheimer’s Disease Center Brain Bank (https://www.ucdmc.ucdavis.edu/alzheimers/). Archival samples analyzed in this study were 5 μm formalin-fixed, paraffin-embedded sections of the superior and middle temporal gyrus from human brain. The sections had previously been pretreated with formic acid to rid samples of endogenous protein and then stained with an amyloid-β antibody (4G8, recognizing residues 17-24, BioLegend, formerly Covance). All slides were digitized using an Aperio AT2 scanner at up to 40x magnification.

Code: Please visit https://github.com/keiserlab/plaquebox-paper

This study was funded by an NIH P30 AG010129 grant (BND, CD, LWJ, and LB), a Paul G. Allen Family Foundation Distinguished Investigator Award (MJK), and the China Scholarship Council (ZT). These agencies had no role in any aspect of the study, including study design, data collection, analysis, or writing.

Files (110.0 GB)

  Name: Dataset 1a Development_train.zip
  Size: 35.3 GB
  md5:  f1b8413b61799a3350f7b431ecf2026f

  Name: Dataset 1b Development_validation.zip
  Size: 3.9 GB
  md5:  ffd0c30e55154901621972c16c259efa

  Name: Dataset 2 Hold-out.zip
  Size: 41.2 GB
  md5:  f0f69ccc39fe9e3072909ec48a1c057a

  Name: Dataset 3 CERAD-like hold-out.zip
  Size: 26.3 GB
  md5:  2200d5d0209fb35e77dfa0692eece03f

  Name: Tiles.zip
  Size: 3.3 GB
  md5:  1420e454def8f09eb945643ba5cfac53
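
Because the archives are large, it can be worth verifying each download against the MD5 digests listed above before unpacking. The following is a minimal sketch using only the Python standard library. It assumes the archives were saved under the file names shown in the listing, and the function names (md5_of_file, verify_downloads) are hypothetical helpers, not part of the plaquebox-paper repository.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing above.
EXPECTED_MD5 = {
    "Dataset 1a Development_train.zip": "f1b8413b61799a3350f7b431ecf2026f",
    "Dataset 1b Development_validation.zip": "ffd0c30e55154901621972c16c259efa",
    "Dataset 2 Hold-out.zip": "f0f69ccc39fe9e3072909ec48a1c057a",
    "Dataset 3 CERAD-like hold-out.zip": "2200d5d0209fb35e77dfa0692eece03f",
    "Tiles.zip": "1420e454def8f09eb945643ba5cfac53",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected archive name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumes the archives sit in the current working directory.
    for name, status in verify_downloads().items():
        print(f"{status:9s} {name}")
```

An archive reported as "mismatch" was most likely truncated or corrupted in transit and should be downloaded again.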