Dataset Open Access

Data for: Tang et al., Interpretable classification of Alzheimer's disease pathologies with a convolutional neural network pipeline. bioRxiv 2018.

Tang, Ziqi; Chuang, Kangway; DeCarli, Charles; Jin, Lee-Way; Beckett, Laurel; Keiser, Michael; Dugger, Brittany

Datasets containing 63 whole slide images (WSIs) and their segmented 256x256 pixel tiles with approximately 80,000 tile-level amyloid-β pathology expert annotations.

Paper: "Interpretable classification of Alzheimer's disease pathologies with a convolutional neural network pipeline", bioRxiv 454793; DOI:

Details: The collection contains 63 WSIs from 63 unique decedent cases, spanning Alzheimer’s disease (AD) to non-AD and covering a variety of CERAD scores. The WSIs are organized into three datasets:

  1. Development (Phases I-II). 33 WSIs used for convolutional neural network (CNN) model development (29 training, 4 validation).
  2. Hold-out (Phase III). 10 WSIs selected by an expert neuropathologist as a held-out test set to assess the generalizability of the CNN model.
  3. CERAD-like hold-out. 20 blinded WSIs collected solely for use in a CERAD-like scoring comparison study.

Datasets 1 and 2 were color-normalized and segmented into 256x256 pixel image tiles, yielding the model training set (61,370 images), validation set (8,630 images), and hold-out test set (10,873 images). Dataset 3 was color-normalized but not segmented.
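
As a rough illustration of what segmenting a slide into 256x256 pixel tiles involves, the sketch below computes the bounding boxes of non-overlapping tiles for an image of a given size. The function name `tile_boxes`, the decision to drop partial tiles at the right and bottom edges, and the example dimensions are assumptions made for illustration only; this is not the authors' pipeline, and color normalization is not shown.

```python
TILE = 256  # tile edge length in pixels, as stated in the dataset description


def tile_boxes(width, height, tile=TILE):
    """Return (left, top, right, bottom) boxes for non-overlapping tiles.

    Partial tiles at the right and bottom edges are dropped. This is an
    assumption; the original pipeline may treat edges differently.
    """
    boxes = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            boxes.append((left, top, left + tile, top + tile))
    return boxes


# Hypothetical image size, far smaller than a real whole slide image.
boxes = tile_boxes(width=1000, height=600)
print(len(boxes))  # 6 (2 rows x 3 columns of full tiles)
print(boxes[-1])   # (512, 256, 768, 512)
```

The boxes use the (left, upper, right, lower) ordering, so each one could be passed to an image library's crop routine to extract the tile itself.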

Expert labels of plaques for Dataset 1 and 2 tiles are included in corresponding CSV files.
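
A minimal sketch of how tile-level labels in a CSV file might be summarized is given below. The layout of the actual CSV files is not described here, so the column names (`tile`, `label_a`, `label_b`) and the in-memory sample data are hypothetical placeholders; substitute the real header names after inspecting the files.

```python
import csv
import io


def count_positive_labels(csv_file, id_column):
    """Count, per label column, how many rows carry a positive (> 0) value."""
    reader = csv.DictReader(csv_file)
    counts = {}
    for row in reader:
        for name, value in row.items():
            if name == id_column:
                continue  # the identifier column is not a label
            try:
                positive = float(value) > 0
            except (TypeError, ValueError):
                continue  # skip empty or non-numeric cells
            counts[name] = counts.get(name, 0) + int(positive)
    return counts


# Hypothetical sample; column names and values are placeholders, not real data.
sample = io.StringIO(
    "tile,label_a,label_b\n"
    "t1.jpg,1,0\n"
    "t2.jpg,0,0\n"
    "t3.jpg,1,1\n"
)
label_counts = count_positive_labels(sample, id_column="tile")
print(label_counts)  # {'label_a': 2, 'label_b': 1}
```

For a file on disk, the same function can be called on an open file handle, for example one obtained with `open(path, newline="")`.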

Slide source and preparation: All samples were retrieved from the archives of the University of California, Davis Alzheimer’s Disease Center Brain Bank. Archival samples analyzed in this study were 5 μm formalin-fixed, paraffin-embedded sections of the superior and middle temporal gyrus from human brain. The sections were first pretreated with formic acid to rid samples of endogenous protein and had then been stained with an amyloid-β antibody (4G8, recognizing residues 17-24; BioLegend, formerly Covance). All slides were digitized on an Aperio AT2 at up to 40x magnification.

Code: Please visit


This study was funded by an NIH P30 AG010129 grant (BND, CD, LWJ, and LB), a Paul G. Allen Family Foundation Distinguished Investigator Award (MJK), and the China Scholarship Council (ZT). These agencies had no role in any aspect of the study, including study design, data collection, analysis, or writing.

Files (110.0 GB)

Name                     Size
Dataset 1a               35.3 GB
Dataset 1b               3.9 GB
Dataset 2                41.2 GB
Dataset 3 CERAD-like     26.3 GB
(name not listed)        3.3 GB