Published July 22, 2018 | Version v0.1.1
Dataset Open

Partitioned Image Data for Machine Learning Analysis of Molecular Biology Figures

  • 1. USC Information Sciences Institute

Description

Corpus Composition

This data collection provides four types of hand-curated images taken from open-access research articles. The types are:

  1. chart (n=811): data displays such as bar charts, scatterplots, line graphs, etc.
  2. diagram (n=816): any general conceptual diagram
  3. gel (n=1182): the output of electrophoresis in Northern, Western, or Southern blot experiments
  4. histology (n=3458): microscope images of tissue with histological staining

The images are organized into subdirectories as individual files. File names are based on the PubMed ID and figure number.
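
As a rough illustration of how this layout might be consumed, the sketch below indexes the extracted archive and counts the images per type. It assumes one subdirectory per image type, named after the four types listed above, and a hypothetical set of image file extensions; neither detail is confirmed by this record, so adjust both to match the extracted files.

```python
from collections import Counter
from pathlib import Path

# Assumed class names, taken from the list of image types; the actual
# subdirectory names in the archive may differ.
CLASSES = ("chart", "diagram", "gel", "histology")
# Assumed image extensions; the archive's real formats are not stated.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".tif", ".tiff"}


def index_images(root):
    """Return a list of (path, label) pairs, one per image file.

    The label is the name of the class subdirectory under root.
    """
    root = Path(root)
    samples = []
    for label in CLASSES:
        class_dir = root / label
        if not class_dir.is_dir():
            continue  # tolerate a missing or differently named class folder
        for path in sorted(class_dir.rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, label))
    return samples


def class_counts(samples):
    """Count indexed images per label."""
    return Counter(label for _, label in samples)
```

Calling `class_counts(index_images("images"))`, where `images` is a hypothetical extraction directory, gives per-type totals that can be compared with the counts listed above (811, 816, 1182, and 3458) as a quick sanity check.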

Notes

This work was funded by the National Library of Medicine under grant R01 LM012592-01 ('EVIDENCE EXTRACTION SYSTEMS FOR THE MOLECULAR INTERACTION LITERATURE').

Files

Files (99.7 MB)

md5:74d7fc09bf65c91ea305ce606bda343e (99.7 MB)
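
The published MD5 checksum can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard `hashlib` module; the archive's file name is not given in this record, so the path passed in is whatever name the download was saved under.

```python
import hashlib

# Checksum as published in the file listing for this record.
EXPECTED_MD5 = "74d7fc09bf65c91ea305ce606bda343e"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path):
    """Return True if the file's MD5 equals the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

Reading in chunks keeps memory use flat regardless of archive size. MD5 is suitable here only for detecting accidental corruption, not for verifying authenticity.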