There is a newer version of the record available.

Published July 22, 2018 | Version v0.1.0
Dataset | Open access

Partitioned Image Data for Machine Learning Analysis of Molecular Biology Figures

  • 1. USC Information Sciences Institute

Description

Corpus Composition

This data collection provides hand-curated images from open-access research articles, grouped into four types:

  1. chart (n=811): data displays such as bar charts, scatterplots, line graphs, etc.
  2. diagram (n=816): general conceptual diagrams of any kind
  3. gel (n=811): the output of electrophoresis experiments such as Northern, Western, or Southern blots
  4. histology (n=816): microscope images of tissue with histological staining
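The four types are close to balanced: together they contain 3,254 images, and each type makes up roughly a quarter of the corpus. A small Python check of that arithmetic, using only the counts listed above (the names CLASS_COUNTS and class_balance are illustrative, not part of the dataset):

```python
from __future__ import annotations

# Per-type image counts as listed in the corpus description.
CLASS_COUNTS = {
    "chart": 811,
    "diagram": 816,
    "gel": 811,
    "histology": 816,
}


def class_balance(counts: dict[str, int]) -> dict[str, float]:
    """Return each type's share of the whole corpus (shares sum to 1)."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    print(f"total images: {sum(CLASS_COUNTS.values())}")  # total images: 3254
    for label, share in class_balance(CLASS_COUNTS).items():
        print(f"{label:>10}: {share:.1%}")
```

Because no type falls far from 25%, plain accuracy is a reasonable first metric for a classifier trained on this corpus; per-class scores remain worth reporting.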

The images are stored as individual files organized in subdirectories. File names are derived from the PubMed ID and figure number of the source figure.
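Since the layout is just files in subdirectories, a labeled index for machine learning can be built by walking the tree. The sketch below assumes (this is not stated in the record) that the unpacked data has one subdirectory per image type directly under a root folder and that each subdirectory's name is the label; the root path "figures" and the helper names are hypothetical. It does not parse file names, since their exact PubMed ID and figure-number format is not documented here.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path


def index_images(root: str | Path) -> list[tuple[Path, str]]:
    """Return (path, label) pairs, using each file's subdirectory name as its label.

    Assumes one level of type subdirectories directly under `root`;
    hidden entries such as .DS_Store are skipped.
    """
    root = Path(root)
    samples: list[tuple[Path, str]] = []
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir() and not p.name.startswith("."))
    for class_dir in class_dirs:
        for f in sorted(class_dir.iterdir()):
            if f.is_file() and not f.name.startswith("."):
                samples.append((f, class_dir.name))
    return samples


def count_by_label(samples: list[tuple[Path, str]]) -> Counter:
    """Count how many files fall under each label."""
    return Counter(label for _, label in samples)


if __name__ == "__main__":
    root = Path("figures")  # hypothetical location of the unpacked dataset
    if root.is_dir():
        for label, n in sorted(count_by_label(index_images(root)).items()):
            print(f"{label}: {n}")
```

If the assumed layout holds, the per-label counts printed here should match the n values given for each type.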

Notes

This work was funded by the National Library of Medicine under grant R01 LM012592-01 ('EVIDENCE EXTRACTION SYSTEMS FOR THE MOLECULAR INTERACTION LITERATURE').

Files

Files (99.7 MB)

Checksum: md5:b9c2e8abd1110f201baac0ea997b9fd7 (99.7 MB)
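The published MD5 checksum can be used to confirm that a download is complete and uncorrupted. A minimal Python sketch using the standard-library hashlib module; the local file name "dataset.zip" is hypothetical, because the record listing does not show the file's name.

```python
import hashlib
from pathlib import Path

# Checksum published on the record page.
EXPECTED_MD5 = "b9c2e8abd1110f201baac0ea997b9fd7"


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected (case-insensitive) value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("dataset.zip")  # hypothetical file name; not shown on the record
    if archive.is_file():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size, which matters little at 99.7 MB but costs nothing.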