There is a newer version of the record available.

Published July 12, 2024 | Version v3
Dataset Open

A machine learning framework for extracting information from biological pathway images in the literature

  • 1. Korea Advanced Institute of Science and Technology

Description

Training and validation datasets_arrow detection.zip:

Training and validation datasets for arrow detection using the Faster R-CNN model. A total of 6,471 images were prepared: 2,332 images collected from five different sources and 4,139 augmented images.

Test dataset_arrow detection.zip:

Test dataset for arrow detection using the Faster R-CNN model. A total of 100 images were collected from 89 papers found by searching PubMed Central (PMC).

EBPI outputs.txt:

Reaction information extracted by EBPI from 49,846 biological pathway images covering 466 target chemicals.

Supplementary Data 1:

Bounding box labels for the 6,471 images in the training and validation datasets and the 100 images in the test dataset.

Supplementary Data 2:

Dataset for text classification using BioBERT. A total of 59,370 terms were prepared by combining data from MetaCyc with the PaddleOCR results from the papers: 15,101 “gene” terms, 21,417 “protein” terms, and 22,852 “others” terms.
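The three class counts add up to the stated total of 59,370, and the classes are moderately imbalanced (about 25.4% to 38.5% of terms each). The short Python sketch below checks the total and computes per-class shares. The inverse-frequency weighting is one common way to compensate for such imbalance when fine-tuning a classifier; it is an assumption added for illustration, not a description of how BioBERT was trained on this dataset.

```python
# Per-class term counts as stated for Supplementary Data 2.
COUNTS = {"gene": 15_101, "protein": 21_417, "others": 22_852}


def class_shares(counts):
    """Return the fraction of all terms that belong to each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def balanced_weights(counts):
    """Inverse-frequency weights N / (K * n_c), as in scikit-learn's class_weight="balanced".

    This weighting is an illustrative assumption, not part of the dataset.
    """
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


if __name__ == "__main__":
    print("total terms:", sum(COUNTS.values()))  # 59370, matching the stated total
    weights = balanced_weights(COUNTS)
    for label, share in class_shares(COUNTS).items():
        print(f"{label}: {share:.1%} of terms, weight {weights[label]:.3f}")
```

With these weights the rarest class ("gene") receives the largest weight, so each class contributes equally in aggregate to a weighted loss.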

Supplementary Data 3:

Collection and processing of biological pathway images for the 466 target chemicals from the bio-based chemicals map.

Supplementary Data 4:

Target chemicals that satisfy the criteria for biochemical reactions not covered by MetaNetX and KEGG.

Files

EBPI outputs.txt.txt

Files (956.2 MB):

  • 200.7 MB, md5:5071b63b36e3cfcdd6b178d1ce679565
  • 2.6 MB, md5:352040510c95ce384b7812ea381a3e0a
  • 1.3 MB, md5:354718de3a03fa57aaa9f10cd96df011
  • 128.8 kB, md5:34fa6aec3a6542cd177a67c18da8529d
  • 15.0 kB, md5:550e209de82f90f8fed150cc37f2dc73
  • 11.3 MB, md5:5979ae013794eaef483aebc92904d938
  • 740.1 MB, md5:a85d2a5057de84a8d7630f0eda87c9f6
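Each file on the record is listed with an MD5 checksum, so a download can be checked for corruption or truncation before use. A minimal sketch using only the Python standard library follows; the script name `verify_md5.py` and its command-line form are hypothetical and not part of the dataset.

```python
import hashlib
import sys
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks (1 MiB by default) to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Check a file against a checksum written as "md5:<hex>" or as a bare hex digest."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python verify_md5.py <downloaded file> md5:<checksum from the list>
    file_path, checksum = Path(sys.argv[1]), sys.argv[2]
    print("OK" if verify(file_path, checksum) else "MISMATCH")
```

Because the file listing pairs each checksum with a size rather than a file name, the file size is the practical way to match each download to its checksum.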

Additional details

Software