There is a newer version of the record available.

Published May 19, 2022 | Version 1.0
Dataset | Open Access

SCI-3000: A Novel Dataset for the Task of Figure, Table and Caption Extraction from Scientific PDFs

  • 1. TU Vienna

Contributors

  • 1. TU Vienna

Description

This dataset contains bounding-box annotations of figures, tables, and captions on 34,791 pages drawn from 3,000 open-access scientific publications in medicine, chemistry, physics, computer science, and technology. The underlying publications are also included as PDF files.

For more details, refer to the README file.

Files (6.8 GB)

SCI-3000-full.zip
  Size: 6.8 GB
  md5:e2abc448cc6c529eed243324bc184cb5
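Because the archive is large, it is worth checking a download against the listed MD5 checksum before unpacking it. The following is a minimal Python sketch. It assumes the archive was saved under its original name, SCI-3000-full.zip, in the working directory, or that you pass its path as the first argument. The helper names `md5_of_file` and `verify` are illustrative and are not part of the dataset.

```python
import hashlib
import sys
from pathlib import Path

# Checksum listed on the record page for SCI-3000-full.zip.
EXPECTED_MD5 = "e2abc448cc6c529eed243324bc184cb5"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file.

    The file is read in chunks (1 MiB by default), so the 6.8 GB archive
    never has to fit in memory.
    """
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """True if the file's MD5 matches `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: default location is the working directory, original file name.
    archive = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("SCI-3000-full.zip")
    if archive.is_file():
        print("checksum OK" if verify(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

A mismatch usually means the download was truncated or corrupted and should be repeated. On most systems, `md5sum SCI-3000-full.zip` gives the same digest without any code.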