A Comparison of Approaches for Automated Text Extraction from Scholarly Figures

Böschen, Falk; Scherp, Ansgar

doi:10.1007/978-3-319-51811-4_2

Published December 31, 2016 | Version v1

Conference paper | Open Access


1. Kiel University
2. Kiel University and Leibniz Information Centre for Economics (ZBW)

So far, there has not been a comparative evaluation of different approaches for text extraction from scholarly figures. In order to fill this gap, we have defined a generic pipeline for text extraction that abstracts from the existing approaches as documented in the literature. In this paper, we use this generic pipeline to systematically evaluate and compare 32 configurations for text extraction over four datasets of scholarly figures of different origin and characteristics. In total, our experiments have been run over more than 400 manually labeled figures. The experimental results show that the approach BS-4OS results in the best F-measure of 0.67 for the Text Location Detection and the best average Levenshtein Distance of 4.71 between the recognized text and the gold standard on all four datasets using the Ocropy OCR engine.
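
The abstract reports two metrics: an F-measure for Text Location Detection and an average Levenshtein distance between the recognized text and the gold standard. The sketch below shows what these numbers measure. It makes several assumptions. The function names are hypothetical. The F-measure is taken as the usual harmonic mean of precision and recall (F1). The averaging runs over recognized/gold string pairs that have already been matched. This record does not describe the paper's own rules for matching detected regions and recognized strings to the gold standard.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 if both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_levenshtein(recognized: Sequence[str], gold: Sequence[str]) -> float:
    """Mean edit distance over already-paired recognized/gold strings.
    Hypothetical helper: assumes the pairing was done beforehand."""
    if len(recognized) != len(gold) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(levenshtein(r, g) for r, g in zip(recognized, gold))
    return total / len(gold)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(round(f_measure(0.5, 1.0), 3))     # 0.667
    print(average_levenshtein(["Figure 1", "x-axis"], ["Figure 1", "x axis"]))  # 0.5
```

Lower is better for the average distance. In the usage lines, the OCR output "x-axis" differs from the label "x axis" by one substitution, so the two pairs average to 0.5. By the same measure, a reported average of 4.71 means recognized strings differ from the gold standard by roughly five character edits per compared item.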

Files

2017-MMM-BoeschenScherp-TX.pdf (268.8 kB, md5:491fe99e667082902c708de885759075)

Funding

European Commission: MOVING - Training towards a society of data-savvy information professionals to enable open leadership innovation (grant 693092)


Resource type: Conference paper

Publisher: Zenodo

Conference: 23rd International Conference on Multimedia Modeling (MMM2017), Reykjavik, Iceland, 04-06 January 2017 (Session 2A)

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows redistribution and reuse of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: March 2, 2017
Modified: August 3, 2024
