Published June 6, 2022 | Version v1
Thesis Open

From PDF to structured references: A comparative study on tools for bibliographic references extraction and parsing

  • 1. University of Bologna

Contributors

  • 1. University of Bologna

Description

The aim of this work is to identify all, and only, the tools that, given a full-text paper in PDF format, are able to identify, extract and parse bibliographic references. The methods on which the tools are based do not influence their selection. The first phase of this thesis is a literature review, which identifies seven tools: Anystyle, Cermine, ExCite, GROBID, Pdfssa4met, Scholarcy and Science Parse. In the second phase, these tools are compared and evaluated across different research fields. Anystyle obtains the best overall score, followed by Cermine. However, in some of the subtasks investigated alongside the overall results, other tools perform better. Thus, in this varied scenario, different solutions can be adopted on the basis of the user’s requirements.

Files

CioffiAlessiaMasterThesis.pdf (1.7 MB)
md5:d0bba48e71adecf2098a83fc5fcd9f70