From PDF to structured references: A comparative study on tools for bibliographic references extraction and parsing
Description
The aim of this work is to identify all, and only, the tools that, given a full-text paper in PDF format, are able to identify, extract and parse bibliographic references. The methods on which the tools are based do not influence their selection. The first phase of this thesis is a literature review, from which seven tools are identified: Anystyle, Cermine, ExCite, GROBID, Pdfssa4met, Scholarcy and Science Parse. In the second phase, these tools are compared and evaluated across different research fields. Anystyle obtains the best overall score, followed by Cermine. However, in some of the subtasks investigated alongside the overall results, other tools perform better. Thus, in this varied scenario, different solutions can be adopted depending on the user's requirements.
Files
Name | Size | Checksum |
---|---|---|
CioffiAlessiaMasterThesis.pdf | 1.7 MB | md5:d0bba48e71adecf2098a83fc5fcd9f70 |