Published May 4, 2023 | Version v1
Thesis Open

Optimizing a Natural Language Processing pipeline for the automatic creation of RDF data

Creators

  • 1. Giet

Description

The study of coins plays an important role in illuminating our history, since coins can indicate the time period in which they were used. If we also know where a coin was made and where it was found, we can learn about the mobility of people during that time. Developing tools for researchers is therefore important for advancing the field, which is the aim of the research project D4N4 (Data quality for Numismatics based on Natural language processing and Neural Networks). This thesis deals with the optimization of a Natural Language Processing pipeline that is used to create RDF data. The RDF data is generated from a database that belongs to the D4N4 research project and holds information on about 50,000 coins. Currently, executing the pipeline involves manual steps and relies on the D2RQ program to create the RDF data. To optimize the pipeline, this thesis introduces a revised version that removes the manual steps of the previous version and, through a new Python script built on the libraries RDFLib and mysql.connector, removes the dependence on D2RQ for creating RDF data. The overall pipeline is run from a Jupyter notebook and only requires the user to specify the coins by id and start the script. As a result, the new version offers a faster and more convenient way to create RDF data from the database.

Files

Thesis.pdf (1.5 MB)
md5:6022e7cb29da7eeddfa16254ed71dda5