Published February 28, 2025 | Version v1
Report | Open access

CLS INFRA D8.3 Report on Applied NLP Named Entity Recognition

  • 1. Ghent University
  • 2. Royal Library of Belgium
  • 3. Charles University, Faculty of Arts

Description

The following report documents the work of Work Package 8 in the CLS Infrastructure Project. The general goals of this work package are to make NLP tools easier to access and apply, including for less-well-resourced languages, and to standardize them. The report is organized as follows: a general explanation of named entity recognition (NER) tasks, their technical boundaries, the challenges they pose for literary scholars (and/or those working with unstructured texts), and the tools proposed for these tasks. These tools range from a machine learning pipeline for automatically extracting pre-defined mentions of known objects, such as people, places or organizations, to generative AI solutions, and they cover multiple languages, making them applicable for a wide range of scholars. They integrate work from both WP 6 and WP 7, connecting the pipeline from data preparation through programmable corpora to analysis and back.
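To make the task of "extracting pre-defined mentions of known objects" concrete, the sketch below shows its simplest possible form: a dictionary (gazetteer) lookup that returns character spans with an entity label. This is not the machine learning pipeline or the generative AI approach described in the report; the gazetteer entries, the PER/LOC/ORG label set and the function name `tag_entities` are hypothetical illustrations chosen for this example.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer mapping surface forms to entity labels (illustration only,
# not taken from the report).
GAZETTEER: Dict[str, str] = {
    "Ghent": "LOC",
    "Royal Library of Belgium": "ORG",
    "Charles University": "ORG",
    "Emma Bovary": "PER",
}


def tag_entities(text: str, gazetteer: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Return (start, end, surface, label) for every gazetteer match in text.

    Hypothetical helper: a plain lexicon lookup, not a statistical NER model.
    """
    # Try longer names first so "Royal Library of Belgium" wins over any shorter
    # name that could match at the same position.
    names = sorted(gazetteer, key=len, reverse=True)
    # \b boundaries stop a name from matching inside a longer word.
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [
        (m.start(), m.end(), m.group(0), gazetteer[m.group(0)])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "Emma Bovary never visited the Royal Library of Belgium in Ghent."
    for start, end, surface, label in tag_entities(sentence, GAZETTEER):
        print(start, end, surface, label)
```

A lookup like this only finds names already listed and cannot resolve ambiguity (the same string may denote a person in one text and a place in another), which is the kind of limitation that motivates trained NER models and the other approaches covered in the report.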

This research was conducted within the framework of the Computational Literary Studies Infrastructure project (CLS INFRA, https://clsinfra.io/), funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101004984, which aims to build a shared and sustainable infrastructure for literary studies with digital tools.

Files

D8.3 Report on NLP for NER.pdf (1.4 MB)
md5:97a95df2febc5607e20610ab2c0b3c62

Additional details

Funding

European Commission
CLS INFRA - Computational Literary Studies Infrastructure 101004984