Published April 12, 2021 | Version v1
Conference paper | Open

Impact Analysis of Document Digitization on Event Extraction

  • 1. University of La Rochelle, L3i, F-17000, La Rochelle, France
  • 2. Sorbonne University, F-75006 Paris, France

Description

This paper tackles the task of epidemiological event extraction applied to digitized documents. Event extraction is an information extraction task that focuses on identifying event mentions in textual data. In the context of event-based health surveillance from digitized documents, several key issues remain challenging despite considerable effort. First, image documents are indexed through their digitized version and may therefore contain numerous errors, e.g., misspellings. Second, it is important to cover international news, which implies the inclusion of multilingual data. To clarify these aspects of how to extract epidemic-related events, it remains necessary to maximize the use of digitized data. In this paper, we investigate how working with digitized multilingual documents at different levels of synthetic noise affects the performance of an event extraction system. To our knowledge, this type of analysis has not been addressed in previous research.

Files

paper28.pdf (957.6 kB)
md5:bcd10e9691f937a55270fdeb390436e1

Additional details

Funding

European Commission: EMBEDDIA – Cross-Lingual Embeddings for Less-Represented Languages in European News Media (grant no. 825153)