Published March 10, 2022 | Version v1.0
Project deliverable Open

D3.7 Technical report on information extraction from heterogeneous data using TDM

  • 1. CLARIN ERIC
  • 2. ATHENA
  • 3. DARIAH/UGOE
  • 4. CUNI

Description

This deliverable summarises the activities of Task 3.3, led by CLARIN ERIC with the main partners CLARIN/Athena, CLARIN/CUNI, and DARIAH/UGOE, in additional collaboration with SciencesPo. Over the course of M1-M38, the partners developed three demonstration scenarios that highlight the value of NLP (Natural Language Processing) technologies for the SSH field, and investigated which outcomes of T3.3 can be shared via the SSH Open Marketplace, and in which form.

The three scenarios are: (1) application of TDM (Text Data Mining) to large bodies of multilingual texts, with the use case of processing Collective Bargaining Agreements (CBAs); (2) integration of linguistic analysis for information extraction into SSH tasks, with the use case of detecting verbal aggression in social media; and (3) TDM handling of heterogeneous data, with the use case of processing intertextuality phenomena in European drama history.

All use cases work with data in multiple languages, and the pipelines created take this multilinguality into account.

The outcomes of the demonstrations are stored so that they remain accessible after the end of the project: as online code notebooks and a workflow on the SSH Open Marketplace (use case 1), as a service in the EOSC portal (use case 2), and as a set of Python scripts on publicly accessible GitHub pages together with a workflow on the SSH Open Marketplace (use case 3).

Notes

Approved by EC - 27 April 2022

Files

D3.7 Technical report on information extraction from heterogeneous data using TDM (Approved 27 Apr 2022).pdf

Additional details

Funding

European Commission
SSHOC - Social Sciences & Humanities Open Cloud 823782