Published November 2, 2021 | Version 2
Conference paper Open

Extracting Relations from Italian Wikipedia using Self-Training

  • 1. Dipartimento di Informatica, Università degli Studi di Bari "Aldo Moro"

Description

This dataset contains relations extracted from the Italian Wikipedia by the WikiOIE framework.
WikiOIE is based on UDPipe and the Universal Dependencies project for text processing.
The framework makes it easy to customize the information extraction (IE) approach used to automatically extract (subject, predicate, object) triples.
The relations in this dataset were extracted by a supervised approach based on self-training.
The output of the extraction process is provided in JSON format.
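
As a rough illustration of how the (subject, predicate, object) triples might be read from the JSON files, the sketch below parses one JSON object per line into a small tuple type. The one-object-per-line layout and the field names "subject", "predicate" and "object" are assumptions made for this example, not the documented schema; check the WikiOIE repository for the actual structure and pass the real field names through the `keys` parameter.

```python
import json
from typing import Iterable, Iterator, NamedTuple, Tuple


class Triple(NamedTuple):
    """A (subject, predicate, object) relation."""
    subject: str
    predicate: str
    object: str


def parse_triples(
    lines: Iterable[str],
    keys: Tuple[str, str, str] = ("subject", "predicate", "object"),
) -> Iterator[Triple]:
    """Yield one Triple per non-empty line, each line holding one JSON object.

    `keys` names the JSON fields for the three parts. The defaults are
    hypothetical placeholders, not the dataset's documented field names.
    """
    subject_key, predicate_key, object_key = keys
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield Triple(record[subject_key], record[predicate_key], record[object_key])


# Made-up record used only to show the call; it is not taken from the dataset.
sample_lines = ['{"subject": "Roma", "predicate": "si trova in", "object": "Italia"}']
triples = list(parse_triples(sample_lines))
```

Because the files are large, iterating over an open file handle line by line (rather than loading a whole file into memory) is likely the more practical way to call `parse_triples`.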

Version 2 of the dataset was extracted using an improved version of the learning algorithm. The files of version 2 are identified by the suffix "_reg" in the file name.
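
Since the version 2 files are marked by "_reg" in the file name, they can be separated from the other files with a simple name check. The following is a minimal sketch; the file names in the usage line are invented for illustration, and the check assumes that "_reg" does not also occur in the names of the earlier files.

```python
from pathlib import PurePath
from typing import Iterable, List


def is_version2(file_name: str) -> bool:
    """Return True if the file name (ignoring any directory part) contains "_reg"."""
    return "_reg" in PurePath(file_name).name


def select_version2(file_names: Iterable[str]) -> List[str]:
    """Keep only the names that look like version 2 files, preserving order."""
    return [name for name in file_names if is_version2(name)]


# Hypothetical file names, used only to demonstrate the filter.
selected = select_version2(["part_01.json", "part_01_reg.json"])
```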

More information and the Java code are available here: https://github.com/pippokill/WikiOIE

Self-training approach:

Lucia Siciliani, Pierluigi Cassotti, Pierpaolo Basile, Marco de Gemmis, Pasquale Lops, and Giovanni Semeraro. 2021. Extracting Relations from Italian Wikipedia using Self-Training. In Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021). CEUR-WS.

WikiOIE framework:

Pierluigi Cassotti, Lucia Siciliani, Pierpaolo Basile, Marco de Gemmis, and Pasquale Lops. 2021. Extracting relations from Italian Wikipedia using unsupervised information extraction. In Proceedings of the 11th Italian Information Retrieval Workshop 2021 (IIR 2021). CEUR-WS.

Files (2.1 GB)

MD5 checksum  Size
md5:988f604cf7ea9fa9b976314d479dae0b  212.5 MB
md5:7a3cb0beaa42ec1b2e53c928abe4e09f  212.5 MB
md5:7eec61a214b2824c965a0d8c9d832dfe  212.4 MB
md5:12a8fe1760a69ef13e8a2f6433f962be  209.4 MB
md5:e1cf83106e1ceed163c78b5036e0fd36  209.4 MB
md5:d1671320341e263748ce7463e6302bdc  209.3 MB
md5:8c27a2ce563444e3ccbe366bb0315038  209.4 MB
md5:34a2fbf1e8762ceac5904e68267cb166  209.4 MB
md5:48e64dbc5119bfef2fb19061481f6866  209.4 MB
md5:5355252cce1f7c977f11c509a6c11e16  209.3 MB
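
A downloaded file can be checked against the MD5 values listed above. The sketch below uses Python's standard `hashlib` module and reads the data in chunks, so a file of roughly 200 MB does not need to fit in memory at once. The helper names are illustrative, and the path passed to `md5_of_file` would be wherever the file was saved locally.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def matches_listing(actual_hex: str, listed: str) -> bool:
    """Compare a computed digest with a listed value such as "md5:988f...".

    The optional "md5:" prefix is dropped and the comparison ignores case.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return actual_hex.lower() == expected


# Self-check on in-memory data: the digest of empty input.
empty_digest = md5_of_stream(io.BytesIO(b""))
```

For a real download, `matches_listing(md5_of_file(local_path), "md5:988f604cf7ea9fa9b976314d479dae0b")` should return `True` only for the file that corresponds to that checksum.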