
Published November 2, 2021 | Version v1
Conference paper | Open Access

Extracting Relations from Italian Wikipedia using Self-Training

Affiliation: Dipartimento di Informatica, Università degli Studi di Bari "Aldo Moro"

Description

This dataset contains relations extracted from the Italian Wikipedia by the WikiOIE framework.
WikiOIE relies on UDPipe and the Universal Dependencies project for text processing.
It makes it easy to customize the information extraction (IE) approach used to automatically extract (subject, predicate, object) triples.
The relations in this dataset were extracted with a supervised approach based on self-training.
The output of the extraction process is provided in JSON format.
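The record does not document the JSON schema, so the following is only a minimal sketch of how (subject, predicate, object) triples might be read. It assumes, hypothetically, one JSON object per line (JSON Lines) with keys named "subject", "predicate" and "object"; the actual field names and layout should be checked against the WikiOIE repository before use.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str  # named "obj" to avoid shadowing the built-in "object"


def read_triples(lines: Iterable[str]) -> Iterator[Triple]:
    """Yield triples from JSON Lines input.

    ASSUMPTION: one JSON object per line with the hypothetical keys
    "subject", "predicate" and "object"; the real WikiOIE output
    format may differ.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield Triple(record["subject"], record["predicate"], record["object"])
```

For example, `list(read_triples(['{"subject": "Roma", "predicate": "è la capitale di", "object": "Italia"}']))` returns a single `Triple` with those three fields. Because it is a generator, the same function can be applied to an open file handle without loading a whole 212 MB file into memory.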

More information and the Java code are available here: https://github.com/pippokill/WikiOIE

Self-training approach:

Lucia Siciliani, Pierluigi Cassotti, Pierpaolo Basile, Marco de Gemmis, Pasquale Lops, and Giovanni Semeraro. 2021. Extracting Relations from Italian Wikipedia using Self-Training. In Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021). CEUR-WS.

WikiOIE framework:

Pierluigi Cassotti, Lucia Siciliani, Pierpaolo Basile, Marco de Gemmis, and Pasquale Lops. 2021. Extracting relations from Italian Wikipedia using unsupervised information extraction. In Proceedings of the 11th Italian Information Retrieval Workshop 2021 (IIR 2021). CEUR-WS.

Files

Seven files, 1.5 GB in total:

MD5 checksum                            Size
md5:48498cfa276c7cd29aec31873d848bf7    212.5 MB
md5:10f7345c813d3f066c13528d3fdb3c5f    212.5 MB
md5:a86dcc01483c28f2c1688a02333f3833    212.4 MB
md5:15881eb1556eea9b0084a7cf064ea50a    212.5 MB
md5:988f604cf7ea9fa9b976314d479dae0b    212.5 MB
md5:7a3cb0beaa42ec1b2e53c928abe4e09f    212.5 MB
md5:7eec61a214b2824c965a0d8c9d832dfe    212.4 MB
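Each file is listed with an MD5 checksum, which can be used to check that a download is complete and uncorrupted. The sketch below uses Python's standard `hashlib` module and reads the file in 1 MiB chunks so that large files are not loaded into memory at once. The listing gives only checksums and sizes, so any file path passed to these functions is a hypothetical placeholder.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare a file's digest with a checksum as listed, e.g. "md5:4849...".

    The optional "md5:" prefix used in the listing is stripped first.
    """
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected.lower()
```

Usage, with a hypothetical local file name: `verify(Path("downloaded_file.json"), "md5:48498cfa276c7cd29aec31873d848bf7")` returns `True` only if the local copy matches the first entry in the list.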