Extracting Relations from Italian Wikipedia using Self-Training
Creators
- 1. Dipartimento di Informatica, Università degli Studi di Bari "Aldo Moro"
Description
This dataset contains relations extracted from the Italian Wikipedia by the WikiOIE framework.
WikiOIE is based on UDPipe and the Universal Dependencies project for text processing.
It makes it easy to customize the information extraction (IE) approach used to automatically extract (subject, predicate, object) triples.
This dataset contains relations extracted by a supervised approach based on self-training.
The output of the extraction process is provided in JSON format.
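As an illustration of what a JSON-encoded triple might look like, the following is a minimal sketch in Python. The field names `subject`, `predicate`, and `object`, the `Triple` class, and both helper functions are hypothetical and are not taken from the dataset; the actual schema is documented in the WikiOIE repository and may differ.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Triple:
    """A (subject, predicate, object) relation. Field names are assumptions."""
    subject: str
    predicate: str
    object: str


def triple_to_json(triple: Triple) -> str:
    # ensure_ascii=False keeps accented Italian characters readable.
    return json.dumps(asdict(triple), ensure_ascii=False)


def triple_from_json(text: str) -> Triple:
    # Assumes the record carries exactly the three hypothetical keys.
    record = json.loads(text)
    return Triple(record["subject"], record["predicate"], record["object"])


if __name__ == "__main__":
    example = Triple("Bari", "è il capoluogo della", "Puglia")
    encoded = triple_to_json(example)
    print(encoded)
    print(triple_from_json(encoded) == example)
```

If the real records use different keys or nest extra metadata, only `triple_from_json` would need to change.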
More information and the Java code are available here: https://github.com/pippokill/WikiOIE
Self-training approach:
Lucia Siciliani, Pierluigi Cassotti, Pierpaolo Basile, Marco de Gemmis, Pasquale Lops, and Giovanni Semeraro. 2021. Extracting Relations from Italian Wikipedia using Self-Training. In Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021). CEUR-WS.
WikiOIE framework:
Pierluigi Cassotti, Lucia Siciliani, Pierpaolo Basile, Marco de Gemmis, and Pasquale Lops. 2021. Extracting relations from Italian Wikipedia using unsupervised information extraction. In Proceedings of the 11th Italian Information Retrieval Workshop 2021 (IIR 2021). CEUR-WS.
Files
(1.5 GB)
File (MD5 checksum) | Size |
---|---|
md5:48498cfa276c7cd29aec31873d848bf7 | 212.5 MB |
md5:10f7345c813d3f066c13528d3fdb3c5f | 212.5 MB |
md5:a86dcc01483c28f2c1688a02333f3833 | 212.4 MB |
md5:15881eb1556eea9b0084a7cf064ea50a | 212.5 MB |
md5:988f604cf7ea9fa9b976314d479dae0b | 212.5 MB |
md5:7a3cb0beaa42ec1b2e53c928abe4e09f | 212.5 MB |
md5:7eec61a214b2824c965a0d8c9d832dfe | 212.4 MB |
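
The MD5 checksums listed above can be used to check that each file was downloaded intact. The following is a minimal sketch in Python using only the standard library; the function names `md5_of_file` and `verify_md5` and the file name in the usage example are hypothetical, since the record does not show the file names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:4849...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `verify_md5("part1.json", "md5:48498cfa276c7cd29aec31873d848bf7")` would return `True` only if the local file (here given the placeholder name `part1.json`) matches the first checksum in the list. Reading in chunks keeps memory use low for files of roughly 200 MB.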