Published September 9, 2021 | Version 1.00
Dataset Open

Relations from Italian Wikipedia using Unsupervised Information Extraction

  • 1. Università degli Studi di Bari Aldo Moro

Description

This dataset contains relations extracted from the Italian Wikipedia by the WikiOIE framework.
WikiOIE is based on UDPipe and the Universal Dependencies project for text processing.
It makes it easy to customize the information extraction (IE) approach used to automatically extract triples (subject, predicate, object).
This dataset contains relations extracted by two unsupervised IE methods. The former (simple) is based only on PoS-tag patterns; the latter (simpledep) also uses syntactic dependencies. 
The extracted relations are provided in JSON format.
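
To illustrate what extraction based only on PoS-tag patterns can look like, here is a minimal sketch in Python. It is not the WikiOIE implementation (which is written in Java) and does not reproduce its actual patterns: the pattern used here (nominal run, verbal run, nominal run), the function names, and the hand-tagged example sentence are all assumptions made for illustration. Tags follow the Universal Dependencies UPOS inventory.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]          # (word form, UPOS tag)
Triple = Tuple[str, str, str]    # (subject, predicate, object)

NOMINAL = {"NOUN", "PROPN"}      # tags allowed inside a subject/object
VERBAL = {"VERB", "AUX"}         # tags allowed inside a predicate
SKIPPABLE = {"DET", "ADJ"}       # tags skipped before a nominal run


def nominal_span(tokens: List[Token], start: int) -> Optional[Tuple[str, int]]:
    """Skip determiners/adjectives, then read a run of nominal tokens.

    Returns (text, index after the run), or None if no nominal run is found.
    """
    i = start
    while i < len(tokens) and tokens[i][1] in SKIPPABLE:
        i += 1
    j = i
    while j < len(tokens) and tokens[j][1] in NOMINAL:
        j += 1
    if j == i:
        return None
    return " ".join(form for form, _ in tokens[i:j]), j


def extract_triples(tokens: List[Token]) -> List[Triple]:
    """Toy pattern: nominal run + verbal run + nominal run."""
    triples: List[Triple] = []
    i = 0
    while i < len(tokens):
        subj = nominal_span(tokens, i)
        if subj is None:
            i += 1
            continue
        subj_text, k = subj
        v = k
        while v < len(tokens) and tokens[v][1] in VERBAL:
            v += 1
        if v > k:  # at least one verbal token follows the subject
            obj = nominal_span(tokens, v)
            if obj is not None:
                pred = " ".join(form for form, _ in tokens[k:v])
                triples.append((subj_text, pred, obj[0]))
        i = k  # continue scanning after the subject run
    return triples


# Hand-tagged example sentence (the tags are assumed, not UDPipe output).
sentence = [
    ("Dante", "PROPN"), ("scrisse", "VERB"), ("la", "DET"),
    ("Divina", "PROPN"), ("Commedia", "PROPN"),
]
print(extract_triples(sentence))
```

A method that also uses syntactic dependencies, as simpledep does, would additionally need the dependency relations between tokens; this sketch ignores them entirely.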

More information and the Java code are available at https://github.com/pippokill/WikiOIE

Pierluigi Cassotti, Lucia Siciliani, Pierpaolo Basile, Marco de Gemmis, and Pasquale Lops. 2021. Extracting relations from Italian Wikipedia using unsupervised information extraction. In Proceedings of the 11th Italian Information Retrieval Workshop 2021 (IIR 2021). CEUR-WS.

Files

Files (9.8 GB)

MD5 checksum                            Size
md5:db9e3539b563e6ac8fc75ddb43207aba    224.7 MB
md5:36abed9f9f6d917dc6d2c7f6a4e50a8a    224.7 MB
md5:4d04e2e19c5454d82e540b139b3fb637    224.6 MB
md5:aab4b397f0d491528a791dae43d4f645    224.7 MB
md5:f840d5882a8ef839cbef4257fbe9b32e    224.7 MB
md5:b9200c6ee008ed20802d4d1164592ebd    224.7 MB
md5:c0aa9938d36f5b51fc2d96237e3d6171    224.6 MB
md5:9c60e3f549c3a95fc0658dbe0813525b    212.7 MB
md5:0e640e61f52d6ed12d0431870b1ebca8    212.7 MB
md5:81d7b7ae91ba2bbb9281ff424fb58b3e    212.6 MB
md5:ec1909ae320c30862be276ef082b81d8    212.7 MB
md5:ebd2c010e39bd8ef6386d832ff9f166b    212.6 MB
md5:1aec0b31185adc42c6178b621cec9475    212.7 MB
md5:38658f11f4f3a07788b63a60e5270876    212.6 MB
md5:8b62ed24c461723b26a4bc285168d8aa    956.4 MB
md5:2b42d1744b6f305ab9e98d6e7e651c88    956.0 MB
md5:c2a92fa45ab5dbccc814a306cf0310a3    956.2 MB
md5:ac901f5f206e20832b395ed0f7117f83    956.4 MB
md5:bc3e3da873639eb70f90ca6a1339e27f    956.2 MB
md5:282cd25012b31d6016fc238d406d95f0    956.5 MB
md5:593d4863684a086a4162591bbb695be8    955.8 MB
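
The MD5 checksums listed for the files can be used to check that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the file path in the usage comment are hypothetical, since the file names are not shown in this listing.

```python
import hashlib


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that multi-hundred-megabyte files are not loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of(path) == expected_hex


# Usage (hypothetical local file name):
# ok = matches_checksum("downloaded_part.json", "md5:db9e3539b563e6ac8fc75ddb43207aba")
```

MD5 is adequate here for detecting accidental corruption during transfer; it is not a safeguard against deliberate tampering.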