Published October 30, 2024 | Version v1
Dataset Open

BLM-OdI (Blackbird Language Matrices Object Drop verb alternations in Italian)

  • 1. Idiap Research Institute
  • 2. University of Geneva

Description

BLM-OdI is an Object-Drop (OD) alternation dataset for testing the lexical semantic properties of verbs, specifically whether or not they can enter a causative alternation. In OD, the subject bears the same semantic role (Agent) in both the transitive and the intransitive form (L’artista dipingeva la finestra / L’artista dipingeva, ‘the artist painted the window’ / ‘the artist painted’), and the verb does not have a causative meaning.

Blackbird Language Matrices (BLMs) are multiple-choice problems in which the input is a sequence of sentences built using specific generative rules, and the answer set consists of one correct answer that continues the input sequence and several incorrect contrastive options. The contrastive options are built by violating the generative rules underlying the sentences. In a BLM matrix, all sentences share the targeted linguistic phenomenon (in this case verb alternations), but differ in other aspects relevant to that phenomenon.
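The multiple-choice structure described above can be sketched in Python as follows. This is only an illustration: the class name, the field names, and the accuracy helper are assumptions and do not reflect the actual file format of the dataset, and the strings are placeholders rather than sentences taken from the data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BLMInstance:
    """Hypothetical container for one BLM problem (field names are assumptions)."""
    context: List[str]      # input sequence of sentences built by the generative rules
    candidates: List[str]   # answer set: one correct continuation plus contrastive options
    correct_index: int      # position of the correct answer within `candidates`

    def is_correct(self, choice: int) -> bool:
        """Return True if the chosen candidate is the correct continuation."""
        return choice == self.correct_index


def accuracy(instances: List[BLMInstance], predictions: List[int]) -> float:
    """Fraction of instances for which the predicted candidate index is correct."""
    if len(instances) != len(predictions):
        raise ValueError("need exactly one prediction per instance")
    if not instances:
        return 0.0
    hits = sum(inst.is_correct(p) for inst, p in zip(instances, predictions))
    return hits / len(instances)


# Placeholder strings stand in for real sentences; they are not taken from the dataset.
toy = BLMInstance(
    context=["context sentence 1", "context sentence 2", "context sentence 3"],
    candidates=["correct continuation", "contrastive option A", "contrastive option B"],
    correct_index=0,
)
```

A model under evaluation would return one candidate index per instance, and `accuracy` would then score those choices against `correct_index`.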

BLM datasets also have a lexical variation dimension, to explore the impact of lexical variation on detecting relevant structures: type I – minimal lexical variation for sentences within an instance, type II – one word difference across the sentences within an instance, type III – maximal lexical variation within an instance. 

The data is grouped by lexical variation (i.e. type I/II/III), and each subset is split into train/test. Each subset contains 2140 training and 240 testing instances.
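As a quick check on the sizes, the sketch below tallies the instances across the three lexical-variation types. It assumes that the stated counts (2140 training, 240 testing) apply to each type separately; the constant and function names are illustrative only.

```python
# Counts as stated in the description, assumed to hold for each type separately.
LEXICAL_TYPES = ("I", "II", "III")
TRAIN_PER_TYPE = 2140
TEST_PER_TYPE = 240


def dataset_totals() -> dict:
    """Aggregate train/test counts over all lexical-variation types."""
    n_types = len(LEXICAL_TYPES)
    train_total = n_types * TRAIN_PER_TYPE
    test_total = n_types * TEST_PER_TYPE
    return {
        "train": train_total,
        "test": test_total,
        "all": train_total + test_total,
        # Share of each subset that is held out for testing.
        "test_fraction": TEST_PER_TYPE / (TRAIN_PER_TYPE + TEST_PER_TYPE),
    }


totals = dataset_totals()
```

Under that assumption the three subsets together hold 6420 training and 720 testing instances, so roughly one instance in ten is reserved for testing.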

 

Reference

If you use this dataset, please cite the following publication:

Nastase, Vivi & Samo, Giuseppe & Jiang, Chunyang & Merlo, Paola. (2024). Exploring Italian sentence embeddings properties through multi-tasking. DOI: 10.48550/arXiv.2409.06622.

Files

Files (4.6 MB)

md5:d47935b059e16e6dfe4877453fffed0f (4.6 MB)

Additional details

Related works

Is described by
Conference paper: 10.48550/arXiv.2409.06622 (DOI)

Funding

Disentangling linguistic intelligence: automatic generalisation of structure and meaning across languages TMAG-1_209426
Swiss National Science Foundation