Published October 30, 2024 | Version v1
Dataset Open

BLM-OdI-Gen (Blackbird Language Matrices Object-Drop Alternation in Italian)

  • 1. Idiap Research Institute
  • 2. University of Geneva

Description

BLM-OdI-Gen is an Italian dataset for learning the underlying rules of the object-drop alternation, developed in the Blackbird Language Matrices (BLM) framework. It is a subset of the training data of https://www.idiap.ch/dataset/BLM-OdI. In this task, an instance consists of a sequence of sentences with specific attributes. To predict the correct answer as the next element of the sequence, a model must detect the underlying generative rules used to produce the dataset. Each instance represents the object-drop alternation, where the subject bears the same semantic role (Agent) in both the transitive and intransitive forms and the verb does not have a causative meaning.

Blackbird Language Matrices (BLMs) are multiple-choice problems. The input is a sequence of sentences built using specific generating rules. The answer set consists of one correct answer that continues the input sequence and several incorrect contrastive options, built by violating the underlying generating rules of the sentences. In a BLM matrix, all sentences share the targeted linguistic phenomenon (in this case, the object-drop alternation) but differ in other aspects relevant to that phenomenon.
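The multiple-choice structure described above can be sketched as a small data structure. This is only an illustrative sketch: the class name `BLMInstance`, its field names, and the `accuracy` helper are assumptions made for this example and do not reflect the actual file format or field names of the dataset. The sentences are placeholders, not real data.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class BLMInstance:
    """One multiple-choice BLM problem (hypothetical field names)."""

    context: List[str]     # input sequence of sentences
    candidates: List[str]  # answer set: one correct answer plus contrastive options
    correct_index: int     # position of the correct answer within `candidates`

    def __post_init__(self) -> None:
        # Reject instances whose gold label does not point at a candidate.
        if not 0 <= self.correct_index < len(self.candidates):
            raise ValueError("correct_index must point into candidates")

    def is_correct(self, predicted_index: int) -> bool:
        """Return True if the predicted candidate is the correct answer."""
        return predicted_index == self.correct_index


def accuracy(instances: Sequence[BLMInstance], predictions: Sequence[int]) -> float:
    """Fraction of instances for which the predicted index is correct."""
    if len(instances) != len(predictions):
        raise ValueError("need exactly one prediction per instance")
    if not instances:
        return 0.0
    hits = sum(inst.is_correct(pred) for inst, pred in zip(instances, predictions))
    return hits / len(instances)


# Placeholder content only; real instances contain Italian sentences.
example = BLMInstance(
    context=["<sentence 1>", "<sentence 2>", "<sentence 3>"],
    candidates=["<option A>", "<option B>", "<option C>"],
    correct_index=1,
)
print(accuracy([example], [1]))
```

Storing the correct answer as an index into the answer set keeps the contrastive options and the gold label in one place, so a model's choice can be scored with a single comparison.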

BLM-OdI-Gen is one of the six sub-tasks of the BLM-It challenge. All sub-tasks are instances of the general BLM task, but they differ along two dimensions: the linguistic problem defined (Agr, Caus, Od) and the lexical complexity of the data (II, III).

The data is grouped by lexical variation (type II or type III), and each subset is split into train and test sets. The current version of the dataset has the following sizes (train:test):

type II     80:2080
type III    80:2080

Reference

If you use this dataset, please cite the following publication:

Jiang, Chunyang & Samo, Giuseppe & Nastase, Vivi & Merlo, Paola. (2024). BLM-It — Blackbird Language Matrices for Italian: A CALAMITA Challenge. (TO APPEAR)

Files

Files (1.4 MB)

md5:e6df03e5cacaaa8a78a69ac49626c7c1 (1.4 MB)
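
The MD5 checksum listed for the file can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_expected` are made up for this example, and the path passed to them is whatever local name the downloaded file was saved under (the record does not show a file name here).

```python
import hashlib

# Checksum as listed on this record.
EXPECTED_MD5 = "e6df03e5cacaaa8a78a69ac49626c7c1"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_expected("<path to downloaded file>")` returns `True` only when the local copy has the listed checksum. MD5 is suitable here as an integrity check against corrupted downloads, not as a security guarantee.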

Additional details

Funding

Disentangling linguistic intelligence: automatic generalisation of structure and meaning across languages TMAG-1_209426
Swiss National Science Foundation