BLM-CausH (Blackbird Language Matrices Causative and Passive Alternation in Hebrew)

Samo, Giuseppe; Merlo, Paola

doi:10.34777/00jx-2739

Published May 21, 2026 | Version v1

Dataset | Open access

Affiliations

1. Idiap Research Institute
2. University of Geneva

Description

BLM-CausH is a Modern Hebrew dataset for learning the causative and passive alternations, developed in the Blackbird Language Matrices (BLM) framework. In this task, each instance consists of a sequence of sentences with specific attributes. To predict the correct answer as the next element of the sequence, a model must detect the underlying generative rules used to produce the dataset.
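The task format can be pictured as a context sequence plus a set of candidate continuations, one of which is correct. The Python sketch below is illustrative only: `BLMInstance`, its fields, and the `score` callback are hypothetical names, not the dataset's file format or the authors' model. It assumes a model is reduced to a function that rates how well a candidate continues the context.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical in-memory view of one BLM item; NOT the dataset's on-disk format.
@dataclass
class BLMInstance:
    context: List[str]      # sentences forming the sequence
    candidates: List[str]   # candidate answers for the next element
    answer_index: int       # index of the correct candidate

# A scorer rates how well a candidate continues the context (higher is better).
Scorer = Callable[[List[str], str], float]

def predict(instance: BLMInstance, score: Scorer) -> int:
    """Return the index of the candidate that `score` rates highest as the continuation."""
    scores = [score(instance.context, cand) for cand in instance.candidates]
    return max(range(len(scores)), key=lambda i: scores[i])

def accuracy(instances: Sequence[BLMInstance], score: Scorer) -> float:
    """Fraction of instances where the predicted candidate is the correct one."""
    if not instances:
        return 0.0
    hits = sum(predict(inst, score) == inst.answer_index for inst in instances)
    return hits / len(instances)
```

Any real model would replace the scorer; the point is only that the answer is chosen among candidates by how well each fits the pattern in the context.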

The data used to instantiate the dataset are natural sentences extracted from two Universal Dependencies treebanks of Hebrew: one of news (HTB v2.15; Tsarfaty 2013; McDonald et al. 2013; 114,648 tokens; 6,143 trees) and one of encyclopaedic entries (IAHLTWiki v2.15, henceforth IW; Zeldes et al. 2022; 103,395 tokens; 5,039 trees). We collected sentences whose main verb is annotated with the relevant morphosyntactic feature, HEBBINYAN, which encodes the verb's binyan.
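The selection step can be approximated with a plain CoNLL-U reader. This Python sketch assumes that the "main verb" is the token whose DEPREL is `root` and whose UPOS is `VERB`, and that the binyan is read from the FEATS column (keys are upper-cased, so `HebBinyan` and `HEBBINYAN` both match); the authors' actual extraction criteria may differ, and all function names are hypothetical. Multiword-token range lines, frequent in Hebrew UD because of clitic prefixes, and empty nodes are skipped.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Token = List[str]  # the 10 CoNLL-U columns of one token line

def parse_feats(feats: str) -> Dict[str, str]:
    """Parse a FEATS value like 'HebBinyan=PAAL|Number=Sing'; keys are upper-cased."""
    if feats == "_":
        return {}
    return {k.upper(): v for k, v in (item.split("=", 1) for item in feats.split("|"))}

def read_conllu(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield sentences as lists of token rows, skipping comments, ranges and empty nodes."""
    sent: List[Token] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sent:
                yield sent
            sent = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Skip multiword-token ranges (e.g. "3-4") and empty nodes (e.g. "5.1").
            if "-" not in cols[0] and "." not in cols[0]:
                sent.append(cols)
    if sent:
        yield sent

def main_verb_binyan(sent: List[Token]) -> Optional[str]:
    """Binyan of the root token if it is a VERB carrying the feature, else None (assumed criterion)."""
    for cols in sent:
        if cols[7] == "root" and cols[3] == "VERB":
            return parse_feats(cols[5]).get("HEBBINYAN")
    return None

def collect(lines: Iterable[str]) -> List[Tuple[List[Token], str]]:
    """Keep only sentences whose main verb has a binyan value, paired with that value."""
    out = []
    for sent in read_conllu(lines):
        binyan = main_verb_binyan(sent)
        if binyan is not None:
            out.append((sent, binyan))
    return out
```

`collect` accepts any iterable of lines, for instance an open file handle on a treebank's `.conllu` file.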

The data are grouped by target voice (paal, nifal, hifil, hufal). Each voice has two subsets, SENT (full sentences) and VERB (verb only), and each subset is split into training and test sets. The current version of the dataset has the following train:test sizes:

Subset	Train	Test
paal-SENT	1800	200
paal-VERB	1800	200
nifal-SENT	1800	200
nifal-VERB	1800	200

hifil-SENT	1800	200
hifil-VERB	1800	200
hufal-SENT	1800	200
hufal-VERB	1800	200
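Summing the listed sizes gives the overall scale of the release: eight subsets of 2,000 items each, split 90:10 between training and test. The snippet below only reproduces that arithmetic from the table; the `SPLITS` dictionary is an illustrative transcription, not a file shipped with the dataset.

```python
# Train:test sizes transcribed from the table, keyed by subset name.
SPLITS = {
    "paal-SENT": (1800, 200), "paal-VERB": (1800, 200),
    "nifal-SENT": (1800, 200), "nifal-VERB": (1800, 200),
    "hifil-SENT": (1800, 200), "hifil-VERB": (1800, 200),
    "hufal-SENT": (1800, 200), "hufal-VERB": (1800, 200),
}

def totals(splits):
    """Sum train and test sizes over all subsets; return (train, test, train fraction)."""
    train = sum(tr for tr, _ in splits.values())
    test = sum(te for _, te in splits.values())
    return train, test, train / (train + test)

print(totals(SPLITS))  # (14400, 1600, 0.9)
```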

Reference

If you use this dataset, please cite the following publication:

Giuseppe Samo and Paola Merlo. Modelling the Morphology of Verbal Paradigms: A Case Study in the Tokenization of Turkish and Hebrew. Paper accepted at the SIGTURK 2026 Workshop.

Files

Files (4.2 MB)

Name	Size	MD5 checksum
BLM-CausH.tar.gz	4.2 MB	cf9ccfec435784963a4e589d14f27de3
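Before use, the downloaded archive can be checked against the listed MD5 checksum. This is a minimal sketch using only the Python standard library (`hashlib`, `tarfile`); the function names are illustrative, and since the archive's internal layout is not described here, the sketch only lists member names rather than assuming any file structure.

```python
import hashlib
import tarfile

EXPECTED_MD5 = "cf9ccfec435784963a4e589d14f27de3"  # checksum listed for BLM-CausH.tar.gz

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_and_list(path, expected=EXPECTED_MD5):
    """Raise ValueError on a checksum mismatch; otherwise return the archive's member names."""
    actual = md5sum(path)
    if actual != expected:
        raise ValueError(f"MD5 mismatch: expected {expected}, got {actual}")
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()
```

For example, `verify_and_list("BLM-CausH.tar.gz")` returns the member names if the download is intact and raises `ValueError` otherwise.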

Additional details

Funding: Swiss National Science Foundation, project "Disentangling linguistic intelligence: automatic generalisation of structure and meaning across languages" (grant no. 209426)

	All versions	This version
Views	55	55
Downloads	30	30
Data volume	126.0 MB	126.0 MB
