Datasets for "ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships"

Rodriguez Portela, Johan David; Manrique Piramanrique, Rubén Francisco; Perez Terán, Nicolás

doi:10.5281/zenodo.15002371

Published March 10, 2025 | Version v1

Dataset Open

ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships

These are the datasets for the paper ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships.

Dataset dictionary

This repository contains the splits that resulted from the research project "ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships". All the splits are in JSONL format and have the same fields per example:

sentence_1: First sentence of the pair.
sentence_2: Second sentence of the pair.
connector: Linking phrase used to extract the pair.
connector_type: NLI label, one of "contrasting", "entailment", "reasoning" or "neutral".
extraction_strategy: "linking_phrase" for pairs labeled "contrasting", "entailment" or "reasoning"; "none" for "neutral" pairs.
distance: Number of sentences by which sentence_1 precedes the connector.
sentence_1_position: Sentence number of sentence_1 in the source document.
sentence_1_paragraph: Paragraph number of sentence_1 in the source document.
sentence_2_position: Sentence number of sentence_2 in the source document.
sentence_2_paragraph: Paragraph number of sentence_2 in the source document.
id: Unique identifier for the example
dataset: Source corpus of the pair. Corpus metadata, including the source, can be found in dataset_metadata.xlsx.
genre: Writing genre of the source corpus.
domain: Domain of the source corpus.
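
The relationship between connector_type and extraction_strategy described above can be checked per record. The following is a minimal sketch, assuming that the neutral strategy is stored as the literal string "none" (as the dictionary suggests); the helper names expected_strategy and is_consistent are hypothetical and not part of the dataset tooling.

```python
# Labels that, per the dataset dictionary, are extracted through a linking phrase.
LINKING_PHRASE_LABELS = {"contrasting", "entailment", "reasoning"}


def expected_strategy(connector_type: str) -> str:
    """Return the extraction_strategy the dictionary implies for a label."""
    if connector_type in LINKING_PHRASE_LABELS:
        return "linking_phrase"
    if connector_type == "neutral":
        # Assumption: the value is the string "none", not a JSON null.
        return "none"
    raise ValueError(f"unknown connector_type: {connector_type!r}")


def is_consistent(record: dict) -> bool:
    """Check that a record's extraction_strategy matches its connector_type."""
    return record["extraction_strategy"] == expected_strategy(record["connector_type"])
```

A record such as {"connector_type": "reasoning", "extraction_strategy": "linking_phrase"} passes this check, while one that pairs "neutral" with "linking_phrase" does not.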

Example:

{"sentence_1":"sefior Bcajavides no es moderado, tampoco lo convertirse e\u00f1 declarada divergencia de miras polileido en griego","sentence_2":"era mayor claricomentarios, as\u00ed de los peri\u00f3dicos como de los homes dado \u00e1 la voluntad de los hombres, sin que sobreticas","connector":"por consiguiente,","connector_type":"reasoning","extraction_strategy":"linking_phrase","distance":1.0,"sentence_1_paragraph":4,"sentence_1_position":86,"sentence_2_paragraph":4,"sentence_2_position":87,"id":"esnews__spanish_pd_news__531537","dataset":"esnews__spanish_pd_news","genre":"news","domain":"spanish_public_domain_news"}
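
Because every split is JSONL, each line can be parsed independently as one JSON object. The sketch below reads a split lazily and checks that each record carries the fields listed in the dataset dictionary. The function names parse_example and iter_examples are hypothetical, and the path passed to iter_examples is whatever split file you have extracted locally.

```python
import json
from typing import Iterator

# Field names taken from the dataset dictionary.
EXPECTED_FIELDS = {
    "sentence_1", "sentence_2", "connector", "connector_type",
    "extraction_strategy", "distance",
    "sentence_1_position", "sentence_1_paragraph",
    "sentence_2_position", "sentence_2_paragraph",
    "id", "dataset", "genre", "domain",
}


def parse_example(line: str) -> dict:
    """Parse one JSONL line and fail loudly if a documented field is missing."""
    record = json.loads(line)
    missing = EXPECTED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record


def iter_examples(path: str) -> Iterator[dict]:
    """Yield parsed records from a JSONL file, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_example(line)
```

Reading lazily keeps memory use flat, which may matter given the size of the zipped splits.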

Dataset files

ESNLIR_datasets.zip: Contains the splits used for BERT-based model training, validation and testing, including stress test splits.
labeled_final_dataset.jsonl: The final test dataset, with 974 examples selected because the human majority label matches the original linking-phrase label.
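
As a quick sanity check on a downloaded file, the label distribution can be tallied from the connector_type field. This is a minimal sketch: label_distribution is a hypothetical helper, and it assumes labeled_final_dataset.jsonl uses the same fields as the other splits, as the dataset dictionary states.

```python
import json
from collections import Counter


def label_distribution(path: str) -> Counter:
    """Count examples per connector_type in a JSONL file."""
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # tolerate blank lines
                counts[json.loads(line)["connector_type"]] += 1
    return counts
```

For labeled_final_dataset.jsonl, the sum of the returned counts should equal the 974 examples stated above.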

Files

Files (651.2 MB)

Name	MD5	Size
dataset_metadata.xlsx	ef9498315a118cbc1ddf3b856b45e198	23.4 kB
ESNLIR_datasets.zip	133f30b8abef3974fdeafbff48c0e6e0	650.4 MB
labeled_final_dataset.jsonl	d179e75d0dda23ebd68b905df5072c8c	781.3 kB
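
The MD5 checksums listed for the files can be used to confirm that a download is intact. The following sketch uses Python's standard hashlib. The function names md5_of_file and verify_download are hypothetical, and the directory argument is assumed to be wherever the three files were saved under their original names.

```python
import hashlib
from pathlib import Path

# Checksums as published in the file listing.
EXPECTED_MD5 = {
    "dataset_metadata.xlsx": "ef9498315a118cbc1ddf3b856b45e198",
    "ESNLIR_datasets.zip": "133f30b8abef3974fdeafbff48c0e6e0",
    "labeled_final_dataset.jsonl": "d179e75d0dda23ebd68b905df5072c8c",
}


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so the 650 MB archive is never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(directory) -> dict:
    """Map each expected file name to True if present with a matching checksum."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.exists() and md5_of_file(path) == expected
    return results
```

A False entry in the result means the file is either missing from the directory or its content differs from the published checksum.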

