Datasets for "ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships"
Description
These are the datasets for the paper "ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships".
Dataset dictionary
This repository contains the splits produced by the research project "ESNLIR: A Spanish Multi-Genre Dataset with Causal Relationships". All splits are in JSONL format and share the same fields per example:
- sentence_1: First sentence of the pair.
- sentence_2: Second sentence of the pair.
- connector: Linking phrase used to extract the pair.
- connector_type: NLI label, one of "contrasting", "entailment", "reasoning" or "neutral".
- extraction_strategy: "linking_phrase" for "contrasting", "entailment" and "reasoning" pairs, and "none" for "neutral" pairs.
- distance: How many sentences before the connector sentence_1 appears.
- sentence_1_position: Sentence number of sentence_1 in the source document.
- sentence_1_paragraph: Paragraph number of sentence_1 in the source document.
- sentence_2_position: Sentence number of sentence_2 in the source document.
- sentence_2_paragraph: Paragraph number of sentence_2 in the source document.
- id: Unique identifier for the example
- dataset: Source corpus of the pair. Corpus metadata, including the original source, can be found in dataset_metadata.xlsx.
- genre: Writing genre of the source corpus.
- domain: Domain of the source corpus.
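A minimal Python sketch for reading a split and checking each record against the field list above. The helper names `load_jsonl` and `check_record` are hypothetical, not part of the released dataset; the consistency rule only restates the `extraction_strategy` description ("none" for neutral pairs, "linking_phrase" otherwise).

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterator

# Field names as listed in the dataset dictionary.
FIELDS = {
    "sentence_1", "sentence_2", "connector", "connector_type",
    "extraction_strategy", "distance",
    "sentence_1_position", "sentence_1_paragraph",
    "sentence_2_position", "sentence_2_paragraph",
    "id", "dataset", "genre", "domain",
}
LABELS = {"contrasting", "entailment", "reasoning", "neutral"}


def load_jsonl(path: str | Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def check_record(record: dict) -> list[str]:
    """Return the problems found in one record; an empty list means it looks valid."""
    problems = []
    missing = FIELDS - set(record)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    label = record.get("connector_type")
    if label not in LABELS:
        problems.append(f"unexpected label: {label!r}")
    else:
        # Neutral pairs are not extracted with a linking phrase.
        expected = "none" if label == "neutral" else "linking_phrase"
        if record.get("extraction_strategy") != expected:
            problems.append(f"extraction_strategy should be {expected!r} for {label!r}")
    return problems
```

For example, `[r["id"] for r in load_jsonl(path) if check_record(r)]` lists the ids of records that fail the check in a split file extracted from ESNLIR_datasets.zip (the member names inside the archive are not listed on this page).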
Example:
{"sentence_1":"sefior Bcajavides no es moderado, tampoco lo convertirse e\u00f1 declarada divergencia de miras polileido en griego","sentence_2":"era mayor claricomentarios, as\u00ed de los peri\u00f3dicos como de los homes dado \u00e1 la voluntad de los hombres, sin que sobreticas","connector":"por consiguiente,","connector_type":"reasoning","extraction_strategy":"linking_phrase","distance":1.0,"sentence_1_paragraph":4,"sentence_1_position":86,"sentence_2_paragraph":4,"sentence_2_position":87,"id":"esnews__spanish_pd_news__531537","dataset":"esnews__spanish_pd_news","genre":"news","domain":"spanish_public_domain_news"}
Dataset files
- ESNLIR_datasets.zip: Contains the splits used for BERT-based model training, validation and testing, including stress test splits.
- labeled_final_dataset.jsonl: The final test dataset, with 974 examples for which the human majority label matches the original linking-phrase label.
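The splits can be inspected without extracting the archive to disk. The sketch below is an illustration, not the authors' tooling: it assumes only that the splits are stored as `.jsonl` members of ESNLIR_datasets.zip, makes no assumption about member names or folder layout, and `label_counts` is a hypothetical helper that tallies `connector_type` per file.

```python
from __future__ import annotations

import io
import json
import zipfile
from collections import Counter


def label_counts(zip_source) -> dict[str, Counter]:
    """Count connector_type labels in every .jsonl member of a zip archive.

    zip_source may be a path or a binary file-like object.
    """
    counts: dict[str, Counter] = {}
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.endswith(".jsonl"):
                continue  # skip non-JSONL members
            counter: Counter = Counter()
            # Stream the member line by line instead of loading it whole.
            with zf.open(name) as raw:
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    if line.strip():
                        counter[json.loads(line)["connector_type"]] += 1
            counts[name] = counter
    return counts
```

Calling `label_counts("ESNLIR_datasets.zip")` would give a per-split label distribution; labeled_final_dataset.jsonl is a plain JSONL file and can be read line by line with `json.loads` in the same way.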
Files
Total size: 651.2 MB
Size | MD5 checksum |
---|---|
23.4 kB | ef9498315a118cbc1ddf3b856b45e198 |
650.4 MB | 133f30b8abef3974fdeafbff48c0e6e0 |
781.3 kB | d179e75d0dda23ebd68b905df5072c8c |
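After downloading, each file can be checked against the published MD5 checksums. A minimal sketch using Python's standard `hashlib`; the helper names `md5sum` and `is_known_file` are hypothetical. Because the listing does not print file names next to the checksums, this check only confirms that a file matches one of the published values; pairing a file with its specific checksum has to go by file size.

```python
import hashlib


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums copied from the file listing.
EXPECTED = {
    "ef9498315a118cbc1ddf3b856b45e198",
    "133f30b8abef3974fdeafbff48c0e6e0",
    "d179e75d0dda23ebd68b905df5072c8c",
}


def is_known_file(path) -> bool:
    """True if the file's MD5 matches one of the published checksums."""
    return md5sum(path) in EXPECTED
```

Reading in chunks keeps memory use flat, which matters for the 650.4 MB archive.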