ExpliCA Dataset
Description
The ExpliCA dataset comprises 100 causal natural language explanations (NLEs), each meticulously paired with a set of causal triples. This dataset was developed to advance research in explainable artificial intelligence, with a focus on understanding and modeling causal relationships in text.
The dataset has been structured to enable comprehensive analysis of both original, human-curated explanations and AI-generated explanations, allowing researchers to make direct comparisons between the two. This setup supports a deeper investigation into how causal reasoning is represented in AI-generated content versus human explanations.
Original Explanations and Causal Triples
Each of the 100 curated explanations is linked with a corresponding set of causal triples that capture the key components of the causal relationship (a minimal representation is sketched after the list):
- T1: The subject or initiator of the causal relationship.
- T2: The causal verb or predicate describing the cause-effect connection.
- T3: The object or effect, representing the outcome of the causal relationship.
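To illustrate how a triple-annotated explanation might be represented in code, the minimal Python sketch below uses hypothetical class and field names (CausalTriple, ExplicaRecord); it is not the dataset's actual schema, only one assumption about how the T1/T2/T3 components could be organised.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CausalTriple:
    """One causal triple: T1 (subject), T2 (causal predicate), T3 (effect)."""
    t1_subject: str     # initiator of the causal relationship
    t2_predicate: str   # causal verb or predicate
    t3_effect: str      # outcome of the causal relationship

@dataclass
class ExplicaRecord:
    """A curated explanation paired with its causal triples (hypothetical layout)."""
    explanation: str
    triples: List[CausalTriple]

# Hypothetical example, not drawn from the dataset itself:
record = ExplicaRecord(
    explanation="Heavy rainfall caused the river to flood.",
    triples=[CausalTriple("heavy rainfall", "caused", "river flooding")],
)
print(record.triples[0].t2_predicate)  # -> "caused"
```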
Generated Explanations
In addition to the original explanations, the dataset includes two types of generated explanations (a pairing sketch follows the list):
- Explanations Generated from Triples: Explanations generated directly from the causal triples to assess the potential of automated explanation generation.
- Explanations Generated from Triples with Reference: Explanations generated from triples that also reference the original explanations, providing additional context and coherence.
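A minimal sketch, assuming hypothetical field names, of how an original explanation might be stored alongside its two generated counterparts for side-by-side comparison; the actual column names in the released files may differ.

```python
from dataclasses import dataclass

@dataclass
class ExplanationSet:
    """Original explanation plus its generated variants (hypothetical layout)."""
    original: str                   # human-curated explanation
    generated_from_triples: str     # generated from the causal triples alone
    generated_with_reference: str   # generated from triples plus the original as reference

def variants(item: ExplanationSet) -> dict:
    """Return all explanation variants keyed by condition, for easy comparison."""
    return {
        "original": item.original,
        "from_triples": item.generated_from_triples,
        "from_triples_with_reference": item.generated_with_reference,
    }
```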
Human, Automated, and LLM Evaluation Using the REFLEX Framework
To evaluate the quality and reliability of the explanations, human, automated, and LLM-based evaluations were conducted.
- Human Evaluation of Original Explanations: Human evaluators assessed the original explanations to establish baseline quality metrics.
- Human Evaluation of Generated Explanations: Human evaluators reviewed the generated explanations (both from triples alone and with reference to original explanations) for clarity, accuracy, and consistency with causal relationships.
The REFLEX framework, as presented in the PhD thesis of Miruna Clinciu, was applied to evaluate both original and generated explanations.
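The snippet below is not the REFLEX framework itself; it is only a generic automated similarity check, built on Python's standard-library difflib.SequenceMatcher, shown to illustrate the kind of automated comparison between generated and original explanations such an evaluation pipeline might include.

```python
from difflib import SequenceMatcher

def surface_similarity(original: str, generated: str) -> float:
    """Character-level similarity in [0, 1]; a crude automated proxy,
    not a substitute for human or REFLEX-based evaluation."""
    return SequenceMatcher(None, original.lower(), generated.lower()).ratio()

# Hypothetical example values, not drawn from the dataset:
original = "Heavy rainfall caused the river to flood."
generated = "The river flooded because of heavy rainfall."
print(f"surface similarity: {surface_similarity(original, generated):.2f}")
```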
Files
Name | Size | MD5
---|---|---
ExpliCA Dataset.zip | 9.7 MB | md5:3dc702032c391f8158e58e199cc4cbec
Additional details
Dates
- Available: 2024-11-04 (Release after Thesis Submission)
Software
- Repository URL: https://github.com/MirunaClinciu/ExpliCA
- Programming language: Python