Published January 27, 2025 | Version 1.0.0
Dataset | Open Access

Webis Crowd RAG Corpus 2025

  • 1. Leipzig University
  • 2. Center for Scalable Data Analytics and Artificial Intelligence
  • 3. University of Kassel
  • 4. Hessian Center for Artificial Intelligence
  • 5. Friedrich Schiller University Jena
  • 6. Bauhaus-Universität Weimar

Description

Data Documentation


responses.jsonl.gz

RAG responses of about 250 words each, written by human writers and LLMs in different response styles (essay, news, bullet).

Key | Description | Source | Values
--- | --- | --- | ---
response | The UUID of the response / task. | Task Specification | UUID
topic | The topic ID of this task. | Task Specification | String ID
style | The text style of the response written for this task. | Task Specification | One of essay, news, bullet
kind | Whether this text was written by an LLM or a human. | Task Specification | One of human, llm
query | The query text of this topic. | TREC RAG | String value
references_ids | The IDs of the 20 sources retrieved for this topic's query; aligned with references_texts. | TREC RAG | List of String IDs
references_texts | The texts of the 20 sources retrieved for this topic's query; aligned with references_ids. | TREC RAG | List of String values
text | The text as written by the human author or LLM. | Writing Survey | String
cleaned_text | The text as cleaned by our preprocessing pipeline, without reference markers. | Writing Survey | String
statements | The text parsed into individual statements, each with the corresponding references_ids cited. | Writing Survey | List of Dictionaries


ratings.jsonl.gz

Ratings on pairwise response utility as given by crowd workers. The columns prefixed with {dimension} below are included once for each of the seven dimensions (correctness_topical, coherence_logical, coherence_stylistic, coverage_broad, coverage_deep, consistency_internal, quality_overall).

Key | Description | Source | Value
--- | --- | --- | ---
submission_id | The UUID of the questionnaire in which this response pair was rated. | Task Specification | UUID
query_id | The topic ID this response pair belongs to. | TREC RAG | String ID
response_a | The UUID of the first response in this pair (displayed on the left-hand side). | Task Specification | UUID
response_b | The UUID of the second response in this pair (displayed on the right-hand side). | Task Specification | UUID
worker | The UUIDs of the 5 workers who completed this questionnaire. | Task Specification | List of UUIDs
{dimension}_vote | The individual votes for the specified dimension by the 5 workers. | Prolific Crowd Workers | List of Strings, each entry one of a, n, b
{dimension}_spam_probability | The individual spam probabilities associated with each vote for the specified dimension. | Prolific Crowd Workers | List of Floats, each between 0 and 1
{dimension}_p_a | The probability of the gold label being a for the specified dimension (first response better than second). | Prolific Crowd Workers | Float
{dimension}_p_n | The probability of the gold label being n for the specified dimension (both responses equal). | Prolific Crowd Workers | Float
{dimension}_p_b | The probability of the gold label being b for the specified dimension (second response better than first). | Prolific Crowd Workers | Float
{dimension}_gold | The gold label with the highest probability for the specified dimension. | Prolific Crowd Workers | One of a, n, b
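
As a usage sketch (not part of the corpus tooling), the {dimension}-prefixed columns can be expanded programmatically; the snippet below tallies the gold-label distribution per dimension, again assuming the file is in the working directory:

```python
import gzip
import json
from collections import Counter

DIMENSIONS = [
    "correctness_topical", "coherence_logical", "coherence_stylistic",
    "coverage_broad", "coverage_deep", "consistency_internal",
    "quality_overall",
]

# Count how often each gold label (a, n, b) occurs per dimension.
gold_counts = {dim: Counter() for dim in DIMENSIONS}
with gzip.open("ratings.jsonl.gz", "rt", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        for dim in DIMENSIONS:
            gold_counts[dim][record[f"{dim}_gold"]] += 1

for dim in DIMENSIONS:
    print(dim, dict(gold_counts[dim]))
```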


llm_ratings.jsonl.gz


Ratings on pairwise response utility as given by an LLM. The columns prefixed with {dimension} below are included once for each of the seven dimensions (correctness_topical, coherence_logical, coherence_stylistic, coverage_broad, coverage_deep, consistency_internal, quality_overall).

Key | Description | Source | Value
--- | --- | --- | ---
submission_id | The UUID of the questionnaire in which this response pair was rated. | Task Specification | UUID
query_id | The topic ID this response pair belongs to. | TREC RAG | String ID
response_a | The UUID of the first response in this pair (displayed on the left-hand side). | Task Specification | UUID
response_b | The UUID of the second response in this pair (displayed on the right-hand side). | Task Specification | UUID
inference | The inference mode the judgments were collected with. | Task Specification | One of combined, individual
{dimension} | The rating given by the LLM for the specified dimension. | LLM Inference | One of a, n, b
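
Since both rating files carry the response_a and response_b keys, crowd and LLM judgments can be joined to estimate their agreement. A hedged sketch, assuming each response pair occurs at most once in ratings.jsonl.gz (the documentation above does not guarantee this):

```python
import gzip
import json

def read_jsonl_gz(path):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

# Index crowd gold labels by response pair (assumption: one record per pair).
crowd = {(r["response_a"], r["response_b"]): r
         for r in read_jsonl_gz("ratings.jsonl.gz")}

dimension = "quality_overall"
agree = total = 0
for r in read_jsonl_gz("llm_ratings.jsonl.gz"):
    if r["inference"] != "combined":  # compare one inference mode at a time
        continue
    gold = crowd.get((r["response_a"], r["response_b"]))
    if gold is not None:
        total += 1
        agree += int(r[dimension] == gold[f"{dimension}_gold"])

if total:
    print(f"{dimension} agreement (combined): {agree / total:.3f}")
```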

grades.jsonl.gz


Pointwise, per-topic grades inferred from the pairwise ratings with a Bradley-Terry probabilistic model. Grades are relative ranks within each topic and must not be compared as absolute values across topics!

Key | Description | Source | Value
--- | --- | --- | ---
response | The UUID of the response. | Task Specification | UUID
correctness_topical | The topical correctness grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
coherence_logical | The logical coherence grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
coherence_stylistic | The stylistic coherence grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
coverage_broad | The broad coverage grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
coverage_deep | The deep coverage grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
consistency_internal | The internal consistency grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
quality_overall | The overall quality grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
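
Because grades are relative ranks, any comparison should stay within one topic. The sketch below is illustrative only; the join via responses.jsonl.gz is our assumption, since grades.jsonl.gz itself carries no topic field. It ranks one topic's responses by their overall quality grade:

```python
import gzip
import json
from collections import defaultdict

def read_jsonl_gz(path):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

# Map each response UUID to its topic via responses.jsonl.gz.
topic_of = {r["response"]: r["topic"]
            for r in read_jsonl_gz("responses.jsonl.gz")}

# Group grade records per topic; grades must not be compared across topics.
by_topic = defaultdict(list)
for grade in read_jsonl_gz("grades.jsonl.gz"):
    by_topic[topic_of[grade["response"]]].append(grade)

# Rank the responses of one topic by their overall quality grade.
topic, grades = next(iter(by_topic.items()))
for grade in sorted(grades, key=lambda g: g["quality_overall"], reverse=True):
    print(topic, grade["response"], grade["quality_overall"])
```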


Files (8.8 MB)

MD5 | Size
--- | ---
md5:b63d3896cf69e68a7093106fc16606be | 11.8 kB
md5:a259d6ab4194e682fc85a8e62dd65955 | 40.1 kB
md5:c63c06902a80cdf4feb2061b562462dd | 726.9 kB
md5:36f71d164befa760fa0763be52a1781e | 8.1 MB
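
Downloads can be verified against the MD5 checksums above; a small sketch using Python's hashlib (the filename used is a placeholder, since the listing above does not map names to checksums):

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file in streaming fashion."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare the output against the checksum list above, e.g.:
print(md5sum("responses.jsonl.gz"))
```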

Additional details

Additional titles

  • Alternative title: Webis-CrowdRAG-25
  • Alternative title: CRAG-25
  • Alternative title: Webis-CrowdRAG