Published January 27, 2025 | Version 1.0.0
Dataset | Open Access

Webis Crowd RAG Corpus 2025

  • 1. Leipzig University
  • 2. Center for Scalable Data Analytics and Artificial Intelligence
  • 3. University of Kassel
  • 4. Hessian Center for Artificial Intelligence
  • 5. Friedrich Schiller University Jena
  • 6. Bauhaus-Universität Weimar

Description

Data Documentation


responses.jsonl.gz

RAG responses of about 250 words each, written by human writers and LLMs in different response styles (essay, news, bullet).

Key | Description | Source | Values
--- | --- | --- | ---
response | The UUID of the response / task. | Task Specification | UUID
topic | The topic ID of this task. | Task Specification | String ID
style | The text style of the response written for this task. | Task Specification | One of essay, news, bullet
kind | Whether this text was written by an LLM or a human. | Task Specification | One of human, llm
query | The query text of this topic. | TREC RAG | String value
references_ids | The IDs of the 20 sources retrieved for this topic's query; aligned with references_texts. | TREC RAG | List of String IDs
references_texts | The texts of the 20 sources retrieved for this topic's query; aligned with references_ids. | TREC RAG | List of String values
text | The text as written by the human author or LLM. | Writing Survey | String
cleaned_text | The text as cleaned by our preprocessing pipeline, without reference markers. | Writing Survey | String
statements | The text parsed into individual statements, each with the corresponding references_ids cited. | Writing Survey | List of Dictionaries


ratings.jsonl.gz

Ratings on pairwise response utility as given by crowd workers. The columns prefixed with {dimension} below are included once for each of the seven dimensions (correctness_topical, coherence_logical, coherence_stylistic, coverage_broad, coverage_deep, consistency_internal, quality_overall).

Key | Description | Source | Value
--- | --- | --- | ---
submission_id | The UUID of the questionnaire in which this response pair was rated. | Task Specification | UUID
query_id | The topic ID this response pair belongs to. | TREC RAG | String ID
response_a | The UUID of the first response in this pair (displayed on the left-hand side). | Task Specification | UUID
response_b | The UUID of the second response in this pair (displayed on the right-hand side). | Task Specification | UUID
worker | The UUIDs of the 5 workers who completed this questionnaire. | Task Specification | List of UUIDs
{dimension}_vote | The individual votes for the specified dimension by the 5 workers. | Prolific Crowd Workers | List of Strings, each entry one of a, n, b
{dimension}_spam_probability | The individual spam probabilities associated with each vote for the specified dimension. | Prolific Crowd Workers | List of Floats, each between 0 and 1
{dimension}_p_a | The probability of the gold label being a for the specified dimension (first response better than second). | Prolific Crowd Workers | Float
{dimension}_p_n | The probability of the gold label being n for the specified dimension (both responses equal). | Prolific Crowd Workers | Float
{dimension}_p_b | The probability of the gold label being b for the specified dimension (second response better than first). | Prolific Crowd Workers | Float
{dimension}_gold | The gold label with the highest probability for the specified dimension. | Prolific Crowd Workers | One of a, n, b
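
As a usage sketch (not part of the corpus tooling), the {dimension}-prefixed columns can be expanded programmatically; the snippet below tallies the gold-label distribution per dimension, again assuming the file is in the working directory:

```python
import gzip
import json
from collections import Counter

DIMENSIONS = [
    "correctness_topical", "coherence_logical", "coherence_stylistic",
    "coverage_broad", "coverage_deep", "consistency_internal",
    "quality_overall",
]

# Count how often each gold label (a, n, b) occurs per dimension.
gold_counts = {dim: Counter() for dim in DIMENSIONS}
with gzip.open("ratings.jsonl.gz", "rt", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        for dim in DIMENSIONS:
            gold_counts[dim][record[f"{dim}_gold"]] += 1

for dim in DIMENSIONS:
    print(dim, dict(gold_counts[dim]))
```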


llm_ratings.jsonl.gz


Ratings on pairwise response utility as given by an LLM. The columns prefixed with {dimension} below are included once for each of the seven dimensions (correctness_topical, coherence_logical, coherence_stylistic, coverage_broad, coverage_deep, consistency_internal, quality_overall).

Key | Description | Source | Value
--- | --- | --- | ---
submission_id | The UUID of the questionnaire in which this response pair was rated. | Task Specification | UUID
query_id | The topic ID this response pair belongs to. | TREC RAG | String ID
response_a | The UUID of the first response in this pair (displayed on the left-hand side). | Task Specification | UUID
response_b | The UUID of the second response in this pair (displayed on the right-hand side). | Task Specification | UUID
inference | The inference mode the judgments were collected with. | Task Specification | One of combined, individual
{dimension} | The rating given by the LLM for the specified dimension. | LLM Inference | One of a, n, b
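
Since both rating files carry the response_a and response_b keys, crowd and LLM judgments can be joined to estimate their agreement. A hedged sketch, assuming each response pair occurs at most once in ratings.jsonl.gz (the documentation above does not guarantee this):

```python
import gzip
import json

def read_jsonl_gz(path):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

# Index crowd gold labels by response pair (assumption: one record per pair).
crowd = {(r["response_a"], r["response_b"]): r
         for r in read_jsonl_gz("ratings.jsonl.gz")}

dimension = "quality_overall"
agree = total = 0
for r in read_jsonl_gz("llm_ratings.jsonl.gz"):
    if r["inference"] != "combined":  # compare one inference mode at a time
        continue
    gold = crowd.get((r["response_a"], r["response_b"]))
    if gold is not None:
        total += 1
        agree += int(r[dimension] == gold[f"{dimension}_gold"])

if total:
    print(f"{dimension} agreement (combined): {agree / total:.3f}")
```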

grades.jsonl.gz


Pointwise, per-topic grades inferred from the pairwise ratings with a Bradley-Terry probabilistic model. Grades are relative ranks within each topic and must not be compared as absolute values across topics!

Key | Description | Source | Value
--- | --- | --- | ---
response | The UUID of the response. | Task Specification | UUID
correctness_topical | The topical correctness grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
coherence_logical | The logical coherence grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
coherence_stylistic | The stylistic coherence grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
coverage_broad | The broad coverage grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
coverage_deep | The deep coverage grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
consistency_internal | The internal consistency grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
quality_overall | The overall quality grade of this response. | Pairwise Inference w. Bradley-Terry Model | Integer, 1–6; per-topic relative rank, higher is better
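
Because grades are relative ranks, any comparison should stay within one topic. The sketch below is illustrative only; the join via responses.jsonl.gz is our assumption, since grades.jsonl.gz itself carries no topic field. It ranks one topic's responses by their overall quality grade:

```python
import gzip
import json
from collections import defaultdict

def read_jsonl_gz(path):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

# Map each response UUID to its topic via responses.jsonl.gz.
topic_of = {r["response"]: r["topic"]
            for r in read_jsonl_gz("responses.jsonl.gz")}

# Group grade records per topic; grades must not be compared across topics.
by_topic = defaultdict(list)
for grade in read_jsonl_gz("grades.jsonl.gz"):
    by_topic[topic_of[grade["response"]]].append(grade)

# Rank the responses of one topic by their overall quality grade.
topic, grades = next(iter(by_topic.items()))
for grade in sorted(grades, key=lambda g: g["quality_overall"], reverse=True):
    print(topic, grade["response"], grade["quality_overall"])
```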


Files (8.8 MB)

MD5 | Size
--- | ---
md5:b63d3896cf69e68a7093106fc16606be | 11.8 kB
md5:a259d6ab4194e682fc85a8e62dd65955 | 40.1 kB
md5:c63c06902a80cdf4feb2061b562462dd | 726.9 kB
md5:36f71d164befa760fa0763be52a1781e | 8.1 MB
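
Downloads can be verified against the MD5 checksums above; a small sketch using Python's hashlib (the filename used is a placeholder, since the listing above does not map names to checksums):

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file in streaming fashion."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare the output against the checksum list above, e.g.:
print(md5sum("responses.jsonl.gz"))
```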

Additional details

Additional titles

  • Alternative title: Webis-CrowdRAG-25
  • Alternative title: CRAG-25
  • Alternative title: Webis-CrowdRAG