Published March 12, 2026 | Version 1.0
Dataset | Open access

Sample Dataset for AI-Generated Scientific Storytelling

Authors/Creators

Description

This repository contains the dataset used to fine-tune the models evaluated in the paper.

The released data represents the training material and is provided to illustrate the structure, format, and intermediate representations used in the scientific storytelling pipeline.

Contents:

- dataset.json: metadata describing scientific papers and associated narrative sources.
- paper_transcriptions.json: parsed text of scientific papers used as model input.
- stories_with_text.json: narrative texts used as supervision for story generation.
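The record does not document the internal schema of these files. A first step is to load each one and look at its top-level shape. Below is a minimal sketch that assumes only that the three files named above sit in one local directory and are valid UTF-8 JSON. The helper names `load_json`, `describe`, and `inspect_release` are hypothetical and are not part of the release.

```python
import json
from pathlib import Path
from typing import Any

# File names as listed in the record's description; schema is NOT assumed.
FILES = ["dataset.json", "paper_transcriptions.json", "stories_with_text.json"]


def load_json(path: Path) -> Any:
    """Parse one JSON file (assumed UTF-8 encoded)."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def describe(obj: Any) -> str:
    """Summarize the top-level shape of a parsed JSON value."""
    if isinstance(obj, list):
        return f"list with {len(obj)} items"
    if isinstance(obj, dict):
        keys = sorted(obj)[:5]  # show at most five top-level keys
        return f"object with {len(obj)} keys, e.g. {keys}"
    return type(obj).__name__


def inspect_release(root: Path) -> dict[str, str]:
    """Describe each expected file under `root`, marking absent ones as missing."""
    report = {}
    for name in FILES:
        path = root / name
        report[name] = describe(load_json(path)) if path.exists() else "missing"
    return report


if __name__ == "__main__":
    for name, summary in inspect_release(Path(".")).items():
        print(f"{name}: {summary}")
```

Reporting only the top-level type and a few keys avoids guessing at field names. Any join between papers and stories (for example via identifiers in `dataset.json`) should be based on what this inspection actually shows.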

Files (6.4 MB total)

- md5:611293fd0647195d804a63c86e1d0eab (896.1 kB)
- md5:0e23506aaa4b74c93f86f73aa1d62092 (4.0 MB)
- md5:cfc0a0a08cb0188756d3aff7ea3c31ec (600 bytes)
- md5:86664348918a06e46cca7c69dd584aa2 (1.6 MB)
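The record publishes an MD5 checksum for each file, so downloads can be checked for integrity. The sketch below uses Python's standard `hashlib` to hash a local file in chunks and test whether the digest matches any of the published checksums. The record does not pair file names with checksums here, so the check is against the whole set. The function names `md5_of_file` and `matches_published` are illustrative only.

```python
import hashlib
from pathlib import Path

# MD5 checksums published in this record (not paired with file names).
PUBLISHED_MD5 = {
    "611293fd0647195d804a63c86e1d0eab",
    "0e23506aaa4b74c93f86f73aa1d62092",
    "cfc0a0a08cb0188756d3aff7ea3c31ec",
    "86664348918a06e46cca7c69dd584aa2",
}


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path) -> bool:
    """Return True if the file's MD5 equals one of the published checksums."""
    return md5_of_file(path) in PUBLISHED_MD5
```

For example, `for p in Path("download_dir").iterdir(): print(p.name, matches_published(p))` reports each downloaded file, where `download_dir` is a placeholder for wherever the files were saved. Every file from an intact download should print `True`.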

Additional details

Dates

Updated: 2026-03-12