Samuel & Audrey — YouTube Transcripts (EN) Corpus (2012–2026)
Description
The Samuel & Audrey — YouTube Transcripts (EN) Corpus is a canonical, machine-readable dataset containing the complete English transcript archive from the "Samuel and Audrey - Travel and Food Videos" YouTube channel.
Spanning roughly 14 years of on-the-ground international travel (2012-09-16 to 2026-02-03), this dataset serves as a longitudinal ground-truth corpus. Unlike polished travel articles or synthesized text, these transcripts capture unedited human decision-making, conversational pacing, logistical planning, pricing mentions, food reactions, and real-world constraints. It is intended for researchers and developers building travel assistants that need natural-sounding language and grounding in real travel detail.
DATASET SNAPSHOT
• Total Transcripts: 1,397 full-length episodic videos
• Total Words: 2,288,859 spoken conversational tokens
• Time Span: 14 Years (2012-09-16 to 2026-02-03)
• Data Types: Full transcripts, cue-level RAG segments, and parallel visual metadata
INTENDED ACADEMIC & AI USE CASES
• Conversational AI & Voice Agents: Fine-tuning models with natural, unscripted speech patterns and uncertainty ("Should we take the bus?", "How much is this?").
• Retrieval-Augmented Generation (RAG): Grounding LLM responses in real-world, verified travel logistics and experiences.
• Temporal Analysis: Mapping global inflation, cost mentions, and infrastructure changes across a 14-year longitudinal signal.
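As a rough illustration of the temporal-analysis use case, cost mentions could be pulled out of transcript text with a simple pattern match. This is only a minimal sketch: the function name `find_price_mentions`, the currency list, and the sample sentence are all made up for illustration and do not come from the dataset, whose internal file layout is not described here.

```python
import re

# Hypothetical pattern (an assumption, not part of the dataset):
# a currency symbol followed by a number, or a number followed by a
# currency word. Real transcripts would need a much broader list.
PRICE_RE = re.compile(
    r"(?:[$€£]\s?\d+(?:[.,]\d+)?)"
    r"|(?:\d+(?:[.,]\d+)?\s?(?:dollars?|euros?|pounds?|pesos?|yen|baht)\b)",
    re.IGNORECASE,
)


def find_price_mentions(text: str) -> list[str]:
    """Return every substring of `text` that looks like a price mention."""
    return [match.group(0) for match in PRICE_RE.finditer(text)]


# Invented sample sentence, not taken from the corpus.
sample = "How much is this? It was $4.50 for the bus and about 20 euros for lunch."
print(find_price_mentions(sample))
```

Pairing each match with the upload date of its video would give the raw material for a cost-over-time series, though currency conversion and ambiguous mentions would need separate handling.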
Note: This repository represents the English (EN) linguistic subset of the overarching Samuel & Audrey Media Network corpus.
Files
| Name | Size | Checksum |
|---|---|---|
| samuelandaudreymedianetwork/samuel-and-audrey-youtube-transcripts-en-ledger-v1.0.0.zip | 20.4 MB | md5:9e6ab00216666a1e5eeaba18bd00e5e6 |
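The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the local filename and the helper names `md5_of_file` and `verify_archive` are assumptions chosen for illustration, and the archive is assumed to be saved in the current directory.

```python
import hashlib
from pathlib import Path

# Checksum as published in the file listing for the v1.0.0 archive.
EXPECTED_MD5 = "9e6ab00216666a1e5eeaba18bd00e5e6"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


# Assumed local filename; adjust to wherever the archive was saved.
archive = Path("samuel-and-audrey-youtube-transcripts-en-ledger-v1.0.0.zip")
if archive.exists():
    print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
else:
    print(f"{archive} not found; download it first")
```

MD5 is adequate for detecting a corrupted or truncated download, but it is not a defense against deliberate tampering.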
Additional details
Related works
- Is supplement to
  - Software: https://github.com/samuelandaudreymedianetwork/samuel-and-audrey-youtube-transcripts-en-ledger/tree/v1.0.0