Podcast annotation dataset for paper "Identifying Introductions in Podcast Episodes from Automatically Generated Transcripts"

doi:10.5281/zenodo.5765655

Published December 6, 2021 | Version 2 (fixes a data issue)

Resource type: Dataset | Access: Open

Creator affiliation: Sirius XM

Dataset for the paper "Identifying Introductions in Podcast Episodes from Automatically Generated Transcripts". Please refer to the paper for details. This release differs from the dataset used in the paper: 20 of the 417 episodes have been removed due to copyright issues, leaving 397 episodes.

The data file, podcast_intro_data_pub.tsv, is tab-separated and contains the following fields:

- "episode_intro_start": start time of the episode introduction (in milliseconds)

- "episode_intro_end": end time of the episode introduction (in milliseconds)

- "program_intro_start": start time of the program introduction (in milliseconds)

- "program_intro_end": end time of the program introduction (in milliseconds)

- "program_name": name of the podcast program

- "episode_name": name of the podcast episode

- "transcription": JSON string containing the episode transcription, including its timestamps

- "annotator": anonymized annotator ID
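The field list above is enough to sketch a loader for the TSV file. This is a minimal sketch, not an official reader: it assumes the file has a header row with exactly the field names listed, is UTF-8 encoded, and stores a missing timestamp as an empty or NaN-like cell. The internal structure of the transcription JSON is not documented here, so it is decoded but left as-is. The function names (read_annotations, load_annotations, parse_ms, duration_s) are hypothetical.

```python
import csv
import json
from typing import Iterable, Optional

# Assumption: transcription cells can exceed the csv module's default field
# size limit (131072 characters), so raise the limit before reading.
csv.field_size_limit(2**31 - 1)

TIMESTAMP_FIELDS = (
    "episode_intro_start",
    "episode_intro_end",
    "program_intro_start",
    "program_intro_end",
)

# Assumption: a missing annotation is stored as an empty or NaN-like cell.
MISSING = {"", "nan", "none", "null"}


def parse_ms(value: Optional[str]) -> Optional[int]:
    """Parse a millisecond timestamp cell, returning None for a missing value."""
    value = (value or "").strip()
    if value.lower() in MISSING:
        return None
    return int(float(value))  # also accepts values written as "1234.0"


def read_annotations(lines: Iterable[str]) -> list:
    """Parse TSV lines (e.g. an open file) into dicts, one per data row.

    Assumes the first line is a header row with the documented field names.
    """
    rows = []
    for row in csv.DictReader(lines, delimiter="\t"):
        for key in TIMESTAMP_FIELDS:
            row[key] = parse_ms(row[key])
        # The transcription's JSON layout is not documented here, so it is
        # decoded into Python objects but otherwise left untouched.
        row["transcription"] = json.loads(row["transcription"])
        rows.append(row)
    return rows


def load_annotations(path: str) -> list:
    """Open the TSV file (assumed UTF-8) and parse it with read_annotations."""
    with open(path, newline="", encoding="utf-8") as f:
        return read_annotations(f)


def duration_s(start_ms: Optional[int], end_ms: Optional[int]) -> Optional[float]:
    """Length of an annotated span in seconds, or None if either bound is missing."""
    if start_ms is None or end_ms is None:
        return None
    return (end_ms - start_ms) / 1000.0
```

For example, after `rows = load_annotations("podcast_intro_data_pub.tsv")`, the call `duration_s(rows[0]["episode_intro_start"], rows[0]["episode_intro_end"])` gives the first row's episode introduction length in seconds. If the file holds exactly one row per episode, `len(rows)` should be 397, though that layout is an assumption.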

Files (115.4 MB total)

Name	Size	MD5 checksum
LICENSE.txt	20.2 kB	3a86ee579a68bc4a89fef4251b030734
podcast_intro_data_pub.tsv	115.4 MB	344308ea2c2cb7204acbc53218b732ad
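Since each file is published with an MD5 checksum, a download can be checked locally before use. The sketch below relies only on Python's standard hashlib module; the function names (md5_of_chunks, md5_of_file, verify_download) are hypothetical and not part of the dataset.

```python
import hashlib
from typing import Iterable

# MD5 checksums as listed in this record's file table.
EXPECTED_MD5 = {
    "LICENSE.txt": "3a86ee579a68bc4a89fef4251b030734",
    "podcast_intro_data_pub.tsv": "344308ea2c2cb7204acbc53218b732ad",
}


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so the 115 MB TSV is never fully loaded into memory."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def verify_download(path: str, name: str) -> bool:
    """True if the file at `path` matches the published checksum for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For an intact copy of the data file, `verify_download("podcast_intro_data_pub.tsv", "podcast_intro_data_pub.tsv")` returns True. Reading in fixed-size chunks keeps memory use constant regardless of file size.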

Additional details

Is supplement to: the paper at https://arxiv.org/abs/2110.07096
