Dataset Open Access

Podcast annotation dataset for paper "Identifying Introductions in Podcast Episodes from Automatically Generated Transcripts"

Jing, Elise; Schneck, Kristiana; Egan, Dennis; Waterman, Scott A.

Dataset for the paper "Identifying Introductions in Podcast Episodes from Automatically Generated Transcripts". Please refer to the paper for details. Compared with the dataset used in the paper, 20 of the 417 episodes have been removed because of copyright issues, leaving 397.

The data file contains the following fields:

- "episode_intro_start": the timestamp of the episode introduction start (in milliseconds)

- "episode_intro_end": the timestamp of the episode introduction end (in milliseconds)

- "program_intro_start": the timestamp of the program introduction start (in milliseconds)

- "program_intro_end": the timestamp of the program introduction end (in milliseconds)

- "program_name": name of the podcast program

- "episode_name": name of the podcast episode

- "transcription": JSON string containing the transcription, including the timestamps.

- "annotator": anonymized annotator ID.
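
The fields above could be read along the following lines. This is a minimal sketch, not the authors' loading code. It assumes that podcast_intro_data_pub.tsv is tab-separated with a header row using exactly these field names, that a blank timestamp means the introduction was not annotated, and that the file is UTF-8 encoded. None of this is stated on this page. The helper names (parse_ms, read_annotations, duration_ms) are hypothetical.

```python
import csv
import json

# The "transcription" column can hold very long JSON strings, so raise the
# csv module's default per-field size limit.
csv.field_size_limit(2**31 - 1)

TIMESTAMP_FIELDS = (
    "episode_intro_start",
    "episode_intro_end",
    "program_intro_start",
    "program_intro_end",
)


def parse_ms(value):
    """Return a timestamp in milliseconds as a float, or None if it is blank.

    Treating a blank cell as "not annotated" is an assumption.
    """
    value = (value or "").strip()
    return float(value) if value else None


def read_annotations(lines):
    """Yield one dict per row from an iterable of TSV lines (e.g. an open file)."""
    for row in csv.DictReader(lines, delimiter="\t"):
        for name in TIMESTAMP_FIELDS:
            row[name] = parse_ms(row.get(name))
        # The internal layout of the transcription JSON is not described on
        # this page, so it is only decoded here, not interpreted.
        row["transcription"] = json.loads(row["transcription"])
        yield row


def duration_ms(start, end):
    """Length of an introduction in milliseconds, or None if either end is missing."""
    if start is None or end is None:
        return None
    return end - start


# Hypothetical usage once the file has been downloaded:
#
# with open("podcast_intro_data_pub.tsv", newline="", encoding="utf-8") as f:
#     for row in read_annotations(f):
#         print(row["program_name"], row["episode_name"],
#               duration_ms(row["episode_intro_start"], row["episode_intro_end"]))
```

Reading row by row keeps memory use low for a file of this size. If the file turns out to use a different quoting convention or a different marker for missing values, the csv.DictReader arguments and parse_ms would need adjusting.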

Files (115.4 MB)

- LICENSE.txt (20.2 kB), md5:3a86ee579a68bc4a89fef4251b030734

- podcast_intro_data_pub.tsv (115.4 MB), md5:344308ea2c2cb7204acbc53218b732ad
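
After downloading, the MD5 checksums in the file listing above can be used to confirm that the files arrived intact. The following is a minimal sketch using Python's standard hashlib module. The function names (md5_of_file, matches_listing) are hypothetical, and the caller supplies the local path.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "LICENSE.txt": "3a86ee579a68bc4a89fef4251b030734",
    "podcast_intro_data_pub.tsv": "344308ea2c2cb7204acbc53218b732ad",
}


def matches_listing(path, name):
    """True if the file at `path` has the checksum listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]


# Hypothetical usage:
#
# print(matches_listing("podcast_intro_data_pub.tsv", "podcast_intro_data_pub.tsv"))
```

Reading in chunks avoids loading the whole 115.4 MB data file into memory at once.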
