A Large TV Dataset for Speech and Music Activity Detection
1. Georgia Institute of Technology
2. Netflix Inc.
 
Description
Automatic speech and music activity detection (SMAD) is an enabling task that can help segment, index, and pre-process audio content in radio broadcasts and TV programs. However, copyright concerns and the cost of manual annotation limit the availability of diverse and sizeable datasets, which hinders the progress of state-of-the-art (SOTA) data-driven approaches.

We address this challenge by presenting a large-scale dataset containing Mel spectrogram, VGGish, and MFCC features extracted from approximately 1600 hours of professionally produced audio tracks, along with corresponding noisy labels indicating the approximate locations of speech and music segments. The labels are derived from several sources, such as subtitles. A test set curated by human annotators is also included as a subset for evaluation. To the best of our knowledge, this is the first large-scale, open-source dataset that pairs features extracted from professionally produced audio tracks with frame-level speech and music annotations.
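As a minimal sketch of how such precomputed features and frame-level labels might be consumed (the file names, array shapes, and the two-column [speech, music] label layout below are illustrative assumptions, not the dataset's documented format):

```python
import numpy as np

# Hypothetical file paths -- adjust to the actual dataset layout.
mel = np.load("features/mel/episode_0001.npy")  # assumed shape: (n_frames, n_mels)
labels = np.load("labels/episode_0001.npy")     # assumed shape: (n_frames, 2)

# Frame-level activity: a frame may contain speech, music, both, or neither,
# so speech and music are treated as two independent binary targets.
speech_active = labels[:, 0] > 0.5
music_active = labels[:, 1] > 0.5
print(f"speech in {speech_active.mean():.1%} of frames, "
      f"music in {music_active.mean():.1%} of frames")
```

Because speech and music can overlap in professionally produced audio, SMAD is typically framed as multi-label (two independent activations per frame) rather than multi-class classification.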