# SMAD-TV

## An introduction to the Speech and Music Activity Detection (SMAD) dataset

The Speech and Music Activity Detection (SMAD) dataset contains speech and music activity labels for studio-produced, high-quality TV show audio. Due to copyright restrictions, we can only share the mel spectrogram and MFCC features of each audio sample; a minimal loading sketch is shown at the end of this section. Each sample is derived from 6.1-channel audio, and the speech labels are derived from subtitles.

The dataset is split into three subsets based on how the music labels are derived:

- TVSM-test: the music labels are annotated by human annotators and are therefore more accurate. We use this subset for evaluation.
- TVSM-cuesheet: the music labels are derived from cue sheets, internal proprietary sources that document the appearance of music in TV shows.
- TVSM-pseudo: the music labels are predicted by a deep-learning model pre-trained on TVSM-cuesheet.

Please see our paper for more details: [EURASIP open access](https://asmp-eurasipjournals.springeropen.com/articles/10.1186/s13636-022-00253-8).
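Below is a minimal sketch of how the shared features and labels might be loaded. The file names, directory layout, array shapes, and label column names here are hypothetical, not confirmed by this README; adapt them to the actual distribution format of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical paths -- adjust to the dataset's actual layout.
mel = np.load("TVSM-test/features/example_mel.npy")    # assumed shape: (n_mels, n_frames)
mfcc = np.load("TVSM-test/features/example_mfcc.npy")  # assumed shape: (n_mfcc, n_frames)

# Activity labels assumed to be stored as one (start, end, label) row
# per segment, with times in seconds and label in {"speech", "music"}.
labels = pd.read_csv("TVSM-test/labels/example.csv")

print("mel:", mel.shape, "mfcc:", mfcc.shape)
for _, row in labels.iterrows():
    print(f"{row['label']}: {row['start']:.2f}s - {row['end']:.2f}s")
```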