Speech recognition alignments for Finnish parliament data

Virkkunen, Anja; Mansikkaniemi, André; Kurimo, Mikko

doi:10.5281/zenodo.4581941

Published May 15, 2021 | Version 1.0

Dataset Open

1. Aalto University

This dataset contains speech from the Finnish parliament's 2008-2020 plenary sessions, segmented and aligned for speech recognition training. In total, the training set has:

1.4 million samples
3100 hours of audio
460 speakers
over 19 million word tokens

Additionally, the upload contains a 5-hour development set and a 5-hour evaluation set, both described in publication 10.21437/Interspeech.2017-1115. Because the training set (~300 GB) exceeds the Zenodo upload limit (50 GB), only the development and evaluation sets are published on Zenodo. The rest of the data is available at: http://urn.fi/urn:nbn:fi:lb-2021051903

The training set comes in two parts:

A 2008-2016 set, originally described in publication 10.21437/Interspeech.2017-1115. This set includes a list of samples from sessions in 2008-2014 that can be combined with the 2015-2020 set to form the 3100 hour training set.
A new 2015-2020 dataset.

All audio samples are single-channel, 16 kHz, 16-bit wav files. Each wav file has a corresponding transcript in a .trn text file. The data is machine-extracted, so small inaccuracies remain in the training set transcripts, and a few Swedish samples may be present. The development and evaluation sets have been corrected by hand.
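As a rough illustration of the layout described above, the following Python sketch pairs each wav file with its transcript and checks the stated audio format. It rests on assumptions the description does not confirm: that a transcript shares its wav file's name with a .trn suffix, and that transcripts are UTF-8 encoded. The function names are hypothetical and are not part of the published extraction scripts.

```python
import wave
from pathlib import Path


def matches_expected_format(wav_path):
    """Return True if the file is single-channel, 16 kHz, 16-bit PCM."""
    with wave.open(str(wav_path), "rb") as wav:
        return (
            wav.getnchannels() == 1
            and wav.getframerate() == 16000
            and wav.getsampwidth() == 2  # bytes per sample, i.e. 16 bits
        )


def iter_samples(root):
    """Yield (wav_path, transcript) pairs found under root.

    Assumption: the transcript sits next to the wav file, has the same
    name with a .trn suffix, and is UTF-8 encoded.
    """
    for wav_path in sorted(Path(root).rglob("*.wav")):
        trn_path = wav_path.with_suffix(".trn")
        if not trn_path.exists():
            continue  # skip audio that has no transcript beside it
        yield wav_path, trn_path.read_text(encoding="utf-8").strip()
```

Running `iter_samples` over the unpacked development or evaluation set and filtering with `matches_expected_format` would be one way to sanity-check a download before training. If the archive uses a different naming scheme, the suffix swap in `iter_samples` needs adjusting.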

The licenses can be viewed at:

http://urn.fi/urn:nbn:fi:lb-2019112822 (audio)
http://urn.fi/urn:nbn:fi:lb-2019112823 (text)

The code used in extraction is available at:

https://github.com/aalto-speech/finnish-parliament-scripts (2008-2014, dev and eval sets)
https://github.com/aalto-speech/fi-parliament-tools (2015-2020 set)

Files

Name	Size	MD5
fi-parl-asr-dev-eval.zip	1.1 GB	4fa2b5e22b3b106982797e1ac8445f42

Additional details

References: Conference paper: 10.21437/Interspeech.2017-1115 (DOI)

Funding: European Commission, MeMAD - Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy (780069)

