Published February 1, 2021 | Version 1.0
Other Open

Models for MeMAD language identification pipeline

  • 1. Aalto University

Description

A collection of models for the MeMAD spoken language identification pipeline. The zip archive contains four models:

  1. An xvector embedding model trained on 67 languages using the lidbox toolkit.
  2. A scikit-learn StandardScaler for standardizing the embedding model output before Naïve Bayes classification.
  3. A probabilistic linear discriminant analysis (PLDA) model for reducing the dimensionality of the embedding vectors.
  4. A scikit-learn Naïve Bayes model for classifying embedding vectors into six categories: de, en, fi, fr, sv, x-nolang.
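
The way the last three stages fit together can be sketched with scikit-learn alone. This is a minimal illustration under stated assumptions, not the pipeline shipped in the archive: the embeddings are random synthetic vectors of a hypothetical size (32), `LinearDiscriminantAnalysis` stands in for the PLDA model (scikit-learn does not provide PLDA), `GaussianNB` is an assumed Naïve Bayes variant, and the order of the stages is also an assumption. The real models would be loaded from the archive with the lidbox toolkit instead of being fitted here.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The six output categories named in the description.
LABELS = ["de", "en", "fi", "fr", "sv", "x-nolang"]

# Synthetic stand-ins for x-vector embeddings (hypothetical size 32):
# one Gaussian cluster per category so the toy problem is learnable.
rng = np.random.default_rng(0)
EMBEDDING_DIM = 32
centers = rng.normal(scale=5.0, size=(len(LABELS), EMBEDDING_DIM))
y_train = np.repeat(np.arange(len(LABELS)), 50)
X_train = centers[y_train] + rng.normal(size=(len(y_train), EMBEDDING_DIM))

# Standardize, reduce dimensionality, classify.
# LinearDiscriminantAnalysis is a stand-in for PLDA (an assumption);
# it can keep at most n_classes - 1 = 5 components.
classifier = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=len(LABELS) - 1),
    GaussianNB(),
)
classifier.fit(X_train, y_train)

# Classify a few new synthetic "embeddings" drawn near known centers.
y_true = np.array([0, 3, 5])
X_new = centers[y_true] + rng.normal(size=(len(y_true), EMBEDDING_DIM))
predicted = [LABELS[i] for i in classifier.predict(X_new)]
print(predicted)
```

In actual use, `X_new` would hold embeddings produced by the x-vector model from speech segments, and the fitted scaler, PLDA, and classifier objects would come from the archive.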

Files

memad_lid_models.zip

Files (54.0 MB)

Name: memad_lid_models.zip
Size: 54.0 MB
md5:d363c12d1c12be83c861cb4648e0d429
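
The MD5 checksum listed above can be used to confirm that a downloaded copy of the archive is intact. The following is a small sketch using only the Python standard library; the function names `md5_of_file` and `verify_download` are illustrative, and the script assumes the archive was saved under its original name in the current directory.

```python
import hashlib
from pathlib import Path

# Checksum published on this record for memad_lid_models.zip.
EXPECTED_MD5 = "d363c12d1c12be83c861cb4648e0d429"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("memad_lid_models.zip")  # assumed local file name
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again before extracting the models.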

Additional details

Funding

MeMAD – Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy (grant 780069)
European Commission