Models for MeMAD language identification pipeline

Published February 1, 2021 | Version 1.0

Other Open

A collection of models for the MeMAD spoken language identification pipeline. The zip archive contains four models:

An x-vector embedding model trained on 67 languages using the lidbox toolkit.
A scikit-learn StandardScaler for standardizing the embedding model output before Naïve Bayes classification.
A probabilistic linear discriminant analysis (PLDA) model for reducing the dimensions of the embedding vectors.
A scikit-learn Naïve Bayes model for classifying embedding vectors into six categories: de, en, fi, fr, sv, x-nolang.
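
The back-end stages listed above (standardize, reduce dimensions, classify) can be sketched with scikit-learn alone. This is a minimal illustration under stated assumptions, not the released pipeline: the embeddings are synthetic random vectors rather than x-vector outputs, the embedding size of 64 is arbitrary, GaussianNB is assumed as the Naïve Bayes variant, and LinearDiscriminantAnalysis stands in for PLDA because scikit-learn does not provide a PLDA implementation. The stage order (scaler, then reduction, then classifier) is also an assumption.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The six output categories named in the record description.
LABELS = ["de", "en", "fi", "fr", "sv", "x-nolang"]


def make_synthetic_embeddings(n_per_class=50, dim=64, seed=0):
    """Return fake 'embeddings': one Gaussian cluster per label (illustration only)."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=3.0, size=(len(LABELS), dim))
    X = np.concatenate([c + rng.normal(size=(n_per_class, dim)) for c in centers])
    y = np.repeat(np.arange(len(LABELS)), n_per_class)
    return X, y


def build_backend():
    """Scaler -> dimensionality reduction -> Naive Bayes.

    LinearDiscriminantAnalysis is a stand-in for PLDA, not the same model.
    """
    return make_pipeline(
        StandardScaler(),
        LinearDiscriminantAnalysis(n_components=len(LABELS) - 1),
        GaussianNB(),
    )


X, y = make_synthetic_embeddings()
backend = build_backend()
backend.fit(X, y)

# Map predicted class indices back to language codes.
predicted = [LABELS[i] for i in backend.predict(X[:3])]
print(predicted)
```

With real data, `X` would hold the embedding vectors produced by the x-vector model, and the fitted scaler, reduction model, and classifier would be loaded from the archive instead of being trained here.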

Files

Name	Size
memad_lid_models.zip (md5:d363c12d1c12be83c861cb4648e0d429)	54.0 MB
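
The MD5 checksum listed above can be used to confirm that a download is intact. The following is a small sketch using only the Python standard library; the local path `memad_lid_models.zip` is an assumption about where the archive was saved, and `md5_of_file` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path

# Checksum as published in the file listing of this record.
EXPECTED_MD5 = "d363c12d1c12be83c861cb4648e0d429"
# Assumed local download location; adjust as needed.
ARCHIVE = Path("memad_lid_models.zip")


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if ARCHIVE.exists():
    actual = md5_of_file(ARCHIVE)
    print("checksum OK" if actual == EXPECTED_MD5 else f"checksum mismatch: {actual}")
else:
    print(f"{ARCHIVE} not found; download it first")
```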

European Commission
MeMAD - Methods for Managing Audiovisual Data: Combining Automatic Efficiency with Human Accuracy 780069

Resource type

Other

Publisher

Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.