Published February 1, 2021 | Version 1.0
Other
Open
Models for MeMAD language identification pipeline
Description
A collection of models for the MeMAD spoken language identification pipeline. The zip archive contains four models:
- An xvector embedding model trained on 67 languages using the lidbox toolkit.
- A scikit-learn StandardScaler for standardizing the embedding model output before Naïve Bayes classification.
- A probabilistic linear discriminant analysis (PLDA) model for reducing the dimensions of the embedding vectors.
- A scikit-learn Naïve Bayes model for classifying embedding vectors into six categories: de, en, fi, fr, sv, x-nolang.
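The sketch below illustrates how the last three stages might fit together once embeddings are available. It is not the MeMAD code and does not load the files in the archive. It assumes the order scaler, dimensionality reduction, Naïve Bayes, which the list above suggests but does not state. scikit-learn has no PLDA implementation, so `LinearDiscriminantAnalysis` is swapped in as a stand-in for the PLDA step. The embeddings are synthetic, and `EMBEDDING_DIM` and `SAMPLES_PER_CLASS` are hypothetical values chosen only for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The six output categories named in the description.
LABELS = ["de", "en", "fi", "fr", "sv", "x-nolang"]

# Hypothetical sizes; the real x-vector dimension is not stated in this record.
EMBEDDING_DIM = 32
SAMPLES_PER_CLASS = 50

# Synthetic stand-in for x-vector embeddings: one Gaussian cluster per label.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(len(LABELS), EMBEDDING_DIM))
X = np.vstack(
    [c + rng.normal(size=(SAMPLES_PER_CLASS, EMBEDDING_DIM)) for c in centers]
)
y = np.repeat(LABELS, SAMPLES_PER_CLASS)

# Assumed stage order: standardize, reduce dimensions, classify.
# LinearDiscriminantAnalysis is a stand-in for PLDA, not the same model.
classifier = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=len(LABELS) - 1),
    GaussianNB(),
)
classifier.fit(X, y)

# Predict a label for the first three synthetic embeddings.
predicted = classifier.predict(X[:3])
print(predicted)
```

With the released models, each stage would instead be loaded from the archive and applied to embeddings produced by the x-vector model; the file names and serialization format inside the zip are not given here.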
Files
Name | Size |
---|---|
memad_lid_models.zip (md5:d363c12d1c12be83c861cb4648e0d429) | 54.0 MB |