Published June 30, 2021 | Version 1.0
Dataset
Open
Spanish CBOW Word Embeddings in FastText
Description
These Spanish FastText word embeddings were trained on the largest Spanish corpus compiled to date. The corpus contains more than 2 TB of high-quality text, drawn from the web crawls carried out by the National Library of Spain between 2009 and 2019.
These are the CBOW embeddings. For the skip-gram embeddings, see: https://zenodo.org/record/5046525
Citation
@article{gutierrezfandino2022,
author = {Asier Gutiérrez-Fandiño and Jordi Armengol-Estapé and Marc Pàmies and Joan Llop-Palao and Joaquin Silveira-Ocampo and Casimiro Pio Carrino and Carme Armentano-Oller and Carlos Rodriguez-Penagos and Aitor Gonzalez-Agirre and Marta Villegas},
title = {MarIA: Spanish Language Models},
journal = {Procesamiento del Lenguaje Natural},
volume = {68},
number = {0},
year = {2022},
issn = {1989-7553},
url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405},
pages = {39--60}
}
Copyright
Copyright (c) 2021 Secretaría de Estado de Digitalización e Inteligencia Artificial
Files (34.7 GB)
The record includes a LICENSE.txt file.
MD5 checksum | Size |
---|---|
7f3e9e42eb42523de81129b40dc98f8f | 5.4 GB |
55b270c19689bffbd118a53e70e94be7 | 5.4 GB |
69db2c536c92f75f83ad1d0be1e1dff2 | 5.4 GB |
69425fe03b1f61f344d504efb510437a | 5.4 GB |
a68ae8b6c2be1bcabece3c95c197ac8f | 5.4 GB |
9f7d8af555f3289bdffc771b9101f091 | 5.4 GB |
8ae03bdd047f08c646e4ebd5b8c5f34c | 2.5 GB |
2ab724713fdaf49e4523c4503bfd068d | 18.7 kB |
edc419d967ff42eccb6b3641b49dddfb | 873 Bytes |
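Because the larger files are several gigabytes each, it can be worth checking a download against the MD5 checksums listed above before using it. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_checksum` and the file name in the usage comment are hypothetical and not part of this record.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks
    so that multi-gigabyte files never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with an expected value.

    Accepts the checksum with or without a leading "md5:" prefix and
    ignores letter case.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Hypothetical usage; "downloaded_part.bin" is a placeholder file name:
# ok = matches_checksum("downloaded_part.bin", "md5:7f3e9e42eb42523de81129b40dc98f8f")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.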