Published June 30, 2021 | Version 1.0
Dataset Open

Spanish CBOW Word Embeddings in FastText

  • Barcelona Supercomputing Center

Description

These Spanish FastText word embeddings were trained on the largest Spanish corpus compiled to date. The corpus contains more than 2 TB of high-quality text, gathered from the web crawls carried out by the National Library of Spain between 2009 and 2019.

These are the CBOW embeddings. For the skip-gram embeddings, see: https://zenodo.org/record/5046525

Citation

@article{gutierrezfandino2022,
	author = {Asier Gutiérrez-Fandiño and Jordi Armengol-Estapé and Marc Pàmies and Joan Llop-Palao and Joaquin Silveira-Ocampo and Casimiro Pio Carrino and Carme Armentano-Oller and Carlos Rodriguez-Penagos and Aitor Gonzalez-Agirre and Marta Villegas},
	title = {MarIA: Spanish Language Models},
	journal = {Procesamiento del Lenguaje Natural},
	volume = {68},
	number = {0},
	year = {2022},
	issn = {1989-7553},
	url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405},
	pages = {39--60}
}

Copyright

Copyright (c) 2021 Secretaría de Estado de Digitalización e Inteligencia Artificial

Notes

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan-TL).

Files

LICENSE.txt

Files (34.7 GB)

Checksum                                Size
md5:7f3e9e42eb42523de81129b40dc98f8f    5.4 GB
md5:55b270c19689bffbd118a53e70e94be7    5.4 GB
md5:69db2c536c92f75f83ad1d0be1e1dff2    5.4 GB
md5:69425fe03b1f61f344d504efb510437a    5.4 GB
md5:a68ae8b6c2be1bcabece3c95c197ac8f    5.4 GB
md5:9f7d8af555f3289bdffc771b9101f091    5.4 GB
md5:8ae03bdd047f08c646e4ebd5b8c5f34c    2.5 GB
md5:2ab724713fdaf49e4523c4503bfd068d    18.7 kB
md5:edc419d967ff42eccb6b3641b49dddfb    873 Bytes
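
Because the files are large, it may be worth checking each download against the MD5 checksum listed above before using it. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_checksum`, and the file name in the usage comment, are illustrative assumptions and are not part of this record.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Reading in chunks keeps memory use flat even for multi-gigabyte files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with a listed value such as 'md5:7f3e...'."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex


# Usage (the file name is hypothetical; substitute the name of the file you downloaded):
# ok = matches_checksum("downloaded_part.bin", "md5:7f3e9e42eb42523de81129b40dc98f8f")
# print("checksum OK" if ok else "checksum mismatch, download again")
```

A mismatch usually indicates an incomplete or corrupted download rather than a problem with the data itself, so repeating the download is a reasonable first step.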