Spanish CBOW Word Embeddings in FastText

Gutiérrez-Fandiño, Asier

doi:10.5281/zenodo.5044988

Published June 30, 2021 | Version 1.0

Dataset Open

Gutiérrez-Fandiño, Asier¹

1. Barcelona Supercomputing Center

These Spanish FastText word embeddings were generated from the largest Spanish corpus compiled to date. The corpus contains more than 2 TB of high-quality text, compiled from the web crawls carried out by the National Library of Spain from 2009 to 2019.

These are the CBOW embeddings. For the skip-gram embeddings, see: https://zenodo.org/record/5046525

Citation

@article{gutierrezfandino2022,
	author = {Asier Gutiérrez-Fandiño and Jordi Armengol-Estapé and Marc Pàmies and Joan Llop-Palao and Joaquin Silveira-Ocampo and Casimiro Pio Carrino and Carme Armentano-Oller and Carlos Rodriguez-Penagos and Aitor Gonzalez-Agirre and Marta Villegas},
	title = {MarIA: Spanish Language Models},
	journal = {Procesamiento del Lenguaje Natural},
	volume = {68},
	number = {0},
	year = {2022},
	issn = {1989-7553},
	url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405},
	pages = {39--60}
}

Notes

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan-TL).

Files (34.7 GB)

Name	MD5	Size
cbow.part01.rar	7f3e9e42eb42523de81129b40dc98f8f	5.4 GB
cbow.part02.rar	55b270c19689bffbd118a53e70e94be7	5.4 GB
cbow.part03.rar	69db2c536c92f75f83ad1d0be1e1dff2	5.4 GB
cbow.part04.rar	69425fe03b1f61f344d504efb510437a	5.4 GB
cbow.part05.rar	a68ae8b6c2be1bcabece3c95c197ac8f	5.4 GB
cbow.part06.rar	9f7d8af555f3289bdffc771b9101f091	5.4 GB
cbow.part07.rar	8ae03bdd047f08c646e4ebd5b8c5f34c	2.5 GB
LICENSE.txt	2ab724713fdaf49e4523c4503bfd068d	18.7 kB
README.md	edc419d967ff42eccb6b3641b49dddfb	873 Bytes
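
Because the archive is split into seven parts of several gigabytes each, it may be worth checking every part against the MD5 digests in the file listing above before extracting. The following is a minimal sketch, assuming the parts were downloaded under their original names into a single directory; the helper names `md5_of` and `verify_parts` are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "cbow.part01.rar": "7f3e9e42eb42523de81129b40dc98f8f",
    "cbow.part02.rar": "55b270c19689bffbd118a53e70e94be7",
    "cbow.part03.rar": "69db2c536c92f75f83ad1d0be1e1dff2",
    "cbow.part04.rar": "69425fe03b1f61f344d504efb510437a",
    "cbow.part05.rar": "a68ae8b6c2be1bcabece3c95c197ac8f",
    "cbow.part06.rar": "9f7d8af555f3289bdffc771b9101f091",
    "cbow.part07.rar": "8ae03bdd047f08c646e4ebd5b8c5f34c",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_parts(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumption: the parts live in the current working directory.
    for name, status in verify_parts(".").items():
        print(f"{name}: {status}")
```

Any part reported as `mismatch` or `missing` should be downloaded again, since a multi-part RAR archive generally cannot be extracted unless all of its parts are present and intact.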

	All versions	This version
Views	1,129	1,128
Downloads	744	743
Data volume	3.6 TB	3.6 TB
