Published June 30, 2021 | Version 1.0
Dataset Open

Spanish Skip-Gram Word Embeddings in FastText

  • 1. Barcelona Supercomputing Center

Description

These Spanish FastText word embeddings were generated from the largest Spanish corpus compiled to date. The corpus contains more than 2 TB of high-quality text, compiled from the web crawls performed by the National Library of Spain from 2009 to 2019.

These are the skip-gram embeddings. For the CBOW embeddings, see: https://zenodo.org/record/5044988

Citation

@article{gutierrezfandino2022,
	author = {Asier Gutiérrez-Fandiño and Jordi Armengol-Estapé and Marc Pàmies and Joan Llop-Palao and Joaquin Silveira-Ocampo and Casimiro Pio Carrino and Carme Armentano-Oller and Carlos Rodriguez-Penagos and Aitor Gonzalez-Agirre and Marta Villegas},
	title = {MarIA: Spanish Language Models},
	journal = {Procesamiento del Lenguaje Natural},
	volume = {68},
	number = {0},
	year = {2022},
	issn = {1989-7553},
	url = {http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/6405},
	pages = {39--60}
}

Copyright

Copyright (c) 2021 Secretaría de Estado de Digitalización e Inteligencia Artificial

Notes

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan-TL).

Files

LICENSE.txt

Files (34.7 GB)

MD5 checksum                        Size
2ab724713fdaf49e4523c4503bfd068d    18.7 kB
bf1a424d6f4712d81c7381acaed3eda6    5.4 GB
351ff1eebfca861526c01b40d77aec72    5.4 GB
4afce5ed8f1366fc53f562112abcc34b    5.4 GB
8127736f74bff1fa901d903906ff07ec    5.4 GB
d66f8c2d4c74c01029ac5cbfae055100    5.4 GB
e93ee8d7f8509246cb362a635020a46e    5.4 GB
8f06cd560d4fc5f5a91e5ed98967a566    2.5 GB
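
The MD5 checksums listed above can be used to confirm that a multi-gigabyte download completed without corruption. The following is a minimal sketch in Python using only the standard library. The function names `md5_of_file` and `matches_listing` are illustrative and not part of this record, and the file path passed to them is whatever name the file was saved under locally.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte files are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_md5):
    """Compare a local file against a checksum copied from the listing.

    Accepts the value with or without the leading "md5:" label.
    """
    expected = expected_md5.strip().lower().split(":", 1)[-1]
    return md5_of_file(path) == expected
```

For example, `matches_listing("downloaded_part", "md5:8f06cd560d4fc5f5a91e5ed98967a566")` would return True only if the local file (here given the placeholder name `downloaded_part`) is byte-identical to the 2.5 GB file in the listing.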