Published October 27, 2025 | Version v1.0.0
Dataset Open

Spanish word2vec embeddings trained on OpenSubtitles Part 2

  • 1. Harrisburg University of Science and Technology

Description

This dataset contains the subs2vec embeddings for Spanish, as presented in https://zenodo.org/records/17243814. The embeddings were trained on large-scale subtitle corpora from the OpenSubtitles 2018 datasets (https://opus.nlpl.eu/OpenSubtitles/corpus/version/OpenSubtitles) and represent semantic vector spaces derived from naturalistic language use in films and television.

For this language, we provide all embedding variants explored in the study. Specifically, the dataset includes vectors generated under different combinations of:

  • Dimensionality: multiple vector sizes (e.g., 100, 200, 300, …)
  • Window size: varying context windows (e.g., 2, 5, 10, …)

Each file corresponds to a unique configuration (dimension × window size).

In each file, column 1 holds the vocabulary item and the remaining columns hold its embedding values (columns 2 through dimension size + 1).
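The column layout described above can be read with a few lines of Python. This is a minimal sketch, not the authors' loader: it assumes a plain-text file with one word per row, whitespace-separated columns, and no header line, none of which is stated on this page. The function names `load_embeddings` and `cosine` are illustrative, and the three-dimensional sample rows are made up to stand in for a real file.

```python
import io
import math

def load_embeddings(lines, delimiter=None):
    """Parse rows shaped like: word v1 v2 ... vD (column 1 is the word).

    delimiter=None splits on any whitespace (an assumption about the
    file format); pass "," or "\t" if the files turn out to use those.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip("\n").split(delimiter)
        if len(parts) < 2:
            continue  # skip blank lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up 3-dimensional rows standing in for a real embedding file.
sample = io.StringIO("gato 1.0 0.0 0.0\nperro 0.8 0.6 0.0\ncasa 0.0 0.0 1.0\n")
vectors = load_embeddings(sample)
print(len(vectors), "words,", len(vectors["gato"]), "dimensions")
print(round(cosine(vectors["gato"], vectors["perro"]), 3))
```

For a downloaded file, the same function accepts an open file handle, for example `load_embeddings(open(path, encoding="utf-8"))`, where `path` and the UTF-8 encoding are assumptions. The files here are several gigabytes each, so a numeric array library would likely be a better fit than Python lists for loading a full vocabulary.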

If you use this dataset, please cite:

Files (47.7 GB)

MD5 checksum                            Size
md5:9dffde3bd11bbd578ccd6d26e86dd74a    2.6 GB
md5:ccfb7f93568e549886260cb9f45f73ed    2.6 GB
md5:ce91ea18c7c91ca10fe5503802126a36    2.6 GB
md5:5041465170e51c90ee81fe4de782bd8b    2.6 GB
md5:982b5129f25e965a705a3dd4b7520011    2.6 GB
md5:90b743373a73809d77171bb7f5cfdb09    2.6 GB
md5:13492ba094c727c6e1d6f982d9a3f0b2    2.6 GB
md5:233fced810837e727da6776d3ddad326    4.3 GB
md5:5ddaa14e0808f9dd600128aa489dc5f8    4.2 GB
md5:bf358aeb888d579dfe3d389edeb9f030    4.3 GB
md5:d534296e1a8387146cc18df022037840    4.3 GB
md5:54f4467c9979fb542a1b9bd5dfd0e274    4.3 GB
md5:29fc5ff7e1426a1142787104fd732239    4.3 GB
md5:74ed5391c85e1ed7219d63c4ef1198e2    4.3 GB
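
Because each file is several gigabytes, it may be worth checking a download against the md5 values listed above before using it. The following is a rough sketch using Python's standard `hashlib`. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the file path is whatever name the download was saved under, since file names are not shown in this listing.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks
    so that multi-gigabyte files are never held in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:9dff...'.

    The optional 'md5:' prefix is dropped before comparing.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing(path, "md5:9dffde3bd11bbd578ccd6d26e86dd74a")` returns `True` only if the downloaded file is byte-for-byte the one that checksum was computed from.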

Additional details

Related works

Is supplement to
Standard: 10.5281/zenodo.17243812 (DOI)

Software