Published June 10, 2021 | Version 1.2
Dataset Open

GloVe 6B Vectors

  • 1. Leipzig University

Description

GloVe 6B word embeddings from https://nlp.stanford.edu/projects/glove/ (Wikipedia 2014 + Gigaword 5: 6B tokens, 400K vocab, uncased, 50d, 100d, 200d, & 300d vectors), split into one file per vector dimensionality, converted to the binary word2vec format that gensim reads, and zip-compressed (LZMA).

Splitting this data into single files allows for faster downloads and inclusion in memory-restricted environments such as Binder.

To uncompress an archive, use Python's zipfile.ZipFile. To load the extracted vectors, use gensim.models.KeyedVectors.load_word2vec_format(path, binary=True).
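The two steps can be combined roughly as follows. This is a minimal sketch, not part of the dataset: the helper names extract_archive and load_vectors are hypothetical, and it assumes gensim is installed and that each archive holds a single vector file.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, out_dir="."):
    """Extract every member of a zip archive and return the extracted paths."""
    out_dir = Path(out_dir)
    with zipfile.ZipFile(zip_path) as archive:
        # The standard library handles LZMA-compressed zip members
        # as long as the lzma module is available.
        archive.extractall(out_dir)
        return [out_dir / name for name in archive.namelist()]


def load_vectors(vector_path):
    """Load a binary word2vec-format file as gensim KeyedVectors."""
    # Imported lazily so extract_archive works without gensim installed.
    from gensim.models import KeyedVectors

    return KeyedVectors.load_word2vec_format(str(vector_path), binary=True)
```

With glove.6B.100d.zip downloaded to the working directory, something like load_vectors(extract_archive("glove.6B.100d.zip", "glove")[0]) should return the vectors, provided the first archive member is the vector file.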

This data is made available under the Public Domain Dedication and License v1.0 whose full text can be found at: http://www.opendatacommons.org/licenses/pddl/1.0/.

Files

glove.6B.100d.zip

Files (740.6 MB)

Checksum                              Size
md5:f19871e3053750198004fb2acc1f8d44  115.1 MB
md5:1df2c1a318572f7b9505e1a962b7a052  227.2 MB
md5:dc1af9ca593acdbf870759f0cf9ced99  339.4 MB
md5:a6c8d6e1e52401e913e5f6fa137b1d53  58.9 MB
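
A downloaded archive can be checked against the MD5 values listed above. The following is a minimal sketch using only the standard library. The function names md5_of_file and matches_checksum are hypothetical, and which checksum belongs to which file has to be read from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with a listed value such as 'md5:<hex digest>'."""
    # Accept the value with or without the leading 'md5:' label.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, matches_checksum(path_to_archive, "md5:f19871e3053750198004fb2acc1f8d44") returns True only when the file at that path has exactly that digest.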