Dataset Open Access

Obtaining Better Static Word Embeddings Using Contextual Embedding Models

Gupta, Prakhar; Jaggi, Martin

This repository contains the pretrained word embeddings, as well as the datasets used to train them, released with the following paper:

“Obtaining Better Static Word Embeddings Using Contextual Embedding Models” ACL (2021).

The Wikipedia datasets were preprocessed from the Wikipedia dump downloaded from dumps.wikimedia.org, under the Creative Commons Attribution-ShareAlike 3.0 License.

If you found the provided resources useful, please cite the above paper. Here's a BibTeX entry you may use:

@inproceedings{Gupta2021ObtainingPC,
  title={Obtaining Better Static Word Embeddings Using Contextual Embedding Models},
  author={Prakhar Gupta and Martin Jaggi},
  booktitle={ACL},
  year={2021}
}

Files (38.5 GB)
Name                          Size     MD5
bert_12layer_para.bin         2.3 GB   2675c81a4a5faa42483a67cd122b1b78
bert_12layer_sent.bin         2.3 GB   62c01bbcd0d7c39d80a957e8dcc1bb9b
bert_24layer_para.bin         3.1 GB   6792956409f3e69c1d151459149f83a1
bert_24layer_sent.bin         3.1 GB   838601e37ff46b58b082aeb27edbda1c
GPT2_12layer_para.bin         2.3 GB   fc54b7cebbd635cebef563387e6c8ae9
GPT2_12layer_sent.bin         2.3 GB   6ee10cd9a5b9de6c04d1dac97dd6fbc0
GPT2_24_layer_para.bin        3.1 GB   b65ada23017f186c6e7682e78c3893ba
GPT2_24_layer_sent.bin        3.1 GB   7a76aebbe13271547dd23cfed3b7431e
roberta_12layer_para.bin      2.3 GB   324a03b59043496951ae3b120183338f
roberta_12layer_sent.bin      2.3 GB   858462fa1964e4a16c69d54d40747cd9
roberta_24layer_para.bin      3.1 GB   de50e98bb84455ad797b7a646e3b2155
roberta_24layer_sent.bin      3.1 GB   65fc50b800a312266e9d187b6993db49
wiki_dataset_paragraphs.zip   2.7 GB   663cf15c7233f1574b71882ed6df5c46
wiki_dataset_sentences.zip    3.1 GB   280c50e991a7ab9300c8539979eca6b3
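
Because the files are several gigabytes each, it may be worth checking a download against the MD5 values listed above before using it. The following is a minimal sketch, not part of the original release: the helper names `md5_of_file` and `verify_download` are hypothetical, and the dictionary holds only two of the listed checksums as an illustration.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above (subset only; extend as needed).
EXPECTED_MD5 = {
    "bert_12layer_para.bin": "2675c81a4a5faa42483a67cd122b1b78",
    "wiki_dataset_sentences.zip": "280c50e991a7ab9300c8539979eca6b3",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte files are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected value for its name.

    Raises KeyError if the file name has no entry in `expected`.
    """
    path = Path(path)
    return md5_of_file(path) == expected[path.name]
```

For example, `verify_download("bert_12layer_para.bin")` would return `True` only if the local file in the current directory matches the listed checksum; a `False` result suggests an incomplete or corrupted download.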