Published June 8, 2021 | Version v0
Dataset Open

Obtaining Better Static Word Embeddings Using Contextual Embedding Models

  • 1. EPFL

Description

This repository contains the pretrained word embeddings, as well as the datasets used to train them, released with the following paper:

“Obtaining Better Static Word Embeddings Using Contextual Embedding Models” ACL (2021).

The Wikipedia datasets were preprocessed from the Wikipedia dump downloaded from dumps.wikimedia.org under the Creative Commons Attribution-ShareAlike 3.0 License.

If you find the provided resources useful, please cite the above paper. Here is a BibTeX entry you may use:

@inproceedings{Gupta2021ObtainingPC,
  title={Obtaining Better Static Word Embeddings Using Contextual Embedding Models},
  author={Prakhar Gupta and Martin Jaggi},
  booktitle={ACL},
  year={2021}
}

Files

wiki_dataset_paragraphs.zip

Files (38.5 GB)

MD5 checksum                            Size
md5:2675c81a4a5faa42483a67cd122b1b78    2.3 GB
md5:62c01bbcd0d7c39d80a957e8dcc1bb9b    2.3 GB
md5:6792956409f3e69c1d151459149f83a1    3.1 GB
md5:838601e37ff46b58b082aeb27edbda1c    3.1 GB
md5:fc54b7cebbd635cebef563387e6c8ae9    2.3 GB
md5:6ee10cd9a5b9de6c04d1dac97dd6fbc0    2.3 GB
md5:b65ada23017f186c6e7682e78c3893ba    3.1 GB
md5:7a76aebbe13271547dd23cfed3b7431e    3.1 GB
md5:324a03b59043496951ae3b120183338f    2.3 GB
md5:858462fa1964e4a16c69d54d40747cd9    2.3 GB
md5:de50e98bb84455ad797b7a646e3b2155    3.1 GB
md5:65fc50b800a312266e9d187b6993db49    3.1 GB
md5:663cf15c7233f1574b71882ed6df5c46    2.7 GB
md5:280c50e991a7ab9300c8539979eca6b3    3.1 GB
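
Since each archive is several gigabytes, it may be worth checking a download against the MD5 values listed above before unpacking it. The following is a minimal sketch using only the Python standard library; the helper names md5_of_file and matches_listing are illustrative assumptions and are not part of the released resources.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:2675c81a...'.

    The optional 'md5:' prefix, surrounding whitespace, and letter case
    are ignored. (Hypothetical helper, named here for illustration only.)
    """
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

The listing here does not show which checksum belongs to which file, so one option is to compute md5_of_file for a downloaded archive and check that the result appears among the listed values.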