Obtaining Better Static Word Embeddings Using Contextual Embedding Models
Description
This repository contains pretrained word embeddings, together with the datasets used to train them, released with the following paper:
“Obtaining Better Static Word Embeddings Using Contextual Embedding Models”, ACL (2021).
The Wikipedia datasets were preprocessed from the Wikipedia dump downloaded from dumps.wikimedia.org under the Creative Commons Attribution-ShareAlike 3.0 License.
If you find the provided resources useful, please cite the above paper. Here is a BibTeX entry you may use:
@inproceedings{Gupta2021ObtainingPC,
title={Obtaining Better Static Word Embeddings Using Contextual Embedding Models},
author={Prakhar Gupta and Martin Jaggi},
booktitle={ACL},
year={2021}
}
Files (38.5 GB)
wiki_dataset_paragraphs.zip
MD5 checksum | Size |
---|---|
2675c81a4a5faa42483a67cd122b1b78 | 2.3 GB |
62c01bbcd0d7c39d80a957e8dcc1bb9b | 2.3 GB |
6792956409f3e69c1d151459149f83a1 | 3.1 GB |
838601e37ff46b58b082aeb27edbda1c | 3.1 GB |
fc54b7cebbd635cebef563387e6c8ae9 | 2.3 GB |
6ee10cd9a5b9de6c04d1dac97dd6fbc0 | 2.3 GB |
b65ada23017f186c6e7682e78c3893ba | 3.1 GB |
7a76aebbe13271547dd23cfed3b7431e | 3.1 GB |
324a03b59043496951ae3b120183338f | 2.3 GB |
858462fa1964e4a16c69d54d40747cd9 | 2.3 GB |
de50e98bb84455ad797b7a646e3b2155 | 3.1 GB |
65fc50b800a312266e9d187b6993db49 | 3.1 GB |
663cf15c7233f1574b71882ed6df5c46 | 2.7 GB |
280c50e991a7ab9300c8539979eca6b3 | 3.1 GB |
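
Because the files are several gigabytes each, it may be worth checking a download against the MD5 checksums listed above before using it. The following is a minimal sketch in Python using only the standard library; the function names `md5_of_file` and `verify_download` are illustrative and are not part of the released resources.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte files do not have to fit in memory."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file's MD5 digest with a checksum copied from the file list.

    The checksum may be given with or without the leading "md5:" label,
    in upper or lower case.
    """
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `verify_download(local_path, "md5:2675c81a4a5faa42483a67cd122b1b78")` returns `True` only if the file at `local_path` (a path you supply) matches the first checksum in the list. A mismatch usually indicates an incomplete or corrupted download.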