Published May 22, 2017 | Version v1
Dataset | Open access

Word Embedding Data Sets Learned from Tweets and General Data

Creators

  • Thomson Reuters

Description

This dataset includes 10 sets of word embeddings learned from about 400 million tweets and 7 billion words of general text. They can be used in tasks involving social media data, especially tweets, as well as other types of text. Users can choose an embedding set based on their use case, or simply try all of them to see which one performs best for their application.
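To illustrate how one of these sets might be used, the sketch below parses a plain-text embedding file and compares words by cosine similarity. It assumes each line has the form "word v1 v2 ... vd", with an optional word2vec-style header line giving the vocabulary size and dimension; the actual file layout is documented in README.txt and the paper, so check it before relying on this sketch. The names load_text_embeddings and cosine are hypothetical helpers, not part of the dataset, and the sample vectors are made up.

```python
import io
import math
from typing import Dict, Iterable, List, Optional, Set


def load_text_embeddings(lines: Iterable[str],
                         keep: Optional[Set[str]] = None) -> Dict[str, List[float]]:
    """Parse embeddings stored one word per line as `word v1 v2 ... vd`.

    ASSUMPTION: plain-text word2vec/GloVe-style layout. A first line made of
    two integers (word2vec header: vocabulary size, dimension) is skipped.
    If `keep` is given, only those words are loaded, which saves memory.
    """
    vectors: Dict[str, List[float]] = {}
    dim: Optional[int] = None
    for i, line in enumerate(lines):
        parts = line.split()
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue  # word2vec-style header line
        if len(parts) < 2:
            continue  # blank or malformed line
        word = parts[0]
        if keep is not None and word not in keep:
            continue
        try:
            values = [float(x) for x in parts[1:]]
        except ValueError:
            continue  # a non-numeric value: skip the row
        if dim is None:
            dim = len(values)  # fix the dimension from the first valid row
        if len(values) == dim:
            vectors[word] = values
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Tiny in-memory sample in the assumed format (made-up values).
    sample = "3 2\nlol 0.9 0.1\nhaha 0.8 0.2\nstock 0.1 0.9\n"
    emb = load_text_embeddings(io.StringIO(sample))
    print(cosine(emb["lol"], emb["haha"]) > cosine(emb["lol"], emb["stock"]))  # True
    # For a real file (the path is hypothetical):
    # with open("embeddings.txt", encoding="utf-8") as f:
    #     emb = load_text_embeddings(f, keep={"lol", "haha"})
```

For the multi-gigabyte files, holding every vector in Python lists takes a lot of memory; the keep argument restricts loading to the vocabulary a task actually uses. If the files do follow the word2vec text layout, gensim's KeyedVectors.load_word2vec_format can read them more efficiently. To compare sets, the same loader can be run on each one and the results scored on your own task.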

More details about the training data collection, word embedding generation, preprocessing steps, and usage can be found in the following paper:

Quanzhi Li, Sameena Shah, Xiaomo Liu, and Armineh Nourbakhsh. Data Set: Word Embeddings Learned from Tweets and General Data. The 11th International AAAI Conference on Web and Social Media (ICWSM-17), Montreal, Canada, May 16-18, 2017.

Files

README.txt

Files (34.1 GB)

MD5 checksum                      Size
5fd0dfc045e4d32bbf899e34f0097917  986 Bytes
fedf05d35024836abaaf4557998253d4  5.3 GB
b245690c8c1548adba55256533da716b  38.4 MB
56aebe4c0eead65aa9be8ffdfa811db3  2.3 GB
527b45a1b3de348e727f131031669efb  11.8 MB
d3e831b529e01310ff7cb5c2ff90e022  3.5 GB
89248914f0a2243794edf330695ee56e  26.5 MB
ae829366fa7130237e2330c377c59a09  3.2 GB
65f95cd6acd51b8e7595ad44f453bee0  17.0 MB
af5cd48c94d9fecc155e1c2ab14567b6  4.8 GB
e40048d4161b860375aa8a3f4f81752f  36.3 MB
a0116e45c14b0d570a2b9bc7d5a3536e  1.6 GB
7d8bfc5e287a5232630b987aab840205  7.6 MB
d92640342c331806e872d89573df433e  3.8 GB
c4b1c14114519e1b3d68ab20487985f1  25.7 MB
3e5c8657d21f96a94761a781acb80a51  2.1 GB
b26178d5bc9fddfb74f1b99fbdf93390  10.5 MB
484003e5850c531df814a00a267245be  4.5 GB
224754a6a830b3bb481123bba884bfbd  32.0 MB
b84af96c51516417b9d6ed394806d130  2.6 GB
885f12a187dd1ec97b43c55d8203aaba  13.2 MB
79e87b962efcef58753a4b5600448253  417.3 kB
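Because most of the embedding files are several gigabytes each, it is worth checking every download against the MD5 checksum listed above. A minimal sketch using Python's standard hashlib module follows; it reads the file in fixed-size chunks rather than all at once. The helper names md5_of_stream and verify_file, and the file name in the commented example, are illustrative and not part of the dataset.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a binary stream, read in chunks so a multi-GB file is never held in memory at once."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: str, expected_md5: str) -> bool:
    """True if the file at `path` matches the expected hex checksum."""
    with open(path, "rb") as f:
        return md5_of_stream(f) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Self-check on in-memory data; md5("hello") is a well-known value.
    assert md5_of_stream(io.BytesIO(b"hello")) == "5d41402abc4b2a76b9719d911017c592"
    # Real use (the file name is hypothetical; pair each download with its listed checksum):
    # print(verify_file("downloaded_file", "fedf05d35024836abaaf4557998253d4"))
```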