Published September 23, 2021 | Version v1
Software Open

EDGAR-W2V Embeddings

  • 1. Institute of Informatics & Telecommunications

Description

EDGAR-W2V: word embeddings trained on EDGAR-CORPUS.

The EDGAR-W2V embeddings are 200-dimensional and cover a financial vocabulary of 100k tokens. The vocabulary is also attached.
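As a rough illustration of how 200-dimensional embeddings like these might be read and compared, here is a minimal sketch. It assumes the vectors are stored in the common word2vec text layout (an optional "vocab_size dim" header line, then one "token v1 v2 ..." line per token); this record does not state the file layout inside the archive, so that is an assumption to check. The function names `parse_w2v_text` and `cosine` and the sample tokens are hypothetical.

```python
import math


def parse_w2v_text(lines, expected_dim=None):
    """Parse word2vec-style text lines into a {token: [floats]} dict.

    ASSUMPTION: the input uses the word2vec text layout; the actual
    format of the released files is not stated in this record.
    """
    vectors = {}
    for index, line in enumerate(lines):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        # Skip an optional header line of the form "vocab_size dim".
        if index == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        token = parts[0]
        values = [float(x) for x in parts[1:]]
        if expected_dim is not None and len(values) != expected_dim:
            raise ValueError(
                f"{token!r} has {len(values)} values, expected {expected_dim}"
            )
        vectors[token] = values
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Tiny in-memory sample with made-up 3-dimensional vectors (not real data).
sample = ["2 3", "revenue 1.0 0.0 0.0", "income 1.0 0.0 0.0"]
vectors = parse_w2v_text(sample, expected_dim=3)
similarity = cosine(vectors["revenue"], vectors["income"])
```

For the released embeddings, `expected_dim=200` would match the dimensionality stated above, and `lines` could be an open text file handle, provided the layout assumption holds.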

EDGAR-W2V is introduced in the paper "EDGAR-CORPUS: Billions of Tokens Make The World Go Round", published in the Proceedings of the Workshop on Economics and Natural Language Processing (ECONLP), co-located with EMNLP 2021.

Files

edgar-w2v-200d.zip (75.4 MB)
md5:a1ad8cce559786edb658b594cacb7e6f
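
The MD5 checksum listed for the archive can be used to confirm that a download completed intact. A minimal sketch using Python's standard `hashlib` module follows; the helper names `md5_of_file` and `matches_record` are hypothetical, and the local path passed to them is whatever location the archive was saved to.

```python
import hashlib

# Checksum listed in this record for edgar-w2v-200d.zip.
EXPECTED_MD5 = "a1ad8cce559786edb658b594cacb7e6f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_record("edgar-w2v-200d.zip")` on the downloaded file should return `True` when the archive is intact. MD5 is adequate for detecting accidental corruption but is not a defence against deliberate tampering.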