Published November 4, 2021 | Version 1.0
Dataset | Open

Social Sciences Word Embeddings in FastText

  • 1. GESIS - Leibniz Institute for the Social Sciences

Description

These social science word embeddings were trained with FastText on 37,604 open access social science research papers from the Social Science Open Access Repository, SSOAR (https://www.gesis.org/ssoar/home). They are available for German and English.

Training parameters: skip-gram model; character n-grams of length 3 to 6; vector dimensions of 100, 150, 200, 300, and 500; five epochs; learning rate 0.05; five negative samples.
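As a rough illustration of what these settings mean in practice, the sketch below maps them onto the keyword arguments of the fastText Python bindings (`fasttext.train_unsupervised`) and shows how a word is split into character n-grams of length 3 to 6. It is not the authors' training script: the helper names (`training_configs`, `char_ngrams`, `train_all`) and the corpus path are hypothetical, and any setting not listed in the description (window size, minimum word count, and so on) is left at the library default, which is an assumption.

```python
# Settings taken from the description above. The keyword names follow the
# fastText Python bindings; everything else here is illustrative.
BASE_PARAMS = {
    "model": "skipgram",  # skip-gram model
    "minn": 3,            # shortest character n-gram
    "maxn": 6,            # longest character n-gram
    "epoch": 5,           # five epochs
    "lr": 0.05,           # learning rate
    "neg": 5,             # five negative samples
}
DIMENSIONS = (100, 150, 200, 300, 500)


def training_configs():
    """Return one keyword-argument dict per vector dimension."""
    return [{**BASE_PARAMS, "dim": dim} for dim in DIMENSIONS]


def char_ngrams(word, minn=3, maxn=6):
    """List the character n-grams of a word wrapped in '<' and '>'.

    This mirrors the subword idea behind fastText; the library's own
    tokenisation and hashing details are not reproduced here.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


def train_all(corpus_path):
    """Train one model per dimension (requires the `fasttext` package).

    `corpus_path` is a hypothetical path to a plain-text training corpus.
    """
    import fasttext  # imported lazily so the helpers above work without it

    return {
        cfg["dim"]: fasttext.train_unsupervised(corpus_path, **cfg)
        for cfg in training_configs()
    }


if __name__ == "__main__":
    for cfg in training_configs():
        print(cfg)
    print(char_ngrams("survey", minn=3, maxn=3))
```

Because the vectors are built from character n-grams, a model of this kind can also produce a vector for a word that never appeared in the training corpus, by combining the vectors of the n-grams that word contains.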

Please cite:

Schiffers, Ricardo, Dagmar Kern, and Daniel Hienert. 2022. "Evaluation of Word Embeddings for the Social Sciences." In Proceedings of the 6th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, edited by Stefania Degaetano, Anna Kazantseva, Nils Reiter, and Stan Szpakowicz, 1-6. Gyeongju: Association for Computational Linguistics. https://aclanthology.org/2022.latechclfl-1.1.

Files (25.2 GB)

Checksum                              Size
md5:ba843936cc209dc4783422b114b8f350  1.1 GB
md5:7944171ad3ea2aab2458029a0f0515aa  1.7 GB
md5:c70c2a3c253669c42dbb43bc071d747f  2.3 GB
md5:3919f9eb29c0e18a3657b3c8874f180f  3.4 GB
md5:5053de7e4ed6c3ec3186f310cdc97307  5.6 GB
md5:2f06400d3e30da1dbca5ed24ef58d6b4  890.7 MB
md5:3b77b65944562e3e311fae3c62ea2ebd  1.3 GB
md5:e144382cb9b462cb5cd9b35371bd75bc  1.8 GB
md5:a4c848d6d47e9bcc5a88fab245c747aa  2.7 GB
md5:202ae7ebca1fa739331c7be8b928b74a  4.4 GB
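
Since the files are several gigabytes each, it can be worth checking a download against the md5 checksums listed above. The following is a minimal sketch using only the Python standard library. The function names and the example file name are hypothetical, because the actual file names are not given in this listing.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that multi-gigabyte files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Return True if the file matches the expected checksum.

    `expected` may be given with or without the 'md5:' prefix
    used in the file listing.
    """
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return file_md5(path) == expected_hex


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name):
    #   python verify.py embeddings.bin md5:ba843936cc209dc4783422b114b8f350
    if len(sys.argv) == 3:
        ok = verify_md5(sys.argv[1], sys.argv[2])
        print("checksum OK" if ok else "checksum MISMATCH")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.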