Published January 19, 2021 | Version 1.0
Other Open

Spanish COVID-19 Twitter Embeddings in FastText

  • 1. Barcelona Supercomputing Center

Description

Intro

300-dimensional FastText embeddings generated from 140 million Spanish-language tweets. All tweets are COVID-19-related, meaning that each contains one or more keywords related to COVID-19 and lockdown.

 

Please cite:

Miranda-Escalada, A., Farré-Maduell, E., Lima-López, S., Gascó, L., Briva-Iglesias, V., Agüero-Torales, M., & Krallinger, M. (2021, June). The ProfNER shared task on automatic recognition of occupation mentions in social media: systems, evaluation, guidelines, embeddings and corpora. In Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task (pp. 13-20).

@inproceedings{miranda2021profner,
  title={The {ProfNER} shared task on automatic recognition of occupation mentions in social media: systems, evaluation, guidelines, embeddings and corpora},
  author={Miranda-Escalada, Antonio and Farr{\'e}-Maduell, Eul{\`a}lia and Lima-L{\'o}pez, Salvador and Gasc{\'o}, Luis and Briva-Iglesias, Vicent and Ag{\"u}ero-Torales, Marvin and Krallinger, Martin},
  booktitle={Proceedings of the Sixth Social Media Mining for Health (\#SMM4H) Workshop and Shared Task},
  pages={13--20},
  year={2021}
}

 

Description

  • Cased and uncased versions are available for both the CBOW and skip-gram models.
  • FastText parameter configurations were: 
    • dim 300 
    • minCount 5
    • minn 3
    • maxn 6
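The parameters above can be made concrete with a short sketch. It builds the command line for the fastText CLI (`fasttext cbow` / `fasttext skipgram`) with the four listed values, and shows which character n-grams `minn 3` and `maxn 6` produce for a single word. The file names and the helper names `build_train_command` and `char_ngrams` are hypothetical. Any hyperparameter not listed above (epochs, learning rate, window size) is left unset here, and whether the authors kept the fastText defaults for those is not stated.

```python
# Values taken from the parameter list above; everything else is an assumption.
FASTTEXT_PARAMS = {"dim": 300, "minCount": 5, "minn": 3, "maxn": 6}


def build_train_command(model, input_path, output_prefix):
    """Return the fastText CLI arguments for one model variant.

    `model` is "cbow" or "skipgram". `input_path` and `output_prefix`
    are hypothetical file names chosen by the caller.
    """
    if model not in ("cbow", "skipgram"):
        raise ValueError("model must be 'cbow' or 'skipgram'")
    command = ["fasttext", model, "-input", input_path, "-output", output_prefix]
    for name, value in FASTTEXT_PARAMS.items():
        command += [f"-{name}", str(value)]
    return command


def char_ngrams(word, minn=3, maxn=6):
    """List the character n-grams of `word` with lengths minn..maxn.

    fastText wraps each word in '<' and '>' boundary markers before
    extracting subword n-grams; this mirrors that idea for illustration.
    """
    wrapped = f"<{word}>"
    return [
        wrapped[start:start + size]
        for size in range(minn, maxn + 1)
        for start in range(len(wrapped) - size + 1)
    ]


if __name__ == "__main__":
    print(" ".join(build_train_command("skipgram", "tweets.txt", "model")))
    print(char_ngrams("covid"))
```

The command list can be passed to `subprocess.run` once the fastText binary is installed. The subword n-grams are what allow a vector to be composed for a word that never appeared in the training tweets.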

 

Preprocessing

"RT: @" patterns are removed. URLs and user mentions are replaced with the placeholder tokens URL and @MENTION, respectively. Text is tokenized with the NLTK TweetTokenizer.
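To apply the same preprocessing to new text before looking up vectors, the steps can be sketched as follows. The regular expressions are assumptions, since the exact patterns used by the authors are not given: the retweet marker is taken to be a leading "RT @user:" or "RT: @user", URLs are taken to start with http:// or https://, and mentions are taken to be "@" followed by word characters. The function names are hypothetical. Tokenization uses `nltk.tokenize.TweetTokenizer` with its default settings unless another tokenizer is passed in.

```python
import re

# Assumed patterns; the original ones are not published in this record.
RT_PREFIX = re.compile(r"^RT:?\s+@\w+:?\s*")
URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")


def normalize(text):
    """Strip the retweet prefix and substitute the URL and @MENTION placeholders."""
    text = RT_PREFIX.sub("", text)
    text = URL_PATTERN.sub("URL", text)
    return MENTION_PATTERN.sub("@MENTION", text)


def preprocess(text, tokenize=None):
    """Normalize a tweet and split it into tokens.

    By default the NLTK TweetTokenizer is used (requires the nltk package).
    A different callable can be supplied through `tokenize`.
    """
    if tokenize is None:
        from nltk.tokenize import TweetTokenizer
        tokenize = TweetTokenizer().tokenize
    return tokenize(normalize(text))


if __name__ == "__main__":
    tweet = "RT @usuario: Quédate en casa @amigo https://t.co/abc123"
    print(normalize(tweet))
```

For the uncased models the text would presumably also be lowercased, but the record does not say at which step this happened, so it is left out of the sketch.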

 

Resources

For more information, see https://temu.bsc.es/smm4h-spanish

Notes

Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL).

Files

README.txt

Files (28.1 GB)

Checksum (Size)

  • md5:cb6decc3e77e3d4fe840ad7d6b296e6c (7.5 GB)
  • md5:ed75b214f4aa63e026dae7293a6ba0b7 (6.5 GB)
  • md5:192753a1b86627f96bdfcb4a80540b47 (341 Bytes)
  • md5:fa0ed3051e6dc83adadde3009f7ef5df (7.5 GB)
  • md5:333b6c5294836f93928be7224d933b2d (6.5 GB)
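Because the files are several gigabytes each, it can be worth checking a download against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library. The helper names and the path are hypothetical, and the file is read in chunks so that it never has to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_checksum("path/to/downloaded_file", "md5:cb6decc3e77e3d4fe840ad7d6b296e6c")` returns `True` only when the download is intact. The path here is a placeholder.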