Published January 17, 2022 | Version 1.0
Dataset Open

The ALPIN Sentiment Dictionary: Austrian Language Polarity in Newspapers

  • 1. TU Wien
  • 2. Austrian Academy of Sciences
  • 3. University of Vienna

Description

These datasets accompany the paper "The ALPIN Sentiment Dictionary: Austrian Language Polarity in Newspapers", submitted to the LREC 2022 conference.

The data sources and the methodology are explained in detail in the research paper, which will be available soon.

ALPIN stands for Austrian Language Polarity in Newspapers. The dictionary merges three parts:

  • Austrian Media Corpus: AMC (AMC_v1.0.csv)
  • STANDARD posts: STP (STP_v1.0.csv)
  • Austriacisms: AUT (AUT_v1.0.csv)

The Austrian Media Corpus (AMC) (Ransmayr et al., 2017) and STANDARD posts (STP) (Schabus et al., 2017) parts rely on the SPLM algorithm as used in SentiDraw (Sharma & Dutta, 2021). The Austriacisms (AUT) part was generated using Best-Worst Scaling (BWS) (Kiritchenko and Mohammad, 2017b). The AUT word list was collected from two sources: the “Variantenwörterbuch des Deutschen” (Ammon et al., 2016), from which only words that occur in Austrian German and in no other variety of German were selected, and the Wikipedia list of Austriacisms (https://de.wikipedia.org/wiki/Liste_von_Austriazismen).
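As an illustration of how Best-Worst Scaling annotations are commonly turned into scores, the sketch below uses the simple counting procedure: for each word, the share of times it was chosen as best minus the share of times it was chosen as worst. This is an assumption about the general technique, not a description of the exact procedure used for AUT. The function name `bws_scores`, the tuple layout and the placeholder items are hypothetical.

```python
from collections import Counter


def bws_scores(annotations):
    """Counting-based BWS score per item: (#best - #worst) / #appearances.

    `annotations` is an iterable of (items, best_item, worst_item) tuples,
    a hypothetical layout chosen for this sketch.
    """
    appeared, best, worst = Counter(), Counter(), Counter()
    for items, best_item, worst_item in annotations:
        appeared.update(items)      # every item shown in the tuple
        best[best_item] += 1        # item judged most positive
        worst[worst_item] += 1      # item judged most negative
    return {item: (best[item] - worst[item]) / n for item, n in appeared.items()}


# Placeholder items, not words from the AUT list.
annotations = [
    (("a", "b", "c", "d"), "a", "d"),
    (("a", "b", "c", "e"), "a", "c"),
]
scores = bws_scores(annotations)
```

Scores computed this way already lie in [-1, 1], since an item can be chosen as best or worst at most once per appearance.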

The scores are scaled to the interval [-1, 1] using min-max-abs scaling, where -1 is the most negative and 1 the most positive value.
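The exact scaling formula is not spelled out here. One common reading, assumed in the sketch below, is max-abs scaling: every score is divided by the largest absolute score, which maps the values into [-1, 1] while keeping zero and the sign of each score unchanged. The function name `max_abs_scale` and the sample values are hypothetical.

```python
def max_abs_scale(scores):
    """Divide each score by the largest absolute value (assumed scaling)."""
    largest = max((abs(s) for s in scores), default=0.0)
    if largest == 0:
        # All scores are zero (or the list is empty): nothing to scale.
        return [0.0 for _ in scores]
    return [s / largest for s in scores]


raw = [-4.0, -1.0, 0.0, 2.0]   # made-up raw scores
scaled = max_abs_scale(raw)
```

Under this assumption only the score with the largest magnitude reaches -1 or 1; the other end of the interval is reached only if the raw scores are symmetric.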

References:
Sharma, S. S., & Dutta, G. (2021). SentiDraw: Using star ratings of reviews to develop domain specific sentiment lexicon for polarity determination. Information Processing & Management, 58(1), 102412.
Kiritchenko, S. and Mohammad, S. M. (2017b). Capturing reliable fine-grained sentiment associations by crowdsourcing and best-worst scaling.
Schabus, D., Skowron, M., & Trapp, M. (2017). One Million Posts: A Data Set of German Online Discussions. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1241–1244. https://doi.org/10.1145/3077136.3080711
Ransmayr, J., Mörth, K., & Ďurčo, M. (2017). AMC (Austrian Media Corpus): Korpusbasierte Forschungen zum österreichischen Deutsch. In Digitale Methoden der Korpusforschung in Österreich (= Veröffentlichungen zur Linguistik und Kommunikationsforschung Nr. 30) (pp. 27–38). Verlag der Österreichischen Akademie der Wissenschaften.
Ammon, U., Bickel, H., & Ebner, J. (2016). Variantenwörterbuch des Deutschen: die Standardsprache in Österreich, der Schweiz, Deutschland, Liechtenstein, Luxemburg, Ostbelgien und Südtirol sowie Rumänien, Namibia und Mennonitensiedlungen. Walter de Gruyter.

Notes

This research was supported by the DYSEN project, funded by the City of Vienna (MA7-737909/19) and by the DYLEN project, funded by the ÖAW go!digital Next Generation grant (GDNG 2018-02).

Files

ALPIN_v1.0.csv

Files (661.8 kB)

  • md5:e3a4670f0167bbb5674f1aaca4bde1e7 (309.6 kB)
  • md5:b836946062df566871277dc72c30356d (167.0 kB)
  • md5:7d48eac9230839fb619bbf813ac88c2e (12.3 kB)
  • md5:f3212d2352f8e43f2690fb0c41f38f94 (172.9 kB)
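A downloaded file can be checked against the md5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listing` are hypothetical, and since the listing does not show which checksum belongs to which file, the path passed in is up to the user.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal md5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's digest with a listed value such as 'md5:e3a4...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing("some_download.csv", "md5:e3a4670f0167bbb5674f1aaca4bde1e7")` returns True only if the local file is byte-identical to the published one; the file name here is a placeholder.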