The ALPIN Sentiment Dictionary: Austrian Language Polarity in Newspapers
Authors/Creators
- 1. TU Wien
- 2. Austrian Academy of Sciences
- 3. University of Vienna
Description
These datasets accompany the paper "The ALPIN Sentiment Dictionary: Austrian Language Polarity in Newspapers", submitted to the LREC 2022 conference.
The data sources and the methodology are explained in detail in the research paper, which will be available soon.
ALPIN stands for Austrian Language Polarity in Newspapers. The dictionary merges three parts:
- Austrian Media Corpus: AMC (AMC_v1.0.csv)
- STANDARD posts: STP (STP_v1.0.csv)
- Austriacisms: AUT (AUT_v1.0.csv)
The Austrian Media Corpus (AMC) (Ransmayr et al., 2017) and STANDARD posts (STP) (Schabus et al., 2017) parts rely on the SPLM algorithm as used in SentiDraw (Sharma & Dutta, 2021). The Austriacisms (AUT) part was generated using Best-Worst Scaling (BWS) (Kiritchenko & Mohammad, 2017b). The AUT word list was collected from two sources: the “Variantenwörterbuch des Deutschen” (Ammon et al., 2016), selecting only those words that occur in Austrian German and in no other variety of German, and a Wikipedia list of Austriacisms (https://de.wikipedia.org/wiki/Liste_von_Austriazismen).
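To make the BWS step more concrete, the following is a minimal sketch of the counting procedure commonly used with Best-Worst Scaling: a term's score is the number of times it was chosen as best minus the number of times it was chosen as worst, divided by the number of times it was shown. This is an illustration only and not necessarily the exact procedure used for AUT. The function name `bws_scores`, the annotation format, and the example words and judgments are hypothetical and are not taken from the dataset.

```python
from collections import Counter

def bws_scores(annotations):
    """Compute Best-Worst Scaling scores by simple counting.

    `annotations` is an iterable of (items, best, worst) tuples, where
    `items` is the tuple of terms shown to an annotator and `best` / `worst`
    are the terms picked as most positive / most negative.
    (Hypothetical input format, assumed for this sketch.)
    """
    shown = Counter()
    best = Counter()
    worst = Counter()
    for items, chosen_best, chosen_worst in annotations:
        shown.update(items)
        best[chosen_best] += 1
        worst[chosen_worst] += 1
    # Score = (times best - times worst) / times shown, which lies in [-1, 1].
    return {term: (best[term] - worst[term]) / shown[term] for term in shown}

# Made-up example judgments, not real annotation data.
example_items = ("leiwand", "Sackerl", "grantig", "Jause")
example_annotations = [
    (example_items, "leiwand", "grantig"),
    (example_items, "Jause", "grantig"),
]
example_scores = bws_scores(example_annotations)
```

With these two made-up judgments, a term that is always picked as worst receives -1, and a term that is never picked receives 0.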
The scores are scaled to the interval [-1, 1] using min-max-abs scaling, where -1 is the most negative and 1 the most positive value.
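The scaling step could look like the sketch below. It assumes that "min-max-abs scaling" means dividing every score by the largest absolute score, which keeps the sign and the zero point of each score and maps the values into [-1, 1]. The paper may define the scaling differently. The function name `max_abs_scale` and the example values are hypothetical.

```python
def max_abs_scale(scores):
    """Scale a {word: score} mapping into [-1, 1].

    Assumption for this sketch: each score is divided by the largest
    absolute score, so signs and the zero point are preserved.
    """
    if not scores:
        return {}
    largest = max(abs(value) for value in scores.values())
    if largest == 0:
        # All scores are zero, so there is nothing to scale.
        return dict(scores)
    return {word: value / largest for word, value in scores.items()}

# Made-up raw scores, not taken from the dictionary files.
raw_scores = {"gut": 2.0, "neutral": 0.0, "schlecht": -4.0}
scaled_scores = max_abs_scale(raw_scores)
```

Under this assumption, the entry with the largest magnitude lands exactly on -1 or 1, and a raw score of 0 stays at 0.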
References:
Sharma, S. S., & Dutta, G. (2021). SentiDraw: Using star ratings of reviews to develop domain specific sentiment lexicon for polarity determination. Information Processing & Management, 58(1), 102412.
Kiritchenko, S., & Mohammad, S. M. (2017b). Capturing reliable fine-grained sentiment associations by crowdsourcing and best-worst scaling.
Schabus, D., Skowron, M., & Trapp, M. (2017). One Million Posts: A Data Set of German Online Discussions. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1241–1244. https://doi.org/10.1145/3077136.3080711
Ransmayr, J., Mörth, K., & Ďurčo, M. (2017). AMC (Austrian Media Corpus): Korpusbasierte Forschungen zum österreichischen Deutsch. In Digitale Methoden der Korpusforschung in Österreich (= Veröffentlichungen zur Linguistik und Kommunikationsforschung Nr. 30) (pp. 27–38). Verlag der Österreichischen Akademie der Wissenschaften.
Ammon, U., Bickel, H., & Ebner, J. (2016). Variantenwörterbuch des Deutschen: Die Standardsprache in Österreich, der Schweiz, Deutschland, Liechtenstein, Luxemburg, Ostbelgien und Südtirol sowie Rumänien, Namibia und Mennonitensiedlungen. Walter de Gruyter.