Broad-Coverage German Sentiment Classification Model and Dataset for Dialog Systems

Guhr, Oliver; Schumann, Anne-Kathrin; Bahrmann, Frank; Böhme, Hans-Joachim

doi:10.5281/zenodo.3693810

Published May 15, 2020 | Version 1.0.0

Dataset Open

1. HTW Dresden
2. Text2Knowledge

Training a Broad-Coverage German Sentiment Classification Model for Dialog Systems

This paper describes the training of a general-purpose German sentiment classification model. Sentiment classification is an important aspect of general text analytics. It also plays a vital role in dialogue systems and voice interfaces, which depend on the system's ability to pick up and understand emotional signals in user utterances. The study outlines how we collected a new German sentiment corpus and combined it with existing resources to train a broad-coverage German sentiment model. The resulting data set contains 5.4 million labelled samples. We used the data to train both a simple convolutional and a transformer-based classification model and compared the results achieved with various training configurations. The model and the data set will be published along with this paper.

The code for training and testing the models, which was published along with the paper, can be found in this repository.

The germansentiment Python package provides an easy-to-use interface for the model that was published with this paper.
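As a rough illustration of how such an interface might be called, the sketch below wraps the package in a small helper. It assumes that germansentiment exposes a `SentimentModel` class with a `predict_sentiment` method that takes a list of strings and returns one label per string; this record does not confirm that, so check the package's own documentation. The helper name `classify_sentiment` and its `model` parameter are hypothetical and exist only for this example.

```python
from typing import List, Optional, Tuple


def classify_sentiment(
    texts: List[str], model: Optional[object] = None
) -> List[Tuple[str, str]]:
    """Pair each input text with the label the model predicts for it."""
    if model is None:
        # Assumption: the package provides SentimentModel with a
        # predict_sentiment(list_of_str) method. Imported lazily so the
        # helper can also be used with any object offering that method.
        from germansentiment import SentimentModel

        model = SentimentModel()
    labels = model.predict_sentiment(texts)
    return list(zip(texts, labels))
```

Calling `classify_sentiment(["Das ist gar nicht mal so gut"])` without a `model` argument would load the packaged model, which requires the package to be installed and may trigger a download of the model weights.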

Notes

This repository contains the trained models as well as the training data.

Files

Files (8.6 GB)

Name	MD5	Size
models.zip	12f134dd5754a792855f150af823a18f	6.4 GB
no-scare-balanced.zip	11f47c5dc70b32262b7eed9de3ee566b	637.0 MB
sentiment-data-reviews-and-neutral.zip	75dd91cb1813d504a233d8710c664f88	1.5 GB
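Because the archives are several gigabytes each, it can be worth checking a download against the MD5 checksums listed for the files. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `verify_download` are made up for this example, and it assumes each archive was saved under its original file name.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "models.zip": "12f134dd5754a792855f150af823a18f",
    "no-scare-balanced.zip": "11f47c5dc70b32262b7eed9de3ee566b",
    "sentiment-data-reviews-and-neutral.zip": "75dd91cb1813d504a233d8710c664f88",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so multi-gigabyte files never sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's MD5 matches the checksum listed for its name."""
    path = Path(path)
    expected = EXPECTED_MD5[path.name]  # raises KeyError for unknown names
    return md5_of_file(path) == expected
```

For example, `verify_download("models.zip")` returns `True` only if the local file's digest equals the listed checksum.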

Additional details

Is documented by: Conference paper: http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.202.pdf

	All versions	This version
Views	1,742	1,734
Downloads	1,632	1,627
Data volume	6.1 TB	6.0 TB
