Dataset Open Access
Wachsmuth, Henning;
Prettenhofer, Peter;
Stein, Benno
The corpus consists of 1,128 German news articles from the years 2003 to 2009, collected from 29 general and business news websites. In each article, statements on the revenue of companies or markets were manually annotated, i.e., sentences and entities that refer to a statement are tagged and linked to each other.
Here is an example of a revenue statement from the corpus:
Loewe AG: Vorläufige Neun-Monats-Zahlen
Kronach, [6. November 2007]REF - Das Ergebnis vor Zinsen und Steuern (EBIT) des Loewe Konzerns konnte in den ersten 9 Monaten 2007 um 41% gesteigert werden. Vor diesem Hintergrund hebt die [Loewe AG]ORG ihre EBIT-Prognose für das laufende Geschäftsjahr auf 20 Mio. Euro an. Beim Umsatz strebt Konzernchef [Rainer Hecker]AUTH [für das Gesamtjahr]TIME ein höher als ursprünglich geplantes [Wachstum]TREND [von 10% auf ca. 380 Mio. Euro]MONEY an. (...)
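The bracketed notation in the example above (`[covered text]LABEL`) can be read mechanically. The following is a minimal sketch that assumes this inline notation only; the files inside the archive may use a different annotation format, so consult the documentation before relying on it. The names `TAG_PATTERN`, `extract_entities`, and `strip_tags` are hypothetical helpers introduced for illustration, not part of the corpus tooling.

```python
import re

# Assumption: an annotation looks like "[covered text]LABEL", where LABEL
# is a run of uppercase letters (REF, ORG, AUTH, TIME, TREND, MONEY in the
# example shown on this page).
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]([A-Z]+)")


def extract_entities(text):
    """Return (label, covered_text) pairs in order of appearance."""
    return [(m.group(2), m.group(1)) for m in TAG_PATTERN.finditer(text)]


def strip_tags(text):
    """Return the plain sentence with brackets and labels removed."""
    return TAG_PATTERN.sub(r"\1", text)


sample = (
    "Beim Umsatz strebt Konzernchef [Rainer Hecker]AUTH "
    "[für das Gesamtjahr]TIME ein höher als ursprünglich geplantes "
    "[Wachstum]TREND [von 10% auf ca. 380 Mio. Euro]MONEY an."
)

for label, covered in extract_entities(sample):
    print(f"{label}: {covered}")
```

Running `strip_tags(sample)` recovers the unannotated sentence, which can be useful when comparing the annotated text against the original article.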
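A revenue statement comprises seven attributes. The example above marks entities with the tags REF, ORG, AUTH, TIME, TREND, and MONEY; see the documentation for the full attribute list.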
A total of 2,075 statements have been annotated by domain experts. For more information on the construction of the dataset see the documentation.
Name | Size | Checksum
---|---|---
revenuecorpus_annotated.tar.gz | 6.6 MB | md5:63853946daac7647a905e579076f1bd4
RevenueCorpus_Documentation.pdf | 23.8 kB | md5:c166a0ed93063d14e5814f868cbe4448
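After downloading, the MD5 checksums listed above can be used to confirm that the files arrived intact. The sketch below uses only Python's standard `hashlib`; the helper names `md5_of_file` and `verify`, and the assumption that the files are stored locally under their listed names, are illustrative.

```python
import hashlib

# Checksums as listed in the file table on this page.
EXPECTED_MD5 = {
    "revenuecorpus_annotated.tar.gz": "63853946daac7647a905e579076f1bd4",
    "RevenueCorpus_Documentation.pdf": "c166a0ed93063d14e5814f868cbe4448",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Return True if the file at `path` matches the checksum listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify("revenuecorpus_annotated.tar.gz", "revenuecorpus_annotated.tar.gz")` should return `True` for an uncorrupted download placed in the current directory.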
Henning Wachsmuth, Peter Prettenhofer, and Benno Stein. Efficient Statement Identification for Automatic Market Forecasting. In Chu-Ren Huang and Dan Jurafsky, editors, 23rd International Conference on Computational Linguistics (COLING 10), pages 1128-1136, Stroudsburg, Pennsylvania, August 2010. Association for Computational Linguistics.