Dataset Open Access

Webis-Revenue-10

Wachsmuth, Henning; Prettenhofer, Peter; Stein, Benno

The corpus consists of 1,128 German news articles from the years 2003 to 2009, collected from 29 general and business news websites. In each article, statements on the revenue of companies or markets were manually annotated, i.e., sentences and entities that refer to a statement are tagged and linked to each other.

Here is an example of a revenue statement from the corpus:

Loewe AG: Vorläufige Neun-Monats-Zahlen
Kronach, [6. November 2007]REF - Das Ergebnis vor Zinsen und Steuern (EBIT) des Loewe Konzerns konnte in den ersten 9 Monaten 2007 um 41% gesteigert werden. Vor diesem Hintergrund hebt die [Loewe AG]ORG ihre EBIT-Prognose für das laufende Geschäftsjahr auf 20 Mio. Euro an. Beim Umsatz strebt Konzernchef [Rainer Hecker]AUTH [für das  Gesamtjahr]TIME ein höher als ursprünglich geplantes [Wachstum]TREND [von 10% auf ca. 380 Mio. Euro]MONEY an. (...)

A revenue statement comprises seven attributes:

  • Forecast/Declaration: A sentence that represents a forecast or declaration on revenue.
  • Organization/Market: The subject of the statement, i.e., either an organization or market.
  • Time Expression: The period of time referenced by the statement.
  • Reference Point: The point in time when the statement was issued (used to resolve relative time expressions).
  • Money Expression: The monetary value referenced by the statement.
  • Author: The holder of the statement.
  • Trend: A word that indicates the trend of the monetary entity.
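
The inline notation used in the example above, `[text]TAG`, can be read back into the attributes just listed. The following is a minimal sketch that assumes this bracket notation only as it appears in the example. The actual file format inside the archive may differ (see the documentation), and the tag-to-attribute mapping as well as the name `extract_annotations` are assumptions made for illustration. The Forecast/Declaration attribute is the sentence itself and therefore has no inline tag here.

```python
import re
from collections import defaultdict

# Assumed mapping from the tags seen in the example to the attribute names
# in the list above. The example only shows ORG for an organization; a market
# may carry a different tag in the corpus.
TAG_TO_ATTRIBUTE = {
    "REF": "Reference Point",
    "ORG": "Organization/Market",
    "AUTH": "Author",
    "TIME": "Time Expression",
    "TREND": "Trend",
    "MONEY": "Money Expression",
}

# Matches "[some text]TAG" where TAG consists of uppercase letters.
ANNOTATION = re.compile(r"\[([^\[\]]+)\]([A-Z]+)")


def extract_annotations(text):
    """Collect the annotated spans of a text, grouped by attribute name."""
    found = defaultdict(list)
    for span, tag in ANNOTATION.findall(text):
        attribute = TAG_TO_ATTRIBUTE.get(tag, tag)  # unknown tags are kept as-is
        found[attribute].append(" ".join(span.split()))  # normalize whitespace
    return dict(found)


sentence = (
    "Beim Umsatz strebt Konzernchef [Rainer Hecker]AUTH [für das  Gesamtjahr]TIME "
    "ein höher als ursprünglich geplantes [Wachstum]TREND "
    "[von 10% auf ca. 380 Mio. Euro]MONEY an."
)
annotations = extract_annotations(sentence)
for attribute, spans in annotations.items():
    print(attribute, "->", spans)
```

Grouping spans in lists rather than single values leaves room for sentences that mention, for instance, more than one money expression.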

A total of 2,075 statements have been annotated by domain experts. For more information on the construction of the dataset see the documentation.

Files (6.6 MB)

Name                             Size     MD5
revenuecorpus_annotated.tar.gz   6.6 MB   63853946daac7647a905e579076f1bd4
RevenueCorpus_Documentation.pdf  23.8 kB  c166a0ed93063d14e5814f868cbe4448
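
The MD5 checksums listed with the files above can be used to check that a download is complete. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify` are illustrative, and the local paths passed to them are assumptions about where the files were saved.

```python
import hashlib

# Checksums as listed with the files above.
EXPECTED_MD5 = {
    "revenuecorpus_annotated.tar.gz": "63853946daac7647a905e579076f1bd4",
    "RevenueCorpus_Documentation.pdf": "c166a0ed93063d14e5814f868cbe4448",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, name):
    """Compare a local file against the checksum listed under `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, `verify("revenuecorpus_annotated.tar.gz", "revenuecorpus_annotated.tar.gz")` should return `True` for an intact download saved under its original name in the current directory.
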
  • Henning Wachsmuth, Peter Prettenhofer, and Benno Stein. Efficient Statement Identification for Automatic Market Forecasting. In Chu-Ren Huang and Dan Jurafsky, editors, 23rd International Conference on Computational Linguistics (COLING 10), pages 1128-1136, Stroudsburg, Pennsylvania, August 2010. Association for Computational Linguistics.
