Dataset Open Access
Henning Wachsmuth; Martin Trenkmann; Benno Stein; Gregor Engels; Tsvetomira Palakarska
An English corpus for studying local sentiment flows and aspect-based sentiment analysis. It contains 2100 hotel reviews, balanced with respect to the reviews’ sentiment scores. Each review is segmented into subsentence-level statements, and each statement has been manually classified as a fact, a positive opinion, or a negative opinion. In addition, all hotel aspects mentioned in the reviews have been annotated as such.
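The annotation scheme described above (statements labeled as fact, positive opinion, or negative opinion, with hotel aspects marked) could be modeled roughly as follows. This is only an illustrative sketch: the class names, field names, the example review, and the numeric encoding of the flow are assumptions made here, not the corpus's actual XMI type system or file format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Label(Enum):
    """The three statement classes named in the corpus description."""
    FACT = "fact"
    POSITIVE = "positive"
    NEGATIVE = "negative"


@dataclass
class Statement:
    text: str
    label: Label
    # Hotel aspects mentioned in this statement (hypothetical field name).
    aspects: List[str] = field(default_factory=list)


@dataclass
class Review:
    # Overall sentiment score of the review (hypothetical field name and scale).
    score: int
    statements: List[Statement]


def sentiment_flow(review: Review) -> List[int]:
    """Encode the statement labels in order: fact=0, positive=1, negative=-1.

    The numeric encoding is an assumption chosen for illustration.
    """
    encoding = {Label.FACT: 0, Label.POSITIVE: 1, Label.NEGATIVE: -1}
    return [encoding[s.label] for s in review.statements]


# Invented example review, not taken from the corpus.
example = Review(
    score=3,
    statements=[
        Statement("We stayed for two nights", Label.FACT),
        Statement("the staff was friendly", Label.POSITIVE, ["staff"]),
        Statement("but the room was noisy", Label.NEGATIVE, ["room"]),
    ],
)

flow = sentiment_flow(example)
```

Representing a review as an ordered list of labeled statements keeps the order in which sentiment changes within the text, which is what a local sentiment flow is meant to capture.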
We also provide nearly 200k further hotel reviews without manual annotations.
The corpus is free to use for scientific purposes, but not for commercial applications. In version 2, the annotated XMI files have been changed to conform to a new underlying type system that is easier to extend. Note that the software of version 1 needs some adaptations to work with version 2.
If you publish any results related to the ArguAna TripAdvisor corpus, please cite our CICLing 2014 paper.