Dataset Open Access

ArguAna TripAdvisor

Henning Wachsmuth; Martin Trenkmann; Benno Stein; Gregor Engels; Tsvetomira Palakarska

An English corpus for studying local sentiment flows and aspect-based sentiment analysis. It contains 2100 hotel reviews balanced with respect to the reviews’ sentiment scores. All reviews are segmented into subsentence-level statements, each of which has been manually classified as a fact, a positive opinion, or a negative opinion. In addition, all hotel aspects mentioned in the reviews have been annotated as such. The annotated corpus is available in two versions:

  • arguana-tripadvisor-annotated-plus-software-v1.zip
  • arguana-tripadvisor-annotated-v2.zip

In addition, we provide nearly 200k further hotel reviews without manual annotations:

  • v1 upon request
  • arguana-tripadvisor-unannotated-v2.zip

The corpus is free to use for scientific purposes, but not for commercial applications. In version 2, the annotated XMI files have been changed to follow a new underlying type system that is more easily extensible. Note that the software of version 1 needs some adaptations to work with version 2.

In case you publish any results related to the ArguAna TripAdvisor corpus, please cite our CICLing 2014 paper.

Files (279.1 MB)

  • arguana-tripadvisor-annotated-plus-software-v1.zip (12.1 MB)
    md5:ef11039ebbd5088784cdf2d37bc0b65f
  • arguana-tripadvisor-annotated-v2.zip (9.6 MB)
    md5:a450bbdbbf888fb171783b62aa81e332
  • arguana-tripadvisor-unannotated-v2.zip (257.4 MB)
    md5:85e7c4f4142fc6bfdec1ad671cd78cdb
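
The MD5 checksums listed with the files above can be used to check that a download arrived intact. The following is a minimal sketch in Python using only the standard library. The helper names `md5_of_file` and `verify_download` are illustrative and not part of the corpus software, and the sketch assumes the archives keep their original file names after download.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "arguana-tripadvisor-annotated-plus-software-v1.zip": "ef11039ebbd5088784cdf2d37bc0b65f",
    "arguana-tripadvisor-annotated-v2.zip": "a450bbdbbf888fb171783b62aa81e332",
    "arguana-tripadvisor-unannotated-v2.zip": "85e7c4f4142fc6bfdec1ad671cd78cdb",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that the 257.4 MB archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's MD5 matches the checksum listed
    for its file name (hypothetical helper, not corpus software)."""
    name = Path(path).name
    expected = EXPECTED_MD5.get(name)
    if expected is None:
        raise KeyError(f"No checksum listed for {name}")
    return md5_of_file(path) == expected
```

Calling `verify_download("arguana-tripadvisor-annotated-v2.zip")` on a downloaded archive returns `True` only if the digest matches the listed value. A `False` result suggests an incomplete or corrupted download.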
