A TripAdvisor Dataset for Dyadic Context Analysis

López-Riobóo Botana, Iñigo Luis; Alonso-Betanzos, Amparo; Bolón-Canedo, Verónica; Guijarro-Berdiñas, Bertha

doi:10.5281/zenodo.6583422

Published May 26, 2022 | Version 1.0

Dataset Open

1. Research Center on Information and Communication Technologies (CITIC) – Universidade da Coruña. Campus de Elviña, 15071 A Coruña, España.

Dyadic data arise in many contexts. In social networks, users are linked to a variety of items, and each link defines an interaction. On TripAdvisor, users are linked to restaurants through the reviews they post. These interactions yield valuable insights for forecasting and support tasks such as recommender systems, sentiment analysis, text-based personalisation and text summarisation, among others. Moreover, public TripAdvisor datasets are scarce, and there are no well-known benchmarks for model assessment.

We present six new TripAdvisor datasets of restaurant reviews, one for each of six cities: London, New York, New Delhi, Paris, Barcelona and Madrid.

If you use these data, please cite the accompanying paper, currently under submission (arXiv preprint, doi:10.48550/arXiv.2205.01759).

We collected only the reviews written in English for the restaurants of each city. The tabular data comprise six CSV files, one per city, containing numerical, categorical and text features:

parse_count: numerical (integer), auto-incremental sequence number assigned to the review by the web scraper as it was extracted
author_id: categorical (string), unique, incremental and anonymous identifier of the user (UID_XXXXXXXXXX)
restaurant_name: categorical (string), name of the restaurant the review refers to
rating_review: numerical (integer), review score in the range 1-5
sample: categorical (string), “positive” for scores 4-5 and “negative” for scores 1-3
review_id: categorical (string), unique internal identifier of the review (review_XXXXXXXXX)
title_review: text, title of the review
review_preview: text, preview of the review, which the website truncates when the text is very long
review_full: text, complete text of the review
date: timestamp, publication date of the review in the format (day, month, year)
city: categorical (string), city of the restaurant the review was written for
url_restaurant: text, URL of the restaurant
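
The relationship between rating_review and sample described above can be checked with a short script. The following is a minimal sketch, not the authors' code: the function names and the three-row excerpt are hypothetical, ratings are assumed to be stored as plain integer strings, and labels are compared case-insensitively because the exact capitalisation used in the files is not stated here.

```python
import csv
import io


def expected_sample(rating: int) -> str:
    """Label implied by the field description: 4-5 is positive, 1-3 is negative."""
    if not 1 <= rating <= 5:
        raise ValueError(f"rating out of range 1-5: {rating}")
    return "positive" if rating >= 4 else "negative"


def count_label_mismatches(rows) -> int:
    """Count rows whose 'sample' value disagrees with their 'rating_review'."""
    mismatches = 0
    for row in rows:
        rating = int(row["rating_review"])  # assumes integer strings such as "4"
        if row["sample"].strip().lower() != expected_sample(rating):
            mismatches += 1
    return mismatches


# Made-up excerpt for illustration only; these are not rows from the dataset.
DEMO_CSV = """rating_review,sample,review_id
5,positive,review_000000001
3,negative,review_000000002
4,negative,review_000000003
"""

demo_rows = list(csv.DictReader(io.StringIO(DEMO_CSV)))
demo_mismatches = count_label_mismatches(demo_rows)  # only the third row disagrees
```

To run the same check on a downloaded file, pass a csv.DictReader opened on, for example, Barcelona_reviews.csv with newline="" instead of the in-memory excerpt. The file encoding is not stated in this record, so it may need to be set explicitly.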

Notes

This research has been financially supported in part by the Spanish Government [grant number PID2019-109238GB-C22]; by the Xunta de Galicia [grant number ED431G 2019/01 - Research Center on Information and Communication Technologies (CITIC)]; and by European Union ERDF Funds. Special recognition goes to the Spanish Ministerio de Universidades for the predoctoral FPU funds [grant number FPU19/01457].

Please note that these data are released under a CC BY-NC 4.0 International license. You must NOT use the material for commercial purposes.

For the data collection, we designed our own web scraper, combining the Scrapy Python framework with the Selenium WebDriver testing tool. Participants' data have been anonymized: we added the field "author_id" as the incremental, unique and anonymous identifier of each user (UID_XXXXXXXXXX).

Files

Files (2.5 GB)

Name	MD5	Size
Barcelona_reviews.csv	68e6b4b9b365c6b42023f0c94bad5d2e	355.6 MB
London_reviews.csv	cde6207dc1ebcb41bc043658767a41e4	942.3 MB
Madrid_reviews.csv	d20041b7b3c7a71011aaed70935a6d40	145.5 MB
New_Delhi_reviews.csv	98ebf3f8ecb6ad883615423352960444	134.8 MB
New_York_reviews.csv	2e5099cf95b33390328eea6868f4dd74	473.1 MB
Paris_reviews.csv	ee5c33ec456f42fdff135ba92dc1cdd2	465.6 MB
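
Because the files are large, it can be worth checking a download against the MD5 checksums listed above. The sketch below uses only Python's standard hashlib module. The helper names md5_of_file and verify_downloads are hypothetical, and the script assumes the files were saved under their original names in a single directory.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "Barcelona_reviews.csv": "68e6b4b9b365c6b42023f0c94bad5d2e",
    "London_reviews.csv": "cde6207dc1ebcb41bc043658767a41e4",
    "Madrid_reviews.csv": "d20041b7b3c7a71011aaed70935a6d40",
    "New_Delhi_reviews.csv": "98ebf3f8ecb6ad883615423352960444",
    "New_York_reviews.csv": "2e5099cf95b33390328eea6868f4dd74",
    "Paris_reviews.csv": "ee5c33ec456f42fdff135ba92dc1cdd2",
}


def md5_of_file(path, chunk_size=1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large CSVs are never fully loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory) -> dict:
    """Map each listed file found in `directory` to True if its checksum matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.exists():  # files that were not downloaded are skipped
            results[name] = md5_of_file(path) == expected
    return results
```

Calling verify_downloads(".") in the download directory returns one entry per file present. A False value suggests a truncated or corrupted download.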

Additional details

Is derived from: Preprint: 10.48550/arXiv.2205.01759 (DOI)

Metric	All versions	This version
Views	1,229	1,226
Downloads	2,383	2,381
Data volume	1.6 TB	1.6 TB
