Published September 9, 2021 | Version 1.0
Dataset Open

Same Sentiment Classification Train/Dev/Test Pair IDs

  • 1. Bauhaus-Universität Weimar
  • 2. Leipzig University

Description

This "dataset" includes only the compiled review pairings (business and review IDs) derived from the Yelp Business Review Dataset, not the review texts themselves. To access the actual review texts, please follow the instructions on the Yelp Dataset webpage.

The data format is JSON Lines (one JSON object per line).
Python load example:

import pandas as pd
traindev_df = pd.read_json("df_traindev.jsonl", lines=True)
test_df = pd.read_json("df_test.jsonl", lines=True)

# Example: access the business/review IDs and the label of a single pair
s1_bid = test_df.iloc[0]["sent1_business_id"]
s1_rid = test_df.iloc[0]["sent1_review_id"]
s2_bid = test_df.iloc[0]["sent2_business_id"]
s2_rid = test_df.iloc[0]["sent2_review_id"]
label = test_df.iloc[0]["is_same_side"]

See documentation at:

For details on how the data was compiled and used in our experiments, please refer to our code repository. Other derived data splits can be reproduced deterministically by using the same random seed as in our experiments.
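
Because the pair files carry only IDs, the review texts have to be looked up in a separately obtained copy of the Yelp data. The following is a minimal sketch of that lookup, not the authors' own pipeline. It assumes the Yelp review file is itself JSON Lines with `review_id` and `text` fields; the helper name `attach_review_texts` and the output columns `sent1_text` and `sent2_text` are hypothetical names chosen for illustration.

```python
import pandas as pd


def attach_review_texts(pairs_df, reviews_df):
    """Return a copy of pairs_df with the text of both reviews attached.

    Assumes reviews_df has 'review_id' and 'text' columns (an assumption
    about the Yelp review file, not something stated on this page).
    """
    # Build a review_id -> text lookup once; duplicates are dropped so the
    # index is unique, which Series.map requires.
    text_by_id = (
        reviews_df.drop_duplicates("review_id")
        .set_index("review_id")["text"]
    )
    out = pairs_df.copy()
    # IDs that are missing from the review file end up as NaN.
    out["sent1_text"] = out["sent1_review_id"].map(text_by_id)
    out["sent2_text"] = out["sent2_review_id"].map(text_by_id)
    return out


# Usage sketch (the review file name is an assumption about the Yelp download):
# reviews_df = pd.read_json("yelp_academic_dataset_review.json", lines=True)
# test_with_text = attach_review_texts(test_df, reviews_df)
```

Pairs whose texts come back as NaN point to reviews that are absent from the Yelp release at hand, so it may be worth counting them before training.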

Files

traindev_pair_ids.zip (43.3 MB)
md5:686346d0e00bd0171766d552f8d2627f

Additional details

Related works

Is derived from
Dataset: https://www.yelp.com/dataset (URL)