Dataset Open Access

Same Sentiment Classification Train/Dev/Test Pair IDs

Erik Körner; Ahmad Dawar Hakimi; Gerhard Heyer; Martin Potthast

This dataset contains only the compiled review pairings (business and review IDs) derived from the Yelp Business Review Dataset; it does not include the review texts themselves. To get access to the actual review texts, please follow the instructions on the Yelp Dataset webpage.

The data format is JSON Lines (one JSON object per line).
Python loading example:

import pandas as pd
traindev_df = pd.read_json("df_traindev.jsonl", lines=True)
test_df = pd.read_json("df_test.jsonl", lines=True)

# Example: access the business/review IDs and the label of the first pair
s1_bid = test_df.iloc[0]["sent1_business_id"]
s1_rid = test_df.iloc[0]["sent1_review_id"]
s2_bid = test_df.iloc[0]["sent2_business_id"]
s2_rid = test_df.iloc[0]["sent2_review_id"]
label = test_df.iloc[0]["is_same_side"]
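
Because the pair files hold only IDs, the review texts have to be joined in from the Yelp data separately. The following is a minimal sketch of one way to do that, not the procedure used in our experiments. It assumes the Yelp reviews are available as a DataFrame with `review_id` and `text` columns; the helper name `attach_review_texts`, the output column names `sent1_text`/`sent2_text`, and the file name in the commented usage lines are illustrative assumptions.

```python
import pandas as pd


def attach_review_texts(pairs_df: pd.DataFrame, reviews_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of pairs_df with review texts looked up by review ID.

    Assumes reviews_df has "review_id" and "text" columns (hypothetical
    layout; check the Yelp Dataset documentation for the actual schema).
    """
    # Build a review_id -> text lookup; drop duplicate IDs so the index is unique.
    text_by_id = reviews_df.drop_duplicates("review_id").set_index("review_id")["text"]

    out = pairs_df.copy()
    # IDs that are missing from reviews_df end up as NaN.
    out["sent1_text"] = out["sent1_review_id"].map(text_by_id)
    out["sent2_text"] = out["sent2_review_id"].map(text_by_id)
    return out


# Usage sketch (file name is an assumption, adjust to your local copy):
# reviews_df = pd.read_json("yelp_academic_dataset_review.json", lines=True)
# test_with_text = attach_review_texts(test_df, reviews_df)
```

The Yelp review file is large, so reading it in chunks (the `chunksize` argument of `pd.read_json` with `lines=True`) and keeping only the IDs that occur in the pair files may be preferable on machines with limited memory.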

See documentation at:

For details on how the data was compiled and used in our experiments, please refer to our code repository. Other derived data splits can be reproduced deterministically by using the same random seed as in our experiments.
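
To illustrate what a seed-controlled split looks like, here is a minimal sketch using only pandas. It is not the splitting code from our repository: the function name `seeded_split`, the dev fraction, and the seed value are placeholders, and the actual seed and procedure should be taken from the code repository.

```python
import pandas as pd


def seeded_split(df: pd.DataFrame, dev_fraction: float = 0.1, seed: int = 42):
    """Shuffle df with a fixed seed and split it into (train, dev).

    dev_fraction and seed are hypothetical defaults, not the values
    used in the original experiments.
    """
    # Shuffling with a fixed random_state gives the same order on every run.
    shuffled = df.sample(frac=1.0, random_state=seed)
    n_dev = int(len(shuffled) * dev_fraction)
    dev_df = shuffled.iloc[:n_dev]
    train_df = shuffled.iloc[n_dev:]
    return train_df, dev_df


# Usage sketch:
# train_df, dev_df = seeded_split(traindev_df, dev_fraction=0.1, seed=42)
```

The result is only reproducible if the input rows are in the same order and the same seed is used; differences in library versions may also affect the shuffle.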
