Published June 17, 2022 | Version 1.0.0
Dataset Open

Amazon product reviews (mock dataset)

  • 1. RELX

Description

About

This is a mock dataset of Amazon product reviews. Classes are organized in a three-level hierarchy: 6 "level 1" classes, 64 "level 2" classes, and 510 "level 3" classes.

Three files are shared:

  • train_40k.csv - 40k Amazon product reviews for training
  • valid_10k.csv - 10k reviews held out for validation
  • unlabeled_150k.csv - 150k raw Amazon product reviews, which can be used for language model fine-tuning
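
As a quick sanity check after downloading, the three files listed above can be inspected with Python's standard csv module. This is a minimal sketch, not part of the dataset itself. It assumes the files sit in a local directory, are UTF-8 encoded, and start with a header row; none of this is stated in the description. The function names summarize_csv and summarize_dataset are hypothetical.

```python
import csv
from pathlib import Path

# File names as listed in the description; their local location is an assumption.
FILES = ["train_40k.csv", "valid_10k.csv", "unlabeled_150k.csv"]


def summarize_csv(path):
    """Return (column names, number of data rows) for one CSV file.

    Assumes the first row is a header and the file is UTF-8 encoded;
    neither is confirmed by the dataset description.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return header, n_rows


def summarize_dataset(data_dir):
    """Summarize each expected file that is present in data_dir."""
    summary = {}
    for name in FILES:
        path = Path(data_dir) / name
        if path.exists():
            summary[name] = summarize_csv(path)
    return summary


if __name__ == "__main__":
    # "." is a placeholder for wherever the files were saved.
    for name, (columns, n_rows) in summarize_dataset(".").items():
        print(f"{name}: {n_rows} rows, columns = {columns}")
```

If the row counts differ noticeably from the 40k, 10k, and 150k in the file names, the download is probably incomplete or the header assumption does not hold.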

Level 1 classes are: health personal care, toys games, beauty, pet supplies, baby products, and grocery gourmet food.

The dataset originally comes from https://www.kaggle.com/datasets/kashnitsky/hierarchical-text-classification

Files

Files (95.8 MB)

Checksum                              Size
md5:9bda3bc6dcdb8cf68ce401a5b9782399  22.8 MB
md5:1d381c81895bb4b9860e03d18f016e01  68.6 MB
md5:2c22dc48f1674e5d1cc96f6491c9e2ba  4.5 MB
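
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard hashlib module. The helper names md5_of_file and matches_checksum are hypothetical, and the file listing here does not show which checksum belongs to which file, so the expected value has to be taken from the record page for the file in question.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, matches_checksum("train_40k.csv", "md5:...") returns True only if the local file is byte-for-byte identical to the published one, with "md5:..." standing in for the value shown next to that file.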