Published June 17, 2022 | Version 1.0.0
Dataset
Open
Amazon product reviews (mock dataset)
Description
About
This is a mock dataset of Amazon product reviews. Classes form a three-level hierarchy: 6 "level 1" classes, 64 "level 2" classes, and 510 "level 3" classes.
Three files are shared:
- train_40k.csv - 40k Amazon product reviews for training
- valid_10k.csv - 10k reviews held out for validation
- unlabeled_150k.csv - 150k raw Amazon product reviews, which can be used for language model fine-tuning
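A minimal sketch of reading the three files with Python's standard csv module. The file names are taken from the list above. The `load_split` helper and the `data_dir` argument are hypothetical names introduced for illustration, and the sketch assumes the files are UTF-8 CSVs with a header row, which this description does not state.

```python
import csv
from pathlib import Path

# File names as listed in the description; the split keys are our own labels.
FILES = {
    "train": "train_40k.csv",
    "valid": "valid_10k.csv",
    "unlabeled": "unlabeled_150k.csv",
}


def load_split(data_dir, split):
    """Read one split into a list of dicts keyed by the CSV header row.

    Assumes UTF-8 encoding and a header line (not confirmed by the source).
    """
    path = Path(data_dir) / FILES[split]
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `load_split("data", "train")` would return the training rows if the files were downloaded into a local `data` directory (a hypothetical location).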
Level 1 classes are: health personal care, toys games, beauty, pet supplies, baby products, and grocery gourmet food.
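The three-level class structure can be sanity-checked with a short sketch like the one below. It counts distinct labels per level and lists labels that appear under more than one parent. The column names `Cat1`, `Cat2`, and `Cat3` are assumptions, since this description does not name the columns, and `summarize_hierarchy` is a hypothetical helper.

```python
from collections import defaultdict


def summarize_hierarchy(rows, level_columns=("Cat1", "Cat2", "Cat3")):
    """Count distinct labels per level and find labels with several parents.

    `rows` is an iterable of dicts (one per review). The default column
    names are assumed, not taken from the dataset description.
    """
    distinct = [set() for _ in level_columns]
    parents = defaultdict(set)  # (depth, label) -> parent labels seen
    for row in rows:
        labels = [row[column] for column in level_columns]
        for depth, label in enumerate(labels):
            distinct[depth].add(label)
            if depth > 0:
                parents[(depth, label)].add(labels[depth - 1])
    counts = [len(labels) for labels in distinct]
    ambiguous = sorted(key for key, seen in parents.items() if len(seen) > 1)
    return counts, ambiguous
```

Run over the labeled rows, the counts can be compared with the 6, 64, and 510 classes stated above, keeping in mind that a single split may not contain every class.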
The dataset originally comes from https://www.kaggle.com/datasets/kashnitsky/hierarchical-text-classification