BuyTheBy - An annotated dataset of paper mill advertisements with price data
Authors/Creators
Description
A preprint describing this dataset has been submitted to arXiv. This entry will be updated as soon as arXiv's moderation process is complete.
The study of paper mills and similar businesses operating in the market for academic and education fraud services is frustrated by the lack of market price data on their various offerings. Here, we assemble BuyTheBy, a large, annotated dataset of timestamped, text-based paper mill advertisements from seven businesses operating out of seven different countries. The dataset consists of 18,710 individual advertisements, of which 15,839 list prices. Together, the priced advertisements offer 20,598 positions for sale on 5,567 unique products in 14 product categories, yielding 51,812 timestamped price data points. Code for reproducing figures and summary statistics is available at https://github.com/reeserich/buytheby.
Files
buytheby_v_1_0_combined_processed_ads.csv
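As a first sanity check after downloading the file, a reader may want to count how many rows carry a price. The following is a minimal sketch, not the authors' code: the function name `summarize_rows` and the column name `price` are hypothetical, since the CSV schema is not described in this record, and a row may not correspond one-to-one with an advertisement. The demonstration uses a small made-up in-memory sample rather than the real data.

```python
import csv
import io


def summarize_rows(handle, price_column="price"):
    """Count all rows and the rows with a non-empty value in price_column.

    price_column is an assumed name; check the real CSV header first.
    """
    reader = csv.DictReader(handle)
    total = 0
    priced = 0
    for row in reader:
        total += 1
        # Treat a missing column, None, or whitespace-only cell as "no price".
        value = (row.get(price_column) or "").strip()
        if value:
            priced += 1
    return {"rows": total, "rows_with_price": priced}


# Made-up sample rows, only to show the function's behaviour.
sample = io.StringIO("ad_id,price\n1,100\n2,\n3,250\n")
summary = summarize_rows(sample)
print(summary)

# Against the downloaded file, the call would look roughly like this
# (adjust price_column to the actual header name):
# with open("buytheby_v_1_0_combined_processed_ads.csv",
#           newline="", encoding="utf-8") as f:
#     print(summarize_rows(f, price_column="price"))
```

Reading through `csv.DictReader` keeps the sketch free of third-party dependencies; the repository linked in this record remains the reference for reproducing the published figures and summary statistics.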
Additional details
Software
- Repository URL: https://github.com/reeserich/buytheby