Published November 19, 2024 | Version 1.0.0
Dataset Open

Synthetic Product Desirability Datasets for Sentiment Analysis Testing

Description

Overview:
This collection contains three synthetic datasets generated by gpt-4o-mini for testing sentiment analysis and the Product Desirability Toolkit (PDT). Each dataset contains 1000 hypothetical software product reviews, generated with the aim of producing diverse sentiment and text. The datasets were created as part of the research described in:

Hastings, J. D., Weitl-Harms, S., Doty, J., Myers, Z. L., and Thompson, W., “Utilizing Large Language Models to Synthesize Product Desirability Datasets,” in Proceedings of the 2024 IEEE International Conference on Big Data (BigData-24), Workshop on Large Language and Foundation Models (WLLFM-24), Dec. 2024. https://arxiv.org/abs/2411.13485

Briefly, the three datasets differ in how each row was produced:
1) Word+Review: The LLM selected a word and synthesized a review to align with a randomly chosen target sentiment.
2) Review+Word: The LLM produced a review to align with the target sentiment score, and then selected a word appropriate for the review.
3) Supply-Word: A word was supplied to the LLM and scored, and a review was then produced to align with that score.

For sentiment analysis and PDT testing, the two columns of main interest across the datasets are likely 'Selected Word' and 'Hypothetical Review'.
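
As a minimal sketch of pulling out those two columns, the Python below reads every CSV member of the downloaded archive and returns (word, review) pairs. It rests on assumptions not stated on this page: that the archive holds CSV files with a header row, that they are UTF-8 encoded, and that the headers match 'Selected Word' and 'Hypothetical Review' exactly. The function names are illustrative only.

```python
import csv
import io
import zipfile

# Column names as given in the description above.
WORD_COL = "Selected Word"
REVIEW_COL = "Hypothetical Review"


def read_word_review_pairs(handle):
    """Yield (word, review) tuples from an open text handle of CSV data.

    Assumes the first row is a header containing both column names.
    """
    reader = csv.DictReader(handle)
    missing = {WORD_COL, REVIEW_COL} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"missing expected columns: {sorted(missing)}")
    for row in reader:
        # Short rows give None for absent fields, so fall back to "".
        yield (row[WORD_COL] or "").strip(), (row[REVIEW_COL] or "").strip()


def pairs_from_zip(zip_path):
    """Return {member name: [(word, review), ...]} for each .csv in the archive.

    Assumption: the archive members of interest are UTF-8 CSV files.
    """
    result = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                result[name] = list(read_word_review_pairs(text))
    return result
```

Calling pairs_from_zip with the path to the downloaded archive would give one list of pairs per file, ready to pass to a sentiment scorer. If the files turn out to use a different format or different headers, the KeyError raised on the first row read will show which column is missing.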

License:
This data is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license and may be used freely, provided credit is given. Cite as:

Hastings, J., Weitl-Harms, S., Doty, J., Myers, Z., & Thompson, W. (2024). Synthetic Product Desirability Datasets for Sentiment Analysis Testing (1.0.0). Zenodo. https://doi.org/10.5281/zenodo.14188456

Files

synthetic pdt datasets.zip (424.3 kB)
md5:549661ee402b3432b9869092f58d0584
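
The published MD5 value can be used to check that a download is intact. The following is a small sketch using Python's standard hashlib module; the function names and the local file path passed in are assumptions for illustration, and only the checksum itself comes from this record.

```python
import hashlib

# Checksum listed for the archive on this record.
EXPECTED_MD5 = "549661ee402b3432b9869092f58d0584"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """True if the file at `path` hashes to the expected MD5 value."""
    return md5_of_file(path) == expected.lower()
```

A False result from matches_published_md5 suggests a truncated or altered download, in which case fetching the archive again is the simplest remedy.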