Published March 1, 2025 | Version v4
Dataset
Restricted
LLM-Generated Software Requirements from GitHub Issues
Description
This dataset contains software requirements automatically generated by LLMs from bug reports and feature requests extracted from three of the most popular machine learning repositories on GitHub: Scikit-learn, TensorFlow, and Transformers. It comprises three parts: the issue data, the generated requirements, and evaluations of those requirements against three quality criteria (Unambiguity, Understandability, and Singularity).
Dataset Structure
- issues.csv: Contains issue titles along with their corresponding repository names and unique identifiers.
- Requirements Files: These files store the requirements generated by LLMs for each issue, one file per prompting method:
  - few_shot_requirements.csv
  - zero_shot_requirements.csv
  - expert_requirements.csv
  - expert_few_shot_requirements.csv
- Evaluation Files: These files contain the assessment of the generated requirements against three quality criteria: Unambiguity, Understandability, and Singularity. The evaluations are likewise split by prompting method:
  - few_shot_evaluation.csv
  - zero_shot_evaluation.csv
  - expert_evaluation.csv
  - expert_few_shot_evaluation.csv
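
The file layout described above can be loaded with a short script. The following is a minimal sketch, not an official loader: it assumes all files sit in one flat directory, are UTF-8 encoded, and have a header row. The function names (`dataset_paths`, `read_rows`, `load_dataset`) are hypothetical, and no column names are assumed because the description does not list them.

```python
import csv
from pathlib import Path

# Prompting methods, taken from the file names listed in the dataset structure.
PROMPTING_METHODS = ["zero_shot", "few_shot", "expert", "expert_few_shot"]


def dataset_paths(root, method):
    """Return the (requirements, evaluation) CSV paths for one prompting method."""
    root = Path(root)
    return (
        root / f"{method}_requirements.csv",
        root / f"{method}_evaluation.csv",
    )


def read_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row.

    Assumption: the file has a header row and is UTF-8 encoded.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def load_dataset(root):
    """Load issues.csv plus every requirements/evaluation pair found under root."""
    root = Path(root)
    data = {"issues": read_rows(root / "issues.csv"), "methods": {}}
    for method in PROMPTING_METHODS:
        req_path, eval_path = dataset_paths(root, method)
        # Skip methods whose files are missing rather than failing outright.
        if req_path.exists() and eval_path.exists():
            data["methods"][method] = {
                "requirements": read_rows(req_path),
                "evaluation": read_rows(eval_path),
            }
    return data
```

Calling `load_dataset("path/to/dataset")` returns a dictionary with the issue rows under `"issues"` and, for each prompting method found, its requirements and evaluation rows under `"methods"`. Inspecting the keys of the first row of each file is a simple way to discover the actual column names before joining requirements to issues.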