Dataset for Automated Unit Test Generation via Chain of Thought Prompt and Reinforcement Learning
Authors/Creators
Description
This is the replication package, which includes three types of datasets: a training dataset with CoT prompts, a reward dataset for training the reward model, and an RL dataset for optimizing the policy model. The training dataset consists of three files: filter_test_cot_rule_50k.csv, filter_train_cot_rule_50k.csv, and filter_valid_cot_rule_50k.csv. Each of these files includes the following fields: src_fm, intention, plan, elaboration, gpt_test, src_fm_cot_gpt, target, src_fm_fc_ms_ff, src_fm_intention, src_fm_plan, src_fm_elaboration, idx, rule_cot, rule_cot_nlp, combine_cot, src_fm_rule_cot_nlp, src_fm_cot_nlp_gpt, gpt_cot_filter, and src_fm_plan_intention. The reward dataset consists of three files: test_athena.json, train_athena.json, and valid_athena.json. The RL dataset consists of three files: filter_test_cot_gpt_rl.csv, filter_train_cot_gpt_rl.csv, and filter_valid_cot_gpt_rl.csv. Each of these files includes the following fields: src_fm, intention, plan, elaboration, gpt_test, src_fm_cot_gpt, target, src_fm_fc_ms_ff, src_fm_intention, src_fm_plan, src_fm_elaboration, and gpt_cot_filter.
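As a quick sanity check after downloading, the header of each CSV file can be compared against the field lists given in the description. The following is a minimal sketch using only the Python standard library. The helper name `missing_fields` is hypothetical and not part of the replication package, and the sketch assumes the CSV files have a header row whose column names match the field names listed above.

```python
import csv
import io

# Field names as listed in the description for the training CSV files.
TRAINING_FIELDS = [
    "src_fm", "intention", "plan", "elaboration", "gpt_test",
    "src_fm_cot_gpt", "target", "src_fm_fc_ms_ff", "src_fm_intention",
    "src_fm_plan", "src_fm_elaboration", "idx", "rule_cot", "rule_cot_nlp",
    "combine_cot", "src_fm_rule_cot_nlp", "src_fm_cot_nlp_gpt",
    "gpt_cot_filter", "src_fm_plan_intention",
]

# Field names as listed in the description for the RL CSV files.
RL_FIELDS = [
    "src_fm", "intention", "plan", "elaboration", "gpt_test",
    "src_fm_cot_gpt", "target", "src_fm_fc_ms_ff", "src_fm_intention",
    "src_fm_plan", "src_fm_elaboration", "gpt_cot_filter",
]


def missing_fields(csv_file, expected):
    """Return the expected column names that are absent from the CSV header.

    Hypothetical helper: `csv_file` is any open text file object,
    assumed to start with a header row.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    return [name for name in expected if name not in header]


# Demonstration on an in-memory stand-in for a real file.
sample = io.StringIO("src_fm,target\nsome_method,some_test\n")
print(missing_fields(sample, ["src_fm", "target", "idx"]))
```

To check a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object together with `TRAINING_FIELDS` or `RL_FIELDS`; an empty list means every documented field is present. The UTF-8 encoding is an assumption, since the description does not state how the files are encoded.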