Published October 16, 2024 | Version v2
Dataset Open

RARE (Repository for App review REfinement)

Authors/Creators

  • TCS Research

Description

This directory contains the RARE benchmark dataset and the code files described in the accompanying paper.


- RARE_Dataset: This folder introduces RARE, a benchmark for app review refinement. It contains two subfolders, Gold_Corpus and Silver_Corpus.

1. Gold_Corpus: This folder provides a corpus of 10,000 annotated reviews drawn from 10 different application domains, collaboratively refined by software engineers and a large language model (LLM).

2. Silver_Corpus: This folder provides 10,000 reviews refined automatically by the best-performing model, Flan-T5, after it was fine-tuned on the 10,000 gold-corpus reviews.
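The internal layout of the corpus files is not documented on this page, so the following is only a minimal loading sketch: the CSV file names and the raw_review/refined_review column names are assumptions, not confirmed by the archive.

```python
# Hypothetical loading sketch: the file and column names below are assumed,
# not taken from the archive; adjust them to the actual corpus layout.
import pandas as pd

gold = pd.read_csv("RARE_Dataset/Gold_Corpus/gold_corpus.csv")        # assumed file name
silver = pd.read_csv("RARE_Dataset/Silver_Corpus/silver_corpus.csv")  # assumed file name

# Assumed columns: 'raw_review' (original user text) and 'refined_review'.
for raw, refined in zip(gold["raw_review"].head(3), gold["refined_review"].head(3)):
    print(f"RAW:     {raw}\nREFINED: {refined}\n")
```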

- Code_File: This folder provides all the code used in the experiments. It contains four subfolders: Data_Extraction, Refined_Review_Generation_through_Prompting, Model_Finetuning_and_Inferences, and Result_Evaluation.

1. Data_Extraction: This folder contains two Python files: 'Google_Play_Store_Reviews_Extraction_from_10_different_App.py', used to extract 10,000 raw reviews from the Google Play Store, and 'Apple_App_Store_Reviews_Extraction_from_10_different_App.py', used to extract 10,000 raw reviews from the Apple App Store (an extraction sketch appears after this list).

2. Refined_Review_Generation_through_Prompting: This folder contains a Python file, 'Prompting_GPT_3.5_TURBO_For_Refined_Review_Generation.py', used to guide GPT-3.5-Turbo in generating refined versions of the raw reviews (a prompting sketch appears after this list).

3. Model_Finetuning_and_Inferences: This folder contains 16 Python files: a fine-tuning script and an inference script for each of the eight models BART, Flan-T5, Pegasus, Llama-2, Falcon, Mistral, Orca-2, and Gemma (a fine-tuning sketch appears after this list).

4. Result_Evaluation: This folder contains two Python files: 'Reference_free_Automatic_Metrics_Evaluation.py' for reference-free metrics such as FKGL, FKRE, LEN, and SS, and 'Reference_Based_Automatic_Metrics_Evaluation.py' for reference-based metrics such as SARI and BERTScore Precision (an evaluation sketch appears after this list).
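As a rough illustration of the Google Play extraction step, here is a minimal sketch built on the third-party google-play-scraper package (pip install google-play-scraper); the app IDs and per-app counts are placeholders, not the selection used in the paper.

```python
# Minimal Google Play review extraction sketch (not the repository's script).
from google_play_scraper import Sort, reviews

APP_IDS = ["com.whatsapp", "com.spotify.music"]  # placeholder app IDs

all_reviews = []
for app_id in APP_IDS:
    # reviews() returns a batch of review dicts plus a continuation token.
    batch, _token = reviews(
        app_id,
        lang="en",
        country="us",
        sort=Sort.NEWEST,
        count=1000,  # per-app count; the dataset totals 10,000 raw reviews per store
    )
    all_reviews.extend(
        {"app": app_id, "review": r["content"], "score": r["score"]} for r in batch
    )

print(f"Collected {len(all_reviews)} reviews")
```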
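For the prompting step, a minimal sketch of guiding GPT-3.5-Turbo through the OpenAI chat completions API is shown below; the system prompt is illustrative and is not the wording used in the repository's script.

```python
# Illustrative GPT-3.5-Turbo prompting sketch (prompt wording is assumed).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def refine_review(raw_review: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": ("You refine noisy app store reviews: fix grammar and "
                         "spelling and state the feedback clearly without "
                         "changing its meaning.")},
            {"role": "user", "content": raw_review},
        ],
        temperature=0.0,  # keep outputs stable across runs
    )
    return response.choices[0].message.content

print(refine_review("app keep crashing evry time i open camera plz fix!!"))
```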
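For the fine-tuning and inference step, a compact sketch for one of the eight models, Flan-T5 via Hugging Face Transformers, follows; the checkpoint variant, hyperparameters, and data columns are assumptions, and the repository's 16 scripts contain the settings actually used.

```python
# Seq2seq fine-tuning sketch for Flan-T5 (hyperparameters are assumed).
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "google/flan-t5-base"  # assumed size variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Toy stand-in for the gold corpus of raw -> refined pairs.
data = Dataset.from_dict({
    "raw": ["app keep crashing evry time i open camera plz fix!!"],
    "refined": ["The app crashes every time I open the camera. Please fix it."],
})

def tokenize(batch):
    inputs = tokenizer(batch["raw"], truncation=True, max_length=256)
    labels = tokenizer(text_target=batch["refined"], truncation=True, max_length=256)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = data.map(tokenize, batched=True, remove_columns=data.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="flan_t5_rare",
                                  num_train_epochs=3,
                                  per_device_train_batch_size=8,
                                  learning_rate=3e-4),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Inference with the fine-tuned model.
ids = tokenizer("plz add dark mode its hurting my eyes", return_tensors="pt").input_ids
print(tokenizer.decode(model.generate(ids, max_new_tokens=128)[0],
                       skip_special_tokens=True))
```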
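For the evaluation step, the sketch below computes some of the listed metrics: FKGL and FKRE via textstat, SARI and BERTScore via Hugging Face evaluate. LEN is shown as a plain word count and SS (semantic similarity) is omitted, since their exact definitions live in the repository's scripts.

```python
# Evaluation sketch: reference-free and reference-based metrics on one example.
import textstat
import evaluate

source = "app keep crashing evry time i open camera plz fix!!"
prediction = "The app crashes every time I open the camera. Please fix it."
reference = "The app crashes whenever I open the camera; please fix this."

# Reference-free metrics.
print("FKGL:", textstat.flesch_kincaid_grade(prediction))
print("FKRE:", textstat.flesch_reading_ease(prediction))
print("LEN :", len(prediction.split()))  # assumed definition: word count

# Reference-based metrics.
sari = evaluate.load("sari")
print("SARI:", sari.compute(sources=[source], predictions=[prediction],
                            references=[[reference]])["sari"])

bertscore = evaluate.load("bertscore")
print("BERTScore P:", bertscore.compute(predictions=[prediction],
                                        references=[reference],
                                        lang="en")["precision"][0])
```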

Files (4.9 MB)

Size    MD5
7.7 kB  db4801449d163d9ef20cc648e82f7369
3.1 kB  95421216fa0ac4cc414ca15c08bad439
1.2 kB  44bf27ef961f1c892cf46d17a32e4b05
4.3 kB  a1dcf88f41327dad781bbca25c3fefa0
2.9 kB  db2ff3e44e9295e114893f7d0c7df9cc
3.1 kB  2a0af14f7e0738e8434dd7e558b8e897
1.2 kB  44bf27ef961f1c892cf46d17a32e4b05
3.3 kB  80d078f7942475f0a3856390e66895d9
2.4 kB  f980f548a782b6f2ce8797804988edf9
2.7 kB  d2b0b08ccbc82cc35f507509d3a515f9
3.4 kB  e44747f3a9b0910679c0da296fbe8544
2.1 kB  e21d41a1222cb170fe9577fff4a0a9d2
3.5 kB  536d83c9038c4d4fa4265caf8399f295
2.0 kB  a346082c968d3cc4a978ec2271de542e
3.3 kB  b5c7876d48bf63d35bf161551c197820
2.3 kB  867e4168b583cb9efb6831d1e4a66ce6
3.1 kB  364326e5e3e69824e7636941f2459d47
1.2 kB  44bf27ef961f1c892cf46d17a32e4b05
1.1 kB  e170850297461ad68c6aace5b62d4eb5
4.8 MB  a5b2de0cdb837eac06f80e7052971f29  (RARE_Dataset.zip)
1.8 kB  00534c12c2f894eeb0803e2bb6604ff7
4.0 kB  809ef710d17fc4c7eae0813b4e882f72