RARE (Repository for App review REfinement)

Singh, Amrita

doi:10.5281/zenodo.13939427

Published October 16, 2024 | Version v2

Dataset Open

Singh, Amrita¹

1. TCS Research

# This directory contains the RARE benchmark dataset and the code files described in the paper:

- RARE_Dataset: This folder introduces RARE, a benchmark for app review refinement. It contains two subfolders, Gold_Corpus and Silver_Corpus.

1. Gold_Corpus: This folder provides a corpus of 10,000 annotated reviews, sourced from 10 different application domains and collaboratively refined by software engineers and a large language model (LLM).

2. Silver_Corpus: This folder contains the silver corpus, a set of 10,000 reviews refined automatically by the best-performing model, Flan-T5, which was trained on the 10,000 reviews of the gold corpus.

- Code_File: This folder provides all the code files used in the experiments. It contains four subfolders: Data_Extraction, Refined_Review_Generation_through_Prompting, Model_Finetuning_and_Inferences, and Result_Evaluation.

1. Data_Extraction: This folder contains two Python files: 'Google_Play_Store_Reviews_Extraction_from_10_different_App.py', used to extract 10,000 raw reviews from the Google Play Store, and 'Apple_App_Store_Reviews_Extraction_from_10_different_App.py', used to extract 10,000 raw reviews from the Apple App Store.

2. Refined_Review_Generation_through_Prompting: This folder contains one Python file, 'Prompting_GPT_3.5_TURBO_For_Refined_Review_Generation.py', used to prompt GPT-3.5-Turbo to generate refined versions of the raw reviews.

3. Model_Finetuning_and_Inferences: This folder contains 16 Python files, one for fine-tuning and one for inference for each of eight models: BART, Flan-T5, Pegasus, Llama-2, Falcon, Mistral, Orca-2, and Gemma.

4. Result_Evaluation: This folder contains two Python files: 'Reference_Free_Automatic_Metrics_Evaluation.py', which computes the reference-free metrics FKGL, FKRE, LEN, and SS, and 'Reference_Based_Automatic_Metrics_Evaluation.py', which computes the reference-based metrics SARI and BERTScore Precision.
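To illustrate what a reference-free readability metric measures, here is a minimal sketch in Python. It is not the code from 'Reference_Free_Automatic_Metrics_Evaluation.py'. It assumes that FKGL denotes the Flesch-Kincaid Grade Level and FKRE the Flesch Reading Ease, and it uses the standard published formulas for both. The function names are hypothetical, and the syllable counter is a crude vowel-group heuristic, so scores may differ from those of a dedicated readability library.

```python
import re
from typing import Tuple


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption): one syllable per run of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_stats(text: str) -> Tuple[int, int, int]:
    """Return (sentence count, word count, syllable count) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level; lower values mean easier text."""
    n_sent, n_words, n_syll = text_stats(text)
    if n_sent == 0 or n_words == 0:
        return 0.0
    return 0.39 * (n_words / n_sent) + 11.8 * (n_syll / n_words) - 15.59


def fkre(text: str) -> float:
    """Flesch Reading Ease; higher values mean easier text."""
    n_sent, n_words, n_syll = text_stats(text)
    if n_sent == 0 or n_words == 0:
        return 0.0
    return 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (n_syll / n_words)


if __name__ == "__main__":
    # Hypothetical raw and refined reviews, invented for illustration only.
    raw = "app keeps crashing everytime i open it after the update pls fix asap!!!"
    refined = "The app crashes every time I open it after the update. Please fix it."
    for label, review in (("raw", raw), ("refined", refined)):
        print(label, round(fkgl(review), 2), round(fkre(review), 2))
```

Both scores depend only on the text being evaluated, which is why they need no gold-standard reference, unlike SARI and BERTScore.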

Files (4.9 MB)

| Name | MD5 | Size |
| --- | --- | --- |
| Apple_App_Store_Reviews_Extraction_from_10_different_App.py | db4801449d163d9ef20cc648e82f7369 | 7.7 kB |
| BART_Finetuning.py | 95421216fa0ac4cc414ca15c08bad439 | 3.1 kB |
| BART_Inferences.py | 44bf27ef961f1c892cf46d17a32e4b05 | 1.2 kB |
| Falcon_Finetuning.py | a1dcf88f41327dad781bbca25c3fefa0 | 4.3 kB |
| Falcon_Inferences.py | db2ff3e44e9295e114893f7d0c7df9cc | 2.9 kB |
| Flan_T5_Finetuning.py | 2a0af14f7e0738e8434dd7e558b8e897 | 3.1 kB |
| Flan_T5_Inferences.py | 44bf27ef961f1c892cf46d17a32e4b05 | 1.2 kB |
| Gemma_Finetuning.py | 80d078f7942475f0a3856390e66895d9 | 3.3 kB |
| Gemma_Inferences.py | f980f548a782b6f2ce8797804988edf9 | 2.4 kB |
| Google_Play_Store_Reviews_Extraction_from_10_different_App.py | d2b0b08ccbc82cc35f507509d3a515f9 | 2.7 kB |
| Llama_2_Finetuning.py | e44747f3a9b0910679c0da296fbe8544 | 3.4 kB |
| Llama_2_Inferences.py | e21d41a1222cb170fe9577fff4a0a9d2 | 2.1 kB |
| Mistral_Finetuning.py | 536d83c9038c4d4fa4265caf8399f295 | 3.5 kB |
| Mistral_Inferences.py | a346082c968d3cc4a978ec2271de542e | 2.0 kB |
| Orca_2_Finetuning.py | b5c7876d48bf63d35bf161551c197820 | 3.3 kB |
| Orca_2_Inferences.py | 867e4168b583cb9efb6831d1e4a66ce6 | 2.3 kB |
| Pegasus_Finetuning.py | 364326e5e3e69824e7636941f2459d47 | 3.1 kB |
| Pegasus_Inferences.py | 44bf27ef961f1c892cf46d17a32e4b05 | 1.2 kB |
| Prompting_GPT_3.5_TURBO_For_Refined_Review_Generation.py | e170850297461ad68c6aace5b62d4eb5 | 1.1 kB |
| RARE_Dataset.zip | a5b2de0cdb837eac06f80e7052971f29 | 4.8 MB |
| Reference_Based_Automatic_Metrics_Evaluation.py | 00534c12c2f894eeb0803e2bb6604ff7 | 1.8 kB |
| Reference_Free_Automatic_Metrics_Evaluation.py | 809ef710d17fc4c7eae0813b4e882f72 | 4.0 kB |
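The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify` are hypothetical, and the example assumes the file was saved under its listed name in the current working directory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5: str) -> bool:
    """Compare a file's MD5 digest with the expected value, ignoring letter case."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Checksum copied from the file listing; the local path is an assumption.
    archive = Path("RARE_Dataset.zip")
    if archive.exists():
        ok = verify(archive, "a5b2de0cdb837eac06f80e7052971f29")
        print("checksum OK" if ok else "checksum mismatch")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.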
