LLMShot: Reducing Snapshot Testing Maintenance via LLMs
Description
Replication Package for: LLMShot: Reducing Snapshot Testing Maintenance via LLMs
This package replicates the study presented in the paper "LLMShot: Reducing Snapshot Testing Maintenance via LLMs". The tool analyzes UI snapshot differences using large language models (LLMs) and generates comprehensive reports that identify visual discrepancies between expected and actual UI screenshots.
1. Requirements
Before running this project, ensure you have the following:
- Python 3.8 or later
- Ollama (with the gemma3:4b and gemma3:12b models)
- Pillow
- NumPy
- Colorama
2. Installation Instructions
Follow the steps below to set up the project.
- Clone the repository:
  git clone https://github.com/yourusername/SnapshotInstructor.git
  cd SnapshotInstructor
- Install the required Python packages:
  pip install numpy pillow colorama
- Install Ollama:
  - Follow the installation instructions on the Ollama website.
  - Download the necessary models:
    ollama pull gemma3:4b
    ollama pull gemma3:12b
- Prepare your dataset:
  - Create a dataset directory in the project root.
  - Execute the tests in Xcode and run generate_dataset.sh.
  - The dataset should contain the following files (a small validation sketch follows these steps):
    - reference.png: Expected UI screenshot
    - failure.png: Actual UI screenshot with potential discrepancies
    - diff.png: Visualized differences between the two images
    - metadata.json: Metadata including test details and categories
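Before running the analyses, it can be useful to check that the generated dataset actually contains the expected files. The sketch below assumes one sub-directory per test case under dataset/ (the exact layout produced by generate_dataset.sh may differ) and only checks for the four files listed above.

```python
from pathlib import Path

# Files each test case is expected to provide (see the list above).
REQUIRED_FILES = {"reference.png", "failure.png", "diff.png", "metadata.json"}

def validate_dataset(root: str = "dataset") -> None:
    """Report test-case folders that are missing any of the required files."""
    for case_dir in sorted(Path(root).iterdir()):
        if not case_dir.is_dir():
            continue
        missing = REQUIRED_FILES - {entry.name for entry in case_dir.iterdir()}
        status = "OK" if not missing else f"missing {', '.join(sorted(missing))}"
        print(f"{case_dir.name}: {status}")

if __name__ == "__main__":
    validate_dataset()
```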
3. Usage Instructions
Interactive Mode:
To start the interactive mode, run:
python process_snapshots.py

This will prompt you with a menu to select from the following options:
- Select Model: Choose between the 4b and 12b models
- Run Standard Analysis: Identify differences across all snapshots
- Run 'Ignore Reason' Analysis (From Analysis): Ignore the primary difference detected in the standard analysis
- Run 'Ignore Reason' Analysis (From Metadata): Ignore the first category from the metadata
- Run 'Analyze and Ignore' Analysis: Analyze and ignore the main difference in one step
Automated Mode:
To run all analyses in batch mode for both models, execute:
python process_snapshots.py --all

This will:
- Perform all analysis modes using the 4b model
- Perform all analysis modes using the 12b model
- Generate a comprehensive report
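Each analysis mode sends the snapshot images to a locally running Ollama model. Purely as an illustration of that interaction, the sketch below submits a reference/failure pair to Ollama's standard REST endpoint; the prompt, parameters, and test-case path are placeholders and not the ones used by process_snapshots.py.

```python
import base64
import json
from pathlib import Path
from urllib.request import Request, urlopen

def encode_image(path: Path) -> str:
    """Base64-encode an image so it can be sent to a multimodal model."""
    return base64.b64encode(path.read_bytes()).decode("ascii")

def analyze_snapshot(case_dir: str, model: str = "gemma3:4b") -> str:
    """Ask a local Ollama model to describe differences between two snapshots."""
    case = Path(case_dir)
    payload = {
        "model": model,
        "prompt": "Describe the visual differences between these two UI screenshots.",
        "images": [encode_image(case / "reference.png"),
                   encode_image(case / "failure.png")],
        "stream": False,
    }
    request = Request(
        "http://localhost:11434/api/generate",  # default Ollama REST endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request) as response:
        return json.loads(response.read())["response"]

if __name__ == "__main__":
    print(analyze_snapshot("dataset/example_case"))  # placeholder test case
```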
4. Generating Reports
After completing the analyses, generate a visual HTML report:
python generate_report.py

The report will be created in the reports/ directory and automatically opened in your default web browser.
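If the report does not open automatically, it can be opened manually, for example with Python's standard webbrowser module (the report filename inside reports/ is an assumption here):

```python
import webbrowser
from pathlib import Path

# Open a generated HTML report in the default browser; the filename is illustrative.
report = Path("reports/index.html").resolve()
webbrowser.open(report.as_uri())
```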
5. Report Features
- Metrics Dashboard: View aggregated accuracy and performance metrics for all analysis modes
- Test Case Browser: Browse through individual test cases, including images and detailed analysis
- Visual Comparison: Compare reference and failure images with highlighted differences
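For intuition, the highlighted-difference view can be thought of as a pixel-wise comparison of the two screenshots. The sketch below uses Pillow and NumPy (both required by the project) to mark differing pixels in red; it is a conceptual illustration, not the code that produces diff.png, and it assumes both screenshots have the same dimensions.

```python
import numpy as np
from PIL import Image, ImageChops

def highlight_differences(reference_path: str, failure_path: str, out_path: str) -> None:
    """Overlay pixels that differ between two same-sized screenshots in red."""
    reference = Image.open(reference_path).convert("RGB")
    failure = Image.open(failure_path).convert("RGB")

    # Per-pixel absolute difference; any non-zero channel marks a changed pixel.
    diff = np.asarray(ImageChops.difference(reference, failure))
    changed = np.any(diff > 0, axis=-1)

    # Paint the changed pixels red on top of the failure screenshot.
    overlay = np.asarray(failure).copy()
    overlay[changed] = [255, 0, 0]
    Image.fromarray(overlay).save(out_path)

# Paths are illustrative placeholders.
highlight_differences("reference.png", "failure.png", "highlighted_diff.png")
```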
Files
LLMShot-1040.zip (165.0 MB)
md5:1a977f1fb8997fb1404134809411e1c0