
Published July 14, 2025 | Version v1

LLMShot: Reducing snapshot testing maintenance via LLMs

  • Bilkent University

Description

Replication Package for: LLMShot: Reducing snapshot testing maintenance via LLMs

This package replicates the study presented in the paper "LLMShot: Reducing Snapshot Testing Maintenance via LLMs". The tool analyzes UI snapshot differences using large language models (LLMs) and generates comprehensive reports that identify visual discrepancies between expected and actual UI screenshots.

1. Requirements

Before running this project, ensure you have the following:

  • Python 3.8 or later

  • Ollama (with models: gemma3:4b and gemma3:12b)

  • Pillow

  • NumPy

  • Colorama

2. Installation Instructions

Follow the steps below to set up the project.

  1. Clone the repository:

    git clone https://github.com/yourusername/SnapshotInstructor.git
    cd SnapshotInstructor
  2. Install the required Python packages:

     
    pip install numpy pillow colorama
  3. Install Ollama:

    • Follow the installation instructions on the Ollama website.

    • Download the necessary models:

       
      ollama pull gemma3:4b
      ollama pull gemma3:12b
  4. Prepare your dataset:

    • Create a dataset directory in the project root.

    • Run the snapshot tests in Xcode, then execute generate_dataset.sh.

    • The dataset should contain the following files:

      • reference.png: Expected UI screenshot

      • failure.png: Actual UI screenshot with potential discrepancies

      • diff.png: Visualized differences between the two images

      • metadata.json: Metadata including test details and categories
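As a rough illustration of how a diff.png of this kind can be produced from the reference and failure screenshots, here is a minimal sketch using the Pillow and NumPy dependencies listed above (the function name, threshold parameter, and red highlight are assumptions, not the project's actual implementation):

```python
import numpy as np
from PIL import Image

def make_diff(reference_path, failure_path, diff_path, threshold=0):
    """Highlight pixels that differ between the two screenshots.

    Hypothetical helper: returns the number of differing pixels and
    writes a diff image with the differences painted red.
    """
    ref = np.asarray(Image.open(reference_path).convert("RGB"), dtype=np.int16)
    fail = np.asarray(Image.open(failure_path).convert("RGB"), dtype=np.int16)
    if ref.shape != fail.shape:
        raise ValueError("screenshots must have identical dimensions")
    # Per-pixel mask: True wherever any channel differs by more than the threshold
    changed = np.abs(ref - fail).max(axis=-1) > threshold
    diff = fail.astype(np.uint8).copy()
    diff[changed] = [255, 0, 0]  # paint differing pixels red
    Image.fromarray(diff).save(diff_path)
    return int(changed.sum())
```

A small threshold can absorb anti-aliasing noise so that only meaningful UI changes are highlighted.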

3. Usage Instructions

Interactive Mode:

To start the interactive mode, run:

 
python process_snapshots.py 

This will prompt you with a menu to select from the following options:

  1. Select Model: Choose between the 4b and 12b models

  2. Run Standard Analysis: Identify differences across all snapshots

  3. Run 'Ignore Reason' Analysis (From Analysis): Ignore the primary difference detected in the standard analysis

  4. Run 'Ignore Reason' Analysis (From Metadata): Ignore the first category from metadata

  5. Run 'Analyze and Ignore' Analysis: Analyze and ignore the main difference in one step
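Under the hood, each of these modes has to hand the snapshot images to the local Ollama model. A sketch of assembling a request body for Ollama's /api/generate endpoint (the helper name and prompt wording are hypothetical; the endpoint and base64 image field follow Ollama's documented HTTP API):

```python
import base64
import json

def build_ollama_request(model, prompt, image_paths):
    """Build the JSON body for a POST to http://localhost:11434/api/generate.

    Hypothetical helper: the actual request construction in
    process_snapshots.py may differ.
    """
    images = []
    for path in image_paths:
        with open(path, "rb") as f:
            # Ollama expects images as base64-encoded strings
            images.append(base64.b64encode(f.read()).decode("ascii"))
    return json.dumps({
        "model": model,      # e.g. "gemma3:4b" or "gemma3:12b"
        "prompt": prompt,
        "images": images,
        "stream": False,     # request a single JSON response
    })
```

The resulting string can be POSTed with urllib.request or requests; the model's answer arrives in the response's `response` field.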

Automated Mode:

To run all analyses in batch mode for both models, execute:

 
python process_snapshots.py --all 

This will:

  1. Perform all analysis modes using the 4b model

  2. Perform all analysis modes using the 12b model

  3. Generate a comprehensive report
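The batch run above amounts to a cross product of models and analysis modes. A small sketch of how such a plan could be enumerated (the names are illustrative, not the script's actual internals):

```python
MODELS = ["gemma3:4b", "gemma3:12b"]
MODES = [
    "standard",
    "ignore_reason_from_analysis",
    "ignore_reason_from_metadata",
    "analyze_and_ignore",
]

def batch_plan():
    """List every (model, mode) pair that --all would run, 4b model first."""
    return [(model, mode) for model in MODELS for mode in MODES]
```

Iterating over the plan keeps the run order deterministic, so results for the two models are directly comparable in the final report.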

4. Generating Reports

After completing the analyses, generate a visual HTML report:

 
python generate_report.py 

The report will be created in the reports/ directory and automatically opened in your default web browser.
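A minimal sketch of what the final step of such a report script might look like, rendering an HTML metrics table into reports/ and opening it in the browser (the row shape and file layout are assumptions, not generate_report.py's actual schema):

```python
import webbrowser
from pathlib import Path

def write_report(rows, out_path="reports/report.html", open_browser=True):
    """Write a minimal metrics table as HTML and optionally open it.

    rows: iterable of (mode_name, accuracy) pairs -- an assumed shape.
    """
    cells = "".join(
        f"<tr><td>{mode}</td><td>{accuracy:.1%}</td></tr>" for mode, accuracy in rows
    )
    html = (
        "<html><body><h1>LLMShot report</h1>"
        "<table><tr><th>Mode</th><th>Accuracy</th></tr>"
        f"{cells}</table></body></html>"
    )
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create reports/ if missing
    path.write_text(html, encoding="utf-8")
    if open_browser:
        webbrowser.open(path.resolve().as_uri())  # launch the default browser
    return html
```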

5. Reported Features

  • Metrics Dashboard: View aggregated accuracy and performance metrics for all analysis modes

  • Test Case Browser: Browse through individual test cases, including images and detailed analysis

  • Visual Comparison: Compare reference and failure images with highlighted differences

Files

LLMShot-1040.zip (165.0 MB)
md5:1a977f1fb8997fb1404134809411e1c0