
Published July 14, 2025 | Version v1

LLMShot: Reducing snapshot testing maintenance via LLMs

  • Bilkent University

Description

Replication Package for: LLMShot: Reducing snapshot testing maintenance via LLMs

This package replicates the study presented in the paper "LLMShot: Reducing Snapshot Testing Maintenance via LLMs". The tool analyzes UI snapshot differences using large language models (LLMs) and generates comprehensive reports that identify visual discrepancies between expected and actual UI screenshots.

1. Requirements

Before running this project, ensure you have the following:

  • Python 3.8 or later

  • Ollama (with models: gemma3:4b and gemma3:12b)

  • Pillow

  • NumPy

  • Colorama

2. Installation Instructions

Follow the steps below to set up the project.

  1. Clone the repository:

    git clone https://github.com/yourusername/SnapshotInstructor.git
    cd SnapshotInstructor
  2. Install the required Python packages:

     
    pip install numpy pillow colorama
  3. Install Ollama:

    • Follow the installation instructions on the Ollama website.

    • Download the necessary models:

       
      ollama pull gemma3:4b
      ollama pull gemma3:12b
  4. Prepare your dataset:

    • Create a dataset directory in the project root.

    • Run the snapshot tests in Xcode, then execute generate_dataset.sh.

    • The dataset should contain the following files:

      • reference.png: Expected UI screenshot

      • failure.png: Actual UI screenshot with potential discrepancies

      • diff.png: Visualized differences between the two images

      • metadata.json: Metadata including test details and categories
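As a rough illustration of how a diff.png of this kind can be produced from the reference and failure screenshots, here is a minimal sketch using the Pillow and NumPy dependencies listed above (the function name, threshold parameter, and red highlight are assumptions, not the project's actual implementation):

```python
import numpy as np
from PIL import Image

def make_diff(reference_path, failure_path, diff_path, threshold=0):
    """Highlight pixels that differ between the two screenshots.

    Hypothetical helper: returns the number of differing pixels and
    writes a diff image with the differences painted red.
    """
    ref = np.asarray(Image.open(reference_path).convert("RGB"), dtype=np.int16)
    fail = np.asarray(Image.open(failure_path).convert("RGB"), dtype=np.int16)
    if ref.shape != fail.shape:
        raise ValueError("screenshots must have identical dimensions")
    # Per-pixel mask: True wherever any channel differs by more than the threshold
    changed = np.abs(ref - fail).max(axis=-1) > threshold
    diff = fail.astype(np.uint8).copy()
    diff[changed] = [255, 0, 0]  # paint differing pixels red
    Image.fromarray(diff).save(diff_path)
    return int(changed.sum())
```

A small threshold can absorb anti-aliasing noise so that only meaningful UI changes are highlighted.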

3. Usage Instructions

Interactive Mode:

To start the interactive mode, run:

 
python process_snapshots.py 

This will prompt you with a menu to select from the following options:

  1. Select Model: Choose between the 4b and 12b models

  2. Run Standard Analysis: Identify differences across all snapshots

  3. Run 'Ignore Reason' Analysis (From Analysis): Ignore the primary difference detected in the standard analysis

  4. Run 'Ignore Reason' Analysis (From Metadata): Ignore the first category from metadata

  5. Run 'Analyze and Ignore' Analysis: Analyze and ignore the main difference in one step
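Under the hood, each of these modes has to hand the snapshot images to the local Ollama model. A sketch of assembling a request body for Ollama's /api/generate endpoint (the helper name and prompt wording are hypothetical; the endpoint and base64 image field follow Ollama's documented HTTP API):

```python
import base64
import json

def build_ollama_request(model, prompt, image_paths):
    """Build the JSON body for a POST to http://localhost:11434/api/generate.

    Hypothetical helper: the actual request construction in
    process_snapshots.py may differ.
    """
    images = []
    for path in image_paths:
        with open(path, "rb") as f:
            # Ollama expects images as base64-encoded strings
            images.append(base64.b64encode(f.read()).decode("ascii"))
    return json.dumps({
        "model": model,      # e.g. "gemma3:4b" or "gemma3:12b"
        "prompt": prompt,
        "images": images,
        "stream": False,     # request a single JSON response
    })
```

The resulting string can be POSTed with urllib.request or requests; the model's answer arrives in the response's `response` field.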

Automated Mode:

To run all analyses in batch mode for both models, execute:

 
python process_snapshots.py --all 

This will:

  1. Perform all analysis modes using the 4b model

  2. Perform all analysis modes using the 12b model

  3. Generate a comprehensive report
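The batch run above amounts to a cross product of models and analysis modes. A small sketch of how such a plan could be enumerated (the names are illustrative, not the script's actual internals):

```python
MODELS = ["gemma3:4b", "gemma3:12b"]
MODES = [
    "standard",
    "ignore_reason_from_analysis",
    "ignore_reason_from_metadata",
    "analyze_and_ignore",
]

def batch_plan():
    """List every (model, mode) pair that --all would run, 4b model first."""
    return [(model, mode) for model in MODELS for mode in MODES]
```

Iterating over the plan keeps the run order deterministic, so results for the two models are directly comparable in the final report.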

4. Generating Reports

After completing the analyses, generate a visual HTML report:

 
python generate_report.py 

The report will be created in the reports/ directory and automatically opened in your default web browser.
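A minimal sketch of what the final step of such a report script might look like, rendering an HTML metrics table into reports/ and opening it in the browser (the row shape and file layout are assumptions, not generate_report.py's actual schema):

```python
import webbrowser
from pathlib import Path

def write_report(rows, out_path="reports/report.html", open_browser=True):
    """Write a minimal metrics table as HTML and optionally open it.

    rows: iterable of (mode_name, accuracy) pairs -- an assumed shape.
    """
    cells = "".join(
        f"<tr><td>{mode}</td><td>{accuracy:.1%}</td></tr>" for mode, accuracy in rows
    )
    html = (
        "<html><body><h1>LLMShot report</h1>"
        "<table><tr><th>Mode</th><th>Accuracy</th></tr>"
        f"{cells}</table></body></html>"
    )
    path = Path(out_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create reports/ if missing
    path.write_text(html, encoding="utf-8")
    if open_browser:
        webbrowser.open(path.resolve().as_uri())  # launch the default browser
    return html
```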

5. Reported Features

  • Metrics Dashboard: View aggregated accuracy and performance metrics for all analysis modes

  • Test Case Browser: Browse through individual test cases, including images and detailed analysis

  • Visual Comparison: Compare reference and failure images with highlighted differences

Files

LLMShot-1040.zip (165.0 MB)
md5:1a977f1fb8997fb1404134809411e1c0