Testing Static Analyzers via Semantic-Preserving Mutators Learned from Real-World Refactoring Practice

Anonymous

doi:10.5281/zenodo.19350077

Published March 31, 2026 | Version v1

SAFuzzer

1. SAFuzzer Project Introduction

SAFuzzer is a framework for testing Static Application Security Testing (SAST) tools through semantic-preserving code mutations. The framework runs a three-phase pipeline:

1. Mutator Invention: Mines patterns from real-world refactoring commits and transforms them into executable Spoon-based mutators via LLM agents
2. Mutator Refinement: Validates each mutator's semantic-preservation guarantee through rigorous dynamic equivalence checking
3. Static Analyzer Testing: Applies validated mutators at scale to test static analyzers via metamorphic testing

SAFuzzer supports mainstream SAST tools including SpotBugs, PMD, Infer, CheckStyle, and SonarQube. The framework uses Java Spoon for AST manipulation.
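The metamorphic relation behind the third phase can be illustrated with a minimal sketch. Assuming a validated mutator really preserves semantics, the rule IDs reported for the seed and for the mutant should agree; a rule that disappears is a candidate false negative, and a rule that newly appears is a candidate false positive. The class and method names below are hypothetical and are not taken from the SAFuzzer code base.

```java
import java.util.Set;
import java.util.TreeSet;

final class MetamorphicOracle {
    /** Rules reported on the seed but missing on the mutant (candidate false negatives). */
    static Set<String> disappeared(Set<String> seedRules, Set<String> mutantRules) {
        Set<String> diff = new TreeSet<>(seedRules);
        diff.removeAll(mutantRules);
        return diff;
    }

    /** Rules reported only on the mutant (candidate false positives). */
    static Set<String> appeared(Set<String> seedRules, Set<String> mutantRules) {
        Set<String> diff = new TreeSet<>(mutantRules);
        diff.removeAll(seedRules);
        return diff;
    }

    public static void main(String[] args) {
        // Hypothetical reports: the seed triggers one rule, the mutant triggers none.
        Set<String> seed = Set.of("IM_BAD_CHECK_FOR_ODD");
        Set<String> mutant = Set.of();
        System.out.println("disappeared: " + disappeared(seed, mutant));
        System.out.println("appeared: " + appeared(seed, mutant));
    }
}
```

Any difference is only a candidate: it still needs manual inspection before being reported to the tool maintainers.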

2. Top 10 Mutators Causing Bugs

The following ten mutators most frequently exposed bugs in SAST tools during testing:

Rank	Mutator Name	Issue Count	Issues
1	EqualityCheckToInstanceofMutator	4	SpotBugs #3916, Infer #2001, Sonar S2259, PMD #6513
2	ConditionalBlockInsertionMutator	4	SpotBugs #3894, Infer #2015, SpotBugs #3929, PMD #6518
3	ParenthesesAdditionMutator	4	SpotBugs #3904, Infer #2015, PMD #6491, CheckStyle #19162
4	IfConditionReorderingMutator	3	SpotBugs #3886, SpotBugs #3920, SpotBugs #3963
5	VariableAdditionMutator	3	SpotBugs #3884, Infer #2015, Infer #1993
6	NullCheckReorderingMutator	3	SpotBugs #3920, SpotBugs #3886, SpotBugs #3916
7	ConditionalLogicInsertionMutator	2	PMD #6518, SpotBugs #3978
8	MethodChainCallSwapMutator	2	SpotBugs #3966, PMD #6494
9	ConditionNegationMutator	2	PMD #6435, SpotBugs #3963
10	SingleLineReturnToBlockReturnMutator	2	PMD #6491, PMD #6519

3. SAFuzzer Usage Guide

Project Architecture Overview

SAFuzzer consists of three main components:

Semantic_Equivalence_Knowledge_Base: A Python-based pipeline for mining semantic-preserving code patterns from real-world refactoring commits using LLM agents and dynamic execution validation. This component extracts transformation patterns from GitHub commits and validates their semantic equivalence.
MutatorExecutor: A standalone Maven project containing the mutator implementations and a semantic-equivalence validation module. The knowledge base pipeline uses this component to dynamically validate the semantic equivalence of code transformations.
Main SAFuzzer Framework: The core Java application that applies validated mutators to test SAST tools via metamorphic testing. This is the primary tool for detecting bugs in static analyzers.

Mutator Generation and Validation Pipeline

The framework includes three Python scripts that automate the mutator invention and refinement process:

1. Stage 1: Mutator Generation (`stage1_generator.py`)

Input: Code pairs from GitHub refactoring commits (raw_diffs_chunk_*_output.json)
Output: Mutator descriptions (JSON) and Java implementations
Process:
1. Uses LLM to analyze code pairs and generate mutator descriptions
2. Converts descriptions into executable Spoon-based Java mutators

Output Structure:

outputs/
├── 1_mutator_description/      # JSON descriptions
└── 2_mutator_implementation/   # Java implementations

2. Stage 2: Compilation Verification (`stage2_compilation_verification.py`)

Input: Java mutators from Stage 1
Output: Compilable mutators
Process:
1. Deploys mutators to sandbox environment
2. Validates compilation using javac
3. Automatically repairs compilation errors using LLM agents (max 5 attempts)
Key Features:
- Parallel processing (16 workers)
- Sandbox isolation for each mutator
- Intelligent repair with specialized tools
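The compile-then-repair process above can be sketched as a bounded loop. This is an illustration only: the `compiles` check and the `repair` step are passed in as plain functions standing in for javac and the LLM agent, and reading "max 5 attempts" as five repair rounds after the initial compilation is an assumption. All names are hypothetical.

```java
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

final class RepairLoop {
    // Assumed meaning of "max 5 attempts": up to five repair rounds.
    static final int MAX_ATTEMPTS = 5;

    /** Returns the first compilable version of the source, or empty if repair gives up. */
    static Optional<String> compileWithRepair(String source,
                                              Predicate<String> compiles,
                                              UnaryOperator<String> repair) {
        String current = source;
        if (compiles.test(current)) {
            return Optional.of(current); // already compilable, no repair needed
        }
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            current = repair.apply(current);
            if (compiles.test(current)) {
                return Optional.of(current);
            }
        }
        return Optional.empty(); // still broken after the last repair round
    }

    public static void main(String[] args) {
        // Toy stand-ins: "compiles" once the text ends with a semicolon.
        Optional<String> fixed = compileWithRepair("int x = 1",
                s -> s.endsWith(";"), s -> s + ";");
        System.out.println(fixed.orElse("<unrepaired>"));
    }
}
```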

Output Structure:

outputs/compilable_3-7/        # Compilable mutators

3. Stage 3: Fast Semantic Verification (`stage3_fast_verify.py`)

Input: Compilable mutators + test seeds
Output: Validation results + repair datasets
Process:
1. Phase 1: Quick verification with 200 seeds
2. Phase 2: Extends to 500 seeds if pass rate < 90%
3. Phase 3: Extends to 1000 seeds if zero triggers
4. Repair loop: LLM-driven repair for failed mutators (max 3 attempts)
Validation Criteria:
- Pass: Trigger rate > 0 AND pass rate meets the threshold (≥ 90% on the initial 200 seeds, or ≥ 80% on extended seeds)
- Fail: Pass-rate threshold not met OR zero triggers
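The validation criteria can be written as a small decision function. The sketch below assumes that a "trigger" is a seed the mutator actually changed and that the pass rate is computed over triggered seeds only; neither definition is spelled out above, and the class and method names are hypothetical.

```java
final class Stage3Verdict {
    static final int QUICK_THRESHOLD_PERCENT = 90;    // initial 200-seed phase
    static final int EXTENDED_THRESHOLD_PERCENT = 80; // 500- or 1000-seed phases

    /**
     * @param triggered seeds on which the mutator applied (assumed definition)
     * @param passed    triggered seeds whose mutant behaved like the original
     * @param extended  true if the run went beyond the initial 200 seeds
     */
    static boolean passes(int triggered, int passed, boolean extended) {
        if (triggered <= 0) {
            return false; // zero triggers always fails
        }
        int threshold = extended ? EXTENDED_THRESHOLD_PERCENT : QUICK_THRESHOLD_PERCENT;
        // Integer comparison avoids floating-point rounding at the boundary.
        return passed * 100 >= triggered * threshold;
    }

    public static void main(String[] args) {
        System.out.println(passes(120, 110, false)); // quick phase
        System.out.println(passes(300, 245, true));  // extended phase
        System.out.println(passes(0, 0, true));      // never triggered
    }
}
```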

Output Structure:

lists/
├── success_list_v2.txt        # Successfully validated mutators
└── fail_list_v2.txt          # Failed mutators

refine_dataset/                # Detailed repair datasets (JSON)

Running the Pipeline

# Step 1: Generate mutators from refactoring commits
python stage1_generator.py

# Step 2: Verify and repair compilation errors
python stage2_compilation_verification.py [start_chunk]

# Step 3: Validate semantic preservation
python stage3_fast_verify.py

Environment Requirements

Java 17 or higher (Maven compilation target)
Python 3.8+ (for analysis scripts in Semantic_Equivalence_Knowledge_Base)
Maven 3.6+ for building the project
8GB RAM minimum, 16GB RAM recommended
20GB free disk space for generated mutants and results

Quick Start with the Core Framework Package

Step 1: Extract and Setup

# Extract the ZIP file
unzip SAFuzzer_Core_Framework_*.zip
cd SAFuzzer_Core_Framework

# Make scripts executable
chmod +x run_complete_pipeline.sh test_pipeline_quick.sh
chmod +x Semantic_Equivalence_Knowledge_Base/run_pipeline.sh

Step 2: Install SAST Tools

Before running SAFuzzer, you need to install the SAST tools. Follow the instructions in tools/README.md to download and install:

SpotBugs 4.9.8
PMD 7.22.0
CheckStyle 13.3.0
Infer 1.2.0
SonarQube Scanner 8.0.1

Step 3: Configure Tool Paths

# Copy the configuration template
cp config.properties.template config.properties

# Edit config.properties with your tool paths
nano config.properties  # or use your favorite editor

Update the paths in config.properties:

spotbugs.jar.path=/absolute/path/to/spotbugs-4.9.8/lib/spotbugs.jar
pmd.cli.path=/absolute/path/to/pmd-bin-7.22.0/bin/pmd
checkstyle.jar.path=/absolute/path/to/checkstyle-13.3.0-all.jar
infer.cli.path=/absolute/path/to/infer-linux-x86_64-v1.2.0/bin/infer
sonar.scanner.path=/absolute/path/to/sonar-scanner-8.0.1.6346-linux-x64/bin/sonar-scanner
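Before a long run it can help to check that every tool path is filled in. The following is a minimal sketch using `java.util.Properties`; the key names come from the listing above, but treating all five as mandatory is an assumption, and the `ConfigCheck` class is hypothetical rather than part of SAFuzzer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

final class ConfigCheck {
    // Keys from the config.properties listing; requiring all of them is an assumption.
    static final String[] TOOL_KEYS = {
        "spotbugs.jar.path", "pmd.cli.path", "checkstyle.jar.path",
        "infer.cli.path", "sonar.scanner.path"
    };

    /** Returns the tool keys that are absent or blank in the given properties text. */
    static List<String> missingKeys(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : TOOL_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String sample = "spotbugs.jar.path=/opt/spotbugs-4.9.8/lib/spotbugs.jar\n"
                      + "pmd.cli.path=\n";
        System.out.println("Missing or blank: " + missingKeys(sample));
    }
}
```

In practice the text would be read from `config.properties` instead of an inline string.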

Step 4: Build the Project

# Build main SAFuzzer framework
mvn clean compile package

# Build MutatorExecutor (for semantic validation)
cd MutatorExecutor
mvn clean compile
cd ..

Step 5: Install Python Dependencies

# Install required Python packages
pip install -r Semantic_Equivalence_Knowledge_Base/requirements.txt

Step 6: Run Quick Verification Test

# Test if everything works correctly
./test_pipeline_quick.sh

If all tests pass, you're ready to run the full pipeline!

Running the Complete Pipeline

Option A: Run All Three Stages (Recommended)

# This runs the complete SAFuzzer pipeline end-to-end
./run_complete_pipeline.sh

The script will:

1. Check environment and dependencies
2. Build the project if needed
3. Run the Semantic Equivalence Knowledge Base pipeline (Stage 1)
4. Validate mutators using MutatorExecutor (Stage 2)
5. Test SAST tools with validated mutators (Stage 3)
6. Generate results and summary

Option B: Run Individual Stages

Stage 1: Mutator Invention (Pattern Mining)

cd Semantic_Equivalence_Knowledge_Base
./run_pipeline.sh

This stage mines refactoring patterns from GitHub commits. Note: This requires GitHub API access and may take several hours.

Stage 2: Mutator Refinement (Semantic Validation)

cd MutatorExecutor
mvn compile
# The validation is integrated into the Stage 1 pipeline

Stage 3: Testing Static Analyzers

# Test a specific test case with SpotBugs
java -cp "target/SASTFuzz-1.0-.jar:target/classes:target/dependency/*" \
  com.mutation.Main \
  --project_path "." \
  --target_case "seeds.PMD_Seeds.bestpractices_AccessorClassGeneration.AccessorClassGeneration1" \
  --target_SAST "SpotBugs" \
  --max_iter 10

# Test all SAST tools on a test case
java -cp "target/SASTFuzz-1.0-.jar:target/classes:target/dependency/*" \
  com.mutation.Main \
  --project_path "." \
  --target_case "seeds.SpotBugs_Seeds.bestpractices_ArrayIsStoredDirectly.ArrayIsStoredDirectly1" \
  --target_SAST "ALL" \
  --max_iter 20

Command Line Parameters

--project_path <arg>      Source code root directory (required)
--target_case <arg>       Target Java class (package.ClassName format) (required)
--target_SAST <arg>       SAST tool to test: SpotBugs, PMD, CheckStyle, 
                          Infer, SonarQube, Semgrep, or ALL (required)
--max_iter <arg>          Maximum mutation iterations (default: 50)

Output Structure

Results are organized in results/run_YYYYMMDD_HHMMSS/:

safuzzer_output.log: Complete execution log
final_results/: Generated mutants and SAST reports
- 0/: Original seed code with baseline SAST analysis
- 1..N/: Each iteration's mutated code and SAST results
- iteration_history.txt: Trace of applied mutators
verification_summary.txt: Pipeline verification results

Advanced Configuration

Custom Mutator Selection

The framework automatically selects from all available mutators. To modify mutator behavior, edit the Scheduler.run() method in src/com/mutation/Scheduler.java.

Rule Coverage Experiment

Enable JaCoCo coverage measurement in config.properties:

jacoco.enabled=true
jacoco.agent.path=/path/to/jacoco-agent.jar
jacoco.cli.path=/path/to/jacoco-cli.jar

Custom SAST Tool Integration

Implement new SAST tool classes extending the SAST abstract class in src/com/mutation/config/.

4. Detected Bug Case Demonstrations

Case 1: PMD `SimplifyConditional` False Negative (#6513)

Bug Description: PMD fails to detect a redundant null check before instanceof when additional conditions are interleaved in the && chain by a semantic-preserving mutation.

Original Code (PMD correctly reports SimplifyConditional):

public class SimplifyConditionalDemo {
    public void foo() {
        String s = "a";
        if (s != null && s instanceof String) {  // <- SimplifyConditional reported (TP)
            System.out.println(s);
        }
    }
}

Mutated Code (PMD silently misses the bug):

public class SimplifyConditionalDemo {
    public void foo() {
        String s = "a";
        String s2 = "a";
        if (s != null && s2 != null && s instanceof String) {  // <- null check still redundant, but NOT reported (FN)
            System.out.println(s);
        }
    }
}

Triggering Mutator: NonNullVarRedundantNullCheckMutator — inserts an additional s2 != null guard into an existing && chain, a common defensive coding pattern that does not change the semantics of the original condition.

Analysis: In both cases the s != null check before s instanceof String is completely redundant, since instanceof already evaluates to false for null. PMD's SimplifyConditional rule only matches the pattern when the null check and the instanceof test are directly adjacent in the && chain. Once any other condition is inserted between them, the rule no longer relates the two operands and produces a False Negative. The issue was reported on Mar 20, 2026 and is still open.
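The redundancy claimed in the analysis can be checked directly. The sketch below (a hypothetical demo class, not part of the bug report) compares the guarded and unguarded forms on a null, a String, and a non-String value; the parameter is typed as Object so that the instanceof test is not trivially true.

```java
final class RedundantNullCheckDemo {
    /** The pattern flagged by SimplifyConditional: null check plus instanceof. */
    static boolean guarded(Object o) {
        return o != null && o instanceof String;
    }

    /** The simplified form: instanceof alone is already false for null. */
    static boolean plain(Object o) {
        return o instanceof String;
    }

    public static void main(String[] args) {
        Object[] samples = {null, "a", Integer.valueOf(42)};
        for (Object o : samples) {
            System.out.println(o + ": guarded=" + guarded(o) + ", plain=" + plain(o));
        }
    }
}
```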

Case 2: SpotBugs `IM_BAD_CHECK_FOR_ODD` False Negative (#3886)

Bug Description: SpotBugs fails to detect the incorrect odd-number check pattern when the condition operands are reordered into Yoda-style by a semantic-preserving mutation.

Original Code (SpotBugs correctly reports IM_BAD_CHECK_FOR_ODD):

public class TestModulo {
    public void standardCheck(int i) {
        if (i % 2 == 1) {  // <- IM_BAD_CHECK_FOR_ODD reported (TP)
            System.out.println("Odd");
        }
    }
}

Mutated Code (SpotBugs silently misses the bug):

public class TestModulo {
    public void yodaCheck(int i) {
        if (1 == i % 2) {  // <- semantically identical, but IM_BAD_CHECK_FOR_ODD NOT reported (FN)
            System.out.println("Odd");
        }
    }
}

Triggering Mutator: IfConditionReorderingMutator — rewrites <expr> == <literal> into the Yoda-style <literal> == <expr>, a common and semantically equivalent code transformation.

Analysis: Both i % 2 == 1 and 1 == i % 2 are semantically identical and share the same bug: this check incorrectly returns false for negative odd integers (e.g., -3 % 2 == -1, not 1). SpotBugs' IM_BAD_CHECK_FOR_ODD detector only matches the canonical operand order and fails to recognize the Yoda variant, resulting in a False Negative. This bug was subsequently fixed via PR #3935.
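The underlying defect is easy to reproduce. In the sketch below (a hypothetical demo class), both operand orders misclassify negative odd numbers because Java's % operator keeps the sign of the dividend, while comparing the remainder against zero works for all integers.

```java
final class OddCheckDemo {
    /** Canonical form flagged by IM_BAD_CHECK_FOR_ODD. */
    static boolean buggyIsOdd(int i) {
        return i % 2 == 1;
    }

    /** Yoda-style form produced by the mutator; same behavior, same defect. */
    static boolean yodaIsOdd(int i) {
        return 1 == i % 2;
    }

    /** A common fix: any non-zero remainder means the number is odd. */
    static boolean correctIsOdd(int i) {
        return i % 2 != 0;
    }

    public static void main(String[] args) {
        int i = -3; // -3 % 2 evaluates to -1 in Java
        System.out.println(buggyIsOdd(i) + " " + yodaIsOdd(i) + " " + correctIsOdd(i));
        // prints "false false true"
    }
}
```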

Case 3: PMD `ForLoopCanBeForeach` False Negative (#6495)

Bug Description: PMD fails to detect that a traditional index-based for loop can be replaced by an enhanced foreach loop when the array length is first extracted into a pre-declared local variable by a semantic-preserving mutation.

Original Code (PMD correctly reports ForLoopCanBeForeach):

public class PMD_FN_Demo {
    public void testTruePositive(long[] counts) {
        double total = 0;
        for (int i = 0; i < counts.length; i++) {  // <- ForLoopCanBeForeach reported (TP)
            total += counts[i];
        }
    }
}

Mutated Code (PMD silently misses the bug):

public class PMD_FN_Demo {
    public void testFalseNegative(long[] counts) {
        double total = 0;
        int len = counts.length;               // array length extracted to a local variable
        for (int i = 0; i < len; i++) {        // <- semantically identical, but ForLoopCanBeForeach NOT reported (FN)
            total += counts[i];
        }
    }
}

Triggering Mutator: ConditionalBlockInsertionMutator (combined with loop bound extraction) — hoists the array.length expression into a pre-declared local variable, a standard performance-oriented refactoring that does not change loop semantics.

Analysis: Both loops iterate over the entire array in the same order and produce identical results. PMD's ForLoopCanBeForeach rule performs pattern matching on the loop condition and expects i < array.length literally in the for header. When the bound is stored in an intermediate variable len, the rule's detector fails to trace back to the array and misses the violation. A PR (#6521) has been submitted to address this.
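For reference, the three loop shapes can be compared side by side. This sketch (a hypothetical demo class) returns the running total so that the indexed loop, the hoisted-bound loop, and the enhanced for loop that the rule recommends can be checked against each other.

```java
final class ForeachEquivalenceDemo {
    /** Original form: bound read from counts.length in the loop header. */
    static double indexed(long[] counts) {
        double total = 0;
        for (int i = 0; i < counts.length; i++) {
            total += counts[i];
        }
        return total;
    }

    /** Mutated form: bound hoisted into a local variable. */
    static double hoisted(long[] counts) {
        double total = 0;
        int len = counts.length;
        for (int i = 0; i < len; i++) {
            total += counts[i];
        }
        return total;
    }

    /** The replacement suggested by ForLoopCanBeForeach. */
    static double enhanced(long[] counts) {
        double total = 0;
        for (long count : counts) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] counts = {1L, 2L, 3L};
        System.out.println(indexed(counts) + " " + hoisted(counts) + " " + enhanced(counts));
    }
}
```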

5. Bugs Summary Table

Bug Statistics Overview

The following table summarizes bugs detected across different SAST tools and their current status:

Issue Status	SpotBugs	PMD	Infer	SonarQube	CheckStyle	Overall
Reported	18	10	8	4	2	42
Confirmed	12	6	0	3	1	22
Fixed	2	0	0	0	1	3
Won't Fix	1	0	0	0	0	1

Bug Details

Bug Type	Rule	Status	Issue ID	Issue Link	Rule Link
FN	NN_NAKED_NOTIFY	Reported	#3884	Link	Rule
FN	IM_BAD_CHECK_FOR_ODD	Fixed	#3886	Link	Rule
FN	ST_WRITE_TO_STATIC_FROM_INSTANCE_METHOD	Confirmed	#3893	Link	Rule
FN	UCF_USELESS_CONTROL_FLOW	Confirmed	#3894	Link	Rule
FN	RV_RETURN_VALUE_IGNORED_NO_SIDE_EFFECT	Confirmed	#3900	Link	Rule
FP	IL_INFINITE_RECURSIVE_LOOP	Confirmed	#3904	Link	Rule
FN	SF_SWITCH_NO_DEFAULT	Confirmed	#3905	Link	Rule
FN	NULLPTR_DEREFERENCE	Reported	#1992	Link	Rule
FN	DIVIDE_BY_ZERO	Reported	#1993	Link	Rule
FN	UnconditionalIfStatement	Confirmed	#6435	Link	Rule
FP	INFINITE_EXECUTION_TIME	Reported	#2000	Link	Rule
FN	NP_LOAD_OF_KNOWN_NULL_VALUE	Confirmed	#3916	Link	Rule
FN	NULL_DEREFERENCE	Reported	#2001	Link	Rule
FN	S2259 (Null pointers should not be dereferenced)	Reported	#177381	Link	Rule
FP	DANGLING_POINTER_DEREFERENCE	Reported	#2002	Link	Rule
FN	INFINITE_EXECUTION_TIME	Reported	#2005	Link	Rule
FN	RCN_REDUNDANT_COMPARISON_OF_NULL_AND_NONNULL_VALUE	Confirmed	#3920	Link	Rule
FP	SA_LOCAL_SELF_ASSIGNMENT	Confirmed	#3929	Link	Rule
FN	URF_UNREAD_FIELD	Confirmed	#3955	Link	Rule
FN	NULLPTR_DEREFERENCE	Reported	#2015	Link	Rule
FN	UselessOverridingMethod	Reported	#6491	Link	Rule
FN	CloseResource	Reported	#6494	Link	Rule
FN	LeftCurly	Fixed	#19162	Link	Rule
FN	ForLoopCanBeForeach	Confirmed	#6495	Link	Rule
FN	NP_LOAD_OF_KNOWN_NULL_VALUE	Reported	#3961	Link	Rule
FN	NULLPTR_DEREFERENCE	Reported	#2019	Link	Rule
FP	CWO_CLOSED_WITHOUT_OPENED	Reported	#3962	Link	Rule
FP	IL_INFINITE_RECURSIVE_LOOP	Confirmed	#3963	Link	Rule
FN	DM_STRING_TOSTRING	Fixed	#3966	Link	Rule
FN	SimplifyConditional	Reported	#6513	Link	Rule
FN	SA_FIELD_DOUBLE_ASSIGNMENT	Reported	#3975	Link	Rule
FN	UselessPureMethodCall	Confirmed	#6517	Link	Rule
FN	NS_NON_SHORT_CIRCUIT	Reported	#3976	Link	Rule
FN	UnusedAssignment	Confirmed	#6518	Link	Rule
FP	DoNotUseThreads	Confirmed	#6520	Link	Rule
FN	SimplifyBooleanReturns	Confirmed	#6519	Link	Rule
FN	CollectionTypeMismatch	Reported	#6526	Link	Rule
FP	IL_INFINITE_RECURSIVE_LOOP	Reported	#3978	Link	Rule
FN	AvoidInstantiatingObjectsInLoops	Reported	#6560	Link	Rule
FN	NP_NULL_ON_SOME_PATH	Reported	#3985	Link	Rule
FP	INTEGER_OVERFLOW_L2	Reported	#2027	Link	Rule
FN	Inconsistent synchronization	Reported	#3986	Link	Rule

Files

Name	Size	Checksum
SAFuzzer_Core_Framework.zip	2.1 MB	md5:3b132dc5407540186783f01e4ca8732e
