# Manual Validation Methodology

## Overview

This directory contains the manual validation results used to quantify label noise in the JML-BugDB dataset. The validation provides empirical evidence for the 29.2% precision of SZZ bug-introducing labels reported in the accompanying manuscript.

## Files

### manual_validation_final.csv

A stratified random sample of SZZ-labeled "bug-introducing" commits, manually reviewed by domain experts.

| Metric | Value |
|--------|-------|
| **Total Samples** | 398 |
| **True Positives** | 116 (29.2%) |
| **False Positives** | 282 (70.8%) |
| **95% Confidence Interval** | 24.7% - 33.8% |

### Column Descriptions

| Column | Type | Description |
|--------|------|-------------|
| `project` | String | Source project (kafka, gson, commons-io) |
| `commit_hash` | String | SHA-1 hash of the bug-introducing commit |
| `file_path` | String | Path to the file within the repository |
| `fix_commit` | String | SHA-1 hash of the bug-fixing commit that triggered SZZ |
| `is_true_bug` | Boolean | TRUE = genuine bug-introducing change, FALSE = false positive |
| `category` | String | Classification category (see below) |
| `reasoning` | String | Human-annotated justification for the classification |
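
A minimal sketch for recomputing the headline metrics from the CSV. It assumes the column names listed above and that `is_true_bug` is spelled `TRUE`/`FALSE` (case-insensitive). The helper names (`parse_is_true_bug`, `summarize`, `summarize_file`) are hypothetical and not part of any dataset tooling.

```python
import csv
from typing import Iterable, Mapping


def parse_is_true_bug(value: str) -> bool:
    """Map the `is_true_bug` field to a bool; assumes TRUE/FALSE spelling."""
    normalized = value.strip().upper()
    if normalized == "TRUE":
        return True
    if normalized == "FALSE":
        return False
    raise ValueError(f"unexpected is_true_bug value: {value!r}")


def summarize(rows: Iterable[Mapping[str, str]]) -> dict:
    """Count true/false positives and compute precision = TP / total reviewed."""
    tp = fp = 0
    for row in rows:
        if parse_is_true_bug(row["is_true_bug"]):
            tp += 1
        else:
            fp += 1
    total = tp + fp
    return {
        "total": total,
        "true_positives": tp,
        "false_positives": fp,
        "precision": tp / total if total else float("nan"),
    }


def summarize_file(path: str) -> dict:
    """Read the validation CSV (header row expected) and summarize it."""
    with open(path, newline="", encoding="utf-8") as handle:
        return summarize(csv.DictReader(handle))
```

If the file is as described, `summarize_file("manual_validation_final.csv")` should reproduce the counts in the metrics table above; a `ValueError` flags any row whose label is neither `TRUE` nor `FALSE`.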

## False Positive Categories

Category breakdown of the 282 manually reviewed false positives:

| Category | Count | % of False Positives | Description |
|----------|-------|------------|-------------|
| **Refactoring** | ~119 | 42.3% | Code restructuring without functional changes |
| **Feature Addition** | ~51 | 18.1% | New functionality later modified |
| **Documentation** | ~29 | 10.4% | Comments, Javadoc, or documentation changes |
| **Test Changes** | ~25 | 8.9% | Test file modifications |
| **Build/Config** | ~22 | 7.8% | Build scripts, configuration files |
| **Formatting** | ~20 | 7.1% | Whitespace, code style changes |
| **Other** | ~16 | 5.4% | Miscellaneous non-bug changes |
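
The breakdown above can be tallied from the CSV with a sketch like the following. It assumes `is_true_bug` uses `TRUE`/`FALSE` and that the `category` column holds the same labels as the table; both the label spelling and the helper names (`false_positive_breakdown`, `breakdown_file`) are assumptions. Percentages are computed over false positives only, matching the table.

```python
import csv
from collections import Counter
from typing import Iterable, Mapping


def false_positive_breakdown(
    rows: Iterable[Mapping[str, str]],
) -> list[tuple[str, int, float]]:
    """Return (category, count, percent of false positives), most frequent first.

    Assumes `is_true_bug` is spelled TRUE/FALSE and that `category` holds the
    false-positive label used in the table above.
    """
    counts = Counter(
        row["category"].strip()
        for row in rows
        if row["is_true_bug"].strip().upper() == "FALSE"
    )
    total = sum(counts.values())
    # Shares are relative to the false positives only, not to all samples.
    return [(category, n, 100.0 * n / total) for category, n in counts.most_common()]


def breakdown_file(path: str) -> list[tuple[str, int, float]]:
    """Apply false_positive_breakdown to the validation CSV."""
    with open(path, newline="", encoding="utf-8") as handle:
        return false_positive_breakdown(csv.DictReader(handle))
```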

## Sampling Methodology

1. **Stratified Random Sampling:** Samples were drawn from each project in proportion to its share of the total dataset
2. **Project Distribution:**
   - Apache Kafka: ~55% of samples
   - Google Gson: ~25% of samples
   - Apache Commons-IO: ~20% of samples
3. **Validation Process:**
   - Each sample was reviewed in the context of its original commit
   - Samples were cross-referenced with the issue tracker where available
   - The code's evolution over time was analyzed to determine the purpose of each change
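
Proportional stratified sampling can be sketched as below. The README does not state how fractional quotas were rounded, so this assumes largest-remainder rounding, and it draws a simple random sample without replacement inside each stratum via `random.Random.sample`. The function names and the `seed` parameter are hypothetical.

```python
import random


def proportional_allocation(stratum_sizes: dict[str, int], n: int) -> dict[str, int]:
    """Split n samples across strata in proportion to size (largest-remainder rounding)."""
    total = sum(stratum_sizes.values())
    quotas: dict[str, int] = {}
    remainders = []
    for name, size in stratum_sizes.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        quotas[name], rem = divmod(n * size, total)
        remainders.append((rem, name))
    # Hand out the samples lost to flooring, largest fractional remainder first.
    leftover = n - sum(quotas.values())
    for _, name in sorted(remainders, reverse=True)[:leftover]:
        quotas[name] += 1
    return quotas


def stratified_sample(population: dict[str, list], n: int, seed: int = 0) -> dict[str, list]:
    """Draw a simple random sample without replacement inside each stratum."""
    rng = random.Random(seed)
    allocation = proportional_allocation({k: len(v) for k, v in population.items()}, n)
    return {name: rng.sample(items, allocation[name]) for name, items in population.items()}
```

Using the approximate project shares as stand-in weights, `proportional_allocation({"kafka": 55, "gson": 25, "commons-io": 20}, 398)` returns `{"kafka": 219, "gson": 99, "commons-io": 80}`. The true per-project stratum sizes are not listed here, so these are not the actual counts in the CSV.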

## Statistical Significance

- **Sample Size:** 398 (from 91,659 total instances)
- **Confidence Level:** 95%
- **Margin of Error:** ±4.5 percentage points
- **Precision Estimate:** 29.2% (95% CI: 24.7% - 33.8%)
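
The interval can be cross-checked from the raw counts (116 true positives out of 398). The README does not say which interval method was used, so the sketch below computes two standard binomial intervals, the normal-approximation (Wald) interval and the Wilson score interval, using `statistics.NormalDist` for the critical value. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Normal-approximation interval: p +/- z * sqrt(p(1-p)/n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval; better behaved than Wald when p is far from 0.5."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half_width, center + half_width
```

For 116 of 398, the Wald interval is roughly 24.7% to 33.6% (a half-width of about 4.5 percentage points, consistent with the stated margin of error), and the Wilson interval is roughly 24.9% to 33.8%. Neither reproduces the reported 24.7% - 33.8% exactly, so treat them as sanity checks rather than as the method behind that figure.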

The sample size is large enough to estimate precision to within about ±4.5 percentage points at 95% confidence and to characterize the distribution of false-positive categories, while remaining feasible for manual expert review.
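
How 398 was chosen is not stated. One standard way to check whether a sample size is adequate is Cochran's formula, $n_0 = z^2 p(1-p)/e^2$, optionally reduced by the finite population correction $n = n_0 / (1 + (n_0 - 1)/N)$. The sketch below is an illustrative check under that assumption. It is not necessarily the authors' procedure, and the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Optional


def required_sample_size(margin: float, p: float = 0.5,
                         population: Optional[int] = None,
                         confidence: float = 0.95) -> int:
    """Cochran's formula n0 = z^2 * p * (1 - p) / e^2, rounded up.

    p = 0.5 is the conservative (largest-variance) choice. If a finite
    population size N is given, applies n = n0 / (1 + (n0 - 1) / N).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z * z * p * (1 - p) / (margin * margin)
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)
```

With the conservative p = 0.5 and a ±5-point margin, this gives 385, or 383 after the finite population correction for 91,659 instances. The 398 reviewed samples therefore meet that target.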

## Citation

If you use this validation data, please cite the accompanying paper.
Citation details will be updated upon publication.

## License

This validation dataset is released under the same license as the parent JML-BugDB dataset.
