Published March 23, 2026 | Version v1
Dataset · Open Access

Human-Machine Co-boosted Bug Report Identification with Mutualistic Neural Active Learning

  • 1. Loughborough University
  • 2. University of Electronic Science and Technology of China
  • 3. University of Birmingham

Description

This repository accompanies the paper "Human-Machine Co-boosted Bug Report Identification with Mutualistic Neural Active Learning" and provides all materials required to reproduce its experimental results.

It includes:

  • Source code implementing the MNAL (Mutualistic Neural Active Learning) approach
  • Scripts for experiments corresponding to RQ1–RQ4
  • Data preprocessing and model initialization pipelines
  • Result analysis utilities for reproducing tables and figures
  • Additional experimental code for discussion studies
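To give a sense of the query step at the heart of a neural active learning pipeline, the sketch below selects the unlabeled bug reports the model is least confident about. This is a minimal, illustrative example only, not the repository's implementation; the function name `uncertainty_query` and the toy probabilities are hypothetical.

```python
import numpy as np

def uncertainty_query(probs: np.ndarray, query_size: int) -> np.ndarray:
    """Select the `query_size` unlabeled items whose predicted
    bug probabilities lie closest to 0.5 (the decision boundary)."""
    margin = np.abs(probs - 0.5)  # distance from the boundary
    return np.argsort(margin)[:query_size]

# Toy model confidences for six unlabeled issue reports.
pool_probs = np.array([0.95, 0.52, 0.10, 0.49, 0.80, 0.60])
picked = uncertainty_query(pool_probs, query_size=2)
print(sorted(picked.tolist()))  # -> [1, 3]: the two most uncertain reports
```

In a human-machine loop, the selected reports would then be shown to an annotator, and the model retrained on the newly labeled examples.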

Repository Structure

.
├── data_gen.py # Data preprocessing
├── model_gen.py # Model initialization (warm-up training)
├── rq1.py # Experiment for RQ1
├── rq2.py # Experiment for RQ2
├── rq3.py # Experiment for RQ3
├── rq4.py # Experiment for RQ4
├── rq4_HINT/ # Reproduction of HINT method (ICSE 2024)
├── result_analysis.py # Result analysis (tables & figures)
├── discussion/ # Additional experiments
└── README.md

Data

Training set:
https://tickettagger.blob.core.windows.net/datasets/nlbse23-issue-classification-train.csv.tar.gz

Test set:
https://tickettagger.blob.core.windows.net/datasets/nlbse23-issue-classification-test.csv.tar.gz

Full experimental results:
https://www.dropbox.com/scl/fo/o45rrmaolsvnfp8zldqox/h?rlkey=zkqrpev4qqpxyftr9jukvnk45&dl=0
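Once downloaded, each `.csv.tar.gz` archive can be read without unpacking it to disk. The helper below is a sketch under the assumption that the archive contains a single CSV with a header row; the function name `load_issues` and the column names in the example are hypothetical, not taken from the dataset.

```python
import csv
import io
import tarfile

def load_issues(tar_path: str):
    """Yield rows from the first CSV member of a .csv.tar.gz
    archive as dictionaries keyed by the CSV header row."""
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.name.endswith(".csv"):
                # Wrap the binary stream so csv can read text.
                fh = io.TextIOWrapper(tar.extractfile(member), encoding="utf-8")
                yield from csv.DictReader(fh)
                return
```

Usage would be e.g. `rows = list(load_issues("nlbse23-issue-classification-train.csv.tar.gz"))`.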

Reproducibility

Environment:

  • Python 3.10
  • PyTorch 1.12.1
  • CUDA 11.7

Example command:

python rq1.py --initial_size <INT> --query_size <INT> --method_setting <METHOD> --start_from_run <RUN> --start_from_step <STEP>

Result Analysis

Generate tables:

python result_analysis.py --table <TABLE_ID>

Generate figures:

python result_analysis.py --fig <FIG_ID>

Additional Experiments

The discussion/ directory includes experiments on:

  • Sampling strategies
  • Imbalanced datasets
  • Upper-bound performance

Shell scripts for running these experiments are provided.
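For the imbalanced-dataset setting, one common mitigation is to reweight the loss by inverse class frequency. The snippet below is only an illustrative sketch of that idea, not necessarily what the `discussion/` experiments do; `inverse_frequency_weights` is a hypothetical helper.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * class_count),
    so the rarer class contributes more to the training loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Toy 3:1 imbalance between bug and non-bug reports.
weights = inverse_frequency_weights(["bug"] * 3 + ["non-bug"] * 1)
print(weights)  # non-bug gets weight 2.0, bug gets ~0.67
```

Such weights could be passed, for example, to a weighted cross-entropy loss during training.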

Reference

Gao et al., Learning in the Wild: Towards Leveraging Unlabeled Data for Effectively Tuning Pre-trained Code Models, ICSE 2024.


Files

MNAL-main.zip (196.2 kB)
md5:3e0f1e30d3353f3e4ae9046063c9e1af

Additional details

Dates

Submitted
2026-03-23