Published March 23, 2026 | Version v1
Dataset · Open Access

Human-Machine Co-boosted Bug Report Identification with Mutualistic Neural Active Learning

  • 1. Loughborough University
  • 2. University of Electronic Science and Technology of China
  • 3. University of Birmingham

Description

This repository accompanies the paper "Human-Machine Co-boosted Bug Report Identification with Mutualistic Neural Active Learning" and provides all materials required to reproduce its experimental results.

It includes:

  • Source code implementing the MNAL (Mutualistic Neural Active Learning) approach
  • Scripts for experiments corresponding to RQ1–RQ4
  • Data preprocessing and model initialization pipelines
  • Result analysis utilities for reproducing tables and figures
  • Additional experimental code for discussion studies
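To give a sense of the query step at the heart of a neural active learning pipeline, the sketch below selects the unlabeled bug reports the model is least confident about. This is a minimal, illustrative example only, not the repository's implementation; the function name `uncertainty_query` and the toy probabilities are hypothetical.

```python
import numpy as np

def uncertainty_query(probs: np.ndarray, query_size: int) -> np.ndarray:
    """Select the `query_size` unlabeled items whose predicted
    bug probabilities lie closest to 0.5 (the decision boundary)."""
    margin = np.abs(probs - 0.5)  # distance from the boundary
    return np.argsort(margin)[:query_size]

# Toy model confidences for six unlabeled issue reports.
pool_probs = np.array([0.95, 0.52, 0.10, 0.49, 0.80, 0.60])
picked = uncertainty_query(pool_probs, query_size=2)
print(sorted(picked.tolist()))  # -> [1, 3]: the two most uncertain reports
```

In a human-machine loop, the selected reports would then be shown to an annotator, and the model retrained on the newly labeled examples.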

Repository Structure

.
├── data_gen.py # Data preprocessing
├── model_gen.py # Model initialization (warm-up training)
├── rq1.py # Experiment for RQ1
├── rq2.py # Experiment for RQ2
├── rq3.py # Experiment for RQ3
├── rq4.py # Experiment for RQ4
├── rq4_HINT/ # Reproduction of HINT method (ICSE 2024)
├── result_analysis.py # Result analysis (tables & figures)
├── discussion/ # Additional experiments
└── README.md

Data

Training set:
https://tickettagger.blob.core.windows.net/datasets/nlbse23-issue-classification-train.csv.tar.gz

Test set:
https://tickettagger.blob.core.windows.net/datasets/nlbse23-issue-classification-test.csv.tar.gz

Full experimental results:
https://www.dropbox.com/scl/fo/o45rrmaolsvnfp8zldqox/h?rlkey=zkqrpev4qqpxyftr9jukvnk45&dl=0
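Once downloaded, each `.csv.tar.gz` archive can be read without unpacking it to disk. The helper below is a sketch under the assumption that the archive contains a single CSV with a header row; the function name `load_issues` and the column names in the example are hypothetical, not taken from the dataset.

```python
import csv
import io
import tarfile

def load_issues(tar_path: str):
    """Yield rows from the first CSV member of a .csv.tar.gz
    archive as dictionaries keyed by the CSV header row."""
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.name.endswith(".csv"):
                # Wrap the binary stream so csv can read text.
                fh = io.TextIOWrapper(tar.extractfile(member), encoding="utf-8")
                yield from csv.DictReader(fh)
                return
```

Usage would be e.g. `rows = list(load_issues("nlbse23-issue-classification-train.csv.tar.gz"))`.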

Reproducibility

Environment:

  • Python 3.10
  • PyTorch 1.12.1
  • CUDA 11.7

Example command:

python rq1.py --initial_size <INT> --query_size <INT> --method_setting <METHOD> --start_from_run <RUN> --start_from_step <STEP>

Result Analysis

Generate tables:

python result_analysis.py --table <TABLE_ID>

Generate figures:

python result_analysis.py --fig <FIG_ID>

Additional Experiments

The discussion/ directory includes experiments on:

  • Sampling strategies
  • Imbalanced datasets
  • Upper-bound performance

Shell scripts for running these experiments are provided.
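For the imbalanced-dataset setting, one common mitigation is to reweight the loss by inverse class frequency. The snippet below is only an illustrative sketch of that idea, not necessarily what the `discussion/` experiments do; `inverse_frequency_weights` is a hypothetical helper.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * class_count),
    so the rarer class contributes more to the training loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Toy 3:1 imbalance between bug and non-bug reports.
weights = inverse_frequency_weights(["bug"] * 3 + ["non-bug"] * 1)
print(weights)  # non-bug gets weight 2.0, bug gets ~0.67
```

Such weights could be passed, for example, to a weighted cross-entropy loss during training.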

Reference

Gao et al., Learning in the Wild: Towards Leveraging Unlabeled Data for Effectively Tuning Pre-trained Code Models, ICSE 2024.


Files

MNAL-main.zip (196.2 kB)
md5:3e0f1e30d3353f3e4ae9046063c9e1af

Additional details

Dates

Submitted
2026-03-23