Published April 11, 2023 | Version v1

Supplementary material for 'The MAP metric in Information Retrieval Fault Localization'

  • Thomas Hirsch, Birgit Hofer (Graz University of Technology)

Description

# map_bench4bl
This is the supplementary material, data, and evaluation source code for the paper "The MAP metric in Information Retrieval Fault Localization" by Thomas Hirsch and Birgit Hofer.

## Preliminaries
### Python environment
- Python 3.8
- pandas 
- numpy 
- matplotlib
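The listed environment can be verified before running the scripts. The following is a small convenience sketch that is not part of the original repository; it only checks the interpreter version and whether the three required packages are importable.

```python
# Environment check for the evaluation scripts (convenience sketch,
# not part of the original repository).
import importlib.util
import sys

# The scripts target Python 3.8; newer 3.x versions should also work.
assert sys.version_info[:2] >= (3, 8), "Python 3.8+ is required"

# Report any of the required third-party packages that are not installed.
missing = [pkg for pkg in ("pandas", "numpy", "matplotlib")
           if importlib.util.find_spec(pkg) is None]
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("Environment OK")
```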

## Datasets
The [Bench4BL](https://github.com/exatoa/Bench4BL) dataset has been used in this evaluation, with the addition of intermediate files taken from the [SABL](http://dx.doi.org/10.5281/zenodo.4681242) experiment performed on this Bench4BL dataset.
All data used in our evaluation is included in this repository.
However, if you want to re-import the data directly from these benchmarks and datasets, they have to be downloaded first and their local paths have to be set in [paths.py](paths.py).
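A local configuration might look like the following. This is a hypothetical sketch: the actual variable names in the repository's `paths.py` may differ, and the directories shown are placeholders for your own download locations.

```python
# Hypothetical layout of paths.py -- the real variable names in the
# repository may differ; adjust the directories to your local setup.
from pathlib import Path

# Local checkout of the Bench4BL dataset (https://github.com/exatoa/Bench4BL)
BENCH4BL_PATH = Path("/data/Bench4BL")

# Unpacked SABL data package (doi:10.5281/zenodo.4681242)
SABL_PATH = Path("/data/SABL")

# Absolute paths avoid surprises when scripts are run from other directories.
assert BENCH4BL_PATH.is_absolute() and SABL_PATH.is_absolute()
```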

### Bench4BL
The Bench4BL dataset was published with the paper "Bench4BL: Reproducibility study on the performance of IR-based bug localization" by Lee, J., Kim, D., Bissyandé, T.F., Jung, W. and Le Traon, Y.
The dataset can be obtained [here](https://github.com/exatoa/Bench4BL).
Follow the steps described in the corresponding [README](https://github.com/exatoa/Bench4BL/blob/master/README.md) to set up the dataset.
The Bench4BL dataset includes the _old subjects_ subdataset, comprising 558 bugs from AspectJ, JDT, PDE, SWT, and ZXing that have been widely used in older IRFL studies.
This _old subjects_ subdataset was used to answer RQ1, as discussed below; the corresponding scripts carry _old subjects_ in their names to highlight this.

#### SABL
The SABL dataset is the online appendix of the paper "An Extensive Study of Smell-Aware Bug Localization" by Takahashi, A., Sae-Lim, N., Hayashi, S. and Saeki, M.
The dataset can be downloaded [here](http://dx.doi.org/10.5281/zenodo.4681242).
The experiments in this dataset build on Bench4BL, and intermediate files are provided in the data package.

#### Rankings
Rankings for BLIA, BRTracer, and BugLocator were produced by running these tools on Bench4BL locally.
Rankings for AmaLgam and BLUiR were taken from the SABL experiment dataset.


## Structure
### Folders
Bench4BL ground truths:
- bench4bl_old_subjects_summary
- bench4bl_summary

Localization results of the tools included in Bench4BL:
- bench4bl_localization_results
- bench4bl_localization_results_sabl

Target project size metrics:
- cloc_results
- cloc_results_old_subjects

Utility functions:
- utils

Output folders containing results, generated figures and tables:
- results
- results_old_subjects

### Scripts
Scripts for re-importing data from Bench4BL and SABL datasets:
- data_preparation_step_1_cloc_bench4bl.py
- data_preparation_step_1_cloc_old_subjects_bench4bl.py
- data_preparation_step_2_import_ground_truth_from_bench4bl.py
- data_preparation_step_2_import_ground_truth_from_old_subjects_bench4bl.py
- data_preparation_step_3_import_bench4bl_ranking_results.py
- data_preparation_step_3_import_sabl_ranking_results.py

Utilities:
- paths.py
- utils/bench4bl_utils.py
- utils/Logger.py


### Evaluation scripts for the corresponding research questions
**Dataset analysis:**
- rq_0_dataset_analysis_bench4bl_issues.py

**RQ1: How big is the average ground truth in Bench4BL datasets, and what proportion of bugs have a ground truth containing multiple files?**
- rq_1_bench4bl_ground_truth_size.py
- rq_1_old_subjects_bench4bl_ground_truth_size.py
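The two statistics RQ1 asks for (average ground truth size and the proportion of bugs with multi-file ground truths) can be computed with a few lines of pandas. The table below is hypothetical: the real summary files in `bench4bl_summary/` may use different column names and layout.

```python
import pandas as pd

# Hypothetical ground-truth table: one row per (bug, file) pair.
# The actual summary files in bench4bl_summary/ may be structured differently.
gt = pd.DataFrame({
    "bug_id": ["B1", "B1", "B2", "B3", "B3", "B3"],
    "file":   ["a.java", "b.java", "c.java", "d.java", "e.java", "f.java"],
})

# Ground truth size = number of buggy files per bug report.
sizes = gt.groupby("bug_id").size()

avg_size = sizes.mean()               # average ground truth size
multi_file_ratio = (sizes > 1).mean() # share of bugs with >1 buggy file
print(avg_size, multi_file_ratio)
```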

**RQ2: Do the IRFL tools included in Bench4BL truncate their results?**
- rq_2_ranking_lengths.py

**RQ3: How strongly does $AP_{asrd}$ overestimate $AP_{mb}$ for truncated BugLocator retrieval results on the Bench4BL dataset?
RQ3a: How strongly does $AP_{asrd}$ overestimate $AP_{mb}$ for truncated BugLocator retrieval results when considering the bloated ground truth issue found in Bench4BL?**
- rq_3_truncating_BugLocator_rankings_bench4bl.py

**RQ3b: How strongly does $AP_{asrd}$ overestimate $AP_{mb}$ for truncated BugLocator retrieval results when undefined $AP$ values are simply ignored?**
- rq_3b_undefined_ap_BugLocator_rankings_bench4bl.py
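The overestimation issue behind RQ3 can be illustrated with a generic average precision computation on a truncated ranking. The sketch below contrasts normalizing by all relevant files with normalizing only by the relevant files that actually appear in the ranking; this is an illustration of the general effect, and the paper's precise $AP_{asrd}$ and $AP_{mb}$ definitions may be formulated differently.

```python
def average_precision(ranking, relevant, normalize_by_retrieved=False):
    """Average precision over a (possibly truncated) ranking.

    normalize_by_retrieved=True divides by the number of relevant files
    present in the ranking; False divides by all relevant files.
    Illustrative only; the paper's AP_asrd / AP_mb may differ.
    """
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    denom = hits if normalize_by_retrieved else len(relevant)
    return precision_sum / denom if denom else 0.0

relevant = {"a.java", "b.java", "c.java"}
truncated = ["a.java", "x.java", "b.java"]  # "c.java" was cut off

ap_all = average_precision(truncated, relevant)         # 5/9, ~0.556
ap_retr = average_precision(truncated, relevant, True)  # 5/6, ~0.833
print(ap_all, ap_retr)
```

Normalizing by retrieved relevant files silently drops the penalty for the truncated-away relevant file, which is why it yields the higher score here.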


## Licence
All code and results are licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/), as stated in the LICENSE file.
Other licences may apply to some tools and datasets contained in this repository: [cloc-1.92.pl](https://github.com/AlDanial/cloc) is under GPL v2; [Bench4BL](https://github.com/exatoa/Bench4BL) and [SABL](http://dx.doi.org/10.5281/zenodo.4681242) are under CC BY 4.0.

Files

- map_bench4bl.zip (2.1 GB, md5:13d4d07fdf35b43ae37823bd5896ab89)

Additional details

Funding

- Automated Debugging in Use (P 32653), FWF Austrian Science Fund