Published April 11, 2023 | Version v1

Supplementary material for 'The MAP metric in Information Retrieval Fault Localization'

  • Thomas Hirsch, Birgit Hofer (Graz University of Technology)

Description

# map_bench4bl
This is the supplementary material, data, and evaluation source code for the paper "The MAP metric in Information Retrieval Fault Localization" by Thomas Hirsch and Birgit Hofer.

## Preliminaries
### Python environment
- Python 3.8
- pandas 
- numpy 
- matplotlib
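The listed environment can be verified before running the scripts. The following is a small convenience sketch that is not part of the original repository; it only checks the interpreter version and whether the three required packages are importable.

```python
# Environment check for the evaluation scripts (convenience sketch,
# not part of the original repository).
import importlib.util
import sys

# The scripts target Python 3.8; newer 3.x versions should also work.
assert sys.version_info[:2] >= (3, 8), "Python 3.8+ is required"

# Report any of the required third-party packages that are not installed.
missing = [pkg for pkg in ("pandas", "numpy", "matplotlib")
           if importlib.util.find_spec(pkg) is None]
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("Environment OK")
```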

## Datasets
The [Bench4BL](https://github.com/exatoa/Bench4BL) dataset has been used in this evaluation, with the addition of intermediate files taken from the [SABL](http://dx.doi.org/10.5281/zenodo.4681242) experiment performed on this Bench4BL dataset.
All data used in our evaluation is included in this repository.
However, if you want to re-import the data directly from these benchmarks and datasets, they have to be downloaded first and their local paths have to be set in [paths.py](paths.py).
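A local configuration might look like the following. This is a hypothetical sketch: the actual variable names in the repository's `paths.py` may differ, and the directories shown are placeholders for your own download locations.

```python
# Hypothetical layout of paths.py -- the real variable names in the
# repository may differ; adjust the directories to your local setup.
from pathlib import Path

# Local checkout of the Bench4BL dataset (https://github.com/exatoa/Bench4BL)
BENCH4BL_PATH = Path("/data/Bench4BL")

# Unpacked SABL data package (doi:10.5281/zenodo.4681242)
SABL_PATH = Path("/data/SABL")

# Absolute paths avoid surprises when scripts are run from other directories.
assert BENCH4BL_PATH.is_absolute() and SABL_PATH.is_absolute()
```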

### Bench4BL
The Bench4BL dataset was published with the paper "Bench4BL: Reproducibility study on the performance of IR-based bug localization" by Lee, J., Kim, D., Bissyandé, T.F., Jung, W. and Le Traon, Y.
The dataset can be obtained [here](https://github.com/exatoa/Bench4BL).
Follow the steps described in the corresponding [README](https://github.com/exatoa/Bench4BL/blob/master/README.md) to set up the dataset.
The Bench4BL dataset includes the _old subjects_ subdataset, comprising 558 bugs from AspectJ, JDT, PDE, SWT, and ZXing that have been widely used in older IRFL studies.
This _old subjects_ subdataset was used to answer RQ1, as discussed below; the corresponding scripts carry _old subjects_ in their names to highlight this.

#### SABL
The SABL dataset is the online appendix of the paper "An Extensive Study of Smell-Aware Bug Localization" by Takahashi, A., Sae-Lim, N., Hayashi, S. and Saeki, M.
The dataset can be downloaded [here](http://dx.doi.org/10.5281/zenodo.4681242).
The experiments in this dataset build on Bench4BL, and intermediate files are provided in the data package.

#### Rankings
Rankings for BLIA, BRTracer, and BugLocator were produced by running these tools on Bench4BL locally.
Rankings for AmaLgam and BLUiR were taken from the SABL experiment dataset.


## Structure
### Folders
Bench4BL ground truths:
- bench4bl_old_subjects_summary
- bench4bl_summary

Localization results of the tools included in Bench4BL:
- bench4bl_localization_results
- bench4bl_localization_results_sabl

Target project size metrics:
- cloc_results
- cloc_results_old_subjects

Utility functions:
- utils

Output folders containing results, generated figures and tables:
- results
- results_old_subjects

### Scripts
Scripts for re-importing data from Bench4BL and SABL datasets:
- data_preparation_step_1_cloc_bench4bl.py
- data_preparation_step_1_cloc_old_subjects_bench4bl.py
- data_preparation_step_2_import_ground_truth_from_bench4bl.py
- data_preparation_step_2_import_ground_truth_from_old_subjects_bench4bl.py
- data_preparation_step_3_import_bench4bl_ranking_results.py
- data_preparation_step_3_import_sabl_ranking_results.py

Utilities:
- paths.py
- utils/bench4bl_utils.py
- utils/Logger.py


### Evaluation scripts for the corresponding research questions
**Dataset analysis:**
- rq_0_dataset_analysis_bench4bl_issues.py

**RQ1: How big is the average ground truth in Bench4BL datasets, and what proportion of bugs have a ground truth containing multiple files?**
- rq_1_bench4bl_ground_truth_size.py
- rq_1_old_subjects_bench4bl_ground_truth_size.py
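The two statistics RQ1 asks for (average ground truth size and the proportion of bugs with multi-file ground truths) can be computed with a few lines of pandas. The table below is hypothetical: the real summary files in `bench4bl_summary/` may use different column names and layout.

```python
import pandas as pd

# Hypothetical ground-truth table: one row per (bug, file) pair.
# The actual summary files in bench4bl_summary/ may be structured differently.
gt = pd.DataFrame({
    "bug_id": ["B1", "B1", "B2", "B3", "B3", "B3"],
    "file":   ["a.java", "b.java", "c.java", "d.java", "e.java", "f.java"],
})

# Ground truth size = number of buggy files per bug report.
sizes = gt.groupby("bug_id").size()

avg_size = sizes.mean()               # average ground truth size
multi_file_ratio = (sizes > 1).mean() # share of bugs with >1 buggy file
print(avg_size, multi_file_ratio)
```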

**RQ2: Do the IRFL tools included in Bench4BL truncate their results?**
- rq_2_ranking_lengths.py

**RQ3: How strongly does $AP_{asrd}$ overestimate $AP_{mb}$ for truncated BugLocator retrieval results on the Bench4BL dataset?
RQ3a: How strongly does $AP_{asrd}$ overestimate $AP_{mb}$ for truncated BugLocator retrieval results when considering the bloated ground truth issue found in Bench4BL?**
- rq_3_truncating_BugLocator_rankings_bench4bl.py

**RQ3b: How strongly does $AP_{asrd}$ overestimate $AP_{mb}$ for truncated BugLocator retrieval results when undefined $AP$ values are simply ignored?**
- rq_3b_undefined_ap_BugLocator_rankings_bench4bl.py
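The overestimation issue behind RQ3 can be illustrated with a generic average precision computation on a truncated ranking. The sketch below contrasts normalizing by all relevant files with normalizing only by the relevant files that actually appear in the ranking; this is an illustration of the general effect, and the paper's precise $AP_{asrd}$ and $AP_{mb}$ definitions may be formulated differently.

```python
def average_precision(ranking, relevant, normalize_by_retrieved=False):
    """Average precision over a (possibly truncated) ranking.

    normalize_by_retrieved=True divides by the number of relevant files
    present in the ranking; False divides by all relevant files.
    Illustrative only; the paper's AP_asrd / AP_mb may differ.
    """
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    denom = hits if normalize_by_retrieved else len(relevant)
    return precision_sum / denom if denom else 0.0

relevant = {"a.java", "b.java", "c.java"}
truncated = ["a.java", "x.java", "b.java"]  # "c.java" was cut off

ap_all = average_precision(truncated, relevant)         # 5/9, ~0.556
ap_retr = average_precision(truncated, relevant, True)  # 5/6, ~0.833
print(ap_all, ap_retr)
```

Normalizing by retrieved relevant files silently drops the penalty for the truncated-away relevant file, which is why it yields the higher score here.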


## Licence
All code and results are licensed under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/), as stated in the LICENSE file.
Other licences may apply to some tools and datasets contained in this repository: [cloc-1.92.pl](https://github.com/AlDanial/cloc) is under GPL v2; [Bench4BL](https://github.com/exatoa/Bench4BL) and [SABL](http://dx.doi.org/10.5281/zenodo.4681242) are under CC BY 4.0.

Files

- map_bench4bl.zip (2.1 GB, md5:13d4d07fdf35b43ae37823bd5896ab89)

Additional details

Funding

- Automated Debugging in Use (P 32653), FWF Austrian Science Fund