Published January 16, 2020 | Version 1
Journal article Open

DeepQL: Duplicate Bug Report Detection using Attention Mechanism and Replicated Cluster Information

Authors/Creators

Description

In large-scale software development environments, defect reports are maintained in Bug Tracking Systems (BTS) and analyzed by domain experts. Because users write bug reports in non-standard ways, each user may describe a given problem with a different set of words. As a result, different reports may describe the same problem, producing duplicates. To avoid redundant work for the development team, an expert must examine every new report and try to label the duplicates. This process is neither trivial nor scalable, and it has a direct impact on bug fix time. Recent efforts to find the best latent space for describing duplicates tend to focus on deep neural approaches that combine hybrid information from bug reports, such as textual and categorical features. Unfortunately, these approaches ignore that a single bug can have multiple previously identified duplicates and, therefore, multiple textual descriptions, titles, and sets of categorical information. In this work, we propose DeepQL, a duplicate bug report detection method that considers not only information on individual bugs but also collective information from bug clusters. DeepQL combines attention mechanisms, which had not previously been used for this task, with a novel loss function called Quintet Loss, which considers the centroid of each cluster of duplicate bug report representations together with its contextual information. We validated our approach on the well-known open-source software repositories of Eclipse, NetBeans, Firefox, and Open Office, which together comprise more than 500 thousand bug reports. We evaluated both the retrieval and the classification of duplicates, reporting a mean accuracy of 68% on Recall@25 for retrieval and a 90% AUROC for classification.
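
The abstract reports retrieval quality as Recall@25 without defining it. A common definition in duplicate-retrieval work, assumed here, is the fraction of query reports for which a known duplicate appears among the top k ranked candidates. The minimal Python sketch that follows computes that metric with cosine similarity and, as a loose illustration of "collective information from bug clusters", represents a bug that already has known duplicates by the centroid of their vectors. All names (`cosine`, `centroid`, `recall_at_k`) and the 2-D vectors are hypothetical; this is not DeepQL's model, its Quintet Loss, or its evaluation code.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def recall_at_k(
    queries: List[Tuple[Vector, Set[str]]],
    candidates: Dict[str, Vector],
    k: int,
) -> float:
    """Fraction of queries whose top-k candidates contain at least one relevant id."""
    hits = 0
    for query_vec, relevant_ids in queries:
        # Rank every candidate by similarity to the query, most similar first.
        ranked = sorted(
            candidates,
            key=lambda cid: cosine(query_vec, candidates[cid]),
            reverse=True,
        )
        if relevant_ids & set(ranked[:k]):
            hits += 1
    return hits / len(queries)


# Toy data: hypothetical 2-D "embeddings", not DeepQL output.
# Bug "A" already has two known duplicates, so it is represented by their centroid.
candidates = {
    "A": centroid([[1.0, 0.0], [0.9, 0.1]]),
    "B": [0.0, 1.0],
    "C": [-1.0, 0.0],
}
# Each query is a new report paired with the id of the bug it actually duplicates.
queries = [
    ([1.0, 0.2], {"A"}),
    ([0.1, 1.0], {"A"}),
]

print(recall_at_k(queries, candidates, k=1))  # 0.5
print(recall_at_k(queries, candidates, k=2))  # 1.0
```

Representing a bug by the centroid of its known duplicates pools the wording of several reports into one vector; how DeepQL itself uses cluster centroids inside Quintet Loss is described in the full article, not in this sketch.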

Files

Files (25.7 kB)

md5:c5b8e1b85501dbaaabd2fee4adee2060 (25.7 kB)