Published April 12, 2021 | Version v1
Presentation Open

A Large-scale Study on API Misuses in the Wild

  • 1. Kennesaw State University
  • 2. Tianjin University
  • 3. The University of Texas at Dallas
  • 4. Peking University
  • 5. University of Illinois at Urbana-Champaign

Description

API misuses are prevalent and extremely harmful.
Although various techniques have been proposed for API-misuse
detection, it remains unclear how different types of API misuses
are distributed and whether existing techniques cover all major
types of API misuses. Therefore, in this paper, we conduct the
first large-scale empirical study on API misuses based on 528,546
historical bug-fixing commits from GitHub (from 2011 to 2018).
By leveraging a state-of-the-art fine-grained AST differencing
tool, GumTree, we extract more than one million bug-fixing
edit operations, 51.7% of which are API misuses. We further
systematically classify API misuses into nine different categories
according to the edit operations and context. We also extract
various frequent API-misuse patterns based on the categories
and corresponding operations, which can complement
existing API-misuse detection tools. Our study reveals various
practical guidelines regarding the importance of different types
of API misuses. Furthermore, based on our dataset, we perform
a user study to manually analyze the usage constraints of 10
patterns to explore whether the mined patterns can guide the
design of future API-misuse detection tools. Specifically, we find
that 7,541 potential misuses still exist in the latest Apache projects,
and 149 of them have been reported to developers. To date, 57
have been confirmed and fixed, and 15 have been rejected.
The results indicate the importance of studying
historical API misuses and the promising future of employing our
mined patterns for detecting unknown API misuses.
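
For readers unfamiliar with the term, the following is a minimal sketch of what an API misuse and its bug-fixing edit can look like. It is an illustration only and is not taken from the study's dataset or its nine categories: the function names and the version-tag format are hypothetical, and Python is used purely for brevity. The only API fact relied on is that Python's `re.match` returns `None` when the pattern does not match.

```python
import re
from typing import Optional


def major_version_misuse(tag: str) -> str:
    # Misuse (hypothetical example): re.match returns None when the
    # pattern does not match, so calling .group() unconditionally
    # raises AttributeError for inputs such as "release".
    return re.match(r"v(\d+)\.", tag).group(1)


def major_version_fixed(tag: str) -> Optional[str]:
    # Fix: check the return value before using it. In terms of edit
    # operations, the fix inserts a condition around the API call.
    match = re.match(r"v(\d+)\.", tag)
    if match is None:
        return None
    return match.group(1)
```

Both functions agree on well-formed tags such as `v3.1`; they differ only on inputs the pattern does not match, which is why this kind of misuse can remain unnoticed until an unexpected input arrives.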
 

Files

A Large-scale Study on API Misuses in the Wild.mp4

Files (251.2 MB)