Characterizing Task-Relevant Information in Natural Language Software Artifacts
This repository contains the supplementary material for the paper "Characterizing Task-Relevant Information
in Natural Language Software Artifacts". All contents are explained in the file README.md.
Abstract: To complete a software development task, a software developer often consults artifacts that contain largely natural language text, such as API documentation, bug reports, or Q&A forums. Not all information within these artifacts is relevant to a developer's current task, forcing the developer to filter relevant information from large amounts of irrelevant information, a frustrating and time-consuming activity. Since failing to locate relevant information may lead to incorrect or incomplete solutions, many approaches mine potentially relevant text from such natural language artifacts. However, existing approaches identify text relevant for only certain categories of tasks (e.g., learning an API) and from a restricted set of artifact types. To explore how limitations on software development tasks and artifact types can be relaxed in future approaches, we conducted an experiment in which 20 participants identified which text appearing in 1,874 sentences across 20 artifacts was relevant to six software development tasks. Participants created 2,463 distinct highlights in these sentences to indicate relevance. Although the results indicate variability in the text perceived as relevant, we observe consistency in the information considered key for task completion. The semantic meaning of relevant information, as identified through semantic frames, shows promise for automating the identification of relevant text. We discuss the implications of our study for future research in the field.