Towards a Semantic Representation for Functional Software Requirements (MARP-5 Dataset + Req2Vec Code)
Creators
Description
Please cite this dataset as: Sonbol, R., Rebdawi, G. and Ghneim, N., 2020, September. Towards a Semantic Representation for Functional Software Requirements. In 2020 IEEE Seventh International Workshop on Artificial Intelligence for Requirements Engineering (AIRE) (pp. 1-8). IEEE.
https://ieeexplore.ieee.org/abstract/document/9233034/
This dataset (MARP-5) consists of 5,852 pairs of requirements, constructed from a publicly available set of user stories created by Duke University. We annotated MARP-5 on a 5-point Likert scale: Extremely related, Very related, Somewhat related, Not very related, Not at all related.
The dataset was annotated independently by two annotators with graduate-level education. Their inter-annotator agreement (Cohen’s kappa) is 0.73, with a percentage agreement of 88.7%, which represents a substantial level of agreement. Finally, a third annotator (the first author of the associated paper) resolved conflicts to produce the final dataset.
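As an illustration of how the two agreement figures above are defined, the following minimal sketch computes percentage agreement and Cohen's kappa for two annotators. The function names and the toy label lists are assumptions made for this example; they are not taken from the dataset or the notebook, and the toy labels do not reproduce the reported 0.73 and 88.7%.

```python
from collections import Counter


def percentage_agreement(labels_a, labels_b):
    """Fraction of items on which both annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percentage_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labels at random according to their own label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy annotations on the 5-point scale
# (1 = Not at all related, 5 = Extremely related).
annotator_1 = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
annotator_2 = [1, 2, 3, 4, 5, 1, 2, 3, 4, 4]

print(percentage_agreement(annotator_1, annotator_2))
print(cohen_kappa(annotator_1, annotator_2))
```

Kappa is lower than raw percentage agreement whenever some agreement is expected by chance, which is why both figures are reported.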
The paper associated with the dataset, "Towards a Semantic Representation for Functional Software Requirements", can be found here: https://ieeexplore.ieee.org/abstract/document/9233034/
In this paper, we propose a semantic representation, called ReqVec, for functional software requirements. ReqVec is calculated in three main phases: First, a set of lexical and syntactic steps is performed to analyze textual requirements. Then, semantic dimensions for requirements are calculated based on a word classifier and the well-known word embedding model Word2vec. Finally, ReqVec is constructed based on the representations of these dimensions. Two experiments have been conducted to evaluate how well the proposed ReqVec captures meaningful semantic information for two well-known Requirements Engineering tasks: detecting semantic relations between requirements, and requirements categorization. The proposed representation was efficient enough to detect related requirements with 0.92 F-measure (using the MARP-5 dataset) and to categorize requirements with 0.88 F-measure.
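The three phases described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the dimension names (actor, action, object), the lookup-table "classifier", the 3-dimensional toy vectors standing in for Word2vec embeddings, and the choice to average and concatenate per-dimension vectors are all hypothetical. The accompanying notebook is the reference for the actual method.

```python
import math

# Toy 3-d vectors standing in for Word2vec embeddings (assumption).
TOY_EMBEDDINGS = {
    "user": [1.0, 0.0, 0.0],
    "admin": [0.9, 0.1, 0.0],
    "upload": [0.0, 1.0, 0.0],
    "delete": [0.0, 0.8, 0.2],
    "file": [0.0, 0.0, 1.0],
    "report": [0.1, 0.0, 0.9],
}

# Hypothetical word classifier, reduced to a lookup table for illustration.
TOY_WORD_CLASSES = {
    "user": "actor", "admin": "actor",
    "upload": "action", "delete": "action",
    "file": "object", "report": "object",
}

DIMENSIONS = ("actor", "action", "object")  # assumed semantic dimensions
EMBEDDING_SIZE = 3


def mean_vector(vectors):
    """Average a list of vectors; return zeros if the list is empty."""
    if not vectors:
        return [0.0] * EMBEDDING_SIZE
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def requirement_vector(text):
    """Build one vector per dimension and concatenate them."""
    tokens = text.lower().split()  # phase 1: (very) simplified lexical step
    parts = []
    for dimension in DIMENSIONS:  # phase 2: words per semantic dimension
        vectors = [
            TOY_EMBEDDINGS[token]
            for token in tokens
            if TOY_WORD_CLASSES.get(token) == dimension
        ]
        parts.extend(mean_vector(vectors))  # phase 3: combine dimensions
    return parts


def cosine_similarity(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


req_a = requirement_vector("user upload file")
req_b = requirement_vector("admin upload report")
req_c = requirement_vector("admin delete report")

print(cosine_similarity(req_a, req_b))
print(cosine_similarity(req_a, req_c))
```

In this toy setup, the pair that shares the same action word scores higher than the pair that does not. A similarity score of this kind could then be thresholded or passed to a classifier to decide whether two requirements are related.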
Files
AIRE PAPER.ipynb
Additional details
References
- Sonbol, R., Rebdawi, G. and Ghneim, N., 2020, September. Towards a Semantic Representation for Functional Software Requirements. In 2020 IEEE Seventh International Workshop on Artificial Intelligence for Requirements Engineering (AIRE) (pp. 1-8). IEEE.