Published September 1, 2020 | Version 1
Dataset Open

Towards a Semantic Representation for Functional Software Requirements (MARP-5 Dataset + Req2Vec Code)

Description

Please cite this dataset as: Sonbol, R., Rebdawi, G. and Ghneim, N., 2020, September. Towards a Semantic Representation for Functional Software Requirements. In 2020 IEEE Seventh International Workshop on Artificial Intelligence for Requirements Engineering (AIRE) (pp. 1-8). IEEE.

https://ieeexplore.ieee.org/abstract/document/9233034/


This dataset (MARP-5) consists of 5,852 pairs of requirements, constructed from a publicly available set of user stories created by Duke University. We annotated MARP-5 on a 5-point Likert scale: Extremely related, Very related, Somewhat related, Not very related, Not at all related.
The dataset was independently annotated by two annotators with graduate-level education. The inter-annotator agreement (Cohen's kappa) between the two annotators is 0.73, with a percentage agreement of 88.7%, which represents a substantial level of agreement. Finally, a third annotator (the first author of the paper) resolved conflicts to produce the final dataset.
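For readers who want to reproduce agreement figures of this kind on their own annotations, the following is a minimal sketch of percentage agreement and Cohen's kappa. It assumes the unweighted form of kappa (the description does not say whether a weighted variant was used), and the function names and the toy labels are illustrative only; they are not taken from the dataset or its notebook.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which both annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(labels_a)
    p_o = percent_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labels independently with their own label frequencies.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy example with made-up Likert codes (1 = Not at all related ... 5 = Extremely related).
annotator_1 = [1, 1, 2, 2]
annotator_2 = [1, 1, 2, 1]
print(percent_agreement(annotator_1, annotator_2))
print(cohen_kappa(annotator_1, annotator_2))
```

Because the Likert labels are ordinal, a weighted kappa could also be a reasonable choice; the sketch above treats every disagreement as equally severe.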

The paper associated with the dataset, "Towards a Semantic Representation for Functional Software Requirements", can be found here: https://ieeexplore.ieee.org/abstract/document/9233034/

In this paper, we propose a semantic representation, called ReqVec, for functional software requirements. ReqVec is calculated in three main phases. First, a set of lexical and syntactic steps is performed to analyze textual requirements. Then, semantic dimensions for requirements are calculated based on a word classifier and the well-known word embedding model Word2vec. Finally, ReqVec is constructed from the representations of these dimensions. Two experiments were conducted to evaluate how well the proposed ReqVec captures meaningful semantic information for two well-known Requirements Engineering tasks: detecting semantic relations between requirements, and requirements categorization. The proposed representation detected related requirements with an F-measure of 0.92 (using the MARP-5 dataset) and categorized requirements with an F-measure of 0.88.
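The three phases above can be illustrated with a small sketch. This is not the authors' implementation: the dimension names, the choice to average word vectors within each dimension and concatenate the results, and the two-dimensional toy embedding table (standing in for a trained Word2vec model) are all assumptions made for illustration. It also assumes the lexical and syntactic phase and the word classifier have already assigned each word to a dimension.

```python
import math

# Toy stand-in for a Word2vec lookup table (assumption: real vectors
# would come from a trained model and have many more components).
TOY_EMBEDDINGS = {
    "user": [1.0, 0.0],
    "admin": [0.9, 0.1],
    "upload": [0.0, 1.0],
    "delete": [0.1, 0.9],
    "file": [0.5, 0.5],
}
EMBEDDING_SIZE = 2

# Hypothetical semantic dimensions; the paper defines its own set.
DIMENSIONS = ("actor", "action", "object")


def mean_vector(words):
    """Average the embeddings of the known words; zero vector if none."""
    vectors = [TOY_EMBEDDINGS[w] for w in words if w in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBEDDING_SIZE
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def requirement_vector(words_by_dimension):
    """Concatenate one averaged vector per dimension, in a fixed order."""
    result = []
    for dimension in DIMENSIONS:
        result.extend(mean_vector(words_by_dimension.get(dimension, [])))
    return result


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two made-up requirements, already split into dimensions.
req_1 = {"actor": ["user"], "action": ["upload"], "object": ["file"]}
req_2 = {"actor": ["admin"], "action": ["delete"], "object": ["file"]}
vec_1 = requirement_vector(req_1)
vec_2 = requirement_vector(req_2)
print(len(vec_1), cosine_similarity(vec_1, vec_2))
```

A similarity score like this could then be thresholded or fed to a classifier to decide whether two requirements are related; how the paper maps representations to the five relatedness levels is not specified in this description.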

Files

AIRE PAPER.ipynb

Files (773.3 kB)

Checksum Size
md5:5ba43df4ad4c40c1800e748a71407f51 463.4 kB
md5:cb096762e71fed44b8d0a2c16db99f12 131.6 kB
md5:d71da1f6395483b954347aaba5c6507a 89.8 kB
md5:66858ff3d0c2965376bc5ac9318529bc 88.5 kB

Additional details

References

  • Sonbol, R., Rebdawi, G. and Ghneim, N., 2020, September. Towards a Semantic Representation for Functional Software Requirements. In 2020 IEEE Seventh International Workshop on Artificial Intelligence for Requirements Engineering (AIRE) (pp. 1-8). IEEE.