PURE: a Dataset of Public Requirements Documents

Ferrari, Alessio; Spagnolo, Giorgio Oronzo; Gnesi, Stefania

doi:10.5281/zenodo.1414117

Published September 12, 2018 | Version 1.0

Dataset Open

1. ISTI-CNR
2. ISTI

Please cite this dataset as Ferrari, A., Spagnolo, G. O., & Gnesi, S. (2017, September). PURE: A dataset of public requirements documents. In 2017 IEEE 25th International Requirements Engineering Conference (RE) (pp. 502-505). IEEE.

https://ieeexplore.ieee.org/abstract/document/8049173

This dataset presents PURE (PUblic REquirements dataset), a collection of 79 publicly available natural language requirements documents gathered from the Web. The dataset includes 34,268 sentences and can be used for natural language processing tasks that are typical in requirements engineering, such as model synthesis, abstraction identification and document structure assessment. It can be further annotated to serve as a benchmark for other tasks, such as ambiguity detection, requirements categorisation and identification of equivalent requirements. In the associated paper, we present the dataset and compare its language with generic English texts, showing the peculiarities of the requirements jargon, which consists of a restricted vocabulary of domain-specific acronyms and words, and of long sentences. We also present the common XML format to which we have manually ported a subset of the documents, with the goal of facilitating replication of NLP experiments. The XML documents are also available for download.

The paper associated with the dataset can be found here:

https://ieeexplore.ieee.org/document/8049173/

More information about the dataset is available here:

http://nlreqdataset.isti.cnr.it

A preprint of the paper is available on ResearchGate:

https://goo.gl/HxJD7X

Files (34.0 MB)

Name	MD5	Size
requirements-xml.zip	c81235c40f88a2c947ae66e0eddad585	378.7 kB
requirements.zip	bc319fe28619f6290badff328ca159dd	33.6 MB
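
The MD5 checksums listed for the two archives can be used to confirm that a download completed without corruption. The following is a minimal sketch, assuming both archives were saved under their original names in a single directory; the function names `md5_of_file` and `verify_downloads` are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed in the Files table of this record.
EXPECTED_MD5 = {
    "requirements-xml.zip": "c81235c40f88a2c947ae66e0eddad585",
    "requirements.zip": "bc319fe28619f6290badff328ca159dd",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each archive name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        # Assumption: archives keep their original file names.
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A `False` result means the file is either absent from the directory or differs from the published archive; in both cases, downloading it again is the simplest remedy.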

Additional details

Is cited by: 10.1145/3168365.3168381 (DOI)
Is supplemented by: 10.1109/RE.2017.29 (DOI)

	All versions	This version
Views	25,375	21,205
Downloads	8,003	6,026
Data volume	195.7 GB	161.8 GB
