
Published August 5, 2022 | Version v1

Online Appendix of "Crowd-based Requirements Elicitation via Pull Feedback: Method and Case Studies"

Affiliations: 1. Royal Netherlands Marechaussee · 2. Tournify · 3. Utrecht University

Description

The online appendix contains the data sets used in the paper Crowd-based Requirements Elicitation via Pull Feedback: Method and Case Studies, by Jelle Wouters, Abel Menkveld, Sjaak Brinkkemper, and Fabiano Dalpiaz. It consists of three data sets, a process-deliverable diagram (PDD), a file containing charts, and two Jupyter (Python) notebooks. In this readme, we briefly explain how the files should be read and interpreted.

Dataset-Tournify.xlsx

This file contains all data of the Tournify case. The following tabs are present:

  • Raw data: contains the raw data collected from the Tournify CrowdRE platform.

  • Automatically translated data: The readability and vagueness measures were calculated on English text. Most of the collected ideas were in Dutch, so the raw data was translated into English using a Google translation API (see the translation sketch after this list).

  • Readability scores: Consists of the Flesch and ARI (Automated Readability Index) readability scores, calculated using the Python scripts (see below).

  • Vague hits: Consists of all the vague words found in the Tournify ideas, identified using a Python script. Numeric codes indicate whether each hit was a true positive (TP) or a false positive, and in which category the false positive fell.

  • Tagging-50-FD & Tagging-50-JW: These tabs contain the tagging of the ideas on the qualities of the QUS framework and ISO/IEC 25010. Two researchers did this independently of one another.

  • Compare-50: This sheet consists of all ideas for which the two taggings disagree. Where a disagreement exists, the field is colored green, and the text in the field indicates the final decision.

  • Result-50: This tab combines the results and shows the final decision after deliberation between the two authors.

  • Tagging-195-FB, Tagging-195-JW, Compare-195, Result-195: Same as above, but for the other 195 ideas in the case (we split this data set in two so we could first try out the modus operandi on a smaller sample).

  • Result-total: Combines the Result-50 and Result-195 sheets. Colored cells indicate ideas we marked as candidates for verbatim presentation in the paper.

  • Kappa-scores: Calculates the Kappa-scores that are presented in the paper.
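
As an illustration of the translation step, here is a minimal sketch assuming the third-party googletrans package; the appendix itself used a Google translation API whose exact client is not specified here, and the example idea below is invented:

    # Dutch-to-English translation sketch; assumes googletrans==4.0.0rc1
    from googletrans import Translator

    translator = Translator()
    ideas_nl = ["Als organisator wil ik het speelschema kunnen exporteren."]  # invented example
    ideas_en = [translator.translate(idea, src="nl", dest="en").text for idea in ideas_nl]
    print(ideas_en)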

Dataset-SSys.xlsx and Dataset-VSys.xlsx

These files contain all data for the S-Sys and V-Sys cases, respectively. The following tabs are present in each:

  • Raw data: consists of the raw data collected in the CrowdRE platform of each case. Some data is ‘not published’: it was analyzed in the study and read by both researchers, but is redacted in the online appendix. Other data is ‘classified’: it was not analyzed by both researchers, as one researcher was not allowed to review it. The classified ideas should be treated as non-existent in the rest of the data set.

  • Automatically translated data, Readability scores: These tabs are compiled in the same way as in the Tournify case.

  • Vagueness: This tab is compiled in the same way as in the Tournify case, except that we add a short explanation describing part of the idea to show why a hit was marked as a true or false positive.

  • Tagging: The tagging against the QUS framework and ISO/IEC 25010 was done here. As we did this in person, the short discussion held when discrepancies occurred is not recorded in the file. Colored cells indicate ideas we considered for verbatim publication in the paper; the colors indicate why we wanted to publish a given idea (for example, because it exhibits all QUS violations).

  • Kappa-scores: Shows the kappa scores and the discrepancies between the individual taggings, indicated with colors (a kappa sketch follows this list).
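
For readers who want to reproduce the agreement analysis outside the spreadsheets, a minimal sketch assuming Cohen's kappa (the specific kappa variant is an assumption here) computed with scikit-learn, using invented tags:

    # Inter-rater agreement sketch; assumes Cohen's kappa and scikit-learn
    from sklearn.metrics import cohen_kappa_score

    # Hypothetical per-idea tags by the two researchers (1 = quality violated, 0 = not)
    tags_researcher_a = [1, 0, 1, 1, 0, 0, 1]
    tags_researcher_b = [1, 0, 0, 1, 0, 1, 1]
    print(cohen_kappa_score(tags_researcher_a, tags_researcher_b))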

Graphs_readability.aspx

This file was used to construct the graphs presented in Figures 6 and 7 of the paper. For this, the readability scores of the three data sets were combined into one tab per measure (one for Flesch, one for ARI) and visualized as boxplots.
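
A minimal sketch of how such boxplots can be reproduced with matplotlib, using invented scores (the actual values are in the combined tabs):

    # Boxplot-per-case sketch; all scores below are illustrative only
    import matplotlib.pyplot as plt

    flesch_scores = {
        "Tournify": [65.2, 70.1, 58.4, 72.3],
        "S-Sys": [55.0, 61.7, 49.9, 66.2],
        "V-Sys": [60.3, 52.8, 68.0, 57.5],
    }
    plt.boxplot(list(flesch_scores.values()), labels=list(flesch_scores.keys()))
    plt.ylabel("Flesch score")
    plt.title("Readability per case (illustrative data)")
    plt.show()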

CREUS-pdd.drawio

This is the source file for the PDD presented in the paper.

Python scripts

The two Jupyter notebooks "S-Sys V-Sys.ipynb" and "Tournify.ipynb" were used to calculate the readability scores and to identify the vague words. As the data sets are structured slightly differently, the scripts differ somewhat between the cases. In the Tournify case, both the script and its output are present in the notebook. For the S-Sys and V-Sys cases, we include the script but omit the output due to confidentiality.
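
To give an impression of what the notebooks compute, here is a minimal sketch assuming the textstat package and a small invented lexicon of vague terms (the actual scripts and word list are in the notebooks):

    # Readability and vague-word sketch; the vague-term list is illustrative only
    import re
    import textstat

    VAGUE_TERMS = ["some", "several", "user-friendly", "fast"]
    vague_pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, VAGUE_TERMS)) + r")\b", re.IGNORECASE
    )

    idea = "As an organizer, I want some user-friendly way to share several schedules."
    print("Flesch:", textstat.flesch_reading_ease(idea))
    print("ARI:", textstat.automated_readability_index(idea))
    print("Vague hits:", vague_pattern.findall(idea))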

Note:

The data sets are included with our paper to allow readers to explore the data themselves. If you wish to use the data set for other purposes, please contact the corresponding author to obtain explicit permission, since these user stories should be analyzed with proper domain knowledge in order to draw meaningful conclusions.

Files (615.1 kB)