Published March 15, 2019 | Version v1.0.0
Dataset | Open Access

Security Bug Conversations

  • 1. Rochester Institute of Technology
  • 2. Boston College

Description

This dataset will be released as part of the following publication.

  • Benjamin S. Meyers, Nuthan Munaiah, Andrew Meneely, and Emily Prud'hommeaux. Pragmatic Characteristics of Security Conversation: An Exploratory Linguistic Analysis. Forthcoming. Proceedings of the 12th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE 2019). Montréal, QC, Canada.

Files:

security_bug_conversations.csv

The full dataset: over 2.1 million comments posted by developers discussing bugs in the Chromium project, together with the values we calculated for the five pragmatic features (formality, informativeness, implicature, politeness, and uncertainty), which are described in Section 3 of the paper cited above.
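Because the CSV is about 1.2 GB, streaming it row by row may be easier than loading it into memory at once. The Python sketch below uses only the standard library to count security and non-security comments. It assumes that the header row uses the field names listed under CSV Fields (here, `Is Security`) and that binary indicators are written as `1`/`0` or `True`/`False`; both are assumptions, so inspect the actual header and values first. The sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# Assumed encodings of a "true" binary indicator; adjust after
# inspecting the real file.
TRUE_VALUES = {"1", "true"}


def is_true(value: str) -> bool:
    """Interpret a binary indicator cell as a bool (assumed 1/0 or True/False)."""
    return value.strip().lower() in TRUE_VALUES


def count_security_comments(f: TextIO, column: str = "Is Security") -> Counter:
    """Stream rows with csv.DictReader and tally security vs. non-security comments.

    Only one row is held in memory at a time. The column name is an
    assumption based on the field list on this page.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(f):
        key = "security" if is_true(row[column]) else "non-security"
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Made-up two-row sample; NOT taken from the real dataset.
    sample = io.StringIO(
        "Bug ID,Comment ID,Is Security,Comment Text\n"
        '1,1,1,"Possible heap overflow, repro attached."\n'
        "2,1,0,Looks good to me.\n"
    )
    print(count_security_comments(sample))
```

To run it on the real file, open your local copy with `open(path, newline="", encoding="utf-8")` so that line breaks inside quoted comment text are handled by the `csv` module (UTF-8 is itself an assumption). If very long comments exceed the module's default field size limit, `csv.field_size_limit` can raise it.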

CSV Fields:

  • Organizational:
    • Bug ID: Unique identifier of a bug discussion in the Chromium project. The URL https://bugs.chromium.org/p/chromium/issues/detail?id=<Bug ID> may be used to access the bug online
    • Comment ID: Unique identifier of a comment in a bug discussion
  • Classification:
    • Is Security: Binary indicator of whether the comment belongs to the discussion of a security bug
  • Natural Language:
    • Comment Text: The raw natural language text of the bug comment
  • Linguistic Metrics:
    • Min. Formality: Minimum of the formality of sentences in the bug comment
    • Max. Formality: Maximum of the formality of sentences in the bug comment
    • Max. Informativeness: Maximum of the informativeness of sentences in the bug comment
    • Max. Implicature: Maximum of the implicature of sentences in the bug comment
    • Min. Politeness: Minimum of the politeness of sentences in the bug comment
    • Max. Politeness: Maximum of the politeness of sentences in the bug comment
    • Number of Tokens: Number of tokens in the bug comment
    • Number of Sentences: Number of sentences in the bug comment
    • Has Doxastic Uncertainty: Binary indicator of the presence of a sentence with doxastic uncertainty in the bug comment
    • Has Epistemic Uncertainty: Binary indicator of the presence of a sentence with epistemic uncertainty in the bug comment
    • Has Conditional Uncertainty: Binary indicator of the presence of a sentence with conditional uncertainty in the bug comment
    • Has Investigational Uncertainty: Binary indicator of the presence of a sentence with investigational uncertainty in the bug comment
    • Has Uncertainty: Binary indicator of the presence of a sentence with any uncertainty in the bug comment
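Each Min./Max. field summarizes sentence-level scores within one comment, and each Has field is true when at least one sentence in the comment carries that kind of uncertainty. The Python sketch below illustrates that aggregation for a single comment. The `SentenceScores` class and its attribute names are hypothetical, the sketch does not compute formality, informativeness, implicature, politeness, or uncertainty itself (those come from the models described in Section 3 of the paper), it does not cover Number of Tokens, and it assumes that Has Uncertainty is the logical OR of the four specific uncertainty types.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

UNCERTAINTY_TYPES = ("doxastic", "epistemic", "conditional", "investigational")


@dataclass
class SentenceScores:
    """Hypothetical sentence-level scores; producing them requires the
    models from Section 3 of the paper, which this sketch does not implement."""
    formality: float
    informativeness: float
    implicature: float
    politeness: float
    doxastic: bool = False
    epistemic: bool = False
    conditional: bool = False
    investigational: bool = False


def aggregate_comment(sentences: List[SentenceScores]) -> Dict[str, Union[float, int, bool]]:
    """Collapse sentence-level scores into comment-level Linguistic Metrics fields."""
    if not sentences:
        raise ValueError("a comment must contain at least one sentence")
    # A comment "has" an uncertainty type if any of its sentences does.
    has = {t: any(getattr(s, t) for s in sentences) for t in UNCERTAINTY_TYPES}
    return {
        "Min. Formality": min(s.formality for s in sentences),
        "Max. Formality": max(s.formality for s in sentences),
        "Max. Informativeness": max(s.informativeness for s in sentences),
        "Max. Implicature": max(s.implicature for s in sentences),
        "Min. Politeness": min(s.politeness for s in sentences),
        "Max. Politeness": max(s.politeness for s in sentences),
        "Number of Sentences": len(sentences),
        "Has Doxastic Uncertainty": has["doxastic"],
        "Has Epistemic Uncertainty": has["epistemic"],
        "Has Conditional Uncertainty": has["conditional"],
        "Has Investigational Uncertainty": has["investigational"],
        # Assumption: "any uncertainty" = at least one of the four types above.
        "Has Uncertainty": any(has.values()),
    }
```

Taking the minimum and maximum rather than a mean keeps the most extreme sentence visible, so a single impolite or highly formal sentence is not averaged away.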

Notes

Whenever possible, we would appreciate it if you cite both the paper that released this work and the DOI for this dataset. Thank you!

Files (1.2 GB)

  • security_bug_coversations.csv (1.2 GB, md5:bd04e9a1c4eeede6d75a44cba283f0c4)
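After downloading, the file can be checked against the published MD5 checksum. The Python sketch below hashes the file in 1 MiB chunks with the standard `hashlib` module so that the 1.2 GB file is never held in memory; the function names and the default chunk size are illustrative choices, and the path you pass in is wherever you saved your local copy.

```python
import hashlib
from typing import Iterable, Iterator

# Checksum as published on this record.
EXPECTED_MD5 = "bd04e9a1c4eeede6d75a44cba283f0c4"


def read_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield the file in fixed-size binary chunks (1 MiB by default)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def md5_hex(chunks: Iterable[bytes]) -> str:
    """Compute an MD5 hex digest incrementally, one chunk at a time."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file at `path` matches the published checksum."""
    return md5_hex(read_chunks(path)) == expected.lower()
```

Feeding the digest chunk by chunk yields the same result as hashing the whole file at once, so the check uses constant memory regardless of file size.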