Dataset Open Access

Security Bug Conversations

Benjamin S. Meyers; Nuthan Munaiah; Andrew Meneely; Emily Prud'hommeaux

This dataset will be released as part of the following publication.

  • Benjamin S. Meyers, Nuthan Munaiah, Andrew Meneely, and Emily Prud'hommeaux. Pragmatic Characteristics of Security Conversation: An Exploratory Linguistic Analysis. Forthcoming. Proceedings of the 12th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE 2019). Montréal, QC, Canada.

Files:

security_bug_conversations.csv

The full dataset contains over 2.1 million comments posted by developers discussing bugs in the Chromium project. It also includes the values we calculated for the five pragmatic features (described in Section 3 of the paper cited above).

CSV Fields:

  • Organizational:
    • Bug ID: Unique identifier of a bug discussion in the Chromium project. The URL https://bugs.chromium.org/p/chromium/issues/detail?id=<Bug ID> may be used to access the bug online.
    • Comment ID: Unique identifier of a comment in a bug discussion
  • Classification:
    • Is Security: Binary indicator of whether or not a comment is part of a bug that is about security
  • Natural Language:
    • Comment Text: The raw natural language text of the bug comment
  • Linguistic Metrics:
    • Min. Formality: Minimum of the formality of sentences in the bug comment
    • Max. Formality: Maximum of the formality of sentences in the bug comment
    • Max. Informativeness: Maximum of the informativeness of sentences in the bug comment
    • Max. Implicature: Maximum of the implicature of sentences in the bug comment
    • Min. Politeness: Minimum of the politeness of sentences in the bug comment
    • Max. Politeness: Maximum of the politeness of sentences in the bug comment
    • Number of Tokens: Number of tokens in the bug comment
    • Number of Sentences: Number of sentences in the bug comment
    • Has Doxastic Uncertainty: Binary indicator of presence of a sentence with doxastic uncertainty in the bug comment
    • Has Epistemic Uncertainty: Binary indicator of presence of a sentence with epistemic uncertainty in the bug comment
    • Has Conditional Uncertainty: Binary indicator of presence of a sentence with conditional uncertainty in the bug comment
    • Has Investigational Uncertainty: Binary indicator of presence of a sentence with investigational uncertainty in the bug comment
    • Has Uncertainty: Binary indicator of presence of a sentence with any uncertainty in the bug comment
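
As a rough illustration of working with the fields above, the sketch below streams the CSV row by row (the file is about 1.2 GB, so loading it whole may be impractical) and keeps only comments from security bugs. It assumes that the column headers in the file match the field labels listed here ("Bug ID", "Is Security") and that the binary indicator is stored as "1" or "true"; neither is confirmed by this description, so check the header row of the file first. The helper names bug_url and iter_security_comments are hypothetical.

```python
import csv
from typing import Dict, Iterator

# URL pattern quoted in the field description for "Bug ID".
BUG_URL_TEMPLATE = "https://bugs.chromium.org/p/chromium/issues/detail?id={bug_id}"

# Bug comments can be long; raise the per-field limit of the csv module.
csv.field_size_limit(2**31 - 1)


def bug_url(bug_id) -> str:
    """Build the issue tracker URL for a given Bug ID."""
    return BUG_URL_TEMPLATE.format(bug_id=bug_id)


def iter_security_comments(
    path: str,
    flag_col: str = "Is Security",    # assumed header name
    truthy: tuple = ("1", "true"),    # assumed encodings of the binary flag
) -> Iterator[Dict[str, str]]:
    """Yield rows whose security flag is set, one at a time."""
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            if row[flag_col].strip().lower() in truthy:
                yield row
```

For example, `for row in iter_security_comments("security_bug_conversations.csv"): print(bug_url(row["Bug ID"]))` would print one URL per security comment, provided the assumed header names hold.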

Whenever possible, we would appreciate it if you cite both the paper that released this work and the DOI for this dataset. Thank you!

Files (1.2 GB)

Name: security_bug_coversations.csv
Size: 1.2 GB
md5: bd04e9a1c4eeede6d75a44cba283f0c4
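
The MD5 checksum listed for the file can be used to confirm that a download completed intact. The following is a minimal sketch, assuming the file was saved locally under the name shown in the listing; the helper name md5_of_file and the chunk size are illustrative choices, not part of the dataset.

```python
import hashlib

# Checksum as published in the file listing.
EXPECTED_MD5 = "bd04e9a1c4eeede6d75a44cba283f0c4"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

A downloaded copy could then be checked with `md5_of_file("security_bug_coversations.csv") == EXPECTED_MD5`, adjusting the path to wherever the file was saved.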