Dataset Open Access

Security Bug Conversations

Benjamin S. Meyers; Nuthan Munaiah; Andrew Meneely; Emily Prud'hommeaux

This dataset will be released as part of the following publication.

  • Benjamin S. Meyers, Nuthan Munaiah, Andrew Meneely, and Emily Prud'hommeaux. Pragmatic Characteristics of Security Conversation: An Exploratory Linguistic Analysis. Forthcoming. Proceedings of the 12th International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE 2019). Montréal, QC, Canada.

Files:

security_bug_conversations.csv

The full dataset containing over 2.1 million comments posted by developers discussing bugs in the Chromium project. The dataset also includes the values we calculated for the five pragmatic features (described in Section 3 of the paper cited above).

CSV Fields:

  • Organizational:
    • Bug ID: Unique identifier of a bug discussion in the Chromium project. The URL https://bugs.chromium.org/p/chromium/issues/detail?id=<Bug ID> may be used to access the bug online
    • Comment ID: Unique identifier of a comment in a bug discussion
  • Classification:
    • Is Security: Binary indicator of whether or not a comment is part of a bug that is about security
  • Natural Language:
    • Comment Text: The raw natural language text of the bug comment
  • Linguistic Metrics:
    • Min. Formality: Minimum of the formality of sentences in the bug comment
    • Max. Formality: Maximum of the formality of sentences in the bug comment
    • Max. Informativeness: Maximum of the informativeness of sentences in the bug comment
    • Max. Implicature: Maximum of the implicature of sentences in the bug comment
    • Min. Politeness: Minimum of the politeness of sentences in the bug comment
    • Max. Politeness: Maximum of the politeness of sentences in the bug comment
    • Number of Tokens: Number of tokens in the bug comment
    • Number of Sentences: Number of sentences in the bug comment
    • Has Doxastic Uncertainty: Binary indicator of presence of a sentence with doxastic uncertainty in the bug comment
    • Has Epistemic Uncertainty: Binary indicator of presence of a sentence with epistemic uncertainty in the bug comment
    • Has Conditional Uncertainty: Binary indicator of presence of a sentence with conditional uncertainty in the bug comment
    • Has Investigational Uncertainty: Binary indicator of presence of a sentence with investigational uncertainty in the bug comment
    • Has Uncertainty: Binary indicator of presence of a sentence with any uncertainty in the bug comment
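
As a rough illustration of how the fields above might be used, the following Python sketch streams the CSV row by row (the file is too large to load comfortably in one go), builds the bug URL from a Bug ID, and tallies comments by the Is Security flag. It assumes that the CSV header row uses the field names exactly as listed above and that binary indicators are stored as values such as 1/0 or True/False; neither assumption is confirmed by this description, so check the header of the downloaded file first. The helper names (`iter_rows`, `bug_url`, `is_truthy`, `count_comments_by_security`) are hypothetical.

```python
import csv
from collections import Counter

# URL pattern taken from the Bug ID field description.
BUG_URL = "https://bugs.chromium.org/p/chromium/issues/detail?id={bug_id}"


def bug_url(bug_id):
    """Return the online URL for a Chromium bug, given its Bug ID."""
    return BUG_URL.format(bug_id=bug_id)


def is_truthy(value):
    """Interpret a binary indicator; the accepted spellings are an assumption."""
    return str(value).strip().lower() in {"1", "true", "t", "yes"}


def iter_rows(path):
    """Yield one dict per comment without loading the whole file into memory."""
    # Long comments may exceed the csv module's default field size limit,
    # so raise it; the chosen value is arbitrary.
    csv.field_size_limit(10_000_000)
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


def count_comments_by_security(rows, flag_column="Is Security"):
    """Count comments in security (True) and non-security (False) bugs.

    The default column name is an assumption based on the field list.
    """
    counts = Counter()
    for row in rows:
        counts[is_truthy(row[flag_column])] += 1
    return counts


# Example usage, once the file has been downloaded:
#   counts = count_comments_by_security(iter_rows("path/to/downloaded.csv"))
#   print(counts[True], counts[False])
```

Streaming with `csv.DictReader` keeps memory use flat regardless of file size; a data-frame library with chunked reading would serve the same purpose.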

Whenever possible, please cite both the paper listed above and the DOI for this dataset. Thank you!

Files (1.2 GB)

  • security_bug_coversations.csv (1.2 GB), md5:bd04e9a1c4eeede6d75a44cba283f0c4
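
Given the size of the download, it may be worth checking the file against the MD5 checksum listed above before use. The following Python sketch reads the file in chunks and compares the digest; the function names and the chunk size are illustrative choices, and the path is whatever name the file has on your disk.

```python
import hashlib

# Checksum as published in the file listing.
EXPECTED_MD5 = "bd04e9a1c4eeede6d75a44cba283f0c4"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the published checksum."""
    return md5_of_file(path) == expected.lower()


# Example usage, once the file has been downloaded:
#   print(matches_published_md5("path/to/downloaded.csv"))
```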