Conference paper Open Access
Yitagesu, Sofonias; Zhang, Xiaowang; Feng, Zhiyong; Li, Xiaohong; Xing, Zhenchang
<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5281/zenodo.4632063</identifier>
  <creators>
    <creator>
      <creatorName>Yitagesu, Sofonias</creatorName>
      <givenName>Sofonias</givenName>
      <familyName>Yitagesu</familyName>
      <affiliation>Tianjin University, China</affiliation>
    </creator>
    <creator>
      <creatorName>Zhang, Xiaowang</creatorName>
      <givenName>Xiaowang</givenName>
      <familyName>Zhang</familyName>
      <affiliation>Tianjin University, China</affiliation>
    </creator>
    <creator>
      <creatorName>Feng, Zhiyong</creatorName>
      <givenName>Zhiyong</givenName>
      <familyName>Feng</familyName>
      <affiliation>Tianjin University, China</affiliation>
    </creator>
    <creator>
      <creatorName>Li, Xiaohong</creatorName>
      <givenName>Xiaohong</givenName>
      <familyName>Li</familyName>
      <affiliation>Tianjin University, China</affiliation>
    </creator>
    <creator>
      <creatorName>Xing, Zhenchang</creatorName>
      <givenName>Zhenchang</givenName>
      <familyName>Xing</familyName>
      <affiliation>Australian National University, Australia</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Automatic Part-of-Speech Tagging for Security Vulnerability Descriptions</title>
  </titles>
  <publisher>Zenodo</publisher>
  <publicationYear>2021</publicationYear>
  <subjects>
    <subject>Fine-Tuning, Part-of-Speech tagging, Unsupervised word embedding, Security vulnerability descriptions</subject>
  </subjects>
  <dates>
    <date dateType="Issued">2021-03-23</date>
  </dates>
  <language>en</language>
  <resourceType resourceTypeGeneral="ConferencePaper"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/4632063</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI"
      relationType="IsVersionOf">10.5281/zenodo.4632062</relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract"><p>In this paper, we study the problem of part-of-speech (POS) tagging for security vulnerability descriptions (SVD). In contrast to newswire articles, SVD often consist of high-level natural language text written in a mixed language studded with code, domain-specific jargon, vague language, and abbreviations. Moreover, training data dedicated to security vulnerability research is not widely available. Prior neural network-based POS tagging approaches have often relied either on manually annotated training data or on applying existing natural language processing (NLP) techniques, which leads to two significant drawbacks. The former is extremely time-consuming and requires labor-intensive feature engineering and domain expertise. The latter is inadequate for identifying linguistically informed words specific to the SVD domain. In this paper, we propose an automatic approach to assigning POS tags to tokens in SVD. Our approach uses character-level representations to automatically extract orthographic features and unsupervised word embeddings to capture meaningful syntactic and semantic regularities from SVD. The character-level representations are then concatenated with the word embeddings into a combined feature, which is learned and used to predict the POS tags. To address the poor availability of annotated security vulnerability data, we implement a fine-tuning approach. We also provide public access to a POS-annotated corpus of approximately 8M tokens, which serves as a training dataset in this domain. Our evaluation results show a significant improvement in POS tagging accuracy on SVD (17.72%-28.22%) over current approaches.</p></description>
  </descriptions>
</resource>
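
The abstract describes concatenating a character-level representation with a word embedding and feeding the combined feature to a tag predictor. The Python sketch below illustrates that idea under assumptions that are not taken from the paper: the character encoder is an element-wise max over character embeddings (a simplification chosen for brevity), frozen random vectors stand in for pretrained unsupervised word embeddings, a per-token softmax layer replaces whatever sequence model the authors use, and the four-tag set is illustrative. `CharWordTagger`, its methods, the dimensions, and the example data are all hypothetical.

```python
import math
import random

# Hypothetical sizes and tag subset; the abstract does not specify them.
CHAR_DIM = 4
WORD_DIM = 6
TAGS = ["NN", "VB", "IN", "CD"]


class CharWordTagger:
    """Toy per-token tagger: [char-level features ; word embedding] -> softmax over TAGS."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.char_emb: dict[str, list[float]] = {}
        self.word_emb: dict[str, list[float]] = {}
        in_dim = CHAR_DIM + WORD_DIM
        # One weight row and one bias per tag (a linear output layer).
        self.weights = [[self.rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in TAGS]
        self.bias = [0.0] * len(TAGS)

    def _lookup(self, table: dict[str, list[float]], key: str, dim: int) -> list[float]:
        # Frozen random vector created on first use; a stand-in for a pretrained table.
        if key not in table:
            table[key] = [self.rng.uniform(-1.0, 1.0) for _ in range(dim)]
        return table[key]

    def char_features(self, token: str) -> list[float]:
        # Element-wise max over the token's character embeddings, so characters such
        # as digits, dots, or hyphens (e.g. in "CVE-2021-1234") shape the representation.
        if not token:
            return [0.0] * CHAR_DIM
        vecs = [self._lookup(self.char_emb, c, CHAR_DIM) for c in token]
        return [max(v[i] for v in vecs) for i in range(CHAR_DIM)]

    def features(self, token: str) -> list[float]:
        # Combined feature: character-level representation concatenated with the word embedding.
        return self.char_features(token) + self._lookup(self.word_emb, token.lower(), WORD_DIM)

    def predict_proba(self, token: str) -> list[float]:
        x = self.features(token)
        logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(self.weights, self.bias)]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def train_step(self, token: str, tag: str, lr: float = 0.1) -> float:
        """One SGD step on cross-entropy; returns the loss before the update.

        Only the output layer is trained; the embeddings stay frozen in this sketch.
        """
        x = self.features(token)
        probs = self.predict_proba(token)
        gold = TAGS.index(tag)
        for k in range(len(TAGS)):
            grad = probs[k] - (1.0 if k == gold else 0.0)
            for i, xi in enumerate(x):
                self.weights[k][i] -= lr * grad * xi
            self.bias[k] -= lr * grad
        return -math.log(probs[gold])

    def tag(self, tokens: list[str]) -> list[str]:
        # Pick the most probable tag for each token independently.
        result = []
        for token in tokens:
            probs = self.predict_proba(token)
            result.append(TAGS[max(range(len(TAGS)), key=probs.__getitem__)])
        return result


if __name__ == "__main__":
    tagger = CharWordTagger(seed=42)
    # Invented token/tag pairs: a "general" stage followed by a small in-domain (SVD)
    # stage that keeps training the same weights, as a stand-in for fine-tuning.
    general = [("buffer", "NN"), ("allows", "VB"), ("in", "IN"), ("2", "CD")]
    svd = [("overflow", "NN"), ("execute", "VB"), ("via", "IN"), ("2.4.1", "CD")]
    for data, epochs in ((general, 50), (svd, 20)):
        for _ in range(epochs):
            for token, gold in data:
                tagger.train_step(token, gold)
    print(tagger.tag(["overflow", "via", "2.4.1"]))
```

The `__main__` block treats fine-tuning as continued training of the same output-layer weights on a small in-domain sample after a general-domain stage; the paper's actual fine-tuning procedure may differ. Max-pooling is used here because it yields a fixed-size vector regardless of token length, which suits tokens such as version strings or identifiers.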
Statistic | All versions | This version |
---|---|---|
Views | 322 | 322 |
Downloads | 261 | 261 |
Data volume | 359.6 MB | 359.6 MB |
Unique views | 296 | 296 |
Unique downloads | 226 | 226 |