There is a newer version of the record available.

Published June 11, 2023 | Version v2
Dataset Open

Natural Language Inference Dataset for Software Engineering

Authors/Creators

Description

Active research in requirements engineering and software engineering relies on Natural Language Processing (NLP) techniques to address domain-specific challenges and improve software quality. However, there is a dearth of effective Natural Language Inference (NLI) datasets for training neural network models to generate distributed sentence representations and tackle diverse NLP tasks. In this paper, we present an NLI dataset tailored specifically to software engineering, which enables neural network models to handle NLP tasks in this domain. The dataset was created through careful annotation of text drawn from diverse sources, including software documentation, user guides, app reviews, and articles related to software systems. Our dataset maintains compatibility with existing NLI datasets such as Stanford Natural Language Inference (SNLI), so that models can be adapted without additional preprocessing.
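As a rough illustration of what SNLI compatibility could mean in practice, the sketch below reads premise/hypothesis/label triples from a tab-separated file. It assumes that TrainNLI.txt follows the layout of the SNLI text distribution (a header row with `gold_label`, `sentence1`, and `sentence2` columns); this record does not confirm the exact layout, so inspect the file before relying on it. The names `NLIExample` and `read_snli_style` are hypothetical helpers introduced here.

```python
import csv
from pathlib import Path
from typing import Iterable, Iterator, NamedTuple


class NLIExample(NamedTuple):
    premise: str
    hypothesis: str
    label: str


def read_snli_style(lines: Iterable[str]) -> Iterator[NLIExample]:
    """Yield examples from SNLI-style tab-separated lines.

    Assumption: the first line is a header naming the columns
    gold_label, sentence1 and sentence2, as in the SNLI .txt files.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        label = (row.get("gold_label") or "").strip()
        # SNLI marks pairs without annotator consensus with "-"; skip them.
        if label in ("", "-"):
            continue
        premise = (row.get("sentence1") or "").strip()
        hypothesis = (row.get("sentence2") or "").strip()
        yield NLIExample(premise, hypothesis, label)


if __name__ == "__main__":
    path = Path("TrainNLI.txt")  # assumed local download location
    if path.exists():
        with path.open(encoding="utf-8", newline="") as handle:
            for count, example in enumerate(read_snli_style(handle)):
                print(example)
                if count >= 2:  # show only the first three examples
                    break
    else:
        print(f"{path} not found")
```

If the header or delimiter turns out to differ, only the column names and the `delimiter` argument should need to change.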

Files

Files (376.6 MB)

Name: TrainNLI.txt
Size: 376.6 MB
Checksum: md5:c0f7548839d2bbdbd6e6f650d13c5848
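Because the file is several hundred megabytes, it can be worth checking a download against the MD5 checksum listed above. The following is a minimal sketch using Python's standard `hashlib`; the local path `TrainNLI.txt` and the helper names `md5_of_file` and `verify_download` are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record for the 376.6 MB file.
EXPECTED_MD5 = "c0f7548839d2bbdbd6e6f650d13c5848"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    local_path = Path("TrainNLI.txt")  # assumed local download location
    if local_path.exists():
        print("checksum OK" if verify_download(local_path) else "checksum mismatch")
    else:
        print(f"{local_path} not found")
```

Reading in chunks keeps memory use flat regardless of file size. A mismatch usually indicates an incomplete or corrupted download, or a file from a different version of the record.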

Additional details

Related works

Is cited by
10.5281/zenodo.8025053 (DOI)