Natural Language Inference Dataset for Software Engineering

For RE Conference

doi:10.5281/zenodo.8028035

Published June 11, 2023 | Version v2

Dataset Open

Active research in requirements engineering and software engineering requires Natural Language Processing (NLP) techniques to address domain-specific challenges and improve software quality. However, there is a dearth of effective Natural Language Inference (NLI) datasets for training neural network models to generate distributed sentence representations and tackle diverse NLP tasks. In this paper, we present an NLI dataset tailored specifically to software engineering that enables neural network models to handle NLP tasks in this domain effectively. The dataset was built through meticulous annotation of text drawn from diverse sources, including software documentation, user guides, app reviews, and articles related to software systems. It maintains compatibility with existing NLI datasets such as Stanford Natural Language Inference (SNLI), so models built for those datasets can be adapted without additional preprocessing.
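Because the description states compatibility with SNLI, a reader might try loading the file with an SNLI-style reader. The following is a minimal sketch only: it assumes TrainNLI.txt is tab-separated with a header row containing `gold_label`, `sentence1`, and `sentence2` columns, as in the SNLI text release. This record does not confirm that layout, and the names `NLIPair` and `read_snli_style` are hypothetical.

```python
import csv
from typing import Iterator, NamedTuple


class NLIPair(NamedTuple):
    """One premise/hypothesis pair with its label (hypothetical helper type)."""
    premise: str
    hypothesis: str
    label: str


def read_snli_style(path: str) -> Iterator[NLIPair]:
    """Yield labelled pairs from a tab-separated file in the ASSUMED SNLI layout.

    Assumption: a header row names the columns gold_label, sentence1, sentence2.
    Rows whose label is missing or "-" (SNLI's marker for no annotator
    consensus) are skipped.
    """
    with open(path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            label = row.get("gold_label")
            if label is None or label == "-":
                continue
            yield NLIPair(row["sentence1"], row["sentence2"], label)
```

If the actual columns differ, the reader raises a `KeyError` on the first row, which is a quick way to find out that the assumed layout does not hold.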

Files

Files (376.6 MB)

Name	Size	Checksum
TrainNLI.txt	376.6 MB	md5:c0f7548839d2bbdbd6e6f650d13c5848
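After downloading, the file can be checked against the MD5 digest listed for TrainNLI.txt. The sketch below uses only Python's standard `hashlib` module and reads the file in chunks, since it is several hundred megabytes. The function names and the local path are illustrative assumptions, not part of the record.

```python
import hashlib

# Digest copied from the file listing for TrainNLI.txt on this record.
EXPECTED_MD5 = "c0f7548839d2bbdbd6e6f650d13c5848"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare a local file's MD5 digest with the listed one."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_listed_checksum("TrainNLI.txt")` should return `True` for an intact download saved under that (assumed) local name. A mismatch usually indicates a truncated or corrupted transfer.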

Additional details

Is cited by: 10.5281/zenodo.8025053 (DOI)

	All versions	This version
Views	1,011	62
Downloads	796	67
Data volume	176.2 GB	25.2 GB
