Published March 26, 2025 | Version v1
Dataset Open

LLMs and Stack Overflow Discussions: Reliability, Impact, and Challenges

  • 1. Polytechnique Montréal
  • 2. Helmholtz Center for Information Security

Description

Since the release of ChatGPT in November 2022, the landscape of developer Q&A platforms, particularly Stack Overflow, has changed significantly. The ability of large language models (LLMs) to generate immediate, human-like responses to technical questions has sparked discussion about whether they could replace traditional Q&A platforms. This dataset was collected as part of an empirical study that analyzes Stack Overflow questions and evaluates responses generated by ChatGPT and LLaMA.

The dataset supports research aimed at:

  1. Assessing the reliability of LLM-generated answers and their potential long-term impact on platforms like Stack Overflow.
  2. Tracking how user engagement with Stack Overflow has evolved since ChatGPT's release.
  3. Comparing the performance of ChatGPT and LLaMA across different topics.

Files

code.zip

Files (2.1 GB)

Checksum Size
md5:b397dac8a87134b14d45a1cebe8527da 65.9 kB
md5:2613564e0a2d314b43bac6f93622aeb6 17.9 MB
md5:c62821fbcf4f77f1687bd949d62cb470 2.0 GB
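
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch in Python, not part of the dataset itself. The listing does not say which checksum belongs to which file, so the sketch only checks whether a file's digest matches any of the three. The helper names `md5_of` and `matches_listing` are hypothetical, and the local path in the usage note is an assumption about where the file was saved.

```python
import hashlib

# Checksums copied from the file listing above. The listing does not map
# them to file names, so they are kept as an unordered set.
LISTED_MD5 = {
    "b397dac8a87134b14d45a1cebe8527da",
    "2613564e0a2d314b43bac6f93622aeb6",
    "c62821fbcf4f77f1687bd949d62cb470",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use bounded for the 2.0 GB archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Return True if the file's MD5 equals one of the listed checksums."""
    return md5_of(path) in LISTED_MD5
```

For example, `matches_listing("code.zip")` would return `True` only if a file saved locally under that name (an assumed location) has a digest equal to one of the listed values. A `False` result suggests an incomplete or corrupted download.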