PP-ind is a repository of research data on industrial pair programming sessions. Since 2007, our research group has collected audio-video recordings and questionnaire data at 13 companies. A total of 57 developers worked together (mostly in groups of two, but also in groups of three or four) in 67 sessions with a mean length of 1:35 hours. A separate tech report provides details on how this data was collected.
While we cannot share the original video recordings due to confidentiality agreements, we do provide transcripts of the pairs' dialog in this data set. Since we perform our analyses directly on the video material, we only transcribe our data on an as-needed basis, e.g., in preparation for a publication. This data set therefore contains only a few, partial transcripts, which may be extended in future versions.
The session-<ID>-transcript.txt files contain original quotations in the language spoken by the recorded developers. For non-English sessions, we also provide non-authoritative session-<ID>-transcript_translated.txt files (translated on the same as-needed basis as the originals were transcribed). All our analyses, however, are performed on the raw data as reflected in the original transcripts. See the file transcription-notation.txt for details on the special notation we use.
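For readers who want to work with the files programmatically, the naming convention above can be used to pair each original transcript with its translation, if one exists. The following is only a minimal sketch and not part of the data set: the function names, the flat directory layout, and the session IDs in the usage note are assumptions made for illustration, and the actual format of <ID> is not specified here.

```python
import re
from pathlib import Path

# Naming convention taken from the description above. The format of <ID>
# is an assumption: any non-empty string is accepted.
TRANSCRIPT_RE = re.compile(
    r"session-(?P<id>.+)-transcript(?P<translated>_translated)?\.txt"
)


def index_transcripts(filenames):
    """Group transcript file names by session ID (hypothetical helper)."""
    index = {}
    for name in filenames:
        match = TRANSCRIPT_RE.fullmatch(name)
        if match is None:
            continue  # skips other files, e.g. transcription-notation.txt
        entry = index.setdefault(
            match["id"], {"original": None, "translated": None}
        )
        key = "translated" if match["translated"] else "original"
        entry[key] = name
    return index


def index_directory(data_dir):
    """Index all transcript files found directly inside data_dir.

    Assumes a flat layout with all files in one directory.
    """
    return index_transcripts(
        p.name for p in Path(data_dir).iterdir() if p.is_file()
    )
```

Calling index_transcripts on a list such as ["session-AA1-transcript.txt", "session-AA1-transcript_translated.txt", "session-BB2-transcript.txt"] (made-up IDs) yields one entry per session, with "translated" left as None where no translation is provided. Since the translations are non-authoritative, any analysis should still start from the "original" entry.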