Published June 1, 2026 | Version 1.0

Quantitative Evaluation of Native Microsoft Copilot Studio on the PROTEX Behavioural Homicide Corpus: A 200-Question Benchmark

Authors/Creators

Description

This study presents a quantitative evaluation of native Microsoft Copilot Studio operating on the PROTEX behavioural homicide corpus, a structured repository of 285 homicide case files developed for behavioural and criminological research.

A benchmark consisting of 200 manually generated questions was constructed to assess factual retrieval, comparative behavioural reasoning, false-premise rejection, uncertainty preservation, and semantic contamination resistance. Responses were evaluated manually against corpus documentation using predefined scoring criteria.

Across the 200 benchmark questions, Microsoft Copilot Studio achieved a strict accuracy of 97.5%, rising to 98.75% when partially correct responses were given proportional credit. No confirmed hallucinations were observed. False-premise rejection, uncertainty preservation, and semantic contamination resistance each scored 100% within the evaluated benchmark.
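The relationship between the two reported accuracy figures can be illustrated with a short calculation. The sketch below is not the study's scoring procedure. The counts (195 fully correct, 5 partially correct, 0 incorrect) and the half-credit weight are assumptions chosen because they reproduce the reported 97.5% and 98.75%; the record itself states neither the per-category counts nor the weight, and other combinations of partial credit could yield the same totals. The function name `accuracy` and its parameters are hypothetical.

```python
from fractions import Fraction


def accuracy(correct, partial, incorrect, partial_credit=Fraction(1, 2)):
    """Return (strict, weighted) accuracy as exact fractions.

    strict   counts only fully correct responses.
    weighted adds `partial_credit` for each partially correct response.
    The default half credit is an assumption, not a value from the study.
    """
    total = correct + partial + incorrect
    strict = Fraction(correct, total)
    weighted = (correct + partial_credit * partial) / total
    return strict, weighted


# Hypothetical breakdown consistent with the reported percentages.
strict, weighted = accuracy(correct=195, partial=5, incorrect=0)

print(f"strict accuracy:   {float(strict):.2%}")    # 97.50%
print(f"weighted accuracy: {float(weighted):.2%}")  # 98.75%
```

Using `Fraction` keeps the arithmetic exact, so the comparison with the reported figures does not depend on floating-point rounding.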

The findings are presented as a quantitative extension of an earlier PROTEX migration study examining retrieval stability, epistemic corpus design, and uncertainty preservation in enterprise AI environments. Together, the two studies suggest that corpus design and explicit representation of evidentiary uncertainty may play a significant role in improving retrieval reliability within specialized knowledge systems.

Files

Quantitative Evaluation of Native Microsoft Copilot Studio on the PROTEX Behavioural Homicide Corpus A 200-Question Benchmark.pdf

Additional details

Related works

Is derived from
Report: 10.5281/zenodo.20431380 (DOI)

References

  • Barciok, K. (2025). Epistemic Corpus Design and Retrieval Stability in Enterprise AI: A Case Study of the PROTEX Migration to Microsoft Copilot Studio Under Epistemic Pressure. Zenodo. https://zenodo.org/records/20431380