Quantitative Evaluation of Native Microsoft Copilot Studio on the PROTEX Behavioural Homicide Corpus: A 200-Question Benchmark
Authors/Creators
Description
This study presents a quantitative evaluation of native Microsoft Copilot Studio operating on the PROTEX behavioural homicide corpus, a structured repository of 285 homicide case files developed for behavioural and criminological research.
A benchmark consisting of 200 manually generated questions was constructed to assess factual retrieval, comparative behavioural reasoning, false-premise rejection, uncertainty preservation, and semantic contamination resistance. Responses were evaluated manually against corpus documentation using predefined scoring criteria.
Across 200 benchmark questions, Microsoft Copilot Studio achieved an accuracy rate of 97.5%, rising to 98.75% when partially correct responses were weighted proportionally. No confirmed hallucinations were observed. False-premise rejection, uncertainty preservation, and semantic contamination resistance each achieved perfect performance within the evaluated benchmark.
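The relationship between the two reported percentages can be illustrated with a short calculation. The sketch below is not the study's scoring procedure: the function name `accuracy`, the half-credit weight for partially correct responses, and the 195/5/0 breakdown are all assumptions, chosen only because they reproduce the reported 97.5% and 98.75% over 200 questions.

```python
from fractions import Fraction


def accuracy(correct, partial, incorrect, partial_weight=Fraction(1, 2)):
    """Return (strict, weighted) accuracy as percentages.

    strict   counts only fully correct responses.
    weighted adds partial_weight credit for each partially correct response.
    The default half-credit weight is an assumption, not taken from the study.
    """
    total = correct + partial + incorrect
    strict = Fraction(correct, total)
    weighted = (correct + partial_weight * partial) / total
    return float(strict * 100), float(weighted * 100)


# Hypothetical breakdown consistent with the reported figures:
# 195 fully correct, 5 partially correct, 0 incorrect out of 200.
strict_pct, weighted_pct = accuracy(correct=195, partial=5, incorrect=0)
print(f"strict: {strict_pct}%  weighted: {weighted_pct}%")
```

Under a different partial-credit weight, a different breakdown would be needed to reach the same percentages, so the counts above should be read as one consistent possibility rather than the study's actual tallies.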
The findings are presented as a quantitative extension of an earlier PROTEX migration study examining retrieval stability, epistemic corpus design, and uncertainty preservation in enterprise AI environments. Together, the two studies suggest that corpus design and explicit representation of evidentiary uncertainty may play a significant role in improving retrieval reliability within specialized knowledge systems.
Files
| Name | Size |
|---|---|
| Quantitative Evaluation of Native Microsoft Copilot Studio on the PROTEX Behavioural Homicide Corpus A 200-Question Benchmark.pdf (md5:8396a8ed9e9b2fc60e9a9013331d9be9) | 128.1 kB |
Additional details
Related works
- Is derived from
- Report: 10.5281/zenodo.20431380 (DOI)
References
- Barciok, K. (2025). Epistemic Corpus Design and Retrieval Stability in Enterprise AI: A Case Study of the PROTEX Migration to Microsoft Copilot Studio Under Epistemic Pressure. Zenodo. https://zenodo.org/records/20431380