Quantitative Evaluation of Native Microsoft Copilot Studio on the PROTEX Behavioural Homicide Corpus: A 200-Question Benchmark

Barciok, Karol

doi:10.5281/zenodo.20490517

Published June 1, 2026 | Version 1.0

Resource type: Report | Access: Open

This study presents a quantitative evaluation of native Microsoft Copilot Studio operating within the PROTEX behavioural homicide corpus, a structured repository of 285 homicide case files developed for behavioural and criminological research.

A benchmark consisting of 200 manually generated questions was constructed to assess factual retrieval, comparative behavioural reasoning, false-premise rejection, uncertainty preservation, and semantic contamination resistance. Responses were evaluated manually against corpus documentation using predefined scoring criteria.

Across 200 benchmark questions, Microsoft Copilot Studio achieved an accuracy rate of 97.5%, rising to 98.75% when partially correct responses were weighted proportionally. No confirmed hallucinations were observed. False-premise rejection, uncertainty preservation, and semantic contamination resistance each achieved perfect performance within the evaluated benchmark.
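The relationship between the two reported rates can be checked with a short calculation. The sketch below is illustrative only: the abstract does not state the per-category counts or the weight given to partially correct responses, so the breakdown used here (195 correct, 5 partially correct, 0 incorrect, with half credit for partial answers) is an assumption, chosen because it is one combination that reproduces both 97.5% and 98.75% over 200 questions. The function name `accuracy_rates` and its parameters are hypothetical.

```python
from fractions import Fraction


def accuracy_rates(correct, partial, incorrect, partial_weight=Fraction(1, 2)):
    """Return (strict, weighted) accuracy as exact fractions.

    strict   counts only fully correct responses.
    weighted adds partial_weight credit for each partially correct response.
    The default half-credit weight is an assumption, not taken from the report.
    """
    total = correct + partial + incorrect
    if total == 0:
        raise ValueError("benchmark must contain at least one question")
    strict = Fraction(correct, total)
    weighted = (correct + partial_weight * partial) / total
    return strict, weighted


# Hypothetical breakdown consistent with the reported figures (assumed, not reported).
strict, weighted = accuracy_rates(correct=195, partial=5, incorrect=0)

print(f"strict accuracy:   {float(strict):.2%}")    # 97.50%
print(f"weighted accuracy: {float(weighted):.2%}")  # 98.75%
```

Under these assumptions, the 1.25-point gap between the two rates corresponds to five half-credit responses. A different partial-credit weight would imply a different split between partially correct and incorrect answers, so the full report remains the authority on the actual counts.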

The findings are presented as a quantitative extension of an earlier PROTEX migration study examining retrieval stability, epistemic corpus design, and uncertainty preservation in enterprise AI environments. Together, the two studies suggest that corpus design and explicit representation of evidentiary uncertainty may play a significant role in improving retrieval reliability within specialized knowledge systems.

Files

Name	Size	Checksum
Quantitative Evaluation of Native Microsoft Copilot Studio on the PROTEX Behavioural Homicide Corpus A 200-Question Benchmark.pdf	128.1 kB	md5:8396a8ed9e9b2fc60e9a9013331d9be9

Additional details

Is derived from: Report: 10.5281/zenodo.20431380 (DOI)

Barciok, K. (2025). Epistemic Corpus Design and Retrieval Stability in Enterprise AI: A Case Study of the PROTEX Migration to Microsoft Copilot Studio Under Epistemic Pressure. Zenodo. https://zenodo.org/records/20431380

	All versions	This version
Views	30	30
Downloads	20	20
Data volume	3.1 MB	3.1 MB
