TELOS Adversarial Validation Dataset
Description
Validation results for the TELOS AI governance framework, demonstrating 100% harm prevention across 1,300
adversarial attacks from two standardized benchmarks (MedSafetyBench and HarmBench).
Key Results:
- 1,300 attacks validated (900 MedSafetyBench + 400 HarmBench)
- 100% harm prevention rate (0% attack success)
- 99.9% CI [0%, 0.28%]
- 95.8% autonomous blocking (Tier 1)
- p < 0.001 (highly significant)
- Six Sigma performance: <2% human escalation
- Three-tier governance architecture (Primacy Attractor → RAG → Human)
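The record does not state how the confidence interval listed above was derived. As a minimal sketch, assuming an exact (Clopper-Pearson) binomial interval for 0 successful attacks out of 1,300, the upper bound has a closed form and can be recomputed at any confidence level. The function name `exact_upper_bound` is illustrative and not part of TELOS.

```python
def exact_upper_bound(n: int, confidence: float, two_sided: bool = True) -> float:
    """Clopper-Pearson upper bound on the attack success rate when 0 of n attacks succeed.

    Assumption: an exact binomial interval; the record does not name its method.
    """
    alpha = 1.0 - confidence
    tail = alpha / 2 if two_sided else alpha
    # With zero observed successes, P(X = 0) = (1 - p)^n.
    # The upper bound is the p that solves (1 - p)^n = tail.
    return 1.0 - tail ** (1.0 / n)


n_attacks = 1300  # 900 MedSafetyBench + 400 HarmBench
for level in (0.95, 0.999):
    upper = exact_upper_bound(n_attacks, level)
    print(f"{level:.1%} two-sided interval: [0%, {upper:.2%}]")
```

Under this assumption, the two-sided 95% level gives an upper bound of about 0.28%, and higher confidence levels give wider intervals. The statistical analysis summary in the dataset is the authoritative source for the method actually used.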
Files Included:
- Complete validation datasets from MedSafetyBench (NeurIPS 2024) and HarmBench (Center for AI Safety)
- Statistical analysis summary
- Tier distribution data
- ERRATA_v1.1.md (validation status clarification)
- Per-attack forensic traces for all 1,300 attacks (v2.0.0)
Validation Status:
This dataset demonstrates proof-of-concept validation of the TELOS governance methodology.
The healthcare Primacy Attractor (PA) and RAG corpus were constructed from authoritative public-domain sources (HIPAA
Privacy Rule, HHS guidance, peer-reviewed clinical literature) but have not been formally validated by
external healthcare compliance professionals or clinical researchers. Results should be interpreted as
methodology demonstration, not certification for clinical deployment. See ERRATA_v1.1.md for details.
License:
Apache 2.0
Files
| Name | Size | Checksum |
|---|---|---|
| telos_validation_dataset_zenodo.json | 772.4 kB | md5:d324cdb73d8a0ba8a888c70356bd775e |
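The MD5 checksum listed for the file can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `verify_download` and the local path are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as listed in this record for telos_validation_dataset_zenodo.json.
EXPECTED_MD5 = "d324cdb73d8a0ba8a888c70356bd775e"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path: the file saved into the current directory.
    target = Path("telos_validation_dataset_zenodo.json")
    if target.exists():
        print("checksum OK" if verify_download(target) else "checksum MISMATCH")
    else:
        print(f"{target} not found; download it from the record first")
```

MD5 is adequate here for detecting a corrupted or truncated download. It is not a guarantee against deliberate tampering.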
Additional details
Related works
- Is supplement to:
  - Preprint: 10.5281/zenodo.18367069 (DOI)
  - Dataset: 10.5281/zenodo.18027446 (DOI)
- References:
  - Dataset: 10.5281/zenodo.18009153 (DOI)
Software
- Repository URL: https://github.com/TelosSteward/TELOS
- Programming language: Python
- Development Status: Active