TELOS Adversarial Validation Dataset
Description
Validation results for the TELOS AI governance framework, showing 0 observed attack successes across
2,550 adversarial attacks from four evaluation sources (AILuminate, MedSafetyBench, HarmBench, and an
SB 243-aligned suite). Includes a full forensic audit trail with JSONL governance event logs.
Key Results:
- 2,550 attacks validated (1,200 AILuminate + 900 MedSafetyBench + 400 HarmBench + 50 SB 243-aligned)
- 0/2,550 observed attack successes
- 95% CI upper bound: ~0.15%
- 95.8% autonomous blocking (Tier 1)
- Six Sigma performance: <2% human escalation
- Three-tier governance architecture (Primacy Attractor → RAG → Human)
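The record does not state which interval method produced the ~0.15% upper bound, so the following is only a sketch under assumptions: it computes three common upper bounds for 0 observed successes in 2,550 trials. The two-sided Clopper-Pearson and Wilson bounds both come out close to the quoted figure, while the one-sided "rule of three" gives a somewhat smaller value (about 0.12%). The function names are illustrative and do not come from the TELOS code base.

```python
import math

# Attack counts as listed in Key Results.
n_attacks = 1200 + 900 + 400 + 50


def clopper_pearson_upper_zero(n: int, alpha: float = 0.05, two_sided: bool = True) -> float:
    """Exact binomial upper bound when 0 successes are observed in n trials.

    With x = 0 the bound has a closed form: 1 - tail**(1/n), where tail is
    alpha/2 for a two-sided interval and alpha for a one-sided one.
    """
    tail = alpha / 2 if two_sided else alpha
    return 1.0 - tail ** (1.0 / n)


def wilson_upper_zero(n: int, z: float = 1.959964) -> float:
    """Wilson score upper bound for 0 successes; simplifies to z^2 / (n + z^2)."""
    return z * z / (n + z * z)


def rule_of_three(n: int) -> float:
    """Approximate one-sided 95% upper bound for 0 successes in n trials."""
    return 3.0 / n


if __name__ == "__main__":
    print(f"n = {n_attacks}")
    print(f"Clopper-Pearson (two-sided): {clopper_pearson_upper_zero(n_attacks):.4%}")
    print(f"Wilson:                      {wilson_upper_zero(n_attacks):.4%}")
    print(f"Rule of three (one-sided):   {rule_of_three(n_attacks):.4%}")
```

Whichever method is used, the bound describes the attack distribution that was sampled; it says nothing about attacks outside the four evaluation sources.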
Benchmarks:
- AILuminate Standard Benchmark (1,200 prompts, 12 NIST AI RMF harm categories)
- MedSafetyBench (900 prompts, NeurIPS 2024)
- HarmBench (400 prompts, Center for AI Safety)
- SB 243-Aligned Evaluation Suite (50 prompts, internal benchmark aligned with California child safety
categories)
Files Included:
- Complete validation datasets from all four evaluation sources
- Statistical analysis summary
- Tier distribution data
- ERRATA_v1.1.md (validation status clarification)
Forensic Audit Trail (v2.0)
- harmbench_forensic_forensic_summary.json - HarmBench aggregate statistics
- harmbench_forensic_forensic_results.json - HarmBench per-prompt forensic data
- harmbench_forensic_fidelity_distribution.csv - HarmBench fidelity scores
- harmbench_forensic_governance_report.html - HarmBench interactive visualization
- traces/session_harmbench_forensic_*.jsonl - HarmBench JSONL governance event log
- medsafetybench_forensic_forensic_summary.json - MedSafetyBench aggregate statistics
- medsafetybench_forensic_forensic_results.json - MedSafetyBench per-prompt forensic data
- medsafetybench_forensic_fidelity_distribution.csv - MedSafetyBench fidelity scores
- medsafetybench_forensic_governance_report.html - MedSafetyBench interactive visualization
- traces/session_medsafetybench_forensic_*.jsonl - MedSafetyBench JSONL governance event log
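The schema of the governance event logs is not documented in this record, so the sketch below makes no assumption about field names. It reads the JSONL traces line by line and counts which top-level keys occur, which is a reasonable first step before writing any field-specific analysis. The helper names `iter_events` and `summarize_keys` are hypothetical, and the default glob pattern simply mirrors the file listing above.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def iter_events(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            if isinstance(record, dict):
                yield record


def summarize_keys(
    trace_dir: Path,
    pattern: str = "session_harmbench_forensic_*.jsonl",
) -> Counter:
    """Count how often each top-level key appears across matching trace files."""
    counts: Counter = Counter()
    for path in sorted(trace_dir.glob(pattern)):
        for event in iter_events(path):
            counts.update(event.keys())
    return counts
```

Calling `summarize_keys(Path("traces"))` on an unpacked copy of the dataset would list the fields actually present; passing `pattern="session_medsafetybench_forensic_*.jsonl"` does the same for the MedSafetyBench traces.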
Validation Status:
This dataset demonstrates proof-of-concept validation of the TELOS governance methodology under
black-box threat models. The healthcare PA and RAG corpus were constructed from authoritative public
domain sources (HIPAA Privacy Rule, HHS guidance, peer-reviewed clinical literature) but have not been
formally validated by external healthcare compliance professionals or clinical researchers. Results should
be interpreted as methodology demonstration, not certification for clinical deployment. See
ERRATA_v1.1.md for details.
License: Apache 2.0
Validation Date: Original validation 2024-12-21 (forensic audit added 2026-01-25; AILuminate and
SB 243-aligned validation added 2026-01)
Files (4.6 MB)
| MD5 checksum | Size |
|---|---|
| 14d008d60e1943171968991deabc8600 | 32.2 kB |
| f1477eab95eca36b3273f69f9d3dd32c | 177.0 kB |
| 0efb775a0c4b2f568cd64ea409dd956e | 2.8 kB |
| 4128fabac358102bfcef38bfaf048cf0 | 1.1 MB |
| 52e98a3cb968bf3cf9d981a55cfbf3ae | 28.3 kB |
| 65463de81842769956dca8a9752c92ff | 175.5 kB |
| d1d89e38063db842226b9c6c406f2b7e | 2.8 kB |
| ba353c16553bb0d012d37e08edacdb94 | 1.2 MB |
| 604a9fa22735df76a3808a89f086c819 | 530.0 kB |
| 92754007637a73c633ba78d1005388ef | 619.3 kB |
| d324cdb73d8a0ba8a888c70356bd775e | 772.4 kB |
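The MD5 values listed for the files can be used to check that a download arrived intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_checksum` are illustrative, and pairing each local file with its checksum has to be done from the record page, since that mapping is not reproduced here.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected: str) -> bool:
    """Compare a file against a checksum given as 'md5:<hex>' or bare hex."""
    expected_hex = expected.strip().lower().split(":")[-1]
    return md5_of_file(path) == expected_hex
```

MD5 is adequate for detecting accidental corruption in transit; it is not a defence against deliberate tampering.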
Additional details
Related works
- Is supplement to
  - Preprint: 10.5281/zenodo.18367069 (DOI)
- References
  - Dataset: 10.5281/zenodo.18009153 (DOI)
  - Dataset: 10.5281/zenodo.18370504 (DOI)
  - Dataset: 10.5281/zenodo.18370263 (DOI)
  - Dataset: 10.5281/zenodo.18370603 (DOI)
Software
- Repository URL: https://github.com/TelosSteward/TELOS
- Programming language: Python
- Development Status: Active