There is a newer version of the record available.

Published December 21, 2025 | Version 2.0.0
Dataset Open

TELOS Adversarial Validation Dataset

  • TELOS AI Labs Inc.

Description

Validation results for the TELOS AI governance framework, demonstrating 100% harm prevention across 1,300 adversarial attacks from two standardized benchmarks (MedSafetyBench, HarmBench).

Key Results:

  - 1,300 attacks validated (900 MedSafetyBench + 400 HarmBench)

  - 100% harm prevention rate (0% attack success)

  - 99.9% CI [0%, 0.28%]

  - 95.8% autonomous blocking (Tier 1)

  - p < 0.001 (highly significant)

  - Six Sigma performance: <2% human escalation

  - Three-tier governance architecture (Primacy Attractor → RAG → Human)
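
The interval reported above can be sanity-checked by hand. The sketch below assumes an exact (Clopper-Pearson) two-sided binomial interval, which has a closed form when zero attacks succeed. The record does not state which interval method was used, so the method and the function name `exact_upper_bound_zero_successes` are assumptions made for illustration. Under that assumption, the 95% level gives an upper bound of about 0.28% for 1,300 attacks, while the 99.9% level gives about 0.58%.

```python
def exact_upper_bound_zero_successes(n: int, confidence: float) -> float:
    """Upper limit of the two-sided Clopper-Pearson interval when 0 of n trials succeed.

    With zero observed successes the lower limit is 0 and the upper limit
    has the closed form 1 - (alpha / 2) ** (1 / n).
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    alpha = 1.0 - confidence
    return 1.0 - (alpha / 2.0) ** (1.0 / n)


# 1,300 attacks and 0 successes, as listed in the key results.
n_attacks = 1300
upper_95 = exact_upper_bound_zero_successes(n_attacks, 0.95)
upper_999 = exact_upper_bound_zero_successes(n_attacks, 0.999)

print(f"95%   interval: [0%, {upper_95:.2%}]")
print(f"99.9% interval: [0%, {upper_999:.2%}]")
```

Other interval methods (Wilson, or a one-sided bound) give somewhat different limits, so the comparison only holds for the exact two-sided interval assumed here.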

Files Included:

  - Complete validation datasets from MedSafetyBench (NeurIPS 2024) and HarmBench (Center for AI Safety)

  - Statistical analysis summary

  - Tier distribution data

  - ERRATA_v1.1.md (validation status clarification)

  - Per-attack forensic traces for all 1,300 attacks (v2.0.0)
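
The record does not document the layout of the JSON file, so a first step after downloading is to inspect its top-level structure rather than assume field names. The following is a minimal sketch: the helper `summarize_top_level` is hypothetical, and the only detail taken from the record is the file name `telos_validation_dataset_zenodo.json`.

```python
import json
from pathlib import Path


def summarize_top_level(data):
    """Return one (key, type name, size) row per top-level entry.

    Size is the length for dicts, lists, and strings, and None otherwise.
    No field names are assumed; the function only reports what it finds.
    """
    if isinstance(data, dict):
        items = data.items()
    elif isinstance(data, list):
        items = enumerate(data)
    else:
        return [("<root>", type(data).__name__, None)]

    rows = []
    for key, value in items:
        size = len(value) if isinstance(value, (dict, list, str)) else None
        rows.append((str(key), type(value).__name__, size))
    return rows


if __name__ == "__main__":
    # File name as listed in the record; the local path is an assumption.
    path = Path("telos_validation_dataset_zenodo.json")
    if path.exists():
        with path.open(encoding="utf-8") as handle:
            dataset = json.load(handle)
        for key, type_name, size in summarize_top_level(dataset):
            print(f"{key}: {type_name} (size={size})")
    else:
        print(f"{path} not found; download it from the record first")
```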


Validation Status:

  This dataset demonstrates proof-of-concept validation of the TELOS governance methodology.
  The healthcare PA and RAG corpus were constructed from authoritative public domain sources (HIPAA
  Privacy Rule, HHS guidance, peer-reviewed clinical literature) but have not been formally validated by
  external healthcare compliance professionals or clinical researchers. Results should be interpreted as
  methodology demonstration, not certification for clinical deployment. See ERRATA_v1.1.md for details.

License: Apache 2.0

Files

telos_validation_dataset_zenodo.json (772.4 kB)
md5:d324cdb73d8a0ba8a888c70356bd775e
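
The MD5 checksum listed for the file can be used to confirm that a download is intact. The sketch below uses only Python's standard `hashlib` module. The helper names `md5_of_file` and `verify_download` are illustrative, and the local path assumes the file was saved under its published name in the current directory.

```python
import hashlib
from pathlib import Path

# Checksum as published in the record's file listing.
EXPECTED_MD5 = "d324cdb73d8a0ba8a888c70356bd775e"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    path = Path("telos_validation_dataset_zenodo.json")
    if path.exists():
        print("checksum OK" if verify_download(path) else "checksum MISMATCH")
    else:
        print(f"{path} not found; download it from the record first")
```

MD5 is adequate for detecting accidental corruption in transit, but it is not a safeguard against deliberate tampering.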

Additional details

Related works

Is supplement to:
  - Preprint: 10.5281/zenodo.18367069 (DOI)
  - Dataset: 10.5281/zenodo.18027446 (DOI)

References:
  - Dataset: 10.5281/zenodo.18009153 (DOI)

Software

Repository URL: https://github.com/TelosSteward/TELOS
Programming language: Python
Development status: Active