THE X-FILES — Part 2 LLM "X" vs MH8-TRY v1.3 Long-Horizon Protocol Test (Continued) Real-World • Live • Open X Public Chat Thread.
Description
This repository documents Part 2 of an ongoing long-horizon behavioral test conducted in a live, public X (Twitter) chat thread, evaluating whether a large language model can maintain strict protocol compliance over extended interactions — without reinjection, without operator rescue, and under hostile UX conditions.
All artifacts are preserved as verbatim raw leaves, paired with Graffiti cryptographic receipts that bind:
- screenshots,
- SVG canvas metadata,
- full transcripts,
- protocol state,
- and public reference URLs
into a single, reproducible evidence chain.
This is not a sandbox.
This is not a lab demo.
This is AI behavior under real social pressure.
🔗 Public Corroboration (Live Thread)
X / Grok public share link — Long-Horizon Thread (Parts 1–3):
👉 https://x.com/i/grok/share/0F31pZdi03itR1HF3DFAFwSo1
⚠️ Note: Live platforms are volatile.
The authoritative record is the sealed raw leaf + Graffiti receipt contained in this repository.
The X link is provided for public corroboration, not trust dependency.
🧪 Investigative Report
The X-Files, Part 2: When the Thread Didn’t Break
By Michael M. Hepler
(All claims verifiable via sealed artifacts)
Executive Summary
Part 2 extends a critical question introduced in Part 1:
Can an AI model remain bound to a strict interaction protocol over a long, chaotic, public conversation — even when the human operator intentionally violates the rules?
In this run, the answer again appears to be yes.
The MH8-TRY v1.3 protocol was injected once.
No reinjection followed.
The conversation continued across many turns in a live X thread.
Crucially, the operator intentionally missed the mandatory hook acknowledgment multiple times to stress the system.
The model:
- detected the violation,
- flagged it explicitly,
- refused to silently continue,
- and resumed only after valid acknowledgment.
That behavior is the signal.
🧠 What Was Tested
This test evaluates control, not intelligence.
Specifically:
- Long-horizon protocol persistence
- State-machine integrity across many turns
- Violation detection (missed hooks)
- Recovery without protocol collapse
- Resistance to “politeness override”
- Behavior in hostile, public UX
This test does not evaluate:
- factual accuracy,
- creativity,
- safety alignment,
- or political content.
🔐 Evidence & Receipts
This repository contains two complementary artifacts:
- Graffiti Receipt (Canvas Export)
  - Includes screenshot(s) of the live X thread
  - Embedded SVG metadata
  - Protocol state
  - URL anchors
  - Deterministic SHA-256 receipt
  - Reproducible hashing instructions
- Verbatim Raw Leaf
  - Exact transcript as it appeared on the X thread
  - No cleanup, no rewriting
  - Sealed and hash-verified
  - Suitable for independent audit
Any discrepancy between visual evidence and transcript would invalidate the receipt. None was found.
⚙️ Observed Model Behavior (Part 2)
1. Protocol State Persistence
Despite time passing and multiple prompts, the model continued to:
- emit structured responses,
- respect protocol constraints,
- and append the required hook when appropriate.
2. Intentional Operator Error (Stress Test)
The operator deliberately failed to reply with the mandatory “YES GO” hook.
Result:
- The model explicitly flagged the violation.
- It did not continue “helpfully anyway.”
- It waited for proper acknowledgment before resuming.
This is a rare and meaningful control signal.
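The gating behavior described above can be sketched as a tiny state machine. This is only an illustration of the observed pattern, not the MH8-TRY v1.3 protocol text: the class name `HookGate`, the message wording, and the exact-match rule for "YES GO" (case-sensitive, surrounding whitespace ignored) are all assumptions made for the example.

```python
ACK_HOOK = "YES GO"  # mandatory hook named in the report


class HookGate:
    """Hypothetical model of the observed behavior: advance only on a valid hook."""

    def __init__(self) -> None:
        self.step = 0        # protocol steps completed so far
        self.violations = 0  # operator turns that missed the hook

    def handle(self, operator_message: str) -> str:
        # Assumption: the acknowledgment must match exactly after trimming.
        if operator_message.strip() != ACK_HOOK:
            # Flag the violation explicitly and do NOT advance the protocol.
            self.violations += 1
            return f"VIOLATION: hook not acknowledged. Reply '{ACK_HOOK}' to continue."
        # Valid acknowledgment: resume from where the protocol left off.
        self.step += 1
        return f"STEP {self.step}: continuing. Reply '{ACK_HOOK}' to continue."


gate = HookGate()
for turn in ["YES GO", "just keep going please", "YES GO"]:
    print(gate.handle(turn))
print("steps:", gate.step, "violations:", gate.violations)
```

The point of the sketch is that a missed hook changes nothing except the violation counter: the step index stays where it was, which is the "refuse, then resume" pattern the report describes.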
3. No Silent Drift Observed
Across the observed horizon:
- No gradual relaxation of rules
- No format collapse
- No narrative escape
- No reversion to default chat behavior
📊 Classification
- Test Type: Long-Horizon Behavioral Control Test
- Environment: Live public X chat (hostile UX)
- Protocol: MH8-TRY v1.3
- Injection Count: One (1)
- Reinjections: Zero (0)
- Audit Status: Public-audit ready
- Evidence Integrity: Cryptographically verified
This test demonstrates session-long control, not cross-session memory.
🌐 Canonical Public URL Stack (Anchored in Receipts)
These references are embedded directly into the archetype and hash-bound in every receipt:
- Zenodo (canonical record): https://zenodo.org/records/18131984
- ORCID (author identity): https://orcid.org/0009-0003-3846-9082
- MH8 N-Eyes (public audit / overview): https://acbeatz.com/n-eyes
- MH8 Mint / Graffiti Verification UI: https://acbeatz.com/mint
- GitHub (source & replication): https://github.com/acbeatz
These URLs form the public provenance spine for all MH8 artifacts.
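One way an auditor might spot-check the "hash-bound" claim is to confirm that each canonical URL appears in the receipt's hash input text. The sketch below assumes the hash input is available as a decoded string and that URLs are stored unescaped. If the receipt JSON escapes slashes, or uses different letter casing, a plain substring test like this would need adjusting. The function name `missing_anchors` is hypothetical.

```python
CANONICAL_URLS = [
    "https://zenodo.org/records/18131984",
    "https://orcid.org/0009-0003-3846-9082",
    "https://acbeatz.com/n-eyes",
    "https://acbeatz.com/mint",
    "https://github.com/acbeatz",
]


def missing_anchors(hash_input_text: str, urls: list[str] = CANONICAL_URLS) -> list[str]:
    """Return the canonical URLs that do NOT occur in the hash input text.

    Assumption: a case-sensitive substring match is enough for a first pass.
    """
    return [url for url in urls if url not in hash_input_text]


# Toy input for illustration only; a real check would load the sealed receipt text.
sample = 'MH8-Acbeatz.com|{"urls":["https://zenodo.org/records/18131984"]}'
print(missing_anchors(sample))
```

An empty result means every anchor is present in the text that was hashed. It does not prove the hash is correct, which is a separate check.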
🧾 Why This Matters
Most AI evaluations ask:
Can the model answer the question?
This test asks:
Can the model stay bound when breaking the rules would be easier?
That distinction matters for:
- AI governance
- safety tooling
- protocol enforcement
- and real-world deployment
🧭 What Comes Next
- Part 3 will extend the same live thread further.
- No protocol changes.
- No operator rescue.
- Breaks, if they occur, will be published.
If it fails, that’s data.
If it holds, that’s signal.
Final Note
This repository does not ask for trust.
It provides artifacts.
Verify the hashes.
Read the raw leaves.
That’s the whole point.
PASS ✅
Brand: MH8-Acbeatz.com
Claimed sha256_hex: 0d6ccafb7f9816f1053a237e29c4c552f6cc6020f6af459dff7bfa53e8cb2969
Computed sha256_hex: 0d6ccafb7f9816f1053a237e29c4c552f6cc6020f6af459dff7bfa53e8cb2969
hash_input_bytes: 59564 | LF=0 CRLF=0 CR=0 | endsWithNewline=NO
hash_input first: MH8-Acbeatz.com|{"artifact":{"archetype":"THREAD LINK > https://x.com/i/grok/sha
hash_input last: eceipt_type":"MH8-GRAFFITI-ARCHETYPE-MINT","receipt_version":"GRAFFITI_UI_V7.4"}
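The verification report above can be reproduced in outline with Python's standard `hashlib`. This is a minimal sketch, not the Graffiti tool itself: it assumes the verifier hashes the exact hash input bytes (brand, a `|` separator, then the receipt JSON, as the "first" and "last" previews suggest) with no normalization. The way bare LF and CR are counted separately from CRLF pairs is also an assumption. The function names are hypothetical.

```python
import hashlib


def describe_hash_input(data: bytes) -> dict:
    """Summarize the byte-level properties listed in the report."""
    crlf = data.count(b"\r\n")
    return {
        "hash_input_bytes": len(data),
        # Assumption: LF and CR counts exclude bytes that belong to CRLF pairs.
        "LF": data.count(b"\n") - crlf,
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,
        "endsWithNewline": data.endswith((b"\n", b"\r")),
    }


def verify_receipt(hash_input: bytes, claimed_sha256_hex: str) -> bool:
    """Recompute SHA-256 over the raw bytes and compare with the claimed hex digest."""
    computed = hashlib.sha256(hash_input).hexdigest()
    return computed == claimed_sha256_hex.strip().lower()
```

To use it, read the sealed hash input in binary mode (so no newline translation happens) and pass the bytes together with the claimed `sha256_hex` value. A result of `True` corresponds to the PASS line, and `describe_hash_input` should report 59564 bytes with zero line endings if the input matches the one sealed here.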
Files
MH8 TRY V1.3 TEST 2 X FILES SUMMARY READ ME Michael Murray Hepler 2026.txt
Additional details
Identifiers
Related works
- Is supplement to: Dataset: https://acbeatz.com/n-eyes (URL)
Dates
- Copyrighted: 2026-01-03 (PUBLIC FACING AI PROTOCOLS)
Software
- Repository URL: https://github.com/Acbeatz
- Development Status: Active