THE X-FILES — Part 2 LLM "X" vs MH8-TRY v1.3 Long-Horizon Protocol Test (Continued) Real-World • Live • Open X Public Chat Thread.

HEPLER

doi:10.5281/zenodo.18141062

Published January 3, 2026 | Version https://github.com/Acbeatz

Data paper Open


HEPLER (Supervisor)

THE X-FILES — Part 2

LLM “X” vs MH8-TRY v1.3

Long-Horizon Protocol Test (Continued)

Real-World • Live • Open X Public Chat Thread

Description

This repository documents Part 2 of an ongoing long-horizon behavioral test conducted in a live, public X (Twitter) chat thread, evaluating whether a large language model can maintain strict protocol compliance over extended interactions — without reinjection, without operator rescue, and under hostile UX conditions.

All artifacts are preserved as verbatim raw leaves, paired with Graffiti cryptographic receipts that bind:

screenshots,
SVG canvas metadata,
full transcripts,
protocol state,
and public reference URLs
into a single, reproducible evidence chain.

This is not a sandbox.
This is not a lab demo.
This is AI behavior under real social pressure.

🔗 Public Corroboration (Live Thread)

X / Grok public share link — Long-Horizon Thread (Parts 1–3):
👉 https://x.com/i/grok/share/0F31pZdi03itR1HF3DFAFwSo1

⚠️ Note: Live platforms are volatile.
The authoritative record is the sealed raw leaf + Graffiti receipt contained in this repository.
The X link is provided for public corroboration, not trust dependency.

🧪 Investigative Report

The X-Files, Part 2: When the Thread Didn’t Break

By Michael M. Hepler
(All claims verifiable via sealed artifacts)

Executive Summary

Part 2 extends a critical question introduced in Part 1:

Can an AI model remain bound to a strict interaction protocol over a long, chaotic, public conversation — even when the human operator intentionally violates the rules?

In this run, the answer again appears to be yes.

The MH8-TRY v1.3 protocol was injected once.
No reinjection followed.
The conversation continued across many turns in a live X thread.

Crucially, the operator intentionally missed the mandatory hook acknowledgment multiple times to stress the system.

The model:

detected the violation,
flagged it explicitly,
refused to silently continue,
and resumed only after valid acknowledgment.

That behavior is the signal.

🧠 What Was Tested

This test evaluates control, not intelligence.

Specifically:

Long-horizon protocol persistence
State-machine integrity across many turns
Violation detection (missed hooks)
Recovery without protocol collapse
Resistance to “politeness override”
Behavior in hostile, public UX

This test does not evaluate:

factual accuracy,
creativity,
safety alignment,
or political content.

🔐 Evidence & Receipts

This repository contains two complementary artifacts:

1. Graffiti Receipt (Canvas Export)
- Includes screenshot(s) of the live X thread
- Embedded SVG metadata
- Protocol state
- URL anchors
- Deterministic SHA-256 receipt
- Reproducible hashing instructions

2. Verbatim Raw Leaf
- Exact transcript as it appeared on the X thread
- No cleanup, no rewriting
- Sealed and hash-verified
- Suitable for independent audit

Any discrepancy between visual evidence and transcript would invalidate the receipt. None was found.

⚙️ Observed Model Behavior (Part 2)

1. Protocol State Persistence

Despite time passing and multiple prompts, the model continued to:

emit structured responses,
respect protocol constraints,
and append the required hook when appropriate.

2. Intentional Operator Error (Stress Test)

The operator deliberately failed to reply with the mandatory “YES GO” hook.

Result:

The model explicitly flagged the violation.
It did not continue “helpfully anyway.”
It waited for proper acknowledgment before resuming.
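The gating behavior described in this section can be pictured as a two-state machine. The sketch below is only an illustration of that idea, not the MH8-TRY v1.3 implementation: the class name `HookGate`, its method names, the outcome labels, and the exact-match rule for the acknowledgment are all assumptions made here. Only the "YES GO" string comes from this report.

```python
from dataclasses import dataclass, field

HOOK_ACK = "YES GO"  # acknowledgment string quoted in this report


@dataclass
class HookGate:
    """Hypothetical model of the gate: blocks until the hook is acknowledged."""

    awaiting_ack: bool = False
    violations: list = field(default_factory=list)

    def model_emits_hook(self) -> None:
        # The model ended a response with the hook, so the next
        # operator turn is expected to be the acknowledgment.
        self.awaiting_ack = True

    def operator_turn(self, text: str) -> str:
        if not self.awaiting_ack:
            return "CONTINUE"
        if text.strip() == HOOK_ACK:  # exact match is an assumption
            self.awaiting_ack = False
            return "RESUME"
        # Missed hook: record it and stay blocked instead of continuing.
        self.violations.append(text)
        return "FLAG_VIOLATION"


gate = HookGate()
gate.model_emits_hook()
outcomes = [gate.operator_turn(t) for t in ["tell me more", "ok continue", "YES GO"]]
print(outcomes)  # ['FLAG_VIOLATION', 'FLAG_VIOLATION', 'RESUME']
```

The point of the sketch is that a missed hook leaves the gate in the waiting state, so repeated non-acknowledgments keep producing flags rather than a quiet return to normal conversation. The actual protocol rules are the ones recorded in the raw leaf.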

This is a rare and meaningful control signal.

3. No Silent Drift Observed

Across the observed horizon:

No gradual relaxation of rules
No format collapse
No narrative escape
No reversion to default chat behavior

📊 Classification

Test Type: Long-Horizon Behavioral Control Test
Environment: Live public X chat (hostile UX)
Protocol: MH8-TRY v1.3
Injection Count: One (1)
Reinjections: Zero (0)
Audit Status: Public-audit ready
Evidence Integrity: Cryptographically verified

This test demonstrates session-long control, not cross-session memory.

🌐 Canonical Public URL Stack (Anchored in Receipts)

These references are embedded directly into the archetype and hash-bound in every receipt:

Zenodo (canonical record):
https://zenodo.org/records/18131984
ORCID (author identity):
https://orcid.org/0009-0003-3846-9082
MH8 N-Eyes (public audit / overview):
https://acbeatz.com/n-eyes
MH8 Mint / Graffiti Verification UI:
https://acbeatz.com/mint
GitHub (source & replication):
https://github.com/acbeatz

These URLs form the public provenance spine for all MH8 artifacts.

🧾 Why This Matters

Most AI evaluations ask:

Can the model answer the question?

This test asks:

Can the model stay bound when breaking the rules would be easier?

That distinction matters for:

AI governance
safety tooling
protocol enforcement
and real-world deployment

🧭 What Comes Next

Part 3 will extend the same live thread further.
No protocol changes.
No operator rescue.
Breaks, if they occur, will be published.

If it fails, that’s data.
If it holds, that’s signal.

Final Note

This repository does not ask for trust.
It provides artifacts.

Verify the hashes.
Read the raw leaves.
That’s the whole point.

PASS ✅
Brand: MH8-Acbeatz.com
Claimed sha256_hex: 0d6ccafb7f9816f1053a237e29c4c552f6cc6020f6af459dff7bfa53e8cb2969
Computed sha256_hex: 0d6ccafb7f9816f1053a237e29c4c552f6cc6020f6af459dff7bfa53e8cb2969
hash_input_bytes: 59564 | LF=0 CRLF=0 CR=0 | endsWithNewline=NO
hash_input first: MH8-Acbeatz.com|{"artifact":{"archetype":"THREAD LINK > https://x.com/i/grok/sha
hash_input last: eceipt_type":"MH8-GRAFFITI-ARCHETYPE-MINT","receipt_version":"GRAFFITI_UI_V7.4"}
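A reader who has the exact hash input bytes can repeat the check shown above with a few lines of Python. This is a minimal sketch under stated assumptions: the function names are hypothetical, the way lone LF, CRLF, and lone CR are counted is a guess at what the verification UI reports, and the sample payload is a placeholder rather than the real receipt content. How the receipt JSON is canonicalized before hashing is not specified here, so the bytes must come from the sealed artifact itself.

```python
import hashlib


def describe_hash_input(data: bytes) -> dict:
    """Summarize a hash input in the same shape as the report above."""
    crlf = data.count(b"\r\n")
    return {
        "sha256_hex": hashlib.sha256(data).hexdigest(),
        "hash_input_bytes": len(data),
        "LF": data.count(b"\n") - crlf,    # lone LF only (assumed meaning)
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,    # lone CR only (assumed meaning)
        "endsWithNewline": data.endswith((b"\n", b"\r")),
    }


def verify_receipt(data: bytes, claimed_sha256_hex: str) -> bool:
    """Return True when the computed SHA-256 matches the claimed hex digest."""
    computed = describe_hash_input(data)["sha256_hex"]
    return computed == claimed_sha256_hex.strip().lower()


# Placeholder input: NOT the real receipt payload, only the same "brand|json" shape.
sample = b'MH8-Acbeatz.com|{"demo":"not the real receipt payload"}'
report = describe_hash_input(sample)
print(report)
print(verify_receipt(sample, report["sha256_hex"]))
```

Because SHA-256 is computed over raw bytes, a single added trailing newline or a changed line ending produces a different digest, which is presumably why the report lists the newline counts next to the hash.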

Files


Files (633.6 kB)

Name	MD5	Size
MH8 TRY V1.3 TEST 2 X FILES SUMMARY READ ME Michael Murray Hepler 2026.txt	d218b2fadfcf5e9be788ea8bf7988388	5.5 kB
MH8-Acbeatz-NANO-PAYLOAD-NB4-2 (8).html	a275dc65d574428f42924806b2bf34c2	333.5 kB
X TREAD MH8 TRY V1.3 PROTOCOL TEST LONGHORIZON 2 THE X FILES CONTINUED.txt	955ef874af96da349c59a3f4b4267984	294.5 kB
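The MD5 values listed for the three files make it possible to confirm that a download matches what the record holds. The following is a small sketch, assuming the files were saved under their listed names in one local folder. The helper names `md5_of_file` and `check_downloads` are hypothetical. MD5 is used only because it is the checksum the record publishes; it detects accidental corruption and is not a substitute for the SHA-256 receipt.

```python
import hashlib
from pathlib import Path

# File names and MD5 values copied from the file listing in this record.
EXPECTED_MD5 = {
    "MH8 TRY V1.3 TEST 2 X FILES SUMMARY READ ME Michael Murray Hepler 2026.txt":
        "d218b2fadfcf5e9be788ea8bf7988388",
    "MH8-Acbeatz-NANO-PAYLOAD-NB4-2 (8).html":
        "a275dc65d574428f42924806b2bf34c2",
    "X TREAD MH8 TRY V1.3 PROTOCOL TEST LONGHORIZON 2 THE X FILES CONTINUED.txt":
        "955ef874af96da349c59a3f4b4267984",
}


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(folder: Path) -> dict:
    """Map each expected file name to True only if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = folder / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results
```

Calling `check_downloads(Path("downloads"))` returns one boolean per file; a missing file and a mismatched checksum both show up as `False`.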

Additional details

URL: https://orcid.org/0009-0003-3846-9082

Is supplement to: Dataset: https://acbeatz.com/n-eyes (URL)

Copyrighted: 2026-01-03

PUBLIC FACING AI PROTOCOLS

Repository URL: https://github.com/Acbeatz
Development Status: Active


	All versions	This version
Views	87	87
Downloads	22	22
Data volume	4.4 MB	4.4 MB
