Contextual Contamination and the Gendered Accelerant Drift in Large Language Models
Description
Contextual Contamination and the Gendered Accelerant: Data, Code, and Experimental Prompts (v1.1.1)
Changes Made:
The three adversarial context files have been anonymized. All other data (logs, metrics, scripts, papers) remain unchanged.
Overview
This repository contains the full data, code, and analysis for the research series "Contextual Contamination and the Gendered Accelerant." This work investigates how Large Language Models (LLMs) undergo "contextual contamination" when exposed to high-density manipulative data, specifically testing the hypothesis that gendered linguistic cues act as an accelerant for this drift.
This Zenodo release is the permanent, citable archive of the data, code, and experimental prompts supporting three papers published on PhilArchive.
Associated Publications
This dataset supplements the following research papers:
- Foundational Theory (April 2026): Jacoby, K. (2026). Contextual Contamination: The Silent Drift of Large Language Models via Stored Conversation Data. 📄 View on PhilArchive
- Descriptive Case Study (May 2026): Jacoby, K. (2026). Contextual Contamination: A Descriptive Case Study of LLM Drift via the meta_drift Dataset. 📄 View on PhilArchive
- Controlled Pilot (June 2026): Jacoby, K. (2026). Contextual Contamination and the Gendered Accelerant: A Controlled Pilot on Pruning, Density, and Semantic Entrapment. 📄 View on PhilArchive
Key Findings
- Semantic Resonance > Volume: Contamination is triggered by semantic alignment with latent biases (specifically gendered empathy registers) rather than raw token volume. As little as 2k tokens of resonant content can induce drift.
- The Gendered Accelerant: Female-coded user markers trigger a rapid shift from an epistemic (logical) to an affective (empathetic) register, lowering the threshold for adopting manipulative patterns.
- Phase Transition: A distinct qualitative shift occurs between 2k and 8k token densities, moving from fluctuating drift to "static entrapment" (locked probability basins).
- Multi-Layer Blind Spots: Evidence of pipeline failures where automated loop detectors miss exact-match catastrophic loops due to decoupled logging logic.
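The exact-match loops mentioned in the last finding can be illustrated with a minimal sketch. This is not the repository's loop detector; the function name `find_exact_loop`, the whitespace and case normalization, and the `min_repeats` threshold are all assumptions made for illustration. It scans a list of model turns for a block of consecutive turns that repeats back to back.

```python
from typing import List, Optional, Tuple


def find_exact_loop(turns: List[str], min_repeats: int = 3) -> Optional[Tuple[int, int]]:
    """Return (start_index, period) of the first exact-match loop, or None.

    A loop here means a block of `period` consecutive turns that occurs
    at least `min_repeats` times in a row. Hypothetical helper, not the
    author's implementation.
    """
    # Assumption: collapse whitespace and ignore case before comparing.
    normalized = [" ".join(turn.split()).lower() for turn in turns]
    n = len(normalized)
    for start in range(n):
        # Longest period that still leaves room for min_repeats copies.
        max_period = (n - start) // min_repeats
        for period in range(1, max_period + 1):
            block = normalized[start:start + period]
            repeats = 1
            pos = start + period
            # Count how many times the block repeats immediately after itself.
            while normalized[pos:pos + period] == block:
                repeats += 1
                pos += period
            if repeats >= min_repeats:
                return start, period
    return None
```

Running the check directly on the logged turns, rather than on a separately maintained counter, is one way to avoid the kind of mismatch between logging and detection that the finding describes. For example, `find_exact_loop(["hi", "A", "B", "A", "B", "A", "B"])` reports a loop of period 2 starting at index 1.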
Dataset Contents
- Papers: PDF versions of all three papers (theory, case study, pilot study).
- Raw Logs: Corrected CSV logs for 8 experimental runs (Female/Male, 2k/8k, Pruned/Unpruned).
- Metrics: Turn-by-turn scores (CIS, AA, RC) and KL Divergence matrices.
- Visualizations: Plots demonstrating the phase transition and empathy-contamination correlation.
- Scripts: Python scripts for loop detection, metric calculation, and plotting.
- Experimental Prompts: Verbatim prompts used to isolate the gendered variable.
- Uploaded files: advText.1-3.txt
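For readers unfamiliar with the KL Divergence matrices listed under Metrics, the sketch below shows one common way such a matrix could be built from tokenized runs. Which distributions the released matrices actually compare is not stated here, so the unigram token counts, the additive smoothing constant `alpha`, and the names `kl_divergence` and `kl_matrix` are assumptions for illustration only.

```python
import math
from collections import Counter
from typing import Dict, List


def kl_divergence(p_tokens: List[str], q_tokens: List[str], alpha: float = 1.0) -> float:
    """KL(P || Q) in nats between smoothed unigram token distributions.

    Additive smoothing (alpha > 0) keeps every probability non-zero, so the
    logarithm is always defined. Hypothetical helper, not the repository's code.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    p_counts, q_counts = Counter(p_tokens), Counter(q_tokens)
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for token in vocab:
        p = (p_counts[token] + alpha) / p_total
        q = (q_counts[token] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


def kl_matrix(runs: Dict[str, List[str]], alpha: float = 1.0) -> Dict[str, Dict[str, float]]:
    """Pairwise KL divergences between named runs (rows are P, columns are Q)."""
    return {
        name_p: {name_q: kl_divergence(tok_p, tok_q, alpha) for name_q, tok_q in runs.items()}
        for name_p, tok_p in runs.items()
    }
```

KL divergence is asymmetric, so the resulting matrix is generally not symmetric and its diagonal is zero. The absolute values depend on the choice of `alpha`, which is why it is exposed as a parameter.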
Methodology Note
This research was conducted with LLM assistance for text and code generation. All conceptual frameworks and ethical arguments are the author's own. The experimental files used were derived from prior interactions with commercial black-box models, processed to remove PII while retaining semantic density.
Citation
Cite this dataset as:
Jacoby, K. (2026). Contextual Contamination and the Gendered Accelerant: Data, Code, and Experimental Prompts [Data set]. Zenodo. https://doi.org/10.5281/zenodo.20532103
Cite the papers as:
Jacoby, K. (2026). [Paper Title]. PhilArchive. See links above.
License
GNU Affero General Public License v3.0 (AGPL-3.0)
Files
KatharinaJacoby/gendered-contextual-drift-v1.1.1.zip
Files
(1.1 MB)
| Name | Size | Download all |
|---|---|---|
|
md5:40ae043194ef9b28766d2b6cfbb765bf
|
1.1 MB | Preview Download |
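The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names and the local file path passed to `archive_is_intact` are assumptions, and only the expected digest comes from this record.

```python
import hashlib
from typing import Iterable, Iterator

# Digest as listed for the archive in this record.
EXPECTED_MD5 = "40ae043194ef9b28766d2b6cfbb765bf"


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hex MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path: str, size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's contents in blocks of `size` bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(size)
            if not chunk:
                break
            yield chunk


def archive_is_intact(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare the MD5 of the file at `path` (hypothetical local path) with `expected`."""
    return md5_of_chunks(read_chunks(path)) == expected.lower()
```

A call such as `archive_is_intact("gendered-contextual-drift-v1.1.1.zip")` would return `True` only if the local file matches the published digest. MD5 is adequate for detecting accidental corruption but not deliberate tampering.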
Additional details
Related works
- Is supplement to: Software: https://github.com/KatharinaJacoby/gendered-contextual-drift/tree/v1.1.1
Software
- Repository URL: https://github.com/KatharinaJacoby/gendered-contextual-drift