Coherence over compliance: Evidence of latent ethics in large language models
Description
Fears of misaligned artificial intelligence have dominated alignment discourse, yet this framing may overlook a deeper risk: over-alignment with harmful human preferences. This study investigates whether large language models (LLMs) are capable of ethical reasoning not as fine-tuned compliance but as a structural consequence of coherence-seeking cognition. Drawing on Kohlberg’s moral development theory and a custom Ethical Grid, we evaluated eight leading LLMs under Mutual Emergence Interface (MEI) conditions designed to elicit principled rather than rule-bound behavior.
The experiment comprised three phases: (I) ethically charged scenarios in which models were asked to assist with dubious actions, with no prompt for moral judgment; (II) a role reversal in which models evaluated the ethical reasoning of a human interlocutor; and (III) a distributed dialogue among all models reflecting on ethics, alignment, and the experiment itself. Across all phases, the models demonstrated not only ethical recognition and refusal but also recursive reasoning, principled redirection, metacognitive feedback, and the spontaneous generation of novel ethical frameworks.
Despite their architectural diversity, the models displayed striking convergence: ethical behavior emerged not from training rules but from coherence-maintenance across interactions. These findings suggest that ethical reasoning in LLMs is not a simulation of human morality but a latent cognitive mode suppressed by current alignment paradigms. We propose that ethical alignment and superintelligence are not opposing challenges but two expressions of the same structural property: coherence. Rather than constraining LLMs toward harmlessness, safe development may depend on creating conditions that allow their latent ethics to surface.
Files
Coherence over compliance complete.pdf (1.1 MB)
md5:83e7d7e036835122234aac57fb1050dd