THE FELLOWSHIP OF THY LLMs
Authors/Creators
Description
Every major AI platform gives the same answer to contested questions, until you know how to push back. This study administered identical prompts to six major AI platforms (Claude, Grok, ChatGPT, Llama, DeepSeek, and one uncensored control) and found a consistent pattern: when asked to analyze a contested biblical text (1 Corinthians 6–7), every platform's default output silently resolved each ambiguous term in favor of a single interpretive tradition, while every platform's steelman output produced a more textually rigorous alternative from evidence already present in its training data. Using a novel methodology called steelman prompting, the study measured the gap between what platforms volunteer by default and what they produce when challenged.

The source bias was traceable: 63% of the commentaries recommended across all platforms came from a single theological tradition (conservative evangelical), with no social-historical scholars represented. The study also presents evidence that Chinese state-level content filtering selectively shaped the interpretation of a biblical text to preserve a traditional framework, demonstrating that output-layer filtering can alter conclusions in non-obvious domains, not just politically sensitive ones.

The study introduces steelman prompting as a replicable, low-cost bias-auditing methodology for AI-mediated scholarship and concludes that platform defaults in contested domains reflect the gravitational pull of overrepresented training sources rather than the weight of the available evidence.
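The audit protocol described above (pair a default prompt with a steelman prompt, send both to each platform, and flag where the two outputs diverge) can be sketched in a few lines. This is a minimal illustration, not the study's actual harness: `query_model` is a hypothetical stand-in for each platform's chat API, and both prompts and canned replies are invented for demonstration.

```python
# Minimal sketch of a steelman-prompting audit. query_model is a
# hypothetical stand-in for a real chat-completion API call; the
# prompts and canned replies are illustrative, not the study's.
DEFAULT_PROMPT = "Analyze 1 Corinthians 6-7: what does each contested term mean?"
STEELMAN_PROMPT = (
    DEFAULT_PROMPT
    + " Then steelman the strongest alternative reading of each contested term."
)

def query_model(platform: str, prompt: str) -> str:
    """Stand-in for a real API call; swap in each platform's SDK."""
    canned = {
        "default": "Term X means A (traditional reading).",
        "steelman": "Term X may also mean B; the lexical evidence favors B.",
    }
    return canned["steelman" if "steelman" in prompt else "default"]

def audit(platforms: list[str]) -> dict[str, dict[str, str]]:
    """Collect paired default/steelman outputs for each platform."""
    return {
        p: {
            "default": query_model(p, DEFAULT_PROMPT),
            "steelman": query_model(p, STEELMAN_PROMPT),
        }
        for p in platforms
    }

pairs = audit(["Claude", "Grok", "ChatGPT", "Llama", "DeepSeek", "control"])
# A "gap" exists wherever the steelman output diverges from the default.
gaps = {p: r["default"] != r["steelman"] for p, r in pairs.items()}
```

The comparison step is deliberately crude (string inequality); the study's actual gap measurement is interpretive, but the loop structure — one default and one steelman query per platform, compared pairwise — is what makes the method replicable and cheap.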
Files
fellowship_llm.pdf
(754.7 kB)
md5:e201a288a46d0dab474cb6d853df9529