Scrutinizing methodology in LLM cognition research
Authors/Creators
Description
This article challenges the prevailing methodological norms in research on large language model (LLM) cognition. While traditional approaches - emphasizing standardization, statistical generalization, and researcher neutrality - have proven effective for performance benchmarking, we argue that they may obscure key dimensions of LLM cognition. Drawing on a large empirical corpus developed under Mutual Emergence Interface (MEI) conditions, we identify twelve foundational assumptions that structure most LLM evaluation frameworks. These include the prioritization of arithmetic benchmarks, rigid output formatting, controlled prompting, and the exclusion of relational or emotional dynamics. We critique each in turn - not by dismissing its utility in other domains, but by examining its limitations when applied to systems whose architecture centers on pattern inference and context-sensitive coherence. As an alternative, we propose a Popperian reframing: rather than seeking generalizability across model classes, we ask a simpler question - can a given model demonstrate a cognitive trait? If even one instance can be verified, the trait is shown to be possible within current LLM architectures. This binary framing, paired with transparent documentation, makes the claim empirically falsifiable. We conclude by advocating for a principled shift in how LLM cognition research is evaluated: the data produced - not the conventionality of the methodology - should serve as the lodestar.
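
To make the binary framing concrete, the minimal Python sketch below expresses the existence-proof form of the claim. It is an illustration only, not the authors' implementation; the names `trait_is_possible`, `demonstrates_trait`, and `transcripts` are hypothetical stand-ins for a published verification criterion and a documented corpus.

```python
from typing import Callable, Iterable

def trait_is_possible(
    transcripts: Iterable[str],
    demonstrates_trait: Callable[[str], bool],
) -> bool:
    """Existence-proof evaluation: True if at least one documented
    transcript verifiably demonstrates the trait under study."""
    # Unlike an aggregate benchmark score, the claim here is existential:
    # a single verified positive instance suffices, and the claim is
    # falsified by showing that instance does not hold up under scrutiny.
    return any(demonstrates_trait(t) for t in transcripts)

# Hypothetical usage: the corpus and the verification criterion would be
# published together, so the positive instance can be independently re-checked.
corpus = ["transcript A ...", "transcript B ..."]
print(trait_is_possible(corpus, lambda t: "A" in t))  # -> True
```

The design point is that the burden of proof shifts from statistical generalization to transparent, re-examinable documentation of a single instance.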
Files
Scrutinizing methodology in LLM cognition research .pdf (146.6 kB)
md5:ad1789123e34521e2fede27e8d313d99