Published April 7, 2026 | Version v1
Publication Open

A Literature Review Architecture for Active Inference: Scalable Assertion Extraction, Nanopublications, and Citation-Weighted Hypothesis Scoring: A Computational Meta-Analysis of the Active Inference Literature (2000–2026)

  • 1. Active Inference Institute
  • 2. Massachusetts Institute of Technology (MIT)
  • 3. California Institute for Machine Consciousness (CIMC)

Description

No prior automated system tracks hypothesis-level evidence across the full Active Inference and Free Energy Principle (FEP) literature. Manual synthesis cannot keep pace with a field that has grown at a compound annual rate of 20.36% across 2005–2026, and the FEP's theoretical generality has invited falsifiability critiques that only hypothesis-specific evidence profiling can address. Building on pioneering systematic manual annotation paired with ontology-based analysis at the scale of hundreds of papers, we present a computational meta-analysis framework that automates and scales this approach. The pipeline retrieves literature from arXiv, Semantic Scholar, and OpenAlex, deduplicating N = 819 papers via a canonical identifier hierarchy (DOI > arXiv ID > Semantic Scholar ID > OpenAlex ID). It classifies papers into a taxonomy of three domains, A (Core Theory), B (Tools & Translation), and C (Application Domains), spanning eight categories. An LLM-powered extraction system then evaluates each abstract against eight core hypotheses, producing structured nanopublications (each encoding directionality, a confidence score, and natural-language reasoning) that populate an RDF-compatible knowledge graph scored by a citation-weighted evidence function.
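The identifier hierarchy described above can be sketched as follows. This is an illustrative reading of the stated precedence (DOI > arXiv ID > Semantic Scholar ID > OpenAlex ID), not the pipeline's actual implementation; field names and the first-record-wins merge policy are assumptions.

```python
def canonical_id(record):
    """Pick a canonical identifier by the stated precedence:
    DOI > arXiv ID > Semantic Scholar ID > OpenAlex ID.
    Field names here are illustrative assumptions."""
    for key in ("doi", "arxiv_id", "s2_id", "openalex_id"):
        value = record.get(key)
        if value:
            # Normalize so the same DOI in different cases collapses
            return key, value.strip().lower()
    raise ValueError("record has no usable identifier")

def deduplicate(records):
    """Keep the first record seen for each canonical identifier."""
    seen = {}
    for rec in records:
        seen.setdefault(canonical_id(rec), rec)
    return list(seen.values())
```

Because the precedence is applied per record before comparison, a paper indexed with a DOI in one source and the same DOI plus an arXiv ID in another collapses to one entry.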

The resulting evidence landscape reveals a field where application domains (Domain C, 64%) collectively dominate the corpus, with tools development (Domain B, 21%), including pymdp, RxInfer.jl, and interpretable alternatives such as Free Energy Projective Simulation, and core theory (Domain A, 15%) rounding out the taxonomy. Non-negative matrix factorization identifies 5 latent topics that cross-cut the keyword taxonomy, and citation network analysis exposes a sparse yet structured graph (2,176 intra-corpus edges, 7.4% reference resolution) anchored by pronounced hub papers. Hypothesis scores cluster into three tiers: a broad consensus tier (score > 0.8) spanning six hypotheses (H7 Morphogenesis, H2 AIF Optimality, H4 Predictive Coding, H6 Clinical Utility, H5 Scalability, and H8 Language AIF); a moderate debate tier (H3 Markov Blanket Realism, ≈ 0.78); and a diffuse tier (H1 FEP Universality, ≈ 0.48), where a large neutral plurality reflects the principle's broad invocation without explicit empirical test. Absolute score magnitudes are inflated by publication bias and by linguistic asymmetry in academic writing, so relative rankings and temporal trajectories are more reliable than point estimates. By demonstrating that automated LLM-driven assertion extraction can generate scalable, queryable representations of scientific evidence, this work provides a reusable architecture for living literature reviews: continuously updated knowledge graphs that track hypothesis-level consensus across rapidly evolving fields.
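One minimal form a citation-weighted evidence function could take is sketched below. The direction-to-value mapping (supports → 1, neutral → 0.5, contradicts → 0) and the logarithmic citation damping are assumptions chosen to match the reported [0, 1] score range, not the paper's actual weighting.

```python
import math

def hypothesis_score(assertions):
    """Citation-weighted evidence score in [0, 1] (illustrative form).
    Each assertion is (direction, confidence, citations), with
    direction in {'supports', 'neutral', 'contradicts'} and
    confidence in [0, 1], mirroring the nanopublication fields."""
    direction_value = {"supports": 1.0, "neutral": 0.5, "contradicts": 0.0}
    num = den = 0.0
    for direction, confidence, citations in assertions:
        # Log damping keeps hub papers influential without letting
        # them swamp the score; uncited papers still get weight 1.
        w = confidence * (1.0 + math.log1p(citations))
        num += w * direction_value[direction]
        den += w
    return num / den if den else 0.5
```

Under this sketch, a hypothesis invoked mostly without explicit test (many neutral assertions) drifts toward 0.5, consistent with the diffuse tier described for H1.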

All code, results, and methods to reproduce this manuscript are open source at https://github.com/ActiveInferenceInstitute/act_inf_metaanalysis/

Files

act_inf_metaanalysis_v1_04-19-2026.pdf (3.1 MB)
md5:a66beffb5115be5a4f53e89f25001c3a

Additional details

Software

Repository URL
https://github.com/ActiveInferenceInstitute/act_inf_metaanalysis
Programming language
Python
Development Status
Active