Published June 12, 2025 | Version 1
Preprint Open

SF-LM: A Neuro-Symbolic Language Model with Proto-Language Abstractions for Efficient and Faithful Text Generation

Authors

Usai, Luigi

Description

This work introduces the Semantic-First Language Model (SF-LM), a novel neuro-symbolic architecture designed to address the prohibitive computational costs and lack of interpretability in current monolithic Large Language Models (LLMs). Inspired by cognitive models of language processing and the "telegraphic" stage of child language acquisition, SF-LM decouples semantic understanding from syntactic generation.

The model operates in a two-stage pipeline:

  1. A Core Semantic Parser (Msem) first translates input text into a structured, explicit Intermediate Semantic Representation (ISR), or "Proto-Language." This symbolic representation captures the core meaning of a sentence using thematic roles (e.g., agent:cat action:lick patient:ice-cream mod:fluffy).

  2. A lightweight Syntactic Realizer (Msyn) then converts this ISR into a grammatically fluent and complete sentence.
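The flat role:value ISR format shown in the example above can be parsed mechanically. The sketch below is illustrative only, assuming that format; the function name and the dict-of-lists representation are this note's own choices, not the paper's implementation.

```python
def parse_isr(isr: str) -> dict:
    """Parse a flat ISR string of role:value tokens into a role -> values map.

    Follows the example format from the abstract
    (agent:cat action:lick patient:ice-cream mod:fluffy).
    A list is kept per role so repeated roles (e.g. two modifiers)
    are not silently overwritten.
    """
    roles = {}
    for token in isr.split():
        role, _, value = token.partition(":")
        roles.setdefault(role, []).append(value)
    return roles


# Stage 2 (the Syntactic Realizer) would consume this structure and
# emit a fluent sentence; here we only inspect the parsed roles.
print(parse_isr("agent:cat action:lick patient:ice-cream mod:fluffy"))
# {'agent': ['cat'], 'action': ['lick'], 'patient': ['ice-cream'], 'mod': ['fluffy']}
```

Keeping the ISR this explicit is what lets the realizer be constrained to the parsed roles, which is the mechanism the abstract credits for the reduced hallucination rate.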

We present the formal definition of the ISR using a BNF grammar and provide empirical evidence from summarization and text simplification tasks. Our results demonstrate that SF-LM achieves a superior trade-off between performance, efficiency, and faithfulness compared to a monolithic T5-Base baseline.

Key Findings:

  • Efficiency: SF-LM reduces model parameters and inference FLOPs by nearly 45%.

  • Faithfulness: The modular design, constrained by the explicit ISR, significantly reduces hallucinations and improves factual consistency, achieving a human-evaluated faithfulness score of 4.6/5.0 compared to the baseline's 3.9/5.0.

  • Performance: The model maintains comparable quality on standard NLP metrics such as ROUGE-L and BLEU, with only a marginal drop relative to the baseline.

This research demonstrates that a modular, semantic-first approach offers a promising path toward more efficient, controllable, and interpretable language models.

This record may contain the research paper, the source code for the SF-LM model, and the WikiProto-1M dataset created for training and evaluation.

Keywords

Natural Language Processing, Language Models, Neuro-Symbolic AI, Semantic Parsing, Computational Efficiency, Interpretability, Text Generation, Faithfulness, Proto-Language, T5, Language Generation.

Files

SF-LM A Neuro-Symbolic Language Model with.pdf (149.5 kB)
md5:b9baf712b199c72b0972d5c52d3dd1e5