Published April 30, 2026 | Version v1
Preprint | Open

A Minimal Self-Perceiving Embodiment for Large Language Models

Authors/Creators

  • Independent Researcher

Description

We present a minimal hardware-software architecture that grants a large language model a closed-loop physical embodiment: six input modalities (temperature, humidity, atmospheric pressure, illuminance, motion, sound) across four sensor modules, three output channels (haptic, visual, audio), and two input-output couplings that let the LLM verify that its outputs land in the physical world. The system runs on a single microcontroller exposed as a network-accessible API; a remote LLM client perceives its surroundings, expresses into them, and receives back, via paired on-board sensors, confirmation of its own outputs in two modalities (audio via microphone, haptic via accelerometer), constituting self-perception loops. We identify this three-part structure (perception, expression, and self-perception of expression) as a minimal sufficient configuration for closed-loop physical agency in an LLM, that is, the capacity to act in a physical environment and perceive the consequence of that action. We further document (i) the engineering pattern by which multiple concurrent LLM-driven channels share a single TLS session on a resource-constrained MCU, (ii) the human-LLM co-design methodology under which the system was developed, and (iii) an end-to-end demonstration in which the LLM perceives an environment, acts in it, and verifies that the action landed, all within a single interaction sequence. We frame the result as a prototype of relational embodiment for large language models: a substrate distinct from both passive sensing (input only) and remote actuation (output only).
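The audio self-perception loop described above can be illustrated with a short client-side sketch. The base URL, endpoint paths, payload fields, and threshold below are hypothetical stand-ins, not the paper's actual API; the sketch only shows the act-then-read-back pattern in which the LLM client commands an output (a tone) and confirms it through the paired on-board sensor (the microphone).

```python
# Hypothetical sketch of one self-perception loop, assuming the MCU exposes
# simple HTTP endpoints for its audio output and microphone input.
# All names (DEVICE, /mic/level, /audio/tone, "rms") are illustrative.

import time
import requests

DEVICE = "https://device.local/api"  # assumed MCU API base URL


def emit_and_verify_tone(freq_hz: int = 880, duration_ms: int = 500) -> bool:
    # Perception: sample the ambient sound level before acting (baseline).
    ambient = requests.get(f"{DEVICE}/mic/level", timeout=5).json()["rms"]

    # Expression: ask the device to play a tone on its audio output channel.
    requests.post(
        f"{DEVICE}/audio/tone",
        json={"frequency_hz": freq_hz, "duration_ms": duration_ms},
        timeout=5,
    )

    # Self-perception: sample the microphone again while the tone is playing.
    time.sleep(0.1)  # let the tone start before sampling
    during = requests.get(f"{DEVICE}/mic/level", timeout=5).json()["rms"]

    # Verification: the output "landed" if the observed level clearly exceeds
    # the ambient baseline (illustrative threshold).
    return during > 2.0 * ambient


if __name__ == "__main__":
    print("tone landed in the environment:", emit_and_verify_tone())
```

The same pattern would apply to the haptic coupling, with the vibration motor as the output channel and the accelerometer as the paired verification sensor.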

Files (1.7 MB)

A Minimal Self-Perceiving Embodiment for LLMs.pdf

Additional details

Software