Published June 14, 2025 | Version 0.0.0.2
Preprint | Open Access

Cogito ex Machina: A Framework for Knowledge Acquisition and Cognitive Inference from Cinematic Media

Description

This paper, "Cogito ex Machina: A Framework for Knowledge Acquisition and Cognitive Inference from Cinematic Media," by Luigi Usai, introduces a four-stage conceptual framework for an artificial intelligence system that can understand, reason about, and learn from complex narrative media such as films.

The core of the paper is the "Cogito ex Machina" architecture, which proposes a structured pipeline to transform raw audiovisual data into formal, machine-readable knowledge:

  1. Stage 1: Multi-Modal Perception: Deconstructs the film into its fundamental streams (visual, auditory, textual) to extract low-level features using tools like YOLO for object detection and Whisper for speech-to-text transcription.
  2. Stage 2: Semantic Abstraction: Elevates the perceptual data into high-level concepts. This involves using Natural Language Understanding (NLU) to analyze dialogue and Visual-Linguistic Models (VLMs) to generate rich descriptions of scenes, forming a set of candidate semantic facts.
  3. Stage 3: Cognitive Reasoning and Inference: Validates and enriches these facts by fusing information from different modalities and using a pre-existing knowledge base (an ontology) to perform logical inference and entity linking.
  4. Stage 4: Knowledge Base Integration: Permanently assimilates the new, validated axioms into a formal knowledge graph using technologies like RDF and OWL.
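The four stages above can be sketched as a chain of plain Python functions. This is a minimal, illustrative sketch under stated assumptions, not the paper's implementation. Stage 1 output is assumed to be already extracted (hard-coded observations stand in for YOLO or Whisper results). Candidate facts are plain (subject, predicate, object) tuples with RDF-style prefixed names. Stage 3 uses a simple acceptance rule (agreement across at least two modalities, or one observation above a confidence threshold) followed by subclass propagation over a toy ontology. Stage 4 merges into an in-memory set rather than an RDF/OWL store. All names (`Observation`, `abstract`, `reason`, `integrate`, the `ex:` identifiers) and both thresholds are hypothetical.

```python
from dataclasses import dataclass

# A candidate or validated fact: (subject, predicate, object).
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Observation:
    """Stage 1 output (assumed): one per-modality observation of a scene."""
    modality: str      # "visual", "auditory", or "textual"
    subject: str
    predicate: str
    obj: str
    confidence: float  # model score in [0, 1]


def abstract(observations: list[Observation]) -> dict[Triple, list[Observation]]:
    """Stage 2 (sketch): group observations into candidate semantic facts."""
    candidates: dict[Triple, list[Observation]] = {}
    for o in observations:
        candidates.setdefault((o.subject, o.predicate, o.obj), []).append(o)
    return candidates


def reason(candidates: dict[Triple, list[Observation]],
           subclass_of: dict[str, str],
           min_modalities: int = 2,
           min_confidence: float = 0.9) -> set[Triple]:
    """Stage 3 (sketch): cross-modal validation, then subclass inference."""
    validated: set[Triple] = set()
    for triple, support in candidates.items():
        modalities = {o.modality for o in support}
        best = max(o.confidence for o in support)
        # Keep a fact if enough modalities agree or one source is very confident.
        if len(modalities) >= min_modalities or best >= min_confidence:
            validated.add(triple)

    # Propagate rdf:type up a toy single-parent class hierarchy.
    inferred: set[Triple] = set()
    for s, p, o in validated:
        if p != "rdf:type":
            continue
        seen = {o}
        parent = subclass_of.get(o)
        while parent is not None and parent not in seen:  # guard against cycles
            seen.add(parent)
            inferred.add((s, "rdf:type", parent))
            parent = subclass_of.get(parent)
    return validated | inferred


def integrate(graph: set[Triple], axioms: set[Triple]) -> set[Triple]:
    """Stage 4 (sketch): merge validated axioms into the knowledge graph."""
    return graph | axioms


# Hypothetical Stage 1 results for a single scene.
observations = [
    Observation("visual", "ex:CharacterA", "rdf:type", "ex:Person", 0.80),
    Observation("textual", "ex:CharacterA", "rdf:type", "ex:Person", 0.70),
    Observation("visual", "ex:CharacterA", "ex:locatedIn", "ex:Cafe", 0.60),
    Observation("auditory", "ex:CharacterA", "ex:owns", "ex:Cafe", 0.95),
]
toy_ontology = {"ex:Person": "ex:Agent"}  # hypothetical subclass axioms

kg = integrate(set(), reason(abstract(observations), toy_ontology))
```

On this input the sketch keeps the type fact (visual and textual agreement) and the high-confidence ownership fact, infers membership in `ex:Agent`, and rejects the single low-confidence location observation. A fuller system would more plausibly store the graph with an RDF library such as rdflib and delegate inference to an OWL reasoner; the set-based version only makes the data flow between stages explicit.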

A key methodological strength highlighted in the paper is the proposed use of densely annotated datasets, specifically MovieGraphs, as ground truth. This enables quantitative evaluation and supervised training of the system's abstraction and reasoning capabilities (Stages 2 and 3).
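Using annotations as ground truth makes the output of Stages 2 and 3 directly measurable. A minimal sketch, assuming the MovieGraphs annotations for a clip have already been converted into the same (subject, predicate, object) form as the system's predicted facts (that conversion is not shown, and the example identifiers are hypothetical), is exact-match precision, recall, and F1 over fact sets:

```python
# A fact in (subject, predicate, object) form.
Triple = tuple[str, str, str]


def evaluate(predicted: set[Triple], gold: set[Triple]) -> dict[str, float]:
    """Exact-match precision, recall and F1 of predicted facts against gold facts."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical facts for one clip; gold would come from converted annotations.
predicted = {
    ("ex:CharacterA", "ex:friendOf", "ex:CharacterB"),
    ("ex:CharacterA", "ex:feels", "ex:Angry"),
}
gold = {
    ("ex:CharacterA", "ex:friendOf", "ex:CharacterB"),
    ("ex:CharacterB", "ex:feels", "ex:Sad"),
    ("ex:Scene1", "ex:situation", "ex:Argument"),
}

scores = evaluate(predicted, gold)
```

For this toy input one of the two predictions matches, giving precision 0.5, recall 1/3, and F1 0.4. Exact matching is deliberately strict: a fact that is correct but uses a different identifier counts as an error, so entity-linking quality directly affects the score.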

The paper concludes by positioning the framework as a methodologically sound research roadmap toward autonomous agents that incrementally build their knowledge of the world by watching and understanding films. In its acknowledgments, the paper credits the various LLMs that assisted in the research process.

Files (157.6 kB)

Analisi_video_semantica2.pdf: 145.6 kB, md5:170d6830ede39e4e8c3f5270966320ac
Second file (name not listed): 12.0 kB, md5:d8d577988d9691240963d9c2dd5a66e9