Published June 22, 2025 | Version v2
Preprint · Open Access

USAILUIGIX: A Framework for Semantically Querying Filmic Media via Ontological Anchoring and RDF-star Knowledge Graphs

Description

The exponential growth of audiovisual data has created a critical need for systems that can understand and index content at a semantic level, moving beyond simple metadata tagging. This work presents USAILUIGIX, a comprehensive framework that addresses this challenge by performing deep semantic encoding of filmic media. The system architecture is built to transform a linear, passive video stream into a dynamic, multi-layered, and machine-readable knowledge graph.

The methodology involves a dual-stream analysis pipeline:

  1. Visual Stream Analysis: Individual frames are processed by a multimodal AI model (e.g., BLIP) to generate a holistic caption. This caption is then parsed using Natural Language Processing techniques to extract a canonical Subject-Predicate-Object (SPO) triple, representing the frame's core action or state.

  2. Auditory Stream Analysis: The film's audio track is transcribed using a robust speech recognition model (e.g., Whisper) to produce time-stamped textual data corresponding to dialogue and significant sound events.
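The caption-to-triple step of the visual stream can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, and the naive word-splitting heuristic stands in for the actual NLP parsing used by the framework.

```python
import re

def caption_to_triple(caption: str):
    """Heuristically extract a (subject, predicate, object) triple from a
    simple declarative caption such as 'a man throws a ball'.
    A production pipeline would use a dependency parser instead."""
    # Tokenize to lowercase words and drop articles.
    words = [w for w in re.findall(r"[a-z]+", caption.lower())
             if w not in {"a", "an", "the"}]
    if len(words) < 3:
        return None  # caption too short to yield a full triple
    # Assume the first remaining word is the subject, the second the
    # predicate, and everything after it the object.
    subject, predicate = words[0], words[1]
    obj = " ".join(words[2:])
    return subject, predicate, obj

print(caption_to_triple("a man throws a ball"))  # ('man', 'throws', 'ball')
```

A real implementation would handle compound subjects, passive voice, and multi-word predicates, which this heuristic deliberately ignores.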

The central innovation of USAILUIGIX lies in its knowledge representation strategy. We leverage RDF-star (RDF*), an extension of the RDF standard, to model the extracted information. This choice overcomes the well-known limitations of standard RDF reification, allowing us to directly and concisely annotate semantic triples with essential metadata. For instance, a triple such as <<:man :throws :ball>> can be directly annotated with its source frame (usgx:extractedFrom :frame_001), the natural language caption it was derived from, and a model-generated confidence score. This creates a rich, context-aware knowledge graph where every piece of information is explicitly linked to its origin.
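The annotation pattern above can be illustrated by emitting Turtle-star text directly. This is a sketch, not the framework's serializer: the helper name and the usgx:sourceCaption property are assumed for illustration, while usgx:extractedFrom follows the example in the text, and the prefix declarations are presumed to appear elsewhere in the document.

```python
def annotate_triple(s, p, o, frame, caption, confidence):
    """Render one RDF-star quoted triple with its provenance
    annotations in Turtle-star syntax."""
    quoted = f"<< :{s} :{p} :{o} >>"
    return (
        f"{quoted} usgx:extractedFrom :{frame} ;\n"
        f"    usgx:sourceCaption \"{caption}\" ;\n"
        f"    usgx:confidence {confidence:.2f} .\n"
    )

print(annotate_triple("man", "throws", "ball",
                      "frame_001", "a man throws a ball", 0.92))
```

Because the quoted triple is itself the subject of the annotation statements, no intermediate reification node is needed, which is exactly the advantage over classic RDF reification.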

The populated graph can be interrogated using SPARQL-star, enabling sophisticated, content-based queries that are impossible with conventional media analysis tools. This framework serves as a robust proof-of-concept for the symbol grounding problem in a real-world, multimodal context and provides a foundational tool for a new generation of applications in computational film studies, intelligent archival systems, and AI-driven narrative analysis. All components of this research, including the Python source code, the formal OWL ontology, and experimental results, are made available to ensure full transparency and reproducibility.
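A content-based query of the kind described above might look like the following SPARQL-star sketch, here held as a Python string. The namespace IRIs are placeholders, and usgx:confidence is an assumed property name; usgx:extractedFrom follows the example given in the text.

```python
# SPARQL-star query: find every frame containing a high-confidence
# "throws" event, by matching annotations on the quoted triple.
QUERY = """
PREFIX usgx: <http://example.org/usgx#>
PREFIX :     <http://example.org/film#>

SELECT ?frame ?conf WHERE {
    << ?s :throws ?o >> usgx:extractedFrom ?frame ;
                        usgx:confidence ?conf .
    FILTER (?conf > 0.8)
}
"""
```

A query like this can be run against the populated graph with any SPARQL-star-capable engine; note that the quoted-triple pattern in the WHERE clause matches the annotated statements directly, with no reification quads to join.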

Files (279.4 kB)

requirements.txt
md5:0e5df7cacbcd00c644fe231ad78d820e (240 Bytes)
md5:2cce77404da7a0592510e08dbc4339bd (7.6 kB)
md5:a515ec84a40c8a1319c25219c0700bd0 (107.9 kB)
md5:fb380af0661a396a2448488981cd7c38 (151.5 kB)
md5:04b04f13ec26fef53e52ced25001ac0d (12.2 kB)

Additional details

Related works

Cites
Preprint: 10.5281/zenodo.15711264 (DOI)