USAILUIGIX: A Framework for Semantically Querying Filmic Media via Ontological Anchoring and RDF-star Knowledge Graphs
Description
The exponential growth of audiovisual data has created a critical need for systems that can understand and index content at a semantic level, moving beyond simple metadata tagging. This work presents USAILUIGIX, a comprehensive framework that addresses this challenge by performing deep semantic encoding of filmic media. The system architecture is built to transform a linear, passive video stream into a dynamic, multi-layered, and machine-readable knowledge graph.
The methodology involves a dual-stream analysis pipeline:

- Visual Stream Analysis: Individual frames are processed by a multimodal AI model (e.g., BLIP) to generate a holistic caption. This caption is then parsed using Natural Language Processing techniques to extract a canonical Subject-Verb-Object (SVO) triple, representing the frame's core action or state.
- Auditory Stream Analysis: The film's audio track is transcribed using a robust speech recognition model (e.g., Whisper) to produce time-stamped textual data corresponding to dialogue and significant sound events.
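The extraction step of the visual stream can be sketched as follows. This is a minimal illustration, not the framework's actual implementation: the real pipeline would invoke BLIP for captioning and a proper NLP parser for triple extraction, whereas here a placeholder caption and a naive word-filtering heuristic stand in for both, and all function names are hypothetical.

```python
def caption_frame(frame_id: str) -> str:
    """Placeholder for a BLIP caption; a real system would run the model
    on the decoded frame. The fixed string below is illustrative only."""
    return "a man throws a ball"


def extract_svo(caption: str) -> tuple[str, str, str]:
    """Very naive SVO extraction: assumes captions of the form
    '(det) SUBJ VERB (det) OBJ'. A real pipeline would use dependency
    parsing (e.g., spaCy) instead of this heuristic."""
    words = [w for w in caption.lower().split() if w not in {"a", "an", "the"}]
    subj, verb, obj = words[0], words[1], words[2]
    return (subj, verb, obj)


if __name__ == "__main__":
    triple = extract_svo(caption_frame("frame_001"))
    print(triple)  # ('man', 'throws', 'ball')
```

The point of reducing a free-text caption to a single canonical triple is that the result becomes a graph-ready assertion rather than an opaque string.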
The central innovation of USAILUIGIX lies in its knowledge representation strategy. We leverage RDF-star (RDF*), an extension of the RDF standard, to model the extracted information. This choice overcomes the well-known limitations of standard RDF reification, allowing us to elegantly and directly annotate semantic triples with essential metadata. For instance, a triple such as <<:man :throws :ball>> can be directly annotated with its source frame (usgx:extractedFrom :frame_001), the natural language caption it was derived from, and a model-generated confidence score. This creates a rich, context-aware knowledge graph where every piece of information is explicitly linked to its origin.
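The annotation pattern described above can be sketched as a small serializer that renders one annotated triple in Turtle-star syntax. This is an illustrative sketch, not the framework's code: `usgx:extractedFrom` appears in the text, but `usgx:caption` and `usgx:confidence` are assumed property names chosen here for the example.

```python
def annotate_triple(s: str, p: str, o: str,
                    frame: str, caption: str, confidence: float) -> str:
    """Render one RDF-star annotated triple as Turtle-star text.
    The quoted triple << s p o >> is itself the subject of the
    metadata statements, avoiding classic RDF reification."""
    quoted = f"<< {s} {p} {o} >>"
    return (
        f"{quoted} usgx:extractedFrom {frame} ;\n"
        f'    usgx:caption "{caption}" ;\n'       # hypothetical property
        f"    usgx:confidence {confidence:.2f} ."  # hypothetical property
    )


if __name__ == "__main__":
    print(annotate_triple(":man", ":throws", ":ball",
                          ":frame_001", "a man throws a ball", 0.93))
```

Because the quoted triple is a first-class term, provenance attaches to the statement itself rather than to a four-node reification structure.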
The populated graph can be interrogated using SPARQL-star, enabling sophisticated, content-based queries that are impossible with conventional media analysis tools. This framework serves as a robust proof-of-concept for the symbol grounding problem in a real-world, multimodal context and provides a foundational tool for a new generation of applications in computational film studies, intelligent archival systems, and AI-driven narrative analysis. All components of this research, including the Python source code, the formal OWL ontology, and experimental results, are made available to ensure full transparency and reproducibility.
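A query over such a graph might look like the following SPARQL-star sketch. The prefix URI and the `usgx:caption`/`usgx:confidence` property names are assumptions for illustration; only `usgx:extractedFrom` and the example triple come from the description above.

```sparql
# Hypothetical query: which frames show a man throwing something,
# keeping only assertions the model made with high confidence?
PREFIX usgx: <http://example.org/usailuigix#>

SELECT ?obj ?frame ?conf WHERE {
  << :man :throws ?obj >> usgx:extractedFrom ?frame ;
                          usgx:confidence ?conf .
  FILTER (?conf > 0.8)
}
```

The quoted-triple pattern `<< :man :throws ?obj >>` is what distinguishes SPARQL-star from plain SPARQL: the query matches statements about statements directly.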
Files (279.4 kB)

| Name | MD5 checksum | Size |
|---|---|---|
| requirements.txt | 0e5df7cacbcd00c644fe231ad78d820e | 240 Bytes |
| | 2cce77404da7a0592510e08dbc4339bd | 7.6 kB |
| | a515ec84a40c8a1319c25219c0700bd0 | 107.9 kB |
| | fb380af0661a396a2448488981cd7c38 | 151.5 kB |
| | 04b04f13ec26fef53e52ced25001ac0d | 12.2 kB |
Additional details
Related works
- Cites
  - Preprint: 10.5281/zenodo.15711264 (DOI)