Published June 10, 2018 | Version v1
Conference paper · Open Access

Fusion of Compound Queries with Multiple Modalities for Known Item Video Search


Multimedia collections are ubiquitous and often contain hundreds of hours of video. Retrieving a particular scene of a video (Known Item Search) from a large collection is a difficult problem, given the multimodal character of video shots and the complexity of the query, whether visual or textual. We tackle these challenges by first fusing multiple modalities in a nonlinear, graph-based way for each sub-topic of the query. We then fuse the top retrieved video shots per sub-query to produce the final list of retrieved shots, which is re-ranked using temporal information. The framework is evaluated on popular Known Item Search tasks in the context of video shot retrieval and achieves the highest Mean Reciprocal Rank scores.
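The pipeline above can be sketched in miniature. The paper's nonlinear graph-based fusion is not specified in this abstract, so the sketch below stands in a simple reciprocal-rank fusion for the per-sub-query fusion step, and pairs it with the Mean Reciprocal Rank metric used for evaluation; all function names and the `k=60` damping constant are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: fuse ranked shot lists from the sub-queries of a compound
# query, then score the result with Mean Reciprocal Rank (MRR).
# Reciprocal-rank fusion is a stand-in for the paper's graph-based fusion.
from collections import defaultdict


def fuse_subquery_rankings(rankings, k=60):
    """Fuse several ranked lists of shot IDs (one per sub-query) into a
    single ranking via reciprocal-rank fusion; k=60 is a conventional
    damping constant, assumed here."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, shot_id in enumerate(ranking, start=1):
            scores[shot_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def mean_reciprocal_rank(results_per_query, relevant_per_query):
    """MRR over known-item queries: reciprocal rank of the first relevant
    shot in each result list, averaged over all queries."""
    total = 0.0
    for results, relevant in zip(results_per_query, relevant_per_query):
        for rank, shot_id in enumerate(results, start=1):
            if shot_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results_per_query)


# Example: two sub-queries of one compound query retrieve overlapping shots;
# the shot appearing high in both lists rises to the top after fusion.
fused = fuse_subquery_rankings([["s3", "s1", "s7"], ["s1", "s9", "s3"]])
print(fused[0])                                    # → s1
print(mean_reciprocal_rank([fused], [{"s1"}]))     # → 1.0
```

A temporal re-ranking step, as the abstract describes, would then adjust this fused list using shot adjacency within the source video.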



Files (673.7 kB)


Additional details


Funding (European Commission):
- beAWARE – Enhancing decision support and management services in extreme weather climate events (grant 700475)
- V4Design – Visual and textual content re-purposing FOR(4) architecture, Design and video virtual reality games (grant 779962)