Published August 10, 2025 | Version v1
Conference paper (Open Access)

Multi-Task Learning for Video Processing: Going With the Flow

Description

Multi-task learning constitutes the prevalent paradigm in numerous vision applications that prioritize runtime efficiency. At present, however, deep multi-task networks are limited to single-image processing. While various motion descriptors have been proposed in the video processing literature to estimate motion across frames, the problem of incorporating motion compensation into multi-task learning remains understudied. Moreover, the tasks typically integrated within multi-task architectures are exclusively visual scene understanding tasks, i.e. tasks at the same level of hierarchy. In this work, we address multi-task video scene enhancement in combination with scene understanding for intra-oral scenes. We propose a novel architecture derived from the multi-output, multi-scale, multi-task (MOST) family of models that further incorporates optical flow into its design. We show that our approach yields a) on-par performance with state-of-the-art convolutional networks across multiple tasks and architectures, b) a better performance-vs-efficiency trade-off than combining single-task methods, i.e. up to 2× faster runtimes, and c) low-latency, real-time processing at 25 FPS when compiled with TensorRT at half precision, enabling commercial deployment.
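The paper's architecture compensates motion by warping information from one frame toward another using an estimated optical-flow field. As a rough, hedged illustration of that backward-warping idea only (the function name and nearest-neighbour sampling are our own simplifications, not the paper's method, which operates on learned features), a minimal sketch:

```python
import numpy as np

def warp_with_flow(frame, flow):
    """Backward-warp `frame` by an optical-flow field.

    frame: (H, W) array; flow: (H, W, 2) array of (dx, dy) offsets,
    so output[y, x] = frame[y + dy, x + dx], sampled with nearest
    neighbour and clamped at the image borders.
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Coordinates to sample from in the source frame.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

frame = np.arange(16, dtype=float).reshape(4, 4)

# Zero flow reproduces the frame exactly.
assert np.array_equal(warp_with_flow(frame, np.zeros((4, 4, 2))), frame)

# A uniform flow of (+1, 0) shifts content one pixel horizontally.
shift = np.zeros((4, 4, 2))
shift[..., 0] = 1.0
warped = warp_with_flow(frame, shift)
assert np.array_equal(warped[:, :-1], frame[:, 1:])
```

In practice, networks like the one described here use bilinear rather than nearest-neighbour sampling so the warp is differentiable end to end.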

Files (2.8 MB)

Optical_Flow_Based_Multi_Task_Learning.pdf — 2.8 MB (md5:2be7a87a15d8fe6d395b4ddf4de2ad38)

Additional details

Funding

European Commission
CoGNETs - Continuums Of Game NETs: swarm intelligence as information processing (CoGNETs) 101135930

Dates

Available
2025-08-05