Journal article Open Access

TRACK: A New Method from a Re-examination of Deep Architectures for Head Motion Prediction in 360-degree Videos

Miguel Fabian Romero Rondon; Lucile Sassatelli; Ramon Aparicio-Pardo; Frédéric Precioso

We consider predicting the user's head motion in 360° videos with only two modalities: the user's past positions and the video content (without knowledge of other users' traces). We make two main contributions. First, we re-examine existing deep-learning approaches for this problem and identify hidden flaws through a thorough root-cause analysis. Second, from the results of this analysis, we design a new proposal that establishes state-of-the-art performance.
First, re-assessing the existing methods that use both modalities, we obtain the surprising result that they all perform worse than baselines using the user’s trajectory only. A root-cause analysis of the metrics, datasets and neural architectures shows in particular that (i) the content can inform the prediction for horizons longer than 2 to 3 seconds (existing methods consider shorter horizons), and that (ii) to compete with the baselines, it is necessary, though not sufficient, to have a recurrent unit dedicated to processing the positions.
Second, from a re-examination of the problem supported by the concept of Structural-RNN, we design a new deep neural architecture, named TRACK. TRACK achieves state-of-the-art performance on all considered datasets and prediction horizons, outperforming competitors by up to 20% on focus-type videos and for horizons of 2 to 5 seconds.
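
To make the idea of a baseline "using the user's trajectory only" concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the baselines or the metrics, so everything here is an assumption made for illustration: head positions are represented as 3D unit vectors, the baseline simply repeats the last observed position over the whole prediction horizon, and the error is measured as the great-circle angle between predicted and true positions. The names `static_baseline` and `great_circle_distance` are hypothetical.

```python
import numpy as np


def static_baseline(past_positions, horizon_steps):
    """Hypothetical position-only baseline: repeat the last observed position.

    past_positions: array of shape (T, 3), unit vectors on the sphere (assumed format).
    Returns an array of shape (horizon_steps, 3).
    """
    last = np.asarray(past_positions, dtype=float)[-1]
    return np.tile(last, (horizon_steps, 1))


def great_circle_distance(u, v):
    """Angle in radians between vectors u and v, each of shape (..., 3).

    Assumed error metric for this sketch; inputs are normalized first.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u = u / np.linalg.norm(u, axis=-1, keepdims=True)
    v = v / np.linalg.norm(v, axis=-1, keepdims=True)
    # Clip to guard against rounding slightly outside [-1, 1].
    dot = np.clip(np.sum(u * v, axis=-1), -1.0, 1.0)
    return np.arccos(dot)


# Toy trace (made-up values): the user looks along x, then along y.
past = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
# Made-up future: the user stays on y for one step, then looks along z.
future = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

pred = static_baseline(past, horizon_steps=len(future))
errors = great_circle_distance(pred, future)  # one error per future step
```

In this toy trace the error is zero while the user stays still and grows once the head moves, which is the kind of per-horizon comparison against which a content-aware predictor would have to show a gain.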

The entire framework (code and datasets) is available online and received an ACM reproducibility badge: https://gitlab.com/miguelfromeror/head-motion-prediction

Files (1.7 MB)
Name Size
final_TPAMI_2021.pdf 1.7 MB
md5:25236be7cbb21f0aabc76b3687dbd645
