Journal article Open Access

TRACK: A New Method from a Re-examination of Deep Architectures for Head Motion Prediction in 360-degree Videos

Miguel Fabian Romero Rondon; Lucile Sassatelli; Ramon Aparicio-Pardo; Frédéric Precioso

We consider predicting the user's head motion in 360° videos from two modalities only: the user's past positions and the video content (without access to other users' traces). We make two main contributions. First, we re-examine existing deep-learning approaches to this problem and identify hidden flaws through a thorough root-cause analysis. Second, building on the results of this analysis, we design a new architecture that establishes state-of-the-art performance.
First, re-assessing the existing methods that use both modalities, we obtain the surprising result that they all perform worse than baselines using the user’s trajectory only. A root-cause analysis of the metrics, datasets and neural architectures shows in particular that (i) the content can inform the prediction for horizons longer than 2 to 3 seconds (existing methods consider shorter horizons), and that (ii) to compete with the baselines, it is necessary to have a recurrent unit dedicated to processing the positions, although this alone is not sufficient.
Second, from a re-examination of the problem supported by the concept of Structural-RNN, we design a new deep neural architecture, named TRACK. TRACK achieves state-of-the-art performance on all considered datasets and prediction horizons, outperforming competitors by up to 20% on focus-type videos at horizons of 2 to 5 seconds.
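
To make the notion of a baseline "using the user's trajectory only" concrete, the sketch below shows one of the simplest such predictors: it repeats the last observed head position over the whole prediction horizon and scores the result with the great-circle distance on the sphere. This is an illustrative assumption, not necessarily one of the baselines or metrics evaluated in the article; the function names, the (longitude, latitude) representation in radians, and the sample values are all hypothetical.

```python
import math


def great_circle_distance(a, b):
    """Angular distance in radians between two (longitude, latitude) points.

    Uses the haversine formula on the unit sphere.
    """
    lon1, lat1 = a
    lon2, lat2 = b
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.0.
    return 2 * math.asin(min(1.0, math.sqrt(h)))


def static_baseline(past_positions, horizon_steps):
    """Hypothetical trajectory-only baseline: repeat the last known position."""
    return [past_positions[-1]] * horizon_steps


def mean_error(predicted, actual):
    """Average great-circle distance between predicted and actual positions."""
    distances = [great_circle_distance(p, a) for p, a in zip(predicted, actual)]
    return sum(distances) / len(distances)


# Made-up sample: the head pans along the equator at 0.1 rad per time step.
past = [(0.0, 0.0), (0.1, 0.0)]
future = [(0.1, 0.0), (0.2, 0.0)]
prediction = static_baseline(past, len(future))
error = mean_error(prediction, future)
```

Such a predictor ignores the video content entirely, which is what makes it a useful reference point: a model that uses both modalities should at least match it.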

The entire framework (code and datasets) is available online and has received an ACM reproducibility badge.
