Conference paper Embargoed Access
Previous work in 3D human action recognition has mainly been confined to single-domain schemes. These principally exploit skeleton-tracking data because of its compact representation and its efficient modeling of the observed motion dynamics. However, extending and adapting the learning process to multi-modal domains inevitably also requires a focus on cross-domain analysis. Meanwhile, attention schemes have recently been applied to numerous applications with promising results. They can exploit the intra-affinity of the considered modalities and can therefore be used for inter-modality knowledge transfer, e.g. to transfer domain-specific knowledge of the skeleton modality to the flow modality and vice versa. This study investigates novel cross-modal attention-based strategies that efficiently model global contextual information about the action dynamics, with the aim of increasing overall recognition performance. In particular, it introduces a new methodology for transferring knowledge across domains that takes advantage of the strong temporal modeling capabilities of Long Short-Term Memory (LSTM) models. In addition, extensive experiments and a thorough comparative evaluation provide a detailed analysis of the problem. They also demonstrate the particular characteristics of the attention-enhanced schemes involved. The overall approach achieves state-of-the-art performance on NTU RGB-D, currently the most challenging public dataset, surpassing similar uni-modal and multi-modal representation schemes.
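To make the idea of cross-modal attention concrete, here is a minimal sketch. It uses generic scaled dot-product attention in which the queries come from one modality and the keys and values come from another. This is not the authors' method, which the abstract does not specify. The function names, the per-frame feature vectors and the choice of which modality supplies queries versus keys are hypothetical, and the LSTM-based transfer component is omitted entirely.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries: List[Vector],
                          keys: List[Vector],
                          values: List[Vector]) -> List[Vector]:
    """Generic scaled dot-product attention across two modalities (hypothetical).

    queries: per-frame features of the target modality (e.g. flow, assumed)
    keys/values: per-frame features of the source modality (e.g. skeleton, assumed)
    Each output is a convex combination of the source-modality values,
    weighted by how well each source frame matches the query frame.
    """
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same number of frames")
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query frame to every source frame, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Attention-weighted sum of the source-modality values.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    # Toy, made-up features: 2 flow frames attending over 3 skeleton frames.
    flow = [[1.0, 0.0], [0.0, 1.0]]
    skeleton_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    skeleton_values = [[10.0], [20.0], [30.0]]
    print(cross_modal_attention(flow, skeleton_keys, skeleton_values))
```

With this design, transfer in the opposite direction (skeleton attending to flow) only requires swapping which modality supplies the queries. The sketch works on plain Python lists to stay dependency-free. A practical model would use batched tensor operations and learned projection matrices for the queries, keys and values.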
Files are currently under embargo but will be publicly accessible after May 18, 2020.