Published June 15, 2023 | Version v1
Conference paper | Open Access

Escaping local minima in deep reinforcement learning for video summarization

  • 1. Aristotle University of Thessaloniki

Description

State-of-the-art unsupervised deep neural video summarization methods mostly fall under the adversarial reconstruction framework. This framework uses a Generative Adversarial Network (GAN) structure and Long Short-Term Memory (LSTM) autoencoders during training. The typical result is a selector LSTM that sequentially receives video frame representations and outputs a scalar importance factor for each frame. These factors are then used to select key-frames. This basic approach has been augmented with an additional Deep Reinforcement Learning (DRL) agent. The agent is trained using the Discriminator’s output as a reward and learns to optimize the selector’s outputs. However, local minima are a well-known problem in DRL. This paper therefore presents a novel regularizer for escaping local loss minima, in order to improve unsupervised key-frame extraction. The regularizer is an additive loss term, employed during a second training phase, that rewards the difference between the neural agent’s parameters and those of a previously found good solution. It thus encourages the training process to explore the parameter space more aggressively in order to discover a better local loss minimum. Evaluation on two public datasets shows considerable improvements over both the baseline and state-of-the-art methods.
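The following is a rough illustration of the second-phase regularizer described above, not the paper's formulation. It is a minimal sketch built on three assumptions. First, the "difference" is measured as the Euclidean (L2) distance between flattened parameter vectors. Second, this distance is subtracted from the base training loss, so moving away from the earlier solution lowers the total loss. Third, a fixed weight scales the term. The names `parameter_distance`, `regularized_loss` and `weight` are hypothetical, as are the L2 distance and the default weight of 0.1.

```python
import math
from typing import Sequence


def parameter_distance(theta: Sequence[float], theta_ref: Sequence[float]) -> float:
    """Euclidean (L2) distance between two flattened parameter vectors.

    Assumption: the paper's actual distance measure may differ.
    """
    if len(theta) != len(theta_ref):
        raise ValueError("parameter vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(theta, theta_ref)))


def regularized_loss(
    base_loss: float,
    theta: Sequence[float],
    theta_ref: Sequence[float],
    weight: float = 0.1,  # hypothetical regularization weight
) -> float:
    """Hypothetical second-phase loss.

    theta_ref holds the parameters of the previously found good solution.
    Subtracting the weighted distance means that parameters further from
    theta_ref yield a lower loss, i.e. the difference is rewarded.
    """
    return base_loss - weight * parameter_distance(theta, theta_ref)


if __name__ == "__main__":
    theta_ref = [0.5, -0.2, 1.0]  # toy "previous solution"
    # Staying at the previous solution adds no reward.
    print(regularized_loss(1.0, theta_ref, theta_ref))
    # Moving away (distance 0.5) lowers the loss by weight * 0.5.
    print(regularized_loss(1.0, [0.8, 0.2, 1.0], theta_ref))
```

Because the subtracted term grows without bound as the parameters drift, a practical version would presumably keep the weight small, or cap the distance, so that the original task loss still dominates. How the paper balances these two terms is not stated in this abstract.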

Files

EscapingLocalMinimaInDeepReinforcementLearningForVideoSummarization_ICMR23.pdf

Additional details

Funding

European Commission
AI4Media - A European Excellence Centre for Media, Society and Democracy 951911