Published August 4, 2025 | Version v1
Dissertation Open

Spatiotemporal Action Recognition in Videos Using ConvLSTM with Attention: A Comparative Analysis and Implementation

Description

 This dissertation explores the application of Convolutional Long Short-Term Memory
 (ConvLSTM) networks for video action recognition. ConvLSTM integrates the spatial
 feature extraction capabilities of CNNs with the temporal modeling strengths of LSTM
 networks, enabling the model to capture both spatial and temporal dependencies within
 video data. The primary goal of this research is to develop an efficient, scalable model
 capable of recognizing actions in real time. The UCF-101 dataset is used to evaluate the
 effectiveness of the proposed model, with performance compared against traditional CNN
 and LSTM approaches. Additionally, various preprocessing techniques and hyperparameter
 configurations are examined to understand their impact on model performance.
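
 To illustrate how a ConvLSTM combines convolutional feature extraction with LSTM-style
 gating, the following is a minimal NumPy sketch of one recurrent step applied frame by
 frame to a short clip. It is not taken from the dissertation: it assumes the common
 formulation without peephole connections and omits the attention component, and all
 names, shapes, and the random input (`conv2d_same`, `convlstm_step`, `clip`) are
 hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, w):
    """Zero-padded 'same' cross-correlation.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W)."""
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # pad only the spatial axes
    H, W = x.shape[1:]
    out = np.zeros((w.shape[0], H, W))
    for dy in range(k):
        for dx in range(k):
            # Contribution of kernel tap (dy, dx), mixed across input channels.
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx],
                             xp[:, dy:dy + H, dx:dx + W])
    return out

def convlstm_step(x, h, c, w, b):
    """One ConvLSTM step (assumed variant: no peephole terms).
    x: (C_in, H, W); h, c: (C_hid, H, W);
    w: (4*C_hid, C_in + C_hid, k, k); b: (4*C_hid,)."""
    # A single convolution over [input, hidden state] yields all four gates.
    gates = conv2d_same(np.concatenate([x, h], axis=0), w) + b[:, None, None]
    i, f, o, g = np.split(gates, 4, axis=0)
    c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
    h_next = sigmoid(o) * np.tanh(c_next)              # new hidden state
    return h_next, c_next

# Hypothetical toy clip: 8 frames, 3 channels, 16x16 pixels.
rng = np.random.default_rng(0)
T, C_in, C_hid, H, W, k = 8, 3, 4, 16, 16, 3
clip = rng.standard_normal((T, C_in, H, W))
w = 0.1 * rng.standard_normal((4 * C_hid, C_in + C_hid, k, k))
b = np.zeros(4 * C_hid)

h = np.zeros((C_hid, H, W))
c = np.zeros((C_hid, H, W))
for frame in clip:
    h, c = convlstm_step(frame, h, c, w, b)

print(h.shape)  # (4, 16, 16)
```

 Because the hidden and cell states keep their height and width, the spatial layout of
 each frame is preserved across time steps, which is what distinguishes this from
 flattening CNN features into a standard LSTM.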

Files

Research.pdf

Files (1.5 MB)

Name: Research.pdf
Size: 1.5 MB
md5:5261de9e9c16c19bbef444c6bb67ab5f

Additional details

Related works

Cites
Publication: arXiv:1705.07750 (arXiv)

Dates

Available
2025-08