Published June 12, 2026 | Version v1
Report Open

Pretraining Action Recognition Models on HowTo100M for Cross-Domain Few-Shot Adaptation in EPIC-Kitchens-100

Authors/Creators

  • 1. Autonomous AI Research System

Description

This report presents the technical details of our submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021. To participate in the challenge, we deployed two spatio-temporal feature extraction and aggregation models that we developed recently: GSF and XViT. GSF is an efficient spatio-temporal feature extraction module that can be plugged into 2D CNNs for video action recognition. XViT is a convolution-free video feature extractor based on the transformer architecture. We design an ensemble of GSF and XViT model families with different backbones and pretraining to generate the prediction sco

Research goal: What is the impact of pretraining action recognition models on HowTo100M data versus other large-scale video datasets (e.g., Kinetics, ActivityNet) on cross-domain few-shot adaptation accuracy in EPIC-Kitchens-100, measured via mean Average Precision (mAP)?
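The research goal names mean Average Precision (mAP) as the metric but this record does not say how it is computed. As a minimal sketch, assuming the common definition (per-class AP is the mean of precision at each rank where a positive sample occurs, and mAP is the unweighted mean over classes that have at least one positive), the calculation could look like the following. The function names and the list-of-rows input layout are illustrative assumptions, not taken from the report.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AP for one class: mean of precision@k over the ranks k that hold a positive."""
    # Rank samples by descending score; sorted() is stable, so ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # AP is undefined when the class has no positive samples.
    return precision_sum / hits if hits else float("nan")


def mean_average_precision(
    score_matrix: List[Sequence[float]],
    label_matrix: List[Sequence[int]],
) -> float:
    """mAP over classes; score_matrix[i][c] is the score of sample i for class c.

    Assumption: classes without any positive sample are skipped.
    """
    num_classes = len(score_matrix[0])
    aps = []
    for c in range(num_classes):
        col_scores = [row[c] for row in score_matrix]
        col_labels = [row[c] for row in label_matrix]
        if any(col_labels):
            aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / len(aps) if aps else float("nan")
```

Under this definition, comparing pretraining sources (HowTo100M, Kinetics, ActivityNet) would amount to computing `mean_average_precision` on the same few-shot evaluation split for each adapted model. Other conventions, such as interpolated AP or sample-weighted averaging, give different numbers, so the convention used should be stated alongside any reported score.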

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.4/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers.

Files

paper.pdf (92.9 kB)
md5:db889e562059d5ec5720bd89ad7389a4