Auxiliary Objective Variants in Video-JEPA for Downstream Task Performance

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20668846

Published June 12, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Self-supervised learning has gained popularity because it avoids the cost of annotating large-scale datasets. It can adopt self-defined pseudo-labels as supervision and use the learned representations for several downstream tasks. In particular, contrastive learning has recently become a dominant component of self-supervised learning in computer vision, natural language processing (NLP), and other domains. It aims to embed augmented versions of the same sample close to each other while pushing apart the embeddings of different samples. This paper provides an e
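The contrastive objective described in the abstract (pull two augmented views of the same sample together, push views of different samples apart) can be sketched with a generic InfoNCE-style loss. This is a minimal illustration and not code from the report: the function names `l2_normalize` and `info_nce`, the temperature of 0.1, and the toy two-dimensional embeddings are all assumptions chosen for the example, and the report's own auxiliary objectives for Video-JEPA may differ.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss over a batch (illustrative, hypothetical helper).

    anchors[i] and positives[i] are embeddings of two augmented views of
    sample i; every positives[j] with j != i acts as a negative for anchor i.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i with every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching view (index i) as the correct class.
        total += log_denom - logits[i]
    return total / len(a)


# Toy embeddings (assumed values): matching views agree in `aligned`
# and are deliberately mismatched in `swapped`.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the loss is close to zero when each anchor is most similar to its own positive and grows when the pairing is swapped, which is the behaviour the abstract describes.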

Research goal: How do different auxiliary objective variants in Video-JEPA affect downstream task performance when fine-tuned on the UCF-101 and Something-Something V2 benchmarks, measured by classification accuracy?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.2/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.2/10.

Files

paper.pdf (74.2 kB)
md5:d9cbb73ac1387db45a29c5c0d1c08d70

