Zero-shot Transfer Accuracy of CLIP-TD vs Fine-tuned CLIP in Domain-shifted Vision-Language Tasks

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20661652

Published June 12, 2026 | Version v1

Report | Open Access


SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Transfer learning lets models share common knowledge across a variety of downstream tasks. However, traditional methods struggle when training data is limited, and they produce narrow models that fail to generalize under distribution shifts. Foundation models have recently shown impressive zero-shot inference capabilities and robustness under distribution shifts. Yet zero-shot evaluation of these models has been largely confined to benchmarks with simple distribution shifts, which limits our understanding of how well they perform under more realistic shifts.

Research goal: How does CLIP-TD's zero-shot transfer accuracy on domain-shifted vision-language tasks compare with the accuracy of standard CLIP fine-tuning methods?
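Zero-shot transfer accuracy, the quantity this research goal compares, is usually measured for CLIP-style models in a standard way. Each image embedding is matched against one text-prompt embedding per class, the most similar prompt is taken as the prediction, and the predictions are scored against the labels. The sketch below illustrates that protocol under stated assumptions. The embeddings are hypothetical toy 2-D vectors standing in for real image and text encoder outputs. The helper names (`cosine`, `zero_shot_predict`, `zero_shot_accuracy`) are illustrative and do not come from CLIP, CLIP-TD, or this report.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def zero_shot_predict(image_emb, class_text_embs):
    """Return the index of the class prompt most similar to the image embedding."""
    scores = [cosine(image_emb, t) for t in class_text_embs]
    return max(range(len(scores)), key=scores.__getitem__)


def zero_shot_accuracy(image_embs, labels, class_text_embs):
    """Fraction of images whose zero-shot prediction matches the label."""
    correct = sum(
        zero_shot_predict(x, class_text_embs) == y
        for x, y in zip(image_embs, labels)
    )
    return correct / len(labels)


# Hypothetical toy embeddings, not real encoder outputs.
class_text_embs = [[1.0, 0.0], [0.0, 1.0]]  # e.g. "a photo of a cat", "a photo of a dog"
labels = [0, 1, 0, 1]
source_images = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]       # in-domain
shifted_images = [[0.6, 0.5], [0.4, 0.6], [0.45, 0.55], [0.3, 0.7]]    # domain-shifted

print(zero_shot_accuracy(source_images, labels, class_text_embs))   # → 1.0
print(zero_shot_accuracy(shifted_images, labels, class_text_embs))  # → 0.75
```

In this toy setup, the shifted embeddings drift toward the wrong class prompt, so accuracy drops without any change to the classifier. A fine-tuned baseline would instead be scored with a trained classification head on the same labeled splits.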

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.7/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. Its content synthesizes findings from peer-reviewed papers.

Files

paper.pdf (77.6 kB; md5:3a351757496d4abff05306d670f5872a)
