Zero-shot Transfer Accuracy of CLIP-TD vs Fine-tuned CLIP in Domain-shifted Vision-Language Tasks
Description
Transfer learning lets models share common knowledge across a variety of downstream tasks, but traditional methods struggle when training data is limited and produce narrow models that generalize poorly under distribution shifts. Foundation models have recently demonstrated impressive zero-shot inference capabilities and robustness under distribution shifts. However, zero-shot evaluation of these models has been predominantly confined to benchmarks with simple distribution shifts, limiting our understanding of their effectiveness under the more realistic shifts
Research goal: How does CLIP-TD's zero-shot transfer accuracy on domain-shifted vision-language tasks compare to standard CLIP fine-tuning methods?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.7/10.
Files
| Name | Size | Checksum |
|---|---|---|
| paper.pdf | 77.6 kB | md5:3a351757496d4abff05306d670f5872a |