How does Flamingo's zero-shot or few-shot generalization ability scale with increasing model size or different pretraining datasets in multimodal tasks?
Description
Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence. However, building general-purpose vision-language models is challenging due to the rich input distributions and task diversity resulting from the additional visual input. Although vision-language pretraining has been widely studied, vision-language instruction tuning remains under-explored. In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models. We gather 26 publicly available …
Research goal: How does Flamingo's zero-shot or few-shot generalization ability scale with increasing model size or different pretraining datasets in multimodal tasks?
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 9.3/10.
Files
| Name | Size | MD5 checksum |
|---|---|---|
| paper.pdf | 84.9 kB | dd0c145c7bfc1cba6930599c441285a6 |