Scaling Pretraining Data and Few-Shot Learning in Self-Supervised Sequence Models
Description
Prior work on language models (LMs) shows that training on a large number of diverse tasks improves few-shot learning (FSL) performance on new tasks. We take this to the extreme, automatically extracting 413,299 tasks from internet tables, orders of magnitude more than the next-largest public datasets. Finetuning on the resulting dataset improves FSL performance on Natural Language Processing (NLP) tasks, but not in proportion to dataset scale. In fact, we find that narrow subsets of our dataset sometimes outperform more diverse datasets. For example, finetuning on software document
Research goal: What is the impact of scaling pretraining dataset size on the few-shot learning capabilities of self-supervised sequence models across diverse modalities?
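To make the idea of extracting a few-shot task from a table more concrete, here is a minimal sketch in Python. It assumes one simple scheme that the description does not spell out: one column is chosen as the output, the remaining cells of each row are serialized as the input, and the first k rows serve as demonstrations. The function names (`table_to_task`, `build_few_shot_prompt`), the `[column] value` serialization, and the `[answer]` marker are hypothetical choices for illustration, not the format used for the dataset.

```python
from typing import Dict, List

Row = Dict[str, str]


def table_to_task(rows: List[Row], output_column: str) -> List[Dict[str, str]]:
    """Turn each table row into one (input, output) example.

    Assumption: the task is "predict `output_column` from the other cells".
    """
    examples = []
    for row in rows:
        if output_column not in row:
            continue  # skip rows that lack the target cell
        # Serialize the remaining cells as "[column] value" pairs.
        input_text = " ".join(
            f"[{col}] {val}" for col, val in row.items() if col != output_column
        )
        examples.append({"input": input_text, "output": row[output_column]})
    return examples


def build_few_shot_prompt(
    examples: List[Dict[str, str]], query_input: str, k: int = 2
) -> str:
    """Concatenate the first k examples as demonstrations, then the query."""
    lines = [f"{ex['input']} [answer] {ex['output']}" for ex in examples[:k]]
    lines.append(f"{query_input} [answer]")
    return "\n".join(lines)
```

Under this scheme a single table with several columns can yield several tasks, one per choice of output column, which is one plausible way a collection of web tables could produce a very large number of tasks.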
Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 7.5/10.
Notes
Files
paper.pdf (85.7 kB)
md5:12cbc39d91264f4fe6ece00f38de1111