Presentation Open Access

How to Feed Your Robot: Building and Maintaining Open Machine Learning Datasets

Evan Tachovsky

While algorithms and computing power get all the press, the special sauce behind many recent machine learning breakthroughs is meticulously labeled training data. Developing and maintaining these data sets as public goods is both an art and a science. In this talk I'll present a new set of best practices gleaned from interviews with ~20 data set builders, maintainers, and funders. Topics include: encouraging collaboration between rival data teams; finding and addressing ethical issues with crowd labeling; launching competitions to spur data set use; and revenue generation models for sustainability.

Files (856.8 kB)
2019-05-08 How to Feed Your Robot.pdf (856.8 kB, md5:df657eaa028196dca4899bd5949fef6d)
