Shifting the Frame: The Labors of ImageNet and AI Data
Description
Artificial intelligence (AI) technologies like ChatGPT, Stable Diffusion, and LaMDA have led to a multi-billion-dollar industry in generative AI, and a potentially much larger industry in AI more generally. However, these technologies would not exist were it not for the immense amount of data mined to make them run, the low-paid and exploited annotation labor required for labeling and content moderation, and the questionable arrangements around consent to use these data. Although the datasets used to train and evaluate commercial models are often hidden from view under the shroud of trade secrecy, we can learn a great deal about these systems by interrogating certain publicly available datasets that are considered foundational in academic AI research.
In this talk, I investigate a single dataset, ImageNet. It is not an overstatement to say that without ImageNet, we might not have the current wave of deep learning techniques that power nearly all modern AI technologies. I begin from three vantage points: the histories of ImageNet from the perspective of its curators and its linguistic predecessor WordNet, the testimony of the data annotators who labeled millions of ImageNet images, and the data subjects and creators of the images within ImageNet. Academically, I situate this analysis within a larger theory and practice of infrastructure studies. Practically, I point to a vision for technology that is not based on practices of unrestricted data mining, exploited labor, and the use of images without meaningful consent.
Files
Alex Hanna - csv,conf 2023.pdf (5.5 MB, md5:0a8e678375f858f7f71a084f07ee653e)