It's About Time: A View of Crowdsourced Data Before and During the Pandemic
Data attained through crowdsourcing have an essential role in the development of computer vision algorithms. Crowdsourced data might include reporting biases, since crowdworkers usually describe what is “worth saying” in addition to images’ content. We explore how the unprecedented events of 2020, including the unrest surrounding racial discrimination, and the COVID-19 pandemic, might be reflected in responses to an open-ended annotation task on people images, originally executed in 2018 and replicated in 2020. Analyzing themes of Identity and Health conveyed in workers’ tags, we find evidence that supports the potential for temporal sensitivity in crowdsourced data. The 2020 data exhibit more race-marking of images depicting non-Whites, as well as an increase in tags describing Weight. We relate our findings to the emerging research on crowdworkers’ moods. Furthermore, we discuss the implications of (and suggestions for) designing tasks on proprietary platforms, having demonstrated the possibility for additional, unexpected variation in crowdsourced data due to significant events.