Published December 17, 2019 | Version 1.0
Software Open

Data and analysis code for "Garbage In, Garbage Out?" paper in Proc ACM FAT* 2020

Authors/Creators

  • 1. UC-Berkeley @BIDS

Description

This repository contains data and analysis code for the paper "Garbage In, Garbage Out? Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled Training Data Comes From?", which is to appear in the Proceedings of ACM FAT* 2020.

The Jupyter notebook data_analysis_viz.ipynb loads gigo_noscores_dataset_anon.csv, which contains the final labels for each paper along with metadata about where it was published. The notebook calculates annotation information scores, then computes and plots the results. Finally, it exports gigo_final_dataset_anon.csv, which contains the same columns as gigo_noscores_dataset_anon.csv plus information scores for each paper and some imputed metadata categories about publication type.
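One way to see what the notebook adds is to compare the headers of the two CSV files. The sketch below is not part of the repository; it uses only the Python standard library and assumes both files are UTF-8 encoded with a header row. The helper names read_header and added_columns are hypothetical, and no column names are assumed, since this description does not list them.

```python
import csv


def read_header(path):
    """Return the column names from the first row of a CSV file."""
    # Assumption: the file is UTF-8 encoded and its first row is a header.
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))


def added_columns(input_csv, output_csv):
    """List columns present in output_csv but not in input_csv, in file order."""
    before = set(read_header(input_csv))
    return [col for col in read_header(output_csv) if col not in before]


# Example call, assuming both files are in the working directory:
# added_columns("gigo_noscores_dataset_anon.csv", "gigo_final_dataset_anon.csv")
```

If the export behaves as described, the returned list should hold the information score columns and the imputed publication-type categories.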

We have chosen to de-identify the papers in this publicly released dataset, so papers are referred to only by a unique ID. If you are interested in obtaining the identifying information for research purposes, please contact Stuart Geiger.

Files

staeiou/gigo-fat2020-1.0.zip

Files (971.4 kB)

md5:2e0fdf5c846f76f24f648f8afb0f47a9
971.4 kB
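The md5 value listed above can be used to check a downloaded copy of the archive. The following is a minimal sketch using Python's hashlib; the function names and the local filename in the closing comment are assumptions, and only the checksum itself comes from this listing.

```python
import hashlib

# Checksum copied from the file listing on this page.
EXPECTED_MD5 = "2e0fdf5c846f76f24f648f8afb0f47a9"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Return True if the file's MD5 equals the value listed on this page."""
    return md5_of_file(path) == EXPECTED_MD5


# Example call; the local filename is hypothetical:
# matches_listing("gigo-fat2020-1.0.zip")
```

A mismatch usually points to an incomplete download or a different version of the archive.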

Additional details

Related works