Published September 1, 2015 | Version v1
Report Open

Aggregating Labels in Crowdsourcing Data

  • 1. CERN openlab Summer Student
  • 2. Summer Student Supervisor

Description

Project Specification

Crowdsourcing is gaining popularity in academia with the launch of crowdsourcing platforms such as Crowdcrafting [Lombraña, 2015] and GeoTagX [UNOSAT, 2015]. A number of algorithms have been proposed for inferring true labels and a confusion matrix from crowdsourced ordinal, nominal and binary labels.

The work here consists of an implementation of the Dawid-Skene [Dawid, 1979] adaptation of the Expectation-Maximization algorithm [Dempster, 1977] for extracting true labels from binary data.
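To make the idea concrete, the following is a minimal sketch of Dawid-Skene-style Expectation-Maximization for binary labels. It is not the implementation from the report: the function name `dawid_skene_binary`, the `(task, worker, label)` input format, and the parameters `n_iter` and `eps` are assumptions chosen for illustration, and the toy data is invented.

```python
def dawid_skene_binary(votes, n_iter=50, eps=1e-6):
    """Illustrative EM for binary crowd labels (hypothetical helper).

    votes: iterable of (task, worker, label) with label in {0, 1}.
    Returns (posterior, confusion, prior_one), where
      posterior[task]         = estimated P(true label = 1)
      confusion[worker][j][l] = estimated P(worker answers l | true label j)
      prior_one               = estimated P(true label = 1) over all tasks
    """
    votes = list(votes)
    tasks = {t for t, _, _ in votes}
    workers = {w for _, w, _ in votes}

    # Initialise each posterior with the fraction of 1-votes on that task.
    ones = {t: 0.0 for t in tasks}
    total = {t: 0 for t in tasks}
    for t, _, l in votes:
        ones[t] += l
        total[t] += 1
    posterior = {t: ones[t] / total[t] for t in tasks}

    for _ in range(n_iter):
        # M-step: re-estimate the class prior and each worker's confusion matrix.
        prior_one = sum(posterior.values()) / len(tasks)
        counts = {w: [[eps, eps], [eps, eps]] for w in workers}  # eps avoids 0/0
        for t, w, l in votes:
            counts[w][1][l] += posterior[t]
            counts[w][0][l] += 1.0 - posterior[t]
        confusion = {
            w: [[c / sum(row) for c in row] for row in counts[w]]
            for w in workers
        }

        # E-step: recompute the posterior of each task's true label.
        like_one = {t: prior_one for t in tasks}
        like_zero = {t: 1.0 - prior_one for t in tasks}
        for t, w, l in votes:
            like_one[t] *= confusion[w][1][l]
            like_zero[t] *= confusion[w][0][l]
        posterior = {
            t: like_one[t] / (like_one[t] + like_zero[t]) for t in tasks
        }

    return posterior, confusion, prior_one


# Invented toy data: workers "A" and "B" always agree with each other,
# worker "C" always gives the opposite answer.
reference = {1: 1, 2: 1, 3: 1, 4: 0, 5: 0, 6: 0}
votes = []
for task, label in reference.items():
    votes += [(task, "A", label), (task, "B", label), (task, "C", 1 - label)]

posterior, confusion, prior_one = dawid_skene_binary(votes)
```

In this sketch the estimated confusion matrices let the algorithm weight workers differently, so a consistently contrarian worker such as "C" ends up with a confusion matrix that is close to anti-diagonal rather than simply being outvoted. The products in the E-step are computed directly for readability; a version handling many votes per task would likely work with log-probabilities instead.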

The second part of the project is the planning of the 2015 edition of the CERN Webfest, a coding event for CERN Summer Students that promotes open source.

Abstract

Crowdsourcing is a method in which multiple individuals, possibly with no prior knowledge of the field, solve a number of tasks. The solutions given by the individuals are then aggregated to infer the true solution from their collective knowledge.
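As a point of reference, the simplest form of such aggregation is a per-task majority vote. The sketch below is only an illustration of that baseline, not a method taken from the report; the function name `majority_vote` and the `(task, worker, label)` input format are assumptions, and the example answers are invented.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Return the most frequent label per task (hypothetical helper).

    answers: iterable of (task, worker, label) tuples.
    Ties are resolved in favour of the label that was seen first.
    """
    labels_by_task = defaultdict(list)
    for task, _worker, label in answers:
        labels_by_task[task].append(label)
    return {
        task: Counter(labels).most_common(1)[0][0]
        for task, labels in labels_by_task.items()
    }


# Invented example: three volunteers label one image, one labels another.
answers = [
    ("img1", "a", "yes"),
    ("img1", "b", "yes"),
    ("img1", "c", "no"),
    ("img2", "a", "no"),
]
aggregated = majority_vote(answers)
```

Majority voting treats every contributor as equally reliable, which is the assumption that model-based methods relax.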

In this paper, we give a short overview of some of the aggregation methods and hybrid crowdsourcing solutions in use. We then implement the label aggregation model proposed by Dawid and Skene [Dawid, 1979] for open source and open science websites such as Crowdcrafting.org [Lombraña, 2015] and the UNOSAT project GeoTagX [UNOSAT, 2015].

Finally, we discuss the organization and results of the CERN Webfest 2015, a hackathon for CERN Summer Students.

Files (606.7 kB)

md5:8cb2b622d9f4691215bc60e0cd7f87cc