Aggregating Labels in Crowdsourcing Data

doi:10.5281/zenodo.31866

Published September 1, 2015 | Version v1

Report Open


1. CERN openlab Summer Student
2. Summer Student Supervisor

Project Specification

Crowdsourcing is gaining popularity in academia with the launch of crowdsourcing platforms such as Crowdcrafting [Lombraña, 2015] and GeoTagX [UNOSAT, 2015]. A number of algorithms have been proposed for estimating the true labels, together with a confusion matrix, from crowdsourced ordinal, nominal and binary labels.

The first part of the project is an implementation of the Dawid-Skene [Dawid 1979] adaptation of the Expectation-Maximization algorithm [Dempster 1977] for extracting true labels from binary data.
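
To illustrate the idea, the following is a minimal sketch of Dawid-Skene style Expectation-Maximization for binary labels. It is not the implementation from the report: the function name `dawid_skene_binary`, the `(item, worker, label)` input format, the fixed iteration count and the `smoothing` constant are all assumptions made for this example. The posteriors are initialised from each item's vote shares, then the M-step (class priors and one 2x2 confusion matrix per worker) and the E-step (posterior over each item's true label) alternate.

```python
def dawid_skene_binary(votes, n_iter=50, smoothing=0.01):
    """Hypothetical helper: votes is a list of (item, worker, label), label in {0, 1}.

    Returns (labels, posteriors, confusion), where confusion[w][k][l] estimates
    P(worker w answers l | true label is k).
    """
    items = sorted({i for i, _, _ in votes})
    workers = sorted({w for _, w, _ in votes})

    # Initialise posteriors [P(true=0), P(true=1)] from each item's vote shares.
    counts = {i: [0.0, 0.0] for i in items}
    for i, _, l in votes:
        counts[i][l] += 1.0
    post = {i: [c / sum(cs) for c in cs] for i, cs in counts.items()}

    conf = {}
    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices.
        prior = [sum(post[i][k] for i in items) / len(items) for k in (0, 1)]
        # The smoothing constant (an assumption) keeps every entry strictly positive.
        conf = {w: [[smoothing, smoothing], [smoothing, smoothing]] for w in workers}
        for i, w, l in votes:
            for k in (0, 1):
                conf[w][k][l] += post[i][k]
        for w in workers:
            for k in (0, 1):
                row_sum = sum(conf[w][k])
                conf[w][k] = [v / row_sum for v in conf[w][k]]

        # E-step: posterior of each true label given the current parameters.
        like = {i: [prior[0], prior[1]] for i in items}
        for i, w, l in votes:
            for k in (0, 1):
                like[i][k] *= conf[w][k][l]
        post = {i: [v / sum(vs) for v in vs] for i, vs in like.items()}

    labels = {i: int(post[i][1] > post[i][0]) for i in items}
    return labels, post, conf


# Toy data (invented for illustration): workers "a" and "b" agree on every item,
# worker "c" disagrees with them on items 0 and 1.
votes = [
    (0, "a", 1), (0, "b", 1), (0, "c", 0),
    (1, "a", 0), (1, "b", 0), (1, "c", 1),
    (2, "a", 1), (2, "b", 1), (2, "c", 1),
    (3, "a", 0), (3, "b", 0), (3, "c", 0),
]
labels, post, conf = dawid_skene_binary(votes)
```

In this sketch the estimated confusion matrices weight each worker's answers by their inferred reliability, so a worker who often disagrees with the consensus contributes less to the final label than under plain majority voting. For long vote lists the product in the E-step would be better computed in log space to avoid underflow.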

The second part of the project is the planning of the 2015 edition of the CERN Webfest, a coding event for CERN Summer Students that promotes open source.

Abstract

Crowdsourcing is a method in which multiple individuals, possibly with no prior knowledge of the field, solve a number of tasks. The solutions given by the individuals are then aggregated to infer the true solution from their common knowledge.

In this paper we give a short overview of some of the aggregation methods and hybrid crowdsourcing solutions used. We then implement the label aggregation model proposed by Dawid and Skene [Dawid 1979] for open source and open science websites such as Crowdcrafting.org [Lombraña, 2015] and the UNOSAT project GeoTagX [UNOSAT, 2015].

Finally, we discuss the organization and results of the CERN Webfest 2015, a hackathon for CERN Summer Students.

Files

Name	Size
SummerStudentReport-MartiaPriisalu.doc md5:8cb2b622d9f4691215bc60e0cd7f87cc	606.7 kB

