Toloker Graph: Interaction of Crowd Annotators

Likhobaba, Daniil; Pavlichenko, Nikita; Ustalov, Dmitry

doi:10.5281/zenodo.7620796

Published February 8, 2023 | Version graph

Dataset Open

1. Toloka

The graph contains 11,758 nodes and 519,000 edges representing interactions between crowd annotators on a project labeled on the Toloka crowdsourcing platform (see the Toloka overview for details on the terminology used).

Each node represents an individual annotator and carries four numerical and three categorical features. An edge connects a pair of annotators if they annotated the same task. Each node also has a label indicating whether the annotator was banned on this project.

Nodes are stored in the nodes.tsv file in the TSV format of the following structure:

id: unique identifier of the annotator
approved_rate: percentage of the approved labels of this annotator
skipped_rate: percentage of the skipped tasks of this annotator
expired_rate: percentage of the expired tasks of this annotator
rejected_rate: percentage of the rejected labels of this annotator
education: level of education as self-reported by this annotator (none, basic, middle, high)
english_profile: knowledge of English as self-reported by this annotator (0 for no, 1 for yes)
english_tested: whether the annotator passed the Toloka language test for English (0 for no, 1 for yes)
banned: whether the annotator was banned on this project (0 for no, 1 for yes)

The four *_rate attributes are expressed as fractions and should sum to 1 for each annotator.
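The node structure described above can be read and sanity-checked with a short script. The following is a minimal sketch in Python, not part of the dataset itself. It assumes that nodes.tsv starts with a header row carrying the column names listed above; the helper names (read_nodes, rates_sum_to_one) and the sample rows are made up for illustration.

```python
import csv
import io
import math

RATE_COLUMNS = ("approved_rate", "skipped_rate", "expired_rate", "rejected_rate")
FLAG_COLUMNS = ("english_profile", "english_tested", "banned")


def read_nodes(handle):
    """Parse tab-separated node rows (header row assumed) into dictionaries."""
    nodes = []
    for row in csv.DictReader(handle, delimiter="\t"):
        node = {"id": row["id"], "education": row["education"]}
        for name in RATE_COLUMNS:
            node[name] = float(row[name])
        for name in FLAG_COLUMNS:
            node[name] = int(row[name])
        nodes.append(node)
    return nodes


def rates_sum_to_one(node, tolerance=1e-6):
    """Check that the four *_rate values of one node add up to 1."""
    total = sum(node[name] for name in RATE_COLUMNS)
    return math.isclose(total, 1.0, abs_tol=tolerance)


# Made-up sample rows in the documented layout (not taken from the dataset).
SAMPLE_TSV = (
    "id\tapproved_rate\tskipped_rate\texpired_rate\trejected_rate\t"
    "education\tenglish_profile\tenglish_tested\tbanned\n"
    "0\t0.70\t0.10\t0.10\t0.10\thigh\t1\t0\t0\n"
    "1\t1.00\t0.00\t0.00\t0.00\tmiddle\t0\t0\t1\n"
)

sample_nodes = read_nodes(io.StringIO(SAMPLE_TSV))
invalid_ids = [n["id"] for n in sample_nodes if not rates_sum_to_one(n)]
```

To run the same check on the real file, pass an open file handle instead of the in-memory sample, for example open("nodes.tsv", newline="", encoding="utf-8"). The tolerance is an arbitrary choice to absorb floating-point rounding.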

Edges are stored in the edges.tsv file in the TSV format of the following structure:

source: identifier of the first annotator in the pair
target: identifier of the second annotator in the pair

As the graph is undirected, source and target can be interchanged for the given pair of nodes.
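Because the order of source and target carries no meaning, it can help to normalize each pair before building a graph structure. The following is a minimal Python sketch under the assumption that edges.tsv starts with a header row containing source and target; the function names and sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_undirected_edges(handle):
    """Read source/target pairs and store each pair once in a canonical order."""
    edges = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        a, b = row["source"], row["target"]
        # Sort the two identifiers so that (a, b) and (b, a) collapse to one edge.
        edges.add((a, b) if a <= b else (b, a))
    return edges


def build_adjacency(edges):
    """Build a symmetric adjacency mapping from canonical edge pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


# Made-up sample: the pair (0, 1) appears in both orders on purpose.
SAMPLE_EDGES = "source\ttarget\n0\t1\n1\t0\n1\t2\n"

sample_edges = read_undirected_edges(io.StringIO(SAMPLE_EDGES))
sample_adjacency = build_adjacency(sample_edges)
```

Identifiers are compared as strings here, which is enough to obtain a consistent ordering; converting them to integers first is equally valid if the identifiers are numeric.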

Files

Files (5.8 MB)

Name	MD5	Size
edges.tsv	4220f9624aa6e418ba7908c685202e17	5.3 MB
nodes.tsv	9bc8ccf2d2c5170b436b47b154a640a3	527.2 kB
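The MD5 checksums in the file listing can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard hashlib module; the function names and the assumption that the files sit at locally chosen paths are illustrative.

```python
import hashlib

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "edges.tsv": "4220f9624aa6e418ba7908c685202e17",
    "nodes.tsv": "9bc8ccf2d2c5170b436b47b154a640a3",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, name):
    """Compare a local file against the checksum listed for the given name."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, matches_listing("edges.tsv", "edges.tsv") should return True for an unmodified download placed in the current directory.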

Additional details

Is cited by: Conference paper: https://arxiv.org/abs/2302.11640 (URL)
Is compiled by: Other: https://toloka.ai/ (URL)
Is identical to: Dataset: https://github.com/Toloka/TolokerGraph (URL); Dataset: https://huggingface.co/datasets/toloka/TolokerGraph (URL)
Is source of: Software documentation: https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.datasets.HeterophilousGraphDataset.html#torch_geometric.datasets.HeterophilousGraphDataset (URL)

Statistic	All versions	This version
Views	904	902
Downloads	128	128
Data volume	417.9 MB	417.9 MB
