Toloker Graph: Interaction of Crowd Annotators
Contributors
Related persons:
- Toloka
Description
The graph contains 11,758 nodes and 519,000 edges representing interactions between crowd annotators who worked on an annotation project on the Toloka crowdsourcing platform (see the Toloka overview for details on the terminology used).
Each node represents an individual annotator and carries four numerical and three categorical features. An edge connects two annotators if they both annotated the same task. Each node also has a label indicating whether the annotator was banned on this project.
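The edge rule above can be illustrated with a minimal sketch that builds a co-annotation graph from a list of (annotator, task) assignments. The assignment log and the function name coannotation_edges are hypothetical: the dataset ships only the resulting edges. The sketch also assumes that annotators who share several tasks still get a single unweighted edge, which the description does not state.

```python
from collections import defaultdict
from itertools import combinations


def coannotation_edges(assignments):
    """Build undirected edges between annotators who labeled the same task.

    `assignments` is a hypothetical iterable of (annotator_id, task_id) pairs;
    the raw assignment log is not part of this dataset.
    """
    annotators_by_task = defaultdict(set)
    for annotator, task in assignments:
        annotators_by_task[task].add(annotator)

    edges = set()
    for annotators in annotators_by_task.values():
        # Every pair of annotators sharing a task gets one edge; sorting the
        # pair makes (a, b) and (b, a) collapse to the same undirected edge,
        # and the set keeps a pair only once even if they share many tasks.
        for a, b in combinations(sorted(annotators), 2):
            edges.add((a, b))
    return edges


# Made-up assignment log for illustration.
log = [(0, "t1"), (1, "t1"), (2, "t1"), (1, "t2"), (3, "t2"), (0, "t3")]
print(sorted(coannotation_edges(log)))  # [(0, 1), (0, 2), (1, 2), (1, 3)]
```

Task t3 was labeled by a single annotator, so it contributes no edge.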
Nodes are stored in the nodes.tsv file, a tab-separated (TSV) file with the following columns:
- id: unique identifier of the annotator
- approved_rate: fraction of this annotator's labels that were approved
- skipped_rate: fraction of tasks that this annotator skipped
- expired_rate: fraction of this annotator's tasks that expired
- rejected_rate: fraction of this annotator's labels that were rejected
- education: level of education as self-reported by this annotator (none, basic, middle, high)
- english_profile: knowledge of English as self-reported by this annotator (0 for no, 1 for yes)
- english_tested: whether the annotator passed the Toloka English language test (0 for no, 1 for yes)
- banned: whether the annotator was banned on this project (0 for no, 1 for yes)
The *_rate attributes should sum up to 1.
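A minimal sketch for reading nodes.tsv with Python's standard csv module, assuming the file has a header row with exactly the column names listed above and that education is stored as one of the four listed strings. It assembles a 10-dimensional feature vector per annotator (four rates, a one-hot education code, two English flags) and keeps banned separately as the label. The one-hot layout and the helper names load_nodes and rates_sum_to_one are illustrative choices, not part of the dataset, and the sample row is made up.

```python
import csv
import io

RATE_COLUMNS = ["approved_rate", "skipped_rate", "expired_rate", "rejected_rate"]
EDUCATION_LEVELS = ["none", "basic", "middle", "high"]


def load_nodes(handle):
    """Parse nodes.tsv rows into {id: (feature_vector, banned_label)}."""
    nodes = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        rates = [float(row[c]) for c in RATE_COLUMNS]
        # One-hot encode education; a value outside the four listed levels
        # yields an all-zero vector.
        education = [1.0 if row["education"] == lvl else 0.0 for lvl in EDUCATION_LEVELS]
        # The two English flags are already 0/1, so they are used as-is.
        binary = [float(row["english_profile"]), float(row["english_tested"])]
        nodes[int(row["id"])] = (rates + education + binary, int(row["banned"]))
    return nodes


def rates_sum_to_one(features, tol=1e-6):
    """Check that the leading *_rate entries of a feature vector sum to 1."""
    return abs(sum(features[: len(RATE_COLUMNS)]) - 1.0) <= tol


# Hypothetical single-row file for demonstration.
header = "id\tapproved_rate\tskipped_rate\texpired_rate\trejected_rate\teducation\tenglish_profile\tenglish_tested\tbanned\n"
sample = header + "0\t0.5\t0.25\t0.125\t0.125\thigh\t1\t0\t0\n"
nodes = load_nodes(io.StringIO(sample))
features, banned = nodes[0]
print(len(features), banned, rates_sum_to_one(features))  # 10 0 True
```

With the real file, pass an open handle instead, for example open("nodes.tsv", newline="").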
Edges are stored in the edges.tsv file, a tab-separated (TSV) file with the following columns:
- source: identifier of the source annotator
- target: identifier of the target annotator
Since the graph is undirected, source and target are interchangeable for any given pair of nodes.
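Because source and target are interchangeable, a loader should record each edge in both directions. The sketch below, assuming a header row named source and target, builds an adjacency map of neighbor sets; using sets means a pair that happens to appear in both orders is not counted twice. The helper name load_adjacency and the sample edges are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_adjacency(handle):
    """Read edges.tsv and return an undirected adjacency map {node: set(neighbors)}."""
    adjacency = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        u, v = int(row["source"]), int(row["target"])
        # Edge direction carries no meaning, so store both directions.
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


# Made-up edges; (0, 1) is deliberately listed in both orders.
sample_edges = "source\ttarget\n0\t1\n1\t0\n1\t2\n"
adj = load_adjacency(io.StringIO(sample_edges))
degrees = {node: len(neighbors) for node, neighbors in adj.items()}
print(degrees)  # {0: 1, 1: 2, 2: 1}
```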
Files
(5.8 MB total)

| Checksum | Size |
|---|---|
| md5:4220f9624aa6e418ba7908c685202e17 | 5.3 MB |
| md5:9bc8ccf2d2c5170b436b47b154a640a3 | 527.2 kB |
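Downloads can be checked against the listed MD5 checksums with Python's hashlib. Which checksum belongs to which file is not recoverable from the listing above, so compare each downloaded file against the value shown for its entry on the record page. The helpers md5_of and verify are illustrative, and the demo hashes a temporary file containing b"hello" rather than a real dataset file.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file's MD5 with an expected value.

    Accepts either "md5:<hex>" as shown in the listing or a bare hex digest.
    """
    return md5_of(path) == expected.split(":", 1)[-1].lower()


# Demo on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_digest = md5_of(tmp.name)
demo_ok = verify(tmp.name, "md5:5d41402abc4b2a76b9719d911017c592")
os.remove(tmp.name)
print(demo_digest, demo_ok)  # 5d41402abc4b2a76b9719d911017c592 True
```

For the dataset itself, call verify with the downloaded file path and one of the checksums from the table.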
Additional details
Related works
- Is cited by
  - Conference paper: https://arxiv.org/abs/2302.11640
- Is compiled by
  - Other: https://toloka.ai/
- Is identical to
  - Dataset: https://github.com/Toloka/TolokerGraph
  - Dataset: https://huggingface.co/datasets/toloka/TolokerGraph
- Is source of
  - Software documentation: https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.datasets.HeterophilousGraphDataset.html#torch_geometric.datasets.HeterophilousGraphDataset