Published April 9, 2026 | Version v2
Dataset Open

Human-in-the-Loop Crowdsourced Annotation Dataset for Ukrainian Folk Art with Reproducible Jupyter Notebooks

  • 1. CrowdHeritage
  • 2. Datoptron
  • 3. Europeana Foundation

Description

This record contains the open datasets of the pilot "Human-in-the-Loop Crowdsourced Annotation of Ukrainian Folk Art", developed by Web2Learn.

Specifically, it contains:

-Spreadsheet with 5946 descriptive tags (annotations), both AI- and human-generated, validated by humans and screened against the DE-BIAS vocabulary for potentially contentious language.

It includes 6 columns:

  • created (date the annotation was created)
  • value (the annotation)
  • europeana_id (the unique identifier of the corresponding record in Europeana that the annotation describes)
  • upvotes (number of positive validations received from contributors)
  • downvotes (number of negative validations / rejections received from contributors)
  • recommendation (indicates whether the annotation was accepted or rejected based on upvote/downvote predominance)
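As a minimal sketch of how the recommendation column could relate to the two vote columns, the snippet below derives a label from upvote/downvote predominance. The sample rows, the europeana_id values, the label strings "accepted" and "rejected", and the treatment of ties are all assumptions made for illustration; the record does not specify them.

```python
import csv
import io

# Hypothetical sample rows using the six documented column names.
# All values, including the europeana_id strings, are invented.
SAMPLE_CSV = """created,value,europeana_id,upvotes,downvotes,recommendation
2025-01-01,horse,/0000/example_1,5,1,
2025-01-01,car,/0000/example_1,0,4,
2025-01-02,church,/0000/example_2,2,2,
"""


def recommend(upvotes, downvotes):
    """Label an annotation by vote predominance.

    Assumption: a tie counts as "rejected"; the dataset's actual
    tie-breaking rule is not documented in this record.
    """
    return "accepted" if upvotes > downvotes else "rejected"


def label_rows(csv_text):
    """Parse the CSV text and fill the recommendation column of each row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["recommendation"] = recommend(int(row["upvotes"]), int(row["downvotes"]))
    return rows


labelled = label_rows(SAMPLE_CSV)
for row in labelled:
    print(row["value"], row["recommendation"])
```

Comparing labels derived this way with the published recommendation column is one way to check how ties and zero-vote annotations were actually handled.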

In row 4125, the annotation “slave” was identified through the DE-BIAS vocabulary screening as problematic and manually revised to “enslaved person” as per recommendation.
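A screening step of this kind can be sketched as a lookup of each tag against a list of contentious terms. The one-entry dictionary below is a hypothetical stand-in for the DE-BIAS vocabulary, which is larger and context-dependent; only the "slave" to "enslaved person" pair is taken from this record.

```python
# Hypothetical one-entry stand-in for the DE-BIAS vocabulary.
# Only this term/suggestion pair comes from the dataset description.
CONTENTIOUS_TERMS = {"slave": "enslaved person"}


def screen(tags):
    """Return (index, tag, suggestion) for every tag matching a contentious term.

    Matching is an exact, case-insensitive comparison of the whole tag,
    which is a simplifying assumption.
    """
    flagged = []
    for index, tag in enumerate(tags):
        suggestion = CONTENTIOUS_TERMS.get(tag.strip().lower())
        if suggestion is not None:
            flagged.append((index, tag, suggestion))
    return flagged


flags = screen(["horse", "Slave", "church"])
print(flags)
```

The sketch only flags candidates and leaves the tags unchanged, mirroring the manual revision described above.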

-Spreadsheet with an anonymised list of 69 human contributors who participated in the annotation campaign.

It includes 6 columns:

  • number (numerical identifier assigned to each anonymous contributor)
  • annotated records (total number of records the contributor annotated)
  • total user contributions (total number of actions performed by the contributor)
  • inserted tags (number of new descriptive tags created by the contributor)
  • upvotes (number of positive validation votes submitted by the contributor)
  • downvotes (number of negative validation votes submitted by the contributor)

One user entry was manually removed as it represented the AI-generated contribution and not a human contributor.
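A minimal sketch of summarising the contributor spreadsheet, assuming it is exported as CSV with the six column names listed above. The two sample rows are invented for illustration and do not come from the dataset.

```python
import csv
import io

# Hypothetical sample rows; the header uses the six documented column names.
SAMPLE_CSV = """number,annotated records,total user contributions,inserted tags,upvotes,downvotes
1,12,40,5,30,5
2,3,7,0,6,1
"""


def summarise(csv_text):
    """Return per-column totals and the identifier of the most active contributor."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    numeric_columns = [name for name in rows[0] if name != "number"]
    totals = {name: sum(int(row[name]) for row in rows) for name in numeric_columns}
    most_active = max(rows, key=lambda row: int(row["total user contributions"]))
    return totals, most_active["number"]


totals, most_active = summarise(SAMPLE_CSV)
print(totals)
print("most active contributor:", most_active)
```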


Explore the full pilot on GitHub. The workflow layout is designed to support reproducibility, with helpful notes, detailed README files for each step, machine-readable data output files, and end-to-end executable code provided as Jupyter Notebooks: https://github.com/Web2LearnEU/AISTER-Crowdsourcing-Pilot

The pilot serves as an open repository for digital humanities research, freely available for reproduction, adaptation, and reuse by scholars, students, and educators. It is also open to creative reuse.

The pilot is implemented within the framework of the AISTER project. It operationalises and analyses a human-in-the-loop (HITL) crowdsourcing framework for metadata enrichment in Europeana collections. The objective is to enhance the accessibility and discoverability of Ukrainian ethnographic heritage by improving the quality of descriptive metadata through a combination of artificial intelligence tools (natural language processing, computer vision) and human participation, while contributing to a better understanding of HITL approaches to AI-assisted metadata generation in cultural heritage.

The pilot includes a crowdsourcing campaign set up on the CrowdHeritage platform, which is maintained by Datoptron. Participants were invited to browse images from the ethnographic collection of the Krovets Online Museum of Traditional Art of Ukraine on Europeana, which includes more than 300 folk art paintings depicting scenes from everyday rural life and religious themes. Reviewing keywords generated automatically with computational methods, participants corrected terms, rejected inaccurate ones, and added further keywords by recognising scenes, objects and figures.

The repository contains the complete workflow with all activity captured, including almost 55,000 annotation marks on AI-generated and crowdsourced content from 70 contributors:

  1. Automatically generated annotations (description tags) for artefacts on Europeana using AI tools (natural language processing, computer vision) and Europeana APIs, and
  2. Human-in-the-loop crowdsourced annotations on the CrowdHeritage platform to validate the AI-generated content, also enabling participants to contribute additional user-generated annotation.


Disclaimer
Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Education and Culture Executive Agency (EACEA). Neither the European Union nor EACEA can be held responsible for them.

Files

README-.md

Files (186.1 kB)

Checksum Size
md5:17d56a5ae9fd4353a01bc57c3c02f614 10.2 kB
md5:ba98b7a4a444f65c0a5777b0f04006e4 164.1 kB
md5:7cc4da424fdcf6f342eae4e9eb240b3a 11.8 kB
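
The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using Python's standard hashlib module; the helper names md5_of_file and matches_checksum are hypothetical, and the local file path is left to the reader because the listing does not pair file names with checksums.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum that may carry the 'md5:' prefix."""
    return md5_of_file(path) == expected.removeprefix("md5:").lower()
```

For example, matches_checksum(local_path, "md5:ba98b7a4a444f65c0a5777b0f04006e4") should return True for an intact copy of the 164.1 kB file, where local_path is wherever that file was saved.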

Additional details

Funding

Erasmus+
AI-enabled Citizen Participation in University-driven Ukrainian Cultural Heritage Safeguarding 000290738

Dates

Submitted
2026-04-09