Published May 15, 2025 | Version v1
Dataset Open

RoMEMES v2

  • 1. Research Institute for Artificial Intelligence "Mihai Drăgănescu", Romanian Academy
  • 2. Alexandru Ioan Cuza University

Description

RoMEMES v2 is a dataset of Romanian-language memes collected from public social media platforms. The dataset was manually annotated with:

  • associated text in Romanian;
  • image complexity;
  • polarity;
  • sentiment;
  • political content.

In addition, the dataset contains associated metadata, and the text was automatically annotated on the RELATE platform with part-of-speech tags, lemmas, and dependency parses.
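The automatic annotations can be read with a few lines of Python. The sketch below assumes the .conllup files follow the CoNLL-U Plus layout (an optional "# global.columns = ..." header, tab-separated token lines, blank lines between sentences); the function name parse_conllup and the fallback to the ten standard CoNLL-U columns are illustrative choices, not part of the dataset.

```python
# Standard CoNLL-U column names, used only if a file has no
# "# global.columns" header (an assumption about the files).
CONLLU_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                  "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def parse_conllup(text):
    """Parse CoNLL-U Plus text into sentences, each a list of token dicts."""
    columns = CONLLU_COLUMNS
    sentences, current = [], []
    for line in text.splitlines():
        if line.startswith("# global.columns") and "=" in line:
            # The header names the columns used in this file.
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comment lines (sentence ids, raw text, ...)
        elif not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        else:
            current.append(dict(zip(columns, line.split("\t"))))
    if current:
        sentences.append(current)
    return sentences
```

Each token is returned as a dictionary keyed by column name, so lemmas can be collected with something like [tok["LEMMA"] for sent in sentences for tok in sent], provided the file declares a LEMMA column.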

Files and folders in this dataset:

  • metadata.tsv - contains metadata and annotations; the first column is the file ID;
  • LICENSE - contains licensing information;
  • README - this file;
  • images - folder containing image files, following the file naming convention ID.extension, where extension is the original file extension (this may not always match the MIME type of the file, as recorded in metadata.tsv);
  • text - folder containing text files, following the file naming convention ID.txt; this is only the message from the meme, without additional text (text from logos, unrelated text, etc.);
  • conllup - folder containing automatic text annotations for the files in the "text" folder, created in the RELATE platform, following the file naming convention ID.conllup;
  • text_complete - folder with the complete text extracted from the meme (contains additional text which may not be directly related to the meme message);
  • conllup_complete - folder containing automatic text annotations for the files in the "text_complete" folder, created in the RELATE platform, following the file naming convention ID.conllup.
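The layout above can be navigated with a short Python sketch. It assumes that metadata.tsv is UTF-8 encoded and has a header row (set has_header=False if it does not), and that files in text_complete follow the same ID.txt convention as the text folder; the helper names read_ids and files_for are hypothetical.

```python
import csv
import glob
from pathlib import Path


def read_ids(metadata_path, has_header=True):
    """Return the file IDs from the first column of metadata.tsv."""
    with open(metadata_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))
    if has_header:  # assumption: the first row holds column names
        rows = rows[1:]
    return [row[0] for row in rows if row]


def files_for(root, file_id):
    """Map each dataset folder to the path for one ID (None if missing)."""
    root = Path(root)
    found = {}
    # Image extensions vary, so match ID.<anything> in the images folder.
    images = sorted((root / "images").glob(glob.escape(file_id) + ".*"))
    found["image"] = images[0] if images else None
    for folder, suffix in [("text", ".txt"),
                           ("conllup", ".conllup"),
                           ("text_complete", ".txt"),  # assumed naming
                           ("conllup_complete", ".conllup")]:
        path = root / folder / (file_id + suffix)
        found[folder] = path if path.is_file() else None
    return found
```

Because the image extension may not match the actual file type, code that decodes images would be safer relying on the MIME type recorded in metadata.tsv than on the extension.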


A first version of this corpus was released here: RoMEMES https://doi.org/10.5281/zenodo.13120215
The current version has more data and adds the text_complete and conllup_complete folders. These are new levels of annotation that were not available in the initial release. To maintain compatibility with existing code, the rest of the data keeps the same format. Not all memes currently have the text_complete annotation. If a text file is missing from one of the two text folders, use the text from the other folder.
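The fallback between the two text folders could be implemented along these lines. This is a minimal sketch: the function name read_meme_text and the prefer_complete switch are hypothetical, and it assumes UTF-8 text files named ID.txt in both folders.

```python
from pathlib import Path


def read_meme_text(root, file_id, prefer_complete=False):
    """Return the text for one meme, falling back to the other text folder.

    Returns None if neither folder has a file for this ID.
    """
    root = Path(root)
    order = ["text", "text_complete"]
    if prefer_complete:
        order.reverse()
    for folder in order:
        path = root / folder / (file_id + ".txt")
        if path.is_file():
            return path.read_text(encoding="utf-8")  # encoding is assumed
    return None
```

Note that the two folders are not equivalent: text holds only the meme message, while text_complete may include unrelated text, so downstream code may want to record which folder a given text came from.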

 

Files

romemes_v2.zip (185.6 MB)
md5:4140998417957305bb7ae4f76739aac4
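The MD5 checksum listed for romemes_v2.zip can be used to check a download. A minimal sketch using Python's standard hashlib module follows; the helper name md5_of and the local path in the usage note are assumptions.

```python
import hashlib

# Checksum as given in the file listing for romemes_v2.zip.
EXPECTED_MD5 = "4140998417957305bb7ae4f76739aac4"


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With the archive saved in the working directory, md5_of("romemes_v2.zip") == EXPECTED_MD5 should hold for an intact download.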