A Dataset for Troll Classification of Tamil Memes
- 1. National University of Ireland Galway
Description
Social media are interactive platforms that facilitate the creation or sharing of information, ideas or other forms of expression among people. This exchange is not free from offensive, trolling or malicious content targeting users or communities. One way of trolling is by making memes, which in most cases combine an image with a concept or catchphrase. The challenge of dealing with memes is that they are region-specific and their meaning is often obscured in humour or sarcasm. To facilitate the computational modelling of trolling in memes for Indian languages, we created a meme dataset for Tamil (TamilMemes). We annotated and released the dataset, which contains suspected troll and not-troll memes. In this paper, we use an image classifier to examine the difficulties that existing methods face in classifying troll memes. We found that identifying a troll meme with such an image classifier is not feasible, a finding corroborated by the precision, recall and F1-score.
The internet has given its user base a platform to communicate and express their views without any censorship. On the other hand, this freedom of expression or free speech can be abused by a user or a troll to demean an individual or a group. Demeaning people based on their gender, sexual orientation, religious beliefs or any other characteristic (trolling) could cause great distress in the online community. Hence, the content posted by a troll needs to be identified and dealt with before it causes any more damage. Amongst all the forms of troll content, memes are the most prevalent due to their popularity and ability to propagate across cultures. A troll uses a meme to demean, attack or offend its targeted audience. In this shared task, we provide a resource (TamilMemes) that could be used to train a system capable of identifying a troll meme in the Tamil language. In our TamilMemes dataset, each meme has been categorized into either a “troll” or a “not_troll” class. Along with the meme images, we also provided Latin-script transcriptions of the text in the memes. We received 10 system submissions from the participants, which were evaluated using the weighted average F1-score. The system with a weighted average F1-score of 0.55 secured the first rank.
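The weighted average F1-score used to rank submissions can be illustrated with a minimal sketch. It assumes the usual definition (per-class F1 averaged with weights proportional to each class's number of true instances), which is what scikit-learn's `average="weighted"` computes; the description does not state which implementation the organisers used, and the function name `weighted_f1` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (assumed definition)."""
    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        score += (n / total) * f1  # weight each class by its support
    return score


# Toy labels using the two class names from the dataset (illustrative only).
gold = ["troll", "troll", "troll", "not_troll"]
pred = ["troll", "troll", "not_troll", "not_troll"]
print(round(weighted_f1(gold, pred), 4))
```

Because the weights follow class frequency, a system that does well only on the majority class can still obtain a moderate weighted score, which is worth keeping in mind when reading the reported 0.55.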
@inproceedings{suryawanshi-etal-2020-tamil-meme,
title = "A Dataset for Troll Classification of {Tamil} Memes",
author = "Suryawanshi, Shardul and
Chakravarthi, Bharathi Raja and
Verma, Pranav and
Arcan, Mihael and
McCrae, John P and
Buitelaar, Paul",
booktitle = "Proceedings of the 5th Workshop on Indian Language Data Resource and Evaluation (WILDRE-5)",
month = may,
year = "2020",
address = "Marseille, France",
publisher = "European Language Resources Association (ELRA)"
}
@inproceedings{suryawanshi-chakravarthi-2021-findings,
title = "Findings of the Shared Task on Troll Meme Classification in {T}amil",
author = "Suryawanshi, Shardul and
Chakravarthi, Bharathi Raja",
booktitle = "Proceedings of the First Workshop on Speech and Language Technologies for Dravidian Languages",
month = apr,
year = "2021",
address = "Kyiv",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2021.dravidianlangtech-1.16",
pages = "126--132",
abstract = "The internet has facilitated its user-base with a platform to communicate and express their views without any censorship. On the other hand, this freedom of expression or free speech can be abused by its user or a troll to demean an individual or a group. Demeaning people based on their gender, sexual orientation, religious believes or any other characteristics {--}trolling{--} could cause great distress in the online community. Hence, the content posted by a troll needs to be identified and dealt with before causing any more damage. Amongst all the forms of troll content, memes are most prevalent due to their popularity and ability to propagate across cultures. A troll uses a meme to demean, attack or offend its targetted audience. In this shared task, we provide a resource (TamilMemes) that could be used to train a system capable of identifying a troll meme in the Tamil language. In our TamilMemes dataset, each meme has been categorized into either a {``}troll{''} or a {``}not{\_}troll{''} class. Along with the meme images, we also provided the Latin transcripted text from memes. We received 10 system submissions from the participants which were evaluated using the weighted average F1-score. The system with the weighted average F1-score of 0.55 secured the first rank.",
}
Files
Name | Size | Checksum |
---|---|---|
Tamil_troll_memes-dataset.zip | 326.0 MB | md5:f2c05d9540dae47d54a30b4508cb60cd |