Published January 7, 2022 | Version 1.0
Dataset Open

H-Prop and H-Prop-News Propaganda Datasets in Hindi

  • 1. Symbiosis Institute of Technology (SIT), Symbiosis International (Deemed University), Pune
  • 2. DIT-Università di Bologna

Description

The H-Prop dataset contains 28,630 articles created by translating a portion of the Proppy corpus into Hindi. Each article is labeled as either “propagandistic” (positive class) or “non-propagandistic” (negative class). The labels are retained from the Proppy corpus, where they were assigned indirectly through a technique known as distant supervision.

The H-Prop-News dataset contains 5,500 Hindi news articles collected from more than 30 prominent Hindi news websites. Each article is labeled as either “propagandistic” (positive class) or “non-propagandistic” (negative class). The labels were assigned by human annotators, with an observed inter-annotator agreement of 0.81 (Cohen’s kappa).
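
For readers unfamiliar with the agreement measure, the following is a minimal sketch of how Cohen's kappa is computed for two annotators. The labels are invented for illustration and the function name `cohen_kappa` is hypothetical; this page does not state how many annotators took part or how the reported 0.81 was aggregated, so this is not a reproduction of the authors' calculation.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy labels, assumed encoding: 1 = propagandistic, 0 = non-propagandistic.
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the annotators agree on 9 of 10 items (observed agreement 0.90) while chance agreement is 0.54, so kappa is (0.90 - 0.54) / (1 - 0.54), roughly 0.78.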

## Data format

The H-Prop dataset is provided as three TSV files, one each for the training, testing, and validation partitions. The H-Prop-News dataset is provided as CSV files, likewise split into training, testing, and validation partitions.

Each line in the H-Prop dataset represents one article, with the following information:

1. article_text: the text of the article, translated from the Proppy corpus.
2. propaganda_label: the article's label, retained from the Proppy corpus.
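
As an illustration of the two-field layout, here is a minimal sketch that reads tab-separated H-Prop rows with Python's standard `csv` module and counts the labels. It assumes the files have a header row using the field names listed above and are UTF-8 encoded; neither is stated on this page. The sample rows, the `HPropArticle` class, and the `read_hprop` helper are hypothetical.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass
class HPropArticle:
    article_text: str
    propaganda_label: str  # kept as a string; the label encoding is an assumption


def read_hprop(handle):
    """Yield one HPropArticle per row of a tab-separated file with a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        yield HPropArticle(row["article_text"], row["propaganda_label"])


# Invented two-row sample standing in for one of the TSV partitions.
sample = io.StringIO(
    "article_text\tpropaganda_label\n"
    "पहला लेख\t1\n"
    "दूसरा लेख\t0\n"
)
articles = list(read_hprop(sample))
label_counts = Counter(a.propaganda_label for a in articles)
print(label_counts)
```

For a downloaded partition, the same function can be given a file opened with `open(path, encoding="utf-8", newline="")`. Very long articles may exceed the `csv` module's default field size limit, in which case `csv.field_size_limit` can be raised before reading.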

Each line in the H-Prop-News dataset represents one article, with the following information:

1. news_website: the name of the news source website
2. article_url: the direct URL of the published article on its source website
3. news_headline: the news headline
4. article_text: the text of the article, retrieved with the ParseHub tool
5. propaganda_label: the article's label
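
The five-field layout can be checked before use. The sketch below assumes the CSV files carry a header row with exactly the field names listed above, which this page does not confirm. It verifies that the expected columns are present and counts articles per source website. The sample rows, URLs, and the `read_hprop_news` helper are invented for illustration.

```python
import csv
import io
from collections import Counter

EXPECTED_COLUMNS = [
    "news_website",
    "article_url",
    "news_headline",
    "article_text",
    "propaganda_label",
]


def read_hprop_news(handle):
    """Return all rows as dicts, failing early if an expected column is absent."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Invented sample; quoted fields show that commas inside text are handled.
sample = io.StringIO(
    "news_website,article_url,news_headline,article_text,propaganda_label\n"
    'Example Site,https://example.com/a,"शीर्षक, एक","लेख का पाठ",0\n'
    "Example Site,https://example.com/b,शीर्षक दो,दूसरा पाठ,1\n"
)
rows = read_hprop_news(sample)
per_site = Counter(r["news_website"] for r in rows)
print(per_site)
```

Checking the header first gives a clear error if a file uses different column names than assumed here, instead of a `KeyError` partway through processing.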

## About

The articles in the H-Prop dataset were translated into Hindi using IBM Watson Language Translator.

## Credit

Please cite the dataset as:
[HProp-News] Deptii Chaudhari, Ambika Pawar, and Alberto Barrón-Cedeño. 2022. H-Prop and H-Prop-News: Computational Propaganda Datasets in Hindi. doi: 10.5281/zenodo.5828240

## Authors

Deptii Chaudhari;
Ambika Pawar;
Alberto Barrón-Cedeño

Files (309.6 MB)

Previewed file: H-Prop-News_Test_1.0.csv

  • 5.7 MB (md5:b530491c5d68e65e99790d33b681309e)
  • 24.8 MB (md5:e74921616be2ec0b16cfd384822a1320)
  • 2.9 MB (md5:a8f7c7919d93a8baf5ebc72ae8205a08)
  • 61.2 MB (md5:6b80f9dd44655bc68a1f98d85eebea05)
  • 180.4 MB (md5:beeaaea90bbbd78e1166168e4b03a76c)
  • 34.5 MB (md5:fb824010a6e2211b28d538424e7aca53)
  • 2.1 kB (md5:6e71723d13680f8b970b6a5fc358aff4)